BERT#
BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained deep learning model designed for natural language processing tasks, developed by Google in 2018. Unlike traditional models that read text sequentially (left-to-right or right-to-left), BERT reads text in both directions simultaneously, capturing context from both sides of a word. This bidirectional approach allows BERT to better understand nuances and meanings in language, improving performance on tasks like question answering, sentiment analysis, and language inference.
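For example, predicting a masked word requires context from both its left and right. The short sketch below is not a NeMo API; it uses the Hugging Face transformers fill-mask pipeline with the public bert-base-uncased checkpoint purely to illustrate bidirectional masked-token prediction:

from transformers import pipeline

# BERT predicts the [MASK] token using context on both sides of it.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for prediction in fill_mask("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))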
NeMo 2.0 Pretraining Recipes#
We provide recipes for pretraining BERT models of two sizes, Base (110M) and Large (340M), using NeMo 2.0 and NeMo-Run.
These recipes configure a run.Partial for one of the nemo.collections.llm API functions introduced in NeMo 2.0. The recipes are hosted in the bert_110m and bert_340m modules.
Note

The pretraining recipes use the MockDataModule for the data argument. You are expected to replace the MockDataModule with your own custom dataset. We provide an example below on how to invoke the default recipe and override the data argument:
from nemo.collections import llm

pretrain = llm.bert_110m.pretrain_recipe(
    name="bert_base_pretrain",
    dir="/path/to/checkpoints",
    num_nodes=2,
    num_gpus_per_node=8,
    bert_type="megatron",
)

# To override the data argument:
# dataloader = a_function_that_configures_your_custom_dataset(
#     global_batch_size=global_batch_size,
#     micro_batch_size=micro_batch_size,
#     seq_length=pretrain.model.config.seq_length,
# )
# pretrain.data = dataloader
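As a rough sketch of what such a function might look like, the example below wraps a hypothetical MyBertDataModule (a placeholder for whatever data module class wraps your dataset) in a NeMo-Run run.Config so it can be assigned to pretrain.data. The class name and its constructor arguments are assumptions; replace them with your own data module and parameters.

import nemo_run as run

def a_function_that_configures_your_custom_dataset(global_batch_size, micro_batch_size, seq_length):
    # MyBertDataModule is a placeholder for your own data module class.
    # Wrapping it in run.Config lets NeMo-Run serialize the configuration
    # instead of instantiating the data module immediately.
    return run.Config(
        MyBertDataModule,
        data_path="/path/to/your/tokenized/data",  # hypothetical argument
        seq_length=seq_length,
        global_batch_size=global_batch_size,
        micro_batch_size=micro_batch_size,
    )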
Note

The configuration in the recipes is done using the NeMo-Run run.Config and run.Partial configuration objects. Please review the NeMo-Run documentation to learn more about its configuration and execution system.

bert_type can be either huggingface or megatron. huggingface refers to the model architecture on huggingface/google-bert, while megatron refers to the model architecture that strictly follows Megatron-LM. The major difference between the two is that Megatron uses Pre-LayerNorm, while Hugging Face uses Post-LayerNorm after the MLP and attention modules.
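To illustrate that difference (this is a toy sketch, not NeMo's actual implementation), the snippet below shows how one residual sublayer is composed in each variant; a linear layer stands in for the attention or MLP block:

import torch
import torch.nn as nn

def post_layernorm_block(x, sublayer, norm):
    # Hugging Face BERT (Post-LayerNorm): normalize after the residual addition.
    return norm(x + sublayer(x))

def pre_layernorm_block(x, sublayer, norm):
    # Megatron-LM BERT (Pre-LayerNorm): normalize the sublayer input,
    # then add the residual connection.
    return x + sublayer(norm(x))

# Toy usage with a linear layer standing in for the attention/MLP sublayer.
hidden = 16
x = torch.randn(2, 4, hidden)
sublayer = nn.Linear(hidden, hidden)
norm = nn.LayerNorm(hidden)
y_post = post_layernorm_block(x, sublayer, norm)
y_pre = pre_layernorm_block(x, sublayer, norm)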
Once you have your final configuration ready, you can execute it on any of the NeMo-Run supported executors. The simplest is the local executor, which just runs the pretraining locally in a separate process. You can use it as follows:
import nemo_run as run
run.run(pretrain, executor=run.LocalExecutor())
Alternatively, you can run it directly in the same Python process as follows:
run.run(pretrain, direct=True)
A comprehensive list of pretraining recipes that we currently support or plan to support soon is provided below for reference:
Recipe | Status
---|---
Hugging Face BERT-Base (110M) | Yes
Hugging Face BERT-Large (340M) | Yes
Megatron BERT-Base (110M) | Yes
Megatron BERT-Large (340M) | Yes