Important
NeMo 2.0 is an experimental feature and is currently released in the dev container only: nvcr.io/nvidia/nemo:dev. Please refer to the NeMo 2.0 overview for information on getting started.
Migrate Pre-Training from NeMo 1.0 to NeMo 2.0#
NeMo 1.0 (Previous Release)#
In NeMo 1.0, pre-training is configured using the megatron_gpt_config.yaml file and launched with the megatron_gpt_pretraining.py script.
NeMo 2.0 (New Release)#
NeMo 2.0 introduces a Pythonic and modular approach to configuring experiments. For detailed instructions on migrating your NeMo 1.0 YAML configurations to NeMo 2.0, refer to the additional documents in this migration guide.
In addition, NeMo 2.0 is compatible with NeMo-Run, which streamlines the configuration and execution of NeMo experiments. Refer to the NeMo-Run documentation for more information.
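As a minimal sketch of what a NeMo-Run launch can look like, the snippet below configures a predefined pre-training recipe and runs it locally. The recipe helper (llm.llama3_8b.pretrain_recipe), the checkpoint directory, and the executor settings are illustrative assumptions; names and arguments may differ across NeMo and NeMo-Run versions.

import nemo_run as run
from nemo.collections import llm

# Build a ready-made pre-training recipe (a configured llm.train task).
# The directory and node/GPU counts below are placeholder values.
recipe = llm.llama3_8b.pretrain_recipe(
    name="llama3_8b_pretraining",
    dir="/path/to/checkpoints",
    num_nodes=1,
    num_gpus_per_node=8,
)

# Launch on the local machine via torchrun; other executors (e.g. Slurm) are available.
executor = run.LocalExecutor(ntasks_per_node=8, launcher="torchrun")
run.run(recipe, executor=executor)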
The llm.train API can be used to run pre-training in NeMo 2.0, as follows:
import torch
from nemo import lightning as nl
from nemo.collections import llm
from megatron.core.optimizer import OptimizerConfig

seq_length = 2048  # example value; set this to match your data

### set up your GPT model config
gpt_config = llm.GPTConfig(
    num_layers=12,
    hidden_size=384,
    ffn_hidden_size=1536,
    num_attention_heads=6,
    seq_length=seq_length,
    init_method_std=0.023,
    hidden_dropout=0.1,
    attention_dropout=0.1,
    layernorm_epsilon=1e-5,
    make_vocab_size_divisible_by=128,
)

### other docs in this section explain how to configure these
data = llm.PreTrainingDataModule(...)
model = llm.GPTModel(gpt_config, tokenizer=data.tokenizer)
strategy = nl.MegatronStrategy(...)
opt_config = OptimizerConfig(...)
opt = nl.MegatronOptimizerModule(config=opt_config)
trainer = nl.Trainer(...)
nemo_logger = nl.NeMoLogger(dir="test_logdir")

llm.train(
    model=model,
    data=data,
    trainer=trainer,
    log=nemo_logger,
    tokenizer='data',
    optim=opt,
)
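For reference, the elided configuration objects above might be filled in roughly as follows for a small single-GPU run. This is a minimal sketch, not a recommended configuration: the data paths, parallelism sizes, optimizer settings, and precision plugin are illustrative assumptions, and the other documents in this section describe the full set of options.

### illustrative values for the objects elided above
data = llm.PreTrainingDataModule(
    paths=["/path/to/tokenized/data_text_document"],  # assumed preprocessed dataset prefix
    seq_length=seq_length,
    micro_batch_size=1,
    global_batch_size=8,
)
strategy = nl.MegatronStrategy(
    tensor_model_parallel_size=1,
    pipeline_model_parallel_size=1,
)
opt_config = OptimizerConfig(
    optimizer="adam",
    lr=3e-4,
    weight_decay=0.1,
    bf16=True,
    use_distributed_optimizer=True,
)
trainer = nl.Trainer(
    devices=1,
    accelerator="gpu",
    max_steps=100,
    strategy=strategy,
    plugins=nl.MegatronMixedPrecision(precision="bf16-mixed"),
)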
In addition to the generic GPTModel used in the example above, we also support Gemma, Llama, Mistral, and Mixtral models.
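For example, swapping in one of the predefined Llama configurations is a one-line change to the model construction above. The config class shown here (Llama2Config7B) is illustrative; the available config classes depend on your NeMo version.

### use a predefined Llama 2 7B configuration instead of the custom GPTConfig
model = llm.LlamaModel(llm.Llama2Config7B(), tokenizer=data.tokenizer)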
Migration Steps#
Migrate your NeMo 1.0 YAML to NeMo 2.0 using the other documents in the migration guide.
Run pre-training using the llm.train API.