Important

NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Please refer to the NeMo 2.0 overview for information on getting started.

Migrate Pre-Training from NeMo 1.0 to NeMo 2.0#

NeMo 1.0 (Previous Release)#

In NeMo 1.0, pre-training is configured using megatron_gpt_config.yaml and launched with megatron_gpt_pretraining.py.

NeMo 2.0 (New Release)#

NeMo 2.0 introduces a Pythonic and modular approach to configuring experiments. For detailed instructions on migrating your NeMo 1.0 YAML configurations to NeMo 2.0, refer to the additional documents in this migration guide.

In addition, NeMo 2.0 is compatible with NeMo-Run, which streamlines the configuration and execution of NeMo experiments. Refer to the NeMo-Run documentation for more information.
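As an illustration, a NeMo-Run launch of a pre-built pre-training recipe might look like the sketch below. The recipe factory (llm.llama3_8b.pretrain_recipe) and the executor options shown are assumptions based on the NeMo 2.0 recipe collection and may differ between releases; see the NeMo-Run documentation for the authoritative API.

import nemo_run as run
from nemo.collections import llm

# Build a pre-configured pre-training recipe (assumed factory name; adjust to your release).
recipe = llm.llama3_8b.pretrain_recipe(
    name="llama3_8b_pretrain",
    dir="/path/to/checkpoints",  # illustrative output directory
    num_nodes=1,
    num_gpus_per_node=8,
)

# Run locally with torchrun; other executors (e.g., Slurm) are configured similarly.
executor = run.LocalExecutor(ntasks_per_node=8, launcher="torchrun")

run.run(recipe, executor=executor)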

The llm.train API can be used to run pre-training in NeMo 2.0, as follows:

import torch
from nemo import lightning as nl
from nemo.collections import llm
from megatron.core.optimizer import OptimizerConfig

# set up your GPT model config
seq_length = 2048  # example sequence length; must match the data module's seq_length

gpt_config = llm.GPTConfig(
    num_layers=12,
    hidden_size=384,
    ffn_hidden_size=1536,
    num_attention_heads=6,
    seq_length=seq_length,
    init_method_std=0.023,
    hidden_dropout=0.1,
    attention_dropout=0.1,
    layernorm_epsilon=1e-5,
    make_vocab_size_divisible_by=128,
)

# other docs in this section explain how to configure these
data = llm.PreTrainingDataModule(...)
model = llm.GPTModel(gpt_config, tokenizer=data.tokenizer)
strategy = nl.MegatronStrategy(...)
opt_config = OptimizerConfig(...)
opt = nl.MegatronOptimizerModule(config=opt_config)
trainer = nl.Trainer(...)
nemo_logger = nl.NeMoLogger(dir="test_logdir")

llm.train(
    model=model,
    data=data,
    trainer=trainer,
    log=nemo_logger,
    tokenizer='data',
    optim=opt,
)

In addition to the generic GPTModel used in the example above, NeMo 2.0 also supports Gemma, Llama, Mistral, and Mixtral models.
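For example, a Llama model can be swapped in for the generic GPT model while the rest of the llm.train call stays unchanged. The class and config names below (llm.LlamaModel, llm.Llama2Config7B) reflect the NeMo 2.0 LLM collection and are shown as an assumption; check your release for the full list of available configs.

from nemo.collections import llm

# Use a predefined Llama 2 7B configuration instead of the hand-built GPTConfig above.
llama_config = llm.Llama2Config7B()
model = llm.LlamaModel(llama_config, tokenizer=data.tokenizer)

# The remaining objects (data, strategy, trainer, optimizer, logger) and the
# llm.train call are identical to the GPT example.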

Migration Steps#

  1. Migrate your NeMo 1.0 YAML to NeMo 2.0 using the other documents in the migration guide.

  2. Run pre-training using the llm.train API. A minimal sketch of filling in the placeholder configurations from the example above follows this list.
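For reference, the placeholder objects in the llm.train example can be filled in along the following lines. This is a minimal, single-GPU smoke-test sketch with illustrative values (mock data, small batch sizes, bf16); the argument names follow the NeMo 2.0 and Megatron Core APIs, but the exact settings you need are covered by the other documents in this migration guide.

from nemo import lightning as nl
from nemo.collections import llm
from megatron.core.optimizer import OptimizerConfig

# Mock data for a quick smoke test; use llm.PreTrainingDataModule(paths=...) for real corpora.
data = llm.MockDataModule(seq_length=2048, global_batch_size=8, micro_batch_size=1)

# No model parallelism for a single-GPU run.
strategy = nl.MegatronStrategy(
    tensor_model_parallel_size=1,
    pipeline_model_parallel_size=1,
)

# Megatron Core optimizer settings (illustrative values).
opt_config = OptimizerConfig(
    optimizer="adam",
    lr=3e-4,
    weight_decay=0.01,
    bf16=True,
    use_distributed_optimizer=True,
)
opt = nl.MegatronOptimizerModule(config=opt_config)

# Short training run in bf16 mixed precision.
trainer = nl.Trainer(
    devices=1,
    accelerator="gpu",
    max_steps=100,
    strategy=strategy,
    plugins=nl.MegatronMixedPrecision(precision="bf16-mixed"),
    log_every_n_steps=10,
)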