Important

NeMo 2.0 is an experimental feature and currently released in the dev container only: nvcr.io/nvidia/nemo:dev. Please refer to the Migration Guide for information on getting started.

Migrate Pre-Training from NeMo 1.0 to NeMo 2.0

NeMo 1.0 (Previous Release)

In NeMo 1.0, pre-training is configured with the megatron_gpt_config.yaml file and launched with the megatron_gpt_pretraining.py script.

NeMo 2.0 (New Release)

NeMo 2.0 introduces a Pythonic and modular approach to configuring experiments. For detailed instructions on migrating your NeMo 1.0 YAML configurations to NeMo 2.0, refer to the additional documents in this migration guide.

In addition, NeMo 2.0 is compatible with NeMo-Run, which streamlines the configuration and execution of NeMo experiments. Refer to the NeMo-Run documentation for more information.
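For illustration only, a pre-training run can be built from one of the prebuilt recipes and launched with NeMo-Run roughly as sketched below. The recipe name, its arguments, and the executor settings are assumptions; the NeMo-Run documentation is the authoritative reference for the API available in your container.

import nemo_run as run
from nemo.collections import llm

# Build a configurable pre-training task from a prebuilt recipe
# (recipe name and arguments are assumptions; check your NeMo version).
recipe = llm.llama3_8b.pretrain_recipe(
    name="llama3_8b_pretrain_example",
    num_nodes=1,
    num_gpus_per_node=8,
)

# Launch locally with torchrun; other executors (e.g. Slurm) follow the same pattern.
run.run(recipe, executor=run.LocalExecutor(ntasks_per_node=8, launcher="torchrun"))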

The llm.train API can be used to run pre-training in NeMo 2.0, as follows:

import torch
from nemo import lightning as nl
from nemo.collections import llm
from megatron.core.optimizer import OptimizerConfig

# set up your GPT model config
seq_length = 2048  # example value; must match the sequence length used by the data module
gpt_config = llm.GPTConfig(
    num_layers=12,
    hidden_size=384,
    ffn_hidden_size=1536,
    num_attention_heads=6,
    seq_length=seq_length,
    init_method_std=0.023,
    hidden_dropout=0.1,
    attention_dropout=0.1,
    layernorm_epsilon=1e-5,
    make_vocab_size_divisible_by=128,
)

# other docs in this section explain how to configure these
data = llm.PreTrainingDataModule(...)
model = llm.GPTModel(gpt_config, tokenizer=data.tokenizer)
strategy = nl.MegatronStrategy(...)
opt_config = OptimizerConfig(...)
opt = nl.MegatronOptimizerModule(config=opt_config)
trainer = nl.Trainer(...)
nemo_logger = nl.NeMoLogger(dir="test_logdir")

llm.train(
   model=model,
   data=data,
   trainer=trainer,
   log=nemo_logger,
   tokenizer='data',  # use the tokenizer provided by the data module
   optim=opt,
)
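The arguments elided with ... above are covered by the other documents in this section. Purely as an illustrative sketch, and reusing the imports and seq_length from the example above, they might be filled in along the following lines. The data path, batch sizes, parallelism sizes, optimizer hyperparameters, and trainer settings are example assumptions, not recommended values.

# Example data module: 'paths' points at preprocessed Megatron data (hypothetical path).
data = llm.PreTrainingDataModule(
    paths=["/path/to/my_dataset_text_document"],
    seq_length=seq_length,  # keep in sync with the model config
    global_batch_size=32,
    micro_batch_size=2,
)

# Example parallelism settings for a single-node run.
strategy = nl.MegatronStrategy(
    tensor_model_parallel_size=1,
    pipeline_model_parallel_size=1,
)

# Example optimizer settings.
opt_config = OptimizerConfig(
    optimizer="adam",
    lr=3e-4,
    weight_decay=0.01,
    bf16=True,
)

# Example trainer: one node, 8 GPUs, bf16 mixed precision, short run.
trainer = nl.Trainer(
    devices=8,
    num_nodes=1,
    accelerator="gpu",
    strategy=strategy,
    max_steps=1000,
    plugins=nl.MegatronMixedPrecision(precision="bf16-mixed"),
)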

In addition to the generic GPTModel used in the example above, we also support Gemma, Llama, Mistral, and Mixtral models.
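For example, a predefined Llama configuration can be swapped in for the hand-written GPTConfig above; the sketch below reuses the data module from the example, and the exact set of predefined config classes depends on your NeMo version.

# Predefined model configs ship with NeMo 2.0; Llama2Config7B is one example.
llama_config = llm.Llama2Config7B()
model = llm.LlamaModel(llama_config, tokenizer=data.tokenizer)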

Migration Steps

  1. Migrate your NeMo 1.0 YAML to NeMo 2.0 using the other documents in the migration guide.

  2. Run pre-training using the llm.train API.