Important

NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Please refer to the Migration Guide for information on getting started.

Migration Guide

In NeMo 1.0, the main interface for configuring experiments was YAML files. This approach provided a declarative way to set up experiments, but it limited flexibility and programmatic control.

NeMo 2.0 shifts to a Python-based configuration, which offers several advantages:

  1. More flexibility and control over the configuration.

  2. Better integration with IDEs for code completion and type checking.

  3. Easier to extend and customize configurations programmatically.

Let’s go through the main sections of the YAML config (like megatron_gpt_config.yaml) and how they map to Python code in NeMo 2.0:

  1. Trainer Configuration (trainer): The trainer section in YAML is replaced by the nemo.lightning.Trainer class in Python. This allows for more direct integration with PyTorch Lightning’s Trainer class while adding NeMo-specific functionality. A more detailed migration guide can be found here.

  2. Experiment Manager (exp_manager): The exp_manager section in YAML is replaced by NeMoLogger and AutoResume objects in Python. These provide more granular control over logging and resuming experiments (see the resume sketch after this list). A more detailed migration guide can be found here.

  3. Data Configuration (model.data): Data configuration in NeMo 2.0 is handled by pre-training and fine-tuning DataModule classes (see the data sketch after this list). A more detailed migration guide can be found here.

  4. Nsys Profiling (model.nsys_profile): The nsys_profile section in YAML is replaced by the NsysCallback class, which can be added to the Trainer’s callbacks list (see the profiling sketch after this list). A more detailed migration guide can be found here.

  5. Optimizer Configuration (model.optim): The optim section in YAML is replaced by the MegatronOptimizerModule class, which wraps Megatron Core’s OptimizerConfig. This provides a more flexible way to configure optimizers and learning rate schedulers. A more detailed migration guide can be found here.
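
For resuming, experiment state is handled by an AutoResume object instead of exp_manager’s resume_* keys. The sketch below is a minimal illustration; the flag names and the way the object is passed to llm.pretrain are assumptions, so check the NeMo 2.0 API reference for the exact signature.

from nemo import lightning as nl

# Hypothetical sketch: AutoResume takes over the resume_* options of exp_manager.
resume = nl.AutoResume(
    resume_if_exists=True,             # pick up the latest checkpoint if one exists
    resume_ignore_no_checkpoint=True,  # do not fail when no checkpoint is found yet
)

# The resume object is then passed alongside the trainer and logger,
# e.g. llm.pretrain(model=..., data=..., trainer=..., log=..., resume=resume).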
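
For data, a DataModule is constructed directly in Python. The class and argument names below (PreTrainingDataModule, paths, seq_length, batch sizes) are assumptions for illustration; consult the NeMo 2.0 documentation for the exact pre-training and fine-tuning DataModule signatures.

from nemo.collections import llm

# Hypothetical sketch: a DataModule replaces the model.data section of the YAML.
data = llm.PreTrainingDataModule(
    paths=["/path/to/tokenized/dataset"],  # roughly corresponds to model.data.data_prefix
    seq_length=2048,
    micro_batch_size=1,
    global_batch_size=8,
)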
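
For profiling, an NsysCallback instance is appended to the Trainer’s callbacks. A minimal sketch follows; the import path and argument names (start_step, end_step, ranks) are assumptions to be verified against the NeMo 2.0 API.

from nemo.lightning.pytorch.callbacks import NsysCallback

# Hypothetical sketch: NsysCallback replaces the model.nsys_profile section.
nsys = NsysCallback(
    start_step=10,  # global step at which the nsys capture starts
    end_step=20,    # global step at which the capture stops
    ranks=[0],      # only profile rank 0
)

# Add it to the Trainer's callbacks list, e.g. nl.Trainer(..., callbacks=[nsys]).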

Putting the pieces together, here’s a high-level example of how the configuration might look in NeMo 2.0:

import nemo_sdk as sdk
from nemo import lightning as nl
from nemo.collections import llm
from megatron.core.optimizer import OptimizerConfig


# Replaces the `trainer` section of the YAML config
@sdk.factory
def trainer(devices=2) -> nl.Trainer:
    strategy = nl.MegatronStrategy(tensor_model_parallel_size=devices)

    return nl.Trainer(
        devices=devices,
        max_steps=100,
        accelerator="gpu",
        strategy=strategy,
        plugins=nl.MegatronMixedPrecision(precision="bf16-mixed"),
    )


# Replaces the logging and checkpointing settings from exp_manager
@sdk.factory
def logger() -> nl.NeMoLogger:
    ckpt = nl.ModelCheckpoint(
        save_best_model=True,
        save_last=True,
        monitor="reduced_train_loss",
        save_top_k=2,
        save_on_train_epoch_end=True,
    )

    return nl.NeMoLogger(ckpt=ckpt)


# Replaces the model.optim section of the YAML config
@sdk.factory
def adam_with_cosine_annealing() -> nl.OptimizerModule:
    return nl.MegatronOptimizerModule(
        config=OptimizerConfig(
            optimizer="adam",
            lr=0.001,
            use_distributed_optimizer=True
        ),
        lr_scheduler=nl.lr_scheduler.CosineAnnealingScheduler(),
    )


# Compose the pre-training recipe from the factories defined above
pretrain = sdk.Partial(
    llm.pretrain,
    model=llm.mistral,
    data=llm.squad,
    trainer=trainer,
    log=logger,
    optim=adam_with_cosine_annealing,
)
# Attributes of the partial can still be overridden programmatically:
pretrain.optim.config.lr = 0.001
pretrain.optim.lr_scheduler.max_steps = 100


if __name__ == "__main__":
    sdk.run(pretrain, name="mistral-sft", direct=True)

This Python-based configuration allows for more programmatic control and easier integration with the rest of your codebase. It also enables better type checking and code completion in modern IDEs, making it easier to work with complex configurations.

Remember that this is a high-level example, and the exact implementation details may vary depending on the specific NeMo 2.0 API. You’ll need to refer to the NeMo 2.0 documentation for the most up-to-date and accurate way to configure your experiments.