Important

NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Please refer to the Migration Guide for information on getting started.

Migrate Optimizer Configuration from NeMo 1.0 to NeMo 2.0

In NeMo 2.0, the optimizer configuration has moved from the YAML-based approach to the OptimizerConfig class from Megatron Core. This guide walks you through migrating your optimizer setup.

NeMo 1.0 (Previous Release)

In NeMo 1.0, the optimizer was configured in the YAML configuration file.

model:
    optim:
        name: fused_adam
        lr: 2e-4
        weight_decay: 0.01
        betas:
        - 0.9
        - 0.98
        sched:
            name: CosineAnnealing
            warmup_steps: 500
            constant_steps: 0
            min_lr: 2e-5

NeMo 2.0 (New Release)

In NeMo 2.0, we use the OptimizerConfig class from Megatron Core, which is wrapped by NeMo’s MegatronOptimizerModule. Here’s how to set it up:

from nemo.collections import llm
from nemo import lightning as nl
from megatron.core.optimizer import OptimizerConfig

optim = nl.MegatronOptimizerModule(
    # Optimizer hyperparameters come from Megatron Core's OptimizerConfig.
    config=OptimizerConfig(
        optimizer="adam",
        lr=0.001,
        use_distributed_optimizer=True,
    ),
    # The learning rate scheduler is configured separately via NeMo's scheduler classes.
    lr_scheduler=nl.lr_scheduler.CosineAnnealingScheduler(),
)

llm.train(..., optim=optim)
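
For reference, here is a minimal sketch that carries the hyperparameters from the NeMo 1.0 YAML above into the NeMo 2.0 API. It assumes that OptimizerConfig exposes adam_beta1, adam_beta2, and weight_decay fields and that CosineAnnealingScheduler accepts warmup_steps, constant_steps, and min_lr arguments, as in recent releases; check your installed versions for the exact parameter names.

from nemo import lightning as nl
from megatron.core.optimizer import OptimizerConfig

# Mirrors the optim/sched block from the NeMo 1.0 YAML shown above.
# NOTE: field names (adam_beta1, adam_beta2, weight_decay, warmup_steps, ...)
# assume recent Megatron Core and NeMo releases.
optim = nl.MegatronOptimizerModule(
    config=OptimizerConfig(
        optimizer="adam",        # replaces name: fused_adam
        lr=2e-4,
        weight_decay=0.01,
        adam_beta1=0.9,          # replaces betas: [0.9, 0.98]
        adam_beta2=0.98,
        use_distributed_optimizer=True,
    ),
    lr_scheduler=nl.lr_scheduler.CosineAnnealingScheduler(
        warmup_steps=500,
        constant_steps=0,
        min_lr=2e-5,
    ),
)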

Migration Steps

  1. Remove the optim section from your YAML configuration file.

  2. Import the necessary modules in your Python script:

    from nemo.collections import llm
    from nemo import lightning as nl
    from megatron.core.optimizer import OptimizerConfig
    
  3. Create an instance of MegatronOptimizerModule with the appropriate OptimizerConfig.

  4. Configure the OptimizerConfig with parameters similar to your previous YAML configuration:

    1. optimizer: String name of the optimizer (e.g., “adam” instead of “fused_adam”)

    2. lr: Learning rate

    3. use_distributed_optimizer: Set to True to enable Megatron Core's distributed optimizer, which shards optimizer state across data-parallel ranks

  5. Set up the learning rate scheduler separately using NeMo’s scheduler classes.

  6. Pass the optim object to the llm.train() function.

By following these steps, you'll successfully migrate your optimizer configuration from NeMo 1.0 to NeMo 2.0. Note that the exact parameter names and available options differ between the two releases, so consult the OptimizerConfig documentation for a complete list of supported parameters.
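
Because OptimizerConfig is a dataclass in current Megatron Core releases, you can also list the fields your installed version supports as a quick sanity check:

from dataclasses import fields
from megatron.core.optimizer import OptimizerConfig

# Print the parameter names accepted by the installed Megatron Core version.
for f in fields(OptimizerConfig):
    print(f.name)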