Migrate Optimizer Configuration from NeMo 1.0 to NeMo 2.0#

In NeMo 2.0, the optimizer configuration has changed from a YAML-based approach to using the OptimizerConfig class from Megatron-Core. This guide will help you migrate your optimizer setup.

NeMo 1.0 (Previous Release)#

In NeMo 1.0, the optimizer and learning rate scheduler were configured in the optim section of the YAML configuration file.

model:
    optim:
        name: fused_adam
        lr: 2e-4
        weight_decay: 0.01
        betas:
        - 0.9
        - 0.98
        sched:
            name: CosineAnnealing
            warmup_steps: 500
            constant_steps: 0
            min_lr: 2e-5

NeMo 2.0 (New Release)#

In NeMo 2.0, we use the OptimizerConfig class from Megatron-Core, which is wrapped by NeMo’s MegatronOptimizerModule. Here’s how to set it up:

from nemo.collections import llm
from nemo import lightning as nl
from megatron.core.optimizer import OptimizerConfig

# Optimizer settings now live in Megatron-Core's OptimizerConfig,
# wrapped by NeMo's MegatronOptimizerModule.
optim = nl.MegatronOptimizerModule(
    config=OptimizerConfig(
        optimizer="adam",
        lr=0.001,
        use_distributed_optimizer=True
    ),
    # The learning rate scheduler is configured separately from the optimizer.
    lr_scheduler=nl.lr_scheduler.CosineAnnealingScheduler(),
)

llm.train(..., optim=optim)
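
With this approach, a single OptimizerConfig is applied to the model through MegatronOptimizerModule, so there is no longer a per-model optim section in YAML. Setting use_distributed_optimizer=True enables Megatron-Core's distributed optimizer, which shards optimizer state across data-parallel ranks and generally reduces memory usage for large models.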

Migration Steps#

  1. Remove the optim section from your YAML configuration file.

  2. Import the necessary modules in your Python script:

    from nemo.collections import llm
    from nemo import lightning as nl
    from megatron.core.optimizer import OptimizerConfig
    
  3. Create an instance of MegatronOptimizerModule with the appropriate OptimizerConfig.

  4. Configure the OptimizerConfig with parameters corresponding to your previous YAML configuration (see the example after this list):

    1. optimizer: String name of the optimizer (e.g., “adam” instead of “fused_adam”)

    2. lr: Learning rate

    3. use_distributed_optimizer: Set to True to use the distributed optimizer

  5. Set up the learning rate scheduler separately using NeMo’s scheduler classes (e.g., CosineAnnealingScheduler), as shown in the example below.

  6. Pass the optim object to the llm.train() function.
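
As a concrete reference, the NeMo 1.0 YAML example above maps roughly to the sketch below. The OptimizerConfig fields (weight_decay, adam_beta1, adam_beta2) and the CosineAnnealingScheduler arguments (warmup_steps, constant_steps, min_lr, max_steps) reflect current Megatron-Core and NeMo APIs, but verify them against the versions you have installed; the max_steps value shown is only a placeholder.

from nemo import lightning as nl
from megatron.core.optimizer import OptimizerConfig

# Rough equivalent of the NeMo 1.0 "optim" section shown earlier.
# NeMo 1.0's fused_adam corresponds to the "adam" optimizer name here (see step 4).
optim = nl.MegatronOptimizerModule(
    config=OptimizerConfig(
        optimizer="adam",
        lr=2e-4,
        weight_decay=0.01,
        adam_beta1=0.9,
        adam_beta2=0.98,
        use_distributed_optimizer=True,
    ),
    lr_scheduler=nl.lr_scheduler.CosineAnnealingScheduler(
        warmup_steps=500,
        constant_steps=0,
        min_lr=2e-5,
        # max_steps is not part of the NeMo 1.0 sched section; set it to your
        # trainer's total number of training steps (placeholder value here).
        max_steps=100000,
    ),
)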

By following these steps, you’ll successfully migrate your optimizer configuration from NeMo 1.0 to NeMo 2.0. Be aware that the exact parameter names and available options may differ, so consult the OptimizerConfig documentation for a complete list of supported parameters.