Migrate Trainer Configuration from NeMo 1.0 to NeMo 2.0#

In NeMo 2.0, the trainer is configured in Python with the nemo.lightning.Trainer class rather than in a YAML file. This guide walks you through migrating your trainer setup.

NeMo 1.0 (Previous Release)#

In NeMo 1.0, the trainer was configured in the YAML configuration file.

trainer:
  num_nodes: 16
  devices: 8
  accelerator: gpu
  precision: bf16
  logger: False # logger provided by exp_manager
  max_epochs: null
  max_steps: 75000 # consumed_samples = global_step * global_batch_size
  max_time: "05:23:30:00"
  log_every_n_steps: 10
  val_check_interval: 2000
  limit_val_batches: 50
  limit_test_batches: 50
  accumulate_grad_batches: 1
  gradient_clip_val: 1.0

NeMo 2.0 (New Release)#

In NeMo 2.0, the trainer is configured using the nemo.lightning.Trainer class.

from nemo import lightning as nl

trainer = nl.Trainer(
    num_nodes=16,
    devices=8,
    accelerator="gpu",
    strategy=strategy,  # distributed training strategy; see the sketch below
    plugins=nl.MegatronMixedPrecision(precision="bf16-mixed"),
    max_epochs=None,
    max_steps=75000,
    max_time="05:23:30:00",
    log_every_n_steps=10,
    val_check_interval=2000,
    limit_val_batches=50,
    limit_test_batches=50,
    accumulate_grad_batches=1,
    gradient_clip_val=1.0,
)
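
Here, strategy is a distributed training strategy object that you define before constructing the trainer. A minimal sketch, assuming a Megatron-based model (the parallelism sizes are illustrative placeholders; adjust them for your model and cluster):

from nemo import lightning as nl

# Illustrative parallelism settings: placeholders, not recommendations.
strategy = nl.MegatronStrategy(
    tensor_model_parallel_size=1,
    pipeline_model_parallel_size=1,
)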

Migration Steps#

  1. Remove the trainer section from your YAML config file.

  2. Add the following import to your Python script:

    from nemo import lightning as nl
    
  3. Create a Trainer object with the appropriate parameters:

    trainer = nl.Trainer(
        num_nodes=16,
        devices=8,
        accelerator="gpu",
        strategy=strategy,  # distributed strategy defined earlier; see the sketch above
        plugins=nl.MegatronMixedPrecision(precision="bf16-mixed"),
        max_epochs=None,
        max_steps=75000,
        max_time="05:23:30:00",
        log_every_n_steps=10,
        val_check_interval=2000,
        limit_val_batches=50,
        limit_test_batches=50,
        accumulate_grad_batches=1,
        gradient_clip_val=1.0,
    )
    
  4. Adjust the parameters in the Trainer to match your previous YAML configuration.

  5. Use the trainer object in your training script as needed, as shown in the sketch after this list.
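
For example, a minimal sketch of the final step, reusing the trainer object from step 3 and borrowing the mock datamodule and small GPT configuration from the NeMo 2.0 quickstart for illustration (substitute your own model and data):

from nemo import lightning as nl
from nemo.collections import llm

# Illustrative model and data; replace with your own NeMo 2.0 objects.
data = llm.MockDataModule(seq_length=2048, global_batch_size=16)
model = llm.GPTModel(
    llm.GPTConfig(
        num_layers=6,
        hidden_size=384,
        ffn_hidden_size=1536,
        num_attention_heads=6,
        seq_length=2048,
    ),
    tokenizer=data.tokenizer,
)

# nl.Trainer extends PyTorch Lightning's Trainer, so the familiar fit()
# entry point drives training.
trainer.fit(model, data)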

Note

  • The nemo.lightning.Trainer class extends PyTorch Lightning’s Trainer and behaves identically for most purposes.

  • NeMo adds integration with its serialization system, allowing for exact recreation of the trainer used in a particular training run.

  • The precision parameter is now set using the MegatronMixedPrecision plugin. Use "bf16-mixed" for BF16 precision.

  • The logger parameter is no longer needed in the trainer configuration, as logging is handled separately by the NeMoLogger (see the exp-manager migration guide and the sketch below).
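
For reference, a minimal logging sketch, assuming the nl.NeMoLogger class (the directory and experiment name are placeholders; the exp-manager migration guide covers the full mapping from exp_manager options):

from nemo import lightning as nl

# Minimal NeMoLogger setup; values shown are placeholders. The logger is
# passed to NeMo's training entry points (for example, the log argument of
# llm.train) rather than to the Trainer itself.
nemo_logger = nl.NeMoLogger(
    log_dir="./results",
    name="my_experiment",
)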