Migrate Trainer Configuration from NeMo 1.0 to NeMo 2.0
In NeMo 2.0, the trainer configuration has been updated to use the nemo.lightning.Trainer class. This guide will help you migrate your trainer setup.
NeMo 1.0 (Previous Release)
In NeMo 1.0, the trainer was configured in the YAML configuration file.
trainer:
  num_nodes: 16
  devices: 8
  accelerator: gpu
  precision: bf16
  logger: False # logger provided by exp_manager
  max_epochs: null
  max_steps: 75000 # consumed_samples = global_step * global_batch_size
  max_time: "05:23:30:00"
  log_every_n_steps: 10
  val_check_interval: 2000
  limit_val_batches: 50
  limit_test_batches: 50
  accumulate_grad_batches: 1
  gradient_clip_val: 1.0
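For context, NeMo 1.0 example scripts typically read this section through Hydra and built the trainer internally rather than instantiating it by hand. A rough sketch of that pattern (the config name here is an assumption, and releases vary in the exact builder class used):

from nemo.collections.nlp.parts.megatron_trainer_builder import MegatronTrainerBuilder
from nemo.core.config import hydra_runner

@hydra_runner(config_path="conf", config_name="megatron_gpt_config")
def main(cfg):
    # cfg.trainer carries the YAML section above; the builder turns it into
    # a PyTorch Lightning Trainer with NeMo's plugins and strategy attached.
    trainer = MegatronTrainerBuilder(cfg).create_trainer()

if __name__ == "__main__":
    main()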
NeMo 2.0 (New Release)
In NeMo 2.0, the trainer is configured using the nemo.lightning.Trainer class.
from nemo import lightning as nl

# `strategy` must already be defined, e.g. an nl.MegatronStrategy instance
# (see the sketch below and the parallelism migration guide).
trainer = nl.Trainer(
    num_nodes=16,
    devices=8,
    accelerator="gpu",
    strategy=strategy,
    plugins=nl.MegatronMixedPrecision(precision="bf16-mixed"),
    max_epochs=None,
    max_steps=75000,
    max_time="05:23:30:00",
    log_every_n_steps=10,
    val_check_interval=2000,
    limit_val_batches=50,
    limit_test_batches=50,
    accumulate_grad_batches=1,
    gradient_clip_val=1.0,
)
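If you have not yet set up a strategy, a minimal sketch looks like the following; the parallel sizes shown are illustrative assumptions, and the parallelism migration guide covers the full mapping from your NeMo 1.0 settings.

from nemo import lightning as nl

# Minimal sketch (sizes are assumptions): replicate your NeMo 1.0
# model-parallel settings on the Megatron strategy.
strategy = nl.MegatronStrategy(
    tensor_model_parallel_size=1,
    pipeline_model_parallel_size=1,
)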
Migration Steps
1. Remove the trainer section from your YAML config file.

2. Add the following import to your Python script:

   from nemo import lightning as nl

3. Create a Trainer object with the appropriate parameters:

   trainer = nl.Trainer(
       num_nodes=16,
       devices=8,
       accelerator="gpu",
       strategy=strategy,
       plugins=nl.MegatronMixedPrecision(precision="bf16-mixed"),
       max_epochs=None,
       max_steps=75000,
       max_time="05:23:30:00",
       log_every_n_steps=10,
       val_check_interval=2000,
       limit_val_batches=50,
       limit_test_batches=50,
       accumulate_grad_batches=1,
       gradient_clip_val=1.0,
   )

4. Adjust the parameters in the Trainer to match your previous YAML configuration.

5. Use the trainer object in your training script as needed (see the sketch after this list).
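As a concrete endpoint for step 5, the trainer can drive a standard Lightning fit call. The model and datamodule below are hypothetical placeholders for whatever your script constructs; nemo.lightning.Trainer accepts them just as PyTorch Lightning's Trainer does.

# Hypothetical objects; substitute the model and datamodule your
# script builds. nl.Trainer exposes the standard Lightning fit API.
trainer.fit(model, datamodule=datamodule)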
Note
- The nemo.lightning.Trainer class is identical to PyTorch Lightning's Trainer for most purposes. NeMo adds integration with its serialization system, allowing for exact recreation of the trainer used in a particular training run.
- The precision parameter is now set using the MegatronMixedPrecision plugin. Use "bf16-mixed" for BF16 precision.
- The logger parameter is no longer needed in the trainer configuration, as it's handled separately by the NeMoLogger (see the exp-manager migration guide, and the sketch below).
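For orientation, a logging setup in NeMo 2.0 might look like the sketch below. The constructor arguments are assumptions for illustration; the exp-manager migration guide documents the supported options.

from nemo import lightning as nl

# Minimal sketch (argument names assumed): logging is configured on a
# NeMoLogger object rather than on the Trainer itself; log directory,
# TensorBoard, and W&B settings also live here.
nemo_logger = nl.NeMoLogger(
    name="my_experiment",
)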