Important
You are viewing the NeMo 2.0 documentation. This release introduces significant changes to the API and a new library, NeMo Run. We are currently porting all features from NeMo 1.0 to 2.0. For documentation on previous versions or features not yet available in 2.0, please refer to the NeMo 24.07 documentation.
Migrate Trainer Configuration from NeMo 1.0 to NeMo 2.0#
In NeMo 2.0, the trainer configuration has been updated to use the nemo.lightning.Trainer
class. This guide will help you migrate your trainer setup.
NeMo 1.0 (Previous Release)#
In NeMo 1.0, the trainer was configured in the YAML configuration file.
trainer:
  num_nodes: 16
  devices: 8
  accelerator: gpu
  precision: bf16
  logger: False # logger provided by exp_manager
  max_epochs: null
  max_steps: 75000 # consumed_samples = global_step * global_batch_size
  max_time: "05:23:30:00"
  log_every_n_steps: 10
  val_check_interval: 2000
  limit_val_batches: 50
  limit_test_batches: 50
  accumulate_grad_batches: 1
  gradient_clip_val: 1.0
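To illustrate the comment on max_steps: with a hypothetical global_batch_size of 2048 (not part of this configuration), the run would consume 75000 × 2048 ≈ 153.6 million samples by the end of training.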
NeMo 2.0 (New Release)#
In NeMo 2.0, the trainer is configured using the nemo.lightning.Trainer
class.
from nemo import lightning as nl

trainer = nl.Trainer(
    num_nodes=16,
    devices=8,
    accelerator="gpu",
    strategy=strategy,  # parallelism strategy, defined elsewhere (see the sketch below)
    plugins=nl.MegatronMixedPrecision(precision="bf16-mixed"),
    max_epochs=None,
    max_steps=75000,
    max_time="05:23:30:00",
    log_every_n_steps=10,
    val_check_interval=2000,
    limit_val_batches=50,
    limit_test_batches=50,
    accumulate_grad_batches=1,
    gradient_clip_val=1.0,
)
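Note that strategy is not defined in the snippet above. A minimal sketch of how it might be constructed, assuming a Megatron-based model; the parallelism sizes here are illustrative placeholders, not values from the original configuration:

from nemo import lightning as nl

# Illustrative parallelism settings; choose sizes that fit your model and
# cluster. With 16 nodes x 8 devices, the product of the parallelism sizes
# must evenly divide the 128 available GPUs.
strategy = nl.MegatronStrategy(
    tensor_model_parallel_size=2,
    pipeline_model_parallel_size=1,
)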
Migration Steps#
1. Remove the trainer section from your YAML config file.

2. Add the following import to your Python script:

   from nemo import lightning as nl

3. Create a Trainer object with the appropriate parameters:

   trainer = nl.Trainer(
       num_nodes=16,
       devices=8,
       accelerator="gpu",
       strategy=strategy,
       plugins=nl.MegatronMixedPrecision(precision="bf16-mixed"),
       max_epochs=None,
       max_steps=75000,
       max_time="05:23:30:00",
       log_every_n_steps=10,
       val_check_interval=2000,
       limit_val_batches=50,
       limit_test_batches=50,
       accumulate_grad_batches=1,
       gradient_clip_val=1.0,
   )

4. Adjust the parameters in the Trainer to match your previous YAML configuration.

5. Use the trainer object in your training script as needed (see the sketch after this list).
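As a sketch of the last step: since nemo.lightning.Trainer behaves like a PyTorch Lightning Trainer for most purposes (see the note below), a typical run uses the standard fit entry point. The model and data objects are hypothetical placeholders for whatever your own script constructs:

# model and data are placeholders; build them from your own NeMo 2.0
# model and datamodule definitions before calling fit.
trainer.fit(model, datamodule=data)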
Note

The nemo.lightning.Trainer class is identical to PyTorch Lightning’s Trainer for most purposes. NeMo adds integration with its serialization system, allowing for exact recreation of the trainer used in a particular training run.

The precision parameter is now set using the MegatronMixedPrecision plugin. Use "bf16-mixed" for BF16 precision.

The logger parameter is no longer needed in the trainer configuration, as it’s handled separately by the NeMoLogger (see the exp-manager migration guide).
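For completeness, a minimal sketch of the precision plugin on its own. The "bf16-mixed" string mirrors the precision: bf16 entry in the NeMo 1.0 YAML above; the "16-mixed" FP16 variant shown for contrast is an assumption based on PyTorch Lightning’s precision strings, so verify it against your NeMo version:

from nemo import lightning as nl

# Equivalent of "precision: bf16" from the NeMo 1.0 YAML.
bf16_plugin = nl.MegatronMixedPrecision(precision="bf16-mixed")

# Assumed FP16 variant; confirm the exact string for your NeMo version.
fp16_plugin = nl.MegatronMixedPrecision(precision="16-mixed")

trainer = nl.Trainer(devices=8, accelerator="gpu", plugins=bf16_plugin)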