# Callbacks

## Exponential Moving Average (EMA)
During training, EMA maintains a moving average of the trained parameters. EMA parameters can produce significantly better results and faster convergence across a variety of domains and models.

EMA is a simple calculation. The EMA weights are initialized to the model weights at the start of training, and after every training update they are moved toward the new model weights.
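Concretely, each update applies the standard exponential-moving-average rule, where `decay` is the configurable decay rate (a minimal sketch of the update, not NeMo's exact implementation):

```python
# EMA update applied after each optimizer step.
# ema_weight starts as a copy of the model weight at the start of training.
ema_weight = decay * ema_weight + (1 - decay) * training_weight
```

A decay close to 1.0 keeps the average stable but slow to react; a lower decay tracks the current weights more closely.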
Enabling EMA is straightforward. Pass an additional argument to the experiment manager at runtime:
```bash
python examples/asr/asr_ctc/speech_to_text_ctc.py \
    model.train_ds.manifest_filepath=/path/to/my/train/manifest.json \
    model.validation_ds.manifest_filepath=/path/to/my/validation/manifest.json \
    trainer.devices=2 \
    trainer.accelerator='gpu' \
    trainer.max_epochs=50 \
    exp_manager.ema.enable=True  # pass this additional argument to enable EMA
```
To change the decay rate, pass an additional argument:
```bash
python examples/asr/asr_ctc/speech_to_text_ctc.py \
    ...
    exp_manager.ema.enable=True \
    exp_manager.ema.decay=0.999
```
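As a rough rule of thumb, a decay of `d` averages over approximately the last `1 / (1 - d)` updates, so `decay=0.999` tracks roughly the last 1,000 training steps.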
We also offer other helpful arguments:
| Argument | Description |
| --- | --- |
| `exp_manager.ema.validate_original_weights=True` | Validate the original weights instead of the EMA weights. |
| `exp_manager.ema.every_n_steps=2` | Apply EMA every N steps instead of every step. |
| `exp_manager.ema.cpu_offload=True` | Offload EMA weights to CPU. May introduce significant slowdowns. |
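For intuition, here is a minimal, hypothetical sketch of how an EMA callback of this kind can be structured in PyTorch Lightning. It is not NeMo's implementation; `SimpleEMACallback` is an illustrative name, and its `decay` and `every_n_steps` parameters mirror the `exp_manager.ema.*` options above.

```python
import torch
import pytorch_lightning as pl


class SimpleEMACallback(pl.Callback):
    """Illustrative EMA callback; not NeMo's actual implementation."""

    def __init__(self, decay: float = 0.999, every_n_steps: int = 1):
        self.decay = decay
        self.every_n_steps = every_n_steps
        self.ema_params = None

    def on_train_start(self, trainer, pl_module):
        # EMA weights begin as a copy of the initial model weights.
        self.ema_params = [p.detach().clone() for p in pl_module.parameters()]

    @torch.no_grad()
    def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx):
        # Only update every N optimizer steps (cf. exp_manager.ema.every_n_steps).
        if trainer.global_step % self.every_n_steps != 0:
            return
        # ema = decay * ema + (1 - decay) * current
        for ema_p, p in zip(self.ema_params, pl_module.parameters()):
            ema_p.mul_(self.decay).add_(p.detach(), alpha=1.0 - self.decay)
```

A complete implementation also needs to swap the EMA weights in for validation and checkpointing (unless `exp_manager.ema.validate_original_weights=True` is set), and it can keep the EMA copy on the CPU to save GPU memory at the cost of transfer overhead (cf. `exp_manager.ema.cpu_offload`).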