Important

NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Refer to the NeMo 2.0 overview for information on getting started.

Launcher Multirun

You may want to run the NeMo Launcher with several configurations at once, for example to compare performance across different sets of hyperparameters.

The NeMo Launcher integrates Hydra's multirun feature, which lets you sweep over a range of parameter values or run a grid search. To use it, pass the --multirun (or -m) option on the command line and give a comma-separated list of values for each parameter you want to sweep. The following example launches one training run per tensor model parallel size:

python3 main.py -m \
    stages=[training] \
    training.trainer.num_nodes=6 \
    training.run.name="5b_6nodes_tp_\${training.model.tensor_model_parallel_size}" \
    training.model.tensor_model_parallel_size=1,2,4,8
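
To make the expansion concrete, the following Python sketch mimics how a comma-separated sweep turns into individual jobs. It is a simplified illustration, not Hydra's actual override parser: `expand_sweep` is a hypothetical helper written for this example, and it assumes that values contain no quoted commas and that bracketed values such as `[training]` are list literals rather than sweeps.

```python
from itertools import product


def expand_sweep(overrides):
    """Expand comma-separated overrides into one override list per job.

    Hypothetical helper for illustration only; Hydra's real parser
    handles quoting, ranges, and other syntax that this sketch ignores.
    """
    keys, choices = [], []
    for item in overrides:
        key, _, value = item.partition("=")
        keys.append(key)
        if value.startswith("["):
            # Bracketed values are treated as a single list literal, not a sweep.
            choices.append([value])
        else:
            choices.append(value.split(","))
    # Every combination of the swept values becomes one job (a grid search).
    return [
        [f"{k}={v}" for k, v in zip(keys, combo)]
        for combo in product(*choices)
    ]


TP_KEY = "training.model.tensor_model_parallel_size"

jobs = expand_sweep([
    "stages=[training]",
    "training.trainer.num_nodes=6",
    f"{TP_KEY}=1,2,4,8",
])

# Derive the run name each job would get from the interpolated template.
run_names = []
for job in jobs:
    values = dict(override.split("=", 1) for override in job)
    run_names.append(f"5b_6nodes_tp_{values[TP_KEY]}")

for name, job in zip(run_names, jobs):
    print(name, job)
```

In this sketch the single swept parameter yields four jobs, one per tensor model parallel size. Sweeping a second parameter would multiply the job count, because every combination of values is run. In the command above, the backslash before `${...}` stops the shell from expanding the variable, so Hydra resolves the interpolation separately for each job and gives each run its own name.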