Launcher Multirun

You may want to run the NeMo Launcher with several configurations, for example to compare performance across different sets of hyperparameters.

The NeMo Launcher integrates Hydra's multirun feature, which lets you sweep over a range of parameter values or perform a grid search from a single command. To use it, pass the --multirun (or -m) option on the command line and give each swept parameter a comma-separated list of values. The following example launches one training run for each of four tensor-parallel sizes:

python3 main.py -m \
    stages=[training] \
    training.trainer.num_nodes=6 \
    training.run.name="5b_6nodes_tp_\${training.model.tensor_model_parallel_size}" \
    training.model.tensor_model_parallel_size=1,2,4,8
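
To see what the sweep above expands to, the following is a minimal Python sketch of how comma-separated override values turn into one job per combination. It is a simplified illustration, not Hydra's actual override parser: the helper name expand_sweep is hypothetical, and the run name is built by hand here rather than through the ${...} interpolation that the launcher resolves for each job.

```python
from itertools import product


def expand_sweep(overrides):
    """Expand comma-separated override values into one dict per job.

    Hypothetical helper for illustration only; Hydra's real override
    grammar handles many more cases (quoting, ranges, nested lists).
    """
    keys, choices = [], []
    for item in overrides:
        key, _, value = item.partition("=")
        keys.append(key)
        if value.startswith("["):
            # A bracketed value such as [training] is a single list value,
            # not a sweep, so it stays as one choice.
            choices.append([value])
        else:
            # A bare comma-separated value is a sweep over each element.
            choices.append(value.split(","))
    # The Cartesian product of all choices yields one job per combination.
    return [dict(zip(keys, combo)) for combo in product(*choices)]


overrides = [
    "stages=[training]",
    "training.trainer.num_nodes=6",
    "training.model.tensor_model_parallel_size=1,2,4,8",
]

jobs = expand_sweep(overrides)
for job in jobs:
    tp = job["training.model.tensor_model_parallel_size"]
    # Mirrors the run name pattern used in the command above.
    print(f"5b_6nodes_tp_{tp}", job)
```

Because only one parameter carries several values, the sweep produces four jobs. Giving a second parameter a comma-separated list would multiply the job count, which is how a grid search is expressed.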