Important
NeMo 2.0 is an experimental feature and currently released in the dev container only: nvcr.io/nvidia/nemo:dev. Please refer to the Migration Guide for information on getting started.
Use Auto Configurator to Find the Optimal Configuration
Auto Configurator searches for hyperparameters (HPs) that achieve the maximum highest training throughput when working with Large Language Models (LLMs) utilizing the NeMo Framework.
Note
Auto Configurator is only supported now for GPT-based models: GPT3, LLama, Mixtral, and Mistral.
Auto Configurator Capabilities
Auto Configurator is intended to iterate over different model configurations quickly and find the best configuration, that is, the configuration that minimizes both time and financial expenditure. It offers a range of features to facilitate this, as detailed in the list below.
Model size recommendation: finds the optimal model size if the parameter is not specified.
Training time estimation: estimates model training time based on input parameters.
Base configuration generation: returns a basic model configuration.
Hyperparameters recommendation: finds the optimal list of hyperparameters to be trained.
Optimal configuration recommendation: calculates the performance after a short training of candidate configurations and finds the optimal model configuration.
Model Size Recommendation
If you have not decided what model size you want to train, Auto Configurator can recommend a model size for your use case. If you know the number of GPUs, TFLOPS per GPU, the maximum time to train, and the number of tokens to train for, it can recommend a model size that can be trained with the specified hardware and time constraints.
For example, if you had 20 NVIDIA DGX nodes available (in 80 GB GPU memory), and wanted to train a GPT model for a maximum of 5 days, Auto Configurator would recommend using a 5B parameter GPT model.
Training Time Estimation
Auto Configurator calculates the estimated training time for your model. It provides a projection of the training time in days, based on the input dataset and parameters you provide.
Base Configuration Generation
When you provide the model size, or Auto Configurator has suggested one, it generates a base configuration for the target model. The base configuration is a valid configuration in NeMo 2.0 format. The optimization of throughput, however, is conducted in the next step, Training Auto Configurator HP Search.
Hyperparameters Recommendation
After Auto Configurator generates the base configuration, it searches over four critical hyperparameters that have a great impact on training throughput but do not affect model convergence. These hyperparameters include Tensor Parallelism (TP), Pipeline Parallelism (PP), Context Parallelism (CP), Expert Parallelism (EP), Micro Batch Size (MBS), and Activation Checkpointing Layers (ActCkpt). Auto Configurator will also provide optimal Global Batch Size (GBS) if it’s not specified.
Auto Configurator initially applies heuristics to identify suitable candidates for the four key parameters, subsequently generating a grid of candidate configurations. It returns all of the candidate configurations in NeMo 2.0 format.
Note
Some of the candidate configurations may not work due to high-memory usage or other issues.
Once the candidate configurations are generated, you can use NeMo Framework to launch the most promising candidates.
When running the candidates on the cluster, you can limit job time and job max steps by using max_minutes_per_run
and max_steps_per_run
parameters. During this search, the jobs will run with the number of nodes specified in the configuration files, using the num_nodes
parameter. Once all of the jobs have finished running, you’ll need to run compare_throughput.py to get a .csv table with performance results for each succeeded job.
Optimal Configuration Recommendation
After all of the candidate jobs are done, Auto Configurator calculates performance parameters for each of the candidates. Auto Configurator generates two .csv files: one detailing the performance measures of the candidates and another listing the candidates that failed due to out-of-memory errors.
End-To-End Example
The following list shows the required input parameters for the Auto Configurator runner:
model_type
: should be a value from the list["gpt3", "llama", "mixtral", "mistral"]
.model_version
: the version of the model to be used in the search. 2 if Llama2, 3 if Llama3, 3 for GPT-3 (is set by default).num_nodes
: number of nodes to be used for the training.seq_length
: sequence length to be used for the training.data_path
: dataset to be used for the training.tokenizer_path
: path to tokenizer model if custom tokenizer will be used.
The following list shows the optional parameters for the Auto Configurator runner:
model_size
: size of the model to be trained.model_measure
:"M"
if model_size is specified in millions."B"
(is set by default) if in billions.global_batch_size
: global batch size to be used.tensor_parallel_sizes
: a list, such as[1, 2, 4]
.pipeline_parallel_sizes
: a list, such as[1, 2, 4]
.context_parallel_sizes
: a list, such as[1, 2, 4]
.expert_parallel_sizes
: a list, such as[1, 2, 4]
.micro_batch_sizes
: a list, such as[1, 2, 4]
.min_model_parallel_size
: a value for the minimum desired parallelism.max_model_parallel_size
: a value for the maximum desired parallelism.
For each of the optional parameters, Auto Configurator will find the optimal value if the parameter is not specified. To view the full list of parameters, please visit this page.
To view an end-to-end example of how to generate candidate configs, train them, and calculate the performance using Auto Configurator with NeMo Framework, please visit this page.