Important
NeMo 2.0 is an experimental feature and is currently released in the dev container only: nvcr.io/nvidia/nemo:dev. Please refer to the NeMo 2.0 overview for information on getting started.
NeMo Framework AutoConfigurator
Project Description
Learning Goals
AutoConfigurator searches for the Hyper-Parameters (HPs) that achieve the highest throughput for training and inference of Large Language Models (LLMs) using the NeMo Framework. AutoConfigurator is intended to quickly iterate over different model configurations and find the best configuration with minimal spend of time and money. The objectives of this playbook are:
To learn how to use AutoConfigurator to determine the optimal model size for a given compute and training budget
To produce optimal foundation model pretraining and inference configurations that achieve the highest-throughput runs
NeMo Tools and Resources
Data
To run AutoConfigurator, make sure that you have a small amount of representative data downloaded on your system. The dataset varies depending on whether you want to determine the optimal configuration for a GPT3, T5, or Bert model.
The dataset can be downloaded by following the dataset preparation steps laid out in the Pretraining.rst playbook.
As an example, if you want to determine the model size or the optimal model config for a GPT3 126M model, download the gpt3_pile dataset using the data_preparation stage laid out in the config.yaml file of the NeMo Launcher scripts (for details, refer to the Pretraining.rst playbook).
Requirements
Playbook - NeMo Framework Model Pre-training
Model Pre-training using NeMo Framework Playbook for data preparation steps
Software
Access to NGC NeMo Framework Training Container
DGX SuperPOD SW Stack: Base Command Manager, Slurm (w/ pyxis plugin)
Hardware
Minimum of 2 DGX A100 80GB nodes (16 GPUs) to be able to experiment with multi-node runs of AutoConfigurator
In the absence of 80GB nodes, the user can also experiment with 40GB A100 nodes
Project Instructions and Milestones
This is the overall workflow of the AutoConfigurator:
The AutoConfigurator first recommends a model size and generates a base config. This base config file is a valid config file, though not the most optimal. After generating this base config file, the tool launches parallel grid search over key hyperparameters to determine the most optimal (i.e. highest throughput) configuration files. This process happens for training grid search as well as inference grid search based on your specifications in the config.yaml file.
Accordingly, in this playbook, we will perform 2 tasks: [1] Determine the optimal model size for a given compute and training budget and [2] Determine the optimal training and inference configuration to get the highest throughput models.
[1] Model Size Recommendation
If the user does not yet know what model size they wish to train, AutoConfigurator can recommend a model size given the hardware and training-time constraints. If the number of GPUs, the TFLOPS per GPU, the maximum time to train, and the number of tokens to train for are known, then the tool can recommend a model size that can be trained with the specified hardware and time constraints.
For example, if the user has 20 NVIDIA DGX nodes available (80GB GPU memory), and wants to train a GPT model for a maximum of 5 days, AutoConfigurator will recommend using a 5B parameter GPT model.
Perform the following steps to get the optimal recommended model size:
Step 1: Download the NeMo Framework Training container and Prepare the environment
Go to NeMo Framework Training | NVIDIA NGC to get the latest NeMo Framework training container.
The NeMo Framework codebase is included as part of the training container. To download the container and copy the contents to a local directory in the cluster, the following command can be executed:
Note: Specify the path to the local directory based on your setup and always use the latest container tag:
srun -p <partition> -N 1 --container-mounts=/path/to/local/dir:/workspace/mount_dir --container-image=<container_tag> bash -c "cp -r /opt/NeMo-Framework-Launcher/launcher_scripts /opt/NeMo-Framework-Launcher/auto_configurator /workspace/mount_dir/"
After that, install the NeMo Framework script dependencies on the head node of the cluster:
pip install -r requirements.txt
Step 2: Setup the cluster config
Go to /auto_configurator/conf/cluster and modify the bcm.yaml file to set the partition, account, and job_name_prefix for your cluster (a hedged example is sketched below).
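The exact contents of bcm.yaml depend on the container version; a minimal sketch, assuming the three keys named above sit at the top level of the file, with placeholder values you must replace:

# Hypothetical bcm.yaml values -- replace with your own cluster settings
partition: batch              # Slurm partition to submit AutoConfigurator jobs to
account: my_slurm_account     # Slurm account to charge the jobs against
job_name_prefix: "autoconf-"  # prefix prepended to every generated Slurm job name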
Step 3: Modify config.yaml file
After setting the cluster configuration, go to /auto_configurator/conf/config.yaml.
The first parameter that must be set is the auto_configurator_path parameter inside the conf/config.yaml file. This parameter must point to the absolute path where the auto_configurator directory is stored in the file system.
After this, set launcher_scripts_path, fastertransformer_path, base_results_dir, and data_dir to point to the appropriate locations. Note that the dataset for GPT, T5, and mT5 will be different, so modify data_dir accordingly. Follow the data preparation steps from the Pretraining.rst playbook to learn how to download and preprocess the datasets for each model. The dataset in this path does not need to be the full-size dataset; only a small representative sample of the dataset is needed, since AutoConfigurator does not train the models to convergence.
Make sure that the “training_container” points to the container that you downloaded (latest tag recommended)
If you want to visualize the results, enable W&B logging by setting enable to True and entering the W&B API key in the specified field
To get the model size recommendation and the optimal model training config, set the “run_training_hp_search” to True and set the “run_inference_hp_search” to False.
For a model architecture of your choice (from GPT3, T5, mT5, and Bert), set "search_config" in the "defaults" section at the top to gpt3/unknown_size (for a GPT3 model) or t5/unknown_size (for a T5 model). A hedged sketch of these config.yaml entries follows.
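The sketch below pulls together the conf/config.yaml entries described above; the paths, container tag, and exact nesting are illustrative assumptions, so adapt them to the file shipped in your container:

# Hypothetical conf/config.yaml excerpt for the model size recommendation task
defaults:
  - search_config: gpt3/unknown_size      # or t5/unknown_size for a T5 model
auto_configurator_path: /workspace/mount_dir/auto_configurator   # absolute path (placeholder)
launcher_scripts_path: /workspace/mount_dir/launcher_scripts     # placeholder
fastertransformer_path: /workspace/mount_dir/FasterTransformer   # placeholder
base_results_dir: /workspace/mount_dir/results                   # placeholder
data_dir: /workspace/mount_dir/data      # small representative sample of the dataset is enough
training_container: <container_tag>      # the NeMo Framework training container you downloaded
run_training_hp_search: True             # launch the training HP grid search
run_inference_hp_search: False           # inference search is not needed for this task
wandb:
  enable: False                          # set to True to visualize results in W&B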
Step 4: Modify the search configuration
After modifying the cluster config and the config.yaml file, go to /auto_configurator/conf/search_config.
Within search_config, go to the directory of the model architecture of your choice, for which you would like to determine the model size. For example, go to search_config/t5/unknown_size.yaml.
Modify unknown_size.yaml by specifying the following (a hedged sketch of the full file follows this list):
Set model_size_in_b to null
num_nodes, gpus_per_node, and gpu_memory_gb (currently 40GB and 80GB are supported)
max_training_days (the number of days that you can run the pretraining for, for full convergence)
limit_search_runs (this parameter can be used to limit the number of configs that will be searched during the HP search stage. We recommend selecting a value between 30 and 100 for this parameter. AutoConfigurator will probably need to search at least 30 different configs to find the optimal one. However, if the computing resources are available in your cluster, we recommend increasing this parameter to a value close to 100.)
output_top_n (this parameter can be used to configure how many configs to output in the summary. By default, when set to 10, it will output the top 10 configurations.)
max_steps_per_run (this parameter indicates how many steps to train each configuration for)
max_minutes_per_run (this parameter indicates how long to run each configuration for, in minutes. We recommend using at least 20 minutes per run for the smaller models, and increasing it to over 60 minutes for the larger models. The training run will be stopped when either max_steps_per_run or max_minutes_per_run is reached.)
tflops_per_gpu (this parameter provides an estimate of the TFLOPS each GPU can achieve when training large language models with NeMo Framework. This value is only used to estimate how long the model will take to train to full convergence, so you can know the time to train before you even begin training your model. You can set it to 140 for A100s.)
The num_tokens_in_b parameter indicates how many billions of tokens you will train your model for, when training to full convergence. It will be used to estimate how long it will take to train the model to the desired number of tokens.
The vocab_size parameter must show the vocabulary size that will be used during training.
The logs parameter can be used to configure where the result logs will be saved. By default, this directory will be created inside the base_results_dir indicated in the conf/config.yaml file.
Finally, the tensor_parallel_sizes, pipeline_parallel_sizes, min_model_parallel_size, max_model_parallel_size, micro_batch_sizes, and act_ckpt_layers parameters can be used to override the heuristics that choose the grid search space and the maximum and minimum parallelism allowed for each model. If these are left as auto, AutoConfigurator will find the appropriate values.
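Putting the parameters above together, here is a minimal sketch of an unknown_size.yaml file, assuming all keys sit at the top level; every value is a placeholder to adjust for your cluster and training budget:

# Hypothetical unknown_size.yaml -- all values are placeholders
model_size_in_b: null        # null lets AutoConfigurator recommend a model size
num_nodes: 16                # placeholder node count
gpus_per_node: 8
gpu_memory_gb: 80            # 40 and 80 are currently supported
max_training_days: 5         # time budget for training to full convergence
limit_search_runs: 30        # candidate configs to try (30-100 recommended)
output_top_n: 10             # how many top configs to report in the summary
max_steps_per_run: 50        # placeholder; steps trained per candidate config
max_minutes_per_run: 20      # at least 20 min for small models, 60+ for large ones
tflops_per_gpu: 140          # per-GPU throughput estimate for A100s
num_tokens_in_b: 1000        # tokens (in billions) for full convergence
vocab_size: 51200            # placeholder; vocabulary size used during training
logs: /workspace/mount_dir/results/logs   # placeholder; created inside base_results_dir by default
tensor_parallel_sizes: auto      # leave as auto to let AutoConfigurator pick the grid
pipeline_parallel_sizes: auto
min_model_parallel_size: auto
max_model_parallel_size: auto
micro_batch_sizes: auto
act_ckpt_layers: auto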
Step 5: Run the pipeline
After setting the above config files, go to /auto_configurator/ and run:
python3 main.py
As soon as you run the above, you will see a message like the following on the terminal, along with all the search configs that will be launched in parallel to determine the best hyperparameters for the recommended model size. For example:
You can train a 2.42B parameter model in 50 days using 32 GPUs. This result assumes you are training to 1000B tokens, and each GPU achieves 140 TFLOPS.
The above steps result in two outputs: a model size recommendation and the optimal hyperparameters for that model size. A successful run will generate a base config.
When AutoConfigurator generates the base configuration for a model, it will be saved inside the directory specified in the logs parameter in your config files. By default, this will be .../results/<model_name>/<model_size>_<gpu_mem_size>/. As the default search_config value is set to gpt3/5b and the default gpu_memory_gb is set to 80, the results can be found in the .../results/gpt3/5b_80gb/ directory. The base config will be available inside that directory, with the name base_cfg_<model_size>.yaml.
For our example, you can replace model_size with “unknown_size” to get the names of the output files.
[2] Getting the optimal hyperparameters for training and inference for a given model size and compute budget
If you already know the model size that you are interested in training, perform the following steps to get the optimal recommended hyperparameters.
Step 1: Same as above
Step 2: Same as above
Step 3: Modifying the config.yaml file
Here, follow the same steps as above, but in the "defaults" section set "search_config" to the model size of your choice. For example: search_config: gpt3/5b for a 5B parameter GPT3 model.
Specify all the other parameters and paths as explained in the previous section (a hedged sketch of the changed defaults entry follows).
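For example, a hedged sketch of the changed conf/config.yaml entries (all other entries stay as in the previous section; the nesting is an assumption):

defaults:
  - search_config: gpt3/5b        # search HPs for a 5B parameter GPT3 model
run_training_hp_search: True      # training HP grid search
run_inference_hp_search: False    # set to True if you also want the inference search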
Step 4: Modify the search configuration
After specifying the correct model size in config.yaml, go to /auto_configurator/conf/search_config.
Within the search_config directory, go to the appropriate location based on your model choice. For example, auto_configurator/conf/search_config/gpt3/5b.yaml.
Within this yaml file, modify the parameters as explained in Step 4 of the previous section; the main difference from the unknown-size search config is sketched below.
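The main difference is that the model size is now fixed rather than left for AutoConfigurator to recommend; a minimal sketch with placeholder values:

model_size_in_b: 5        # fixed model size in billions of parameters
num_nodes: 16             # placeholder; nodes available for the search runs
gpus_per_node: 8
gpu_memory_gb: 80         # 40 and 80 are currently supported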
Step 5: Run the pipeline
After setting the above config files, go to /auto_configurator/ and run:
python3 main.py
Running this will launch a grid search over config files to determine the optimal model config. The first action after running python3 main.py is the generation of a base config file.
When AutoConfigurator generates the base configuration for a model, it will be saved inside the directory specified in the logs parameter in your config files. By default, this will be .../results/<model_name>/<model_size>_<gpu_mem_size>/. As the default search_config value is set to gpt3/5b and the default gpu_memory_gb is set to 80, the results can be found in the .../results/gpt3/5b_80gb/ directory. The base config will be available inside that directory, with the name base_cfg_<model_size>.yaml.
If the training HP search pipeline is run, the results will be in three different directories inside your logs directory. The candidate_configs directory contains all the YAML files with all the configurations generated by the HP search. The training_logs directory contains all the logs of training each of the individual configs AutoConfigurator generated. If limit_search_runs was set to 30, then there should be 30 different directories with the logs for each model. Finally, after all the training runs have finished and the final run has analyzed the throughput of each configuration, the final model recommendation will be stored in the final_results directory. This directory will contain a log file which lists the output_top_n fastest configs, sorted from fastest to slowest. The directory will also contain a CSV file with all the results from every config that was run with AutoConfigurator for a given model size, sorted from highest to lowest throughput. The CSV file also includes information such as the samples per second achieved by each model, the time per global step, the TFLOPS per GPU achieved, and so on. The final_results directory will also contain a YAML file, which corresponds to the config with the lowest training time. This is the recommended model for training.
Logging with Weights and Biases: Weights and Biases (W&B) can be used to log all the training search runs. To achieve this, the wandb parameters must be modified in the conf/config.yaml file. First, enable must be set to True. Then, api_key_file must be set to point to the file that contains the W&B API key. The API key must be on the first line of that file. Finally, the project parameter must hold the name of the W&B project where the metrics will be stored. The name of each run does not need to be provided; it will be automatically generated by AutoConfigurator, using the model name, model size, and hyperparameters used for each specific run. A hedged sketch of this wandb block follows.
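A minimal sketch of the wandb block in conf/config.yaml, assuming the three keys described above; the file path and project name are placeholders:

wandb:
  enable: True                               # turn on W&B logging for all search runs
  api_key_file: /path/to/wandb_api_key.txt   # file whose first line is the W&B API key
  project: nemo_autoconfigurator             # placeholder W&B project name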
For inference HP Search
To run the inference HP search pipeline, the parameter run_inference_hp_search must be set to True in the conf/config.yaml file. The model used to search the best inference HPs must be selected using the search_config parameter in conf/config.yaml. For example, by default, this parameter will be set to gpt3/5b, so AutoConfigurator will search the optimal inference HPs for a 5B parameter GPT model. The configuration for this model can be found in the conf/search_config/gpt3/5b.yaml file. To configure the behavior of the HP search, modify the corresponding parameters in that YAML file. A minimal sketch of the config.yaml toggles is shown at the end of this section.
For the inference HP search, the results can be found inside the directory specified in the results_dir parameter of the YAML config file. Inside that directory, you will find .../inference/final_summary/final_output.csv. This CSV file will have the results of every model that was run by the AutoConfigurator HP search.
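A hedged sketch of the conf/config.yaml toggles for the inference HP search, under the same layout assumptions as the earlier sketches:

defaults:
  - search_config: gpt3/5b        # model whose inference HPs will be searched
run_training_hp_search: False     # optional: skip the training search if only inference HPs are needed
run_inference_hp_search: True     # launch the inference HP grid search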
Project Deliverables
Submit a screenshot of the logs directory for the results files including candidate configs as well as training logs
Submit a screenshot of the final_results directory showing the log file listing the output_top_n fastest configs as well as the final results CSV file
Project Evaluation
For this playbook, you will be evaluated on the following:
The ability to get a model size recommendation for a given training time and compute budget
The ability to produce optimal model configuration files for pretraining a model of given size for a specific number of nodes
The ability to produce optimal model configuration files for inference hyperparameter search for a given model size and number of nodes