Important
NeMo 2.0 is an experimental feature and is currently released in the dev container only: nvcr.io/nvidia/nemo:dev. Please refer to the NeMo 2.0 overview for information on getting started.
Training with Predefined Configurations
Currently, NVIDIA offers one predefined configuration for RETRO, at a model size of 300M parameters. This configuration includes carefully selected hyperparameters, which you can use as a guideline for custom model configurations.
The training configurations are defined by YAML files in the conf/training/retro directory. You can choose a configuration by setting the value of the training variable in conf/config.yaml to the corresponding training YAML file. For example, to train the 300M-parameter RETRO model, set the training variable to retro/300m, which points to the conf/training/retro/300m.yaml file.
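In other words, the single line you would change in conf/config.yaml looks like this (shown again in context in the Slurm example below):

training: retro/300m   # selects conf/training/retro/300m.yaml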
300M Model Configuration
The 300M model uses the bf16 data type. It consists of 12 transformer layers for the decoder and 2 transformer layers for the encoder. The model has a hidden size of 768, a feedforward network size of 3072, and 12 attention heads with a GeLU activation function. The sample sequence length is 2048, the chunk length is 64, and the optimizer used is Distributed Adam. This model does not employ any model parallelism. For more parameter details, refer to the configuration file retro/300m.yaml.
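As an illustrative sketch only, the hyperparameters described above might appear in retro/300m.yaml roughly as follows. The field names here are assumptions for illustration; treat the actual retro/300m.yaml file as authoritative:

trainer:
  precision: bf16                  # data type
model:
  num_layers: 12                   # decoder transformer layers
  enc_num_layers: 2                # encoder transformer layers
  hidden_size: 768
  ffn_hidden_size: 3072            # feedforward network size
  num_attention_heads: 12
  activation: gelu
  seq_length: 2048                 # sample sequence length
  chunk_size: 64                   # RETRO chunk length
  tensor_model_parallel_size: 1    # no model parallelism
  pipeline_model_parallel_size: 1
  optim:
    name: distributed_fused_adam   # Distributed Adam optimizer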
The following section provides instructions for running training on a Slurm cluster.
Slurm
To train a 300M model on a Slurm cluster, modify the conf/config.yaml file to set:
training: retro/300m
stages:
- training
launcher_scripts_path: /path/to/launcher_scripts
data_dir: /path/to/retro_workdir
base_results_dir: /path/to/results
Here, /path/to/retro_workdir points to the directory containing the preprocessed pretraining data, and /path/to/results points to the directory where checkpoints will be saved.
Then, run the following command. All the configurations will be read from conf/config.yaml and conf/training/retro/300m.yaml:
python3 main.py
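Because conf/config.yaml follows a Hydra-style layout, you can likely also override individual values on the command line instead of editing the file. The sketch below assumes Hydra-style overrides are supported and mirrors the settings shown above:

python3 main.py \
    training=retro/300m \
    stages=[training] \
    data_dir=/path/to/retro_workdir \
    base_results_dir=/path/to/results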