Important
NeMo 2.0 is an experimental feature and currently released in the dev container only: nvcr.io/nvidia/nemo:dev. Please refer to NeMo 2.0 overview for information on getting started.
Training with Predefined Configurations
NVIDIA provides a predefined configuration for the Mistral 7B (v0.1) model. It includes carefully selected hyperparameters, which you can use as guidelines for custom model configurations.
Run Training
To run Mistral training, update conf/config.yaml:
defaults:
- training: mistral/mistral_7b
stages:
- training
Execute the launcher pipeline: python3 main.py.
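Alternatively, because the launcher is Hydra-based, the same selections can typically be supplied as command-line overrides instead of editing conf/config.yaml (a sketch; adjust the training config and stage list to your needs):
python3 main.py training=mistral/mistral_7b stages=[training]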
Configuration
Default configurations for model-size-specific training can be found in the folder conf/training/mistral.
The configuration is divided into four sections: run, trainer, exp_manager, and model.
run:
name: Mistral_7b
results_dir: ${base_results_dir}/${.name}
time_limit: "0-04:00:00"
dependency: "singleton"
Set the number of nodes and devices for training:
trainer:
num_nodes: 16
devices: 8
max_steps: 300000 # consumed_samples = global_step * global_batch_size
max_time: "05:23:30:00" # days:hours:minutes:seconds
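As a rough sanity check on these numbers: 16 nodes with 8 devices each gives 128 GPUs in total, and the comment on max_steps means that with a hypothetical global_batch_size of 128 (taken from your model config, not shown here), 300,000 steps correspond to about 300,000 × 128 ≈ 38.4 M consumed samples.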
Set configurations for creating a checkpoint:
exp_manager:
create_checkpoint_callback: True
checkpoint_callback_params:
monitor: val_loss
save_top_k: 10
mode: min
always_save_nemo: False # saves nemo file during validation, not implemented for model parallel
save_nemo_on_train_end: False # not recommended when training large models on clusters with short time limits
filename: 'megatron_mistral--{val_loss:.2f}-{step}-{consumed_samples}'
model_parallel_size: ${multiply:${training.model.tensor_model_parallel_size}, ${training.model.pipeline_model_parallel_size}}
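The ${multiply:...} resolver evaluates to the product of the tensor- and pipeline-parallel sizes, so with the default values shown below (both 1), model_parallel_size resolves to 1.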
Set wandb configurations:
exp_manager:
create_wandb_logger: True
wandb_logger_kwargs:
project: nemo_mistral
name: ${training.run.name}
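Logging to Weights & Biases also requires credentials to be available in the job environment (for example via the WANDB_API_KEY environment variable; how this is supplied depends on your cluster setup). If you do not use wandb, a minimal sketch for disabling the logger:
exp_manager:
  create_wandb_logger: False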
Set tensor parallel and pipeline parallel size:
model:
tensor_model_parallel_size: 1
pipeline_model_parallel_size: 1
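Note that the product tensor_model_parallel_size × pipeline_model_parallel_size must evenly divide the total number of GPUs (trainer.num_nodes × trainer.devices); the remaining factor becomes the data-parallel size. With the defaults above and the trainer settings shown earlier, that is 128 / (1 × 1) = 128.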
Set data distribution configuration:
model:
data:
data_prefix:
- .0333
- ${data_dir}/my-Mistral_00_text_document
- .0333
- ${data_dir}/my-Mistral_01_text_document
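The data_prefix list alternates a sampling weight with a dataset prefix (the path prefix of the .bin/.idx files produced by data preprocessing), i.e. weight, prefix, weight, prefix, and so on; the weights control how often each dataset is sampled in the blend. Extend the list with additional weight/prefix pairs to blend more files.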
Gated Model Assets
Mistral's tokenizer is hosted on Hugging Face, which requires a login. To access the tokenizer assets, prepend the HF_TOKEN=<token> environment variable to the NeMo Launcher invocation command.
In NeMo Launcher this can also be achieved by appending ++env_vars.HF_TOKEN=<user-token> to the argument list.
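A sketch of a full invocation with the token supplied both ways (substitute your actual token; the training and stage overrides are illustrative):
HF_TOKEN=<token> python3 main.py training=mistral/mistral_7b stages=[training] ++env_vars.HF_TOKEN=<token>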