Important

NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Refer to the NeMo 2.0 overview for information on getting started.
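
For reference, a minimal sketch of pulling and starting the dev container with Docker (this assumes Docker and the NVIDIA Container Toolkit are available for GPU access):

docker pull nvcr.io/nvidia/nemo:dev
docker run --gpus all -it --rm nvcr.io/nvidia/nemo:dev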

Training with Predefined Configurations

NVIDIA provides configurations for the Mixtral model series, available in sizes 8x3B, 8x7B, and 8x22B. These configurations include carefully selected hyperparameters that can serve as guidelines for customizing your own model configurations. All predefined training configurations are located in the NeMo-Framework-Launcher training configs: Mixtral Training Config

Run Training

  1. To run Mixtral training, update conf/config.yaml:

defaults:
  - training: mixtral/mixtral_8x7b

stages:
  - training

  2. Specify the model size for the training configuration: use mixtral/mixtral_8x3b for the 8x3B model, mixtral/mixtral_8x7b for the 8x7B model, or mixtral/mixtral_8x22b for the 8x22B model.

  3. Execute the launcher pipeline: python3 main.py (an example invocation follows below).
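
For example, to launch with the 8x7B configuration selected through a Hydra command-line override instead of editing conf/config.yaml (a sketch; the override path mirrors the defaults entry shown above):

python3 main.py training=mixtral/mixtral_8x7b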

Configure the Model

Default configurations for model-size-specific training can be found in the folder conf/training/mixtral. The configuration is divided into four sections: run, trainer, exp_manager, and model.
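
Assuming one YAML file per model size (file names here are inferred from the override paths used in this guide), the folder is expected to look like this:

conf/training/mixtral/
├── mixtral_8x3b.yaml
├── mixtral_8x7b.yaml
└── mixtral_8x22b.yaml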

  1. Configure the run:

run:
  name: Mixtral-8x7b
  results_dir: ${base_results_dir}/${.name}
  time_limit: "0-04:00:00"
  dependency: "singleton"

  2. Set the number of nodes and devices for training:

trainer:
  num_nodes: 16
  devices: 8
  max_steps: 300000 # consumed_samples = global_step * global_batch_size
  max_time: "05:23:30:00" # days:hours:minutes:seconds
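
As a quick sanity check on the max_steps comment: with a hypothetical global_batch_size of 256, 300,000 steps corresponds to 300,000 × 256 = 76.8M consumed samples.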

  3. Set configurations for creating a checkpoint:

exp_manager:
  create_checkpoint_callback: True
  checkpoint_callback_params:
    monitor: val_loss
    save_top_k: 10
    mode: min
    always_save_nemo: False # saves nemo file during validation, not implemented for model parallel
    save_nemo_on_train_end: False # not recommended when training large models on clusters with short time limits
    filename: 'megatron_Mixtral--{val_loss:.2f}-{step}-{consumed_samples}'
    model_parallel_size: ${multiply:${training.model.tensor_model_parallel_size}, ${training.model.pipeline_model_parallel_size}}

  4. Set wandb configurations:

exp_manager:
  create_wandb_logger: True
  wandb_logger_kwargs:
    project: nemo_Mixtral
    name: ${training.run.name}
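
Note that logging to wandb also requires authenticating on the machine running the job, typically via wandb login or the WANDB_API_KEY environment variable; these are standard wandb mechanisms rather than launcher-specific settings.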

  5. Set tensor parallel and pipeline parallel size:

model:
  tensor_model_parallel_size: 8
  pipeline_model_parallel_size: 1
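
With the trainer settings above (16 nodes × 8 devices = 128 GPUs), these values give a model-parallel size of 8 × 1 = 8, which is also what the ${multiply:...} interpolation in exp_manager resolves to, leaving a data-parallel size of 128 / 8 = 16.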

  6. Set data distribution configuration:

model:
  data:
    data_prefix:
    - .0333
    - ${data_dir}/my-Mixtral_00_text_document
    - .0333
    - ${data_dir}/my-Mixtral_01_text_document
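
The data_prefix list alternates blend weights and dataset prefixes, where each prefix is the path to a preprocessed .bin/.idx pair without the file extension; adjust the weights and paths to match your own dataset shards.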

Request Gated Model Assets

Mistral’s tokenizer is hosted on Hugging Face, which requires a login. To access the tokenizer assets, provide your Hugging Face token to the NeMo Launcher invocation, either by prepending the HF_TOKEN= environment variable to the command or by appending “++env_vars.HF_TOKEN=<user-token>” to the argument list.
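
A sketch of both options (the training override assumes the conf/config.yaml defaults shown earlier):

# Option 1: prepend the token as an environment variable
HF_TOKEN=<user-token> python3 main.py training=mixtral/mixtral_8x7b

# Option 2: pass the token through the launcher's env_vars
python3 main.py training=mixtral/mixtral_8x7b ++env_vars.HF_TOKEN=<user-token>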