Training with Predefined Configurations

NVIDIA provides a configuration for the Mixtral (8x7b v0.1) model. The configuration includes carefully selected hyperparameters, which you may use as guidelines for any custom model configurations.

To run Mixtral training, update conf/config.yaml:

defaults:
  - training: mixtral/mixtral

stages:
  - training

Execute the launcher pipeline: python3 main.py
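Alternatively, the same selection can be made on the command line without editing conf/config.yaml. The following is a minimal sketch that assumes the standard Hydra override syntax used by the launcher:

python3 main.py training=mixtral/mixtral stages=[training]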

Configuration

Default configurations for model-size-specific training can be found in the folder conf/training/mixtral. The configuration is divided into four sections: run, trainer, exp_manager, and model.

run:
  name: Mixtral-8x7b
  results_dir: ${base_results_dir}/${.name}
  time_limit: "0-04:00:00"
  dependency: "singleton"
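The time_limit field uses the Slurm days-hours:minutes:seconds format, so "0-04:00:00" requests a four-hour job, and dependency: "singleton" ensures that only one job with this name runs at a time. As an illustration, assuming a Slurm cluster, a one-and-a-half-day limit would be written as:

run:
  time_limit: "1-12:00:00" # 1 day, 12 hours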

Set the number of nodes and devices for training:

trainer:
  num_nodes: 16
  devices: 8
  max_steps: 300000 # consumed_samples = global_step * global_batch_size
  max_time: "05:23:30:00" # days:hours:minutes:seconds
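As a worked example of the comment above: with a hypothetical global_batch_size of 256, running the full 300000 steps consumes 300000 × 256 = 76,800,000 samples. The trainer settings above also request num_nodes × devices = 16 × 8 = 128 GPUs in total.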

Set configurations for creating a checkpoint:

exp_manager:
  create_checkpoint_callback: True
  checkpoint_callback_params:
    monitor: val_loss
    save_top_k: 10
    mode: min
    always_save_nemo: False # saves nemo file during validation, not implemented for model parallel
    save_nemo_on_train_end: False # not recommended when training large models on clusters with short time limits
    filename: 'megatron_Mixtral--{val_loss:.2f}-{step}-{consumed_samples}'
    model_parallel_size: ${multiply:${training.model.tensor_model_parallel_size}, ${training.model.pipeline_model_parallel_size}}
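Given the tensor and pipeline parallel sizes set later in this configuration (8 and 1, respectively), the multiply resolver evaluates model_parallel_size to 8 × 1 = 8.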

Set wandb configurations:

exp_manager:
  create_wandb_logger: True
  wandb_logger_kwargs:
    project: nemo_Mixtral
    name: ${training.run.name}
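Logging to Weights & Biases requires authentication on the machine that runs training. A minimal sketch, assuming the standard wandb CLI and environment variable (this is not specific to NeMo Launcher):

wandb login
# or, non-interactively:
export WANDB_API_KEY=<your-api-key>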

Set tensor parallel and pipeline parallel size:

model:
  tensor_model_parallel_size: 8
  pipeline_model_parallel_size: 1
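Under these settings, each model replica spans tensor_model_parallel_size × pipeline_model_parallel_size = 8 × 1 = 8 GPUs. With the 128 GPUs requested by the trainer section (16 nodes × 8 devices), this yields a data-parallel size of 128 / 8 = 16.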

Set the data distribution configuration:

model:
  data:
    data_prefix:
      - .0333
      - ${data_dir}/my-Mixtral_00_text_document
      - .0333
      - ${data_dir}/my-Mixtral_00_text_document
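The data_prefix list alternates blending weights and dataset prefixes: each number is the sampling weight for the path that follows it. As an illustration only (the weights and dataset names below are hypothetical placeholders, not part of the released configuration), two datasets could be blended as:

model:
  data:
    data_prefix:
      - 0.75
      - ${data_dir}/dataset_a_text_document
      - 0.25
      - ${data_dir}/dataset_b_text_document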

Gated Model Assets

Mistral’s tokenizer is hosted on Hugging Face, which requires login. To access the tokenizer assets, prepend the HF_TOKEN=<token> environment variable to the NeMo Launcher invocation command.

In NeMo Launcher, this can also be achieved by appending "++env_vars.HF_TOKEN=<user-token>" to the argument list.
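For example, with <token> replaced by your Hugging Face access token:

HF_TOKEN=<token> python3 main.py
# or, equivalently, via the launcher argument list:
python3 main.py ++env_vars.HF_TOKEN=<token>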
