Important

NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Please refer to the NeMo 2.0 overview for information on getting started.

Fine-Tuning

The NeMo Framework offers multiple specially curated configurations, each with a set of suggested hyperparameters designed for the NVIDIA DGX SuperPOD, whose nodes are each equipped with eight NVIDIA A100 80GB GPUs. The configurations for the curated models are located in the conf/fine_tuning/neva directory. You can modify these parameters to adjust the hyperparameters for your specific training runs, tailoring the model’s performance and training efficiency to your needs.

| Language Model | Vision Encoder | Multimodal Connector Type | Tensor Model Parallel Size | Pipeline Model Parallel Size | Batch size per GPU | Accumulated Global Batch Size | Precision | AMP Level | Total Training Samples Seen |
|---|---|---|---|---|---|---|---|---|---|
| LLaMA-2-7B-Chat (trainable) | CLIP-L-336px (frozen) | MLP Layers (trainable) | 4 | 1 | 4 | 128 | BF16 | O2 | 150K |
| LLaMA-2-13B-Chat (trainable) | CLIP-L-336px (frozen) | MLP Layers (trainable) | 8 | 1 | 4 | 128 | BF16 | O2 | 150K |
| LLaMA-3-8B-Chat (trainable) | CLIP-L-336px (frozen) | MLP Layers (trainable) | 4 | 1 | 2 | 128 | BF16 | O2 | 150K |
| LLaMA-3-70B-Chat (trainable) | CLIP-L-336px (frozen) | MLP Layers (trainable) | 8 | 8 | 2 | 128 | BF16 | O2 | 150K |
| Mistral-7b-Instruct-v0.1 (trainable) | CLIP-L-336px (frozen) | MLP Downsample (trainable) | 4 | 1 | 4 | 128 | BF16 | O2 | 150K |
| Mixtral-8x7b-Instruct-v0.1 (trainable) | CLIP-L-336px (frozen) | MLP Downsample (trainable) | 8 | 2 | 2 | 128 | BF16 | O2 | 150K |
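
To customize these hyperparameters for a specific run, you can either edit the corresponding fields in conf/fine_tuning/neva/ or pass Hydra-style overrides on the launcher command line, in the same override style used elsewhere on this page. Below is a minimal sketch of the command-line form; the fine_tuning.model.* key names follow the conventions used in this documentation but should be verified against your configuration file before use.

    # Sketch: override a few of the tabulated hyperparameters at launch time
    # (key names assumed from the fine_tuning.model.* convention on this page)
    python3 main.py \
        fine_tuning=neva/llama2_7b_chat \
        fine_tuning.model.tensor_model_parallel_size=4 \
        fine_tuning.model.pipeline_model_parallel_size=1 \
        fine_tuning.model.micro_batch_size=4 \
        fine_tuning.model.global_batch_size=128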

Enable Fine-Tuning

To enable fine-tuning with a NeVA model, follow these configuration steps.

  1. In the defaults section of conf/config.yaml, update the fine_tuning field to point to the desired NeVA configuration file. For example, to fine-tune a pretrained NeVA model based on LLaMA-2-7B-Chat (i.e., the llama2_7b_chat configuration), change the fine_tuning field to neva/llama2_7b_chat.

    defaults:
      - fine_tuning: neva/llama2_7b_chat
      ...
    
  2. In the stages field of conf/config.yaml, make sure the fine_tuning stage is included. For example,

    stages:
      - fine_tuning
      ...
    
  3. Execute the launcher pipeline: python3 main.py.
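
If you prefer not to edit conf/config.yaml, the configuration and stage selection can typically be passed as Hydra-style command-line overrides instead. A minimal sketch is shown below; the exact override syntax is an assumption and should be checked against your launcher version.

    # Sketch: select the NeVA fine-tuning configuration and run only the
    # fine_tuning stage directly from the command line (syntax assumed)
    python3 main.py \
        stages=[fine_tuning] \
        fine_tuning=neva/llama2_7b_chat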

Additional Guidelines for Fine-Tuning

  1. Before initiating fine-tuning, ensure that all necessary datasets and checkpoints are prepared.

  2. To load a pretrained checkpoint for fine-tuning, set the restore_from_path field in the model section to the path of the pretrained checkpoint in .nemo format. By default, this field points to the .nemo checkpoint located in the training stage’s checkpoints folder.
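
    A minimal sketch of the corresponding override, assuming the fine_tuning.model.* key convention used elsewhere on this page and an illustrative checkpoint path:

    fine_tuning.model.restore_from_path=/path/to/pretrained_neva_checkpoint.nemo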

  3. If you are training with Vicuna v1.5 language model checkpoints, you can use the same model-size configuration as for Llama 2 Chat, since the two are structurally identical. For instance, when using the Vicuna v1.5 7B model, simply select the llama2_7b_chat configuration. You only need to set fine_tuning.model.mm_cfg.llm.model_type=v1 and fine_tuning.model.data.conv_template=v1, as combined in the sketch below.
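
    Combined with the configuration selection from the Enable Fine-Tuning section, this might look like the sketch below; the command form is an assumption, while the two overrides are taken verbatim from the guideline above.

    python3 main.py \
        fine_tuning=neva/llama2_7b_chat \
        fine_tuning.model.mm_cfg.llm.model_type=v1 \
        fine_tuning.model.data.conv_template=v1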

  4. For sequence packing, refer to the documentation at NeVA Sequence Packing.

  5. When employing pipeline parallelism, the vision encoder (if loaded from Hugging Face) is replicated on every GPU whose pipeline parallel rank is 0, as illustrated below.

    Loading ViT from HF
    DP0
       PP rank 0
          TP rank 0 (if HF, ViT)
          TP rank 1 (if HF, ViT)
       PP rank 1
          TP rank 0
          TP rank 1
    DP1
       PP rank 0
          TP rank 0 (if HF, ViT)
          TP rank 1 (if HF, ViT)
       PP rank 1
          TP rank 0
          TP rank 1
    
    Loading ViT from .nemo
    DP0
       PP rank 0
          TP rank 0 (if NeMo, ViT TP rank 0)
          TP rank 1 (if NeMo, ViT TP rank 1)
       PP rank 1
          TP rank 0
          TP rank 1
    DP1
       PP rank 0
          TP rank 0 (if NeMo, ViT TP rank 0)
          TP rank 1 (if NeMo, ViT TP rank 1)
       PP rank 1
          TP rank 0
          TP rank 1
    
  6. Recommended FP8 recipe:

    fine_tuning.model.fp8=True \
    fine_tuning.model.fp8_e4m3=False \
    fine_tuning.model.fp8_hybrid=True \
    fine_tuning.model.fp8_margin=0 \
    fine_tuning.model.fp8_interval=1 \
    fine_tuning.model.fp8_amax_history_len=1024 \
    fine_tuning.model.fp8_amax_compute_algo=max