Important
NeMo 2.0 is an experimental feature and currently released in the dev container only: nvcr.io/nvidia/nemo:dev. Please refer to NeMo 2.0 overview for information on getting started.
Training with Predefined Configurations
NVIDIA offers three configurations, each with recommended hyperparameters, designed specifically for the NVIDIA DGX SuperPOD, where each node is equipped with eight NVIDIA A100 80GB GPUs. The configuration details for the curated models can be found in the conf/training/video_neva directory.
You can access, modify, and fine-tune the hyperparameters for your specific training runs. By customizing these settings, you can optimize the model’s performance and training efficiency to better align with your requirements.
| Language Model | Vision Encoder | Multimodal Connector Type | Tensor Model Parallel Size | Pipeline Model Parallel Size | Batch Size per GPU | Accumulated Global Batch Size | Precision | AMP Level | Total Training Samples Seen |
|---|---|---|---|---|---|---|---|---|---|
| LLaMA-2-7B-Chat (frozen) | CLIP-L-336px (frozen) | MLP Layers (trainable) | 4 | 1 | 8 | 256 | BF16 | O2 | 550K |
| LLaMA-2-13B-Chat (frozen) | CLIP-L-336px (frozen) | MLP Layers (trainable) | 8 | 1 | 8 | 256 | BF16 | O2 | 550K |
| LLaMA-2-70B-Chat (frozen) | CLIP-L-336px (frozen) | MLP Layers (trainable) | 8 | 1 | 2 | 256 | BF16 | O2 | 550K |
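The "Accumulated Global Batch Size" column relates to the per-GPU batch size through the usual Megatron-style data-parallel arithmetic (stated here as an assumption; the launcher computes this internally): the global batch equals the per-GPU micro-batch, times the data-parallel size (total GPUs divided by TP × PP), times the number of gradient-accumulation steps. A small sketch:

```python
def grad_accum_steps(global_batch: int, micro_batch: int,
                     num_gpus: int, tp: int, pp: int) -> int:
    """Gradient-accumulation steps implied by a global batch size,
    assuming standard Megatron-style parallelism arithmetic."""
    dp = num_gpus // (tp * pp)       # data-parallel replicas
    per_step = micro_batch * dp      # samples processed per accumulation step
    assert global_batch % per_step == 0, "global batch must divide evenly"
    return global_batch // per_step

# 7B configuration (TP=4, PP=1, micro-batch 8) on two 8-GPU nodes:
print(grad_accum_steps(256, 8, 16, 4, 1))  # -> 8
```

Doubling the number of nodes halves the accumulation steps for the same global batch size.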
To enable the training stage with a VideoNeVA model, follow these configuration steps.

1. Navigate to the `defaults` section in `conf/config.yaml`. Update the `training` field to reference the desired VideoNeVA configuration file. For instance, if you wish to use the LLaMA-2-7B-Chat (i.e., `llama2_7b_chat`) configuration, modify the `training` field to `video_neva/llama2_7b_chat`.

   ```yaml
   defaults:
     - _self_
     - cluster: bcm
     - data_preparation: null
     - training: video_neva/llama2_7b_chat
     ...
   ```

2. Within the `stages` field of `conf/config.yaml`, ensure the training stage is listed.

   ```yaml
   stages:
     - training
     ...
   ```

3. Execute the launcher pipeline: `python3 main.py`.
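Since the launcher is Hydra-based, the same settings can also be supplied as command-line overrides instead of editing `conf/config.yaml`; this sketch assumes standard Hydra override syntax:

```shell
# Equivalent launch using Hydra-style command-line overrides
# (override syntax assumed from Hydra conventions, not shown in this guide)
python3 main.py \
    training=video_neva/llama2_7b_chat \
    stages=[training]
```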
Remarks:

- Prior to initiating your training, ensure you have prepared all necessary datasets and checkpoints, and set the correct dataset and checkpoint paths in `video_neva/llama2_{model_size}_chat.yaml`.
- If you are training with the Vicuna v1.5 language model checkpoints, you can adopt the same model-size configuration as used in Llama2 Chat, as they share a similar structure. For example, when working with the Vicuna v1.5 7B model, you can conveniently choose the `llama2_7b_chat` configuration. The only adjustment needed is to set the following parameter: `training.model.mm_cfg.llm.model_type=v1`.
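As a rough orientation, the path settings to edit in `video_neva/llama2_{model_size}_chat.yaml` typically look like the sketch below. The key names here are illustrative assumptions based on common NeMo multimodal config layouts; verify them against the shipped configuration file before use:

```yaml
# Illustrative sketch only -- confirm key names against the shipped config.
model:
  mm_cfg:
    llm:
      from_pretrained: /path/to/llama-2-7b-chat.nemo      # language model checkpoint
    vision_encoder:
      from_pretrained: /path/to/clip-vision-encoder       # vision encoder checkpoint
  data:
    data_path: /path/to/video_instruction_data.json       # training dataset
```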