Important
NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Please refer to the NeMo 2.0 overview for information on getting started.
Training with Predefined Configurations
We provide three curated configurations with suggested hyperparameters targeting the NVIDIA DGX SuperPOD, each node of which is equipped with 8 NVIDIA A100 80GB GPUs. The configuration files for these models can be found in the `conf/training/clip` directory. You can modify these files to adjust the hyperparameters for your specific training runs and tailor the model's performance and training efficiency to your needs.
| Model | Image size | Text Model size (M) | Image Model size (M) | Output dim | Batch Size per GPU | Accumulated Global Batch Size | Precision | AMP Level | Total Training Samples Seen |
|---|---|---|---|---|---|---|---|---|---|
| ViT B/32 | 224 | 63 | 87 | 512 | 500 | 32000 | BF16 | O2 | 12B |
| ViT L/14 | 224 | 123 | 303 | 768 | 112 | 32256 | BF16 | O2 | 12B |
| ViT H/14 | 224 | 354 | 638 | 1024 | 80 | 32000 | BF16 | O2 | 12B |
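For example, to adapt the ViT B/32 recipe from the table above, you might edit a handful of fields in `conf/training/clip/vit_B_32.yaml` before launching. The excerpt below is a minimal sketch; the exact field names and layout are assumptions and should be checked against the configuration file shipped with the launcher.

```yaml
# Illustrative excerpt only -- verify field names against the actual
# conf/training/clip/vit_B_32.yaml in your launcher checkout.
trainer:
  num_nodes: 8
  devices: 8                 # GPUs per node
  precision: bf16
model:
  micro_batch_size: 500      # batch size per GPU (see table above)
  global_batch_size: 32000   # 500 * 64 GPUs; gradient accumulation is not supported
  output_dim: 512            # CLIP embedding dimension for ViT B/32
```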
To enable the training stage with a CLIP model, update the configuration files as follows:

1. In the `defaults` section of `conf/config.yaml`, update the `training` field to point to the desired CLIP configuration file. For example, to use ViT B/32 (i.e. `vit_B_32`), change the `training` field to `clip/vit_B_32`:

   ```yaml
   defaults:
     - _self_
     - cluster: bcm
     - data_preparation: multimodal/download_multimodal
     - training: clip/vit_B_32
     ...
   ```

2. In the `stages` field of `conf/config.yaml`, make sure the training stage is included. For example:

   ```yaml
   stages:
     - data_preparation
     - training
     ...
   ```

3. Execute the launcher pipeline: `python3 main.py`.
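Because the launcher is Hydra-based, individual values can also be overridden on the command line instead of editing the YAML files. The invocation below is a sketch; the override keys (e.g. `training.trainer.num_nodes`, `training.model.micro_batch_size`) are assumed to follow the launcher's usual layout and should be verified against your configuration.

```bash
# Sketch: select the ViT B/32 CLIP recipe, run only the training stage,
# and override a couple of values on the command line (keys assumed).
python3 main.py \
    training=clip/vit_B_32 \
    'stages=[training]' \
    training.trainer.num_nodes=8 \
    training.model.micro_batch_size=500
```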
Remarks:

- NeMo CLIP does not yet support gradient accumulation. Therefore, please ensure that `micro_batch_size * num_gpus = global_batch_size` (i.e. the gradient accumulation step is 1).
- For CLIP models, you can enable Exponential Moving Average (EMA) by setting `training.exp_manager.ema.enable=True`. However, EMA is currently not compatible with AMP O2; to use EMA, you must disable AMP O2 by setting `training.model.megatron_amp_O2=False`. Enabling EMA can help your model converge faster, but be aware that it may result in a slight performance penalty.
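As a concrete illustration of both remarks, the command below (a sketch; verify the exact override keys against your configuration files) keeps the batch-size identity intact for ViT B/32 on 64 GPUs and enables EMA while disabling AMP O2:

```bash
# 8 nodes x 8 GPUs = 64 GPUs, and 500 * 64 = 32000, so the gradient accumulation step stays at 1.
# The EMA and AMP O2 override paths come from the remarks above; the other keys are assumptions.
python3 main.py \
    training=clip/vit_B_32 \
    training.trainer.num_nodes=8 \
    training.trainer.devices=8 \
    training.model.micro_batch_size=500 \
    training.model.global_batch_size=32000 \
    training.exp_manager.ema.enable=True \
    training.model.megatron_amp_O2=False
```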