# Parameter-Efficient Fine-Tuning (PEFT)
The NeMo Framework offers several curated configurations, each with suggested hyperparameters tuned for the NVIDIA DGX SuperPOD, whose nodes are each equipped with eight NVIDIA A100 80GB GPUs. The configurations for the curated models can be found in the `conf/peft/neva` directory. You can modify these parameters to adjust the hyperparameters for your specific training runs, tailoring the model's performance and training efficiency to your needs.
| Language Model | Vision Encoder | Multimodal Connector Type | PEFT Scheme | Tensor Model Parallel Size | Pipeline Model Parallel Size | Batch size per GPU | Accumulated Global Batch Size | Precision | AMP Level | Total Training Samples Seen |
|---|---|---|---|---|---|---|---|---|---|---|
| LLaMA-2-7B-Chat (frozen) | CLIP-L-336px (frozen) | MLP Layers (trainable) | LoRA | 4 | 1 | 4 | 128 | BF16 | O2 | 150K |
| LLaMA-2-13B-Chat (frozen) | CLIP-L-336px (frozen) | MLP Layers (trainable) | LoRA | 8 | 1 | 4 | 128 | BF16 | O2 | 150K |
| LLaMA-2-70B-Chat (frozen) | CLIP-L-336px (frozen) | MLP Layers (trainable) | LoRA | 8 | 1 | 1 | 128 | BF16 | O2 | 150K |
| LLaMA-3-8B-Chat (frozen) | CLIP-L-336px (frozen) | MLP Layers (trainable) | LoRA | 8 | 1 | 4 | 128 | BF16 | O2 | 150K |
| LLaMA-3-70B-Chat (frozen) | CLIP-L-336px (frozen) | MLP Layers (trainable) | LoRA | 8 | 1 | 1 | 128 | BF16 | O2 | 150K |
| Mistral-7b-Instruct-v0.1 (frozen) | CLIP-L-336px (frozen) | MLP Downsample (trainable) | LoRA | 4 | 1 | 4 | 128 | BF16 | O2 | 150K |
| Mixtral-8x7b-Instruct-v0.1 (frozen) | CLIP-L-336px (frozen) | MLP Downsample (trainable) | LoRA | 8 | 1 | 2 | 128 | BF16 | O2 | 150K |
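The parallelism, batch size, and precision values in the table map onto fields in the selected `conf/peft/neva` file. The sketch below illustrates where the LLaMA-2-7B-Chat row's values would appear; the field names (for example, `micro_batch_size` and `megatron_amp_O2`) assume the standard NeMo config layout and should be verified against the shipped `llama2_7b_chat.yaml` before editing.

```yaml
# Illustrative sketch of hyperparameter fields in conf/peft/neva/llama2_7b_chat.yaml,
# using the values from the first table row. Field names assume the standard NeMo
# config layout; verify them against the shipped file before editing.
trainer:
  precision: bf16                    # Precision
model:
  tensor_model_parallel_size: 4      # Tensor Model Parallel Size
  pipeline_model_parallel_size: 1    # Pipeline Model Parallel Size
  micro_batch_size: 4                # Batch size per GPU
  global_batch_size: 128             # Accumulated Global Batch Size
  megatron_amp_O2: True              # AMP Level O2
```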
## Enable Parameter-Efficient Fine-Tuning
To enable the PEFT stage with a NeVA model, update the configuration files as follows:
1. In the `defaults` section of `conf/config.yaml`, update the `peft` field to point to the NeVA configuration file you want. For example, to fine-tune a pretrained NeVA model based on the LLaMA-2-7B-Chat (i.e., `llama2_7b_chat`) configuration, change the `peft` field to `neva/llama2_7b_chat`:

   ```yaml
   defaults:
     - peft: neva/llama2_7b_chat
     ...
   ```

2. In the `stages` field of `conf/config.yaml`, make sure the `peft` stage is included. For example:

   ```yaml
   stages:
     - peft
     ...
   ```

3. Execute the launcher pipeline: `python3 main.py`.
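Taken together, the relevant portion of `conf/config.yaml` would then look roughly like the sketch below (only the two edited fields are shown; everything else in the file is left as shipped):

```yaml
# Sketch of conf/config.yaml after steps 1 and 2; only the edited fields are shown.
defaults:
  - peft: neva/llama2_7b_chat
  # ... other defaults unchanged ...

stages:
  - peft
```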
## Additional Guidelines for Parameter-Efficient Fine-Tuning
- Before starting PEFT, make sure you have prepared all necessary datasets and checkpoints.
- To load a pretrained checkpoint for PEFT, set the `restore_from_path` field in the `model` section to the path of the pretrained checkpoint in `.nemo` format. By default, this field points to the `.nemo` checkpoint located in the training checkpoints folder (see the sketch after this list).
- PEFT-tuned checkpoints save only the LoRA weights rather than the entire model, so both the base model weights and the LoRA weights are required for subsequent inference and evaluation.
- If you are training with Vicuna v1.5 language model checkpoints, you can use the same model size configuration as Llama2 Chat, since the two are structurally identical. For instance, for the Vicuna v1.5 7B model, simply choose the `llama2_7b_chat` configuration; you only need to set `peft.model.mm_cfg.llm.model_type=v1` and `peft.model.data.conv_template=v1`.
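Putting these guidelines together, the sketch below shows the corresponding fields inside the selected `conf/peft/neva` config file. This is a minimal, illustrative excerpt: the checkpoint path is a placeholder, and the Vicuna-specific fields simply restate the overrides above in YAML form.

```yaml
# Illustrative excerpt of the model section in the selected conf/peft/neva config.
# The .nemo path is a placeholder; point it at your own pretrained checkpoint.
model:
  restore_from_path: /path/to/pretrained_neva_checkpoint.nemo

  # Only needed when fine-tuning Vicuna v1.5 checkpoints with a llama2_*_chat config:
  mm_cfg:
    llm:
      model_type: v1
  data:
    conv_template: v1
```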