Generalized PEFT Framework

Many Parameter-Efficient Fine-Tuning (PEFT) methods have overlapping functionality. To streamline NeMo's codebase, we have unified the implementation of all supported PEFT methods. We have also introduced the Low-Rank Adapter (LoRA) PEFT model for GPT-style and T5/mT5-style base models in NeMo.
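
To illustrate the low-rank adapter idea, the sketch below shows a minimal LoRA-style linear layer in plain PyTorch. It is not NeMo's internal implementation; the class name, rank, and alpha values are assumptions chosen for the example.

```python
# Minimal, illustrative LoRA linear layer (not NeMo's implementation):
# the frozen base weight W is augmented with a trainable low-rank update
# B @ A, so only rank * (in_features + out_features) parameters are learned.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=8, alpha=16.0):
        super().__init__()
        # Frozen pretrained projection.
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad = False
        # Trainable low-rank factors; B starts at zero so training begins
        # from the pretrained model's behavior.
        self.lora_a = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)


if __name__ == "__main__":
    layer = LoRALinear(1024, 1024, rank=8)
    out = layer(torch.randn(2, 16, 1024))
    print(out.shape)  # torch.Size([2, 16, 1024])
```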

The Generalized PEFT Framework also includes functionality for Prompt Learning and Adapter Learning. You can find further details for these techniques below.

In the NeMo framework, P‑tuning and prompt tuning are collectively known as prompt learning. Both methods are parameter-efficient alternatives to fine-tuning pretrained language models. The NVIDIA NeMo implementation lets you use one pretrained GPT, T5, or mT5 model on many downstream tasks without tuning the model's full set of parameters. It also lets you add new tasks to your model without overwriting or disrupting previous tasks for which the model has already been p-tuned or prompt-tuned. Because neither method alters the original model parameters, P‑tuning and prompt tuning also avoid the catastrophic forgetting issues often encountered when fine-tuning models.

Instead of selecting discrete text prompts in a manual or automated fashion, P‑tuning and prompt tuning use virtual prompt embeddings that can be optimized via gradient descent. The only difference between prompt tuning and P‑tuning in NeMo-Megatron is the architecture used to tune the soft prompt tokens during training: prompt tuning updates the virtual token embeddings directly, while P‑tuning predicts them with a small prompt-encoder network.
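
The following is a minimal prompt-tuning sketch in plain PyTorch, not NeMo's prompt-learning code: a table of trainable virtual-token embeddings is prepended to the frozen model's input embeddings and optimized by gradient descent. The SoftPrompt class name, the 20 virtual tokens, and the hidden size of 768 are illustrative assumptions.

```python
# Illustrative prompt-tuning sketch: trainable virtual-token embeddings are
# concatenated in front of the frozen model's input embeddings. A P-tuning
# variant would instead produce these embeddings with a small prompt encoder.
import torch
import torch.nn as nn


class SoftPrompt(nn.Module):
    def __init__(self, num_virtual_tokens, hidden_size):
        super().__init__()
        # The only trainable parameters: one embedding per virtual token.
        self.prompt = nn.Parameter(torch.randn(num_virtual_tokens, hidden_size) * 0.02)

    def forward(self, input_embeds):
        # input_embeds: (batch, seq_len, hidden) from the frozen embedding layer.
        batch = input_embeds.size(0)
        virtual = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([virtual, input_embeds], dim=1)


if __name__ == "__main__":
    soft_prompt = SoftPrompt(num_virtual_tokens=20, hidden_size=768)
    embeds = torch.randn(4, 128, 768)   # stand-in for frozen token embeddings
    print(soft_prompt(embeds).shape)    # torch.Size([4, 148, 768])
```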

For more details about these implementations, see Prompt Learning in the NeMo framework documentation.

The NeMo framework supports Adapter Learning and Infused Adapter by Inhibiting and Amplifying Inner Activations (IA3) learning. Both methods are parameter-efficient alternatives to fine-tuning pretrained language models. The NVIDIA NeMo implementation lets you use one pretrained GPT or T5 model on many downstream tasks without tuning the model's full set of parameters. Because neither method modifies the original model parameters, they also avoid the catastrophic forgetting issues often encountered when fine-tuning models.

Unlike P‑tuning and prompt tuning, Adapter Learning and IA3 do not insert virtual prompts into the input. Adapter Learning introduces small feedforward layers within the core transformer architecture that are updated for specific downstream tasks. IA3 adds even fewer parameters: learned vectors that simply scale the hidden representations in the transformer layer and can likewise be trained for specific downstream tasks.
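
The sketch below contrasts the two ideas in plain PyTorch. It is illustrative rather than NeMo's implementation; the Adapter and IA3Scale class names, the bottleneck size, and the insertion points are assumptions made for the example.

```python
# Illustrative contrast between the two methods: an adapter is a small
# bottleneck feedforward block applied to a frozen sublayer's output, while
# IA3 only learns a per-channel scaling vector.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    def __init__(self, hidden_size, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)

    def forward(self, hidden):
        # Residual bottleneck around the frozen layer's output.
        return hidden + self.up(torch.relu(self.down(hidden)))


class IA3Scale(nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        # One learned scale per hidden channel, initialized to the identity.
        self.scale = nn.Parameter(torch.ones(hidden_size))

    def forward(self, hidden):
        return hidden * self.scale


if __name__ == "__main__":
    h = torch.randn(2, 16, 1024)
    print(Adapter(1024)(h).shape)   # torch.Size([2, 16, 1024])
    print(IA3Scale(1024)(h).shape)  # torch.Size([2, 16, 1024])
```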

Note that the IA3 paper proposes a recipe called T-Few, which introduces an unlikelihood loss function and a continued-training procedure. The NVIDIA IA3 implementation does not support these additions and focuses only on the core architectural change.
