Generalized PEFT Framework

Many Parameter-Efficient Fine-Tuning (PEFT) methods have overlapping functionality. To streamline NeMo's codebase, we have unified the implementation of all supported PEFT methods. We have also introduced the Low-Rank Adapter (LoRA) PEFT model for GPT-style and T5/mT5-style base models in NeMo.
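
To illustrate the low-rank adapter idea, the sketch below shows a minimal LoRA-style linear layer in plain PyTorch. It is not NeMo's internal implementation; the class name, rank, and alpha values are assumptions chosen for the example.

```python
# Minimal, illustrative LoRA linear layer (not NeMo's implementation):
# the frozen base weight W is augmented with a trainable low-rank update
# B @ A, so only rank * (in_features + out_features) parameters are learned.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=8, alpha=16.0):
        super().__init__()
        # Frozen pretrained projection.
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad = False
        # Trainable low-rank factors; B starts at zero so training begins
        # from the pretrained model's behavior.
        self.lora_a = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)


if __name__ == "__main__":
    layer = LoRALinear(1024, 1024, rank=8)
    out = layer(torch.randn(2, 16, 1024))
    print(out.shape)  # torch.Size([2, 16, 1024])
```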

The Generalized PEFT Framework also includes functionality for Prompt Learning and Adapter Learning. You can find further details for these techniques below.

In the NeMo framework, P‑tuning and prompt tuning are collectively known as prompt learning. Both methods are parameter-efficient alternatives to fine-tuning pretrained language models. The NVIDIA NeMo implementation lets you use one pretrained GPT, T5, or mT5 model on many downstream tasks without tuning the model's full set of parameters. It also lets you add new tasks to your model without overwriting or disrupting previous tasks for which the model has already been p-tuned or prompt-tuned. Because neither method alters the original model parameters, P‑tuning and prompt tuning also avoid the catastrophic forgetting issues often encountered when fine-tuning models.

Instead of selecting discrete text prompts in a manual or automated fashion, P‑tuning and prompt tuning use virtual prompt embeddings that can be optimized via gradient descent. The only difference between prompt tuning and P‑tuning in NeMo-Megatron is the architecture used to tune the soft prompt tokens during training: prompt tuning updates the virtual token embeddings directly, while P‑tuning predicts them with a small prompt-encoder network.
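
The following is a minimal prompt-tuning sketch in plain PyTorch, not NeMo's prompt-learning code: a table of trainable virtual-token embeddings is prepended to the frozen model's input embeddings and optimized by gradient descent. The SoftPrompt class name, the 20 virtual tokens, and the hidden size of 768 are illustrative assumptions.

```python
# Illustrative prompt-tuning sketch: trainable virtual-token embeddings are
# concatenated in front of the frozen model's input embeddings. A P-tuning
# variant would instead produce these embeddings with a small prompt encoder.
import torch
import torch.nn as nn


class SoftPrompt(nn.Module):
    def __init__(self, num_virtual_tokens, hidden_size):
        super().__init__()
        # The only trainable parameters: one embedding per virtual token.
        self.prompt = nn.Parameter(torch.randn(num_virtual_tokens, hidden_size) * 0.02)

    def forward(self, input_embeds):
        # input_embeds: (batch, seq_len, hidden) from the frozen embedding layer.
        batch = input_embeds.size(0)
        virtual = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([virtual, input_embeds], dim=1)


if __name__ == "__main__":
    soft_prompt = SoftPrompt(num_virtual_tokens=20, hidden_size=768)
    embeds = torch.randn(4, 128, 768)   # stand-in for frozen token embeddings
    print(soft_prompt(embeds).shape)    # torch.Size([4, 148, 768])
```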

For more details about these implementations, see Prompt Learning in the NeMo framework documentation.

The NeMo framework supports Adapter Learning and Infused Adapter by Inhibiting and Amplifying Inner Activations (IA3) learning. Both methods are parameter-efficient alternatives to fine-tuning pretrained language models. The NVIDIA NeMo implementation lets you use one pretrained GPT or T5 model on many downstream tasks without tuning the model's full set of parameters. Because neither method modifies the original model parameters, they also avoid the catastrophic forgetting issues often encountered when fine-tuning models.

Unlike P‑tuning and prompt tuning, Adapter Learning and IA3 do not insert virtual prompts into the input. Adapter Learning introduces small feedforward layers within the core transformer architecture that are updated for specific downstream tasks. IA3 adds even fewer parameters: learned vectors that simply scale the hidden representations in the transformer layer and can likewise be trained for specific downstream tasks.
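
The sketch below contrasts the two ideas in plain PyTorch. It is illustrative rather than NeMo's implementation; the Adapter and IA3Scale class names, the bottleneck size, and the insertion points are assumptions made for the example.

```python
# Illustrative contrast between the two methods: an adapter is a small
# bottleneck feedforward block applied to a frozen sublayer's output, while
# IA3 only learns a per-channel scaling vector.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    def __init__(self, hidden_size, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)

    def forward(self, hidden):
        # Residual bottleneck around the frozen layer's output.
        return hidden + self.up(torch.relu(self.down(hidden)))


class IA3Scale(nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        # One learned scale per hidden channel, initialized to the identity.
        self.scale = nn.Parameter(torch.ones(hidden_size))

    def forward(self, hidden):
        return hidden * self.scale


if __name__ == "__main__":
    h = torch.randn(2, 16, 1024)
    print(Adapter(1024)(h).shape)   # torch.Size([2, 16, 1024])
    print(IA3Scale(1024)(h).shape)  # torch.Size([2, 16, 1024])
```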

Note that the IA3 paper proposes a recipe called T-Few, which introduces an unlikelihood loss function and a continued-training procedure. The NVIDIA IA3 implementation does not support these additions and focuses only on the core architectural change.
