Parameter-Efficient Fine-Tuning (PEFT)
PEFT is a popular technique used to efficiently fine-tune large language models for various downstream tasks. When fine-tuning with PEFT, the base model weights are frozen and a few trainable adapter modules are injected into the model, resulting in a very small fraction (<< 1%) of trainable weights. With carefully chosen adapter modules and injection points, PEFT achieves performance comparable to full fine-tuning at a fraction of the computational and storage costs.
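The sketch below illustrates this principle in plain PyTorch rather than the NeMo API: a pretrained layer is frozen and wrapped with a small low-rank (LoRA-style) adapter, and the trainable parameter count is compared against the total. The class name `LoRALinear` and the toy two-layer model are illustrative assumptions only; in a real LLM the trainable fraction would be far smaller than in this toy example.

```python
# Minimal sketch of the PEFT idea: freeze base weights, inject a small
# trainable adapter, and verify that only a tiny fraction of parameters
# remain trainable. Not the NeMo API; names here are illustrative.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Wraps a frozen linear layer with a low-rank trainable update."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # freeze the base weights
            p.requires_grad = False
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen base output plus the scaled low-rank adapter update.
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)


# Toy "base model"; in practice this would be a pretrained transformer.
model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))
for p in model.parameters():
    p.requires_grad = False                   # freeze everything first

# Inject adapters at the chosen injection points (here: both linear layers).
model[0] = LoRALinear(model[0], rank=8)
model[2] = LoRALinear(model[2], rank=8)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable} / {total} ({100 * trainable / total:.2f}%)")
```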
NeMo supports four PEFT methods, which can be used with various transformer-based models:
| | GPT 3 | NvGPT | LLaMa 1/2 | T5 |
|---|---|---|---|---|
| Adapters (Canonical) | ✅ | ✅ | ✅ | ✅ |
| LoRA | ✅ | ✅ | ✅ | ✅ |
| IA3 | ✅ | ✅ | ✅ | ✅ |
| P-Tuning | ✅ | ✅ | ✅ | ✅ |
Learn more about PEFT in NeMo with the Quick Start Guide, which provides an overview of how PEFT works in NeMo. Read about the supported PEFT methods here. For a practical example, take a look at the Step-by-step Guide.
The API guide can be found here.