SFT and PEFT

Customizing models enables you to adapt a general pre-trained LLM to a specific use case or domain. This process results in a fine-tuned model that benefits from the extensive pretraining data, while also yielding more accurate outputs for the specific downstream task. Model customization is achieved through supervised fine-tuning and falls into two popular categories:

  • Full-parameter fine-tuning, which is referred to as supervised fine-tuning (SFT) in NeMo

  • Parameter-efficient fine-tuning (PEFT)

In SFT, all of the model parameters are updated to produce outputs that are adapted to the task.
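To make the distinction concrete, here is a minimal sketch of a single full-parameter SFT update in plain PyTorch using a Hugging Face model (gpt2 is used purely for illustration; NeMo provides its own SFT training scripts). The key point is that the optimizer covers every parameter of the model.

```python
import torch
from transformers import AutoModelForCausalLM

# Load a small pre-trained causal LM (gpt2 chosen only to keep the sketch runnable).
model = AutoModelForCausalLM.from_pretrained("gpt2")

# In SFT, ALL parameters are trainable: the optimizer is built over model.parameters().
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def sft_step(batch):
    """One SFT update: the loss gradient flows into every model weight."""
    optimizer.zero_grad()
    out = model(input_ids=batch["input_ids"],
                attention_mask=batch["attention_mask"],
                labels=batch["labels"])
    out.loss.backward()
    optimizer.step()
    return out.loss.item()
```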

PEFT, on the other hand, tunes a much smaller set of parameters that are inserted into the base model at strategic locations. When fine-tuning with PEFT, the base model weights remain frozen and only the adapter modules are trained. As a result, the number of trainable parameters is dramatically reduced, typically to well under 1% of the base model.
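For contrast, the following is a from-scratch sketch of a LoRA-style adapter, one of the PEFT methods NeMo supports (illustrative code, not NeMo's implementation). The base layer's weights are frozen and only the low-rank matrices A and B are trained, which is why the trainable parameter count drops well below 1%.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen linear layer with a trainable low-rank adapter:
    y = W x + (alpha / r) * B A x, where only A and B receive gradients."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)   # freeze the base weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.lora_A.T) @ self.lora_B.T

layer = LoRALinear(nn.Linear(4096, 4096))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable} / {total} ({100 * trainable / total:.2f}%)")
# With r=8 on a 4096x4096 layer, roughly 0.39% of the parameters are trainable.
```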

While SFT often yields the best possible results, PEFT methods can achieve nearly the same degree of accuracy at a significantly lower computational cost. As language models continue to grow in size, PEFT is gaining popularity due to its much lighter training hardware requirements.

NeMo supports SFT and four PEFT methods (LoRA, P-tuning, canonical adapters, and IA3), which can be used with various transformer-based models. Here is a collection of conversion scripts that convert popular models from HF format to NeMo format; a hypothetical invocation is sketched after the lists below.

Supported models:

  • GPT 3

  • Nemotron

  • LLaMa 1/2

  • Falcon

  • Starcoder

  • Mistral

  • Mixtral

  • Gemma

  • T5

Supported tuning methods:

  • SFT

  • LoRA

  • P-tuning

  • Adapters (Canonical)

  • IA3
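A conversion is typically run as a standalone script against a local HF checkpoint. The script path and flags below are assumptions for illustration; check the conversion scripts shipped with your NeMo release for the exact names.

```python
import subprocess

# Hypothetical HF -> NeMo conversion for a LLaMa 2 checkpoint. The script path
# and flag names vary by NeMo version and model family; verify them locally.
subprocess.run(
    [
        "python", "scripts/checkpoint_converters/convert_llama_hf_to_nemo.py",  # assumed path
        "--input_name_or_path", "/path/to/llama2-7b-hf",   # local HF checkpoint directory
        "--output_path", "/path/to/llama2-7b.nemo",        # resulting .nemo file
    ],
    check=True,
)
```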

Learn more about SFT and PEFT in NeMo with the Developer Guide, which provides an overview of how they work in NeMo. Read more about the supported PEFT methods here.

For an end-to-end example of LoRA tuning, take a look at the step-by-step LoRA Notebook. We also have many SFT and PEFT examples for each model for you to play with.

The API guide can be found here.
