Developer Quick Start

This quick start guide gives an overview of the PEFT workflow in NeMo.

Terminology: PEFT vs Adapter

This tutorial uses the term “PEFT” to refer to parameter-efficient fine-tuning methods, and the term “adapter” to refer to a supplementary module injected into a frozen base model. A single PEFT method can use one or more types of adapters.

One PEFT method is itself often called “adapters,” because it was among the earliest proposed uses of adapter modules in NLP. To distinguish this method from the general term, we refer to it as “canonical” adapters.

How PEFT works in NeMo models

Each PEFT method has one or more types of adapters that need to be injected into the base model. In NeMo models, the adapter logic and adapter weights are already built into the submodules, but they are disabled by default for ordinary training and fine-tuning.

For PEFT, the adapter logic paths are enabled by calling model.add_adapter(peft_cfg). This method iterates over the adapter types that apply to the current PEFT method and scans the model's submodules for adapter logic paths that can be enabled.

The base model’s weights are then frozen, while the newly enabled adapter weights remain trainable. Because only the adapter weights are updated during fine-tuning, far fewer parameters are trained than in full fine-tuning.
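The enable-then-freeze mechanism described above can be illustrated with a framework-free sketch. It assumes each submodule carries built-in but inactive adapter weights keyed by adapter type. `Param`, `Submodule`, `ToyModel`, and `make_model` are hypothetical stand-ins, not NeMo classes, and the parameter counts are arbitrary.

```python
# Hypothetical sketch: none of these classes are NeMo APIs.
from dataclasses import dataclass, field


@dataclass
class Param:
    size: int
    requires_grad: bool = True


@dataclass
class Submodule:
    base: Param
    # Built-in adapter weights keyed by adapter type; inactive by default.
    adapters: dict = field(default_factory=dict)
    enabled: set = field(default_factory=set)


class ToyModel:
    def __init__(self, submodules):
        self.submodules = submodules

    def add_adapter(self, adapter_types):
        for sub in self.submodules:
            # Enable only the adapter paths that this submodule provides.
            sub.enabled |= set(adapter_types) & set(sub.adapters)
            # Freeze the base weights of every submodule.
            sub.base.requires_grad = False
            # Enabled adapter weights are trainable; all others stay frozen.
            for adapter_type, param in sub.adapters.items():
                param.requires_grad = adapter_type in sub.enabled

    def trainable_params(self):
        return sum(
            p.size
            for sub in self.submodules
            for p in [sub.base, *sub.adapters.values()]
            if p.requires_grad
        )


def make_model():
    # Two attention-like blocks with a LoRA-style slot, one MLP with an IA3-style slot.
    return ToyModel([
        Submodule(Param(1_000_000), {"lora": Param(8_192, requires_grad=False)}),
        Submodule(Param(1_000_000), {"lora": Param(8_192, requires_grad=False)}),
        Submodule(Param(4_000_000), {"ia3": Param(4_096, requires_grad=False)}),
    ])
```

In this toy model, 6,000,000 parameters are trainable before `add_adapter(["lora"])` and only 16,384 afterwards. The IA3-style slot stays inactive because it was not requested.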

PEFT config classes

Each PEFT method is specified by a PEFTConfig class that stores the types of adapters used by the method, along with the hyperparameters required to initialize those adapter modules.

The following five PEFT methods are currently supported:

LoRA: LoraPEFTConfig
QLoRA: QLoraPEFTConfig
P-Tuning: PtuningPEFTConfig
Adapters (canonical): CanonicalAdaptersPEFTConfig
IA3: IA3PEFTConfig

Because each method is encapsulated in its own config class, switching to a different PEFT method only requires instantiating a different config class; the rest of the workflow stays the same.
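The config-class pattern can be sketched as follows: a small base class maps each adapter type to its hyperparameters, and each subclass fills that map from the model config. The class names, the adapter-type names, and the `lora_rank` key with its default of 32 are hypothetical illustrations. They do not mirror NeMo's actual fields or constructor arguments.

```python
# Hypothetical sketch of the PEFT config pattern; not NeMo's PEFTConfig API.
class ToyPEFTConfig:
    """Stores one PEFT method's adapter types and their hyperparameters."""

    def __init__(self, name_to_cfg):
        # Maps adapter type -> hyperparameters needed to build that adapter.
        self.name_to_cfg = name_to_cfg

    def adapter_types(self):
        return list(self.name_to_cfg)


class ToyLoraConfig(ToyPEFTConfig):
    def __init__(self, model_cfg):
        super().__init__({
            "lora_qkv": {
                "dim": model_cfg["hidden_size"],
                # Hypothetical key and default, chosen for illustration only.
                "rank": model_cfg.get("lora_rank", 32),
            }
        })


class ToyIA3Config(ToyPEFTConfig):
    def __init__(self, model_cfg):
        hidden = model_cfg["hidden_size"]
        super().__init__({
            "ia3_key": {"dim": hidden},
            "ia3_value": {"dim": hidden},
            "ia3_mlp": {"dim": model_cfg["ffn_hidden_size"]},
        })
```

Replacing `ToyLoraConfig(model_cfg)` with `ToyIA3Config(model_cfg)` changes the method without touching anything else, which is the property the config classes are meant to provide.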

Because the PEFT methods are orthogonal to each other, NeMo also lets you combine them by passing a list of PEFTConfig objects to add_adapter instead of a single one. For example, a common workflow combines P-Tuning with canonical adapters: model.add_adapter([PtuningPEFTConfig(model_cfg), CanonicalAdaptersPEFTConfig(model_cfg)]).
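Accepting either a single config or a list could look roughly like the helper below. It normalizes the input to a list and collects the adapter types to enable, in order. `ToyPEFTConfig` and `collect_adapter_types` are hypothetical. The duplicate check is our own illustration of what "orthogonal" implies, not a documented NeMo behavior.

```python
# Hypothetical sketch; not NeMo's add_adapter implementation.
class ToyPEFTConfig:
    def __init__(self, method, adapter_types):
        self.method = method
        self.adapter_types = list(adapter_types)


def collect_adapter_types(peft_cfgs):
    """Return the adapter types to enable for one config or a list of configs."""
    # Accept a single config as well as a list, as add_adapter is described.
    if not isinstance(peft_cfgs, (list, tuple)):
        peft_cfgs = [peft_cfgs]
    owner = {}  # adapter type -> method that requested it (insertion-ordered)
    for cfg in peft_cfgs:
        for adapter_type in cfg.adapter_types:
            # Orthogonal methods are assumed not to share an adapter slot.
            if adapter_type in owner:
                raise ValueError(
                    f"{adapter_type!r} requested by both {owner[adapter_type]} and {cfg.method}"
                )
            owner[adapter_type] = cfg.method
    return list(owner)
```

With a P-Tuning config and a canonical-adapters config, the helper returns the union of their adapter types. Passing the same method twice is rejected.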

Base model classes

PEFT in NeMo is implemented as a mix-in class that is not tied to any particular model, so the same interface is available across different NeMo models. NeMo currently supports PEFT for GPT-style models such as GPT-3, Nemotron, and Llama 1/2 (MegatronGPTSFTModel), as well as for T5 (MegatronT5SFTModel).
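The mix-in idea can be sketched by defining the PEFT interface once and inheriting it in unrelated model classes. Each host class only reports which adapter paths it has. `ToyPEFTMixin`, the two toy model classes, and their slot names are hypothetical. The slot lists do not describe what the real GPT or T5 models support.

```python
# Hypothetical sketch of a model-agnostic PEFT mix-in; not NeMo classes.
class ToyPEFTMixin:
    """PEFT interface shared by any model that implements adapter_slots()."""

    def add_adapter(self, adapter_types):
        if isinstance(adapter_types, str):
            adapter_types = [adapter_types]
        missing = [t for t in adapter_types if t not in self.adapter_slots()]
        if missing:
            raise ValueError(f"{type(self).__name__} has no adapter path for {missing}")
        self.enabled_adapters = list(adapter_types)
        return self


class ToyDecoderOnlyModel(ToyPEFTMixin):
    def adapter_slots(self):
        return ["attn_lora", "mlp_adapter"]


class ToyEncoderDecoderModel(ToyPEFTMixin):
    def adapter_slots(self):
        return ["enc_attn_lora", "dec_attn_lora", "mlp_adapter"]
```

Both model classes share the exact same `add_adapter` function object, so any change to the PEFT logic applies to every model that uses the mix-in.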

Full fine-tuning vs PEFT

To switch between full fine-tuning and PEFT, add or remove the calls to add_adapter and load_adapters; the rest of the workflow is unchanged.

The code snippet below illustrates the core API of full fine-tuning and PEFT.

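```python
from nemo.collections.nlp.models.language_modeling.megatron_gpt_sft_model import MegatronGPTSFTModel
from nemo.collections.nlp.parts.megatron_trainer_builder import MegatronTrainerBuilder
from nemo.collections.nlp.parts.peft_config import LoraPEFTConfig

# `config` is the OmegaConf fine-tuning config (e.g., loaded by Hydra)
trainer = MegatronTrainerBuilder(config).create_trainer()
restore_path = config.model.restore_from_path  # pretrained base checkpoint
model_cfg = MegatronGPTSFTModel.merge_cfg_with(restore_path, config)

### Training API ###
model = MegatronGPTSFTModel.restore_from(restore_path, override_config_path=model_cfg, trainer=trainer)
peft_cfg = LoraPEFTConfig(model_cfg)  # PEFT only: omit for full fine-tuning
model.add_adapter(peft_cfg)           # PEFT only: omit for full fine-tuning
trainer.fit(model)  # with PEFT, saves adapter weights only

### Inference API ###
# Restore the base model, then load the trained adapter weights on top of it
model = MegatronGPTSFTModel.restore_from(restore_path, override_config_path=model_cfg, trainer=trainer)
model.load_adapters(adapter_save_path, peft_cfg)  # PEFT only; adapter_save_path is the saved adapter checkpoint
model.freeze()
trainer.predict(model)
```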
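In the snippet, training saves only the adapter weights, which are later loaded onto a freshly restored base model. The two helpers below sketch that round trip on flat, state-dict-like dictionaries. The `adapter_layer` key marker, both helper names, and the use of strings in place of tensors are assumptions for illustration. This is not NeMo's checkpointing code.

```python
# Hypothetical sketch of adapter-only checkpointing; not NeMo's implementation.
def extract_adapter_weights(state_dict, marker="adapter_layer"):
    """Return only the adapter entries of a flat state dict."""
    # Frozen base weights are skipped: they are already in the base checkpoint.
    return {name: w for name, w in state_dict.items() if marker in name}


def overlay_adapter_weights(base_state_dict, adapter_weights):
    """Copy saved adapter weights onto a restored base state dict."""
    # Every saved adapter entry must match an adapter slot in the base model.
    unexpected = sorted(set(adapter_weights) - set(base_state_dict))
    if unexpected:
        raise KeyError(f"no matching adapter slots in base model: {unexpected}")
    merged = dict(base_state_dict)
    merged.update(adapter_weights)
    return merged
```

Because the base weights never change during PEFT, the saved checkpoint only needs the adapter entries. This keeps it much smaller than a full model checkpoint.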