Important

NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Please refer to the NeMo 2.0 overview for information on getting started.

Developer Quick Start

The quick start guide provides an overview of a PEFT workflow in NeMo.

Terminology: PEFT vs Adapter

This tutorial uses “PEFT” to mean a Parameter-Efficient Fine-Tuning method and “adapter” to mean the supplementary module injected into a frozen base model. A PEFT method can use one or more types of adapters.

One PEFT method is itself sometimes called “adapters” because it was among the earliest proposed uses of adapter modules in NLP. To tell the two usages apart, we refer to this PEFT method as “canonical adapters.”

How PEFT works in NeMo models

Each PEFT method has one or more types of adapters that need to be injected into the base model. In NeMo models, the adapter logic and adapter weights are already built into the submodules, but they are disabled by default for ordinary training and fine-tuning.

When doing PEFT, the adapter logic path is enabled by calling model.add_adapter(peft_cfg). For each adapter type that applies to the current PEFT method, this function scans the model’s submodules and enables the adapter logic paths it finds.

The base model’s weights are then frozen, while the newly added adapter weights remain unfrozen and are updated during fine-tuning. As a result, only a small fraction of the model’s parameters is trained.
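The enable-then-freeze sequence described above can be sketched with a toy model. This is a minimal illustration in plain Python, not NeMo's implementation: ToySubmodule, ToyModel, the adapter name "lora" and all sizes are hypothetical stand-ins chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """A named weight with a trainable flag (stand-in for a tensor)."""
    name: str
    size: int
    trainable: bool = True


class ToySubmodule:
    """Hypothetical submodule that ships with dormant adapter slots."""

    def __init__(self, name, base_size, supported_adapters):
        self.name = name
        self.base = Param(f"{name}.weight", base_size)
        self.supported_adapters = set(supported_adapters)
        self.adapters = {}  # no adapter is enabled by default

    def enable_adapter(self, adapter_type, size):
        self.adapters[adapter_type] = Param(f"{self.name}.{adapter_type}", size)


class ToyModel:
    def __init__(self, submodules):
        self.submodules = submodules

    def parameters(self):
        for sub in self.submodules:
            yield sub.base
            yield from sub.adapters.values()

    def add_adapter(self, adapter_sizes):
        """adapter_sizes maps an adapter type to its number of weights."""
        # Step 1: enable every adapter path that a submodule supports.
        for adapter_type, size in adapter_sizes.items():
            for sub in self.submodules:
                if adapter_type in sub.supported_adapters:
                    sub.enable_adapter(adapter_type, size)
        # Step 2: freeze the base weights; adapter weights stay trainable.
        for sub in self.submodules:
            sub.base.trainable = False

    def trainable_fraction(self):
        params = list(self.parameters())
        trainable = sum(p.size for p in params if p.trainable)
        return trainable / sum(p.size for p in params)


model = ToyModel([
    ToySubmodule("attention", base_size=1000, supported_adapters={"lora"}),
    ToySubmodule("mlp", base_size=3000, supported_adapters=set()),
])
fraction_before = model.trainable_fraction()
model.add_adapter({"lora": 40})
fraction_after = model.trainable_fraction()
print(fraction_before, round(fraction_after, 4))
```

In this toy setup, only the submodule that declares support for the adapter gains new weights, and those are the only weights left trainable after the call.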

PEFT config classes

Each PEFT method is specified by a PEFTConfig class which stores the types of adapters applicable to the PEFT method, as well as hyperparameters required to initialize these adapter modules.

The following five PEFT methods are currently supported:

  1. LoRA: LoraPEFTConfig

  2. QLoRA: QLoraPEFTConfig

  3. P-Tuning: PtuningPEFTConfig

  4. Adapters (canonical): CanonicalAdaptersPEFTConfig

  5. IA3: IA3PEFTConfig

These config classes make it easy to experiment with different adapters: switching PEFT methods only requires changing the config class.
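To illustrate the idea of a config class that bundles adapter types with their hyperparameters, here is a small sketch. The classes ToyPEFTConfig, ToyLoraConfig and ToyPtuningConfig, the adapter names, and the hyperparameter names and defaults are all hypothetical; they are not the fields of NeMo's config classes.

```python
from dataclasses import dataclass


@dataclass
class ToyPEFTConfig:
    """Hypothetical base class: subclasses declare adapters and hyperparameters."""

    def adapter_specs(self):
        raise NotImplementedError


@dataclass
class ToyLoraConfig(ToyPEFTConfig):
    rank: int = 8        # low-rank dimension (illustrative default)
    alpha: float = 16.0  # scaling numerator (illustrative default)

    def adapter_specs(self):
        # One adapter type, initialized from this method's hyperparameters.
        return {"lora_adapter": {"rank": self.rank, "scale": self.alpha / self.rank}}


@dataclass
class ToyPtuningConfig(ToyPEFTConfig):
    virtual_tokens: int = 10

    def adapter_specs(self):
        return {"ptuning_adapter": {"virtual_tokens": self.virtual_tokens}}


def describe(cfg):
    """Return the adapter types a config would enable."""
    return sorted(cfg.adapter_specs())


lora_adapters = describe(ToyLoraConfig(rank=4))
ptuning_adapters = describe(ToyPtuningConfig())
print(lora_adapters, ptuning_adapters)
```

Because every config exposes the same method, code that consumes a config does not need to know which PEFT method it describes.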

It is also possible to combine PEFT methods in NeMo, since they are orthogonal to each other. To do so, pass a list of PEFTConfig objects to add_adapter instead of a single one. For example, a common workflow is to combine P-Tuning and canonical adapters, which can be done using model.add_adapter([PtuningPEFTConfig(model_cfg), CanonicalAdaptersPEFTConfig(model_cfg)]).
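One way to picture an interface that accepts either a single config or a list is a helper that normalizes its input and merges the adapter specifications. The sketch below is an assumption about how such a helper could look, not NeMo's code; merge_adapter_specs, the adapter names and the hyperparameters are hypothetical.

```python
def merge_adapter_specs(configs):
    """Accept one spec dict or a list of them and merge into one mapping."""
    # Normalize: a single config is treated as a one-element list.
    if not isinstance(configs, (list, tuple)):
        configs = [configs]
    merged = {}
    for spec in configs:
        for name, hyperparams in spec.items():
            # Each adapter type may only be configured once.
            if name in merged:
                raise ValueError(f"adapter {name!r} configured twice")
            merged[name] = hyperparams
    return merged


ptuning = {"ptuning_adapter": {"virtual_tokens": 10}}
canonical = {"adapter_1": {"dim": 32}, "adapter_2": {"dim": 32}}

single = merge_adapter_specs(ptuning)
combined = merge_adapter_specs([ptuning, canonical])
print(sorted(combined))
```

Rejecting duplicate adapter names is a design choice of this sketch; it keeps two configs from silently overwriting each other.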

Base model classes

PEFT in NeMo is built with a mix-in class that does not belong to any model in particular. This means that the same interface is available to different NeMo models. Currently, NeMo supports PEFT for GPT-style models such as GPT 3, Nemotron, LLaMa 1/2 (MegatronGPTSFTModel), as well as T5 (MegatronT5SFTModel).
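The mix-in pattern itself can be shown in a few lines of Python. The class names below are hypothetical and only demonstrate how one mix-in gives unrelated model classes the same add_adapter interface; they do not reflect NeMo's class hierarchy.

```python
class ToyAdapterMixin:
    """Hypothetical mix-in: adds adapter bookkeeping to any model class."""

    def add_adapter(self, adapter_name):
        # Create the list lazily so the mix-in needs no __init__ of its own.
        if not hasattr(self, "enabled_adapters"):
            self.enabled_adapters = []
        self.enabled_adapters.append(adapter_name)


class ToyDecoderModel:
    architecture = "decoder-only"


class ToyEncoderDecoderModel:
    architecture = "encoder-decoder"


# The same mix-in is combined with two unrelated base models.
class ToyDecoderSFT(ToyAdapterMixin, ToyDecoderModel):
    pass


class ToyEncoderDecoderSFT(ToyAdapterMixin, ToyEncoderDecoderModel):
    pass


models = [ToyDecoderSFT(), ToyEncoderDecoderSFT()]
for m in models:
    m.add_adapter("lora")
summary = [(m.architecture, m.enabled_adapters) for m in models]
print(summary)
```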

Full fine-tuning vs PEFT

You can switch from PEFT to full fine-tuning by removing the calls to add_adapter and load_adapters.

The code snippet below illustrates the core API of full fine-tuning and PEFT. Lines prefixed with + are needed only for PEFT.

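from nemo.collections.nlp.models.language_modeling.megatron_gpt_sft_model import MegatronGPTSFTModel
from nemo.collections.nlp.parts.megatron_trainer_builder import MegatronTrainerBuilder
+ from nemo.collections.nlp.parts.peft_config import LoraPEFTConfig

trainer = MegatronTrainerBuilder(config).create_trainer()
model_cfg = MegatronGPTSFTModel.merge_cfg_with(config.model.restore_from_path, config)

### Training API ###
# Restore the base model from a pretrained checkpoint
model = MegatronGPTSFTModel.restore_from(restore_path, model_cfg, trainer=trainer)
+ peft_cfg = LoraPEFTConfig(model_cfg)
+ model.add_adapter(peft_cfg)
trainer.fit(model)  # with PEFT, saves adapter weights only

### Inference API ###
# Restore the base model, then load the adapter weights
model = MegatronGPTSFTModel.restore_from(restore_path, model_cfg, trainer=trainer)
+ model.load_adapters(adapter_save_path, peft_cfg)
model.freeze()
trainer.predict(model)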
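The comment about saving adapter weights only, and the restore-then-load pattern for inference, can be pictured with plain dictionaries. This is a toy sketch under the assumption that weights are stored as a name-to-value mapping; the helper names and key names are hypothetical and say nothing about NeMo's actual checkpoint format.

```python
def adapter_state(state, adapter_keys):
    """Keep only the entries that belong to adapters."""
    return {k: v for k, v in state.items() if k in adapter_keys}


def load_adapter_state(base_state, saved_adapters):
    """Overlay saved adapter weights on a freshly restored base model."""
    merged = dict(base_state)  # copy so the base state is left untouched
    merged.update(saved_adapters)
    return merged


# After fine-tuning: base weights are unchanged, adapter weights were trained.
base_state = {"layer0.weight": [0.1, 0.2], "layer1.weight": [0.3, 0.4]}
trained_state = dict(base_state, **{"layer0.lora": [0.05]})

# Save step: only the adapter entries go into the checkpoint.
checkpoint = adapter_state(trained_state, adapter_keys={"layer0.lora"})

# Inference step: restore the base model, then load the adapter on top.
restored = load_adapter_state(base_state, checkpoint)
print(sorted(checkpoint), restored == trained_state)
```

The adapter checkpoint stays small because the frozen base weights can always be recovered from the original pretrained checkpoint.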