Important
NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Refer to the NeMo 2.0 overview for information on getting started.
Developer Quick Start
The quick start guide provides an overview of a PEFT workflow in NeMo.
Terminology: PEFT vs Adapter
This tutorial uses the term “PEFT” to describe the Parameter-Efficient Fine-Tuning method, and the term “adapter” to refer to the supplementary module injected into a frozen base model. Each PEFT model can use one or more types of adapters.
Among the different PEFT methods, one is sometimes referred to as “adapters” because it was one of the earliest uses of adapter modules in NLP. To distinguish between the two usages, we refer to this PEFT method as the “canonical” adapters.
How PEFT works in NeMo models
Each PEFT method has one or more types of adapters that need to be injected into the base model. In NeMo models, the adapter logic and adapter weights are already built into the submodules, but they are disabled by default for ordinary training and fine-tuning.
When doing PEFT, the adapter logic path is enabled by calling model.add_adapter(peft_cfg). This call goes through each adapter type applicable to the current PEFT method and searches the model's submodules for the adapter logic paths that can be enabled.
The base model's weights are then frozen, while the newly added adapter weights remain trainable and are updated during fine-tuning. As a result, far fewer parameters are fine-tuned than in full fine-tuning.
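To make the enable-then-freeze sequence concrete, here is a minimal pure-Python sketch. It does not use NeMo: ToyLayer and add_adapter are hypothetical stand-ins, and the parameter counts are made up. Each toy layer carries its adapter weights from the start, but they only count as trainable once add_adapter enables the matching path; the base weights are frozen in the same step.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ToyLayer:
    """Toy submodule: adapter weights are built in but disabled by default."""
    name: str
    base_params: int                                               # weights in the base layer
    adapter_params: Dict[str, int] = field(default_factory=dict)   # adapter type -> weight count
    enabled_adapters: Set[str] = field(default_factory=set)        # adapter paths switched on
    base_trainable: bool = True                                    # False once the base is frozen

    def trainable_params(self) -> int:
        total = self.base_params if self.base_trainable else 0
        # Only enabled adapter paths contribute trainable weights.
        total += sum(n for a, n in self.adapter_params.items() if a in self.enabled_adapters)
        return total


def add_adapter(layers: List[ToyLayer], adapter_types: List[str]) -> None:
    """Hypothetical stand-in for model.add_adapter: enable paths, then freeze the base."""
    for layer in layers:
        for adapter in adapter_types:
            if adapter in layer.adapter_params:   # this submodule has a built-in path for it
                layer.enabled_adapters.add(adapter)
        layer.base_trainable = False              # every base weight is frozen


layers = [
    ToyLayer("attention", base_params=4_000_000, adapter_params={"lora": 65_536}),
    ToyLayer("mlp", base_params=16_000_000, adapter_params={"canonical": 32_768}),
]
print(sum(layer.trainable_params() for layer in layers))  # full fine-tuning: 20000000
add_adapter(layers, ["lora"])
print(sum(layer.trainable_params() for layer in layers))  # PEFT (LoRA only): 65536
```

With these made-up numbers the trainable count drops from 20,000,000 to 65,536. The "mlp" layer has no LoRA path, so it contributes nothing once its base weights are frozen.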
PEFT config classes
Each PEFT method is specified by a PEFTConfig
class which stores the
types of adapters applicable to the PEFT method, as well as
hyperparameters required to initialize these adapter modules.
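A config class of this kind could look roughly like the following sketch. ToyPEFTConfig and ToyLoraConfig are hypothetical and only mirror the two responsibilities described above: which adapter types to inject, and the hyperparameters needed to build them. The real PEFTConfig classes take the model config as input and may be structured differently; the rank and alpha values here are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyPEFTConfig:
    """Hypothetical stand-in for a PEFTConfig."""
    # Maps each adapter type this PEFT method injects to the
    # hyperparameters needed to build that adapter module.
    adapter_cfgs: Dict[str, dict] = field(default_factory=dict)

    @property
    def adapter_types(self) -> List[str]:
        return list(self.adapter_cfgs)


@dataclass
class ToyLoraConfig(ToyPEFTConfig):
    rank: int = 32       # placeholder value, not a NeMo default
    alpha: float = 32.0  # placeholder value, not a NeMo default

    def __post_init__(self):
        # A LoRA-style method injects a single adapter type, built from the fields above.
        self.adapter_cfgs = {"lora": {"rank": self.rank, "alpha": self.alpha}}


cfg = ToyLoraConfig(rank=8)
print(cfg.adapter_types)         # ['lora']
print(cfg.adapter_cfgs["lora"])  # {'rank': 8, 'alpha': 32.0}
```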
The following five PEFT methods are currently supported:
LoRA: LoraPEFTConfig
QLoRA: QLoraPEFTConfig
P-Tuning: PtuningPEFTConfig
Adapters (canonical): CanonicalAdaptersPEFTConfig
IA3: IA3PEFTConfig
Because each method is encapsulated in its own config class, experimenting with a different PEFT method only requires swapping the config class.
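The sketch below illustrates why swapping the config class is enough to change methods: the surrounding code stays the same and only the name used to look up the class changes. The registry, the Toy*Config classes, and the lora_rank key are hypothetical, not NeMo's own names, and a plain dict stands in for the model config.

```python
class ToyLoraConfig:
    adapter_types = ("lora",)

    def __init__(self, model_cfg: dict):
        self.rank = model_cfg.get("lora_rank", 32)  # hypothetical key and default


class ToyIA3Config:
    adapter_types = ("ia3_attention", "ia3_mlp")

    def __init__(self, model_cfg: dict):
        self.model_cfg = model_cfg


# Hypothetical name -> class registry: changing the PEFT method is a one-string change.
CONFIG_REGISTRY = {"lora": ToyLoraConfig, "ia3": ToyIA3Config}


def build_peft_cfg(scheme: str, model_cfg: dict):
    if scheme not in CONFIG_REGISTRY:
        raise ValueError(f"unknown PEFT scheme {scheme!r}; choose from {sorted(CONFIG_REGISTRY)}")
    return CONFIG_REGISTRY[scheme](model_cfg)


model_cfg = {"lora_rank": 16}
print(build_peft_cfg("lora", model_cfg).adapter_types)  # ('lora',)
print(build_peft_cfg("ia3", model_cfg).adapter_types)   # ('ia3_attention', 'ia3_mlp')
```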
Because the PEFT methods are orthogonal to each other, NeMo also lets you combine them. To do so, pass a list of PEFTConfig objects to add_adapter instead of a single one. For example, a common workflow is to combine P-Tuning with canonical Adapters:
model.add_adapter([PtuningPEFTConfig(model_cfg), CanonicalAdaptersPEFTConfig(model_cfg)])
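As a rough illustration of what accepting either one config or a list implies, the hypothetical helper below normalizes its input to a list and merges the adapter types of every config. The class names and adapter type strings are made up, and rejecting duplicate adapter types is this sketch's own way of expressing the orthogonality assumption, not documented NeMo behavior.

```python
from typing import List


class ToyPtuning:
    adapter_types = ("ptuning",)


class ToyCanonicalAdapters:
    adapter_types = ("adapter_attn", "adapter_mlp")


def collect_adapter_types(peft_cfgs) -> List[str]:
    """Accept one config or a list of configs and merge their adapter types."""
    if not isinstance(peft_cfgs, (list, tuple)):
        peft_cfgs = [peft_cfgs]  # a single config behaves like a one-item list
    merged: List[str] = []
    for cfg in peft_cfgs:
        for adapter in cfg.adapter_types:
            # Assumption of this sketch: combined methods use disjoint adapter types,
            # so a duplicate is treated as a conflict.
            if adapter in merged:
                raise ValueError(f"adapter type {adapter!r} requested twice")
            merged.append(adapter)
    return merged


print(collect_adapter_types([ToyPtuning(), ToyCanonicalAdapters()]))
# ['ptuning', 'adapter_attn', 'adapter_mlp']
```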
Base model classes
PEFT in NeMo is built with a mix-in class that is not tied to any particular model, so the same interface is available to different NeMo models. Currently, NeMo supports PEFT for GPT-style models such as GPT-3, Nemotron, and Llama 1/2 (MegatronGPTSFTModel), as well as T5 (MegatronT5SFTModel).
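The mix-in idea can be sketched in a few lines of plain Python. ToyPEFTMixin, ToyGPTModel, and ToyT5Model are hypothetical; the only thing the mix-in assumes about its host class is a submodules() method, which is what lets one add_adapter implementation serve unrelated architectures.

```python
class ToyPEFTMixin:
    """Hypothetical model-agnostic mix-in.

    Assumes only that the host class implements
    submodules() -> {submodule name: set of supported adapter types}.
    """

    def add_adapter(self, adapter_type: str) -> list:
        # Lazily create per-instance bookkeeping so host models need no extra __init__ code.
        enabled = self.__dict__.setdefault("_enabled_adapters", {})
        hits = [name for name, supported in self.submodules().items() if adapter_type in supported]
        for name in hits:
            enabled.setdefault(name, set()).add(adapter_type)
        return hits


class ToyGPTModel(ToyPEFTMixin):
    def submodules(self):
        return {"decoder.layer0.attn": {"lora", "ia3"}, "decoder.layer0.mlp": {"adapter"}}


class ToyT5Model(ToyPEFTMixin):
    def submodules(self):
        return {"encoder.layer0.attn": {"lora"}, "decoder.layer0.cross_attn": {"lora"}}


# The same call works on both architectures.
print(ToyGPTModel().add_adapter("lora"))  # ['decoder.layer0.attn']
print(ToyT5Model().add_adapter("lora"))   # ['encoder.layer0.attn', 'decoder.layer0.cross_attn']
```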
Full fine-tuning vs PEFT
You can switch between full fine-tuning and PEFT by removing the calls to add_adapter and load_adapters.
The code snippet below illustrates the core API of full fine-tuning and PEFT. Lines prefixed with + are needed only for PEFT.
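from nemo.collections.nlp.models.language_modeling.megatron_gpt_sft_model import MegatronGPTSFTModel
from nemo.collections.nlp.parts.megatron_trainer_builder import MegatronTrainerBuilder
+ from nemo.collections.nlp.parts.peft_config import LoraPEFTConfig
# config: the OmegaConf config of the fine-tuning run
trainer = MegatronTrainerBuilder(config).create_trainer()
restore_path = config.model.restore_from_path  # pretrained .nemo checkpoint
model_cfg = MegatronGPTSFTModel.merge_cfg_with(restore_path, config)
### Training API ###
# restore from pretrained ckpt; pass the config and trainer by keyword
model = MegatronGPTSFTModel.restore_from(restore_path, override_config_path=model_cfg, trainer=trainer)
+ peft_cfg = LoraPEFTConfig(model_cfg)
+ model.add_adapter(peft_cfg)
trainer.fit(model)  # saves adapter weights only
### Inference API ###
# Restore from base then load adapter API
model = MegatronGPTSFTModel.restore_from(restore_path, override_config_path=model_cfg, trainer=trainer)
+ model.load_adapters(adapter_save_path, peft_cfg)  # adapter_save_path: where the adapter weights were saved
model.freeze()
trainer.predict(model)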
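The comments in the snippet above (adapter-only checkpoints, and restoring the base model before loading adapters) can be illustrated with a toy state dictionary. This is a hypothetical sketch using plain dicts and lists in place of tensors; it is not how NeMo serializes checkpoints, and the parameter names are invented.

```python
from typing import Dict, List, Set

# Toy "state dict": parameter name -> weights (plain lists stand in for tensors).
StateDict = Dict[str, List[float]]


def adapter_state_dict(state: StateDict, trainable: Set[str]) -> StateDict:
    """Keep only the trainable (adapter) entries, i.e. save adapter weights only."""
    return {name: list(w) for name, w in state.items() if name in trainable}


def load_adapters_into(base: StateDict, adapters: StateDict) -> StateDict:
    """Restore from base, then overlay the saved adapter weights."""
    overlap = set(base) & set(adapters)
    if overlap:
        raise ValueError(f"adapter checkpoint would overwrite base weights: {sorted(overlap)}")
    merged = dict(base)
    merged.update(adapters)
    return merged


base = {"layer0.attn.weight": [0.1, 0.2], "layer0.mlp.weight": [0.3]}
# After PEFT training the model holds the frozen base weights plus trainable LoRA weights.
trained = {**base, "layer0.attn.lora_A": [0.01], "layer0.attn.lora_B": [0.02]}
trainable = {"layer0.attn.lora_A", "layer0.attn.lora_B"}

saved = adapter_state_dict(trained, trainable)
print(sorted(saved))                                # ['layer0.attn.lora_A', 'layer0.attn.lora_B']
print(load_adapters_into(base, saved) == trained)   # True
```

Keeping the adapter checkpoint separate means one base checkpoint can be paired with many small adapter files.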