Important

NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Please refer to the Migration Guide for information on getting started.

Migrate PEFT Training and Inference from NeMo 1.0 to NeMo 2.0

NeMo 1.0 (Previous Release)

In NeMo 1.0, PEFT is enabled by setting the peft_scheme field under model.peft in the YAML config file:

model:
  peft:
    peft_scheme: "lora"
    restore_from_path: null

In code, the only differences between PEFT and full-parameter fine-tuning are the add_adapter and load_adapters functions.
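
For reference, the following is a condensed sketch of how the stock NeMo 1.0 fine-tuning script (megatron_gpt_finetuning.py) wires those two calls in. The config filename below is a hypothetical placeholder, and your launcher may build the trainer differently:

from omegaconf import OmegaConf
from nemo.collections.nlp.models.language_modeling.megatron_gpt_sft_model import MegatronGPTSFTModel
from nemo.collections.nlp.parts.megatron_trainer_builder import MegatronLMPPTrainerBuilder
from nemo.collections.nlp.parts.peft_config import PEFT_CONFIG_MAP

cfg = OmegaConf.load("megatron_gpt_finetuning_config.yaml")  # hypothetical path to your YAML config
trainer = MegatronLMPPTrainerBuilder(cfg).create_trainer()

# Restore the base model with the merged config, exactly as in full-parameter fine-tuning.
model_cfg = MegatronGPTSFTModel.merge_cfg_with(cfg.model.restore_from_path, cfg)
model = MegatronGPTSFTModel.restore_from(cfg.model.restore_from_path, model_cfg, trainer=trainer)

# The PEFT-specific part: add fresh adapters for training, or load trained ones for inference.
peft_cfg_cls = PEFT_CONFIG_MAP[cfg.model.peft.peft_scheme]  # e.g. "lora" maps to the LoRA PEFT config class
if cfg.model.peft.restore_from_path is None:
    model.add_adapter(peft_cfg_cls(model_cfg))
else:
    model.load_adapters(cfg.model.peft.restore_from_path, peft_cfg_cls(model_cfg))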

NeMo 2.0 (New Release)

In NeMo 2.0, PEFT is enabled by passing the PEFT method to both the trainer (as a callback) and the model (as its model_transform):

lora = llm.peft.LoRA(
  target_modules=['linear_qkv', 'linear_proj'],  # 'linear_fc1' and 'linear_fc2' can also be added
  dim=32,
)

trainer = nl.Trainer(..., callbacks=[lora])
model = llm.LlamaModel(..., model_transform=lora)

When using the finetune API, PEFT is enabled by passing the peft argument. The base model and adapter paths can also be specified:

from nemo.collections import llm

sft = llm.finetune(
    ...
    peft=llm.peft.LoRA(target_modules=['linear_qkv', 'linear_proj'], dim=32),
    ...
)
sft.resume.import_path = "hf://..."
sft.resume.adapter_path = "/path/to/checkpoints/last"

Migration Steps

  1. Remove the peft section from your YAML config file.

  2. Create an instance of the LoRA class (or any other PEFT method) with the configuration you want (the same options as in the peft section of the YAML config).

  3. Pass the LoRA instance to the peft argument of llm.finetune. For inference, also set resume.adapter_path, as shown in the example below.

from nemo.collections import llm

sft = llm.finetune(
    ...
    peft=llm.peft.LoRA(target_modules=['linear_qkv', 'linear_proj'], dim=32),
    ...
)
sft.resume.import_path = "hf://..."
sft.resume.adapter_path = "/path/to/checkpoints/last"

Some Notes on Migration

  • In NeMo 1.0, the four LoRA targets were named ['attention_qkv', 'attention_dense', 'mlp_fc1', 'mlp_fc2']. These names differed from the actual names of the linear layers, which was confusing for some users. In NeMo 2.0, the four LoRA targets are renamed to match the linear layers: ['linear_qkv', 'linear_proj', 'linear_fc1', 'linear_fc2'], as illustrated below.
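
For example, a NeMo 2.0 LoRA configuration that adapts all four linear layers (the equivalent of listing every NeMo 1.0 target) looks like this; dim=32 is simply the value used in the examples above:

from nemo.collections import llm

# NeMo 1.0 name       ->  NeMo 2.0 name
# 'attention_qkv'     ->  'linear_qkv'
# 'attention_dense'   ->  'linear_proj'
# 'mlp_fc1'           ->  'linear_fc1'
# 'mlp_fc2'           ->  'linear_fc2'
lora = llm.peft.LoRA(
    target_modules=['linear_qkv', 'linear_proj', 'linear_fc1', 'linear_fc2'],
    dim=32,
)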