Important

NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Refer to the NeMo 2.0 overview for information on getting started.

Parameter Efficient Fine-Tuning (PEFT)

Run PEFT with NeMo Launcher

To run PEFT, update conf/config.yaml:

defaults:
  - peft: falcon/squad

stages:
  - peft

Execute the launcher pipeline: python3 main.py.
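Because the launcher is Hydra-based, the same settings can also be supplied as command-line overrides instead of editing conf/config.yaml. A sketch (exact override syntax may vary by launcher version):

```shell
python3 main.py peft=falcon/squad stages=[peft]
```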

Configuration

Default configurations for PEFT with squad can be found in conf/peft/falcon/squad.yaml. The fine-tuning configuration is divided into four sections: run, trainer, exp_manager, and model.

run:
  name: ${.task_name}_${.model_train_name}
  time_limit: "04:00:00"
  dependency: "singleton"
  convert_name: convert_nemo
  model_train_name: falcon_7b
  convert_dir: ${base_results_dir}/${peft.run.model_train_name}/${peft.run.convert_name}
  task_name: "squad"
  results_dir: ${base_results_dir}/${.model_train_name}/peft_${.task_name}

Set the number of nodes and devices for fine-tuning:

trainer:
  num_nodes: 1
  devices: 8
model:
  restore_from_path: ${peft.run.convert_dir}/results/megatron_falcon.nemo

restore_from_path sets the path to the .nemo checkpoint on which fine-tuning is run.

Set the tensor parallel and pipeline parallel sizes for different model sizes.

For 40B PEFT:

model:
    tensor_model_parallel_size: 8
    pipeline_model_parallel_size: 1

For 180B PEFT:

model:
    tensor_model_parallel_size: 8
    pipeline_model_parallel_size: 2
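The parallelism settings interact with trainer.num_nodes and trainer.devices: the product of the tensor and pipeline parallel sizes must evenly divide the total GPU count, and the remaining factor becomes the data parallel size. A minimal sketch of this arithmetic (the helper name is hypothetical, not part of NeMo):

```python
def data_parallel_size(num_nodes: int, devices: int,
                       tensor_model_parallel_size: int,
                       pipeline_model_parallel_size: int) -> int:
    """Compute the data parallel size implied by the config values above."""
    world_size = num_nodes * devices
    model_parallel = tensor_model_parallel_size * pipeline_model_parallel_size
    if world_size % model_parallel != 0:
        raise ValueError("Total GPU count must be divisible by TP * PP")
    return world_size // model_parallel

# 180B example: 2 nodes x 8 GPUs with TP=8, PP=2 leaves a data parallel size of 1
print(data_parallel_size(num_nodes=2, devices=8,
                         tensor_model_parallel_size=8,
                         pipeline_model_parallel_size=2))
```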

Set the PEFT-specific configuration:

model:
    peft:
        peft_scheme: "lora"

peft_scheme sets the fine-tuning scheme to use. Supported schemes include lora and ptuning.
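For example, to fine-tune with p-tuning instead of LoRA, switch the scheme in the same block:

```yaml
model:
    peft:
        peft_scheme: "ptuning"
```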