Important

NeMo 2.0 is an experimental feature and is currently released in the dev container only: nvcr.io/nvidia/nemo:dev. Please refer to the NeMo 2.0 overview for information on getting started.

Parameter Efficient Fine-Tuning (PEFT)

The new PEFT framework is built upon the SFT models and datasets, thereby inheriting all the dataset preparation requirements from SFT. For more details, please refer to the SFT section below.
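
For reference, each training and validation file is a JSONL file with one example per line. A minimal sketch of two records, assuming the default "input"/"output" field names referenced by the prompt template used in the example below:

{"input": "What is the capital of France?", "output": "Paris"}
{"input": "Classify the sentiment of this review: great product!", "output": "positive"}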

PEFT Training and Inference

We offer training and inference scripts in NeMo. Below is an example of how to use the training script. The TRAIN_FILEs (and VALIDATION_FILEs) follow the same format as SFT.

Take note of the model.peft.peft_scheme argument. You can train a LoRA, P-tuning, Adapter, or IA3 model by setting this argument to the desired PEFT method. The concat_sampling_probabilities values must sum to 1 and match the number of training files:

python3 /opt/NeMo/examples/nlp/language_modeling/tuning/megatron_gpt_finetuning.py \
  model.restore_from_path=<BASE_GPT_MODEL> \
  model.data.train_ds.num_workers=0 \
  model.data.validation_ds.num_workers=0 \
  model.data.train_ds.file_names=[<TRAIN_FILE1>,<TRAIN_FILE2>,...] \
  model.data.train_ds.concat_sampling_probabilities=[0.3,0.2,...] \
  model.data.validation_ds.file_names=[<VALIDATION_FILE1>,<VALIDATION_FILE2>,...] \
  model.data.train_ds.prompt_template='{input} Answer: {output}' \
  model.peft.peft_scheme='lora' \
  model.answer_only_loss=True
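
Each PEFT method also has its own hyperparameters under model.peft. As a hedged sketch (the exact key names depend on your NeMo version; verify them against megatron_gpt_finetuning_config.yaml), a LoRA run with a larger low-rank dimension or a P-tuning run with more virtual prompt tokens might add overrides such as:

  # LoRA: low-rank adapter dimension (key name assumed; verify in your config)
  model.peft.peft_scheme='lora' \
  model.peft.lora_tuning.adapter_dim=64

  # P-tuning: number of virtual prompt tokens (key name assumed; verify in your config)
  model.peft.peft_scheme='ptuning' \
  model.peft.p_tuning.virtual_tokens=20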

At the end of training, a '.nemo' file is generated containing the parameters of the PEFT model. Similarly, the PEFT framework provides a single inference script:

python3 /opt/NeMo/examples/nlp/language_modeling/tuning/megatron_gpt_generate.py \
  model.restore_from_path=<BASE_GPT_MODEL> \
  model.peft.restore_from_path=<PEFT_MODEL> \
  model.data.test_ds.file_names=[<TEST_FILE>] \
  model.data.test_ds.names=['my_test_set'] \
  model.data.test_ds.tokens_to_generate=30 \
  inference.greedy=True \
  inference.outfile_path=<OUTPUT_FILE>
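
The test dataset arguments accept multiple files. Assuming the same list syntax as the training arguments above, evaluating two test sets in one run would look like:

  model.data.test_ds.file_names=[<TEST_FILE1>,<TEST_FILE2>] \
  model.data.test_ds.names=['my_test_set1','my_test_set2']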

Additionally, NeMo has a notebook which walks through the steps (which these scripts encapsulate) to train and run inference for PEFT models: https://github.com/NVIDIA/NeMo/blob/main/tutorials/nlp/lora.ipynb

PEFT Training with NeMo Megatron Launcher

The PEFT stage can launch any of the supported PEFT methods, including P-tuning, LoRA, Adapters, and IA3, in a single stage by setting the PEFT scheme. It is implemented via the adapter_mixins framework in a unified style. Mix-and-match PEFT schemes such as adapter_and_ptuning can be easily extended to ia3_and_ptuning or lora_and_ptuning.
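
For example, assuming a combined scheme is selected through the same peft_scheme argument used for the single methods in the examples below, a mix-and-match run would set:

    peft.model.peft.peft_scheme="adapter_and_ptuning"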

P-tuning does not need the flexibility to insert prompt tokens anywhere in the input, so this feature has been removed for simplicity.

Common

To specify the configuration for P-tuning (or LoRA, Adapter, or IA3 learning), use the run parameters to define the job-specific config:

run:
  name: ${.task_name}_${.model_train_name}
  time_limit: "04:00:00"
  dependency: "singleton"
  convert_name: convert_nemo
  model_train_name: gpt3_1.3B
  task_name: "squad"
  results_dir: ${base_results_dir}/${.model_train_name}/ptuning_${.task_name}
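
The ${.xxx} entries are OmegaConf interpolations resolved relative to the run block. With the values above, and assuming base_results_dir=/results for illustration, they resolve to:

name: squad_gpt3_1.3B
results_dir: /results/gpt3_1.3B/ptuning_squad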

To specify which language model checkpoint to load and its definition, use the model parameter:

model:
  language_model_path: ${base_results_dir}/${peft.run.model_train_name}/${peft.run.convert_name}/nemo_gpt1.3B_fp16.nemo
  tensor_model_parallel_size: 2
  pipeline_model_parallel_size: 1
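
The product of the tensor and pipeline parallel sizes must divide the number of GPUs used for the job and should typically match how the checkpoint was converted. As a sketch, the equivalent command-line overrides for the block above, paired with two devices as in the Base Command Platform example later in this section, would be:

    peft.model.tensor_model_parallel_size=2 \
    peft.model.pipeline_model_parallel_size=1 \
    peft.trainer.devices=2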

Slurm

Set configuration for a Slurm cluster in the conf/cluster/bcm.yaml file:

partition: null
account: null
exclusive: True
gpus_per_task: null
gpus_per_node: 8
mem: 0
overcommit: False
job_name_prefix: "nemo-megatron-"
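
These values can also be overridden per job from the command line. Assuming the launcher exposes the cluster group to Hydra overrides (check your launcher version), for example:

python3 main.py \
    stages=["peft"] \
    peft=gpt3/squad \
    cluster.partition=<YOUR_PARTITION> \
    cluster.account=<YOUR_SLURM_ACCOUNT> \
    cluster.job_name_prefix="nemo-megatron-"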

Example:

To run only the PEFT pipeline and not the data preparation, training, conversion, or inference pipelines, set the stages section of conf/config.yaml to:

stages:
  - peft

then run:

python3 main.py \
    peft=gpt3/squad \
    stages=["peft"] \
    peft.model.peft.peft_scheme="ptuning" \
    peft.model.megatron_amp_O2=False \
    peft.model.restore_from_path=${LANGUAGE_MODEL_PATH} \
    peft.exp_manager.exp_dir=${BASE_RESULTS_DIR}/${RUN_NAME}/ptuning

Base Command Platform

To run the P-tuning learning script on Base Command Platform, set the cluster_type parameter in conf/config.yaml to bcp or interactive. This can also be overridden from the command line using Hydra.

To run the P-tuning pipeline on the converted nemo-megatron-gpt-1.3B checkpoint, run:

export HYDRA_FULL_ERROR=1
export TORCH_CPP_LOG_LEVEL=INFO NCCL_DEBUG=INFO

TRAIN="[/mount/workspace/databricks-dolly-15k-train.jsonl]"
VALID="[/mount/workspace/databricks-dolly-15k-val.jsonl]"
VALID_NAMES="[peft-squad]"
CONCAT_SAMPLING_PROBS="[1]"

PEFT_SCHEME="ptuning"
PEFT_EXP_DIR="/results/nemo_launcher/ptuning"
LOG_DIR="/results/nemo_launcher/ptuning_log"

TP_SIZE=2
PP_SIZE=1

python3 /opt/NeMo-Framework-Launcher/launcher_scripts/main.py \
        peft=gpt3/squad \
        stages=[peft] \
        cluster_type=interactive \
        launcher_scripts_path=/opt/NeMo-Framework-Launcher/launcher_scripts \
        peft.model.peft.peft_scheme=${PEFT_SCHEME} \
        peft.trainer.precision=bf16 \
        peft.trainer.max_steps=100 \
        peft.trainer.devices=2 \
        peft.trainer.val_check_interval=10 \
        peft.model.megatron_amp_O2=False \
        peft.model.restore_from_path=/mount/workspace/nemo_gpt1.3B_fp16.nemo \
        peft.model.tensor_model_parallel_size=${TP_SIZE} \
        peft.model.pipeline_model_parallel_size=${PP_SIZE} \
        peft.model.optim.lr=5e-6 \
        peft.model.answer_only_loss=True \
        peft.model.data.train_ds.file_names=${TRAIN} \
        peft.model.data.train_ds.micro_batch_size=1 \
        peft.model.data.train_ds.global_batch_size=32 \
        peft.model.data.train_ds.concat_sampling_probabilities=${CONCAT_SAMPLING_PROBS} \
        peft.model.data.validation_ds.micro_batch_size=1 \
        peft.model.data.validation_ds.global_batch_size=32 \
        peft.model.data.validation_ds.file_names=${VALID} \
        peft.model.data.validation_ds.names=${VALID_NAMES} \
        peft.model.data.test_ds.micro_batch_size=1 \
        peft.model.data.test_ds.global_batch_size=128 \
        peft.model.data.train_ds.num_workers=0 \
        peft.model.data.validation_ds.num_workers=0 \
        peft.model.data.test_ds.num_workers=0 \
        peft.model.data.validation_ds.metric.name=loss \
        peft.model.data.test_ds.metric.name=loss \
        peft.exp_manager.exp_dir=${PEFT_EXP_DIR} \
        peft.exp_manager.explicit_log_dir=${LOG_DIR} \
        peft.exp_manager.resume_if_exists=True \
        peft.exp_manager.resume_ignore_no_checkpoint=True \
        peft.exp_manager.create_checkpoint_callback=True \
        peft.exp_manager.checkpoint_callback_params.monitor=validation_loss

The command above assumes that you mounted the data workspace in /mount/workspace/ (the example uses the databricks-dolly-15k dataset) and the results workspace in /results. Set a different peft.exp_manager.exp_dir for each PEFT job. The stdout and stderr outputs are also redirected to /results/nemo_launcher/ptuning_log, so the logs can be downloaded from NGC. Any other parameter can be added to the command to modify its behavior.
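
For example, to launch a LoRA job instead of P-tuning with the same command, only the scheme and the experiment directories need to change (paths shown are illustrative):

PEFT_SCHEME="lora"
PEFT_EXP_DIR="/results/nemo_launcher/lora"
LOG_DIR="/results/nemo_launcher/lora_log"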