Parameter Efficient Fine-Tuning (PEFT)

Run PEFT Training with NeMo Framework Launcher

The PEFT stage can execute various PEFT methods, such as P-Tuning, LoRA, Adapters, and IA3, within a single stage by configuring different PEFT schemes. This functionality is implemented using the adapter_mixins framework, which ensures a consistent style. Additionally, mix-and-match PEFT schemes such as adapter_and_ptuning can easily be extended to combinations like ia3_and_ptuning or lora_and_ptuning.

The feature that allowed P-Tuning to insert prompt tokens anywhere in the input is no longer necessary and has been removed to simplify the process.
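
The PEFT method is selected through the peft_scheme setting used later in this guide. As a minimal sketch (the "lora", "adapter", and "ia3" values are assumptions based on the schemes named above; verify them against your launcher version), the method can be switched with a single Hydra override:

# Minimal sketch: switch the PEFT method via peft_scheme.
# "ptuning" is the value used throughout this guide; the other values are assumptions.
python3 main.py \
    peft=nemotron/squad \
    stages=["peft"] \
    peft.model.peft.peft_scheme="lora"   # or "ptuning", "adapter", "ia3"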

Run P-Tuning on a Common Cluster

  1. To configure P-Tuning (or LoRA, Adapter, or IA3) learning, include all the run parameters needed to define the job-specific config:

run:
  name: ${.task_name}_${.model_train_name}
  time_limit: "04:00:00"
  dependency: "singleton"
  convert_name: convert_nemo
  model_train_name: nemotron
  task_name: "squad"
  results_dir: ${base_results_dir}/${.model_train_name}/ptuning_${.task_name}
  2. Use the model parameter to specify which language model checkpoint to load and how it is defined. For Nemotron 340B, use the following model parallel settings for PEFT:

model:
  language_model_path: ${base_results_dir}/${peft.run.model_train_name}/${peft.run.convert_name}/nemotron.nemo
  tensor_model_parallel_size: 8
  pipeline_model_parallel_size: 3
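
With these settings, a single model replica spans tensor_model_parallel_size × pipeline_model_parallel_size = 8 × 3 = 24 GPUs, so the trainer must be given at least 24 GPUs in total. A minimal sketch of matching trainer overrides to append to the main.py command is shown below; the 3-node × 8-GPU split is an assumption and should be adapted to your cluster:

# Sizing sketch: 8-way tensor parallel x 3-way pipeline parallel = 24 GPUs per replica.
# The node/device split below is an assumption; adjust it to your cluster layout.
peft.trainer.num_nodes=3 \
peft.trainer.devices=8 \
peft.model.tensor_model_parallel_size=8 \
peft.model.pipeline_model_parallel_size=3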

Run P-Tuning on a Slurm Cluster

  1. Set the configuration for a Slurm cluster in the conf/cluster/bcm.yaml file:

partition: null
account: null
exclusive: True
gpus_per_task: null
gpus_per_node: 8
mem: 0
overcommit: False
job_name_prefix: "nemo-megatron-"
  2. To run only the PEFT pipeline, while excluding the data preparation, training, conversion, and inference pipelines, set the stages list in the conf/config.yaml file to:

stages:
  - peft
  3. Next, run the following Python script:

python3 main.py \
    peft=nemotron/squad \
    stages=["peft"] \
    peft.model.peft.peft_scheme="ptuning" \
    peft.model.megatron_amp_O2=True \
    peft.model.restore_from_path=${LANGUAGE_MODEL_PATH} \
    peft.exp_manager.exp_dir=${BASE_RESULTS_DIR}/${RUN_NAME}/ptuning

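If you prefer not to edit conf/cluster/bcm.yaml, the same Slurm settings can usually be overridden directly on the command line with Hydra. The cluster.* keys below mirror the bcm.yaml fields shown above; the partition and account values are placeholders for your site:

python3 main.py \
    peft=nemotron/squad \
    stages=["peft"] \
    cluster.partition=<your_partition> \
    cluster.account=<your_account> \
    peft.model.peft.peft_scheme="ptuning" \
    peft.model.restore_from_path=${LANGUAGE_MODEL_PATH} \
    peft.exp_manager.exp_dir=${BASE_RESULTS_DIR}/${RUN_NAME}/ptuning
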
Run P-Tuning on Base Command Platform

To run the P-Tuning learning script on the Base Command Platform, you need to set the cluster_type parameter in the conf/config.yaml file to either bcp or interactive. Alternatively, you can override this setting directly from the command line using Hydra.
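
For example, in conf/config.yaml:

cluster_type: bcp   # or: interactive

The command below instead overrides this value to interactive on the command line.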

To run the P-Tuning pipeline on a converted Nemotron model checkpoint, use the following command:

export HYDRA_FULL_ERROR=1
export TORCH_CPP_LOG_LEVEL=INFO NCCL_DEBUG=INFO

TRAIN="[/mount/workspace/databricks-dolly-15k-train.jsonl]"
VALID="[/mount/workspace/databricks-dolly-15k-val.jsonl]"
VALID_NAMES="[peft-squad]"
CONCAT_SAMPLING_PROBS="[1]"

PEFT_SCHEME="ptuning"
PEFT_EXP_DIR="/results/nemo_launcher/ptuning"
LOG_DIR="/results/nemo_launcher/ptuning_log"

TP_SIZE=8
PP_SIZE=3

python3 /opt/NeMo-Framework-Launcher/launcher_scripts/main.py \
        peft=nemotron/squad \
        stages=[peft] \
        cluster_type=interactive \
        launcher_scripts_path=/opt/NeMo-Framework-Launcher/launcher_scripts \
        peft.model.peft.peft_scheme=${PEFT_SCHEME} \
        peft.trainer.precision=bf16 \
        peft.trainer.max_steps=100 \
        peft.trainer.devices=2 \
        peft.trainer.val_check_interval=10 \
        peft.model.megatron_amp_O2=True \
        peft.model.restore_from_path=/mount/workspace/nemotron.nemo \
        peft.model.tensor_model_parallel_size=${TP_SIZE} \
        peft.model.pipeline_model_parallel_size=${PP_SIZE} \
        peft.model.optim.lr=1e-4 \
        peft.model.answer_only_loss=True \
        peft.model.data.train_ds.file_names=${TRAIN} \
        peft.model.data.train_ds.micro_batch_size=1 \
        peft.model.data.train_ds.global_batch_size=32 \
        peft.model.data.train_ds.concat_sampling_probabilities=${CONCAT_SAMPLING_PROBS} \
        peft.model.data.validation_ds.micro_batch_size=1 \
        peft.model.data.validation_ds.global_batch_size=32 \
        peft.model.data.validation_ds.file_names=${VALID} \
        peft.model.data.validation_ds.names=${VALID_NAMES} \
        peft.model.data.test_ds.micro_batch_size=1 \
        peft.model.data.test_ds.global_batch_size=128 \
        peft.model.data.train_ds.num_workers=0 \
        peft.model.data.validation_ds.num_workers=0 \
        peft.model.data.test_ds.num_workers=0 \
        peft.model.data.validation_ds.metric.name=loss \
        peft.model.data.test_ds.metric.name=loss \
        peft.exp_manager.exp_dir=${PEFT_EXP_DIR} \
        peft.exp_manager.explicit_log_dir=${LOG_DIR} \
        peft.exp_manager.resume_if_exists=True \
        peft.exp_manager.resume_ignore_no_checkpoint=True \
        peft.exp_manager.create_checkpoint_callback=True \
        peft.exp_manager.checkpoint_callback_params.monitor=validation_loss

The above command presumes that you’ve mounted the data workspace at /mount/workspace/ and the results workspace at /results. The sample script uses the databricks-dolly-15k dataset.

For different PEFT jobs, you need to specify different directories for peft.exp_manager.exp_dir. The standard output (stdout) and standard error (stderr) are redirected to /results/nemo_launcher/ptuning_log, enabling you to download the logs from NVIDIA NGC. You can also add any other parameter to the command to modify its behavior.
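
For example, a follow-up LoRA job on the same checkpoint could reuse the command above with only the PEFT scheme and output directories changed; the directory names below are illustrative:

PEFT_SCHEME="lora"
PEFT_EXP_DIR="/results/nemo_launcher/lora"
LOG_DIR="/results/nemo_launcher/lora_log"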