Important

NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Please refer to the Migration Guide for information on getting started.

Parameter Efficient Fine-Tuning (PEFT)

Run PEFT Training with NeMo Framework Launcher

The PEFT stage can execute various PEFT methods, such as P-Tuning, LoRA, Adapters, and IA3, within a single stage by configuring different PEFT schemes. This functionality is implemented with the adapter_mixins framework, which ensures a consistent style across methods. Additionally, mix-and-match PEFT schemes such as adapter_and_ptuning can easily be extended to combinations like ia3_and_ptuning or lora_and_ptuning.
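
For example, the PEFT method is selected through the peft.model.peft.peft_scheme setting used in the commands later in this section. The following is a minimal sketch of switching schemes from the launcher command line; the nemotron/squad config name matches the later examples, and the exact scheme strings accepted by your launcher version should be verified:

    # Select LoRA instead of P-Tuning for the same job config
    python3 main.py \
        peft=nemotron/squad \
        stages=["peft"] \
        peft.model.peft.peft_scheme="lora"

    # Mix-and-match scheme combining adapters with P-Tuning
    python3 main.py \
        peft=nemotron/squad \
        stages=["peft"] \
        peft.model.peft.peft_scheme="adapter_and_ptuning"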

Note

The feature that allowed P-Tuning to insert prompt tokens anywhere in the input is no longer necessary and has been removed to simplify the process.

Run P-Tuning on a Common Cluster

  1. To configure a P-Tuning (or LoRA, Adapter, or IA3) job, include all the run parameters that define the job-specific config:

    run:
      name: ${.task_name}_${.model_train_name}
      time_limit: "04:00:00"
      dependency: "singleton"
      convert_name: convert_nemo
      model_train_name: nemotron
      task_name: "squad"
      results_dir: ${base_results_dir}/${.model_train_name}/ptuning_${.task_name}
    
  2. Use the model parameter to specify which language model checkpoint to load and how it is defined. For Nemotron 340B, use the following model parallelism settings for PEFT:

    model:
      language_model_path: ${base_results_dir}/${peft.run.model_train_name}/${peft.run.convert_name}/nemotron.nemo
      tensor_model_parallel_size: 8
      pipeline_model_parallel_size: 3
    

Run P-Tuning on a Slurm Cluster

  1. Set the configuration for a Slurm cluster in the conf/cluster/bcm.yaml file:

    partition: null
    account: null
    exclusive: True
    gpus_per_task: null
    gpus_per_node: 8
    mem: 0
    overcommit: False
    job_name_prefix: "nemo-megatron-"
    
  2. To run only the PEFT pipeline, excluding the data preparation, training, conversion, and inference pipelines, set the stages list in the conf/config.yaml file to:

    stages:
      - peft
    
  3. Next, run the following Python script:

    python3 main.py \
        peft=nemotron/squad \
        stages=["peft"] \
        peft.model.peft.peft_scheme="ptuning" \
        peft.model.megatron_amp_O2=True \
        peft.model.restore_from_path=${LANGUAGE_MODEL_PATH} \
        peft.exp_manager.exp_dir=${BASE_RESULTS_DIR}/${RUN_NAME}/ptuning
    

Run P-Tuning on the Base Command Platform

To run the P-Tuning learning script on the Base Command Platform, set the cluster_type parameter in the conf/config.yaml file to either bcp or interactive. Alternatively, you can override this setting directly from the command line using Hydra.
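
For example, here is a minimal sketch of the command-line override with Hydra; the config name and launcher path are taken from the full command below:

python3 /opt/NeMo-Framework-Launcher/launcher_scripts/main.py \
        peft=nemotron/squad \
        stages=[peft] \
        cluster_type=bcp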

To run the P-Tuning pipeline on a converted Nemotron model checkpoint, use the following command:

export HYDRA_FULL_ERROR=1
export TORCH_CPP_LOG_LEVEL=INFO NCCL_DEBUG=INFO

TRAIN="[/mount/workspace/databricks-dolly-15k-train.jsonl]"
VALID="[/mount/workspace/databricks-dolly-15k-val.jsonl]"
VALID_NAMES="[peft-squad]"
CONCAT_SAMPLING_PROBS="[1]"

PEFT_SCHEME="ptuning"
PEFT_EXP_DIR="/results/nemo_launcher/ptuning"
LOG_DIR="/results/nemo_launcher/ptuning_log"

TP_SIZE=8
PP_SIZE=3

python3 /opt/NeMo-Framework-Launcher/launcher_scripts/main.py \
        peft=nemotron/squad \
        stages=[peft] \
        cluster_type=interactive \
        launcher_scripts_path=/opt/NeMo-Framework-Launcher/launcher_scripts \
        peft.model.peft.peft_scheme=${PEFT_SCHEME} \
        peft.trainer.precision=bf16 \
        peft.trainer.max_steps=100 \
        peft.trainer.devices=2 \
        peft.trainer.val_check_interval=10 \
        peft.model.megatron_amp_O2=True \
        peft.model.restore_from_path=/mount/workspace/nemotron.nemo \
        peft.model.tensor_model_parallel_size=${TP_SIZE} \
        peft.model.pipeline_model_parallel_size=${PP_SIZE} \
        peft.model.optim.lr=1e-4 \
        peft.model.answer_only_loss=True \
        peft.model.data.train_ds.file_names=${TRAIN} \
        peft.model.data.train_ds.micro_batch_size=1 \
        peft.model.data.train_ds.global_batch_size=32 \
        peft.model.data.train_ds.concat_sampling_probabilities=${CONCAT_SAMPLING_PROBS} \
        peft.model.data.validation_ds.micro_batch_size=1 \
        peft.model.data.validation_ds.global_batch_size=32 \
        peft.model.data.validation_ds.file_names=${VALID} \
        peft.model.data.validation_ds.names=${VALID_NAMES} \
        peft.model.data.test_ds.micro_batch_size=1 \
        peft.model.data.test_ds.global_batch_size=128 \
        peft.model.data.train_ds.num_workers=0 \
        peft.model.data.validation_ds.num_workers=0 \
        peft.model.data.test_ds.num_workers=0 \
        peft.model.data.validation_ds.metric.name=loss \
        peft.model.data.test_ds.metric.name=loss \
        peft.exp_manager.exp_dir=${PEFT_EXP_DIR} \
        peft.exp_manager.explicit_log_dir=${LOG_DIR} \
        peft.exp_manager.resume_if_exists=True \
        peft.exp_manager.resume_ignore_no_checkpoint=True \
        peft.exp_manager.create_checkpoint_callback=True \
        peft.exp_manager.checkpoint_callback_params.monitor=validation_loss

The above command presumes that you’ve mounted the data workspace at /mount/workspace/ and the results workspace at /results. The sample script uses the databricks-dolly-15k dataset.

For different PEFT jobs, specify different directories for peft.exp_manager.exp_dir. Standard output (stdout) and standard error (stderr) are redirected to /results/nemo_launcher/ptuning_log, so you can download the logs from NVIDIA NGC. You can also add any other parameter to the command to alter its behavior.
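
For instance, here is a sketch of re-running the same launcher command for a LoRA job while keeping its results separate from the P-Tuning run; the directory names are illustrative:

PEFT_SCHEME="lora"
PEFT_EXP_DIR="/results/nemo_launcher/lora"
LOG_DIR="/results/nemo_launcher/lora_log"

# Re-run the launcher command above with these values so that
#   peft.model.peft.peft_scheme=${PEFT_SCHEME}
#   peft.exp_manager.exp_dir=${PEFT_EXP_DIR}
#   peft.exp_manager.explicit_log_dir=${LOG_DIR}
# point to the new, job-specific directories.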