Important
NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Refer to the Migration Guide for information on getting started.
Parameter Efficient Fine-Tuning (PEFT)
Run PEFT Training with NeMo Framework Launcher
The PEFT stage can execute various PEFT methods, such as P-Tuning, LoRA, Adapters, and IA3, within a single stage by configuring different PEFT schemes. This functionality is implemented with the adapter_mixins framework, which ensures a consistent implementation style. Additionally, the mix-and-match PEFT scheme adapter_and_ptuning can easily be extended to combinations such as ia3_and_ptuning or lora_and_ptuning.
Note
The feature that allowed P-Tuning to insert prompt tokens anywhere in the input is no longer necessary and has been removed to simplify the process.
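As an illustrative sketch (not a prescribed workflow), the PEFT scheme is selected through the peft_scheme setting, which can also be overridden on the launch command line; the nemotron/squad config and the main.py invocation below mirror the examples later in this section:

# Sketch: choose the PEFT scheme via a Hydra override; valid values include
# "ptuning", "lora", "adapter", "ia3", or a combination such as "adapter_and_ptuning".
python3 main.py \
stages=["peft"] \
peft=nemotron/squad \
peft.model.peft.peft_scheme="lora"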
Run P-Tuning on a Common Cluster
To specify the configuration for P-Tuning (or for LoRA, Adapter, or IA3 learning), include all of the run parameters that define the job-specific config:
run:
  name: ${.task_name}_${.model_train_name}
  time_limit: "04:00:00"
  dependency: "singleton"
  convert_name: convert_nemo
  model_train_name: nemotron
  task_name: "squad"
  results_dir: ${base_results_dir}/${.model_train_name}/ptuning_${.task_name}
To specify which language model checkpoint to load and its definition, use the model parameter. For Nemotron 340B, use the following model parallel settings for PEFT:
model:
  language_model_path: ${base_results_dir}/${peft.run.model_train_name}/${peft.run.convert_name}/nemotron.nemo
  tensor_model_parallel_size: 8
  pipeline_model_parallel_size: 3
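With these settings, each model replica spans 8 x 3 = 24 GPUs, so the total number of GPUs assigned to the job must be a multiple of 24. The following shell snippet is only a sanity-check sketch (it is not part of the launcher), and the node and GPU counts in it are illustrative assumptions:

# Sanity check (illustrative): total GPUs must be divisible by
# tensor_model_parallel_size * pipeline_model_parallel_size.
TP_SIZE=8
PP_SIZE=3
GPUS_PER_NODE=8        # GPUs available per node (assumption)
NUM_NODES=3            # hypothetical allocation
TOTAL_GPUS=$(( GPUS_PER_NODE * NUM_NODES ))
MODEL_PARALLEL_SIZE=$(( TP_SIZE * PP_SIZE ))
if (( TOTAL_GPUS % MODEL_PARALLEL_SIZE != 0 )); then
    echo "WARNING: ${TOTAL_GPUS} GPUs is not a multiple of TP*PP=${MODEL_PARALLEL_SIZE}" >&2
fi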
Run P-Tuning on a Slurm Cluster
Set the configuration for a Slurm cluster in the conf/cluster/bcm.yaml file:
partition: null
account: null
exclusive: True
gpus_per_task: null
gpus_per_node: 8
mem: 0
overcommit: False
job_name_prefix: "nemo-megatron-"
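The partition and account values are site-specific. As a sketch, they can be filled in by editing conf/cluster/bcm.yaml or overridden at launch time with Hydra; the partition and account names below are placeholders:

# Sketch: override site-specific Slurm settings from the command line
# ("batch" and "my_slurm_account" are placeholder values).
python3 main.py \
stages=["peft"] \
peft=nemotron/squad \
cluster=bcm \
cluster.partition=batch \
cluster.account=my_slurm_account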
To run only the PEFT pipeline, while excluding the data preparation, training, conversion, and inference pipelines, set the conf/config.yaml file to:
stages:
  - peft
Next, run the following Python script:
python3 main.py \
peft=nemotron/squad \
stages=["peft"] \
peft.model.peft.peft_scheme="ptuning" \
peft.model.megatron_amp_O2=True \
peft.model.restore_from_path=${LANGUAGE_MODEL_PATH} \
peft.exp_manager.exp_dir=${BASE_RESULTS_DIR}/${RUN_NAME}/ptuning
Run P-Tuning on the Base Command Platform
To run the P-Tuning learning script on the Base Command Platform, set the cluster_type parameter in the conf/config.yaml file to either bcp or interactive. Alternatively, you can override this setting directly from the command line using hydra.
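For example, the override can be passed directly on the launch command; this is only a brief sketch, and the complete BCP command with all of the overrides used in practice follows below:

# Sketch: set cluster_type at launch time instead of editing conf/config.yaml.
python3 /opt/NeMo-Framework-Launcher/launcher_scripts/main.py \
cluster_type=bcp \
stages=[peft] \
peft=nemotron/squad \
launcher_scripts_path=/opt/NeMo-Framework-Launcher/launcher_scripts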
To run the P-Tuning pipeline on a converted Nemotron checkpoint, use the following command:
export HYDRA_FULL_ERROR=1
export TORCH_CPP_LOG_LEVEL=INFO NCCL_DEBUG=INFO
TRAIN="[/mount/workspace/databricks-dolly-15k-train.jsonl]"
VALID="[/mount/workspace/databricks-dolly-15k-val.jsonl]"
VALID_NAMES="[peft-squad]"
CONCAT_SAMPLING_PROBS="[1]"
PEFT_SCHEME="ptuning"
PEFT_EXP_DIR="/results/nemo_launcher/ptuning"
LOG_DIR="/results/nemo_launcher/ptuning_log"
TP_SIZE=8
PP_SIZE=3
python3 /opt/NeMo-Framework-Launcher/launcher_scripts/main.py \
peft=nemotron/squad \
stages=[peft] \
cluster_type=interactive \
launcher_scripts_path=/opt/NeMo-Framework-Launcher/launcher_scripts \
peft.model.peft.peft_scheme=${PEFT_SCHEME} \
peft.trainer.precision=bf16 \
peft.trainer.max_steps=100 \
peft.trainer.devices=2 \
peft.trainer.val_check_interval=10 \
peft.model.megatron_amp_O2=True \
peft.model.restore_from_path=/mount/workspace/nemotron.nemo \
peft.model.tensor_model_parallel_size=${TP_SIZE} \
peft.model.pipeline_model_parallel_size=${PP_SIZE} \
peft.model.optim.lr=1e-4 \
peft.model.answer_only_loss=True \
peft.model.data.train_ds.file_names=${TRAIN} \
peft.model.data.train_ds.micro_batch_size=1 \
peft.model.data.train_ds.global_batch_size=32 \
peft.model.data.train_ds.concat_sampling_probabilities=${CONCAT_SAMPLING_PROBS} \
peft.model.data.validation_ds.micro_batch_size=1 \
peft.model.data.validation_ds.global_batch_size=32 \
peft.model.data.validation_ds.file_names=${VALID} \
peft.model.data.validation_ds.names=${VALID_NAMES} \
peft.model.data.test_ds.micro_batch_size=1 \
peft.model.data.test_ds.global_batch_size=128 \
peft.model.data.train_ds.num_workers=0 \
peft.model.data.validation_ds.num_workers=0 \
peft.model.data.test_ds.num_workers=0 \
peft.model.data.validation_ds.metric.name=loss \
peft.model.data.test_ds.metric.name=loss \
peft.exp_manager.exp_dir=${PEFT_EXP_DIR} \
peft.exp_manager.explicit_log_dir=${LOG_DIR} \
peft.exp_manager.resume_if_exists=True \
peft.exp_manager.resume_ignore_no_checkpoint=True \
peft.exp_manager.create_checkpoint_callback=True \
peft.exp_manager.checkpoint_callback_params.monitor=validation_loss
The above command presumes that you've mounted the data workspace at /mount/workspace/ and the results workspace at /results. The sample script uses the databricks-dolly-15k dataset.

For different PEFT jobs, you need to specify different directories for peft.exp_manager.exp_dir. The standard output (stdout) and standard error (stderr) are redirected to /results/nemo_launcher/ptuning_log, enabling you to download the logs from NVIDIA NGC. You can also add any other parameter to the command to alter its functionality.
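For instance, a follow-up LoRA job on the same checkpoint could reuse the command above with only the scheme and output locations changed; the directory names below are illustrative:

# Sketch: give each PEFT job its own scheme and output directories so runs
# do not overwrite one another (paths are illustrative).
PEFT_SCHEME="lora"
PEFT_EXP_DIR="/results/nemo_launcher/lora"
LOG_DIR="/results/nemo_launcher/lora_log"
# Then re-run the launch command above; all other overrides can stay the same.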