In the NeMo framework, P-tuning and prompt tuning methods are collectively known as prompt learning. Both methods are parameter-efficient alternatives to fine-tuning pretrained language models. The NVIDIA NeMo implementation lets you use one pretrained GPT, T5, or mT5 model on many downstream tasks without needing to tune the model’s full set of parameters. It also lets you add new tasks to your model without overwriting or disrupting previous tasks for which the model has already been p-tuned or prompt-tuned. Because neither method alters the original model parameters, P-tuning and prompt tuning also avoid the catastrophic forgetting issues often encountered when fine-tuning models.
Instead of selecting discrete text prompts in a manual or automated fashion, P-tuning and prompt tuning utilize virtual prompt embeddings that can be optimized via gradient descent. The only difference between prompt tuning and P-tuning in NeMo-Megatron is the architecture used to tune the soft prompt tokens during training: prompt tuning learns the virtual token embeddings directly, while P-tuning trains a small prompt-encoder network that produces them.
The NVIDIA P-tuning implementation is based on Liu et al.’s paper GPT Understands, Too.
The prompt tuning implementation is based on Lester et al.’s EMNLP 2021 paper The Power of Scale for Parameter-Efficient Prompt Tuning.
For more details about these implementations, see Prompt Learning in the NeMo framework documentation.
The NeMo framework supports the SQuAD v1.1 benchmark for prompt learning. When used with the default prompt learning configuration file, the NVIDIA scripts download and preprocess the original SQuAD v1.1 dataset to prompt learning dataset format. You can also bring your own task dataset, provided that it has been processed into prompt learning dataset format.
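For reference, the prompt learning dataset format is a .jsonl file in which each line is a JSON object whose fields are referenced by the task’s prompt template. A minimal sketch of one record, assuming the default SQuAD field names (taskname, context, question, answer) and placeholder text:
{"taskname": "squad", "context": "Example passage text ...", "question": "Example question about the passage?", "answer": "example answer"}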
GPT Prompt Learning
You specify the configuration to be used for prompt learning by setting the prompt_learning configuration in conf/config.yaml to the desired prompt learning configuration file. You must also include prompt_learning in stages to run the prompt learning pipeline.
To run prompt learning on the squad task, set the prompt_learning configuration to gpt3/squad, which selects the prompt learning configuration file conf/prompt_learning/gpt3/squad.yaml. You can also use optimizations such as sequence parallelism from the base GPT model while prompt learning. To enable it, set model.sequence_parallel to True.
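As a sketch, the lines of conf/config.yaml relevant to this setup would look something like the following (the rest of the defaults list is omitted):
defaults:
  - prompt_learning: gpt3/squad

stages:
  - prompt_learning
Sequence parallelism can then be enabled either by editing conf/prompt_learning/gpt3/squad.yaml or with a command-line override such as prompt_learning.model.sequence_parallel=True.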
Common
Set the run configuration to define the job-specific configuration for prompt learning:
run:
  name: ${.task_name}_${.model_train_name}
  time_limit: "04:00:00"
  dependency: "singleton"
  convert_name: convert_nemo
  model_train_name: gpt3_5b
  task_name: "squad"
  results_dir: ${base_results_dir}/${.model_train_name}/prompt_learning_${.task_name}
To specify which language model checkpoint to load and the model-parallel settings it was trained with, set the model configuration:
model:
  language_model_path: ${base_results_dir}/${prompt_learning.run.model_train_name}/${prompt_learning.run.convert_name}/megatron_gpt.nemo # Restore language model from pre-trained .nemo checkpoint
  tensor_model_parallel_size: 1
  pipeline_model_parallel_size: 1
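The two parallel sizes generally need to match the settings used when the checkpoint was converted. As a hypothetical example, for a converted checkpoint produced with tensor parallelism of 2, you could override them on the command line instead of editing the YAML:
python3 main.py stages=prompt_learning \
  prompt_learning.model.tensor_model_parallel_size=2 \
  prompt_learning.model.pipeline_model_parallel_size=1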
Slurm
Set the configuration for a Slurm cluster in conf/cluster/bcm.yaml:
partition: null
account: null
exclusive: True
gpus_per_task: 1
gpus_per_node: null
mem: 0
overcommit: False
job_name_prefix: "nemo-megatron-"
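For example, on a cluster with a partition named batch, an account named myaccount, and 8 GPUs per node (all hypothetical values used only for illustration), the file could be filled in as:
partition: batch
account: myaccount
exclusive: True
gpus_per_task: 1
gpus_per_node: 8
mem: 0
overcommit: False
job_name_prefix: "nemo-megatron-"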
Example
To run only the prompt learning pipeline and not the data preparation, training, conversion, or another pipeline, set the stages section of conf/config.yaml to:
stages:
  - prompt_learning
Then enter:
python3 main.py
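Because the launcher is driven by Hydra, any configuration can also be overridden on the command line instead of editing the YAML files. A hypothetical example that selects the SQuAD task and points at a custom results directory (the path is a placeholder):
python3 main.py stages=prompt_learning \
  prompt_learning=gpt3/squad \
  prompt_learning.run.model_train_name=gpt3_5b \
  base_results_dir=/path/to/results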
Base Command Platform
To run the prompt learning script on Base Command Platform, set the cluster_type configuration in conf/config.yaml to bcp. This configuration can be overridden from the command line using Hydra. This script must be launched in a multi-node job.
To run the prompt learning pipeline to prompt-learn the converted checkpoint of a 5B GPT model stored in /mount/results/gpt3_5b/convert_nemo, enter:
python3 /opt/NeMo-Megatron-Launcher/launcher_scripts/main.py prompt_learning=gpt3/squad \
stages=prompt_learning cluster_type=bcp \
launcher_scripts_path=/opt/NeMo-Megatron-Launcher/launcher_scripts data_dir=/mount/data base_results_dir=/mount/results \
prompt_learning.run.model_train_name=gpt3_5b \
prompt_learning.model.language_model_path=/mount/results/gpt3_5b/convert_nemo/results/megatron_gpt.nemo \
>> /results/prompt_learning_gpt3_log.txt 2>&1
The command above assumes that you mounted the data workspace in /mount/data and the results workspace in /mount/results. stdout and stderr are redirected to the file /results/prompt_learning_gpt3_log.txt, which you can download from NGC. Any other required configuration may be added to modify the command’s behavior.
T5 and mT5 Prompt Learning
You specify the configuration to be used for prompt learning by setting the prompt_learning configuration in conf/config.yaml to the desired prompt learning configuration file. You must also include prompt_learning in stages to run the prompt learning pipeline.
To run prompt learning on the squad task, set the prompt_learning configuration to t5/squad for a T5 model, or mt5/squad for an mT5 model. These values respectively specify the prompt learning configuration file as conf/prompt_learning/t5/squad.yaml or conf/prompt_learning/mt5/squad.yaml.
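As with GPT, a sketch of the relevant lines in conf/config.yaml (shown here for T5; substitute mt5/squad for an mT5 model):
defaults:
  - prompt_learning: t5/squad

stages:
  - prompt_learning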
Common
Set the run configuration to define the job-specific configuration for prompt learning:
run:
  name: ${.task_name}_${.model_train_name}
  time_limit: "04:00:00"
  dependency: "singleton"
  convert_name: convert_nemo
  model_train_name: t5_220m # or mt5_390m
  task_name: "squad"
  results_dir: ${base_results_dir}/${.model_train_name}/prompt_learning_${.task_name}
To specify which language model checkpoint to load and the model-parallel settings it was trained with, set the model configuration:
model:
  language_model_path: ${base_results_dir}/${prompt_learning.run.model_train_name}/${prompt_learning.run.convert_name}/megatron_t5.nemo # or megatron_mt5.nemo
  tensor_model_parallel_size: 1
  pipeline_model_parallel_size: 1
Slurm
Set the configuration for a Slurm cluster in conf/cluster/bcm.yaml:
partition: null
account: null
exclusive: True
gpus_per_task: 1
gpus_per_node: null
mem: 0
overcommit: False
job_name_prefix: "nemo-megatron-"
Example
To run only the prompt learning pipeline and not the data preparation, training, conversion, or another pipeline, set the stages section of conf/config.yaml to:
stages:
  - prompt_learning
Then enter:
python3 main.py
Base Command Platform
To run the prompt learning script on Base Command Platform, set the cluster_type configuration in conf/config.yaml to bcp. This configuration can be overridden from the command line using Hydra. This script must be launched in a multi-node job.
To run the prompt learning pipeline to prompt-learn the converted checkpoint of a 220M T5 model stored in /mount/results/t5_220m/convert_nemo, enter:
python3 /opt/NeMo-Megatron-Launcher/launcher_scripts/main.py prompt_learning=t5/squad \
stages=prompt_learning cluster_type=bcp \
launcher_scripts_path=/opt/NeMo-Megatron-Launcher/launcher_scripts data_dir=/mount/data base_results_dir=/mount/results \
prompt_learning.run.model_train_name=t5_220m \
prompt_learning.model.language_model_path=/mount/results/t5_220m/convert_nemo/results/megatron_t5.nemo \
>> /results/prompt_learning_t5_log.txt 2>&1
The command above assumes that you mounted the data workspace in /mount/data and the results workspace in /mount/results. stdout and stderr are redirected to the file /results/prompt_learning_t5_log.txt, which you can download from NGC. Any other required configuration may be added to modify the command’s behavior.
To run the prompt learning pipeline to prompt-learn the converted checkpoint of a 390M mT5 model stored in /mount/results/mt5_390m/convert_nemo, enter:
python3 /opt/NeMo-Megatron-Launcher/launcher_scripts/main.py prompt_learning=mt5/squad \
stages=prompt_learning cluster_type=bcp \
launcher_scripts_path=/opt/NeMo-Megatron-Launcher/launcher_scripts data_dir=/mount/data base_results_dir=/mount/results \
prompt_learning.run.model_train_name=mt5_390m \
prompt_learning.model.language_model_path=/mount/results/mt5_390m/convert_nemo/results/megatron_mt5.nemo \
>> /results/prompt_learning_mt5_log.txt 2>&1
The command above assumes that you mounted the data workspace in /mount/data and the results workspace in /mount/results. stdout and stderr are redirected to the file /results/prompt_learning_mt5_log.txt, which you can download from NGC. Any other required configuration may be added to modify the command’s behavior.