Model Prompt Learning

In the NeMo framework, P-tuning and prompt tuning are collectively known as prompt learning. Both methods are parameter-efficient alternatives to fine-tuning pretrained language models. The NVIDIA NeMo implementation lets you use one pretrained GPT, T5, or mT5 model on many downstream tasks without tuning the model’s full set of parameters. It also lets you add new tasks to your model without overwriting or disrupting tasks for which the model has already been p-tuned or prompt-tuned. Because neither method alters the original model parameters, P-tuning and prompt tuning also avoid the catastrophic forgetting issues often encountered when fine-tuning models.

Instead of selecting discrete text prompts in a manual or automated fashion, P-tuning and prompt tuning use virtual prompt embeddings that are optimized via gradient descent. The only difference between prompt tuning and P-tuning in NeMo-Megatron is the architecture used to tune the soft prompt tokens during training.

For more details about these implementations, see Prompt Learning in the NeMo framework documentation.

The NeMo framework supports the SQuAD v1.1 benchmark for prompt learning. When used with the default prompt learning configuration file, the NVIDIA scripts download and preprocess the original SQuAD v1.1 dataset into the prompt learning dataset format. You can also bring your own task dataset, provided that it has been processed into the prompt learning dataset format.
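
For reference, the prompt learning dataset format consists of JSON Lines (.jsonl) files in which each record contains a taskname field plus the text fields referenced by that task’s prompt template. The record below is only an illustrative sketch for the SQuAD task: the field names follow NeMo’s SQuAD task template and the values are invented, so consult the preprocessed files or the NeMo prompt learning documentation for the authoritative schema.

{"taskname": "squad", "context": "Super Bowl 50 was an American football game played on February 7, 2016.", "question": "Which team won Super Bowl 50?", "answer": "Denver Broncos"}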

You choose the configuration to be used for prompt learning by setting the prompt_learning configuration in conf/config.yaml to point to the desired prompt learning configuration file. You must also include prompt_learning in stages to run the prompt learning pipeline.

To run prompt learning on the SQuAD task, set the prompt_learning configuration to gpt3/squad, which specifies the prompt learning configuration file as conf/prompt_learning/gpt3/squad.yaml. You can also use optimizations such as sequence parallelism from the base GPT model while prompt learning; to enable it, set model.sequence_parallel to True.
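
As an illustration, the relevant entries in conf/config.yaml might look like the excerpt below. This is only a sketch: the shipped defaults list contains many more entries, and the exact contents may differ between launcher versions.

defaults:
  - prompt_learning: gpt3/squad   # selects conf/prompt_learning/gpt3/squad.yaml
stages:
  - prompt_learning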

Common

Set the run configuration to define the job-specific configuration for prompt learning:


run:
  name: ${.task_name}_${.model_train_name}
  time_limit: "04:00:00"
  dependency: "singleton"
  convert_name: convert_nemo
  model_train_name: gpt3_5b
  task_name: "squad"
  results_dir: ${base_results_dir}/${.model_train_name}/prompt_learning_${.task_name}

To specify which language model checkpoint to load and the parallelism settings it was trained with, set the model configuration:


model:
  language_model_path: ${base_results_dir}/${prompt_learning.run.model_train_name}/${prompt_learning.run.convert_name}/megatron_gpt.nemo # Restore language model from pre-trained .nemo checkpoint
  tensor_model_parallel_size: 1
  pipeline_model_parallel_size: 1

Slurm

Set the configuration for a Slurm cluster in conf/cluster/bcm.yaml:


partition: null
account: null
exclusive: True
gpus_per_task: 1
gpus_per_node: null
mem: 0
overcommit: False
job_name_prefix: "nemo-megatron-"

Example

To run only the prompt learning pipeline and not the data preparation, training, conversion, or any other pipeline, set the stages section of conf/config.yaml to:


stages:
  - prompt_learning

Then enter:


python3 main.py

Base Command Platform

To run the prompt learning script on Base Command Platform, set the cluster_type configuration in conf/config.yaml to bcp. You can also override this configuration from the command line using Hydra. This script must be launched in a multi-node job.

To run the prompt learning pipeline to prompt-learn a 5B GPT model converted checkpoint stored in /mount/results/gpt3_5b/convert_nemo, enter:


python3 /opt/NeMo-Megatron-Launcher/launcher_scripts/main.py prompt_learning=gpt3/squad \
    stages=prompt_learning cluster_type=bcp \
    launcher_scripts_path=/opt/NeMo-Megatron-Launcher/launcher_scripts data_dir=/mount/data base_results_dir=/mount/results \
    prompt_learning.run.model_train_name=gpt3_5b \
    prompt_learning.model.language_model_path=/mount/results/gpt3_5b/convert_nemo/results/megatron_gpt.nemo \
    >> /results/prompt_learning_gpt3_log.txt 2>&1

The command above assumes that you mounted the data workspace in /mount/data, and the results workspace in /mount/results. stdout and stderr are redirected to the file /results/prompt_learning_gpt3_log.txt, which you can download from NGC. Any other required configuration may be added to modify the command’s behavior.

For T5 and mT5 models, you likewise choose the configuration to be used for prompt learning by setting the prompt_learning configuration in conf/config.yaml to point to the desired prompt learning configuration file, and you must include prompt_learning in stages to run the prompt learning pipeline.

To run prompt learning on the SQuAD task, set the prompt_learning configuration to t5/squad for a T5 model or mt5/squad for an mT5 model. These values specify the prompt learning configuration file as conf/prompt_learning/t5/squad.yaml or conf/prompt_learning/mt5/squad.yaml, respectively.
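
For example, the corresponding entry in the conf/config.yaml defaults list might look like the following sketch (other entries omitted):

defaults:
  - prompt_learning: t5/squad   # or mt5/squad for an mT5 model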


Common

Set the run configuration to define the job-specific configuration for prompt learning:


run:
  name: ${.task_name}_${.model_train_name}
  time_limit: "04:00:00"
  dependency: "singleton"
  convert_name: convert_nemo
  model_train_name: t5_220m # or mt5_390m
  task_name: "squad"
  results_dir: ${base_results_dir}/${.model_train_name}/prompt_learning_${.task_name}

To specify which language model checkpoint to load and the parallelism settings it was trained with, set the model configuration:


model:
  language_model_path: ${base_results_dir}/${prompt_learning.run.model_train_name}/${prompt_learning.run.convert_name}/megatron_t5.nemo # or megatron_mt5.nemo
  tensor_model_parallel_size: 1
  pipeline_model_parallel_size: 1

Slurm

Set the configuration for a Slurm cluster in conf/cluster/bcm.yaml:


partition: null
account: null
exclusive: True
gpus_per_task: 1
gpus_per_node: null
mem: 0
overcommit: False
job_name_prefix: "nemo-megatron-"

Example

To run only the prompt learning pipeline and not the data preparation, training, conversion, or any other pipeline, set the stages section of conf/config.yaml to:


stages:
  - prompt_learning

Then enter:


python3 main.py

Base Command Platform

To run the prompt learning script on Base Command Platform, set the cluster_type configuration in conf/config.yaml to bcp. You can also override this configuration from the command line using Hydra. This script must be launched in a multi-node job.

To run the prompt learning pipeline to prompt-learn a 220M T5 model converted checkpoint stored in /mount/results/t5_220m/convert_nemo, enter:


python3 /opt/NeMo-Megatron-Launcher/launcher_scripts/main.py prompt_learning=t5/squad \
    stages=prompt_learning cluster_type=bcp \
    launcher_scripts_path=/opt/NeMo-Megatron-Launcher/launcher_scripts data_dir=/mount/data base_results_dir=/mount/results \
    prompt_learning.run.model_train_name=t5_220m \
    prompt_learning.model.language_model_path=/mount/results/t5_220m/convert_nemo/results/megatron_t5.nemo \
    >> /results/prompt_learning_t5_log.txt 2>&1

The command above assumes that you mounted the data workspace in /mount/data, and the results workspace in /mount/results. stdout and stderr are redirected to the file /results/prompt_learning_t5_log.txt, which you can download from NGC. Any other required configuration may be added to modify the command’s behavior.

To run the prompt learning pipeline to prompt-learn a 390M mT5 model converted checkpoint stored in /mount/results/mt5_390m/convert_nemo, run:


python3 /opt/NeMo-Megatron-Launcher/launcher_scripts/main.py prompt_learning=mt5/squad \
    stages=prompt_learning cluster_type=bcp \
    launcher_scripts_path=/opt/NeMo-Megatron-Launcher/launcher_scripts data_dir=/mount/data base_results_dir=/mount/results \
    prompt_learning.run.model_train_name=mt5_390m \
    prompt_learning.model.language_model_path=/mount/results/mt5_390m/convert_nemo/results/megatron_mt5.nemo \
    >> /results/prompt_learning_mt5_log.txt 2>&1

The command above assumes that you mounted the data workspace in /mount/data, and the results workspace in /mount/results. stdout and stderr are redirected to the file /results/prompt_learning_mt5_log.txt, which you can download from NGC. Any other required configuration may be added to modify the command’s behavior.
