Model Adapter Learning and IA3 Learning

The NeMo framework supports Adapter Learning and Infused Adapter by Inhibiting and Amplifying Inner Activations (IA3) learning. Both methods are parameter-efficient alternatives to fine-tuning pretrained language models. The NVIDIA NeMo implementation lets you use a single pretrained GPT or T5 model on many downstream tasks without tuning the model’s full set of parameters. Because neither method modifies the original model parameters, both also avoid the catastrophic forgetting issues often encountered when fine-tuning models.

Unlike P‑tuning and prompt-tuning, Adapter Learning and IA3 do not insert virtual prompts into the input. Adapter Learning introduces feedforward layers within the core transformer architecture that are updated for specific downstream tasks. IA3 adds even fewer parameters: learned vectors that simply rescale the hidden representations in the transformer layer. In both cases, only these new parameters are trained for the downstream task.
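The two parameterizations can be illustrated with a minimal NumPy sketch. This is not the NeMo implementation; the dimensions and weight initializations are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_bottleneck = 8, 2          # illustrative sizes
h = rng.standard_normal(d_model)      # a hidden activation from a transformer sublayer

# Adapter: a small bottleneck feedforward inserted into the transformer,
# with a residual connection. Only W_down and W_up are trained.
W_down = rng.standard_normal((d_bottleneck, d_model)) * 0.1
W_up = rng.standard_normal((d_model, d_bottleneck)) * 0.1

def adapter(h):
    return h + W_up @ np.maximum(W_down @ h, 0.0)   # ReLU bottleneck + residual

# IA3: a single learned vector that rescales the activation elementwise.
# Initialized to ones, so the adapted model starts out identical to the base.
l_ia3 = np.ones(d_model)

def ia3(h):
    return l_ia3 * h

assert np.allclose(ia3(h), h)   # identity at initialization
```

An adapter of bottleneck size `b` adds roughly `2 * d_model * b` parameters per insertion point, while IA3 adds only `d_model` per scaled activation, which is why IA3 is the smaller of the two.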

Note that the paper that introduces IA3 also proposes a recipe called T-Few, which adds an unlikelihood loss function and a continued-training procedure. The NVIDIA IA3 implementation does not support these additions; it focuses only on the core architectural change.

Adapter Learning and IA3 support SQuAD v1.1 benchmarks. With the default configuration files, NVIDIA scripts download and preprocess the original SQuAD v1.1 dataset into the adapter learning and IA3 dataset format (the same format used for prompt learning). You can use your own task dataset as well.
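The prompt-learning-style dataset is a JSONL file: one JSON object per line, tagged with a task name. A sketch of one converted SQuAD record follows; the exact field names ("taskname", "context", "question", "answer") are illustrative, so check the output of the launcher's preprocessing scripts for your version:

```python
import json

# One SQuAD v1.1-style example rewritten as a single JSONL record.
# Field names below are an assumption for illustration.
record = {
    "taskname": "squad",
    "context": "Normandy is a region in France.",
    "question": "In what country is Normandy located?",
    "answer": "France",
}
line = json.dumps(record)   # one such line per training example
```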

Specify the configuration for adapter learning by setting the adapter_learning configuration in conf/config.yaml to point to the adapter learning configuration file. The adapter_learning value must be included in stages to run the adapter learning pipeline.

To run adapter learning on a squad task, set the adapter_learning configuration to gpt3/squad, which specifies the adapter learning configuration file as conf/adapter_learning/gpt3/squad.yaml.
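Concretely, this selection lives in conf/config.yaml. A sketch, assuming the launcher's standard Hydra layout (the surrounding entries may differ across versions):

```yaml
# conf/config.yaml (abridged, illustrative)
defaults:
  - adapter_learning: gpt3/squad   # resolves to conf/adapter_learning/gpt3/squad.yaml
  # ... other defaults ...

stages:
  - adapter_learning
```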

You can configure IA3 learning the same way, by setting the ia3_learning configuration in conf/config.yaml to specify the IA3 learning configuration file. The ia3_learning value must be included in stages to run the IA3 learning pipeline.

To run IA3 learning on a squad task, set the ia3_learning configuration to gpt3/squad, which specifies the IA3 learning configuration file as conf/ia3_learning/gpt3/squad.yaml.

Common

To specify the configuration for adapter learning or IA3 learning, set the run configurations to define the job-specific configuration:

```yaml
run:
  name: ${.task_name}_${.model_train_name}
  time_limit: "04:00:00"
  dependency: "singleton"
  convert_name: convert_nemo
  model_train_name: gpt3_5b
  task_name: "squad"
  results_dir: ${base_results_dir}/${.model_train_name}/adapter_learning_${.task_name}  # or ia3_learning
```

To specify the language model checkpoint to load and its definition, set the model configurations:

```yaml
model:
  language_model_path: ${base_results_dir}/${adapter_learning.run.model_train_name}/${adapter_learning.run.convert_name}/megatron_gpt.nemo  # or ia3_learning
  tensor_model_parallel_size: 1
  pipeline_model_parallel_size: 1
```

Slurm

Set the configuration for a Slurm cluster in conf/cluster/bcm.yaml:

```yaml
partition: null
account: null
exclusive: True
gpus_per_task: 1
gpus_per_node: null
mem: 0
overcommit: False
job_name_prefix: "nemo-megatron-"
```

Example

To run only the adapter learning pipeline and not the data preparation, training, conversion, or another pipeline, set the stages section of conf/config.yaml to:

```yaml
stages:
  - adapter_learning  # or ia3_learning
```

Then enter

```bash
python3 main.py
```


Base Command Platform

To run the adapter learning script on Base Command Platform, set the cluster_type configuration in conf/config.yaml to bcp. This configuration can also be overridden from the command line using hydra. This script must be launched in a multi-node job.

To run the adapter learning pipeline to adapter-learn a 5B GPT model converted checkpoint stored in /mount/results/gpt3_5b/convert_nemo, enter

```bash
python3 /opt/NeMo-Megatron-Launcher/launcher_scripts/main.py \
    adapter_learning=gpt3/squad \
    stages=adapter_learning \
    cluster_type=bcp \
    launcher_scripts_path=/opt/NeMo-Megatron-Launcher/launcher_scripts \
    data_dir=/mount/data \
    base_results_dir=/mount/results \
    adapter_learning.run.model_train_name=gpt3_5b \
    adapter_learning.model.language_model_path=/mount/results/gpt3_5b/convert_nemo/results/megatron_gpt.nemo \
    >> /results/adapter_learning_gpt3_log.txt 2>&1
```

The command above assumes that you mounted the data workspace in /mount/data, and the results workspace in /mount/results. stdout and stderr are redirected to the file /results/adapter_learning_gpt3_log.txt, which you can download from NGC. Any other required configuration may be added to modify the command’s behavior.

To run the IA3 learning pipeline to IA3-learn a 5B GPT model converted checkpoint stored in /mount/results/gpt3_5b/convert_nemo, enter

```bash
python3 /opt/NeMo-Megatron-Launcher/launcher_scripts/main.py \
    ia3_learning=gpt3/squad \
    stages=ia3_learning \
    cluster_type=bcp \
    launcher_scripts_path=/opt/NeMo-Megatron-Launcher/launcher_scripts \
    data_dir=/mount/data \
    base_results_dir=/mount/results \
    ia3_learning.run.model_train_name=gpt3_5b \
    ia3_learning.model.language_model_path=/mount/results/gpt3_5b/convert_nemo/results/megatron_gpt.nemo \
    >> /results/ia3_learning_gpt3_log.txt 2>&1
```

The command above assumes that you mounted the data workspace in /mount/data, and the results workspace in /mount/results. stdout and stderr are redirected to the file /results/ia3_learning_gpt3_log.txt, which you can download from NGC. Any other required configuration may be added to modify the command’s behavior.

You must define the configuration used for adapter learning by setting the adapter_learning configuration in conf/config.yaml to specify the adapter learning configuration file to be used. You must include the adapter_learning value in stages to run the adapter learning pipeline.

To perform adapter learning on a squad task for T5 models, set the adapter_learning configuration to t5/squad, which specifies the adapter learning configuration file as conf/adapter_learning/t5/squad.yaml.

You can define the configuration file used for IA3 learning in the same way by setting the ia3_learning configuration in conf/config.yaml to specify the IA3 learning configuration file. The ia3_learning value must be included in stages to run the IA3 learning pipeline.

To perform IA3 learning on a squad task for T5 models, set the ia3_learning configuration to t5/squad, which specifies the IA3 learning configuration file as conf/ia3_learning/t5/squad.yaml.

Common

To specify the configuration for adapter learning or IA3 learning, set the run configurations to define the job-specific configuration:

```yaml
run:
  name: ${.task_name}_${.model_train_name}
  time_limit: "04:00:00"
  dependency: "singleton"
  convert_name: convert_nemo
  model_train_name: t5_220m
  task_name: "squad"
  results_dir: ${base_results_dir}/${.model_train_name}/adapter_learning_${.task_name}  # or ia3_learning
```

To specify the language model checkpoint to load and its definition, set the model configurations:

```yaml
model:
  language_model_path: ${base_results_dir}/${adapter_learning.run.model_train_name}/${adapter_learning.run.convert_name}/megatron_t5.nemo  # or ia3_learning
  tensor_model_parallel_size: 1
  pipeline_model_parallel_size: 1
```

Slurm

Set the configuration for a Slurm cluster in conf/cluster/bcm.yaml:

```yaml
partition: null
account: null
exclusive: True
gpus_per_task: 1
gpus_per_node: null
mem: 0
overcommit: False
job_name_prefix: "nemo-megatron-"
```

Example

To run only the adapter learning pipeline and not the data preparation, training, conversion, or another pipeline, set the stages section of conf/config.yaml to:

```yaml
stages:
  - adapter_learning  # or ia3_learning
```

Then enter

```bash
python3 main.py
```


Base Command Platform

To run the adapter learning script on Base Command Platform, set the cluster_type configuration in conf/config.yaml to bcp. This configuration can be overridden from the command line using hydra. This script must be launched in a multi-node job.

To run the adapter learning pipeline to adapter-learn a 220M T5 model converted checkpoint stored in /mount/results/t5_220m/convert_nemo, enter

```bash
python3 /opt/NeMo-Megatron-Launcher/launcher_scripts/main.py \
    adapter_learning=t5/squad \
    stages=adapter_learning \
    cluster_type=bcp \
    launcher_scripts_path=/opt/NeMo-Megatron-Launcher/launcher_scripts \
    data_dir=/mount/data \
    base_results_dir=/mount/results \
    adapter_learning.run.model_train_name=t5_220m \
    adapter_learning.model.language_model_path=/mount/results/t5_220m/convert_nemo/results/megatron_t5.nemo \
    >> /results/adapter_learning_t5_log.txt 2>&1
```

The command above assumes that you mounted the data workspace in /mount/data, and the results workspace in /mount/results. stdout and stderr are redirected to the file /results/adapter_learning_t5_log.txt, which you can download from NGC. Any other required configuration may be added to modify the command’s behavior.

To run the IA3 learning pipeline to IA3-learn a 220M T5 model converted checkpoint stored in /mount/results/t5_220m/convert_nemo, enter

```bash
python3 /opt/NeMo-Megatron-Launcher/launcher_scripts/main.py \
    ia3_learning=t5/squad \
    stages=ia3_learning \
    cluster_type=bcp \
    launcher_scripts_path=/opt/NeMo-Megatron-Launcher/launcher_scripts \
    data_dir=/mount/data \
    base_results_dir=/mount/results \
    ia3_learning.run.model_train_name=t5_220m \
    ia3_learning.model.language_model_path=/mount/results/t5_220m/convert_nemo/results/megatron_t5.nemo \
    >> /results/ia3_learning_t5_log.txt 2>&1
```

The command above assumes that you mounted the data workspace in /mount/data, and the results workspace in /mount/results. stdout and stderr are redirected to the file /results/ia3_learning_t5_log.txt, which you can download from NGC. Any other required configuration may be added to modify the command’s behavior.

© Copyright 2023-2024, NVIDIA. Last updated on Feb 22, 2024.