The NeMo framework supports Adapter Learning and Infused Adapter by Inhibiting and Amplifying Inner Activations (IA3) learning. Both methods are parameter-efficient alternatives to fine-tuning pretrained language models. The NVIDIA NeMo implementation lets you use a single pretrained GPT or T5 model on many downstream tasks without tuning the model’s full set of parameters. Because neither method modifies the original model parameters, both avoid the catastrophic forgetting issues often encountered when fine-tuning models.
Unlike P-tuning and prompt tuning, Adapter Learning and IA3 do not insert virtual prompts into the input. Adapter Learning introduces small feedforward layers within the core transformer architecture, and only these layers are updated for a specific downstream task. IA3 adds even fewer parameters: learned vectors that simply scale the hidden representations in the transformer layers. These parameters can likewise be trained for specific downstream tasks.
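To make the two approaches concrete, the following minimal PyTorch-style sketch shows a bottleneck adapter and an IA3-style scaling vector. It is illustrative only, not the NeMo implementation; the class names and bottleneck size are invented for this example.

import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Adapter Learning: a small residual feedforward bottleneck inserted
    into each transformer layer; only these weights are trained."""
    def __init__(self, hidden_size: int, bottleneck_size: int = 32):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.up = nn.Linear(bottleneck_size, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The residual connection preserves the pretrained behavior when
        # the adapter output is small.
        return x + self.up(torch.relu(self.down(x)))

class IA3Scaling(nn.Module):
    """IA3: a learned elementwise vector that rescales hidden activations
    (e.g., attention keys/values and MLP activations)."""
    def __init__(self, hidden_size: int):
        super().__init__()
        # Initialized to ones so the pretrained model is unchanged at first.
        self.scale = nn.Parameter(torch.ones(hidden_size))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.scale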
The NVIDIA implementation of Adapter Learning for GPT3 and T5 is based on Parameter-Efficient Transfer Learning for NLP.
The NVIDIA implementation of IA3 is based on Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning.
Note that the paper proposes a recipe called T-Few, which introduces an unlikelihood loss function and a continued training procedure. The NVIDIA IA3 implementation does not support these additions; it focuses only on the core architectural change.
Adapter Learning and IA3 support the SQuAD v1.1 benchmark. With the default adapter learning and IA3 configuration files, NVIDIA scripts download and preprocess the original SQuAD v1.1 dataset into the adapter learning and IA3 dataset format (the same format used for prompt learning). You can use your own task dataset as well.
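For reference, each example in the prompt-learning-style format is a JSON line; a SQuAD record looks roughly like the following (the exact field names are determined by the preprocessing script, so treat this record as illustrative):

{"taskname": "squad", "context": "Super Bowl 50 was an American football game played on February 7, 2016.", "question": "When was Super Bowl 50 played?", "answer": "February 7, 2016"}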
You specify the configuration for adapter learning by setting the adapter_learning configuration in conf/config.yaml to the adapter learning configuration file to be used. The adapter_learning value must be included in stages to run the adapter learning pipeline.
To run adapter learning on a squad task, set the adapter_learning configuration to gpt3/squad, which specifies the adapter learning configuration file as conf/adapter_learning/gpt3/squad.yaml.
You can configure IA3 learning the same way, by setting the ia3_learning configuration in conf/config.yaml to specify the IA3 learning configuration file. The ia3_learning value must be included in stages to run the IA3 learning pipeline.
To run IA3 learning on a squad task, set the ia3_learning configuration to gpt3/squad, which specifies the IA3 learning configuration file as conf/ia3_learning/gpt3/squad.yaml.
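For example, a trimmed conf/config.yaml selecting the GPT-3 SQuAD configurations might look like the sketch below; your actual file will contain additional defaults and settings:

defaults:
  - adapter_learning: gpt3/squad
  - ia3_learning: gpt3/squad

stages:
  - adapter_learning # or ia3_learning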
Common
To specify the configuration for adapter learning or IA3 learning, set the run configurations to define the job-specific configuration:
run:
  name: ${.task_name}_${.model_train_name}
  time_limit: "04:00:00"
  dependency: "singleton"
  convert_name: convert_nemo
  model_train_name: gpt3_5b
  task_name: "squad"
  results_dir: ${base_results_dir}/${.model_train_name}/adapter_learning_${.task_name} # or ia3_learning
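With the values above, for example, hydra’s relative interpolation resolves name to squad_gpt3_5b and results_dir to ${base_results_dir}/gpt3_5b/adapter_learning_squad.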
To specify the language model checkpoint to load and its definition, set the model configurations:
model:
  language_model_path: ${base_results_dir}/${adapter_learning.run.model_train_name}/${adapter_learning.run.convert_name}/megatron_gpt.nemo # or ia3_learning
  tensor_model_parallel_size: 1
  pipeline_model_parallel_size: 1
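These values can also be overridden from the command line through hydra, for example when the converted checkpoint was saved with a different parallel layout. In the sketch below, the checkpoint path and parallel size are placeholders:

python3 main.py stages=adapter_learning adapter_learning=gpt3/squad \
adapter_learning.model.language_model_path=/path/to/megatron_gpt.nemo \
adapter_learning.model.tensor_model_parallel_size=2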
Slurm
Set the configuration for a Slurm cluster in conf/cluster/bcm.yaml:
partition: null
account: null
exclusive: True
gpus_per_task: 1
gpus_per_node: null
mem: 0
overcommit: False
job_name_prefix: "nemo-megatron-"
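For example, on a cluster with eight GPUs per node, the file might be filled in as follows (the partition and account names are placeholders for your site’s values):

partition: batch
account: my_account
exclusive: True
gpus_per_task: 1
gpus_per_node: 8
mem: 0
overcommit: False
job_name_prefix: "nemo-megatron-"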
Example
To run only the adapter learning pipeline and not the data preparation, training, conversion, or another pipeline, set the stages section of conf/config.yaml to:
stages:
- adapter_learning # or ia3_learning
Then enter
python3 main.py
Base Command Platform
To run the adapter learning script on Base Command Platform, set the cluster_type configuration in conf/config.yaml to bcp. This configuration can also be overridden from the command line using hydra. This script must be launched in a multi-node job.
To run the adapter learning pipeline to adapter-learn a converted 5B GPT model checkpoint stored in /mount/results/gpt3_5b/convert_nemo, enter
python3 /opt/NeMo-Megatron-Launcher/launcher_scripts/main.py adapter_learning=gpt3/squad \
stages=adapter_learning cluster_type=bcp \
launcher_scripts_path=/opt/NeMo-Megatron-Launcher/launcher_scripts data_dir=/mount/data base_results_dir=/mount/results \
adapter_learning.run.model_train_name=gpt3_5b \
adapter_learning.model.language_model_path=/mount/results/gpt3_5b/convert_nemo/results/megatron_gpt.nemo \
>> /results/adapter_learning_gpt3_log.txt 2>&1
The command above assumes that you mounted the data workspace in /mount/data and the results workspace in /mount/results. stdout and stderr are redirected to the file /results/adapter_learning_gpt3_log.txt, which you can download from NGC. Any other required configuration may be added to modify the command’s behavior.
To run the IA3 learning pipeline to IA3-learn a converted 5B GPT model checkpoint stored in /mount/results/gpt3_5b/convert_nemo, enter
python3 /opt/NeMo-Megatron-Launcher/launcher_scripts/main.py ia3_learning=gpt3/squad \
stages=ia3_learning cluster_type=bcp \
launcher_scripts_path=/opt/NeMo-Megatron-Launcher/launcher_scripts data_dir=/mount/data base_results_dir=/mount/results \
ia3_learning.run.model_train_name=gpt3_5b \
ia3_learning.model.language_model_path=/mount/results/gpt3_5b/convert_nemo/results/megatron_gpt.nemo \
>> /results/ia3_learning_gpt3_log.txt 2>&1
The command above assumes that you mounted the data workspace in /mount/data and the results workspace in /mount/results. stdout and stderr are redirected to the file /results/ia3_learning_gpt3_log.txt, which you can download from NGC. Any other required configuration may be added to modify the command’s behavior.
You must define the configuration used for adapter learning by setting the adapter_learning configuration in conf/config.yaml to specify the adapter learning configuration file to be used, and include the adapter_learning value in stages to run the adapter learning pipeline.
To perform adapter learning on a squad task for T5 models, set the adapter_learning configuration to t5/squad, which specifies the adapter learning configuration file as conf/adapter_learning/t5/squad.yaml.
You can define the configuration file used for IA3 learning in the same way, by setting the ia3_learning configuration in conf/config.yaml to specify the IA3 learning configuration file. The ia3_learning value must be included in stages to run the IA3 learning pipeline.
To perform IA3 learning on a squad task for T5 models, set the ia3_learning configuration to t5/squad, which specifies the IA3 learning configuration file as conf/ia3_learning/t5/squad.yaml.
Common
To specify the configuration for adapter learning or IA3 learning, set the run configurations to define the job-specific configuration:
run:
  name: ${.task_name}_${.model_train_name}
  time_limit: "04:00:00"
  dependency: "singleton"
  convert_name: convert_nemo
  model_train_name: t5_220m
  task_name: "squad"
  results_dir: ${base_results_dir}/${.model_train_name}/adapter_learning_${.task_name} # or ia3_learning
To specify the language model checkpoint to load and its definition, set the model configurations:
model:
  language_model_path: ${base_results_dir}/${adapter_learning.run.model_train_name}/${adapter_learning.run.convert_name}/megatron_t5.nemo # or ia3_learning
  tensor_model_parallel_size: 1
  pipeline_model_parallel_size: 1
Slurm
Set the configuration for a Slurm cluster in conf/cluster/bcm.yaml:
partition: null
account: null
exclusive: True
gpus_per_task: 1
gpus_per_node: null
mem: 0
overcommit: False
job_name_prefix: "nemo-megatron-"
Example
To run only the adapter learning pipeline and not the data preparation, training, conversion, or another pipeline, set the stages section of conf/config.yaml to:
stages:
- adapter_learning # or ia3_learning
Then enter
python3 main.py
Base Command Platform
To run the adapter learning script on Base Command Platform, set the cluster_type configuration in conf/config.yaml to bcp. This configuration can be overridden from the command line using hydra. This script must be launched in a multi-node job.
To run the adapter learning pipeline to adapter-learn a converted 220M T5 model checkpoint stored in /mount/results/t5_220m/convert_nemo, enter
python3 /opt/NeMo-Megatron-Launcher/launcher_scripts/main.py adapter_learning=t5/squad \
stages=adapter_learning cluster_type=bcp \
launcher_scripts_path=/opt/NeMo-Megatron-Launcher/launcher_scripts data_dir=/mount/data base_results_dir=/mount/results \
adapter_learning.run.model_train_name=t5_220m \
adapter_learning.model.language_model_path=/mount/results/t5_220m/convert_nemo/results/megatron_t5.nemo \
>> /results/adapter_learning_t5_log.txt 2>&1
The command above assumes that you mounted the data workspace in /mount/data and the results workspace in /mount/results. stdout and stderr are redirected to the file /results/adapter_learning_t5_log.txt, which you can download from NGC. Any other required configuration may be added to modify the command’s behavior.
To run the IA3 learning pipeline to IA3-learn a converted 220M T5 model checkpoint stored in /mount/results/t5_220m/convert_nemo, enter
python3 /opt/NeMo-Megatron-Launcher/launcher_scripts/main.py ia3_learning=t5/squad \
stages=ia3_learning cluster_type=bcp \
launcher_scripts_path=/opt/NeMo-Megatron-Launcher/launcher_scripts data_dir=/mount/data base_results_dir=/mount/results \
ia3_learning.run.model_train_name=t5_220m \
ia3_learning.model.language_model_path=/mount/results/t5_220m/convert_nemo/results/megatron_t5.nemo \
>> /results/ia3_learning_t5_log.txt 2>&1
The command above assumes that you mounted the data workspace in /mount/data and the results workspace in /mount/results. stdout and stderr are redirected to the file /results/ia3_learning_t5_log.txt, which you can download from NGC. Any other required configuration may be added to modify the command’s behavior.