Important
NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Refer to the NeMo 2.0 overview for information on getting started.
Model Fine-Tuning
NVIDIA provides an easy-to-use tool to help you fine-tune trained T5 checkpoints on SQuAD.
SQuAD Fine-Tuning
Set the fine_tuning config in conf/config.yaml to specify the fine-tuning configuration file, and include the fine_tuning value in stages to run the fine-tuning pipeline. To fine-tune a checkpoint on the SQuAD task, set the fine_tuning configuration to t5/squad, which selects conf/fine_tuning/t5/squad.yaml as the fine-tuning configuration file.
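For example, the relevant entries in conf/config.yaml might look like the following minimal sketch. This assumes the launcher's Hydra defaults-list layout; unrelated entries are omitted:
defaults:
  - fine_tuning: t5/squad  # selects conf/fine_tuning/t5/squad.yaml

stages:
  - fine_tuning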
Modify the configurations in this file to adapt it to different GLUE tasks and checkpoints in fine-tuning runs. Adjust the fine-tuning hyperparameters to achieve the best accuracy for a specific task. The provided hyperparameters are optimized only for the T5 220M model on the SQuAD task.
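Hyperparameters can also be overridden from the command line with Hydra rather than by editing the file. A sketch, assuming the config exposes the usual NeMo trainer and optimizer keys (verify the exact key names in conf/fine_tuning/t5/squad.yaml before using them):
python3 main.py fine_tuning=t5/squad stages=fine_tuning \
  fine_tuning.trainer.max_epochs=5 \
  fine_tuning.model.optim.lr=2e-5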
Common
To configure the tasks to be run for fine-tuning, set the run.task_name configuration. Use the other run configurations to define the job-specific settings:
run:
  name: ${.task_name}_${.model_train_name}
  time_limit: "04:00:00"
  dependency: "singleton"
  convert_name: convert_nemo
  model_train_name: t5_220m
  task_name: "squad"
  results_dir: ${base_results_dir}/${.model_train_name}/${.task_name}
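With the values shown above, run.name resolves to squad_t5_220m and the fine-tuning results are written to ${base_results_dir}/t5_220m/squad.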
Use the model configuration to specify which model checkpoint to load and its definition:
model:
  restore_from_path: ${base_results_dir}/${fine_tuning.run.model_train_name}/${fine_tuning.run.convert_name}/megatron_t5.nemo # Path to a trained T5 .nemo file
  tensor_model_parallel_size: 1
  pipeline_model_parallel_size: 1
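The parallelism settings must match the converted checkpoint you are loading. For a checkpoint converted with a different degree of parallelism, you can override them from the command line with Hydra; the values below are illustrative:
python3 main.py fine_tuning=t5/squad stages=fine_tuning \
  fine_tuning.model.tensor_model_parallel_size=2 \
  fine_tuning.model.pipeline_model_parallel_size=1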
Slurm
Set the configuration for a Slurm cluster in conf/cluster/bcm.yaml:
partition: null
account: null
exclusive: True
gpus_per_task: null
gpus_per_node: 8
mem: 0
overcommit: False
job_name_prefix: "nemo-megatron-"
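In most deployments only partition and account need site-specific values, and the remaining defaults above can usually be kept. A sketch with placeholder values (your cluster's partition and account names will differ):
partition: batch
account: your_account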
Example
To run only the fine-tuning pipeline and not the data preparation, training, conversion, or inference pipelines, set the stages section of conf/config.yaml to:
stages:
- fine_tuning
Then enter:
python3 main.py
Base Command Platform
To run the fine-tuning script on Base Command Platform, set the cluster_type configuration in conf/config.yaml to bcp. You can also override this configuration from the command line, using Hydra. This script must be launched in a multi-node job.
To run the fine-tuning pipeline on a converted 220M T5 checkpoint stored in /mount/results/t5_220m/convert_nemo, enter:
python3 /opt/NeMo-Framework-Launcher/launcher_scripts/main.py fine_tuning=t5/squad stages=fine_tuning \
cluster_type=bcp \
launcher_scripts_path=/opt/NeMo-Framework-Launcher/launcher_scripts data_dir=/mount/data base_results_dir=/mount/results \
fine_tuning.run.model_train_name=t5_220m \
fine_tuning.model.restore_from_path=/mount/results/t5_220m/convert_nemo/results/megatron_t5.nemo \
>> /results/finetune_t5_log.txt 2>&1
The command above assumes that you mounted the data workspace in /mount/data and the results workspace in /mount/results. stdout and stderr are redirected to the file /results/finetune_t5_log.txt, which you can download from NGC. Any other required configuration may be added to modify the command's behavior.
Fine-tuning on Custom Tasks
NVIDIA supports fine-tuning on custom downstream tasks in T5 and mT5. To run a benchmark on your own dataset, split the original dataset into two files: a .txt file containing the source (context) data, and a .txt file containing the target data. Each pair of corresponding lines in these two files forms one fine-tuning sample.
Custom fine-tuning configuration files are in conf/fine_tuning/t5/custom_task.yaml for T5 models and conf/fine_tuning/mt5/custom_task.yaml for mT5 models. The essential configurations are listed below. The dataset paths and preferred benchmark metrics must be specified:
data:
  train_ds:
    src_file_name: ??? # Path to the txt file corresponding to the source data.
    tgt_file_name: ??? # Path to the txt file corresponding to the target data.
  validation_ds:
    src_file_name: ??? # Path to the txt file corresponding to the source data.
    tgt_file_name: ??? # Path to the txt file corresponding to the target data.
    metric:
      name: "exact_string_match" # Name of the evaluation metric to use.
      average: null # Average the metric over the dataset. Options: ['macro', 'micro']. Works only for metrics such as 'F1' and 'accuracy'; refer to torchmetrics for metrics that support it.
      num_classes: null
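For example, a question-answering dataset could be stored as two aligned text files, where line N of the source file pairs with line N of the target file. The file names and contents below are illustrative:
train_src.txt:
question: Who wrote Hamlet? context: Hamlet is a tragedy written by William Shakespeare.
question: What is the capital of France? context: Paris is the capital and most populous city of France.

train_tgt.txt:
William Shakespeare
Paris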
To submit a custom task job, follow the instructions in the SQuAD Fine-Tuning section above.
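Following the same pattern as the SQuAD example, a custom-task run might be launched as sketched below. The dataset paths are illustrative and must point to your own files, and the override prefix assumes data is a top-level section of the task config as shown above (if your custom_task.yaml nests it under model, use fine_tuning.model.data instead):
python3 main.py fine_tuning=t5/custom_task stages=fine_tuning \
  fine_tuning.data.train_ds.src_file_name=/mount/data/custom/train_src.txt \
  fine_tuning.data.train_ds.tgt_file_name=/mount/data/custom/train_tgt.txt \
  fine_tuning.data.validation_ds.src_file_name=/mount/data/custom/val_src.txt \
  fine_tuning.data.validation_ds.tgt_file_name=/mount/data/custom/val_tgt.txt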