Model Fine-Tuning

NVIDIA provides a tool to help you fine-tune trained T5 checkpoints on SQuAD.

Set the fine_tuning config in conf/config.yaml to specify the fine-tuning configuration file. Include the fine_tuning value in stages to run the fine-tuning pipeline.

To fine-tune a checkpoint on the SQuAD task, set the fine_tuning configuration to t5/squad, which specifies the fine-tuning configuration file as conf/fine_tuning/t5/squad.yaml. Modify the configurations in this file to adapt to different downstream tasks and checkpoints in fine-tuning runs. Adjust the fine-tuning hyperparameters to achieve the best accuracy for a specific task; the provided hyperparameters are optimized only for the 220M T5 model on SQuAD.
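For reference, the relevant entries in conf/config.yaml would look something like the following (a minimal sketch assuming the launcher's usual Hydra defaults list; unrelated keys are omitted):

defaults:
  - fine_tuning: t5/squad

stages:
  - fine_tuning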

Common

To configure the tasks to be run for fine-tuning, set the run.task_name configuration. Use the other run configurations to define the job-specific configuration:

run:
  name: ${.task_name}_${.model_train_name}
  time_limit: "04:00:00"
  dependency: "singleton"
  convert_name: convert_nemo
  model_train_name: t5_220m
  task_name: "squad"
  results_dir: ${base_results_dir}/${.model_train_name}/${.task_name}

Use the model configuration to specify which model checkpoint to load and its definition:

model:
  restore_from_path: ${base_results_dir}/${fine_tuning.run.model_train_name}/${fine_tuning.run.convert_name}/megatron_t5.nemo # Path to a trained T5 .nemo file
  tensor_model_parallel_size: 1
  pipeline_model_parallel_size: 1

Slurm

Set the configuration for a Slurm cluster in conf/cluster/bcm.yaml:

partition: null
account: null
exclusive: True
gpus_per_task: null
gpus_per_node: 8
mem: 0
overcommit: False
job_name_prefix: "nemo-megatron-"

Example

To run only the fine-tuning pipeline and not the data preparation, training, conversion, or inference pipelines, set the stages section of conf/config.yaml to:

stages:
  - fine_tuning

Then enter:

python3 main.py

Base Command Platform

To run the fine-tuning pipeline on Base Command Platform, set the cluster_type configuration in conf/config.yaml to bcp. You can also override this configuration from the command line, using Hydra. The pipeline must be launched in a multi-node job.

To run the fine-tuning pipeline on a converted 220M T5 checkpoint stored in /mount/results/t5_220m/convert_nemo, enter:

python3 /opt/NeMo-Megatron-Launcher/launcher_scripts/main.py \
    fine_tuning=t5/squad \
    stages=fine_tuning \
    cluster_type=bcp \
    launcher_scripts_path=/opt/NeMo-Megatron-Launcher/launcher_scripts \
    data_dir=/mount/data \
    base_results_dir=/mount/results \
    fine_tuning.run.model_train_name=t5_220m \
    fine_tuning.model.restore_from_path=/mount/results/t5_220m/convert_nemo/results/megatron_t5.nemo \
    >> /results/finetune_t5_log.txt 2>&1

The command above assumes that you mounted the data workspace in /mount/data and the results workspace in /mount/results. stdout and stderr are redirected to the file /results/finetune_t5_log.txt, which you can download from NGC. You can add any other required configuration to modify the command's behavior.

NVIDIA supports fine-tuning on custom downstream tasks in T5 and mT5. To run a benchmark on your own dataset, split the original dataset into two files: a .txt file containing the source (context) data and a .txt file containing the target data. Each pair of corresponding lines in these two files forms one fine-tuning sample, as in the example below.
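For example, with a hypothetical extractive question-answering dataset, line N of the two files might look like this (file names and line formatting are illustrative, not prescribed by the launcher):

train_src.txt (source/context), line N:
    question: What is the capital of France? context: Paris has been the capital of France since the 10th century.

train_tgt.txt (target), line N:
    Paris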

Custom fine-tuning configuration files are in conf/fine_tuning/t5/custom_task.yaml for T5 models and conf/fine_tuning/mt5/custom_task.yaml for mT5 models. The essential configurations are listed below. The dataset paths and preferred benchmark metrics must be specified.

data:
  train_ds:
    src_file_name: ??? # Path to the txt file corresponding to the source data.
    tgt_file_name: ??? # Path to the txt file corresponding to the target data.
  validation_ds:
    src_file_name: ??? # Path to the txt file corresponding to the source data.
    tgt_file_name: ??? # Path to the txt file corresponding to the target data.
  metric:
    name: "exact_string_match" # Name of the evaluation metric to use.
    average: null # Average the metric over the dataset. Options: ['macro', 'micro']. Works only for 'F1', 'accuracy' etc. Refer to torchmetrics for metrics where this is supported.
    num_classes: null

To submit a custom task job, follow the instructions in the SQuAD fine-tuning sections above.
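For example, after filling in the ??? dataset paths in conf/fine_tuning/t5/custom_task.yaml, a custom-task run on a Slurm cluster might look like this (a minimal sketch following the SQuAD example above):

python3 main.py fine_tuning=t5/custom_task stages=fine_tuning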
