Model Evaluation

You can run the evaluation scripts on a fine-tuned mT5 checkpoint to evaluate its capabilities on XQuAD. Evaluation requires a fine-tuned checkpoint in .nemo format. Usually the fine-tuning task and the evaluation task are the same.

You must define the evaluation configuration by setting the evaluation field in conf/config.yaml to the evaluation configuration file to be used. Set it to mt5/xquad, which selects the configuration file conf/evaluation/mt5/xquad.yaml. You can modify the configurations to adapt evaluation runs to different tasks and checkpoints. For Base Command Platform, override all of these configurations from the command line.
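For example, the relevant entry in the defaults list of conf/config.yaml would look like the following (a minimal sketch; the other defaults entries in your file stay as they are):

defaults:
  - evaluation: mt5/xquad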

You must include evaluation in stages to run the evaluation pipeline (see the example below).

To configure the task to be run for evaluation, set the run.task_name configuration. Set the other run configurations to define job-specific settings:

run:
  name: eval_${.task_name}_${.model_train_name}
  time_limit: "04:00:00"
  dependency: "singleton"
  model_train_name: mt5_390m
  task_name: "xquad"
  fine_tuning_results_dir: ${base_results_dir}/${.model_train_name}/${.task_name}
  results_dir: ${base_results_dir}/${.model_train_name}/${.task_name}_eval
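On Base Command Platform, where these values are overridden from the command line, the equivalent Hydra dotted overrides would look like the following (a sketch; the values shown are just the defaults above):

python3 main.py evaluation=mt5/xquad \
  evaluation.run.task_name=xquad \
  evaluation.run.model_train_name=mt5_390m \
  evaluation.run.time_limit="04:00:00"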

To specify the fine-tuned checkpoint to load and its model-parallel settings, set the model configuration:

model:
  restore_from_path: ${evaluation.run.fine_tuning_results_dir}/checkpoints/megatron_mt5_xquad.nemo # Path to a fine-tuned mT5 .nemo file
  tensor_model_parallel_size: 1
  pipeline_model_parallel_size: 1
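As a worked example of how restore_from_path is interpolated (a sketch, assuming base_results_dir=/mount/results and the run values shown above):

# fine_tuning_results_dir = ${base_results_dir}/${.model_train_name}/${.task_name}
#                         = /mount/results/mt5_390m/xquad
# restore_from_path therefore resolves to:
#   /mount/results/mt5_390m/xquad/checkpoints/megatron_mt5_xquad.nemo
# If your fine-tuning job wrote checkpoints elsewhere, override
# evaluation.model.restore_from_path on the command line, as in the
# Base Command Platform example below.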

Set the configuration for a Slurm cluster in conf/cluster/bcm.yaml:

partition: null
account: null
exclusive: True
gpus_per_task: null
gpus_per_node: 8
mem: 0
overcommit: False
job_name_prefix: "nemo-megatron-"
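For example, on a Slurm cluster with a known partition and account, the file might be filled in as follows (the partition and account names here are placeholders, not launcher defaults):

partition: batch
account: my_account
exclusive: True
gpus_per_task: null
gpus_per_node: 8
mem: 0
overcommit: False
job_name_prefix: "nemo-megatron-"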

Example

To run only the evaluation pipeline, and not the data preparation, training, conversion, or inference pipelines, set conf/config.yaml to:

stages:
  - evaluation

Then enter:

python3 main.py

To run the evaluation script on Base Command Platform, set the cluster_type configuration in conf/config.yaml to bcp. This configuration can be overridden from the command line using Hydra. The script must be launched in a multi-node job.

To run the evaluation pipeline on a 390M mT5 model that has been fine-tuned on the XQuAD task, with the checkpoint stored in /mount/results/mt5_390m/xquad/results/checkpoints, enter:

python3 /opt/NeMo-Megatron-Launcher/launcher_scripts/main.py \
  evaluation=mt5/xquad \
  stages=[evaluation] \
  cluster_type=bcp \
  launcher_scripts_path=/opt/NeMo-Megatron-Launcher/launcher_scripts \
  data_dir=/mount/data \
  base_results_dir=/mount/results \
  evaluation.run.model_train_name=mt5_390m \
  evaluation.model.restore_from_path=/mount/results/mt5_390m/xquad/results/checkpoints/megatron_mt5_xquad.nemo \
  >> /results/eval_mt5_log.txt 2>&1

The command above assumes that you mounted the data workspace at /mount/data and the results workspace at /mount/results. stdout and stderr are redirected to the file /results/eval_mt5_log.txt, which you can download from NGC. You can add any other required configuration to modify the command's behavior.
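To monitor progress while the job is running, you can tail the redirected log (the path matches the redirection in the command above):

tail -f /results/eval_mt5_log.txt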
