Model Evaluation

You can run the evaluation scripts on a fine-tuned checkpoint to evaluate the capabilities of a fine-tuned mT5 model on XQuAD. Evaluation requires a fine-tuned checkpoint in .nemo format. The evaluation task is usually the same as the fine-tuning task.

You must define the configuration used for evaluation by setting the evaluation configuration in conf/config.yaml, which specifies the evaluation configuration file to be used. Set the evaluation configuration to mt5/xquad, which specifies the configuration file as conf/evaluation/mt5/xquad.yaml. You can modify the configurations to adapt evaluation runs to different tasks and checkpoints. For Base Command Platform, override all of these configurations from the command line.
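For example, conf/config.yaml selects the evaluation configuration through its Hydra defaults list. A minimal sketch (the surrounding entries are illustrative; the actual file contains additional defaults and top-level settings):

```yaml
# conf/config.yaml (sketch -- other defaults omitted)
defaults:
  - evaluation: mt5/xquad   # resolves to conf/evaluation/mt5/xquad.yaml
```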

You must include the evaluation configuration in stages to run the evaluation pipeline.

To configure the tasks to be run for evaluation, set the run.task_name configuration. Set the other run configurations to define the job-specific configuration:

```yaml
run:
  name: eval_${.task_name}_${.model_train_name}
  time_limit: "04:00:00"
  dependency: "singleton"
  model_train_name: mt5_390m
  task_name: "xquad"
  fine_tuning_results_dir: ${base_results_dir}/${.model_train_name}/${.task_name}
  results_dir: ${base_results_dir}/${.model_train_name}/${.task_name}_eval
```

To specify the fine-tuned checkpoint to be loaded and its definition, set the model configurations:

```yaml
model:
  restore_from_path: ${evaluation.run.fine_tuning_results_dir}/checkpoints/megatron_mt5_xquad.nemo # Path to a fine-tuned mT5 .nemo file
  tensor_model_parallel_size: 1
  pipeline_model_parallel_size: 1
```

Set the configuration for a Slurm cluster in conf/cluster/bcm.yaml:

```yaml
partition: null
account: null
exclusive: True
gpus_per_task: null
gpus_per_node: 8
mem: 0
overcommit: False
job_name_prefix: "nemo-megatron-"
```

Example

To run only the evaluation pipeline, and not the data preparation, training, conversion, or inference pipelines, set conf/config.yaml to:

```yaml
stages:
  - evaluation
```

Then enter:

```shell
python3 main.py
```

To run the evaluation script on Base Command Platform, set the cluster_type configuration in conf/config.yaml to bcp. This configuration can be overridden from the command line using Hydra. This script must be launched in a multi-node job.

To run the evaluation pipeline on a 390M mT5 model that has been fine-tuned on the XQuAD task, with the checkpoint stored in /mount/results/mt5_390m/xquad/results/checkpoints, enter:

```shell
python3 /opt/NeMo-Megatron-Launcher/launcher_scripts/main.py evaluation=mt5/xquad \
stages=<evaluation> cluster_type=bcp launcher_scripts_path=/opt/NeMo-Megatron-Launcher/launcher_scripts data_dir=/mount/data \
base_results_dir=/mount/results evaluation.run.model_train_name=mt5_390m \
evaluation.model.restore_from_path=/mount/results/mt5_390m/xquad/results/checkpoints/megatron_mt5_xquad.nemo \
>> /results/eval_mt5_log.txt 2>&1
```

The command above assumes that you mounted the data workspace at /mount/data and the results workspace at /mount/results. stdout and stderr are redirected to the file /results/eval_mt5_log.txt, which you can download from NGC. Any other required configuration may be added to modify the command's behavior.

© Copyright 2023-2024, NVIDIA. Last updated on May 17, 2024.