Model Evaluation
You can run the evaluation scripts to evaluate the capabilities of a fine-tuned mT5 model on XQuAD.
Do this only with a fine-tuned checkpoint in .nemo
format.
Usually the tasks of fine-tuning and evaluation are the same.
You must define the configuration used for the evaluation by setting the evaluation
configuration in conf/config.yaml
to the evaluation configuration file to be used.
Set the configuration to mt5/xquad
, which specifies the configuration file as conf/evaluation/mt5/xquad.yaml
.
You can modify the configurations to adapt different evaluation tasks and checkpoints in evaluation runs.
For Base Command Platform, override all of these configurations from the command line.
You must include the evaluation
configuration in stages
to run the evaluation pipeline.
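Taken together, the relevant portion of conf/config.yaml
would look roughly like the sketch below (other entries in the file are omitted here):

```yaml
defaults:
  - evaluation: mt5/xquad   # selects conf/evaluation/mt5/xquad.yaml

stages:
  - evaluation              # the evaluation stage must be listed to run the pipeline
```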
Common
To configure the tasks to be run for evaluation, set the run.task_name
configuration.
Set the other run
configurations to define the job-specific configuration:
run:
name: eval_${.task_name}_${.model_train_name}
time_limit: "04:00:00"
dependency: "singleton"
model_train_name: mt5_390m
task_name: "xquad"
fine_tuning_results_dir: ${base_results_dir}/${.model_train_name}/${.task_name}
results_dir: ${base_results_dir}/${.model_train_name}/${.task_name}_eval
To specify the fine-tuned checkpoint to be loaded and its definition, set
the model
configurations:
model:
restore_from_path: ${evaluation.run.fine_tuning_results_dir}/checkpoints/megatron_mt5_xquad.nemo # Path to a finetuned T5 .nemo file
tensor_model_parallel_size: 1
pipeline_model_parallel_size: 1
Slurm
Set the configuration for a Slurm cluster in conf/cluster/bcm.yaml
:
partition: null
account: null
exclusive: True
gpus_per_task: null
gpus_per_node: 8
mem: 0
overcommit: False
job_name_prefix: "nemo-megatron-"
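As a sketch, a filled-in bcm.yaml
might look like the following; the partition and account names are placeholders that must be replaced with your cluster's own values:

```yaml
partition: batch        # placeholder: your Slurm partition name
account: my_account     # placeholder: your Slurm account
exclusive: True
gpus_per_task: null
gpus_per_node: 8
mem: 0
overcommit: False
job_name_prefix: "nemo-megatron-"
```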
Example
To run only the evaluation pipeline and not the data preparation,
training, conversion, or inference pipelines, set conf/config.yaml
to:
stages:
- evaluation
Then enter:
python3 main.py
Base Command Platform
To run the evaluation script on Base Command Platform, set the cluster_type
configuration in conf/config.yaml
to bcp
.
This configuration can be overridden from the command line using Hydra.
This script must be launched in a multi-node job.
To run the evaluation pipeline to evaluate a 390M mT5 model that has
been fine-tuned on the xquad
task, with the checkpoint stored in
/mount/results/mt5_390m/xquad/results/checkpoints
, enter:
python3 /opt/NeMo-Framework-Launcher/launcher_scripts/main.py evaluation=mt5/xquad \
stages=[evaluation] cluster_type=bcp launcher_scripts_path=/opt/NeMo-Framework-Launcher/launcher_scripts data_dir=/mount/data \
base_results_dir=/mount/results evaluation.run.model_train_name=mt5_390m \
evaluation.model.restore_from_path=/mount/results/mt5_390m/xquad/results/checkpoints/megatron_mt5_xquad.nemo \
>> /results/eval_mt5_log.txt 2>&1
The command above assumes that you mounted the data workspace in /mount/data
, and the results workspace in /mount/results
. stdout
and stderr
are redirected to the file /results/eval_mt5_log.txt
, which you can download from NGC.
Any other required configuration may be added to modify the command’s behavior.