Model Evaluation

You can run the evaluation scripts on a fine-tuned checkpoint to evaluate the capabilities of a T5 model fine-tuned on SQuAD. The scripts work only with a fine-tuned checkpoint in .nemo format.

Base Model Evaluation

You must define the configuration used for the evaluation by setting the evaluation configuration in conf/config.yaml, which selects the evaluation config file to be used. Set the configuration to t5/squad, which specifies the configuration file as conf/evaluation/t5/squad.yaml. You can modify the configuration to adapt it to different evaluation tasks and checkpoints. For Base Command Platform, override all of these configurations from the command line.
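
For reference, the evaluation entry in conf/config.yaml is part of the Hydra defaults list. A minimal sketch of the relevant lines (assuming the standard launcher layout; the other defaults are omitted and left unchanged):

defaults:
  - evaluation: t5/squad  # selects conf/evaluation/t5/squad.yaml
  # other defaults (cluster, training, conversion, ...) stay as they are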

You must include the evaluation value in stages to run the evaluation pipeline.

Common

To specify the tasks to be performed in evaluation, set the run.task_name configuration. Set the other run configurations to define the job-specific configuration:

run:
    name: eval_${.task_name}_${.model_train_name}
    time_limit: "04:00:00"
    dependency: "singleton"
    model_train_name: t5_220m
    task_name: "squad"
    fine_tuning_results_dir: ${base_results_dir}/${.model_train_name}/${.task_name}
    results_dir: ${base_results_dir}/${.model_train_name}/${.task_name}_eval

To specify the fine-tuned checkpoint to load and its definition, set the model configuration:

model:
    restore_from_path: ${evaluation.run.fine_tuning_results_dir}/checkpoints/megatron_t5_glue.nemo # Path to a finetuned T5 .nemo file
    tensor_model_parallel_size: 1
    pipeline_model_parallel_size: 1

Slurm

Set the configuration for a Slurm cluster in conf/cluster/bcm.yaml:

partition: null
account: null
exclusive: True
gpus_per_task: null
gpus_per_node: 8
mem: 0
overcommit: False
job_name_prefix: "nemo-megatron-"
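
Most Slurm sites require at least a partition and an account. Rather than editing bcm.yaml, you can override these values from the command line with hydra; a minimal sketch, where the partition and account names are placeholders for your site's values:

python3 main.py cluster.partition=batch cluster.account=my_account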

Example

To run only the evaluation pipeline and not the data preparation, training, conversion, or inference pipelines, set the stages section of conf/config.yaml to:

stages:
  - evaluation

Then enter:

python3 main.py

Base Command Platform

To run the evaluation script on Base Command Platform, set the cluster_type configuration in conf/config.yaml to bcp. You can also override this configuration from the command line using hydra. This script must be launched in a multi-node job.

To run the evaluation pipeline to evaluate a 220M T5 model that has been fine-tuned on the SQuAD task and checkpointed in /mount/results/t5_220m/squad/results/checkpoints, enter:

python3 /opt/NeMo-Framework-Launcher/launcher_scripts/main.py evaluation=t5/squad \
stages=[evaluation] \
cluster_type=bcp launcher_scripts_path=/opt/NeMo-Framework-Launcher/launcher_scripts data_dir=/mount/data \
base_results_dir=/mount/results evaluation.run.model_train_name=t5_220m \
evaluation.model.restore_from_path=/mount/results/t5_220m/squad/results/checkpoints/megatron_t5_glue.nemo \
>> /results/eval_t5_log.txt 2>&1

The command above assumes that you mounted the data workspace in /mount/data, and the results workspace in /mount/results. stdout and stderr are redirected to the file /results/eval_t5_log.txt, which you can download from NGC. You may add any other configuration required to modify the command’s behavior.

Prompt-Learned T5 and mT5 Evaluation

NVIDIA provides a simple tool to help evaluate prompt-learned T5 and mT5 checkpoints. You can evaluate the capabilities of prompt-learned models on a customized prompt learning test dataset.

NVIDIA provides an example that evaluates a checkpoint that went through prompt learning on SQuAD v1.1, using the SQuAD v1.1 test dataset created in the prompt learning step.

Set the evaluation configuration in conf/config.yaml, which specifies the pathname of the evaluation configuration file. For T5 models, set evaluation to prompt_t5/squad, which specifies the evaluation configuration file as conf/evaluation/prompt_t5/squad.yaml. For mT5 models, set it to prompt_mt5/squad, which specifies the file as conf/evaluation/prompt_mt5/squad.yaml.

The evaluation value must be included in stages to run the evaluation pipeline.

The configurations can be modified to adapt to different evaluation tasks and checkpoints in evaluation runs. For Base Command Platform, all configurations must be overridden from the command line.

Common

To specify the task to be performed in evaluation, set the run.task_name configuration. Set the other run configurations to define the job-specific configuration:

run:
  name: eval_${.task_name}_${.model_train_name}
  time_limit: "04:00:00"
  dependency: "singleton"
  model_train_name: t5_220m # or mt5_390m
  task_name: "squad"
  prompt_learning_dir: ${base_results_dir}/${.model_train_name}/prompt_learning_squad # assume prompt learning was on squad task
  results_dir: ${base_results_dir}/${.model_train_name}/${.task_name}_eval

To specify the model checkpoint to be loaded and the prompt learning test dataset to be evaluated, set the following configurations:

data:
  test_ds:
    - ${data_dir}/prompt_data/v1.1/squad_test.jsonl
  num_workers: 4
  global_batch_size: 16
  micro_batch_size: 16
tensor_model_parallel_size: 1
pipeline_model_parallel_size: 1
pipeline_model_parallel_split_rank: ${divide_floor:${.pipeline_model_parallel_size}, 2}
model_parallel_size: ${multiply:${.tensor_model_parallel_size}, ${.pipeline_model_parallel_size}}
language_model_path: ${base_results_dir}/${evaluation.run.model_train_name}/convert_nemo/results/megatron_t5.nemo  # or megatron_mt5.nemo
virtual_prompt_model_file: ${evaluation.run.prompt_learning_dir}/results/megatron_t5_prompt.nemo # or megatron_mt5_prompt.nemo
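
The parallelism settings typically must match those used to create the .nemo checkpoints being loaded. The pipeline_model_parallel_split_rank and model_parallel_size values are derived by the divide_floor and multiply resolvers shown above; for example, a hypothetical checkpoint trained with tensor parallelism 2 and pipeline parallelism 2 would resolve to:

tensor_model_parallel_size: 2
pipeline_model_parallel_size: 2
pipeline_model_parallel_split_rank: 1  # floor(2 / 2)
model_parallel_size: 4  # 2 * 2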

Slurm

Set the configuration for a Slurm cluster in conf/cluster/bcm.yaml:

partition: null
account: null
exclusive: True
gpus_per_task: 1
gpus_per_node: null
mem: 0
overcommit: False
job_name_prefix: "nemo-megatron-"

Example

To run only the evaluation pipeline and not the data preparation, training, conversion, or inference pipelines, set the stages section of conf/config.yaml to:

stages:
  - evaluation

Then enter:

python3 main.py

Base Command Platform

To run the evaluation script on Base Command Platform, set the cluster_type configuration in conf/config.yaml to bcp. This configuration can be overridden from the command line using hydra. This script must be launched in a multi-node job.

To run the evaluation pipeline to evaluate a prompt-learned 220M T5 model checkpoint stored in /mount/results/t5_220m/prompt_learning_squad, enter:

python3 /opt/NeMo-Framework-Launcher/launcher_scripts/main.py stages=[evaluation] evaluation=prompt_t5/squad \
cluster_type=bcp launcher_scripts_path=/opt/NeMo-Framework-Launcher/launcher_scripts data_dir=/mount/data \
base_results_dir=/mount/results evaluation.run.results_dir=/mount/results/t5_220m/eval_prompt_squad \
evaluation.model.virtual_prompt_model_file=/mount/results/t5_220m/prompt_learning_squad/results/megatron_t5_prompt.nemo \
>> /results/eval_prompt_t5_log.txt 2>&1

The command above assumes that you mounted the data workspace in /mount/data, and the results workspace in /mount/results. stdout and stderr are redirected to the file /results/eval_prompt_t5_log.txt, which you can download from NGC. Any other required configuration may be added to modify the command’s behavior.

To run the evaluation pipeline to evaluate a prompt-learned 390M mT5 model checkpoint stored in /mount/results/mt5_390m/prompt_learning_squad, enter:

python3 /opt/NeMo-Framework-Launcher/launcher_scripts/main.py stages=[evaluation] evaluation=prompt_mt5/squad \
cluster_type=bcp launcher_scripts_path=/opt/NeMo-Framework-Launcher/launcher_scripts data_dir=/mount/data \
base_results_dir=/mount/results evaluation.run.results_dir=/mount/results/mt5_390m/eval_prompt_squad \
evaluation.model.virtual_prompt_model_file=/mount/results/mt5_390m/prompt_learning_squad/results/megatron_mt5_prompt.nemo \
>> /results/eval_prompt_mt5_log.txt 2>&1

The command above assumes that you mounted the data workspace in /mount/data, and the results workspace in /mount/results. stdout and stderr are redirected to the file /results/eval_prompt_mt5_log.txt, which you can download from NGC. Any other required configuration may be added to modify the command’s behavior.

Adapter-Learned and IA3-Learned T5 Evaluation

Set the evaluation configuration in conf/config.yaml, which specifies the pathname of the evaluation configuration file. For an adapter-learned T5 model, set the evaluation configuration to adapter_t5/squad, which specifies the evaluation configuration file as conf/evaluation/adapter_t5/squad.yaml. For an IA3-learned model, set the configuration to ia3_t5/squad, which specifies the evaluation configuration file as conf/evaluation/ia3_t5/squad.yaml.

The evaluation value must be included in stages to run the evaluation pipeline.

The configurations can be modified to adapt to different evaluation tasks and checkpoints in evaluation runs. For Base Command Platform, all configurations must be overridden from the command line.

Common

Set the run configurations to define the job-specific configuration:

run:
  name: eval_${.task_name}_${.model_train_name}
  time_limit: "04:00:00"
  dependency: "singleton"
  model_train_name: t5_220m
  task_name: "squad"
  adapter_learning_dir: ${base_results_dir}/${.model_train_name}/adapter_learning_squad # or ia3_learning_squad
  results_dir: ${base_results_dir}/${.model_train_name}/${.task_name}_eval

To specify the model checkpoint to be loaded and the test dataset to be evaluated, set the following configurations:

data:
  test_ds:
    - ${data_dir}/prompt_data/v1.1/squad_test.jsonl
  num_workers: 4
  global_batch_size: 16
  micro_batch_size: 16
tensor_model_parallel_size: 1
pipeline_model_parallel_size: 1
pipeline_model_parallel_split_rank: ${divide_floor:${.pipeline_model_parallel_size}, 2}
model_parallel_size: ${multiply:${.tensor_model_parallel_size}, ${.pipeline_model_parallel_size}}
language_model_path: ${base_results_dir}/${evaluation.run.model_train_name}/convert_nemo/results/megatron_t5.nemo
adapter_model_file: ${evaluation.run.adapter_learning_dir}/results/megatron_t5_adapter.nemo # or megatron_t5_ia3.nemo

Slurm

Set the configuration for a Slurm cluster in conf/cluster/bcm.yaml:

partition: null
account: null
exclusive: True
gpus_per_task: 1
gpus_per_node: null
mem: 0
overcommit: False
job_name_prefix: "nemo-megatron-"

Example

To run only the evaluation pipeline and not the data preparation, training, conversion, or inference pipelines, set the stages section of conf/config.yaml to:

stages:
  - evaluation

Then enter:

python3 main.py

Base Command Platform

To run the evaluation script on Base Command Platform, set the cluster_type configuration in conf/config.yaml to bcp. This configuration can be overridden from the command line using hydra. This script must be launched in a multi-node job.

To run the evaluation pipeline to evaluate an adapter-learned 220M T5 model checkpoint stored in /mount/results/t5_220m/adapter_learning_squad, enter:

python3 /opt/NeMo-Framework-Launcher/launcher_scripts/main.py stages=[evaluation] evaluation=adapter_t5/squad \
cluster_type=bcp launcher_scripts_path=/opt/NeMo-Framework-Launcher/launcher_scripts data_dir=/mount/data \
base_results_dir=/mount/results evaluation.run.results_dir=/mount/results/t5_220m/eval_adapter_squad \
evaluation.model.adapter_model_file=/mount/results/t5_220m/adapter_learning_squad/results/megatron_t5_adapter.nemo \
>> /results/eval_adapter_t5_log.txt 2>&1

The command above assumes that you mounted the data workspace in /mount/data, and the results workspace in /mount/results. stdout and stderr are redirected to the file /results/eval_adapter_t5_log.txt, which you can download from NGC. Any other required configuration may be added to modify the command’s behavior.

To run the evaluation pipeline to evaluate an IA3-learned 220M T5 model checkpoint stored in /mount/results/t5_220m/ia3_learning_squad, enter:

python3 /opt/NeMo-Framework-Launcher/launcher_scripts/main.py stages=[evaluation] evaluation=ia3_t5/squad \
cluster_type=bcp launcher_scripts_path=/opt/NeMo-Framework-Launcher/launcher_scripts data_dir=/mount/data \
base_results_dir=/mount/results evaluation.run.results_dir=/mount/results/t5_220m/eval_ia3_squad \
evaluation.model.adapter_model_file=/mount/results/t5_220m/ia3_learning_squad/results/megatron_t5_ia3.nemo \
>> /results/eval_ia3_t5_log.txt 2>&1

The command above assumes that you mounted the data workspace in /mount/data, and the results workspace in /mount/results. stdout and stderr are redirected to the file /results/eval_ia3_t5_log.txt, which you can download from NGC. Any other required configuration may be added to modify the command’s behavior.