NVIDIA provides a simple tool to help evaluate trained checkpoints. You can evaluate the capabilities of the GPT model on the following zero-shot downstream evaluation tasks:
lambada
boolq
race
piqa
hellaswag
winogrande
wikitext2
wikitext103
You must perform model evaluation using a training checkpoint (.ckpt format), not a converted checkpoint (.nemo format).
You must define the configuration used for evaluation by setting the evaluation configuration in conf/config.yaml to specify the evaluation configuration file to be used. Set the configuration to gpt3/evaluate_all, which specifies the configuration file as conf/evaluation/gpt3/evaluate_all.yaml.
You can modify the configuration to adapt different evaluation tasks and checkpoints in evaluation runs.
For Base Command Platform, override all of these configurations from the command line.
You must include the evaluation value in stages to run the evaluation pipeline.
Common
To configure the tasks to be run for evaluation, set the run.tasks configuration. Use the other run configurations to define the job-specific configuration:
run:
  name: ${.eval_name}_${.model_train_name}
  time_limit: "4:00:00"
  nodes: ${divide_ceil:${evaluation.model.model_parallel_size}, 8} # 8 gpus per node
  ntasks_per_node: ${divide_ceil:${evaluation.model.model_parallel_size}, ${.nodes}}
  eval_name: eval_all
  model_train_name: gpt3_5b
  train_dir: ${base_results_dir}/${.model_train_name}
  tasks: all_tasks # supported: lambada, boolq, race, piqa, hellaswag, winogrande, wikitext2, wikitext103 OR all_tasks
  results_dir: ${base_results_dir}/${.model_train_name}/${.eval_name}
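For reference, the divide_ceil resolver used for nodes and ntasks_per_node above is ceiling division. A plain-Python sketch of that arithmetic (a hypothetical helper for illustration, not part of the launcher):

```python
import math

def divide_ceil(a: int, b: int) -> int:
    # Mirrors the launcher's divide_ceil OmegaConf resolver: ceiling division.
    return math.ceil(a / b)

# Example: a 5B GPT checkpoint with tensor_model_parallel_size=2 and
# pipeline_model_parallel_size=1 has model_parallel_size = 2 * 1 = 2.
model_parallel_size = 2 * 1
nodes = divide_ceil(model_parallel_size, 8)                # 2 GPUs fit on 1 node (8 GPUs/node)
ntasks_per_node = divide_ceil(model_parallel_size, nodes)  # 2 tasks on that node
```

So a 5B model with model parallel size 2 resolves to a single node running two tasks, while an 8-way model-parallel 20B checkpoint would still fit on one node with eight tasks.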
To specify the model checkpoint to load and its definition, use the model configuration:
model:
  model_type: nemo-gpt3
  checkpoint_folder: ${evaluation.run.train_dir}/results/checkpoints
  checkpoint_name: latest # latest OR name pattern of a checkpoint (e.g. megatron_gpt-*last.ckpt)
  hparams_file: ${evaluation.run.train_dir}/results/hparams.yaml
  tensor_model_parallel_size: 2 # 1 for 126m, 2 for 5b, 8 for 20b
  pipeline_model_parallel_size: 1
  model_parallel_size: ${multiply:${.tensor_model_parallel_size}, ${.pipeline_model_parallel_size}}
  precision: bf16 # must match training precision - 32, 16 or bf16
  eval_batch_size: 4
  vocab_file: ${data_dir}/bpe/vocab.json
  merge_file: ${data_dir}/bpe/merges.txt
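checkpoint_name accepts either the literal value latest or a file-name glob pattern. A minimal sketch of how such a setting could be resolved against checkpoint_folder (a hypothetical helper for illustration; the launcher's actual selection logic may differ):

```python
import glob
import os

def resolve_checkpoint(folder: str, name: str) -> str:
    # "latest" -> newest .ckpt by modification time; any other value is
    # treated as a glob pattern such as "megatron_gpt-*last.ckpt".
    pattern = "*.ckpt" if name == "latest" else name
    matches = glob.glob(os.path.join(folder, pattern))
    if not matches:
        raise FileNotFoundError(f"no checkpoint matching {pattern!r} in {folder}")
    return max(matches, key=os.path.getmtime)
```

For example, resolve_checkpoint(checkpoint_folder, "latest") would pick the most recently written .ckpt file, while passing the pattern from the comment above restricts the search to last-step checkpoints.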
Slurm
Set the configuration for a Slurm cluster in conf/cluster/bcm.yaml:
partition: null
account: null
exclusive: True
gpus_per_task: null
gpus_per_node: 8
mem: 0
overcommit: False
job_name_prefix: "nemo-megatron-"
Example
To run only the evaluation pipeline, and not the data preparation, training, conversion, or inference pipelines, set the stages section of conf/config.yaml to:
stages:
- evaluation
Then enter:
python3 main.py
Base Command Platform
To run the evaluation script on Base Command Platform, set the cluster_type configuration in conf/config.yaml to bcp. You can also override this configuration from the command line using Hydra. This script must be launched in a multi-node job.
To run the evaluation pipeline to evaluate a 126M GPT model checkpoint stored in /mount/results/gpt3_126m/checkpoints, enter:
python3 /opt/NeMo-Megatron-Launcher/launcher_scripts/main.py stages=[evaluation] \
cluster_type=bcp launcher_scripts_path=/opt/NeMo-Megatron-Launcher/launcher_scripts data_dir=/mount/data/the_pile_gpt3 \
base_results_dir=/mount/results evaluation.model.vocab_file=/mount/data/bpe/vocab.json \
evaluation.model.merge_file=/mount/data/bpe/merges.txt evaluation.run.results_dir=/mount/results/gpt3_126m/evaluation \
evaluation.model.checkpoint_folder=/mount/results/gpt3_126m/results/checkpoints evaluation.model.eval_batch_size=16 \
evaluation.model.tensor_model_parallel_size=1 \
>> /results/eval_gpt3_log.txt 2>&1
Kubernetes
Set the configuration for a Kubernetes cluster in conf/cluster/k8s.yaml:
pull_secret: null # Kubernetes secret for the container registry to pull private containers.
shm_size: 512Gi # Amount of system memory to allocate in Pods. Should end in "Gi" for gigabytes.
nfs_server: null # Hostname or IP address for the NFS server where data is stored.
nfs_path: null # Path to store data in the NFS server.
ib_resource_name: "nvidia.com/hostdev" # Specify the resource name for IB devices according to kubernetes, such as "nvidia.com/hostdev" for Mellanox IB adapters.
ib_count: "8" # Specify the number of IB devices to include per node in each pod.
Example
Set the cluster and cluster_type settings to k8s in conf/config.yaml.
To run only the evaluation pipeline, and not the data preparation, training, conversion, or inference pipelines, set the stages section of conf/config.yaml to:
stages:
- evaluation
Then enter:
python3 main.py
This launches a Helm chart based on the evaluation configuration, which spawns a pod to evaluate the specified model. You can view the pod with kubectl get pods and read its logs with kubectl logs <pod name>.
You can run the evaluation scripts on a fine-tuned checkpoint to evaluate the capabilities of a fine-tuned T5 model on SQuAD. Do this only with a fine-tuned checkpoint in .nemo format.
You must define the configuration used for evaluation by setting the evaluation configuration in conf/config.yaml to specify the evaluation configuration file to be used. Set the configuration to t5/squad, which specifies the configuration file as conf/evaluation/t5/squad.yaml.
You can modify the configuration to adapt different evaluation tasks and checkpoints in evaluation runs.
For Base Command Platform, override all of these configurations from the command line.
You must include the evaluation value in stages to run the evaluation pipeline.
Common
To specify the task to be performed in evaluation, set the run.task_name configuration. Set the other run configurations to define the job-specific configuration:
run:
  name: eval_${.task_name}_${.model_train_name}
  time_limit: "04:00:00"
  dependency: "singleton"
  model_train_name: t5_220m
  task_name: "squad"
  fine_tuning_results_dir: ${base_results_dir}/${.model_train_name}/${.task_name}
  results_dir: ${base_results_dir}/${.model_train_name}/${.task_name}_eval
To specify the fine-tuned checkpoint to load and its definition, set the model configuration:
model:
  restore_from_path: ${evaluation.run.fine_tuning_results_dir}/checkpoints/megatron_t5_glue.nemo # Path to a finetuned T5 .nemo file
  tensor_model_parallel_size: 1
  pipeline_model_parallel_size: 1
Slurm
Set the configuration for a Slurm cluster in conf/cluster/bcm.yaml:
partition: null
account: null
exclusive: True
gpus_per_task: null
gpus_per_node: 8
mem: 0
overcommit: False
job_name_prefix: "nemo-megatron-"
Example
To run only the evaluation pipeline, and not the data preparation, training, conversion, or inference pipelines, set the stages section of conf/config.yaml to:
stages:
- evaluation
Then enter:
python3 main.py
Base Command Platform
To run the evaluation script on Base Command Platform, set the cluster_type configuration in conf/config.yaml to bcp. You can also override this configuration from the command line using Hydra. This script must be launched in a multi-node job.
To run the evaluation pipeline to evaluate a 220M T5 model that has been fine-tuned on the squad task, with its checkpoint stored in /mount/results/t5_220m/squad/results/checkpoints, enter:
python3 /opt/NeMo-Megatron-Launcher/launcher_scripts/main.py evaluation=t5/squad \
stages=[evaluation] \
cluster_type=bcp launcher_scripts_path=/opt/NeMo-Megatron-Launcher/launcher_scripts data_dir=/mount/data \
base_results_dir=/mount/results evaluation.run.model_train_name=t5_220m \
evaluation.model.restore_from_path=/mount/results/t5_220m/squad/results/checkpoints/megatron_t5_glue.nemo \
>> /results/eval_t5_log.txt 2>&1
The command above assumes that you mounted the data workspace at /mount/data and the results workspace at /mount/results. stdout and stderr are redirected to the file /results/eval_t5_log.txt, which you can download from NGC. You may add any other configuration required to modify the command's behavior.
You can run the evaluation scripts on a fine-tuned checkpoint to evaluate the capabilities of a fine-tuned mT5 model on XQuAD. Do this only with a fine-tuned checkpoint in .nemo format. The fine-tuning and evaluation tasks are usually the same.
You must define the configuration used for evaluation by setting the evaluation configuration in conf/config.yaml to specify the evaluation configuration file to be used. Set the configuration to mt5/xquad, which specifies the configuration file as conf/evaluation/mt5/xquad.yaml.
You can modify the configuration to adapt different evaluation tasks and checkpoints in evaluation runs.
For Base Command Platform, override all of these configurations from the command line.
You must include the evaluation value in stages to run the evaluation pipeline.
Common
To configure the task to be run for evaluation, set the run.task_name configuration. Set the other run configurations to define the job-specific configuration:
run:
  name: eval_${.task_name}_${.model_train_name}
  time_limit: "04:00:00"
  dependency: "singleton"
  model_train_name: mt5_390m
  task_name: "xquad"
  fine_tuning_results_dir: ${base_results_dir}/${.model_train_name}/${.task_name}
  results_dir: ${base_results_dir}/${.model_train_name}/${.task_name}_eval
To specify the fine-tuned checkpoint to be loaded and its definition, set the model configuration:
model:
  restore_from_path: ${evaluation.run.fine_tuning_results_dir}/checkpoints/megatron_mt5_xquad.nemo # Path to a finetuned mT5 .nemo file
  tensor_model_parallel_size: 1
  pipeline_model_parallel_size: 1
Slurm
Set the configuration for a Slurm cluster in conf/cluster/bcm.yaml:
partition: null
account: null
exclusive: True
gpus_per_task: null
gpus_per_node: 8
mem: 0
overcommit: False
job_name_prefix: "nemo-megatron-"
Example
To run only the evaluation pipeline, and not the data preparation, training, conversion, or inference pipelines, set the stages section of conf/config.yaml to:
stages:
- evaluation
Then enter:
python3 main.py
Base Command Platform
To run the evaluation script on Base Command Platform, set the cluster_type configuration in conf/config.yaml to bcp. You can also override this configuration from the command line using Hydra. This script must be launched in a multi-node job.
To run the evaluation pipeline to evaluate a 390M mT5 model that has been fine-tuned on the xquad task, with its checkpoint stored in /mount/results/mt5_390m/xquad/results/checkpoints, enter:
python3 /opt/NeMo-Megatron-Launcher/launcher_scripts/main.py evaluation=mt5/xquad \
stages=[evaluation] cluster_type=bcp launcher_scripts_path=/opt/NeMo-Megatron-Launcher/launcher_scripts data_dir=/mount/data \
base_results_dir=/mount/results evaluation.run.model_train_name=mt5_390m \
evaluation.model.restore_from_path=/mount/results/mt5_390m/xquad/results/checkpoints/megatron_mt5_xquad.nemo \
>> /results/eval_mt5_log.txt 2>&1
The command above assumes that you mounted the data workspace at /mount/data and the results workspace at /mount/results. stdout and stderr are redirected to the file /results/eval_mt5_log.txt, which you can download from NGC. Any other required configuration may be added to modify the command's behavior.
NVIDIA provides a simple tool to help evaluate prompt-learned GPT checkpoints. You can evaluate the capabilities of a prompt-learned GPT model on a customized prompt learning test dataset.
NVIDIA provides an example that evaluates a checkpoint which went through prompt learning on SQuAD v1.1, using the SQuAD v1.1 test dataset created in the prompt learning step.
You must define the configuration used for evaluation by setting the evaluation configuration in conf/config.yaml to specify the evaluation configuration file to be used. Set the configuration to prompt_gpt3/squad, which specifies the configuration file as conf/evaluation/prompt_gpt3/squad.yaml.
You can modify the configuration to adapt to different evaluation tasks and checkpoints in evaluation runs.
For Base Command Platform, override all of these configurations from the command line.
You must include the evaluation value in stages to run the evaluation pipeline.
Common
Set the run.tasks configuration to prompt. Set the other run configurations to define the job-specific configuration:
run:
  name: ${.eval_name}_${.model_train_name}
  time_limit: "4:00:00"
  nodes: ${divide_ceil:${evaluation.model.model_parallel_size}, 8} # 8 gpus per node
  ntasks_per_node: ${divide_ceil:${evaluation.model.model_parallel_size}, ${.nodes}}
  eval_name: eval_prompt_squad
  model_train_name: gpt3_5b
  tasks: "prompt" # general prompt task
  prompt_learn_dir: ${base_results_dir}/${.model_train_name}/prompt_learning_squad # assume prompt learning was on squad task
  results_dir: ${base_results_dir}/${.model_train_name}/${.eval_name}
To specify the model checkpoint to be loaded and the prompt learning test dataset to evaluate, set the model configuration:
model:
  model_type: nemo-gpt3-prompt
  nemo_model: ${evaluation.run.prompt_learn_dir}/megatron_gpt_prompt.nemo
  tensor_model_parallel_size: 2 # 1 for 126m, 2 for 5b, 8 for 20b
  pipeline_model_parallel_size: 1
  model_parallel_size: ${multiply:${.tensor_model_parallel_size}, ${.pipeline_model_parallel_size}}
  precision: bf16 # must match training precision - 32, 16 or bf16
  eval_batch_size: 4
  prompt_dataset_paths: ${data_dir}/prompt_data/v1.1/squad_test.jsonl
  disable_special_tokens: False # Whether to disable virtual tokens in prompt model evaluation. This is equivalent to evaluating without prompt-/p-tuning.
Slurm
Set the configuration for a Slurm cluster in conf/cluster/bcm.yaml:
partition: null
account: null
exclusive: True
gpus_per_task: 1
gpus_per_node: null
mem: 0
overcommit: False
job_name_prefix: "nemo-megatron-"
Example
To run only the evaluation pipeline, and not the data preparation, training, conversion, or inference pipelines, set the stages section of conf/config.yaml to:
stages:
- evaluation
Then enter:
python3 main.py
Base Command Platform
To run the evaluation script on Base Command Platform, set the cluster_type configuration in conf/config.yaml to bcp. You can also override this configuration from the command line using Hydra. This script must be launched in a multi-node job.
To run the evaluation pipeline to evaluate a prompt-learned 5B GPT model checkpoint stored in /mount/results/gpt3_5b/checkpoints, enter:
python3 /opt/NeMo-Megatron-Launcher/launcher_scripts/main.py stages=[evaluation] evaluation=prompt_gpt3/squad \
cluster_type=bcp launcher_scripts_path=/opt/NeMo-Megatron-Launcher/launcher_scripts data_dir=/mount/data \
base_results_dir=/mount/results evaluation.run.results_dir=/mount/results/gpt3_5b/eval_prompt_squad \
evaluation.model.nemo_model=/mount/results/gpt3_5b/prompt_learning_squad/results/megatron_gpt_prompt.nemo \
evaluation.model.eval_batch_size=4 evaluation.model.tensor_model_parallel_size=2 \
>> /results/eval_prompt_gpt3_log.txt 2>&1
The command above assumes that you mounted the data workspace at /mount/data and the results workspace at /mount/results. stdout and stderr are redirected to the file /results/eval_prompt_gpt3_log.txt, which you can download from NGC. Any other required configuration may be added to modify the command's behavior.
NVIDIA provides a simple tool to help evaluate prompt-learned T5 and mT5 checkpoints. You can evaluate the capabilities of prompt-learned models on a customized prompt learning test dataset.
NVIDIA provides an example that evaluates a checkpoint which went through prompt learning on SQuAD v1.1, using the SQuAD v1.1 test dataset created in the prompt learning step.
Set the evaluation configuration in conf/config.yaml, which specifies the evaluation configuration file to be used.
For T5 models, set evaluation to prompt_t5/squad, which specifies the evaluation configuration file as conf/evaluation/prompt_t5/squad.yaml. For mT5 models, set it to prompt_mt5/squad, which specifies the file as conf/evaluation/prompt_mt5/squad.yaml.
The evaluation value must be included in stages to run the evaluation pipeline.
The configurations can be modified to adapt to different evaluation tasks and checkpoints in evaluation runs. For Base Command Platform, all configurations must be overridden from the command line.
Common
Set the run.tasks configuration to prompt. Set the other run configurations to define the job-specific configuration:
run:
  name: eval_${.task_name}_${.model_train_name}
  time_limit: "04:00:00"
  dependency: "singleton"
  model_train_name: t5_220m # or mt5_390m
  task_name: "squad"
  prompt_learning_dir: ${base_results_dir}/${.model_train_name}/prompt_learning_squad # assume prompt learning was on squad task
  results_dir: ${base_results_dir}/${.model_train_name}/${.task_name}_eval
To specify the model checkpoint to be loaded and the prompt learning test dataset to be evaluated, set the following configurations:
data:
  test_ds:
    - ${data_dir}/prompt_data/v1.1/squad_test.jsonl
  num_workers: 4
  global_batch_size: 16
  micro_batch_size: 16
tensor_model_parallel_size: 1
pipeline_model_parallel_size: 1
pipeline_model_parallel_split_rank: ${divide_floor:${.pipeline_model_parallel_size}, 2}
model_parallel_size: ${multiply:${.tensor_model_parallel_size}, ${.pipeline_model_parallel_size}}
language_model_path: ${base_results_dir}/${evaluation.run.model_train_name}/convert_nemo/results/megatron_t5.nemo # or megatron_mt5.nemo
virtual_prompt_model_file: ${evaluation.run.prompt_learning_dir}/results/megatron_t5_prompt.nemo # or megatron_mt5_prompt.nemo
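The divide_floor resolver used for pipeline_model_parallel_split_rank is floor division: for a T5-style encoder-decoder model it places the encoder/decoder boundary halfway through the pipeline stages. A plain-Python sketch of that arithmetic (a hypothetical helper for illustration, not part of the launcher):

```python
def divide_floor(a: int, b: int) -> int:
    # Mirrors the launcher's divide_floor OmegaConf resolver: floor division.
    return a // b

# With pipeline_model_parallel_size=1, as in the config above, the split rank
# resolves to 0 (no pipeline split).
split_rank_pp1 = divide_floor(1, 2)
# With a 4-stage pipeline, the encoder/decoder boundary would sit at rank 2.
split_rank_pp4 = divide_floor(4, 2)
```

model_parallel_size is then simply the product of the tensor and pipeline parallel sizes, via the multiply resolver.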
Slurm
Set the configuration for a Slurm cluster in conf/cluster/bcm.yaml:
partition: null
account: null
exclusive: True
gpus_per_task: 1
gpus_per_node: null
mem: 0
overcommit: False
job_name_prefix: "nemo-megatron-"
Example
To run only the evaluation pipeline, and not the data preparation, training, conversion, or inference pipelines, set the stages section of conf/config.yaml to:
stages:
- evaluation
Then enter:
python3 main.py
Base Command Platform
To run the evaluation script on Base Command Platform, set the cluster_type configuration in conf/config.yaml to bcp. You can also override this configuration from the command line using Hydra. This script must be launched in a multi-node job.
To run the evaluation pipeline to evaluate a prompt-learned 220M T5 model checkpoint stored in /mount/results/t5_220m/prompt_learning_squad, enter:
python3 /opt/NeMo-Megatron-Launcher/launcher_scripts/main.py stages=[evaluation] evaluation=prompt_t5/squad \
cluster_type=bcp launcher_scripts_path=/opt/NeMo-Megatron-Launcher/launcher_scripts data_dir=/mount/data \
base_results_dir=/mount/results evaluation.run.results_dir=/mount/results/t5_220m/eval_prompt_squad \
evaluation.model.virtual_prompt_model_file=/mount/results/t5_220m/prompt_learning_squad/results/megatron_t5_prompt.nemo \
>> /results/eval_prompt_t5_log.txt 2>&1
The command above assumes that you mounted the data workspace at /mount/data and the results workspace at /mount/results. stdout and stderr are redirected to the file /results/eval_prompt_t5_log.txt, which you can download from NGC. Any other required configuration may be added to modify the command's behavior.
To run the evaluation pipeline to evaluate a prompt-learned 390M mT5 model checkpoint stored in /mount/results/mt5_390m/prompt_learning_squad, enter:
python3 /opt/NeMo-Megatron-Launcher/launcher_scripts/main.py stages=[evaluation] evaluation=prompt_mt5/squad \
cluster_type=bcp launcher_scripts_path=/opt/NeMo-Megatron-Launcher/launcher_scripts data_dir=/mount/data \
base_results_dir=/mount/results evaluation.run.results_dir=/mount/results/mt5_390m/eval_prompt_squad \
evaluation.model.virtual_prompt_model_file=/mount/results/mt5_390m/prompt_learning_squad/results/megatron_mt5_prompt.nemo \
>> /results/eval_prompt_mt5_log.txt 2>&1
The command above assumes that you mounted the data workspace at /mount/data and the results workspace at /mount/results. stdout and stderr are redirected to the file /results/eval_prompt_mt5_log.txt, which you can download from NGC. Any other required configuration may be added to modify the command's behavior.
NVIDIA provides a simple tool to help evaluate adapter- and IA3-learned GPT checkpoints. You can evaluate the capabilities of an adapter-learned GPT model on a customized adapter learning test dataset.
NVIDIA provides an example that evaluates a checkpoint which went through adapter learning or IA3 learning on SQuAD v1.1.
Set the evaluation configuration in conf/config.yaml, which specifies the evaluation configuration file to be used.
For adapter learning, set the evaluation configuration to adapter_gpt3/squad, which specifies the evaluation configuration file as conf/evaluation/adapter_gpt3/squad.yaml. For IA3 learning, set the configuration to ia3_gpt3/squad, which specifies the evaluation configuration file as conf/evaluation/ia3_gpt3/squad.yaml.
The evaluation value must be included in stages to run the evaluation pipeline.
The configurations can be modified to adapt to different evaluation tasks and checkpoints in evaluation runs. For Base Command Platform, all configurations must be overridden from the command line.
Common
To run evaluation on adapter learning test tasks, set the run.tasks configuration to adapter. Set the other run configurations to define the job-specific configuration:
run:
  name: ${.eval_name}_${.model_train_name}
  time_limit: "4:00:00"
  nodes: ${divide_ceil:${evaluation.model.model_parallel_size}, 8} # 8 gpus per node
  ntasks_per_node: ${divide_ceil:${evaluation.model.model_parallel_size}, ${.nodes}}
  eval_name: eval_adapter_squad # or eval_ia3_squad
  model_train_name: gpt3_5b
  tasks: "adapter" # general adapter task
  adapter_learn_dir: ${base_results_dir}/${.model_train_name}/adapter_learning_squad # or ia3_learning_squad
  results_dir: ${base_results_dir}/${.model_train_name}/${.eval_name}
To specify the model checkpoint to be loaded and the adapter learning test dataset to be evaluated, set the following configurations:
data:
  test_ds:
    - ${data_dir}/prompt_data/v1.1/squad_test.jsonl
  num_workers: 4
  global_batch_size: 16
  micro_batch_size: 16
tensor_model_parallel_size: 1
pipeline_model_parallel_size: 1
pipeline_model_parallel_split_rank: ${divide_floor:${.pipeline_model_parallel_size}, 2}
model_parallel_size: ${multiply:${.tensor_model_parallel_size}, ${.pipeline_model_parallel_size}}
language_model_path: ${base_results_dir}/${evaluation.run.model_train_name}/convert_nemo/results/megatron_gpt.nemo
adapter_model_file: ${evaluation.run.adapter_learn_dir}/results/megatron_gpt_adapter.nemo # or megatron_gpt_ia3.nemo
Slurm
Set the configuration for a Slurm cluster in conf/cluster/bcm.yaml:
partition: null
account: null
exclusive: True
gpus_per_task: 1
gpus_per_node: null
mem: 0
overcommit: False
job_name_prefix: "nemo-megatron-"
Example
To run only the evaluation pipeline, and not the data preparation, training, conversion, or inference pipelines, set the stages section of conf/config.yaml to:
stages:
- evaluation
Then enter:
python3 main.py
Base Command Platform
To run the evaluation pipeline on Base Command Platform, set the cluster_type configuration in conf/config.yaml to bcp. You can also override this configuration from the command line using Hydra. This script must be launched in a multi-node job.
To run the evaluation pipeline to evaluate an adapter-learned 5B GPT model checkpoint stored in /mount/results/gpt3_5b/adapter_learning_squad, enter:
python3 /opt/NeMo-Megatron-Launcher/launcher_scripts/main.py stages=[evaluation] evaluation=adapter_gpt3/squad \
cluster_type=bcp launcher_scripts_path=/opt/NeMo-Megatron-Launcher/launcher_scripts data_dir=/mount/data \
base_results_dir=/mount/results evaluation.run.results_dir=/mount/results/gpt3_5b/eval_adapter_squad \
evaluation.model.adapter_model_file=/mount/results/gpt3_5b/adapter_learning_squad/results/megatron_gpt3_adapter.nemo \
>> /results/eval_adapter_gpt3_log.txt 2>&1
The command above assumes that you mounted the data workspace at /mount/data and the results workspace at /mount/results. stdout and stderr are redirected to the file /results/eval_adapter_gpt3_log.txt, which you can download from NGC. Any other required configuration may be added to modify the command's behavior.
To run the evaluation pipeline to evaluate an IA3-learned 5B GPT model checkpoint stored in /mount/results/gpt3_5b/ia3_learning_squad, enter:
python3 /opt/NeMo-Megatron-Launcher/launcher_scripts/main.py stages=[evaluation] evaluation=ia3_gpt3/squad \
cluster_type=bcp launcher_scripts_path=/opt/NeMo-Megatron-Launcher/launcher_scripts data_dir=/mount/data \
base_results_dir=/mount/results evaluation.run.results_dir=/mount/results/gpt3_5b/eval_ia3_squad \
evaluation.model.adapter_model_file=/mount/results/gpt3_5b/ia3_learning_squad/results/megatron_gpt_ia3.nemo \
>> /results/eval_ia3_gpt3_log.txt 2>&1
The command above assumes that you mounted the data workspace at /mount/data and the results workspace at /mount/results. stdout and stderr are redirected to the file /results/eval_ia3_gpt3_log.txt, which you can download from NGC. Any other required configuration may be added to modify the command's behavior.
Set the evaluation configuration in conf/config.yaml, which specifies the evaluation configuration file to be used.
For an adapter-learned T5 model, set the evaluation configuration to adapter_t5/squad, which specifies the evaluation configuration file as conf/evaluation/adapter_t5/squad.yaml. For an IA3-learned model, set the configuration to ia3_t5/squad, which specifies the evaluation configuration file as conf/evaluation/ia3_t5/squad.yaml.
The evaluation value must be included in stages to run the evaluation pipeline.
The configurations can be modified to adapt to different evaluation tasks and checkpoints in evaluation runs. For Base Command Platform, all configurations must be overridden from the command line.
Common
Set the run configurations to define the job-specific configuration:
run:
  name: eval_${.task_name}_${.model_train_name}
  time_limit: "04:00:00"
  dependency: "singleton"
  model_train_name: t5_220m
  task_name: "squad"
  adapter_learning_dir: ${base_results_dir}/${.model_train_name}/adapter_learning_squad # or ia3_learning_squad
  results_dir: ${base_results_dir}/${.model_train_name}/${.task_name}_eval
To specify the model checkpoint to be loaded and the test dataset to be evaluated, set the following configurations:
data:
  test_ds:
    - ${data_dir}/prompt_data/v1.1/squad_test.jsonl
  num_workers: 4
  global_batch_size: 16
  micro_batch_size: 16
tensor_model_parallel_size: 1
pipeline_model_parallel_size: 1
pipeline_model_parallel_split_rank: ${divide_floor:${.pipeline_model_parallel_size}, 2}
model_parallel_size: ${multiply:${.tensor_model_parallel_size}, ${.pipeline_model_parallel_size}}
language_model_path: ${base_results_dir}/${evaluation.run.model_train_name}/convert_nemo/results/megatron_t5.nemo
adapter_model_file: ${evaluation.run.adapter_learning_dir}/results/megatron_t5_adapter.nemo # or megatron_t5_ia3.nemo
Slurm
Set the configuration for a Slurm cluster in conf/cluster/bcm.yaml:
partition: null
account: null
exclusive: True
gpus_per_task: 1
gpus_per_node: null
mem: 0
overcommit: False
job_name_prefix: "nemo-megatron-"
Example
To run only the evaluation pipeline, and not the data preparation, training, conversion, or inference pipelines, set the stages section of conf/config.yaml to:
stages:
- evaluation
Then enter:
python3 main.py
Base Command Platform
To run the evaluation script on Base Command Platform, set the cluster_type configuration in conf/config.yaml to bcp. You can also override this configuration from the command line using Hydra. This script must be launched in a multi-node job.
To run the evaluation pipeline to evaluate an adapter-learned 220M T5 model checkpoint stored in /mount/results/t5_220m/adapter_learning_squad, enter:
python3 /opt/NeMo-Megatron-Launcher/launcher_scripts/main.py stages=[evaluation] evaluation=adapter_t5/squad \
cluster_type=bcp launcher_scripts_path=/opt/NeMo-Megatron-Launcher/launcher_scripts data_dir=/mount/data \
base_results_dir=/mount/results evaluation.run.results_dir=/mount/results/t5_220m/eval_adapter_squad \
evaluation.model.adapter_model_file=/mount/results/t5_220m/adapter_learning_squad/results/megatron_t5_adapter.nemo \
>> /results/eval_adapter_t5_log.txt 2>&1
The command above assumes that you mounted the data workspace at /mount/data and the results workspace at /mount/results. stdout and stderr are redirected to the file /results/eval_adapter_t5_log.txt, which you can download from NGC. Any other required configuration may be added to modify the command's behavior.
To run the evaluation pipeline to evaluate an IA3-learned 220M T5 model checkpoint stored in /mount/results/t5_220m/ia3_learning_squad, enter:
python3 /opt/NeMo-Megatron-Launcher/launcher_scripts/main.py stages=[evaluation] evaluation=ia3_t5/squad \
cluster_type=bcp launcher_scripts_path=/opt/NeMo-Megatron-Launcher/launcher_scripts data_dir=/mount/data \
base_results_dir=/mount/results evaluation.run.results_dir=/mount/results/t5_220m/eval_ia3_squad \
evaluation.model.adapter_model_file=/mount/results/t5_220m/ia3_learning_squad/results/megatron_t5_ia3.nemo \
>> /results/eval_ia3_t5_log.txt 2>&1
The command above assumes that you mounted the data workspace at /mount/data and the results workspace at /mount/results. stdout and stderr are redirected to the file /results/eval_ia3_t5_log.txt, which you can download from NGC. Any other required configuration may be added to modify the command's behavior.