Important
NeMo 2.0 is an experimental feature and currently released in the dev container only: nvcr.io/nvidia/nemo:dev. Please refer to NeMo 2.0 overview for information on getting started.
Model Evaluation
NVIDIA provides a simple tool to help evaluate trained checkpoints.
You can evaluate the capabilities of the Nemotron model on the following zero-shot downstream evaluation tasks: lambada, boolq, race, piqa, hellaswag, winogrande, wikitext2, and wikitext103.
You can also evaluate fine-tuned Nemotron models on squad tasks.
Run Evaluation
To run evaluation, update conf/config.yaml:
defaults:
  - evaluation: nemotron/evaluate_all.yaml
stages:
  - evaluation
Execute the launcher pipeline: python3 main.py.
Configure Evaluation
You can find default configurations for evaluation in conf/evaluation/nemotron/evaluate_all.yaml.
To configure evaluation, set the following run parameters:
run:
  name: ${.eval_name}_${.model_train_name}
  time_limit: "4:00:00"
  nodes: ${divide_ceil:${evaluation.model.model_parallel_size}, 8} # 8 gpus per node
  ntasks_per_node: ${divide_ceil:${evaluation.model.model_parallel_size}, ${.nodes}}
  eval_name: eval_all
  model_train_name: nemotron
  train_dir: ${base_results_dir}/${.model_train_name}
  tasks: all_tasks
  results_dir: ${base_results_dir}/${.model_train_name}/${.eval_name}
The tasks parameter sets the evaluation task to execute. Supported values are lambada, boolq, race, piqa, hellaswag, winogrande, wikitext2, wikitext103, and all_tasks. Setting all_tasks executes every supported evaluation task.
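As an illustration of how the tasks value behaves, the following minimal sketch expands all_tasks into the individual task names listed above and rejects any other value. The helper name expand_tasks is hypothetical and is not part of the launcher; the launcher's own handling may differ.

```python
# Task names taken from the supported list in this section.
SUPPORTED_TASKS = (
    "lambada", "boolq", "race", "piqa",
    "hellaswag", "winogrande", "wikitext2", "wikitext103",
)


def expand_tasks(tasks: str) -> list[str]:
    """Hypothetical helper: map a `tasks` value to the task names it covers."""
    if tasks == "all_tasks":
        # all_tasks stands for every supported evaluation task.
        return list(SUPPORTED_TASKS)
    if tasks in SUPPORTED_TASKS:
        return [tasks]
    raise ValueError(f"Unsupported task: {tasks!r}")
```

For example, expand_tasks("boolq") returns a one-element list, while expand_tasks("all_tasks") returns all eight task names.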
Set the appropriate model parallel sizes. For Nemotron 340B, use the following values:
model:
  model_type: nemo-nemotron
  nemo_model: null # specify path to .nemo file, produced when converting interleaved checkpoints
  tensor_model_parallel_size: 8
  pipeline_model_parallel_size: 2
  model_parallel_size: ${multiply:${.tensor_model_parallel_size}, ${.pipeline_model_parallel_size}}
  precision: bf16 # must match training precision - 32, 16 or bf16
  eval_batch_size: 4
The nemo_model parameter sets the path to the .nemo checkpoint to evaluate.
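To see how the parallel sizes feed into the run settings, the sketch below reproduces the arithmetic in plain Python. It assumes that the multiply resolver is a plain product and that divide_ceil is ceiling division; the function names here are illustrative and are not the launcher's API.

```python
GPUS_PER_NODE = 8  # from the "# 8 gpus per node" comment in the run config


def divide_ceil(a: int, b: int) -> int:
    """Ceiling division on integers (assumed meaning of the resolver)."""
    return -(-a // b)


def launch_shape(tensor_parallel: int, pipeline_parallel: int) -> dict:
    """Derive nodes and ntasks_per_node from the model parallel sizes."""
    model_parallel_size = tensor_parallel * pipeline_parallel
    nodes = divide_ceil(model_parallel_size, GPUS_PER_NODE)
    ntasks_per_node = divide_ceil(model_parallel_size, nodes)
    return {
        "model_parallel_size": model_parallel_size,
        "nodes": nodes,
        "ntasks_per_node": ntasks_per_node,
    }


# Values given above for Nemotron 340B.
print(launch_shape(8, 2))
# {'model_parallel_size': 16, 'nodes': 2, 'ntasks_per_node': 8}
```

Under these assumptions, a tensor parallel size of 8 and a pipeline parallel size of 2 call for 16 GPUs, spread across 2 nodes with 8 tasks each.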
Run Evaluation on PEFT Nemotron Models
To run evaluation on PEFT Nemotron models, update conf/config.yaml:
defaults:
  - evaluation: peft_nemotron/squad.yaml
stages:
  - evaluation
Execute the launcher pipeline: python3 main.py.
Configure Evaluation
You can find default configurations for PEFT Nemotron evaluation in conf/evaluation/peft_nemotron/squad.yaml.
To configure evaluation, set the following run parameters:
run:
  name: eval_${.task_name}_${.model_train_name}
  time_limit: "04:00:00"
  dependency: "singleton"
  convert_name: convert_nemo
  model_train_name: nemotron
  task_name: "squad" # SQuAD v1.1
  convert_dir: ${base_results_dir}/${.model_train_name}/${.convert_name}
  fine_tuning_dir: ${base_results_dir}/${.model_train_name}/peft_${.task_name}
  results_dir: ${base_results_dir}/${.model_train_name}/peft_${.task_name}_eval
Set PEFT-specific configurations:
peft:
  peft_scheme: "ptuning" # can be adapter, ia3, or ptuning
  restore_from_path: ${evaluation.run.fine_tuning_dir}/${.peft_scheme}/megatron_nemotron_peft_tuning-${.peft_scheme}/checkpoints/megatron_nemotron_peft_tuning-${.peft_scheme}.nemo
The peft_scheme parameter sets the scheme used during fine-tuning.
The restore_from_path parameter specifies the path to the PEFT checkpoint for evaluation.
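To check where restore_from_path points for a given scheme, the sketch below rebuilds the path with ordinary string formatting. It assumes that every peft_scheme placeholder in the path is interpolated and that the defaults from the run section apply. The function name and the /results base directory are hypothetical.

```python
import posixpath

PEFT_SCHEMES = ("adapter", "ia3", "ptuning")  # from the comment in the config


def peft_checkpoint_path(
    base_results_dir: str,
    peft_scheme: str,
    model_train_name: str = "nemotron",
    task_name: str = "squad",
) -> str:
    """Hypothetical helper: expand restore_from_path for one PEFT scheme."""
    if peft_scheme not in PEFT_SCHEMES:
        raise ValueError(f"Unknown peft_scheme: {peft_scheme!r}")
    # Mirrors fine_tuning_dir from the run section.
    fine_tuning_dir = posixpath.join(
        base_results_dir, model_train_name, f"peft_{task_name}"
    )
    run_name = f"megatron_nemotron_peft_tuning-{peft_scheme}"
    return posixpath.join(
        fine_tuning_dir, peft_scheme, run_name, "checkpoints", f"{run_name}.nemo"
    )


# "/results" is a placeholder for base_results_dir.
print(peft_checkpoint_path("/results", "ptuning"))
```

Comparing the printed path against the files in the fine-tuning results directory is a quick way to confirm that peft_scheme matches the scheme used during fine-tuning.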