Model Evaluation
NVIDIA provides a simple tool to help evaluate trained checkpoints. You can evaluate the capabilities of the StarCoder2 models on the following task:
human_eval
Run Evaluation
To run evaluation, update conf/config.yaml:
defaults:
- evaluation: starcoder2/human_eval.yaml
stages:
- evaluation
Then execute the launcher pipeline: python3 main.py
Configuration
Default configurations for evaluation can be found in conf/evaluation/starcoder2/human_eval.yaml:
run:
name: eval_${.task_name}_${.model_train_name}
time_limit: "04:00:00"
dependency: "singleton"
ntasks_per_node: 1
convert_name: convert_nemo
model_train_name: starcoder2
task_name: "human_eval" # HumanEval
convert_dir: ${base_results_dir}/${.model_train_name}/${.convert_name}
fine_tuning_dir: ${base_results_dir}/${.model_train_name}/${.task_name}
results_dir: ${base_results_dir}/${.model_train_name}/${.task_name}_evaluation
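The ${...} entries are OmegaConf-style interpolations that the launcher resolves at runtime. A minimal plain-Python sketch of how the directory paths compose (the base_results_dir value below is an illustrative assumption, not a launcher default):

```python
# Plain-Python sketch of the run-section interpolations; the launcher itself
# resolves these with OmegaConf. Values are illustrative assumptions.
base_results_dir = "/results"          # assumed launcher-level setting
model_train_name = "starcoder2"
convert_name = "convert_nemo"
task_name = "human_eval"

convert_dir = f"{base_results_dir}/{model_train_name}/{convert_name}"
fine_tuning_dir = f"{base_results_dir}/{model_train_name}/{task_name}"
results_dir = f"{base_results_dir}/{model_train_name}/{task_name}_evaluation"

print(convert_dir)   # /results/starcoder2/convert_nemo
print(results_dir)   # /results/starcoder2/human_eval_evaluation
```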
task_name
sets the evaluation task to execute. Currently only HumanEval (human_eval) is supported.
model:
model_type: nemo-StarCoder2
  nemo_model: null # path to the .nemo file produced when converting interleaved checkpoints
tensor_model_parallel_size: 1
pipeline_model_parallel_size: 1
model_parallel_size: ${multiply:${.tensor_model_parallel_size}, ${.pipeline_model_parallel_size}}
precision: bf16 # must match training precision - 32, 16 or bf16
eval_batch_size: 4
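model_parallel_size is derived from the two parallelism degrees via a custom multiply resolver. A plain-Python sketch of what that interpolation computes (this mirrors, but is not, the launcher's actual OmegaConf resolver):

```python
def resolve_model_parallel_size(tensor_mp: int, pipeline_mp: int) -> int:
    """Mirror the ${multiply:...} interpolation: GPUs needed per model replica."""
    return tensor_mp * pipeline_mp

# Default evaluation config: TP=1, PP=1, so the model runs on a single GPU.
default_size = resolve_model_parallel_size(1, 1)
# Hypothetical larger split: TP=4, PP=2 would need 8 GPUs per replica.
larger_size = resolve_model_parallel_size(4, 2)
```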
nemo_model
sets the path to the .nemo checkpoint used for evaluation.
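A quick pre-flight check can catch a wrong or missing checkpoint path before a job is submitted. A hedged sketch (check_nemo_checkpoint is a hypothetical helper, not part of the launcher):

```python
from pathlib import Path

def check_nemo_checkpoint(path: str) -> Path:
    """Sanity-check a checkpoint path before launching evaluation:
    it must exist and carry the .nemo extension the evaluation stage expects."""
    p = Path(path)
    if p.suffix != ".nemo":
        raise ValueError(f"expected a .nemo file, got: {p.name}")
    if not p.is_file():
        raise FileNotFoundError(f"checkpoint not found: {p}")
    return p
```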