Model Evaluation

NVIDIA provides a simple tool to help evaluate trained checkpoints. You can evaluate the capabilities of the StarCoder2 models on the following task:

  • human_eval

To run evaluation, update conf/config.yaml:

```yaml
defaults:
  - evaluation: starcoder2/human_eval.yaml

stages:
  - evaluation
```

Then execute the launcher pipeline: python3 main.py
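Instead of editing conf/config.yaml, the same selections can be expressed as Hydra command-line overrides. The override names below are a sketch assuming the launcher's standard Hydra config layout (an `evaluation` config group and a `stages` list):

```shell
# Select the evaluation config and stage from the command line
# (config group and key names assume the default launcher layout).
python3 main.py \
    evaluation=starcoder2/human_eval \
    stages=[evaluation]
```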

Configuration

Default configurations for evaluation can be found in conf/evaluation/starcoder2/human_eval.yaml:

```yaml
run:
  name: eval_${.task_name}_${.model_train_name}
  time_limit: "04:00:00"
  dependency: "singleton"
  ntasks_per_node: 1
  convert_name: convert_nemo
  model_train_name: starcoder2
  task_name: "human_eval"  # HumanEval
  convert_dir: ${base_results_dir}/${.model_train_name}/${.convert_name}
  fine_tuning_dir: ${base_results_dir}/${.model_train_name}/${.task_name}
  results_dir: ${base_results_dir}/${.model_train_name}/${.task_name}_evaluation
```

task_name sets the evaluation task to execute. Currently only HumanEval is supported.

```yaml
model:
  model_type: nemo-StarCoder2
  nemo_model: null  # path to the .nemo file produced when converting interleaved checkpoints
  tensor_model_parallel_size: 1
  pipeline_model_parallel_size: 1
  model_parallel_size: ${multiply:${.tensor_model_parallel_size}, ${.pipeline_model_parallel_size}}
  precision: bf16  # must match training precision - 32, 16 or bf16
  eval_batch_size: 4
```

nemo_model sets the path to the .nemo checkpoint used for evaluation.

© Copyright 2023-2024, NVIDIA. Last updated on May 17, 2024.