Model Evaluation


NVIDIA provides a simple tool to help evaluate trained checkpoints. You can evaluate the capabilities of the Baichuan2 model on the following zero-shot downstream evaluation tasks:

  • lambada, boolq, race, piqa, hellaswag, winogrande, wikitext2, wikitext103

Fine-tuned Baichuan2 models can be evaluated on the following tasks:

  • squad

To run evaluation, update conf/config.yaml:

    defaults:
      - evaluation: baichuan2/evaluate_all.yaml

    stages:
      - evaluation

Execute the launcher pipeline: python3 main.py

Configuration

Default configurations for evaluation can be found in conf/evaluation/baichuan2/evaluate_all.yaml

    run:
      name: ${.eval_name}_${.model_train_name}
      time_limit: "4:00:00"
      nodes: ${divide_ceil:${evaluation.model.model_parallel_size}, 8} # 8 gpus per node
      ntasks_per_node: ${divide_ceil:${evaluation.model.model_parallel_size}, ${.nodes}}
      eval_name: eval_all
      model_train_name: baichuan2_7b
      train_dir: ${base_results_dir}/${.model_train_name}
      tasks: all_tasks
      results_dir: ${base_results_dir}/${.model_train_name}/${.eval_name}

tasks sets the evaluation task to execute. Supported values are lambada, boolq, race, piqa, hellaswag, winogrande, wikitext2, wikitext103, and all_tasks; all_tasks executes every supported evaluation task.
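As a rough illustration of how a tasks value maps to individual evaluation tasks, the sketch below expands all_tasks into the task names listed on this page and rejects anything else. The helper expand_tasks and the constant SUPPORTED_TASKS are hypothetical and are not part of the launcher; only the task names come from this page.

```python
from typing import List

# Hypothetical constant: the task names are the ones listed on this page.
SUPPORTED_TASKS = [
    "lambada", "boolq", "race", "piqa",
    "hellaswag", "winogrande", "wikitext2", "wikitext103",
]


def expand_tasks(tasks: str) -> List[str]:
    """Return the individual task names selected by a `tasks` value."""
    if tasks == "all_tasks":
        # all_tasks selects every supported task.
        return list(SUPPORTED_TASKS)
    if tasks not in SUPPORTED_TASKS:
        raise ValueError(f"Unsupported evaluation task: {tasks!r}")
    return [tasks]


print(expand_tasks("all_tasks"))
print(expand_tasks("boolq"))
```

Checking the value before launching can catch a misspelled task name early, although the launcher itself may validate it differently.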

    model:
      model_type: nemo-baichuan2
      nemo_model: null # specify path to .nemo file, produced when converted interleaved checkpoints
      tensor_model_parallel_size: 1
      pipeline_model_parallel_size: 1
      model_parallel_size: ${multiply:${.tensor_model_parallel_size}, ${.pipeline_model_parallel_size}}
      precision: bf16 # must match training precision - 32, 16 or bf16
      eval_batch_size: 4

nemo_model sets the path to the .nemo checkpoint to evaluate.
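The interpolations in the run and model settings above reduce to simple integer arithmetic. The sketch below mirrors them in plain Python, assuming that multiply is an ordinary product and that divide_ceil is a ceiling division, as the resolver names suggest. The functions divide_ceil and resources here are illustrative stand-ins, not the launcher's implementation.

```python
def divide_ceil(a: int, b: int) -> int:
    """Ceiling division; assumed to match the `divide_ceil` resolver."""
    return -(-a // b)


def resources(tensor_mp: int, pipeline_mp: int, gpus_per_node: int = 8):
    """Derive model_parallel_size, nodes, and ntasks_per_node as in the config."""
    # model_parallel_size: ${multiply:tensor_mp, pipeline_mp}
    model_parallel_size = tensor_mp * pipeline_mp
    # nodes: ${divide_ceil:model_parallel_size, 8}  (8 GPUs per node)
    nodes = divide_ceil(model_parallel_size, gpus_per_node)
    # ntasks_per_node: ${divide_ceil:model_parallel_size, nodes}
    ntasks_per_node = divide_ceil(model_parallel_size, nodes)
    return model_parallel_size, nodes, ntasks_per_node


print(resources(1, 1))  # default settings -> (1, 1, 1)
print(resources(4, 4))  # -> (16, 2, 8)
```

Under these assumptions, the default tensor and pipeline sizes of 1 request a single task on a single node, while a 4 x 4 layout requests two nodes with eight tasks each.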

To run evaluation on PEFT Baichuan2 models, update conf/config.yaml:

    defaults:
      - evaluation: peft_baichuan2/squad.yaml

    stages:
      - evaluation

Execute the launcher pipeline: python3 main.py

Configuration

Default configurations for PEFT Baichuan2 evaluation can be found in conf/evaluation/peft_baichuan2/squad.yaml

    run:
      name: eval_${.task_name}_${.model_train_name}
      time_limit: "04:00:00"
      dependency: "singleton"
      convert_name: convert_nemo
      model_train_name: baichuan2_7b
      task_name: "squad" # SQuAD v1.1
      convert_dir: ${base_results_dir}/${.model_train_name}/${.convert_name}
      fine_tuning_dir: ${base_results_dir}/${.model_train_name}/peft_${.task_name}
      results_dir: ${base_results_dir}/${.model_train_name}/peft_${.task_name}_eval

Set PEFT specific configurations:

    peft:
      peft_scheme: "ptuning" # can be adapter, ia3, or ptuning
      restore_from_path: ${evaluation.run.fine_tuning_dir}/${.peft_scheme}/megatron_baichuan2_peft_tuning-${.peft_scheme}/checkpoints/megatron_baichuan2_peft_tuning-${.peft_scheme}.nemo

peft_scheme sets the scheme used during fine-tuning.

restore_from_path sets the path to the PEFT checkpoint to evaluate.
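To see where the default restore_from_path points, the sketch below rebuilds the same path in plain Python from fine_tuning_dir and peft_scheme. It assumes the final path component is meant to interpolate peft_scheme like the rest of the path. The function peft_checkpoint_path is a hypothetical helper, and "/results" is only a placeholder for base_results_dir.

```python
from pathlib import PurePosixPath


def peft_checkpoint_path(base_results_dir: str,
                         model_train_name: str = "baichuan2_7b",
                         task_name: str = "squad",
                         peft_scheme: str = "ptuning") -> str:
    """Mirror the interpolation of fine_tuning_dir and restore_from_path."""
    # fine_tuning_dir: ${base_results_dir}/${.model_train_name}/peft_${.task_name}
    fine_tuning_dir = (
        PurePosixPath(base_results_dir) / model_train_name / f"peft_{task_name}"
    )
    # Directory and file share the same stem, suffixed with the PEFT scheme.
    stem = f"megatron_baichuan2_peft_tuning-{peft_scheme}"
    return str(fine_tuning_dir / peft_scheme / stem / "checkpoints" / f"{stem}.nemo")


# "/results" is a placeholder for base_results_dir, not a value from this page.
print(peft_checkpoint_path("/results"))
print(peft_checkpoint_path("/results", peft_scheme="ia3"))
```

Printing the resolved path before launching makes it easy to confirm that the checkpoint produced by fine-tuning exists at the expected location for the chosen scheme.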

Last updated on May 30, 2024.