NeMo Evaluator CLI Reference (nemo-evaluator)#
This document provides a comprehensive reference for the nemo-evaluator
command-line interface, which is the primary way to interact with NeMo Evaluator from the terminal.
Prerequisites#
Container: Use one of the evaluation containers listed in NeMo Evaluator Containers.
Package: Install the CLI with pip:
pip install nemo-evaluator
To run evaluations, you also need to install an evaluation framework package (for example, nvidia-simple-evals):
pip install nvidia-simple-evals
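After installation, a quick way to confirm that both the CLI and the framework package are visible in the current environment is to list the available evaluation types:
# Sanity check: tasks from the installed framework package (here nvidia-simple-evals)
# should appear in the listing
nemo-evaluator ls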
Overview#
The CLI provides a unified interface for managing evaluations and frameworks. It’s built on top of the Python API and provides full feature parity with it.
Command Structure#
nemo-evaluator [command] [options]
Available Commands#
ls - List Available Evaluations#
List all available evaluation types and frameworks.
nemo-evaluator ls
Output Example:
nvidia-simple-evals:
* mmlu_pro
...
human_eval:
* human_eval
run_eval - Run Evaluation#
Execute an evaluation with the specified configuration.
nemo-evaluator run_eval [options]
To see the list of options, run:
nemo-evaluator run_eval --help
Required Options:
--eval_type: Type of evaluation to run
--model_id: Model identifier
--model_url: API endpoint URL
--model_type: Endpoint type (chat, completions, vlm, embedding)
--output_dir: Output directory for results
Optional Options:
--api_key_name: Environment variable name for API key
--run_config: Path to YAML configuration file
--overrides: Comma-separated parameter overrides
--dry_run: Show configuration without running
--debug: Enable debug mode (deprecated, use NEMO_EVALUATOR_LOG_LEVEL)
Example Usage:
# Basic evaluation
nemo-evaluator run_eval \
--eval_type mmlu_pro \
--model_id "meta/llama-3.1-8b-instruct" \
--model_url "https://integrate.api.nvidia.com/v1/chat/completions" \
--model_type chat \
--api_key_name MY_API_KEY \
--output_dir ./results
# With parameter overrides
nemo-evaluator run_eval \
--eval_type mmlu_pro \
--model_id "meta/llama-3.1-8b-instruct" \
--model_url "https://integrate.api.nvidia.com/v1/chat/completions" \
--model_type chat \
--api_key_name MY_API_KEY \
--output_dir ./results \
--overrides "config.params.limit_samples=100,config.params.temperature=0.1"
# Dry run to see configuration
nemo-evaluator run_eval \
--eval_type mmlu_pro \
--model_id "meta/llama-3.1-8b-instruct" \
--model_url "https://integrate.api.nvidia.com/v1/chat/completions" \
--model_type chat \
--api_key_name MY_API_KEY \
--output_dir ./results \
--dry_run
For execution with a run configuration file:
# Using YAML configuration file
nemo-evaluator run_eval \
--eval_type mmlu_pro \
--output_dir ./results \
--run_config ./config/eval_config.yml
To check the structure of the run configuration, refer to the Run Configuration section below.
Run Configuration#
Run configurations are stored in YAML files with the following structure:
config:
  type: mmlu_pro
  params:
    limit_samples: 10
target:
  api_endpoint:
    url: https://integrate.api.nvidia.com/v1/chat/completions
    model_id: meta/llama-3.1-8b-instruct
    type: chat
    api_key: MY_API_KEY
    adapter_config:
      interceptors:
        - name: "request_logging"
        - name: "caching"
          enabled: true
          config:
            cache_dir: "./cache"
        - name: "endpoint"
        - name: "response_logging"
          enabled: true
          config:
            output_dir: "./cache/responses"
A run configuration file is passed to run_eval with the following syntax:
nemo-evaluator run_eval \
--run_config config.yml \
--output_dir `mktemp -d`
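As a sketch, the structure above can be written to a file and executed in one step; eval_config.yml is an arbitrary file name, and MY_API_KEY must match the environment variable named in the api_key field:
# Sketch: write a minimal run configuration and execute it
cat > eval_config.yml <<'EOF'
config:
  type: mmlu_pro
  params:
    limit_samples: 10
target:
  api_endpoint:
    url: https://integrate.api.nvidia.com/v1/chat/completions
    model_id: meta/llama-3.1-8b-instruct
    type: chat
    api_key: MY_API_KEY
EOF

export MY_API_KEY="<your-api-key>"   # environment variable referenced by api_key above
nemo-evaluator run_eval \
  --run_config eval_config.yml \
  --output_dir ./results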
Parameter Overrides#
Parameter overrides use a dot-notation format to specify configuration paths:
# Basic parameter overrides
--overrides "config.params.limit_samples=100,config.params.temperature=0.1"
# Adapter configuration overrides
--overrides "target.api_endpoint.adapter_config.interceptors.0.config.output_dir=./logs"
# Multiple complex overrides
--overrides "config.params.limit_samples=100,config.params.max_tokens=512,target.api_endpoint.adapter_config.use_caching=true"
Override Format#
section.subsection.parameter=value
Examples:
config.params.limit_samples=100
target.api_endpoint.adapter_config.use_caching=true
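For longer override lists, it can help to build the string in a shell variable before passing it to --overrides; a minimal sketch using the documented flags:
# Sketch: assemble the comma-separated override string incrementally
OVERRIDES="config.params.limit_samples=100"
OVERRIDES+=",config.params.temperature=0.1"
OVERRIDES+=",target.api_endpoint.adapter_config.use_caching=true"

nemo-evaluator run_eval \
  --eval_type mmlu_pro \
  --model_id "meta/llama-3.1-8b-instruct" \
  --model_url "https://integrate.api.nvidia.com/v1/chat/completions" \
  --model_type chat \
  --api_key_name MY_API_KEY \
  --output_dir ./results \
  --overrides "$OVERRIDES"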
Handle Errors#
Debug Mode#
Enable debug mode for detailed error information:
# Set environment variable (recommended)
export NEMO_EVALUATOR_LOG_LEVEL=DEBUG
# Or use deprecated debug flag
nemo-evaluator run_eval --debug [options]
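To keep the verbose output for later inspection, the log level can also be set for a single invocation and the console output captured to a file; a sketch (eval_debug.log is an arbitrary file name):
# Sketch: enable debug logging for one run and keep a copy of the console output
NEMO_EVALUATOR_LOG_LEVEL=DEBUG nemo-evaluator run_eval \
  --eval_type mmlu_pro \
  --model_id "meta/llama-3.1-8b-instruct" \
  --model_url "https://integrate.api.nvidia.com/v1/chat/completions" \
  --model_type chat \
  --api_key_name MY_API_KEY \
  --output_dir ./results \
  2>&1 | tee eval_debug.log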
Examples#
Complete Evaluation Workflow#
# 1. List available evaluations
nemo-evaluator ls
# 2. Run evaluation
nemo-evaluator run_eval \
--eval_type mmlu_pro \
--model_id "meta/llama-3.1-8b-instruct" \
--model_url "https://integrate.api.nvidia.com/v1/chat/completions" \
--model_type chat \
--api_key_name MY_API_KEY \
--output_dir ./results \
--overrides "config.params.limit_samples=100"
# 3. Show results
ls -la ./results/
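The exact files written under the output directory depend on the evaluation framework, so a framework-agnostic way to inspect what a run produced is to walk the directory:
# List every file the run produced, without assuming specific file names
find ./results -type f -exec ls -lh {} +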
Batch Evaluation Script#
#!/bin/bash
# Batch evaluation script
models=("meta/llama-3.1-8b-instruct" "meta/llama-3.1-70b-instruct")
eval_types=("mmlu_pro" "gsm8k")

for model in "${models[@]}"; do
  for eval_type in "${eval_types[@]}"; do
    echo "Running $eval_type on $model..."
    nemo-evaluator run_eval \
      --eval_type "$eval_type" \
      --model_id "$model" \
      --model_url "https://integrate.api.nvidia.com/v1/chat/completions" \
      --model_type chat \
      --api_key_name MY_API_KEY \
      --output_dir "./results/${model//\//_}_${eval_type}" \
      --overrides "config.params.limit_samples=50"
    echo "Completed $eval_type on $model"
  done
done

echo "All evaluations completed!"
Framework Development#
# Set up a new framework
nemo-evaluator-example my_custom_eval .
# This creates the basic structure:
# core_evals/my_custom_eval/
# ├── framework.yml
# ├── output.py
# └── __init__.py
# Edit framework.yml to configure your evaluation
# Edit output.py to implement result parsing
# Test your framework
nemo-evaluator run_eval \
--eval_type my_custom_eval \
--model_id "test-model" \
--model_url "https://api.example.com/v1/chat/completions" \
--model_type chat \
--api_key_name MY_API_KEY \
--output_dir ./results
Framework Setup Command#
nemo-evaluator-example - Setup Framework#
Set up NVIDIA framework files in a destination folder.
nemo-evaluator-example [package_name] [destination]
Arguments:
package_name: Python package-like name for the framework
destination: Destination folder in which to create the framework files
Example Usage:
# Set up the framework in a specific directory
nemo-evaluator-example my_package /path/to/destination
# Set up the framework in the current directory
nemo-evaluator-example my_package .
What it creates:
core_evals/my_package/framework.yml - Framework configuration
core_evals/my_package/output.py - Output parsing logic
core_evals/my_package/__init__.py - Package initialization
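Once framework.yml defines at least one evaluation, the new type should show up in the CLI listing; a quick check, assuming the package containing the core_evals directory is installed in the current environment:
# Confirm the new framework and its tasks are discovered by the CLI
nemo-evaluator ls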
Environment Variables#
Logging Configuration#
# Set log level (recommended over --debug flag)
export NEMO_EVALUATOR_LOG_LEVEL=DEBUG
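The variable can also be scoped to a single command instead of being exported for the whole shell session:
# Scope the log level to one invocation only
NEMO_EVALUATOR_LOG_LEVEL=DEBUG nemo-evaluator run_eval --run_config ./config/eval_config.yml --output_dir ./results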