# NeMo Evaluator Launcher CLI Reference (nv-eval)
The NeMo Evaluator Launcher provides a command-line interface for running evaluations, managing jobs, and exporting results. The CLI is available through two commands:
- `nv-eval` (short alias, recommended)
- `nemo-evaluator-launcher` (full command name)
## Global Options
```bash
nv-eval --help     # Show help
nv-eval --version  # Show version information
```
## Commands Overview
| Command | Description |
|---|---|
| `run` | Run evaluations with specified configuration |
| `status` | Check status of jobs or invocations |
| `kill` | Kill a job or invocation |
| `ls` | List tasks or runs |
| `export` | Export evaluation results to various destinations |
| `version` | Show version information |
## `run` - Run Evaluations
Execute evaluations using Hydra configuration management.
### Basic Usage
```bash
# Using example configurations
nv-eval run --config-dir examples --config-name local_llama_3_1_8b_instruct

# With output directory override
nv-eval run --config-dir examples --config-name local_llama_3_1_8b_instruct \
    -o execution.output_dir=/path/to/results
```
### Configuration Options
```bash
# Using a custom config directory
nv-eval run --config-dir my_configs --config-name my_evaluation

# Multiple overrides (Hydra syntax)
nv-eval run --config-dir examples --config-name local_llama_3_1_8b_instruct \
    -o execution.output_dir=results \
    -o target.api_endpoint.model_id=my-model \
    -o +config.params.limit_samples=10
```

The `+` prefix is standard Hydra override syntax for adding a key that is not present in the base configuration.
### Dry Run
Preview the full resolved configuration without executing:
```bash
nv-eval run --config-dir examples --config-name local_llama_3_1_8b_instruct --dry-run
```
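To keep the resolved configuration for review or diffing, you can redirect it to a file (a small sketch, assuming the resolved config is printed to stdout):

```bash
# Capture the resolved configuration for later inspection
nv-eval run --config-dir examples --config-name local_llama_3_1_8b_instruct \
    --dry-run > resolved_config.txt
```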
### Test Runs
Run with limited samples for testing:
```bash
nv-eval run --config-dir examples --config-name local_llama_3_1_8b_instruct \
    -o +config.params.limit_samples=10
```
### Examples by Executor
**Local Execution:**

```bash
nv-eval run --config-dir examples --config-name local_llama_3_1_8b_instruct \
    -o execution.output_dir=./local_results
```

**Slurm Execution:**

```bash
nv-eval run --config-dir examples --config-name slurm_llama_3_1_8b_instruct \
    -o execution.output_dir=/shared/results
```

**Lepton AI Execution:**

```bash
# With model deployment
nv-eval run --config-dir examples --config-name lepton_nim_llama_3_1_8b_instruct

# Using an existing endpoint
nv-eval run --config-dir examples --config-name lepton_none_llama_3_1_8b_instruct
```
## `status` - Check Job Status
Check the status of running or completed evaluations.
### Status Basic Usage
```bash
# Check status of a specific invocation (returns all jobs in that invocation)
nv-eval status abc12345

# Check status of a specific job
nv-eval status abc12345.0

# Output as JSON
nv-eval status abc12345 --json
```
### Output Formats
**Table Format (default):**

```text
Job ID     | Status  | Executor Info | Location
abc12345.0 | running | container123  | <output_dir>/task1/...
abc12345.1 | success | container124  | <output_dir>/task2/...
```
**JSON Format (with `--json` flag):**

```json
[
  {
    "invocation": "abc12345",
    "job_id": "abc12345.0",
    "status": "running",
    "data": {
      "container": "eval-container",
      "output_dir": "/path/to/results"
    }
  },
  {
    "invocation": "abc12345",
    "job_id": "abc12345.1",
    "status": "success",
    "data": {
      "container": "eval-container",
      "output_dir": "/path/to/results"
    }
  }
]
```
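Because the JSON output is machine-readable, simple automation is straightforward. A minimal polling sketch, assuming `jq` is installed and using the `status` field from the example above:

```bash
# Wait until no job in the invocation reports "running", checking every 30s
while nv-eval status abc12345 --json | jq -e 'any(.[]; .status == "running")' > /dev/null; do
    sleep 30
done
echo "All jobs in abc12345 have stopped running"
```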
## `kill` - Kill Jobs
Stop running evaluations.
### Kill Basic Usage
```bash
# Kill an entire invocation
nv-eval kill abc12345

# Kill a specific job
nv-eval kill abc12345.0
```
The command outputs JSON with the results of the kill operation.
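To pretty-print or filter that output, pipe it through standard tools (assuming `jq` is installed):

```bash
# Pretty-print the kill results
nv-eval kill abc12345 | jq .
```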
## `ls` - List Resources
List available tasks or runs.
### List Tasks
```bash
# List all available evaluation tasks
nv-eval ls tasks

# List tasks with JSON output
nv-eval ls tasks --json
```
**Output Format:**

Tasks are displayed grouped by harness and container, showing each task name and its required endpoint type:
```text
===================================================
harness: lm_eval
container: nvcr.io/nvidia/nemo:24.01

task            endpoint_type
---------------------------------------------------
arc_challenge   chat
hellaswag       completions
winogrande      completions
---------------------------------------------------
3 tasks available
===================================================
```
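Since the listing is plain text, ordinary shell filters work on it too. For example, to find tasks by name fragment:

```bash
# Search the task list with grep
nv-eval ls tasks | grep -i hellaswag
```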
### List Runs
```bash
# List recent evaluation runs
nv-eval ls runs

# Limit number of results
nv-eval ls runs --limit 10

# Filter by executor
nv-eval ls runs --executor local

# Filter by date
nv-eval ls runs --since "2024-01-01"
nv-eval ls runs --since "2024-01-01T12:00:00"
```
**Output Format:**

```text
invocation_id  earliest_job_ts      num_jobs  executor  benchmarks
abc12345       2024-01-01T10:00:00  3         local     ifeval,gpqa_diamond,mbpp
def67890       2024-01-02T14:30:00  2         slurm     hellaswag,winogrande
```
## `export` - Export Results
Export evaluation results to various destinations.
### Export Basic Usage
```bash
# Export to local files (JSON format)
nv-eval export abc12345 --dest local --format json

# Export to a specific directory
nv-eval export abc12345 --dest local --format json --output-dir ./results

# Specify a custom filename
nv-eval export abc12345 --dest local --format json --output-filename results.json
```
### Export Options
```bash
# Available destinations
nv-eval export abc12345 --dest local    # Local file system
nv-eval export abc12345 --dest mlflow   # MLflow tracking
nv-eval export abc12345 --dest wandb    # Weights & Biases
nv-eval export abc12345 --dest gsheets  # Google Sheets
nv-eval export abc12345 --dest jet      # JET (internal)

# Format options (local destination only)
nv-eval export abc12345 --dest local --format json
nv-eval export abc12345 --dest local --format csv

# Include logs when exporting
nv-eval export abc12345 --dest local --format json --copy-logs

# Filter metrics by name
nv-eval export abc12345 --dest local --format json --log-metrics score --log-metrics accuracy

# Copy all artifacts (not just the required ones)
nv-eval export abc12345 --dest local --only-required False
```
### Exporting Multiple Invocations
```bash
# Export several runs together
nv-eval export abc12345 def67890 ghi11111 --dest local --format json

# Export several runs with custom output
nv-eval export abc12345 def67890 --dest local --format csv \
    --output-dir ./all-results --output-filename combined.csv
```
### Cloud Exporters
For cloud destinations like MLflow, W&B, and Google Sheets, configure credentials through environment variables or their respective configuration files before using the export command. Refer to each exporter’s documentation for setup instructions.
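As an illustration, the tracking clients these exporters build on read their usual environment variables; the exact variables each exporter honors are listed in its own documentation:

```bash
# Standard client variables for the underlying tools (illustrative values)
export MLFLOW_TRACKING_URI="https://mlflow.example.com"  # MLflow client
export WANDB_API_KEY="..."                               # Weights & Biases client

nv-eval export abc12345 --dest mlflow
```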
## `version` - Version Information
Display version and build information.
```bash
# Show version
nv-eval version

# Alternative
nv-eval --version
```
## Environment Variables
The CLI respects environment variables for logging and task-specific authentication:
| Variable | Description | Default |
|---|---|---|
| `NEMO_EVALUATOR_LOG_LEVEL` | Logging level for the launcher (`DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`) | |
| `LOG_LEVEL` | Alternative log level variable | Uses `NEMO_EVALUATOR_LOG_LEVEL` |
| | Disable credential redaction in logs (set to `1`, `true`, or `yes`) | Not set |
### Task-Specific Environment Variables
Some evaluation tasks require API keys or tokens. These are configured in your evaluation YAML file under `env_vars` and must be set in your shell before running:
```bash
# Set task-specific environment variables
export HF_TOKEN="hf_..."    # For Hugging Face datasets
export API_KEY="nvapi-..."  # For NVIDIA API endpoints

# Run evaluation
nv-eval run --config-dir examples --config-name local_llama_3_1_8b_instruct
```
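In the YAML itself, the mapping might look like the following sketch (hypothetical placement; check the shipped example configs for the exact schema your version uses):

```yaml
# Hypothetical env_vars snippet: each entry names an environment variable
# the evaluation job needs and the shell variable to take its value from
env_vars:
  HF_TOKEN: HF_TOKEN  # read from $HF_TOKEN in your shell at launch time
```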
The specific environment variables required depend on the tasks and endpoints you’re using. Refer to the example configuration files for details on which variables are needed.
## Configuration File Examples
The NeMo Evaluator Launcher includes several example configuration files that demonstrate different use cases. These files are located in the `examples/` directory of the package:
- `local_llama_3_1_8b_instruct.yaml` - Local execution with an existing endpoint
- `local_limit_samples.yaml` - Local execution with limited samples for testing
- `slurm_llama_3_1_8b_instruct.yaml` - Slurm execution with model deployment
- `slurm_no_deployment_llama_3_1_8b_instruct.yaml` - Slurm execution with an existing endpoint
- `lepton_nim_llama_3_1_8b_instruct.yaml` - Lepton AI execution with NIM deployment
- `lepton_vllm_llama_3_1_8b_instruct.yaml` - Lepton AI execution with vLLM deployment
- `lepton_none_llama_3_1_8b_instruct.yaml` - Lepton AI execution with an existing endpoint
To use these examples:
```bash
# Copy an example to your local directory
cp examples/local_llama_3_1_8b_instruct.yaml my_config.yaml

# Edit the configuration as needed, then run with your config
nv-eval run --config-dir . --config-name my_config
```
Refer to the configuration documentation for detailed information on all available configuration options.
## Troubleshooting
### Configuration Issues
**Configuration Errors:**

```bash
# Validate configuration without running
nv-eval run --config-dir examples --config-name my_config --dry-run
```
**Permission Errors:**

```bash
# Check file permissions
ls -la examples/my_config.yaml

# Use absolute paths
nv-eval run --config-dir /absolute/path/to/configs --config-name my_config
```
**Network Issues:**

```bash
# Test endpoint connectivity
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "test", "messages": [{"role": "user", "content": "Hello"}]}'
```
### Debug Mode
```bash
# Set log level to DEBUG for detailed output
export NEMO_EVALUATOR_LOG_LEVEL=DEBUG
nv-eval run --config-dir examples --config-name local_llama_3_1_8b_instruct

# Or use the single-letter shorthand
export NEMO_EVALUATOR_LOG_LEVEL=D
nv-eval run --config-dir examples --config-name local_llama_3_1_8b_instruct

# Logs are written to ~/.nemo-evaluator/logs/
```
### Getting Help
```bash
# Command-specific help
nv-eval run --help
nv-eval export --help
nv-eval ls --help

# General help
nv-eval --help
```
## See Also
- Python API - Programmatic interface
- NeMo Evaluator Launcher Quickstart - Getting started guide
- Executors - Execution backends
- Exporters - Export destinations