NeMo Evaluator Launcher CLI Reference (nemo-evaluator-launcher)#
The NeMo Evaluator Launcher provides a command-line interface for running evaluations, managing jobs, and exporting results. The CLI is available through the `nemo-evaluator-launcher` command.
Global Options#
nemo-evaluator-launcher --help # Show help
nemo-evaluator-launcher --version # Show version information
Commands Overview#
| Command | Description |
|---|---|
| `run` | Run evaluations with specified configuration |
| `status` | Check status of jobs or invocations |
| `info` | Show detailed job(s) information |
| `kill` | Kill a job or invocation |
| `ls` | List tasks or runs |
| `export` | Export evaluation results to various destinations |
| `version` | Show version information |
run - Run Evaluations#
Execute evaluations using Hydra configuration management.
Basic Usage#
# Using example configurations
nemo-evaluator-launcher run --config packages/nemo-evaluator-launcher/examples/local_basic.yaml
# With output directory override
nemo-evaluator-launcher run --config packages/nemo-evaluator-launcher/examples/local_basic.yaml \
-o execution.output_dir=/path/to/results
Configuration Options#
# Using custom config directory
nemo-evaluator-launcher run --config my_configs/my_evaluation.yaml
# Multiple overrides (Hydra syntax)
nemo-evaluator-launcher run --config packages/nemo-evaluator-launcher/examples/local_basic.yaml \
-o execution.output_dir=results \
-o target.api_endpoint.model_id=my-model \
-o +config.params.limit_samples=10
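Conceptually, the overrides above compose onto the base config so that the resolved configuration contains values like the following (a sketch; the surrounding structure depends on the example config, and `limit_samples` uses the Hydra `+` prefix because the key is not present in the base config):

```yaml
execution:
  output_dir: results
target:
  api_endpoint:
    model_id: my-model
config:
  params:
    limit_samples: 10
```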
Config Loading Modes#
The --config-mode parameter controls how configuration files are loaded:
- `hydra` (default): Uses the Hydra configuration system, which handles configuration composition, overrides, and validation.
- `raw`: Loads the config file directly, without Hydra processing. Useful for loading pre-generated complete configuration files.
# Default: Hydra mode (config file is processed by Hydra)
nemo-evaluator-launcher run --config my_config.yaml
# Explicit Hydra mode
nemo-evaluator-launcher run --config my_config.yaml --config-mode=hydra
# Raw mode: load config file directly (bypasses Hydra)
nemo-evaluator-launcher run --config complete_config.yaml --config-mode=raw
Note: When using --config-mode=raw, the --config parameter is required, and -o/--override cannot be used.
Dry Run#
Preview the full resolved configuration without executing:
nemo-evaluator-launcher run --config packages/nemo-evaluator-launcher/examples/local_basic.yaml --dry-run
Test Runs#
Run with limited samples for testing:
nemo-evaluator-launcher run --config packages/nemo-evaluator-launcher/examples/local_basic.yaml \
-o +config.params.limit_samples=10
Task Filtering#
Run only specific tasks from your configuration using the -t flag:
# Run a single task (local_basic.yaml has ifeval, gpqa_diamond, mbpp)
nemo-evaluator-launcher run --config packages/nemo-evaluator-launcher/examples/local_basic.yaml -t ifeval
# Run multiple specific tasks
nemo-evaluator-launcher run --config packages/nemo-evaluator-launcher/examples/local_basic.yaml -t ifeval -t mbpp
# Combine with other options
nemo-evaluator-launcher run --config packages/nemo-evaluator-launcher/examples/local_basic.yaml -t ifeval -t mbpp --dry-run
Notes:
- Tasks must be defined in your configuration file under `evaluation.tasks`.
- If any requested task is not found in the configuration, the command fails with an error listing the available tasks.
- Task filtering preserves all task-specific overrides and `nemo_evaluator_config` settings.
Examples by Executor#
Local Execution:
nemo-evaluator-launcher run --config packages/nemo-evaluator-launcher/examples/local_basic.yaml \
-o execution.output_dir=./local_results
Slurm Execution:
nemo-evaluator-launcher run --config packages/nemo-evaluator-launcher/examples/slurm_vllm_basic.yaml \
-o execution.output_dir=/shared/results
Lepton AI Execution:
# With model deployment
nemo-evaluator-launcher run --config packages/nemo-evaluator-launcher/examples/lepton_nim.yaml
# Using existing endpoint
nemo-evaluator-launcher run --config packages/nemo-evaluator-launcher/examples/lepton_basic.yaml
status - Check Job Status#
Check the status of running or completed evaluations.
Status Basic Usage#
# Check status of specific invocation (returns all jobs in that invocation)
nemo-evaluator-launcher status abc12345
# Check status of specific job
nemo-evaluator-launcher status abc12345.0
# Output as JSON
nemo-evaluator-launcher status abc12345 --json
Output Formats#
Table Format (default):
Job ID | Status | Executor Info | Location
abc12345.0 | running | container123 | <output_dir>/task1/...
abc12345.1 | success | container124 | <output_dir>/task2/...
JSON Format (with `--json` flag):
[
{
"invocation": "abc12345",
"job_id": "abc12345.0",
"status": "running",
"data": {
"container": "eval-container",
"output_dir": "/path/to/results"
}
},
{
"invocation": "abc12345",
"job_id": "abc12345.1",
"status": "success",
"data": {
"container": "eval-container",
"output_dir": "/path/to/results"
}
}
]
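Because `--json` emits machine-readable output, it can be post-processed in scripts. A small sketch that tallies job states, using the sample payload above (field names follow that sample):

```python
import json

# Example payload in the shape printed by `status <invocation> --json`
# (copied from the sample above; values are illustrative).
status_output = """
[
  {"invocation": "abc12345", "job_id": "abc12345.0", "status": "running",
   "data": {"container": "eval-container", "output_dir": "/path/to/results"}},
  {"invocation": "abc12345", "job_id": "abc12345.1", "status": "success",
   "data": {"container": "eval-container", "output_dir": "/path/to/results"}}
]
"""

def summarize(jobs):
    """Count jobs per status, e.g. {'running': 1, 'success': 1}."""
    counts = {}
    for job in jobs:
        counts[job["status"]] = counts.get(job["status"], 0) + 1
    return counts

print(summarize(json.loads(status_output)))  # -> {'running': 1, 'success': 1}
```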
kill - Kill Jobs#
Stop running evaluations.
Kill Basic Usage#
# Kill entire invocation
nemo-evaluator-launcher kill abc12345
# Kill specific job
nemo-evaluator-launcher kill abc12345.0
The command outputs JSON with the results of the kill operation.
ls - List Resources#
List available tasks or runs.
List Tasks#
# List all available evaluation tasks
nemo-evaluator-launcher ls tasks
# List tasks with JSON output
nemo-evaluator-launcher ls tasks --json
Output Format:
Tasks are displayed grouped by harness and container, showing each task name and its required endpoint type:
===================================================
harness: lm_eval
container: nvcr.io/nvidia/nemo:24.01
task endpoint_type
---------------------------------------------------
arc_challenge chat
hellaswag completions
winogrande completions
---------------------------------------------------
3 tasks available
===================================================
List Runs#
# List recent evaluation runs
nemo-evaluator-launcher ls runs
# Limit number of results
nemo-evaluator-launcher ls runs --limit 10
# Filter by executor
nemo-evaluator-launcher ls runs --executor local
# Filter by date
nemo-evaluator-launcher ls runs --since "2024-01-01"
nemo-evaluator-launcher ls runs --since "2024-01-01T12:00:00"
# Filter by retrospective period
# - days
nemo-evaluator-launcher ls runs --since 2d
# - hours
nemo-evaluator-launcher ls runs --since 6h
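The relative forms can be read as "now minus N days/hours". A small illustration of these `--since` semantics (this mirrors the documented behavior, not the launcher's actual implementation):

```python
from datetime import datetime, timedelta

def since_cutoff(value, now=None):
    """Interpret a `--since` value as an absolute cutoff timestamp.

    Supports ISO dates/timestamps ("2024-01-01", "2024-01-01T12:00:00")
    and relative periods ("2d" = 2 days ago, "6h" = 6 hours ago).
    Illustrative only; not the launcher's actual parser.
    """
    now = now or datetime.now()
    if value.endswith("d") and value[:-1].isdigit():
        return now - timedelta(days=int(value[:-1]))
    if value.endswith("h") and value[:-1].isdigit():
        return now - timedelta(hours=int(value[:-1]))
    return datetime.fromisoformat(value)

ref = datetime(2024, 1, 3, 12, 0, 0)
print(since_cutoff("2d", now=ref))           # 2024-01-01 12:00:00
print(since_cutoff("2024-01-01T12:00:00"))   # 2024-01-01 12:00:00
```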
Output Format:
invocation_id earliest_job_ts num_jobs executor benchmarks
abc12345 2024-01-01T10:00:00 3 local ifeval,gpqa_diamond,mbpp
def67890 2024-01-02T14:30:00 2 slurm hellaswag,winogrande
export - Export Results#
Export evaluation results to various destinations.
Export Basic Usage#
# Export to local files (JSON format)
nemo-evaluator-launcher export abc12345 --dest local --format json
# Export to specific directory
nemo-evaluator-launcher export abc12345 --dest local --format json --output-dir ./results
# Specify custom filename
nemo-evaluator-launcher export abc12345 --dest local --format json --output-filename results.json
Export Options#
# Available destinations
nemo-evaluator-launcher export abc12345 --dest local # Local file system
nemo-evaluator-launcher export abc12345 --dest mlflow # MLflow tracking
nemo-evaluator-launcher export abc12345 --dest wandb # Weights & Biases
nemo-evaluator-launcher export abc12345 --dest gsheets # Google Sheets
# Format options (for local destination only)
nemo-evaluator-launcher export abc12345 --dest local --format json
nemo-evaluator-launcher export abc12345 --dest local --format csv
# Include logs when exporting
nemo-evaluator-launcher export abc12345 --dest local --format json --copy-logs
# Filter metrics by name
nemo-evaluator-launcher export abc12345 --dest local --format json --log-metrics score --log-metrics accuracy
# Copy all artifacts (not just required ones)
nemo-evaluator-launcher export abc12345 --dest local --only-required False
Exporting Multiple Invocations#
# Export several runs together
nemo-evaluator-launcher export abc12345 def67890 ghi11111 --dest local --format json
# Export several runs with custom output
nemo-evaluator-launcher export abc12345 def67890 --dest local --format csv \
--output-dir ./all-results --output-filename combined.csv
Cloud Exporters#
For cloud destinations like MLflow, W&B, and Google Sheets, configure credentials through environment variables or their respective configuration files before using the export command. Refer to each exporter’s documentation for setup instructions.
version - Version Information#
Display version and build information.
# Show version
nemo-evaluator-launcher version
# Alternative
nemo-evaluator-launcher --version
Environment Variables#
The CLI respects environment variables for logging and task-specific authentication:
| Variable | Description | Default |
|---|---|---|
| `LOG_LEVEL` | Logging level for the launcher (`DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`) | `INFO` |
|  | Disable credential redaction in logs (set to `1`, `true`, or `yes`) | Not set |
Task-Specific Environment Variables#
Some evaluation tasks require API keys or tokens. These are configured in your evaluation YAML file under `env_vars` and must be set in your shell before running:
# Set task-specific environment variables
export HF_TOKEN="hf_..." # For Hugging Face datasets
export NGC_API_KEY="nvapi-..." # For NVIDIA API endpoints
# Run evaluation
nemo-evaluator-launcher run --config packages/nemo-evaluator-launcher/examples/local_basic.yaml
The specific environment variables required depend on the tasks and endpoints you’re using. Refer to the example configuration files for details on which variables are needed.
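As a pre-flight check, you can verify that the variables your config expects are actually set. A minimal sketch (the variable names are examples; use whatever your config's `env_vars` section references):

```python
import os

def check_required_env(names):
    """Return the subset of `names` that is not set in the environment.

    Illustrative pre-flight helper; the required variable names depend
    on your config's env_vars section (e.g. HF_TOKEN, NGC_API_KEY).
    """
    return [name for name in names if not os.environ.get(name)]

missing = check_required_env(["HF_TOKEN", "NGC_API_KEY"])
if missing:
    print(f"Missing required environment variables: {', '.join(missing)}")
```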
Configuration File Examples#
The NeMo Evaluator Launcher includes several example configuration files that demonstrate different use cases. These files are located in the examples/ directory of the package:
To use these examples:
# Copy an example to your local directory
cp examples/local_basic.yaml my_config.yaml
# Edit the configuration as needed
# Then run with your config
nemo-evaluator-launcher run --config ./my_config.yaml
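For orientation, such a config typically contains the sections referenced by the overrides on this page. The fragment below is a hypothetical sketch (field layout inferred from the `execution.*`, `target.*`, and `evaluation.tasks` paths used above; consult the shipped example files for the authoritative structure):

```yaml
execution:
  output_dir: ./results
target:
  api_endpoint:
    model_id: my-model
evaluation:
  tasks:
    - name: ifeval
    - name: mbpp
```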
Refer to the configuration documentation for detailed information on all available configuration options.
Troubleshooting#
Configuration Issues#
Configuration Errors:
# Validate configuration without running
nemo-evaluator-launcher run --config packages/nemo-evaluator-launcher/examples/my_config.yaml --dry-run
Permission Errors:
# Check file permissions
ls -la examples/my_config.yaml
# Use absolute paths
nemo-evaluator-launcher run --config /absolute/path/to/configs/my_config.yaml
Network Issues:
# Test endpoint connectivity
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "test", "messages": [{"role": "user", "content": "Hello"}]}'
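The same probe can be scripted in Python with only the standard library (a sketch assuming an OpenAI-compatible endpoint at `base_url`, matching the curl command above):

```python
import json
import urllib.request

def build_chat_payload(model="test", content="Hello"):
    """Minimal OpenAI-style chat-completions request body."""
    return {"model": model, "messages": [{"role": "user", "content": content}]}

def chat_probe(base_url, timeout=10):
    """POST a minimal request to the chat endpoint, mirroring the curl
    command above; `base_url` (e.g. http://localhost:8000) is assumed
    to serve an OpenAI-compatible API."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_payload()).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, json.loads(resp.read())
```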
Debug Mode#
# Set log level to DEBUG for detailed output
export LOG_LEVEL=DEBUG
nemo-evaluator-launcher run --config packages/nemo-evaluator-launcher/examples/local_basic.yaml
# Or use single-letter shorthand
export LOG_LEVEL=D
nemo-evaluator-launcher run --config packages/nemo-evaluator-launcher/examples/local_basic.yaml
# Logs are written to ~/.nemo-evaluator/logs/
Getting Help#
# Command-specific help
nemo-evaluator-launcher run --help
nemo-evaluator-launcher info --help
nemo-evaluator-launcher ls --help
nemo-evaluator-launcher export --help
# General help
nemo-evaluator-launcher --help
See Also#
Python API - Programmatic interface
NeMo Evaluator Launcher - Getting started guide
Executors - Execution backends
Exporters - Export destinations