NeMo Evaluator Launcher#
Best for: Most users who want a unified CLI experience
The NeMo Evaluator Launcher provides the simplest way to run evaluations with automated container management, built-in orchestration, and comprehensive result export capabilities.
Prerequisites#
OpenAI-compatible endpoint (hosted or self-deployed)
Docker installed (for local execution)
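If you want to confirm the endpoint is reachable before launching an evaluation, a plain OpenAI-style chat completion request is enough. A minimal sanity check is sketched below; it assumes your key is exported in the NGC_API_KEY environment variable (the same variable referenced by api_key_name later in this guide) and targets the hosted endpoint and model used in the examples that follow — adjust the URL and model name for your own deployment:

# Illustrative check: one chat completion request against the endpoint
curl -s https://integrate.api.nvidia.com/v1/chat/completions \
  -H "Authorization: Bearer $NGC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "meta/llama-3.1-8b-instruct", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 16}'

A JSON response containing a choices field indicates the endpoint is ready for evaluation traffic.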
Quick Start#
# 1. Install the launcher
pip install nemo-evaluator-launcher
# 2. List available benchmarks
nv-eval ls tasks
# 3. Run evaluation against a hosted endpoint
nv-eval run \
  --config-dir examples \
  --config-name local_llama_3_1_8b_instruct \
  -o target.api_endpoint.url=https://integrate.api.nvidia.com/v1/chat/completions \
  -o target.api_endpoint.api_key_name=NGC_API_KEY \
  -o execution.output_dir=./results
# 4. Check status and results
nv-eval status <invocation_id>
Complete Working Example#
Here’s a complete example using NVIDIA Build (build.nvidia.com):
# Run a quick test evaluation with limited samples
nv-eval run \
  --config-dir examples \
  --config-name local_llama_3_1_8b_instruct \
  -o target.api_endpoint.url=https://integrate.api.nvidia.com/v1/chat/completions \
  -o target.api_endpoint.model_id=meta/llama-3.1-8b-instruct \
  -o target.api_endpoint.api_key_name=NGC_API_KEY \
  -o execution.output_dir=./results \
  -o config.params.limit_samples=10
What happens:
Pulls the appropriate evaluation container
Runs the benchmark against your endpoint
Saves results to the specified directory
Provides monitoring and status updates
Key Features#
Automated Container Management#
Automatically pulls and manages evaluation containers
Handles volume mounting for results
No manual Docker commands required
Built-in Orchestration#
Job queuing and parallel execution
Progress monitoring and status tracking
Result Export#
Export to MLflow, Weights & Biases, or local formats
Structured result formatting
Integration with experiment tracking platforms
Configuration Management#
YAML-based configuration system
Override parameters via command line
Template configurations for common scenarios
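The dotted paths passed to -o in the examples above mirror the structure of the underlying YAML configuration. The sketch below is an illustration only — field names are inferred from the overrides used in this guide, not the full schema; the files under the examples/ directory are the authoritative reference:

# Illustrative config sketch; see examples/ for the real schema
target:
  api_endpoint:
    url: https://integrate.api.nvidia.com/v1/chat/completions
    model_id: meta/llama-3.1-8b-instruct
    api_key_name: NGC_API_KEY
execution:
  output_dir: ./results
config:
  params:
    limit_samples: 10

Any field in such a file can still be overridden at run time with -o, as shown in the Quick Start, so a single template configuration can serve several endpoints or models.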
Next Steps#
Explore different evaluation types:
nv-eval ls tasks
Try advanced configurations in the examples/ directory
Export results to your preferred tracking platform
Scale to cluster execution with Slurm or cloud providers
For more advanced control, consider the NeMo Evaluator Core Python API or Container Direct approaches.