# Configuration
The nemo-evaluator-launcher uses Hydra for configuration management, enabling flexible composition and command-line overrides.
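Hydra resolves each entry in the `defaults` list against a config group of the same name, so configs are organized into per-group subdirectories. A sketch of that layout with illustrative file names:

```text
configs/
├── your_config.yaml    # top-level config containing the defaults list
├── execution/
│   └── local.yaml      # selected by "- execution: local"
└── deployment/
    └── none.yaml       # selected by "- deployment: none"
```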
## How It Works
1. **Choose your deployment**: Start with `deployment: none` to use existing endpoints.
2. **Set your execution platform**: Use `execution: local` for development.
3. **Configure your target**: Point to your API endpoint.
4. **Select benchmarks**: Add evaluation tasks.
5. **Test first**: Always use `--dry-run` to verify.
```bash
# Verify configuration
nemo-evaluator-launcher run --config-name your_config --dry-run

# Run evaluation
nemo-evaluator-launcher run --config-name your_config
```
## Basic Structure
Every configuration has four main sections:
```yaml
defaults:
  - execution: local     # Where to run: local, lepton, slurm
  - deployment: none     # How to deploy: none, vllm, sglang, nim, trtllm, generic
  - _self_

execution:
  output_dir: results    # Required: where to save results

target:                  # Required for deployment: none
  api_endpoint:
    model_id: meta/llama-3.1-8b-instruct
    url: https://integrate.api.nvidia.com/v1/chat/completions
    api_key_name: NGC_API_KEY

evaluation:              # Required: what benchmarks to run
  - name: gpqa_diamond
  - name: ifeval
```
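Note that `api_key_name` names the environment variable that holds your API key rather than the key itself, so that variable must be set in the shell before launching. A minimal sketch, assuming that reading of the field:

```bash
# api_key_name: NGC_API_KEY points at this environment variable
export NGC_API_KEY="<your key>"
nemo-evaluator-launcher run --config-name your_config --dry-run
```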
## Deployment Options
Choose how to serve your model for evaluation:
- **none**: Use existing API endpoints such as the NVIDIA API Catalog, OpenAI, or custom deployments. No model deployment needed.
- **vllm**: High-performance LLM serving with advanced parallelism strategies. Best for production workloads and large models.
- **sglang**: Fast serving framework optimized for structured generation and high-throughput inference with efficient memory usage.
- **nim**: NVIDIA-optimized inference microservices with automatic scaling, optimization, and enterprise-grade features.
- **trtllm**: Serving built on NVIDIA TensorRT-LLM for optimized GPU inference.
- **generic**: Deploy models using a fully custom setup.
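Switching backends means selecting a different entry in the `defaults` list and supplying that backend's settings under `deployment`. A sketch for vLLM; the `checkpoint_path` and `served_model_name` fields are illustrative placeholders, not the confirmed schema:

```yaml
defaults:
  - execution: local
  - deployment: vllm    # serve the model with vLLM instead of calling an existing endpoint
  - _self_

deployment:
  # Illustrative fields only -- check the vllm deployment config
  # shipped with the launcher for the exact keys.
  checkpoint_path: /models/llama-3.1-8b-instruct
  served_model_name: meta/llama-3.1-8b-instruct
```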
## Execution Platforms
Choose where to run your evaluations:
- **local**: Docker-based evaluation on your local machine. Perfect for development, testing, and small-scale evaluations.
- **lepton**: Cloud execution with on-demand GPU provisioning. Ideal for production evaluations and scalable workloads.
- **slurm**: HPC cluster execution with resource management. Best for large-scale evaluations and batch processing.
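Because platforms are Hydra config groups, they can also be swapped at the command line. A sketch, assuming the `-o` flag accepts Hydra group overrides as well as value overrides:

```bash
# Re-run the same config on a Slurm cluster instead of locally
nemo-evaluator-launcher run --config-name your_config \
  -o execution=slurm \
  -o execution.output_dir=/scratch/results
```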
## Evaluation Configuration
Configure evaluation tasks, parameter overrides, and environment variables for your benchmarks.
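As a sketch, per-task settings sit alongside each task's `name` entry; the `overrides` and `env_vars` keys below are illustrative and should be checked against the launcher's task schema:

```yaml
evaluation:
  - name: gpqa_diamond
    overrides:
      # Hypothetical parameter override for this task only
      config.params.temperature: 0.0
    env_vars:
      # Forward a token from the local environment to the task
      HF_TOKEN: HF_TOKEN
  - name: ifeval
```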
## Command Line Overrides
Override any configuration value using the `-o` flag:
```bash
# Basic override
nemo-evaluator-launcher run --config-name your_config \
  -o execution.output_dir=my_results

# Multiple overrides
nemo-evaluator-launcher run --config-name your_config \
  -o execution.output_dir=my_results \
  -o target.api_endpoint.url="https://new-endpoint.com/v1/chat/completions"
```
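Overrides compose with `--dry-run`, which makes it easy to inspect the merged configuration before committing compute:

```bash
# Preview the fully merged config with overrides applied
nemo-evaluator-launcher run --config-name your_config \
  -o execution.output_dir=my_results \
  --dry-run
```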