Configuration#

The nemo-evaluator-launcher uses Hydra for configuration management, enabling flexible composition and command-line overrides.

How it Works#

  1. Choose your deployment: Start with deployment: none to use existing endpoints

  2. Set your execution platform: Use execution: local for development

  3. Configure your target: Point to your API endpoint

  4. Select benchmarks: Add evaluation tasks

  5. Test first: Always use --dry-run to verify

# Verify configuration
nemo-evaluator-launcher run --config-name your_config --dry-run

# Run evaluation
nemo-evaluator-launcher run --config-name your_config

Basic Structure#

Every configuration has four main sections:

defaults:
  - execution: local     # Where to run: local, lepton, slurm
  - deployment: none     # How to deploy: none, vllm, sglang, nim, trtllm, generic
  - _self_

execution:
  output_dir: results    # Required: where to save results

target:                  # Required for deployment: none
  api_endpoint:
    model_id: meta/llama-3.1-8b-instruct
    url: https://integrate.api.nvidia.com/v1/chat/completions
    api_key_name: NGC_API_KEY

evaluation:              # Required: what benchmarks to run
  - name: gpqa_diamond
  - name: ifeval
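
A note on the `_self_` entry: Hydra gives later items in the `defaults` list higher precedence, so placing `_self_` last means values written in this file override whatever the selected `execution` and `deployment` groups provide. For example:

```yaml
defaults:
  - execution: local   # the group default may set its own output_dir
  - _self_             # listed last, so this file's values win

execution:
  output_dir: results  # overrides the group default
```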

Deployment Options#

Choose how to serve your model for evaluation:

None (External)

Use existing API endpoints like NVIDIA API Catalog, OpenAI, or custom deployments. No model deployment needed.

None Deployment
vLLM

High-performance LLM serving with advanced parallelism strategies. Best for production workloads and large models.

vLLM Deployment
SGLang

Fast serving framework optimized for structured generation and high-throughput inference with efficient memory usage.

SGLang Deployment
NIM

NVIDIA-optimized inference microservices with automatic scaling, optimization, and enterprise-grade features.

NIM Deployment
TRT-LLM

Optimized inference using engines compiled with NVIDIA TensorRT-LLM. Best for maximum throughput and lowest latency on NVIDIA GPUs.

TensorRT LLM (TRT-LLM) Deployment
Generic

Deploy models using a fully custom setup.

Generic Deployment
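
Switching the deployment backend is a one-line change in the `defaults` list, plus a `deployment` section describing the model to serve. A minimal sketch for vLLM — the `deployment` field names below are illustrative assumptions, so consult the vLLM Deployment page for the actual schema:

```yaml
defaults:
  - execution: local
  - deployment: vllm    # serve the model with vLLM instead of an external endpoint
  - _self_

deployment:
  # Hypothetical keys -- check the vLLM Deployment page for the real schema
  checkpoint_path: meta-llama/Llama-3.1-8B-Instruct
  served_model_name: llama-3.1-8b-instruct
  tensor_parallel_size: 1
```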

Execution Platforms#

Choose where to run your evaluations:

Local

Docker-based evaluation on your local machine. Perfect for development, testing, and small-scale evaluations.

Local Executor
Lepton

Cloud execution with on-demand GPU provisioning. Ideal for production evaluations and scalable workloads.

Lepton Executor
SLURM

HPC cluster execution with resource management. Best for large-scale evaluations and batch processing.

Slurm Executor
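
As with deployments, the executor is selected in the `defaults` list and tuned in the `execution` section. A sketch for SLURM — the resource keys shown are illustrative assumptions; see the Slurm Executor page for the supported fields:

```yaml
defaults:
  - execution: slurm
  - deployment: none
  - _self_

execution:
  output_dir: results
  # Hypothetical resource keys -- see the Slurm Executor page for the real schema
  account: my_account
  partition: gpu
  walltime: "04:00:00"
```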

Evaluation Configuration#

Tasks & Benchmarks

Configure evaluation tasks, parameter overrides, and environment variables for your benchmarks.

Evaluation Configuration
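
Per-task settings typically sit alongside the task name in the `evaluation` list. A sketch — the `overrides` and `env_vars` keys are assumptions here; the Evaluation Configuration page documents the exact fields:

```yaml
evaluation:
  - name: ifeval
    # Hypothetical keys -- see the Evaluation Configuration page for the exact schema
    overrides:
      config.params.limit_samples: 10   # run a small sample while iterating
    env_vars:
      HF_TOKEN: HF_TOKEN                # name of the env var holding your token
```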

Command Line Overrides#

Override any configuration value using the -o flag:

# Basic override
nemo-evaluator-launcher run --config-name your_config \
  -o execution.output_dir=my_results

# Multiple overrides
nemo-evaluator-launcher run --config-name your_config \
  -o execution.output_dir=my_results \
  -o target.api_endpoint.url="https://new-endpoint.com/v1/chat/completions"
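
Overrides combine with `--dry-run`, so you can preview the fully resolved configuration before launching anything:

```shell
# Preview the resolved config with overrides applied, without running
nemo-evaluator-launcher run --config-name your_config \
  -o execution.output_dir=my_results \
  --dry-run
```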