NeMo Evaluator Launcher#

The NeMo Evaluator Launcher is the orchestration layer for running AI model evaluations at scale. Use its unified CLI and programmatic interfaces to discover benchmarks, configure runs, submit jobs, monitor progress, and export results.
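For example, benchmark discovery is a single CLI call. A minimal sketch, assuming an `ls tasks` subcommand for listing available benchmarks (confirm the exact subcommand against the CLI Reference):

```bash
# List the benchmarks the launcher can run.
# Assumption: the subcommand is `ls tasks`; check the CLI Reference if it differs.
nemo-evaluator-launcher ls tasks
```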

Tip

New to evaluation? Start with the NeMo Evaluator Launcher quickstart for a step-by-step walkthrough.

Get Started#

  • Quickstart: Step-by-step guide to install, configure, and run your first evaluation in minutes. An install sketch follows this list.

  • Configuration: Complete configuration schema, examples, and advanced patterns for all use cases.
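A minimal install sketch, assuming the package is published on PyPI under the same name as the CLI (nemo-evaluator-launcher); the Quickstart is the authoritative source for installation details and prerequisites:

```bash
# Install the launcher into the current Python environment.
# Assumption: the PyPI package name matches the CLI name, nemo-evaluator-launcher.
pip install nemo-evaluator-launcher
```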

Execution#

  • Executors: Execute evaluations on your local machine, an HPC cluster (Slurm), or a cloud platform (Lepton AI). A backend-selection sketch follows this list.

  • Local Executor: Docker-based evaluation on your workstation, well suited to development and testing.

  • Slurm Executor: HPC cluster execution with automatic resource management and job scheduling.

  • Lepton Executor: Cloud execution with on-demand GPU provisioning and automatic scaling.
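All three backends are driven by the same run command; the backend is determined by the configuration you select (or by overriding its execution section). A minimal sketch, where the config names in angle brackets are placeholders rather than shipped files, and the flag names should be confirmed against the CLI Reference:

```bash
# Same command shape for every backend; only the selected config changes.
# <...> names are placeholders; list your examples/ directory for real ones.
nemo-evaluator-launcher run --config-dir examples --config-name <local_example>   # workstation (Docker)
nemo-evaluator-launcher run --config-dir examples --config-name <slurm_example>   # Slurm cluster
nemo-evaluator-launcher run --config-dir examples --config-name <lepton_example>  # Lepton AI
```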

Export#

  • Exporters: Export results to MLflow, Weights & Biases, Google Sheets, or local files with a single command; a sketch follows this list.

  • MLflow Export (mlflow): Export evaluation results and metrics to MLflow for experiment tracking.

  • W&B Export (wandb): Integrate with Weights & Biases for advanced visualization and collaboration.

  • Sheets Export (gsheets): Export to Google Sheets for easy sharing and analysis with stakeholders.
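A sketch of the export step, assuming an `export` subcommand that takes the invocation ID printed at launch time and a `--dest` flag matching the exporter keys above (mlflow, wandb, gsheets); exact flags and required credentials are documented on the Exporters page:

```bash
# Push the same results to different destinations by changing --dest.
# Assumption: the flag is named --dest; see the Exporters page for exact usage.
nemo-evaluator-launcher export <invocation_id> --dest mlflow
nemo-evaluator-launcher export <invocation_id> --dest wandb
nemo-evaluator-launcher export <invocation_id> --dest gsheets
```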

References#

  • Python API: Programmatic access for notebooks, automation, and custom evaluation workflows.

  • CLI Reference (nemo-evaluator-launcher): Complete command-line interface documentation with examples and usage patterns; the help sketch after this list shows how to explore it from the terminal.
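For the CLI, built-in help is the quickest way to confirm which subcommands and flags your installed version supports; the Python API reference plays the same role for programmatic use:

```bash
# Top-level help lists the available subcommands.
nemo-evaluator-launcher --help

# Per-subcommand help shows flags and usage patterns (run shown as an example).
nemo-evaluator-launcher run --help
```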

Typical Workflow#

  1. Choose an execution backend (local, Slurm, or Lepton AI)

  2. Select an example configuration from the examples directory

  3. Point it at your model endpoint (OpenAI-compatible API)

  4. Launch the evaluation via the CLI or Python API

  5. Monitor progress and export results to your preferred platform (see the sketch after this list)
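A hedged end-to-end sketch of steps 1 through 5, using a local example configuration; the config name, override keys, and the `status` and `export` subcommands are assumptions based on recent releases, so verify them against the Quickstart and CLI Reference:

```bash
# Steps 1-4: launch an evaluation against an OpenAI-compatible endpoint.
# <...> values are placeholders; override keys are illustrative.
nemo-evaluator-launcher run \
  --config-dir examples \
  --config-name <local_example> \
  -o target.api_endpoint.url=<your_endpoint_url> \
  -o target.api_endpoint.model_id=<your_model_name>

# Step 5: monitor the run, then export results once it finishes.
nemo-evaluator-launcher status <invocation_id>
nemo-evaluator-launcher export <invocation_id> --dest wandb
```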

When to Use the Launcher#

Use the launcher whenever you want:

  • Unified interface for running evaluations across different backends

  • Multi-benchmark coordination with concurrent execution

  • Turnkey reproducibility with saved configurations

  • Easy result export to MLOps platforms and dashboards

  • Production-ready orchestration with monitoring and lifecycle management