NeMo Evaluator Launcher#

The NeMo Evaluator Launcher is the orchestration layer for running AI model evaluations at scale. Use its unified CLI and programmatic interfaces to discover benchmarks, configure runs, submit jobs, monitor progress, and export results.
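For example, benchmark discovery is a single CLI call. A minimal sketch, assuming an `ls tasks` subcommand for listing available benchmarks (confirm the exact subcommand against the CLI Reference):

```bash
# List the benchmarks the launcher can run.
# Assumption: the subcommand is `ls tasks`; check the CLI Reference if it differs.
nemo-evaluator-launcher ls tasks
```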

Tip

New to evaluation? Start with the NeMo Evaluator Launcher quickstart for a step-by-step walkthrough.

Get Started#

  • Quickstart: Step-by-step guide to install, configure, and run your first evaluation in minutes. An install sketch follows this list.

  • Configuration: Complete configuration schema, examples, and advanced patterns for all use cases.
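A minimal install sketch, assuming the package is published on PyPI under the same name as the CLI (nemo-evaluator-launcher); the Quickstart is the authoritative source for installation details and prerequisites:

```bash
# Install the launcher into the current Python environment.
# Assumption: the PyPI package name matches the CLI name, nemo-evaluator-launcher.
pip install nemo-evaluator-launcher
```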

Execution#

  • Executors: Execute evaluations on your local machine, an HPC cluster (Slurm), or a cloud platform (Lepton AI). A backend-selection sketch follows this list.

  • Local Executor: Docker-based evaluation on your workstation, well suited to development and testing.

  • Slurm Executor: HPC cluster execution with automatic resource management and job scheduling.

  • Lepton Executor: Cloud execution with on-demand GPU provisioning and automatic scaling.
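All three backends are driven by the same run command; the backend is determined by the configuration you select (or by overriding its execution section). A minimal sketch, where the config names in angle brackets are placeholders rather than shipped files, and the flag names should be confirmed against the CLI Reference:

```bash
# Same command shape for every backend; only the selected config changes.
# <...> names are placeholders; list your examples/ directory for real ones.
nemo-evaluator-launcher run --config-dir examples --config-name <local_example>   # workstation (Docker)
nemo-evaluator-launcher run --config-dir examples --config-name <slurm_example>   # Slurm cluster
nemo-evaluator-launcher run --config-dir examples --config-name <lepton_example>  # Lepton AI
```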

Export#

  • Exporters: Export results to MLflow, Weights & Biases, Google Sheets, or local files with a single command; a sketch follows this list.

  • MLflow Export (mlflow): Export evaluation results and metrics to MLflow for experiment tracking.

  • W&B Export (wandb): Integrate with Weights & Biases for advanced visualization and collaboration.

  • Sheets Export (gsheets): Export to Google Sheets for easy sharing and analysis with stakeholders.
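A sketch of the export step, assuming an `export` subcommand that takes the invocation ID printed at launch time and a `--dest` flag matching the exporter keys above (mlflow, wandb, gsheets); exact flags and required credentials are documented on the Exporters page:

```bash
# Push the same results to different destinations by changing --dest.
# Assumption: the flag is named --dest; see the Exporters page for exact usage.
nemo-evaluator-launcher export <invocation_id> --dest mlflow
nemo-evaluator-launcher export <invocation_id> --dest wandb
nemo-evaluator-launcher export <invocation_id> --dest gsheets
```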

References#

  • Python API: Programmatic access for notebooks, automation, and custom evaluation workflows.

  • CLI Reference (nemo-evaluator-launcher): Complete command-line interface documentation with examples and usage patterns; the help sketch after this list shows how to explore it from the terminal.
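For the CLI, built-in help is the quickest way to confirm which subcommands and flags your installed version supports; the Python API reference plays the same role for programmatic use:

```bash
# Top-level help lists the available subcommands.
nemo-evaluator-launcher --help

# Per-subcommand help shows flags and usage patterns (run shown as an example).
nemo-evaluator-launcher run --help
```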

Typical Workflow#

  1. Choose an execution backend (local, Slurm, or Lepton AI)

  2. Select an example configuration from the examples directory

  3. Point it at your model endpoint (OpenAI-compatible API)

  4. Launch the evaluation via the CLI or Python API

  5. Monitor progress and export results to your preferred platform (see the sketch after this list)
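A hedged end-to-end sketch of steps 1 through 5, using a local example configuration; the config name, override keys, and the `status` and `export` subcommands are assumptions based on recent releases, so verify them against the Quickstart and CLI Reference:

```bash
# Steps 1-4: launch an evaluation against an OpenAI-compatible endpoint.
# <...> values are placeholders; override keys are illustrative.
nemo-evaluator-launcher run \
  --config-dir examples \
  --config-name <local_example> \
  -o target.api_endpoint.url=<your_endpoint_url> \
  -o target.api_endpoint.model_id=<your_model_name>

# Step 5: monitor the run, then export results once it finishes.
nemo-evaluator-launcher status <invocation_id>
nemo-evaluator-launcher export <invocation_id> --dest wandb
```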

When to Use the Launcher#

Use the launcher whenever you want:

  • Unified interface for running evaluations across different backends

  • Multi-benchmark coordination with concurrent execution

  • Turnkey reproducibility with saved configurations

  • Easy result export to MLOps platforms and dashboards

  • Production-ready orchestration with monitoring and lifecycle management