NeMo Evaluator SDK Documentation#

Welcome to the NeMo Evaluator SDK documentation.


Introduction to NeMo Evaluator SDK#

Discover how NeMo Evaluator SDK works and explore its key features.

About NeMo Evaluator SDK

Explore the NeMo Evaluator Core and Launcher architecture.

Key Features

Discover NeMo Evaluator SDK’s powerful capabilities.

Concepts

Master core concepts powering NeMo Evaluator SDK.

Release Notes

Release notes for the NeMo Evaluator SDK.


Choose a Quickstart#

Select the evaluation approach that best fits your workflow and technical requirements.

Launcher

Use the CLI to orchestrate evaluations with automated container management.

Core

Get direct Python API access with full adapter features, custom configurations, and workflow integration.

Container

Gain full control over the container environment with volume mounting, environment variable management, and integration into Docker-based CI/CD pipelines.


Libraries#

Launcher#

Orchestrate evaluations across different execution backends with unified CLI and programmatic interfaces.

Configuration

Complete configuration schema, examples, and advanced patterns for all use cases.

Executors

Run evaluations on local machines, HPC clusters (Slurm), or cloud platforms (Lepton AI).

Exporters

Export results to MLflow, Weights & Biases, Google Sheets, or local files with one command.

Python API

Programmatic access for notebooks, automation, and custom evaluation workflows.

CLI Reference

Complete documentation for the nemo-evaluator-launcher command-line interface, with examples and usage patterns.

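The Launcher pages above all revolve around a single YAML run configuration that selects an executor, points at a model endpoint, and lists evaluation tasks. The sketch below is illustrative only: the field names (execution, target, evaluation) are assumptions modeled on typical Hydra-style launcher configs, so check the Configuration page for the authoritative schema.

```yaml
# Illustrative launcher run configuration (field names are assumptions;
# see the Configuration page for the authoritative schema).
defaults:
  - execution: local        # local machine; Slurm or Lepton executors also exist
  - deployment: none        # evaluate an already-running endpoint
  - _self_

execution:
  output_dir: results       # where logs and results are written

target:
  api_endpoint:
    url: http://localhost:8000/v1/chat/completions
    model_id: my-model
    api_key_name: API_KEY   # name of the env var holding the key

evaluation:
  tasks:
    - name: mmlu_pro        # one entry per benchmark to run
```

A config like this is what the CLI consumes when orchestrating an evaluation, and the Exporters page then covers pushing the resulting metrics to MLflow, Weights & Biases, Google Sheets, or local files.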

Core#

Access the core evaluation engine directly with containerized benchmarks and flexible adapter architecture.

Workflows

Use the evaluation engine through Python API, containers, or programmatic workflows.

Containers

Ready-to-use evaluation containers with curated benchmarks and frameworks.

Interceptors

Configure request/response interceptors for logging, caching, and custom processing.

Logging

Comprehensive logging setup for evaluation runs, debugging, and audit trails.

Extending

Add custom benchmarks and frameworks by defining configuration and interfaces.

API Reference

Python API documentation for programmatic evaluation control and integration.
