NeMo Evaluator SDK Documentation#

Welcome to the NeMo Evaluator SDK documentation.


Introduction to NeMo Evaluator SDK#

Discover how NeMo Evaluator SDK works and explore its key features.

About NeMo Evaluator SDK

Explore the NeMo Evaluator Core and Launcher architecture.

Key Features

Discover NeMo Evaluator SDK’s powerful capabilities.

Concepts

Master core concepts powering NeMo Evaluator SDK.

Release Notes

Release notes for the NeMo Evaluator SDK.


Choose a Quickstart#

Select the evaluation approach that best fits your workflow and technical requirements.

Launcher

Use the CLI to orchestrate evaluations with automated container management.

Core

Get direct Python API access with full adapter features, custom configurations, and workflow integration.

Container

Gain full control over the container environment with volume mounting, environment variable management, and integration into Docker-based CI/CD pipelines.


Libraries#

Launcher#

Orchestrate evaluations across different execution backends with unified CLI and programmatic interfaces.

Configuration

Complete configuration schema, examples, and advanced patterns for all use cases.

Executors

Run evaluations on local machines, HPC clusters (Slurm), or cloud platforms (Lepton AI).

Exporters

Export results to MLflow, Weights & Biases, Google Sheets, or local files with one command.

Python API

Programmatic access for notebooks, automation, and custom evaluation workflows.

CLI Reference

Complete documentation for the nemo-evaluator-launcher command-line interface, with examples and usage patterns.

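The Launcher pages above all revolve around a single YAML run configuration that selects an executor, points at a model endpoint, and lists evaluation tasks. The sketch below is illustrative only: the field names (execution, target, evaluation) are assumptions modeled on typical Hydra-style launcher configs, so check the Configuration page for the authoritative schema.

```yaml
# Illustrative launcher run configuration (field names are assumptions;
# see the Configuration page for the authoritative schema).
defaults:
  - execution: local        # local machine; Slurm or Lepton executors also exist
  - deployment: none        # evaluate an already-running endpoint
  - _self_

execution:
  output_dir: results       # where logs and results are written

target:
  api_endpoint:
    url: http://localhost:8000/v1/chat/completions
    model_id: my-model
    api_key_name: API_KEY   # name of the env var holding the key

evaluation:
  tasks:
    - name: mmlu_pro        # one entry per benchmark to run
```

A config like this is what the CLI consumes when orchestrating an evaluation, and the Exporters page then covers pushing the resulting metrics to MLflow, Weights & Biases, Google Sheets, or local files.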

Core#

Access the core evaluation engine directly with containerized benchmarks and flexible adapter architecture.

Workflows

Use the evaluation engine through Python API, containers, or programmatic workflows.

Containers

Ready-to-use evaluation containers with curated benchmarks and frameworks.

Interceptors

Configure request/response interceptors for logging, caching, and custom processing.

Logging

Comprehensive logging setup for evaluation runs, debugging, and audit trails.

Extending

Add custom benchmarks and frameworks by defining configuration and interfaces.

API Reference

Python API documentation for programmatic evaluation control and integration.
