# NeMo Evaluator

The Core Evaluation Engine delivers standardized, reproducible AI model evaluation through containerized benchmarks and a flexible adapter architecture.

> **Tip:** Need orchestration? For CLI and multi-backend execution, use the NeMo Evaluator Launcher.

## Get Started

- **Workflows**: Run evaluations using pre-built containers directly, or integrate them through the Python API. See *Workflows*.
- **Containers**: Ready-to-use evaluation containers with curated benchmarks and frameworks. See *NeMo Evaluator Containers*.

## Reference and Customization

- **Interceptors**: Set up interceptors to handle requests, responses, logging, caching, and custom processing. See *Interceptors*.
- **Logging**: Configure comprehensive logging for evaluation runs, debugging, and audit trails. See *Logging Configuration*.
- **Extending**: Add custom benchmarks and frameworks by defining configurations and interfaces. See *Extending NeMo Evaluator*.
- **API Reference**: Python API documentation for programmatic evaluation control and integration. See *API Reference*.
- **CLI Reference**: Command-line interface for direct container and evaluation execution. See *NeMo Evaluator CLI Reference (`nemo-evaluator`)*.