# NeMo Evaluator

The Core Evaluation Engine delivers standardized, reproducible AI model evaluation through containerized benchmarks and a flexible adapter architecture.

> **Tip:** Need orchestration? For CLI and multi-backend execution, use the NeMo Evaluator Launcher.

## Get Started

- **Workflows**: Run evaluations using pre-built containers directly, or integrate them through the Python API. See *Workflows*.
- **Containers**: Ready-to-use evaluation containers with curated benchmarks and frameworks. See *NeMo Evaluator Containers*.

## Reference and Customization

- **Interceptors**: Set up interceptors to handle requests, responses, logging, caching, and custom processing. See *Interceptors*.
- **Logging**: Configure comprehensive logging for evaluation runs, debugging, and audit trails. See *Logging Configuration*.
- **Extending**: Add custom benchmarks and frameworks by defining configurations and interfaces. See *Extending NeMo Evaluator*.
- **API Reference**: Python API documentation for programmatic evaluation control and integration. See *API Reference*.
- **CLI Reference**: Command-line interface for direct container and evaluation execution. See *NeMo Evaluator CLI Reference (`nemo-evaluator`)*.