NeMo Evaluator Containers

NeMo Evaluator provides a collection of specialized containers for different evaluation frameworks and tasks. Each container is optimized and tested to work with the NVIDIA hardware and software stack, providing a consistent, reproducible environment for AI model evaluation.

Container Categories

Language Models

Containers for evaluating large language models across academic benchmarks and custom tasks.

Language Model Containers

Code Generation

Specialized containers for evaluating code generation and programming capabilities.

Code Generation Containers

Vision-Language

Multimodal evaluation containers for vision-language understanding and reasoning.

Vision-Language Containers

Safety & Security

Containers focused on safety evaluation, bias detection, and security testing.

Safety and Security Containers

Specialized Tools

Containers focused on agentic AI capabilities and advanced reasoning.

Specialized Tools Containers

Efficiency

Containers for evaluating the speed of input processing and output generation.

Model Efficiency

Quick Start

Basic Container Usage

# Pull a container
docker pull nvcr.io/nvidia/eval-factory/<container-name>:<tag>

# Example: Pull simple-evals container
docker pull nvcr.io/nvidia/eval-factory/simple-evals:25.10

# Run with GPU support (requires the NVIDIA Container Toolkit)
docker run -it --gpus all nvcr.io/nvidia/eval-factory/<container-name>:<tag>
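
The pull and run steps above can be combined into a small script. The following is a minimal sketch, not an official workflow: the `image_ref` and `run` helpers and the `DRY_RUN` switch are hypothetical names introduced here for illustration, and it assumes that Docker's `--gpus all` flag is how GPUs are exposed to the container. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: compose the image reference, then show (or run) the pull and run commands.
# REGISTRY matches the path used in the commands above; the helpers are illustrative.

REGISTRY="nvcr.io/nvidia/eval-factory"

# Build "<registry>/<container-name>:<tag>" from a name and a tag.
image_ref() {
  printf '%s/%s:%s\n' "$REGISTRY" "$1" "$2"
}

# Print the command; execute it only when DRY_RUN=0 (hypothetical switch).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

IMAGE="$(image_ref simple-evals 25.10)"
run docker pull "$IMAGE"
# --gpus all asks Docker to expose every host GPU to the container.
run docker run --rm -it --gpus all "$IMAGE"
```

Set `DRY_RUN=0` in the environment to execute the commands instead of printing them. Keeping the registry path in one variable makes it easier to switch between containers and tags without retyping the full reference.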

Prerequisites

Docker and NVIDIA Container Toolkit (for GPU support)
NVIDIA GPU (for GPU-accelerated evaluation)
Sufficient disk space for models and datasets
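
The prerequisites listed above can be checked quickly before pulling a container. The following is a rough preflight sketch under stated assumptions: the helper names (`have`, `check_tool`, `free_gib`) are hypothetical, `nvidia-smi` on the host is taken as a sign that the GPU driver is installed, and `nvidia-ctk` is assumed to be the command-line tool installed with the NVIDIA Container Toolkit. It only reports what it finds and does not decide how much disk space is enough.

```shell
#!/usr/bin/env bash
# Preflight sketch: report which prerequisites are visible on this host.
# Helper names are illustrative and not part of NeMo Evaluator.

# True if the given command is on PATH.
have() { command -v "$1" >/dev/null 2>&1; }

# Print a one-line status for a command-line tool.
check_tool() {
  if have "$1"; then
    echo "ok: $1 found"
  else
    echo "missing: $1"
  fi
}

# Free space, in whole GiB, on the filesystem that holds the given path.
free_gib() {
  df -Pk "$1" | awk 'NR==2 { printf "%d\n", $4 / 1024 / 1024 }'
}

check_tool docker       # Docker CLI
check_tool nvidia-ctk   # assumed CLI of the NVIDIA Container Toolkit
check_tool nvidia-smi   # host GPU driver utility
echo "free disk space under $PWD: $(free_gib "$PWD") GiB"
```

Docker stores images in its own data directory, which may sit on a different filesystem than the current directory, so pass that path to `free_gib` if it differs.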

For detailed usage instructions, refer to the CLI Workflows guide.