NeMo Evaluator Containers

NeMo Evaluator provides a collection of specialized containers, each targeting a particular evaluation framework or task. Every container is optimized for and tested against the NVIDIA hardware and software stack, giving you a consistent, reproducible environment for AI model evaluation.

Container Categories

Language Models

Containers for evaluating large language models across academic benchmarks and custom tasks.

Language Model Containers

Code Generation

Specialized containers for evaluating code generation and programming capabilities.

Code Generation Containers

Vision-Language

Multimodal evaluation containers for vision-language understanding and reasoning.

Vision-Language Containers

Safety & Security

Containers focused on safety evaluation, bias detection, and security testing.

Safety and Security Containers

Quick Start

Basic Container Usage

# Pull a container
docker pull nvcr.io/nvidia/eval-factory/<container-name>:<tag>

# Example: Pull simple-evals container
docker pull nvcr.io/nvidia/eval-factory/simple-evals:25.09

# Run with GPU support
docker run --gpus all -it nvcr.io/nvidia/eval-factory/<container-name>:<tag>
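
If you pull and run several containers, the two steps above can be wrapped in a small shell helper. The sketch below is illustrative only: `image_ref` and `run_eval_container` are hypothetical names, not part of NeMo Evaluator, and the `--gpus all` flag assumes the NVIDIA Container Toolkit is installed on the host.

```shell
# Hypothetical helper: build a full image reference from a container name and tag.
image_ref() {
  printf '%s/%s:%s\n' "nvcr.io/nvidia/eval-factory" "$1" "$2"
}

# Hypothetical wrapper: pull the image, then start it interactively.
# --gpus all exposes every host GPU to the container (needs the NVIDIA Container Toolkit).
# --rm removes the stopped container on exit so repeated runs do not accumulate.
run_eval_container() {
  image="$(image_ref "$1" "$2")"
  docker pull "$image" && docker run --rm -it --gpus all "$image"
}
```

For example, `run_eval_container simple-evals 25.09` would pull and start the simple-evals image shown above.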

Prerequisites

  • Docker and NVIDIA Container Toolkit (for GPU support)

  • NVIDIA GPU (for GPU-accelerated evaluation)

  • Sufficient disk space for models and datasets
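
The prerequisites above can be checked quickly from a shell before pulling anything. This is a minimal sketch rather than an official check: the function names are hypothetical, it only confirms that the host driver can see a GPU (it does not verify the NVIDIA Container Toolkit itself), and it reports free disk space without assuming any particular requirement.

```shell
# Return success if the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Free space, in whole GiB, on the filesystem holding the given directory.
free_gib() {
  df -Pk "$1" | awk 'NR==2 { print int($4 / 1048576) }'
}

# Print one status line per prerequisite; return non-zero if Docker is missing.
check_prereqs() {
  status=0
  if have_cmd docker; then
    echo "docker: found"
  else
    echo "docker: missing"
    status=1
  fi
  # nvidia-smi -L lists the GPUs the host driver can see.
  if have_cmd nvidia-smi && nvidia-smi -L >/dev/null 2>&1; then
    echo "gpu: visible to the host driver"
  else
    echo "gpu: not detected (GPU-accelerated evaluation will be unavailable)"
  fi
  echo "free disk in $PWD: $(free_gib .) GiB"
  return "$status"
}

check_prereqs || echo "Install the missing prerequisites before pulling a container."
```

Compare the reported free space against the size of the models and datasets you plan to download.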

For detailed usage instructions, refer to the CLI Workflows guide.