NeMo Evaluator Containers
NeMo Evaluator provides a collection of specialized containers for different evaluation frameworks and tasks. Each container is optimized and tested to work with the NVIDIA hardware and software stack, providing a consistent, reproducible environment for AI model evaluation.
Container Categories
Containers for evaluating large language models across academic benchmarks and custom tasks.
Specialized containers for evaluating code generation and programming capabilities.
Multimodal evaluation containers for vision-language understanding and reasoning.
Containers focused on safety evaluation, bias detection, and security testing.
Quick Start
Basic Container Usage
# Pull a container
docker pull nvcr.io/nvidia/eval-factory/<container-name>:<tag>
# Example: Pull simple-evals container
docker pull nvcr.io/nvidia/eval-factory/simple-evals:25.09
# Run with GPU support (--gpus all requires the NVIDIA Container Toolkit)
docker run --gpus all -it nvcr.io/nvidia/eval-factory/<container-name>:<tag>
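The pull and run steps above can be wrapped in a small script. The following is a minimal sketch, not part of NeMo Evaluator: the function names (`image_ref`, `run_or_print`, `pull_and_run`) and the `DRY_RUN` switch are hypothetical helpers introduced for illustration. It assumes the `nvcr.io/nvidia/eval-factory` registry path and the `simple-evals:25.09` example shown above, and by default only prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch: compose an eval-factory image reference and the docker commands for it.
# REGISTRY matches the path shown above; DRY_RUN is a hypothetical switch for this example.
REGISTRY="nvcr.io/nvidia/eval-factory"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Build "<registry>/<container-name>:<tag>".
image_ref() {
    printf '%s/%s:%s\n' "$REGISTRY" "$1" "$2"
}

# Print the command; execute it only when DRY_RUN=0.
run_or_print() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Pull the image, then start an interactive container with GPU access.
pull_and_run() {
    image="$(image_ref "$1" "$2")"
    run_or_print docker pull "$image" || return 1
    # --gpus all exposes every host GPU; it requires the NVIDIA Container Toolkit.
    run_or_print docker run --rm --gpus all -it "$image"
}

pull_and_run simple-evals 25.09
```

Running the script with `DRY_RUN=0` executes the printed commands. The `--rm` flag is an addition in this sketch that removes the container on exit; drop it if you want to keep the container around.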
Prerequisites
Docker and NVIDIA Container Toolkit (for GPU support)
NVIDIA GPU (for GPU-accelerated evaluation)
Sufficient disk space for models and datasets
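The prerequisites above can be checked before pulling a container. This is a rough sketch under stated assumptions: `have_cmd`, `free_gb`, and `check_prereqs` are hypothetical helpers, and the `MIN_FREE_GB` default of 50 is an arbitrary placeholder rather than a documented requirement. It only checks that `docker` and `nvidia-smi` are on the PATH and that the current directory has free space; it does not verify that the NVIDIA Container Toolkit itself is installed.

```shell
#!/bin/sh
# Sketch: check the prerequisites listed above before pulling a container.
# MIN_FREE_GB is a hypothetical threshold; pick a value that fits your models and datasets.
MIN_FREE_GB="${MIN_FREE_GB:-50}"

# Succeeds if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Free space, in whole GB, on the filesystem that holds the given directory.
free_gb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Report each unmet prerequisite and return non-zero if any check fails.
check_prereqs() {
    status=0
    have_cmd docker || { echo "missing: docker"; status=1; }
    # nvidia-smi ships with the NVIDIA driver; its absence suggests no usable GPU.
    have_cmd nvidia-smi || { echo "missing: nvidia-smi (NVIDIA driver)"; status=1; }
    if [ "$(free_gb .)" -lt "$MIN_FREE_GB" ]; then
        echo "low disk space: less than ${MIN_FREE_GB} GB free in $(pwd)"
        status=1
    fi
    return "$status"
}

check_prereqs || echo "some prerequisites are not met"
```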
For detailed usage instructions, refer to the CLI Workflows guide.