Container Environments
Deploy NeMo Curator in containerized environments for reproducible, scalable data curation pipelines with pre-configured dependencies and optimized runtime settings.
Overview
NeMo Curator provides official Docker containers with all dependencies pre-installed and optimized for production workloads. Containers offer:
- Reproducible Environments: Consistent software stack across development, testing, and production
- Simplified Deployment: No manual dependency installation or environment configuration
- GPU Acceleration: Pre-configured CUDA, cuDNN, and NVIDIA libraries for optimal performance
- Multi-Modal Support: Built-in support for text, image, video, and audio curation
- Cloud-Ready: Compatible with Kubernetes, Docker Swarm, and cloud container orchestrators
When to use containers:
- Production deployments requiring consistency and reliability
- Multi-node cluster processing with identical environments
- CI/CD pipelines for automated data curation workflows
- Quick prototyping without local environment setup
- GPU-accelerated processing in cloud environments
Available Containers
Main NeMo Curator Container
The primary container includes comprehensive support for all curation modalities:
Container image: nvcr.io/nvidia/nemo-curator:{{ container_version }}
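A minimal Python sketch of how the image reference above can be turned into a `docker run` command. The version tag stays a parameter because the page leaves it as a placeholder. The `docker_run_argv` helper and the `/workspace/data` mount point are illustrative assumptions, not documented conventions. `--gpus all` requires the NVIDIA Container Toolkit on the host.

```python
import shlex

# Registry path from the docs; the tag must be supplied by the user
# (the docs leave it as the {{ container_version }} placeholder).
IMAGE_REPO = "nvcr.io/nvidia/nemo-curator"


def image_ref(version: str) -> str:
    """Return the full image reference for a container version tag."""
    if not version:
        raise ValueError("a container version tag is required")
    return f"{IMAGE_REPO}:{version}"


def docker_run_argv(version: str, data_dir: str, use_gpus: bool = True) -> list[str]:
    """Build a `docker run` argument list for an interactive Curator session.

    data_dir is mounted at /workspace/data inside the container; that mount
    point is a hypothetical choice for illustration.
    """
    argv = ["docker", "run", "--rm", "-it"]
    if use_gpus:
        # --gpus needs the NVIDIA Container Toolkit installed on the host.
        argv += ["--gpus", "all"]
    argv += ["-v", f"{data_dir}:/workspace/data"]
    argv.append(image_ref(version))
    return argv


if __name__ == "__main__":
    # "<version>" is a stand-in; substitute a real tag from NGC.
    print(shlex.join(docker_run_argv("<version>", "/data/curation")))
```

Returning an argument list rather than a single string lets the command be passed straight to `subprocess.run` without shell quoting issues.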
Supported modalities:
- ✅ Text curation (CPU/GPU)
- ✅ Image curation (GPU required)
- ✅ Video curation (GPU required, FFmpeg included)
- ✅ Audio curation (GPU required for ASR)
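The GPU requirements listed above can be encoded as a small lookup table that decides whether to pass `--gpus all` when starting the container. This is a hypothetical helper, not part of NeMo Curator. It conservatively treats all audio work as GPU-bound, although the list only states that ASR needs a GPU.

```python
# GPU requirement per modality, transcribed from the list above.
# True = GPU required; False = CPU or GPU workers both supported.
REQUIRES_GPU = {
    "text": False,
    "image": True,
    "video": True,
    # The docs say GPU is required for ASR; this sketch conservatively
    # treats every audio workload as GPU-bound.
    "audio": True,
}


def gpu_flags(modality: str, gpus_available: bool) -> list[str]:
    """Return `docker run` GPU flags for a modality, or raise if unmet."""
    if modality not in REQUIRES_GPU:
        raise ValueError(f"unknown modality: {modality!r}")
    if REQUIRES_GPU[modality] and not gpus_available:
        raise RuntimeError(f"{modality} curation needs a GPU-enabled host")
    # Expose GPUs whenever the host has them, even for CPU-capable text work.
    return ["--gpus", "all"] if gpus_available else []
```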
Pre-installed components:
- NeMo Curator with all optional dependencies ([all] extras)
- CUDA 12.8.1 with cuDNN
- Python 3.12 with the uv package manager
- FFmpeg 8+ with NVENC support (for video processing)
- Ray, Dask, and other distributed computing frameworks
- NVIDIA-optimized Python packages
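To confirm that a running container actually exposes these components, a short check can be run with the container's Python interpreter. This sketch only tests for presence. The import names `nemo_curator`, `ray`, and `dask` are assumed, and it does not verify the CUDA, cuDNN, or NVENC versions.

```python
import importlib.util
import shutil
import sys


def check_environment() -> dict[str, bool]:
    """Report whether the listed components are visible to this interpreter.

    The module names below are assumed import names for these packages.
    """
    report = {
        "python>=3.12": sys.version_info >= (3, 12),
        "ffmpeg on PATH": shutil.which("ffmpeg") is not None,
        "uv on PATH": shutil.which("uv") is not None,
    }
    for module in ("nemo_curator", "ray", "dask"):
        # find_spec locates a top-level module without importing it.
        report[f"import {module}"] = importlib.util.find_spec(module) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{'OK  ' if ok else 'MISS'} {name}")
```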
Curator Environment
Container Build Arguments
The main container accepts these build-time arguments for environment customization:
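The argument names themselves are defined by the Curator Dockerfile. Whatever they are, Docker passes them at build time through repeated `--build-arg NAME=value` flags. The helper below is a hypothetical sketch of assembling that command; `EXAMPLE_ARG` and the `my-curator:custom` tag are placeholders, not real Curator build arguments.

```python
import shlex


def docker_build_argv(build_args: dict[str, str], tag: str, context: str = ".") -> list[str]:
    """Build a `docker build` command passing each entry as --build-arg."""
    argv = ["docker", "build", "-t", tag]
    # Sort for a deterministic command line (useful in CI logs).
    for name, value in sorted(build_args.items()):
        argv += ["--build-arg", f"{name}={value}"]
    argv.append(context)
    return argv


if __name__ == "__main__":
    # Hypothetical argument name; use the names documented for the Dockerfile.
    print(shlex.join(docker_build_argv({"EXAMPLE_ARG": "value"}, "my-curator:custom")))
```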
Environment Usage Examples
Text Curation
Runs in the default container environment, using CPU or GPU workers depending on the processing module.
Image Curation
Requires GPU-enabled workers in the container environment.
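Because image curation needs GPU-enabled workers, it can help to check GPU visibility inside the container before starting a pipeline. This sketch assumes `nvidia-smi` is available in the container (the NVIDIA Container Toolkit typically makes it available when the container is started with `--gpus`) and counts the `GPU N:` lines printed by `nvidia-smi -L`.

```python
import shutil
import subprocess


def visible_gpus() -> list[str]:
    """List GPUs reported by `nvidia-smi -L`, or [] if none are visible."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return []
    # Each physical GPU is reported on its own line starting with "GPU ".
    return [line for line in result.stdout.splitlines() if line.startswith("GPU ")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus:
        print(f"{len(gpus)} GPU(s) visible")
    else:
        print("No GPUs visible; was the container started with --gpus all?")
```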