Container Environments#
Deploy NeMo Curator in containerized environments for reproducible, scalable data curation pipelines with pre-configured dependencies and optimized runtime settings.
Overview#
NeMo Curator provides official Docker containers with all dependencies pre-installed and optimized for production workloads. Containers offer:
Reproducible Environments: Consistent software stack across development, testing, and production
Simplified Deployment: No manual dependency installation or environment configuration
GPU Acceleration: Pre-configured CUDA, cuDNN, and NVIDIA libraries for optimal performance
Multi-Modal Support: Built-in support for text, image, video, and audio curation
Cloud-Ready: Compatible with Kubernetes, Docker Swarm, and cloud container orchestrators
When to use containers:
Production deployments requiring consistency and reliability
Multi-node cluster processing with identical environments
CI/CD pipelines for automated data curation workflows
Quick prototyping without local environment setup
GPU-accelerated processing in cloud environments
Available Containers#
Main NeMo Curator Container#
The primary container includes comprehensive support for all curation modalities:
Container registry: nvcr.io/nvidia/nemo-curator:25.09
Supported modalities:
✅ Text curation (CPU/GPU)
✅ Image curation (GPU required)
✅ Video curation (GPU required, FFmpeg included)
✅ Audio curation (GPU required for ASR)
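As a rough sketch of how the modality requirements above might translate into a launch command, the helper below assembles a `docker run` invocation for the image tag listed on this page and adds `--gpus all` only when the chosen modality needs a GPU. The function name, the `MODALITY_NEEDS_GPU` mapping, and the mount paths are illustrative assumptions, not part of NeMo Curator. Text curation is treated as CPU-only by default here, although it can also run on GPU.

```python
import shlex

# Image tag as listed on this page.
IMAGE = "nvcr.io/nvidia/nemo-curator:25.09"

# Assumption: a simplified reading of the modality list above.
# Text can run on CPU or GPU; audio needs a GPU for ASR.
MODALITY_NEEDS_GPU = {"text": False, "image": True, "video": True, "audio": True}


def build_run_command(modality, host_data_dir, container_data_dir="/data", force_gpu=False):
    """Return a `docker run` argument list for the given curation modality."""
    if modality not in MODALITY_NEEDS_GPU:
        raise ValueError(f"unknown modality: {modality!r}")
    cmd = ["docker", "run", "--rm"]
    if force_gpu or MODALITY_NEEDS_GPU[modality]:
        # Requires the NVIDIA Container Toolkit on the host.
        cmd += ["--gpus", "all"]
    # Hypothetical mount: expose a host data directory inside the container.
    cmd += ["-v", f"{host_data_dir}:{container_data_dir}", IMAGE]
    return cmd


# Print the command instead of running it, so it can be reviewed first.
print(shlex.join(build_run_command("video", "/path/to/data")))
```

Pass `force_gpu=True` to request GPUs for a text pipeline that uses GPU modules.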
Pre-installed components:
NeMo Curator with all optional dependencies ([all] extras)
CUDA 12.8.1 with cuDNN
Python 3.12 with uv package manager
FFmpeg 7+ with NVENC support (for video processing)
Ray, Dask, and distributed computing frameworks
NVIDIA optimized Python packages
Curator Environment#
| Property | Value |
|---|---|
| Python Version | 3.12 |
| CUDA Version | 12.8.1 (configurable) |
| Operating System | Ubuntu 24.04 (configurable) |
| Base Image | |
| Package Manager | uv (Ultrafast Python package installer) |
| Installation | NeMo Curator installed with all optional dependencies ( |
| Environment Path | Virtual environment activated by default: |
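One way to confirm these properties from inside a running container is a short check script. The sketch below uses only the Python standard library; the function name `check_environment` and the set of checks are assumptions chosen for illustration, not an official NeMo Curator utility.

```python
import shutil
import sys


def check_environment():
    """Report whether the interpreter matches the environment described above."""
    return {
        # The page lists Python 3.12 as the container's Python version.
        "python_3_12": sys.version_info[:2] == (3, 12),
        # In an activated virtual environment, sys.prefix differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # The uv package manager should be discoverable on PATH.
        "uv_on_path": shutil.which("uv") is not None,
    }


for name, ok in check_environment().items():
    print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Outside the container some of these checks are expected to fail; the script only reports what it finds and changes nothing.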
Container Build Arguments#
The main container accepts these build-time arguments for environment customization:
| Argument | Default | Description |
|---|---|---|
| | | CUDA version |
| | | Base OS version |
| | | Curator environment type |
| | | InternVideo commit hash for video curation |
| | | NVIDIA build identifier |
| | - | NVIDIA build reference |
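Build-time arguments of this kind are normally passed to `docker build` with `--build-arg NAME=VALUE`. The sketch below assembles such a command from a dictionary. The argument names are not reproduced on this page, so `EXAMPLE_ARG` is a placeholder; the image tag, Dockerfile path, and build context are likewise hypothetical and should be replaced with values from the actual Dockerfile.

```python
import shlex


def build_image_command(tag, build_args, dockerfile="Dockerfile", context="."):
    """Return a `docker build` argument list with one --build-arg per entry."""
    cmd = ["docker", "build", "-t", tag, "-f", dockerfile]
    # Sort for a stable, reproducible command line.
    for name, value in sorted(build_args.items()):
        cmd += ["--build-arg", f"{name}={value}"]
    cmd.append(context)
    return cmd


# EXAMPLE_ARG is a placeholder, not a real argument of the container build.
command = build_image_command("nemo-curator:custom", {"EXAMPLE_ARG": "value"})
print(shlex.join(command))
```

Arguments left out of the dictionary fall back to the defaults declared in the Dockerfile.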
Environment Usage Examples#
Text Curation#
Uses the default container environment with CPU or GPU workers depending on the module.
Image Curation#
Requires GPU-enabled workers in the container environment.
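Before starting a GPU-dependent pipeline, it can help to confirm that the container actually sees a GPU. The following is a minimal sketch that calls `nvidia-smi -L` (which lists detected GPUs, one per line) and returns an empty list when the tool is missing or fails. The helper name `visible_gpus` is an assumption for illustration only.

```python
import shutil
import subprocess


def visible_gpus():
    """Return the GPU description lines reported by `nvidia-smi -L`, or []."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    # Each detected device is printed on a line starting with "GPU <index>:".
    return [line.strip() for line in result.stdout.splitlines() if line.startswith("GPU ")]


gpus = visible_gpus()
print(f"{len(gpus)} GPU(s) visible")
```

If this reports zero GPUs inside the container, a likely cause is that the container was started without GPU access (for example, without `--gpus all`).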