Deploy NeMo Curator in containerized environments for reproducible, scalable data curation pipelines with pre-configured dependencies and optimized runtime settings.
Overview
NeMo Curator provides official Docker containers with all dependencies pre-installed and optimized for production workloads. Containers offer:
Reproducible Environments: Consistent software stack across development, testing, and production
Simplified Deployment: No manual dependency installation or environment configuration
GPU Acceleration: Pre-configured CUDA, cuDNN, and NVIDIA libraries for optimal performance
Multi-Modal Support: Built-in support for text, image, video, and audio curation
Cloud-Ready: Compatible with Kubernetes, Docker Swarm, and cloud container orchestration platforms
When to use containers:
Production deployments requiring consistency and reliability
Multi-node cluster processing with identical environments
CI/CD pipelines for automated data curation workflows
Quick prototyping without local environment setup
GPU-accelerated processing in cloud environments
Available Containers
Main NeMo Curator Container
The primary container includes comprehensive support for all curation modalities:
NeMo Curator with all optional dependencies ([all] extras)
CUDA 12.8.1 with cuDNN
Python 3.12 with uv package manager
FFmpeg 8+ with NVENC support (for video processing)
Ray, Dask, and other distributed computing frameworks
NVIDIA-optimized Python packages
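You can confirm that a running container exposes the components listed above with a small Python check. Run it from inside the environment after `source /opt/venv/env.sh`. This is a minimal sketch, not part of NeMo Curator. The helper name `check_components` is hypothetical. The probed names (`ray`, `dask`, `ffmpeg`) are assumptions chosen to mirror the list. The check tests only for presence, not for versions or NVENC/GPU support.

```python
import importlib.util
import shutil
import sys


def check_components(modules, binaries):
    """Return {name: bool} showing which Python modules and executables are available.

    Hypothetical helper: it only checks presence, not versions or GPU support.
    """
    results = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        results[name] = importlib.util.find_spec(name) is not None
    for name in binaries:
        # shutil.which searches PATH the same way a shell would.
        results[name] = shutil.which(name) is not None
    return results


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
    for name, ok in check_components(["ray", "dask"], ["ffmpeg"]).items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```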
Curator Environment
| Property | Value |
| --- | --- |
| Python Version | 3.12 |
| CUDA Version | 12.8.1 (configurable) |
| Operating System | Ubuntu 24.04 (configurable) |
| Base Image | `nvidia/cuda:${CUDA_VER}-cudnn-devel-${LINUX_VER}` |
| Package Manager | uv (ultrafast Python package installer) |
| Installation | NeMo Curator installed with all optional dependencies (`[all]` extras) using uv with the NVIDIA package index |
| Environment Path | Virtual environment at `/opt/venv`. Activate it with `source /opt/venv/env.sh` after entering the container. |
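The Base Image value is a template filled in from the `CUDA_VER` and `LINUX_VER` build arguments. Python's `string.Template` uses the same `${NAME}` placeholder syntax, so you can preview the resolved image reference as shown below. This is an illustrative sketch. The defaults come from the container's build-argument defaults (`12.8.1` and `ubuntu24.04`). `resolve_base_image` is a hypothetical helper name, and the sketch does not check whether a given tag exists on the registry.

```python
from string import Template

# Base image template as listed in the Curator Environment table.
BASE_IMAGE_TEMPLATE = Template("nvidia/cuda:${CUDA_VER}-cudnn-devel-${LINUX_VER}")

# Default values of the CUDA_VER and LINUX_VER build arguments.
DEFAULT_ARGS = {"CUDA_VER": "12.8.1", "LINUX_VER": "ubuntu24.04"}


def resolve_base_image(**overrides):
    """Return the base image reference for the given build-argument overrides (hypothetical helper)."""
    args = {**DEFAULT_ARGS, **overrides}
    # substitute() raises KeyError if any placeholder is left without a value.
    return BASE_IMAGE_TEMPLATE.substitute(args)


print(resolve_base_image())  # -> nvidia/cuda:12.8.1-cudnn-devel-ubuntu24.04
```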
Security Hardening
The container build includes the following security measures:
ray_dist.jar removal: Ray’s Java support JAR is deleted during the build to remove a bundled jackson-core library affected by GHSA-72hv-8253-57qq (DoS via async JSON parser). NeMo Curator does not use Ray’s Java support, so this has no functional impact. A build-time verification guard fails the build if the JAR is not successfully removed.
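A check that walks the installed files and fails if any `ray_dist.jar` remains can approximate the verification guard described above. The sketch below is an assumption about how such a guard might look, not the container's actual build step. The function names `find_ray_jars` and `main` are hypothetical, and so is the choice to scan the virtual environment root (`/opt/venv`).

```python
import sys
from pathlib import Path


def find_ray_jars(root):
    """Return every ray_dist.jar found under root, sorted (hypothetical guard helper)."""
    return sorted(Path(root).rglob("ray_dist.jar"))


def main(root="/opt/venv"):
    root_path = Path(root)
    if not root_path.is_dir():
        # Refuse to report success when there is nothing to scan.
        print(f"{root} is not a directory", file=sys.stderr)
        return 2
    leftovers = find_ray_jars(root_path)
    if leftovers:
        for path in leftovers:
            print(f"ray_dist.jar still present: {path}", file=sys.stderr)
        # A non-zero exit status makes a Dockerfile RUN step fail the build.
        return 1
    print("ray_dist.jar not found; removal verified")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```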
Container Build Arguments
The main container accepts these build-time arguments for environment customization:
| Argument | Default | Description |
| --- | --- | --- |
| `CUDA_VER` | `12.8.1` | CUDA version |
| `LINUX_VER` | `ubuntu24.04` | Base OS version |
| `CURATOR_ENV` | `ci` | Curator environment type |
| `NVIDIA_BUILD_ID` | `<unknown>` | NVIDIA build identifier |
| `NVIDIA_BUILD_REF` | - | NVIDIA build reference |
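You pass build arguments to `docker build` with one `--build-arg NAME=VALUE` flag per override. Any argument you omit keeps its default from the table. The helper below assembles such a command in Python. It is a sketch under assumptions: the Dockerfile path, image tag, and build context (`docker/Dockerfile`, `nemo-curator:local`, `.`) are hypothetical placeholders, not paths or tags taken from the NeMo Curator repository.

```python
import shlex

# Build arguments listed in the Container Build Arguments table.
KNOWN_BUILD_ARGS = {"CUDA_VER", "LINUX_VER", "CURATOR_ENV", "NVIDIA_BUILD_ID", "NVIDIA_BUILD_REF"}


def docker_build_command(overrides, dockerfile="docker/Dockerfile", tag="nemo-curator:local", context="."):
    """Return a docker build argv; the dockerfile, tag, and context defaults are hypothetical."""
    unknown = set(overrides) - KNOWN_BUILD_ARGS
    if unknown:
        raise ValueError(f"unknown build arguments: {sorted(unknown)}")
    cmd = ["docker", "build", "-f", dockerfile, "-t", tag]
    for name, value in sorted(overrides.items()):
        # Only overridden values are passed; the rest fall back to the Dockerfile defaults.
        cmd += ["--build-arg", f"{name}={value}"]
    cmd.append(context)
    return cmd


cmd = docker_build_command({"CUDA_VER": "12.8.1", "CURATOR_ENV": "ci"})
print(shlex.join(cmd))
```

Validating names against the documented arguments catches typos early. Without that check, Docker only warns about a `--build-arg` that the Dockerfile never consumes.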
Environment Usage Examples
Text Curation
Uses the default container environment with CPU or GPU workers depending on the module.
Image Curation
Requires GPU-enabled workers in the container environment.
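In practice, the two cases differ in whether the container starts with GPU access. With the NVIDIA Container Toolkit installed on the host, `docker run --gpus all` exposes the host GPUs to the container. The helper below is a hypothetical sketch. The image tag `nemo-curator:local` and the script names are placeholders, and the sketch assumes the script is already available in the container's working directory (for example, through a bind mount). Defaulting text curation to no GPU simplifies "CPU or GPU workers depending on the module"; pass `use_gpu=True` for GPU-backed text stages.

```python
import shlex

# Modalities that, per this page, need GPU-enabled workers.
GPU_REQUIRED = {"image"}


def docker_run_command(modality, script, image="nemo-curator:local", use_gpu=None):
    """Return a docker run argv for a curation script (image tag and script are hypothetical)."""
    if use_gpu is None:
        # Default: request GPUs only where this page says they are required.
        use_gpu = modality in GPU_REQUIRED
    if modality in GPU_REQUIRED and not use_gpu:
        raise ValueError(f"{modality} curation requires GPU-enabled workers")
    cmd = ["docker", "run", "--rm"]
    if use_gpu:
        cmd += ["--gpus", "all"]
    # Activate the bundled virtual environment before running the script.
    inner = f"source /opt/venv/env.sh && python {shlex.quote(script)}"
    cmd += [image, "bash", "-c", inner]
    return cmd


print(shlex.join(docker_run_command("text", "curate_text.py")))
print(shlex.join(docker_run_command("image", "curate_images.py")))
```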