> Reference documentation for container environments, configurations, and deployment variables in NeMo Curator

# Container Environments

Deploy NeMo Curator in containerized environments for reproducible, scalable data curation pipelines with pre-configured dependencies and optimized runtime settings.

## Overview

NeMo Curator provides official Docker containers with all dependencies pre-installed and optimized for production workloads. Containers offer:

* **Reproducible Environments**: Consistent software stack across development, testing, and production
* **Simplified Deployment**: No manual dependency installation or environment configuration
* **GPU Acceleration**: Pre-configured CUDA, cuDNN, and NVIDIA libraries for optimal performance
* **Multi-Modal Support**: Built-in support for text, image, video, and audio curation
* **Cloud-Ready**: Compatible with Kubernetes, Docker Swarm, and cloud container orchestrators

**When to use containers:**

* Production deployments requiring consistency and reliability
* Multi-node cluster processing with identical environments
* CI/CD pipelines for automated data curation workflows
* Quick prototyping without local environment setup
* GPU-accelerated processing in cloud environments

## Available Containers

### Main NeMo Curator Container

The primary container includes comprehensive support for all curation modalities:

**Container registry:** `nvcr.io/nvidia/nemo-curator:{{ container_version }}`

**Supported modalities:**

* ✅ Text curation (CPU/GPU)
* ✅ Image curation (GPU required)
* ✅ Video curation (GPU required, FFmpeg included)
* ✅ Audio curation (GPU required for ASR)

**Pre-installed components:**

* NeMo Curator with all optional dependencies (`[all]` extras)
* CUDA 12.8.1 with cuDNN
* Python 3.12 with uv package manager
* FFmpeg 8+ with NVENC support (for video processing)
* Ray, Dask, and other distributed computing frameworks
* NVIDIA-optimized Python packages

### Curator Environment

| Property         | Value                                                                                                     |
| ---------------- | --------------------------------------------------------------------------------------------------------- |
| Python Version   | 3.12                                                                                                      |
| CUDA Version     | 12.8.1 (configurable)                                                                                     |
| Operating System | Ubuntu 24.04 (configurable)                                                                               |
| Base Image       | `nvidia/cuda:${CUDA_VER}-cudnn-devel-${LINUX_VER}`                                                        |
| Package Manager  | uv (Ultrafast Python package installer)                                                                   |
| Installation     | NeMo Curator installed with all optional dependencies (`[all]` extras) using uv with NVIDIA index         |
| Environment Path | Virtual environment at `/opt/venv`. Activate with `source /opt/venv/env.sh` after entering the container. |
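The activation step in the table can be sketched as a small helper that assembles a `docker run` command which sources `/opt/venv/env.sh` before running a command in the container. This is only an illustration: the function name `venv_run_command` is hypothetical, `<tag>` is a placeholder for a published container version, and the sketch assumes the image provides `bash` and that its entrypoint does not intercept the command.

```python
import shlex


def venv_run_command(image: str, inner: str) -> list[str]:
    """Build a `docker run` argv that activates /opt/venv before running `inner`.

    Illustrative helper, not part of NeMo Curator. `image` is whatever
    image reference you pulled.
    """
    # Source the documented activation script, then run the requested command.
    script = f"source /opt/venv/env.sh && {inner}"
    return ["docker", "run", "--rm", image, "bash", "-c", script]


if __name__ == "__main__":
    # "<tag>" is a placeholder; replace it with a real container version.
    cmd = venv_run_command("nvcr.io/nvidia/nemo-curator:<tag>", "python --version")
    print(shlex.join(cmd))
```

Printing the command rather than executing it keeps the sketch safe to run on a machine without Docker; pass the list to `subprocess.run` once the image reference is filled in.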

***

## Security Hardening

The container build includes the following security measures:

* **`ray_dist.jar` removal**: Ray's Java support JAR is deleted during the build to remove a bundled jackson-core library affected by [GHSA-72hv-8253-57qq](https://github.com/advisories/GHSA-72hv-8253-57qq) (DoS via async JSON parser). NeMo Curator does not use Ray's Java support, so this has no functional impact. A build-time verification guard fails the build if the JAR is not successfully removed.
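
A guard of this kind can be approximated with a short check that fails when any `ray_dist.jar` is still present. The sketch below is not the actual build step, whose implementation is not shown here; the function names are hypothetical, and scanning `/opt/venv` is an assumption based on the virtual environment path listed for the container.

```python
from pathlib import Path


def find_ray_dist_jars(root: str) -> list[Path]:
    """Return every file named ray_dist.jar found under `root`."""
    return sorted(p for p in Path(root).rglob("ray_dist.jar") if p.is_file())


def assert_jar_removed(root: str) -> None:
    """Raise if any ray_dist.jar remains, mimicking a fail-the-build guard."""
    leftovers = find_ray_dist_jars(root)
    if leftovers:
        raise RuntimeError(f"ray_dist.jar still present: {leftovers}")


if __name__ == "__main__":
    # Assumption: Ray is installed inside the container's virtual environment.
    assert_jar_removed("/opt/venv")
    print("ok: no ray_dist.jar found")
```

Run inside the container, a check like this gives a quick confirmation that the JAR is absent from the scanned directory.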

***

## Container Build Arguments

The main container accepts these build-time arguments for environment customization:

| Argument           | Default       | Description              |
| ------------------ | ------------- | ------------------------ |
| `CUDA_VER`         | `12.8.1`      | CUDA version             |
| `LINUX_VER`        | `ubuntu24.04` | Base OS version          |
| `CURATOR_ENV`      | `ci`          | Curator environment type |
| `NVIDIA_BUILD_ID`  | `<unknown>`   | NVIDIA build identifier  |
| `NVIDIA_BUILD_REF` | -             | NVIDIA build reference   |
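
To make the effect of `CUDA_VER` and `LINUX_VER` concrete, the sketch below substitutes them into the documented base image template and assembles a matching `docker build` command. The helper names, the Dockerfile path, and the override value `12.9.0` are hypothetical and used only for illustration; `--build-arg` is the standard Docker flag for passing build-time arguments.

```python
import shlex
from string import Template

# Defaults copied from the build-argument table.
DEFAULTS = {"CUDA_VER": "12.8.1", "LINUX_VER": "ubuntu24.04", "CURATOR_ENV": "ci"}

# Base image template as documented for the container.
BASE_IMAGE = Template("nvidia/cuda:${CUDA_VER}-cudnn-devel-${LINUX_VER}")


def resolve_base_image(**overrides: str) -> str:
    """Fill the base image template with defaults plus any overrides."""
    return BASE_IMAGE.substitute({**DEFAULTS, **overrides})


def docker_build_command(dockerfile: str, context: str, **overrides: str) -> list[str]:
    """Build a `docker build` argv with one --build-arg per override."""
    cmd = ["docker", "build", "-f", dockerfile]
    for name, value in overrides.items():
        cmd += ["--build-arg", f"{name}={value}"]
    return cmd + [context]


if __name__ == "__main__":
    print(resolve_base_image())
    # Hypothetical Dockerfile path and CUDA override, for illustration only.
    print(shlex.join(docker_build_command("path/to/Dockerfile", ".", CUDA_VER="12.9.0")))
```

With the defaults, the template resolves to `nvidia/cuda:12.8.1-cudnn-devel-ubuntu24.04`. Whether a given override corresponds to a published base image tag has to be checked separately.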

***

## Environment Usage Examples

### Text Curation

Text curation uses the default container environment, with CPU or GPU workers depending on the module.

### Image Curation

Image curation requires GPU-enabled workers in the container environment.
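
The difference between the two modes can be sketched as a helper that adds Docker's `--gpus all` flag only when GPU workers are needed. The function name `docker_run_args` and the `<tag>` placeholder are hypothetical, and the GPU path assumes the host has the NVIDIA Container Toolkit installed so that Docker can expose GPUs to the container.

```python
import shlex


def docker_run_args(image: str, command: list[str], gpus: bool) -> list[str]:
    """Build a `docker run` argv, exposing all host GPUs when `gpus` is True."""
    args = ["docker", "run", "--rm"]
    if gpus:
        # Standard Docker flag; requires the NVIDIA Container Toolkit on the host.
        args += ["--gpus", "all"]
    return args + [image] + command


if __name__ == "__main__":
    image = "nvcr.io/nvidia/nemo-curator:<tag>"  # placeholder tag
    # CPU-only run, as may suffice for some text curation modules.
    print(shlex.join(docker_run_args(image, ["python", "--version"], gpus=False)))
    # GPU run, as required for image curation.
    print(shlex.join(docker_run_args(image, ["nvidia-smi"], gpus=True)))
```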