> Complete installation guide for NeMo Curator with system requirements, package extras, verification steps, and troubleshooting

# Install (All Modalities)

This guide covers installing NeMo Curator with support for **all modalities** and verifying your installation is working correctly. For a single-modality install or a 30-minute walkthrough, start with one of the [modality quickstarts](/get-started) instead.

## Before You Start

### System Requirements

For comprehensive system requirements and production deployment specifications, refer to [Production Deployment Requirements](/admin/deployment/requirements).

**Quick Start Requirements:**

* **OS**: Ubuntu 24.04/22.04/20.04 (recommended)
* **Python**: 3.10, 3.11, or 3.12
* **Memory**: 16GB+ RAM for basic text processing
* **GPU** (optional): NVIDIA GPU with 16GB+ VRAM for acceleration
* **CUDA 12** (required for `audio_cuda12`, `video_cuda12`, `image_cuda12`, and `text_cuda12` extras)

**Python 3.10 support will be removed in NeMo Curator 26.06.** 26.04 is the last release to support Python 3.10. If you are setting up a new environment, install a newer supported Python version (3.11+) so you do not need to upgrade when moving to 26.06. See the [26.04 release notes](/about/release-notes) for details.
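
The interpreter and memory requirements above can be checked before you install anything. The following is a minimal sketch using only the standard library; the function names `check_python` and `total_ram_gib` are illustrative and not part of NeMo Curator, and the RAM probe relies on `os.sysconf`, which is assumed to be available (it is on Linux).

```python
import os
import sys

# Versions taken from the Quick Start Requirements list above.
SUPPORTED = {(3, 10), (3, 11), (3, 12)}
DEPRECATED = {(3, 10)}  # 26.04 is the last release that supports 3.10


def check_python(version=None):
    """Return 'ok', 'deprecated', or 'unsupported' for a (major, minor) pair."""
    major_minor = tuple(version) if version else tuple(sys.version_info[:2])
    if major_minor not in SUPPORTED:
        return "unsupported"
    if major_minor in DEPRECATED:
        return "deprecated"
    return "ok"


def total_ram_gib():
    """Best-effort physical memory in GiB, or None if it cannot be determined."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 2**30
    except (AttributeError, ValueError, OSError):
        return None


print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {check_python()}")
ram = total_ram_gib()
print("RAM: unknown" if ram is None else f"RAM: {ram:.1f} GiB (16GB+ suggested)")
```

A result of `deprecated` means the environment works with 26.04 but will need a newer Python before moving to 26.06.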

### Development vs Production

| Use Case                | Requirements                              | See                                                       |
| ----------------------- | ----------------------------------------- | --------------------------------------------------------- |
| **Local Development**   | Minimum specs listed above                | Continue below                                            |
| **Production Clusters** | Detailed hardware, network, storage specs | [Deployment Requirements](/admin/deployment/requirements) |
| **Multi-node Setup**    | Advanced infrastructure planning          | [Deployment Options](/admin/deployment)                   |

***

## Installation Methods

Choose one of the following installation methods based on your needs:

**Docker is the recommended installation method** for video and audio workflows. The NeMo Curator container includes FFmpeg (with NVENC support) pre-configured, avoiding manual dependency setup. Refer to [Container Installation](#container-installation) below.

### PyPI Installation

Install NeMo Curator from the Python Package Index using `uv` for proper dependency resolution.

1. Install uv:

   ```bash
   curl -LsSf https://astral.sh/uv/0.8.22/install.sh | sh
   source $HOME/.local/bin/env
   ```

2. Create and activate a virtual environment:

   ```bash
   uv venv
   source .venv/bin/activate
   ```

3. Install NeMo Curator:

   ```bash
   uv pip install torch wheel_stub psutil setuptools setuptools_scm
   echo "transformers==4.55.2" > override.txt
   uv pip install --no-build-isolation "nemo-curator[all]" --override override.txt
   ```
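
Step 3 pins `transformers` through `override.txt`, so one quick way to confirm the override took effect is to read the installed version back from the environment. This is a minimal sketch using the standard-library `importlib.metadata`; the helper names are illustrative, and the pinned value is copied from the command above.

```python
from importlib import metadata

PINNED = "4.55.2"  # value written to override.txt in step 3


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def override_applied(found, pinned=PINNED):
    """True when the installed version matches the pinned override exactly."""
    return found == pinned


found = installed_version("transformers")
if found is None:
    print("transformers is not installed in this environment")
elif override_applied(found):
    print(f"transformers {found} matches the override")
else:
    print(f"transformers {found} differs from the pinned {PINNED}")
```

Run it with the virtual environment from step 2 activated; otherwise it inspects whichever interpreter is first on your `PATH`.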

### Source Installation

Install the latest development version directly from GitHub:

```bash
# Clone the repository
git clone https://github.com/NVIDIA-NeMo/Curator.git
cd Curator

# Install uv if not already available
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install with all extras using uv
uv sync --all-extras --all-groups
```

### Container Installation

NeMo Curator is available as a standalone container on [NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-curator). The container includes NeMo Curator with all dependencies pre-installed, including FFmpeg with NVENC support.

```bash
# Pull the container from NGC
docker pull nvcr.io/nvidia/nemo-curator:{{ container_version }}

# Run the container with GPU support
docker run --gpus all -it --rm nvcr.io/nvidia/nemo-curator:{{ container_version }}
```

After entering the container, activate the virtual environment before running any NeMo Curator commands:

`source /opt/venv/env.sh`

The container uses a virtual environment at `/opt/venv`. If you see `No module named nemo_curator`, the environment has not been activated.
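
If you are unsure whether the environment is active, a short diagnostic can tell you which interpreter is running and whether it can see the package. The sketch below uses only the standard library. It assumes the container's environment behaves like a standard Python virtual environment (where `sys.prefix` differs from `sys.base_prefix`), and the helper names are illustrative.

```python
import importlib.util
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def module_available(name):
    """True when a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


print(f"interpreter:  {sys.executable}")
print(f"virtual env:  {in_virtualenv()}")
print(f"nemo_curator: {'found' if module_available('nemo_curator') else 'not found'}")
```

If `nemo_curator` is reported as not found and the interpreter path is outside `/opt/venv`, activate the environment and run the check again.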

Alternatively, you can build the NeMo Curator container locally using the provided Dockerfile:

```bash
# Build the container locally
git clone https://github.com/NVIDIA-NeMo/Curator.git
cd Curator
docker build -t nemo-curator:latest -f docker/Dockerfile .

# Run the container with GPU support
docker run --gpus all -it --rm nemo-curator:latest
```

**Benefits:**

* Pre-configured environment with all dependencies (FFmpeg, CUDA libraries)
* Consistent runtime across different systems
* Ideal for production deployments

### Install FFmpeg and Encoders (Required for Video)

Curator’s video pipelines rely on `FFmpeg` for decoding and encoding. If you plan to encode clips (for example, using `--transcode-encoder libopenh264` or `h264_nvenc`), install `FFmpeg` with the corresponding encoders.

Use the maintained script in the repository to build and install `FFmpeg` with `libopenh264` and NVIDIA NVENC support. The script configures the build with `--enable-libopenh264`, `--enable-cuda-nvcc`, and `--enable-libnpp`.

* Script source: [docker/common/install\_ffmpeg.sh](https://github.com/NVIDIA-NeMo/Curator/blob/main/docker/common/install_ffmpeg.sh)

```bash
curl -fsSL https://raw.githubusercontent.com/NVIDIA-NeMo/Curator/main/docker/common/install_ffmpeg.sh -o install_ffmpeg.sh
chmod +x install_ffmpeg.sh
sudo bash install_ffmpeg.sh
```

Confirm that `FFmpeg` is on your `PATH` and that at least one H.264 encoder is available:

```bash
ffmpeg -hide_banner -version | head -n 5
ffmpeg -encoders | grep -E "h264_nvenc|libopenh264|libx264" | cat
```

If encoders are missing, reinstall `FFmpeg` with the required options or use the Debian/Ubuntu script above.
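
The same encoder check can be scripted, for example as a preflight step before launching a video pipeline. The following is a minimal sketch that shells out to `ffmpeg -encoders` and looks for the three encoder names used in the command above. The function names are illustrative, and the parser assumes the usual listing layout in which the encoder name is the second whitespace-separated field on each row.

```python
import shutil
import subprocess

WANTED = ("h264_nvenc", "libopenh264", "libx264")


def parse_encoders(listing, wanted=WANTED):
    """Return the wanted encoder names that appear in `ffmpeg -encoders` output."""
    found = set()
    for line in listing.splitlines():
        fields = line.split()
        # Encoder rows look like: " V....D libx264   <description>"
        if len(fields) >= 2 and fields[1] in wanted:
            found.add(fields[1])
    return sorted(found)


def available_h264_encoders():
    """Return matching encoders, or None when ffmpeg is not on PATH."""
    ffmpeg = shutil.which("ffmpeg")
    if ffmpeg is None:
        return None
    result = subprocess.run(
        [ffmpeg, "-hide_banner", "-encoders"],
        capture_output=True, text=True, check=False,
    )
    return parse_encoders(result.stdout)


encoders = available_h264_encoders()
if encoders is None:
    print("ffmpeg not found on PATH")
elif encoders:
    print("H.264 encoders available:", ", ".join(encoders))
else:
    print("ffmpeg found, but no H.264 encoder from the expected list")
```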

**FFmpeg build requires CUDA toolkit (nvcc):** If you encounter `ERROR: failed checking for nvcc` during FFmpeg installation, ensure that the CUDA toolkit is installed and `nvcc` is available on your `PATH`. You can verify with `nvcc --version`. If using the NeMo Curator container, FFmpeg is pre-installed with NVENC support.
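
Before running the FFmpeg build script, you can check for `nvcc` programmatically. This sketch locates `nvcc` on `PATH` and extracts the CUDA release from `nvcc --version`. The helper names are illustrative, and the parser assumes the output contains a `release <major>.<minor>` string, as current CUDA toolkits print.

```python
import re
import shutil
import subprocess


def parse_cuda_release(text):
    """Extract (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def nvcc_release():
    """Return the CUDA release reported by nvcc, or None when unavailable."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run(
        [nvcc, "--version"], capture_output=True, text=True, check=False
    )
    return parse_cuda_release(result.stdout)


release = nvcc_release()
if release is None:
    print("nvcc not found on PATH (or its version could not be parsed)")
else:
    print(f"nvcc reports CUDA {release[0]}.{release[1]}")
```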

***

## Package Extras

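NeMo Curator provides several extras so that you can install only the components you need. The package spec is quoted in each command so that shells such as zsh do not interpret the square brackets: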

| Extra                 | Installation Command                                               | Description                                                                                                              |
| --------------------- | ------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------ |
| **text\_cpu**         | `uv pip install "nemo-curator[text_cpu]"`                          | CPU-only text processing and filtering                                                                                   |
| **text\_cuda12**      | `uv pip install "nemo-curator[text_cuda12]"`                       | GPU-accelerated text processing with RAPIDS                                                                              |
| **audio\_cpu**        | `uv pip install "nemo-curator[audio_cpu]"`                         | CPU-only audio curation with NeMo Toolkit ASR                                                                            |
| **audio\_cuda12**     | `uv pip install "nemo-curator[audio_cuda12]"`                      | GPU-accelerated audio curation. When using `uv`, requires `transformers==4.55.2` override.                               |
| **image\_cpu**        | `uv pip install "nemo-curator[image_cpu]"`                         | CPU-only image processing                                                                                                |
| **image\_cuda12**     | `uv pip install "nemo-curator[image_cuda12]"`                      | GPU-accelerated image processing with NVIDIA DALI                                                                        |
| **video\_cpu**        | `uv pip install "nemo-curator[video_cpu]"`                         | CPU-only video processing                                                                                                |
| **video\_cuda12**     | `uv pip install --no-build-isolation "nemo-curator[video_cuda12]"` | GPU-accelerated video processing with CUDA libraries. Requires FFmpeg and additional build dependencies when using `uv`. |
| **inference\_server** | `uv pip install "nemo-curator[inference_server]"`                  | Ray Serve + vLLM for serving LLMs alongside curation pipelines                                                           |
| **sdg\_cpu**          | `uv pip install "nemo-curator[sdg_cpu]"`                           | CPU-only synthetic data generation with Data Designer                                                                    |
| **sdg\_cuda12**       | `uv pip install "nemo-curator[sdg_cuda12]"`                        | GPU-accelerated synthetic data generation with local inference server support                                            |

**Development Dependencies**: For development tools (pre-commit, ruff, pytest), use `uv sync --group dev --group linting --group test` instead of pip extras. Development dependencies are managed as dependency groups, not optional dependencies.

**pip is not supported for installing all extras together.** Some optional dependencies have conflicting transitive version requirements (for example, `nemo-toolkit[asr]` and `vllm` require incompatible versions of `transformers`). NeMo Curator uses [uv dependency overrides](https://docs.astral.sh/uv/concepts/dependencies/#dependency-overrides) to resolve these conflicts, which pip does not support. If you must use pip, install only one modality extra at a time (for example, `pip install "nemo-curator[text_cpu]"`). For multi-modality installations, use `uv` or the [NeMo Curator container](#container-installation).
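
To see which extras the installed release actually declares, you can read them from the package metadata rather than relying on a table that may lag behind a release. This is a minimal sketch using the standard-library `importlib.metadata`. The function name is illustrative, and the names in the metadata may be normalized differently from the table (for example, hyphens instead of underscores).

```python
from importlib import metadata


def declared_extras(dist_name="nemo-curator"):
    """Return the extras a distribution declares, or [] if it is not installed."""
    try:
        meta = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return []
    # Each declared extra appears as its own Provides-Extra metadata field.
    return sorted(meta.get_all("Provides-Extra") or [])


extras = declared_extras()
if extras:
    print("Extras declared by nemo-curator:")
    for name in extras:
        print(f"  {name}")
else:
    print("nemo-curator is not installed, or it declares no extras")
```

This reports what the package offers, not which extras were selected at install time; pip and uv do not record that choice in the metadata.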

***

## Installation Verification

After installation, verify that NeMo Curator is working correctly:

### 1. Basic Import Test

```python
# Test basic imports
import nemo_curator
print(f"NeMo Curator version: {nemo_curator.__version__}")

# Test core modules
from nemo_curator.pipeline import Pipeline
from nemo_curator.tasks import DocumentBatch
print("✓ Core modules imported successfully")
```

### 2. GPU Availability Check

If you installed GPU support, verify GPU access:

```python
# Check GPU availability
try:
    import torch
    if torch.cuda.is_available():
        print(f"✓ GPU available: {torch.cuda.get_device_name(0)}")
        print(f"✓ GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
    else:
        print("⚠ No GPU detected")
    
    # Check cuDF for GPU deduplication
    import cudf
    print("✓ cuDF available for GPU-accelerated deduplication")
except ImportError as e:
    print(f"⚠ Some GPU modules not available: {e}")
```

### 3. Run a Quickstart Tutorial

Try a modality-specific quickstart to see NeMo Curator in action:

* [Text Curation Quickstart](/get-started/text) - Set up and run your first text curation pipeline
* [Audio Curation Quickstart](/get-started/audio) - Get started with audio dataset curation
* [Image Curation Quickstart](/get-started/image) - Curate image-text datasets for generative models
* [Video Curation Quickstart](/get-started/video) - Split, encode, and curate video clips at scale