Copyright © 2026, NVIDIA Corporation.

Installation Guide

This guide covers installing NeMo Curator with support for all modalities and verifying that the installation works correctly.

Before You Start

System Requirements

For comprehensive system requirements and production deployment specifications, refer to Production Deployment Requirements.

Quick Start Requirements:

  • OS: Ubuntu 24.04/22.04/20.04 (recommended)
  • Python: 3.10, 3.11, or 3.12
  • Memory: 16GB+ RAM for basic text processing
  • GPU (optional): NVIDIA GPU with 16GB+ VRAM for acceleration
  • CUDA 12 (required for audio_cuda12, video_cuda12, image_cuda12, and text_cuda12 extras)
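
The quick-start requirements above can be checked before installing anything. The following is a minimal sketch using only the Python standard library; the function names (`python_supported`, `total_ram_gb`, `preflight`) are hypothetical helpers and not part of NeMo Curator, and the thresholds are copied from the list above. GPU memory and CUDA are not checked here.

```python
import os
import platform
import sys

# Values taken from the quick-start requirements list.
SUPPORTED_PYTHON = {(3, 10), (3, 11), (3, 12)}
MIN_RAM_GB = 16


def python_supported(version=None):
    """Return True if the (major, minor) version is in the supported set."""
    major, minor = (version or sys.version_info)[:2]
    return (major, minor) in SUPPORTED_PYTHON


def total_ram_gb():
    """Best-effort physical memory in decimal GB; None if it cannot be read."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9
    except (AttributeError, ValueError, OSError):
        return None  # for example on Windows, where os.sysconf is missing


def preflight():
    """Collect human-readable warnings; an empty list means nothing was flagged."""
    warnings = []
    if not python_supported():
        warnings.append(f"Python {platform.python_version()} is outside 3.10-3.12")
    if platform.system() != "Linux":
        warnings.append(f"{platform.system()} detected; Ubuntu is the recommended OS")
    ram = total_ram_gb()
    if ram is not None and ram < MIN_RAM_GB:
        warnings.append(f"{ram:.1f} GB RAM detected; 16 GB or more is suggested")
    return warnings


if __name__ == "__main__":
    for line in preflight() or ["No issues flagged"]:
        print(line)
```

The sketch only warns and never exits with an error, since the listed values are quick-start guidance rather than hard limits.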

Development vs Production

| Use Case | Requirements | See |
|---|---|---|
| Local Development | Minimum specs listed above | Continue below |
| Production Clusters | Detailed hardware, network, storage specs | Deployment Requirements |
| Multi-node Setup | Advanced infrastructure planning | Deployment Options |

Installation Methods

Choose one of the following installation methods based on your needs:

Docker is the recommended installation method for video and audio workflows. The NeMo Curator container ships with FFmpeg (with NVENC support) preconfigured, which avoids manual dependency setup. Refer to the Container Installation method.

Available methods: PyPI Installation, Source Installation, and Container Installation (recommended for video/audio). The PyPI method is described here.

PyPI Installation

Install NeMo Curator from the Python Package Index using uv for proper dependency resolution.

  1. Install uv:

    $ curl -LsSf https://astral.sh/uv/0.8.22/install.sh | sh
    $ source $HOME/.local/bin/env
  2. Create and activate a virtual environment:

    $ uv venv
    $ source .venv/bin/activate
  3. Install NeMo Curator. With --no-build-isolation, packages are built against the active environment, so the first command installs the build prerequisites up front. The override file pins transformers to 4.55.2:

    $ uv pip install torch wheel_stub psutil setuptools setuptools_scm
    $ echo "transformers==4.55.2" > override.txt
    $ uv pip install --no-build-isolation "nemo-curator[all]" --override override.txt
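
After the install finishes, it can be useful to confirm that the transformers pin from override.txt actually took effect. The sketch below reads the installed version through the standard library's `importlib.metadata`; the helper name `pin_status` is hypothetical and not part of NeMo Curator or uv.

```python
from importlib import metadata


def pin_status(package, expected):
    """Return 'ok', 'mismatch (<found>)', or 'missing' for a pinned package."""
    try:
        found = metadata.version(package)
    except metadata.PackageNotFoundError:
        return "missing"
    return "ok" if found == expected else f"mismatch ({found})"


if __name__ == "__main__":
    # 4.55.2 is the version written to override.txt in step 3.
    print("transformers:", pin_status("transformers", "4.55.2"))
```

Run it inside the activated virtual environment so that it inspects the same packages uv installed.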

Install FFmpeg and Encoders (Required for Video)

Curator’s video pipelines rely on FFmpeg for decoding and encoding. If you plan to encode clips (for example, using --transcode-encoder libopenh264 or h264_nvenc), install FFmpeg with the corresponding encoders.

Debian/Ubuntu (Script)

Use the maintained script in the repository to build and install FFmpeg with libopenh264 and NVIDIA NVENC support. The script configures the build with --enable-libopenh264, --enable-cuda-nvcc, and --enable-libnpp.

  • Script source: docker/common/install_ffmpeg.sh

    $ curl -fsSL https://raw.githubusercontent.com/NVIDIA-NeMo/Curator/main/docker/common/install_ffmpeg.sh -o install_ffmpeg.sh
    $ chmod +x install_ffmpeg.sh
    $ sudo bash install_ffmpeg.sh

Note: The FFmpeg build requires the CUDA toolkit (nvcc). If you encounter "ERROR: failed checking for nvcc" during FFmpeg installation, make sure the CUDA toolkit is installed and nvcc is available on your PATH. You can verify this with nvcc --version. If you use the NeMo Curator container, FFmpeg is pre-installed with NVENC support.
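
One way to confirm that the installed FFmpeg exposes the encoders mentioned above is to parse the output of `ffmpeg -encoders`. This is a minimal sketch rather than an official check: `parse_encoders` and `check_ffmpeg` are hypothetical helper names, and the parser assumes the usual listing layout in which each encoder row starts with a flags column (for example `V....D`) followed by the encoder name.

```python
import shutil
import subprocess

WANTED = ("libopenh264", "h264_nvenc")
FLAG_CHARS = set("VASFXBD.")  # characters seen in the flags column


def parse_encoders(listing):
    """Extract encoder names from the text printed by `ffmpeg -encoders`."""
    names = set()
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[1] == "=":
            continue  # skip the title, the legend rows, and the separator
        flags = parts[0]
        if flags[0] in "VAS" and set(flags) <= FLAG_CHARS:
            names.add(parts[1])
    return names


def check_ffmpeg():
    """Return {encoder: available}, or None when ffmpeg is not on PATH."""
    if shutil.which("ffmpeg") is None:
        return None
    result = subprocess.run(
        ["ffmpeg", "-hide_banner", "-encoders"],
        capture_output=True, text=True, check=False,
    )
    found = parse_encoders(result.stdout)
    return {name: name in found for name in WANTED}


if __name__ == "__main__":
    print("nvcc on PATH:", shutil.which("nvcc") is not None)
    print("ffmpeg encoders:", check_ffmpeg())
```

An encoder appearing in the listing only shows that FFmpeg was compiled with it. Actually encoding with h264_nvenc still depends on a working NVIDIA driver and GPU at run time.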


Package Extras

NeMo Curator provides several installation extras to install only the components you need:

| Extra | Installation Command | Description |
|---|---|---|
| text_cpu | uv pip install "nemo-curator[text_cpu]" | CPU-only text processing and filtering |
| text_cuda12 | uv pip install "nemo-curator[text_cuda12]" | GPU-accelerated text processing with RAPIDS |
| audio_cpu | uv pip install "nemo-curator[audio_cpu]" | CPU-only audio curation with NeMo Toolkit ASR |
| audio_cuda12 | uv pip install "nemo-curator[audio_cuda12]" | GPU-accelerated audio curation. When using uv, requires the transformers==4.55.2 override. |
| image_cpu | uv pip install "nemo-curator[image_cpu]" | CPU-only image processing |
| image_cuda12 | uv pip install "nemo-curator[image_cuda12]" | GPU-accelerated image processing with NVIDIA DALI |
| video_cpu | uv pip install "nemo-curator[video_cpu]" | CPU-only video processing |
| video_cuda12 | uv pip install --no-build-isolation "nemo-curator[video_cuda12]" | GPU-accelerated video processing with CUDA libraries. Requires FFmpeg and additional build dependencies when using uv. |

The package specifier is quoted so that shells such as zsh do not interpret the square brackets as a glob pattern.
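
The extras follow a regular naming pattern: a modality name followed by `_cpu` or `_cuda12`. The sketch below builds the install command for a given modality from that pattern. The helper names `extra_name` and `install_command` are hypothetical and exist only for illustration. The `--override override.txt` argument for audio_cuda12 assumes the override file created during PyPI installation is in the current directory.

```python
MODALITIES = ("text", "audio", "image", "video")


def extra_name(modality, gpu=False):
    """Map a modality and a GPU flag to the extra name used in the table."""
    if modality not in MODALITIES:
        raise ValueError(f"unknown modality: {modality!r}")
    return f"{modality}_{'cuda12' if gpu else 'cpu'}"


def install_command(modality, gpu=False):
    """Build the uv command line for one extra, as listed in the table."""
    extra = extra_name(modality, gpu)
    args = ["uv", "pip", "install"]
    if extra == "video_cuda12":
        # The table lists this extra with --no-build-isolation.
        args.append("--no-build-isolation")
    args.append(f'"nemo-curator[{extra}]"')
    if extra == "audio_cuda12":
        # Assumption: override.txt pins transformers==4.55.2.
        args += ["--override", "override.txt"]
    return " ".join(args)


if __name__ == "__main__":
    for modality in MODALITIES:
        print(install_command(modality, gpu=True))
```

The function only returns the command as a string and does not run it, so the output can be reviewed before it is pasted into a shell.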

Development Dependencies: For development tools (pre-commit, ruff, pytest), use uv sync --group dev --group linting --group test instead of pip extras. Development dependencies are managed as dependency groups, not optional dependencies.


Installation Verification

After installation, verify that NeMo Curator is working correctly:

1. Basic Import Test

    # Test basic imports
    import nemo_curator
    print(f"NeMo Curator version: {nemo_curator.__version__}")

    # Test core modules
    from nemo_curator.pipeline import Pipeline
    from nemo_curator.tasks import DocumentBatch
    print("✓ Core modules imported successfully")

2. GPU Availability Check

If you installed GPU support, verify GPU access:

    # Check GPU availability
    try:
        import torch
        if torch.cuda.is_available():
            print(f"✓ GPU available: {torch.cuda.get_device_name(0)}")
            print(f"✓ GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
        else:
            print("⚠ No GPU detected")

        # Check cuDF for GPU deduplication
        import cudf
        print("✓ cuDF available for GPU-accelerated deduplication")
    except ImportError as e:
        print(f"⚠ Some GPU modules not available: {e}")

3. Run a Quickstart Tutorial

Try a modality-specific quickstart to see NeMo Curator in action:

  • Text Curation Quickstart - Set up and run your first text curation pipeline
  • Audio Curation Quickstart - Get started with audio dataset curation
  • Image Curation Quickstart - Curate image-text datasets for generative models
  • Video Curation Quickstart - Split, encode, and curate video clips at scale