Installation Guide
This guide covers installing NeMo Curator with support for all modalities and verifying your installation is working correctly.
Before You Start
System Requirements
For comprehensive system requirements and production deployment specifications, refer to Production Deployment Requirements.
Quick Start Requirements:
- OS: Ubuntu 24.04/22.04/20.04 (recommended)
- Python: 3.10, 3.11, or 3.12
- Memory: 16GB+ RAM for basic text processing
- GPU (optional): NVIDIA GPU with 16GB+ VRAM for acceleration
- CUDA 12 (required for
audio_cuda12,video_cuda12,image_cuda12, andtext_cuda12extras)
Development vs Production
Installation Methods
Choose one of the following installation methods based on your needs:
Docker is the recommended installation method for video and audio workflows. The NeMo Curator container includes FFmpeg (with NVENC support) pre-configured, avoiding manual dependency setup. Refer to the Container Installation tab below.
PyPI Installation
Source Installation
Container Installation (Recommended for Video/Audio)
Install NeMo Curator from the Python Package Index using uv for proper dependency resolution.
-
Install uv:
-
Create and activate a virtual environment:
-
Install NeMo Curator:
Install FFmpeg and Encoders (Required for Video)
Curator’s video pipelines rely on FFmpeg for decoding and encoding. If you plan to encode clips (for example, using --transcode-encoder libopenh264 or h264_nvenc), install FFmpeg with the corresponding encoders.
Debian/Ubuntu (Script)
Verify Installation
Use the maintained script in the repository to build and install FFmpeg with libopenh264 and NVIDIA NVENC support. The script enables --enable-libopenh264, --enable-cuda-nvcc, and --enable-libnpp.
- Script source: docker/common/install_ffmpeg.sh
FFmpeg build requires CUDA toolkit (nvcc): If you encounter ERROR: failed checking for nvcc during FFmpeg installation, ensure that the CUDA toolkit is installed and nvcc is available on your PATH. You can verify with nvcc --version. If using the NeMo Curator container, FFmpeg is pre-installed with NVENC support.
Package Extras
NeMo Curator provides several installation extras to install only the components you need:
Development Dependencies: For development tools (pre-commit, ruff, pytest), use uv sync --group dev --group linting --group test instead of pip extras. Development dependencies are managed as dependency groups, not optional dependencies.
Installation Verification
After installation, verify that NeMo Curator is working correctly:
1. Basic Import Test
2. GPU Availability Check
If you installed GPU support, verify GPU access:
3. Run a Quickstart Tutorial
Try a modality-specific quickstart to see NeMo Curator in action:
- Text Curation Quickstart - Set up and run your first text curation pipeline
- Audio Curation Quickstart - Get started with audio dataset curation
- Image Curation Quickstart - Curate image-text datasets for generative models
- Video Curation Quickstart - Split, encode, and curate video clips at scale