Install (All Modalities)
Install (All Modalities)
This guide covers installing NeMo Curator with support for all modalities and verifying your installation is working correctly. For a single-modality install or a 30-minute walkthrough, start with one of the modality quickstarts instead.
Before You Start
System Requirements
For comprehensive system requirements and production deployment specifications, refer to Production Deployment Requirements.
Quick Start Requirements:
- OS: Ubuntu 24.04/22.04/20.04 (recommended)
- Python: 3.10, 3.11, or 3.12
- Memory: 16GB+ RAM for basic text processing
- GPU (optional): NVIDIA GPU with 16GB+ VRAM for acceleration
- CUDA 12 (required for
audio_cuda12,video_cuda12,image_cuda12, andtext_cuda12extras)
Python 3.10 support will be removed in NeMo Curator 26.06. 26.04 is the last release to support Python 3.10. If you are setting up a new environment, install a newer supported Python version (3.11+) so you do not need to upgrade when moving to 26.06. See the 26.04 release notes for details.
Development vs Production
Installation Methods
Choose one of the following installation methods based on your needs:
Docker is the recommended installation method for video and audio workflows. The NeMo Curator container includes FFmpeg (with NVENC support) pre-configured, avoiding manual dependency setup. Refer to the Container Installation tab below.
PyPI Installation
Source Installation
Container Installation (Recommended for Video/Audio)
Install NeMo Curator from the Python Package Index using uv for proper dependency resolution.
-
Install uv:
-
Create and activate a virtual environment:
-
Install NeMo Curator:
Install FFmpeg and Encoders (Required for Video)
Curator’s video pipelines rely on FFmpeg for decoding and encoding. If you plan to encode clips (for example, using --transcode-encoder libopenh264 or h264_nvenc), install FFmpeg with the corresponding encoders.
Debian/Ubuntu (Script)
Verify Installation
Use the maintained script in the repository to build and install FFmpeg with libopenh264 and NVIDIA NVENC support. The script enables --enable-libopenh264, --enable-cuda-nvcc, and --enable-libnpp.
- Script source: docker/common/install_ffmpeg.sh
FFmpeg build requires CUDA toolkit (nvcc): If you encounter ERROR: failed checking for nvcc during FFmpeg installation, ensure that the CUDA toolkit is installed and nvcc is available on your PATH. You can verify with nvcc --version. If using the NeMo Curator container, FFmpeg is pre-installed with NVENC support.
Package Extras
NeMo Curator provides several installation extras to install only the components you need:
Development Dependencies: For development tools (pre-commit, ruff, pytest), use uv sync --group dev --group linting --group test instead of pip extras. Development dependencies are managed as dependency groups, not optional dependencies.
pip is not supported for installing all extras together. Some optional dependencies have conflicting transitive version requirements (for example, nemo-toolkit[asr] and vllm require incompatible versions of transformers). NeMo Curator uses uv dependency overrides to resolve these conflicts, which pip does not support. If you must use pip, install only one modality extra at a time (for example, pip install nemo-curator[text_cpu]). For multi-modality installations, use uv or the NeMo Curator container.
Installation Verification
After installation, verify that NeMo Curator is working correctly:
1. Basic Import Test
2. GPU Availability Check
If you installed GPU support, verify GPU access:
3. Run a Quickstart Tutorial
Try a modality-specific quickstart to see NeMo Curator in action:
- Text Curation Quickstart - Set up and run your first text curation pipeline
- Audio Curation Quickstart - Get started with audio dataset curation
- Image Curation Quickstart - Curate image-text datasets for generative models
- Video Curation Quickstart - Split, encode, and curate video clips at scale