This guide shows how to install Curator and run your first video curation pipeline.
The example pipeline processes a list of videos, splitting each into 10‑second clips using a fixed stride. It then generates clip‑level embeddings for downstream tasks such as duplicate removal and similarity search.
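The fixed-stride split described above can be illustrated with a small standalone sketch. This is not Curator's implementation; the function name and the min_len_s parameter (which drops a very short trailing clip) are hypothetical and exist only to show the arithmetic.

```python
def fixed_stride_spans(duration_s, clip_len_s=10.0, stride_s=10.0, min_len_s=2.0):
    """Return (start, end) spans in seconds that cover a video of the given duration.

    Illustrative only: min_len_s is a hypothetical knob, not a documented Curator option.
    """
    if clip_len_s <= 0 or stride_s <= 0:
        raise ValueError("clip_len_s and stride_s must be positive")
    spans = []
    start = 0.0
    while start < duration_s:
        # The last clip is truncated at the end of the video.
        end = min(start + clip_len_s, duration_s)
        if end - start >= min_len_s:
            spans.append((start, end))
        start += stride_s
    return spans


# A 25-second video yields two full clips and one 5-second remainder.
print(fixed_stride_spans(25.0))
```

With a stride equal to the clip length, clips do not overlap; a smaller stride would produce overlapping clips.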
Overview
This quickstart guide demonstrates how to:
Install NeMo Curator with video processing support
Set up FFmpeg with GPU-accelerated encoding
Configure embedding models (Cosmos-Embed1 or InternVideo2)
Process videos through a complete splitting and embedding pipeline
Generate outputs ready for duplicate removal, captioning, and model training
What you’ll build: A video processing pipeline that:
Splits videos into 10-second clips using fixed stride or scene detection
Generates clip-level embeddings for similarity search and deduplication
Optionally creates captions and preview images
Outputs results in formats compatible with multimodal training workflows
Prerequisites
System Requirements
To use NeMo Curator’s video curation capabilities, ensure your system meets these requirements:
Operating System
Ubuntu 24.04, 22.04, or 20.04 (required for GPU-accelerated video processing)
Other Linux distributions may work but are not officially supported
Python Environment
Python 3.10, 3.11, or 3.12
uv package manager for dependency management
Git for model and repository dependencies
GPU Requirements
NVIDIA GPU required (CPU-only mode not supported for video processing)
Architecture: Volta™ or newer (compute capability 7.0+)
Examples: V100, T4, RTX 2080+, A100, H100
CUDA: Version 12.0 or above
VRAM: Minimum requirements by configuration:
Basic splitting + embedding: ~16GB VRAM
Full pipeline (splitting + embedding + captioning): ~38GB VRAM
GPU encoder: h264_nvenc (recommended for performance)
CPU encoders: libopenh264 or libx264 (fallback options)
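As a rough pre-flight check, the requirements above can be encoded as thresholds and compared against values you supply. This is a sketch under stated assumptions: the function names are hypothetical, the VRAM figures are the approximate ones listed above, and the caller must obtain the GPU values separately (for example from nvidia-smi, or from torch.cuda.get_device_capability() if PyTorch is installed).

```python
import sys

# Thresholds copied from the requirements listed above; VRAM figures are approximate.
SUPPORTED_PYTHON = {(3, 10), (3, 11), (3, 12)}
MIN_COMPUTE_CAPABILITY = (7, 0)  # Volta or newer
MIN_CUDA = (12, 0)
VRAM_GB = {"splitting + embedding": 16, "full pipeline": 38}


def check_requirements(python_version, compute_capability, cuda_version, vram_gb):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if tuple(python_version[:2]) not in SUPPORTED_PYTHON:
        problems.append("Python must be 3.10, 3.11, or 3.12")
    if tuple(compute_capability) < MIN_COMPUTE_CAPABILITY:
        problems.append("GPU compute capability must be 7.0 or higher")
    if tuple(cuda_version) < MIN_CUDA:
        problems.append("CUDA must be 12.0 or above")
    if vram_gb < VRAM_GB["splitting + embedding"]:
        problems.append("At least ~16 GB of VRAM is needed for splitting + embedding")
    return problems


def supported_configurations(vram_gb):
    """List the pipeline configurations that fit in the given amount of VRAM."""
    return [name for name, needed in VRAM_GB.items() if vram_gb >= needed]


if __name__ == "__main__":
    # Example values for a hypothetical 24 GB GPU with compute capability 8.6.
    print(check_requirements(sys.version_info, (8, 6), (12, 4), 24))
    print(supported_configurations(24))
```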
If you don’t have uv installed, refer to the Installation Guide for setup instructions, or install it quickly with:
$ curl -LsSf https://astral.sh/uv/0.8.22/install.sh | sh
$ source $HOME/.local/bin/env
Install
Create and activate a virtual environment, then choose an install option:
Cosmos-Embed1 (the default) generally performs better than InternVideo2 on most video embedding tasks. Prefer Cosmos-Embed1 (cosmos-embed1-224p) unless you have specific requirements for InternVideo2.
Curator’s video pipelines rely on FFmpeg for decoding and encoding. If you plan to encode clips (for example, using --transcode-encoder libopenh264 or h264_nvenc), install FFmpeg with the corresponding encoders.
Debian/Ubuntu (Script)
Use the maintained script in the repository to build and install FFmpeg with libopenh264 and NVIDIA NVENC support. The script configures the build with --enable-libopenh264, --enable-cuda-nvcc, and --enable-libnpp.
Refer to Clip Encoding to choose encoders and verify NVENC support on your system.
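One way to confirm which of the encoders named above are compiled into your FFmpeg build is to parse the output of ffmpeg -encoders. The helper names below are hypothetical, and the parsing assumes the usual listing layout in which each encoder row has a flags column followed by the encoder name. A listed h264_nvenc only shows that the build includes it; it does not prove that the driver and GPU can run it.

```python
import shutil
import subprocess

# Encoders mentioned in this guide.
WANTED = ("h264_nvenc", "libopenh264", "libx264")


def available_encoders(listing, wanted=WANTED):
    """Return the wanted encoder names that appear in `ffmpeg -encoders` output."""
    found = set()
    for line in listing.splitlines():
        parts = line.split()
        # Encoder rows look like: " V....D h264_nvenc   NVIDIA NVENC H.264 encoder"
        if len(parts) >= 2:
            found.add(parts[1])
    return [name for name in wanted if name in found]


def ffmpeg_encoder_listing():
    """Run `ffmpeg -hide_banner -encoders`; return an empty string if ffmpeg is not on PATH."""
    if shutil.which("ffmpeg") is None:
        return ""
    result = subprocess.run(
        ["ffmpeg", "-hide_banner", "-encoders"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling available_encoders(ffmpeg_encoder_listing()) returns the subset of the three encoders present in the local build, in the order listed in WANTED.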
Available Models
Embeddings convert each video clip into a numeric vector that captures visual and semantic content. Curator uses these vectors to:
Remove near-duplicate clips
Enable similarity search and clustering
Support downstream analysis such as caption verification
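To make the near-duplicate use case concrete, here is a minimal sketch that compares embedding vectors by cosine similarity and greedily keeps one clip from each group of similar clips. This is a simple stand-in for illustration, not Curator's duplicate removal method; the function names, the 0.95 threshold, and the toy two-dimensional vectors are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def drop_near_duplicates(embeddings, threshold=0.95):
    """Greedily keep clips whose similarity to every already-kept clip is below threshold.

    embeddings maps a clip ID to its vector; the threshold is an illustrative choice.
    """
    kept = []
    for clip_id, vector in embeddings.items():
        if all(cosine_similarity(vector, embeddings[k]) < threshold for k in kept):
            kept.append(clip_id)
    return kept


# Toy vectors: clip_b points in almost the same direction as clip_a.
clips = {"clip_a": [1.0, 0.0], "clip_b": [0.99, 0.05], "clip_c": [0.0, 1.0]}
print(drop_near_duplicates(clips))
```

This pairwise comparison costs quadratic time in the number of clips, so it only suits small collections; larger datasets typically need clustering or approximate nearest-neighbor search.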
NeMo Curator supports two embedding model families:
Cosmos-Embed1 (Recommended)
Cosmos-Embed1 (default): Available in three variants (cosmos-embed1-224p, cosmos-embed1-336p, and cosmos-embed1-448p) that differ in input resolution and in the accuracy/VRAM tradeoff. All variants are automatically downloaded to MODEL_DIR on first run.