---
description: >-
  Step-by-step guide to installing Curator and running your first video
  curation pipeline
categories:
  - getting-started
tags:
  - video-curation
  - installation
  - quickstart
  - gpu-accelerated
  - ray
  - python
personas:
  - data-scientist-focused
  - mle-focused
difficulty: beginner
content_type: tutorial
modality: video-only
---

# Get Started with Video Curation

This guide shows how to install Curator and run your first video curation pipeline.

The [example pipeline](#run-the-splitting-pipeline-example) processes a list of videos, splitting each into 10-second clips using a fixed stride. It then generates clip-level embeddings for downstream tasks such as duplicate removal and similarity search.

## Overview

This quickstart guide demonstrates how to:

1. **Install NeMo Curator** with video processing support
2. **Set up FFmpeg** with GPU-accelerated encoding
3. **Configure embedding models** (Cosmos-Embed1)
4. **Process videos** through a complete splitting and embedding pipeline
5. **Generate outputs** ready for duplicate removal, captioning, and model training

**What you build:** A video processing pipeline that:

* Splits videos into 10-second clips using fixed stride or scene detection
* Generates clip-level embeddings for similarity search and deduplication
* Optionally creates captions and preview images
* Outputs results in formats compatible with multimodal training workflows

## Prerequisites

### System Requirements

To use NeMo Curator's video curation capabilities, ensure your system meets these requirements:

#### Operating System

* **Ubuntu 24.04, 22.04, or 20.04** (required for GPU-accelerated video processing)
* Other Linux distributions may work but are not officially supported

#### Python Environment

* **Python 3.10, 3.11, or 3.12**
* **uv package manager** for dependency management
* **Git** for model and repository dependencies

#### GPU Requirements

* **NVIDIA GPU required** (CPU-only mode not supported for video processing)
* **Architecture**: Volta™ or newer (compute capability 7.0+)
  * Examples: V100, T4, RTX 2080+, A100, H100
* **CUDA**: Version 12.0 or above
* **VRAM**: Minimum requirements by configuration:
  * Basic splitting + embedding: \~16GB VRAM
  * Full pipeline (splitting + embedding + captioning): \~38GB VRAM
  * Reduced configuration (lower batch sizes, FP8): \~21GB VRAM

#### Software Dependencies

* **FFmpeg 8.0+** with H.264 encoding support
  * GPU encoder: `h264_nvenc` (recommended for performance)
  * CPU encoders: `libopenh264` or `libx264` (fallback options)

If `uv` is not installed, refer to the [Installation Guide](/admin/installation) for setup instructions, or install it quickly with:

```bash
curl -LsSf https://astral.sh/uv/0.8.22/install.sh | sh
source $HOME/.local/bin/env
```

***

## Install

Create and activate a virtual environment, then choose an install option.

To install the published package:

```bash
uv pip install torch wheel_stub psutil setuptools setuptools_scm
uv pip install --no-build-isolation "nemo-curator[video_cuda12]"
```

To install from source:

```bash
git clone https://github.com/NVIDIA-NeMo/Curator.git
cd Curator
uv sync --extra video_cuda12 --all-groups
source .venv/bin/activate
```

Alternatively, NeMo Curator is available as a standalone container:

```bash
# Pull the container
docker pull nvcr.io/nvidia/nemo-curator:{{ container_version }}

# Run the container
docker run --gpus all -it --rm nvcr.io/nvidia/nemo-curator:{{ container_version }}
```

For details on container environments and configurations, see [Container Environments](/reference/infra/container-environments).

## Install FFmpeg and Encoders

Curator's video pipelines rely on `FFmpeg` for decoding and encoding. If you plan to encode clips (for example, using `--transcode-encoder libopenh264` or `h264_nvenc`), install `FFmpeg` with the corresponding encoders.

Use the maintained script in the repository to build and install `FFmpeg` with `libopenh264` and NVIDIA NVENC support. The script enables `--enable-libopenh264`, `--enable-cuda-nvcc`, and `--enable-libnpp`.

* Script source: [docker/common/install\_ffmpeg.sh](https://github.com/NVIDIA-NeMo/Curator/blob/main/docker/common/install_ffmpeg.sh)

```bash
curl -fsSL https://raw.githubusercontent.com/NVIDIA-NeMo/Curator/main/docker/common/install_ffmpeg.sh -o install_ffmpeg.sh
chmod +x install_ffmpeg.sh
sudo bash install_ffmpeg.sh
```

Confirm that `FFmpeg` is on your `PATH` and that at least one H.264 encoder is available:

```bash
ffmpeg -hide_banner -version | head -n 5
ffmpeg -encoders | grep -E "h264_nvenc|libopenh264|libx264" | cat
```

If encoders are missing, reinstall `FFmpeg` with the required options or use the Debian/Ubuntu script above. Refer to [Clip Encoding](/curate-video/process-data/transcoding) to choose encoders and verify NVENC support on your system.

### Available Models

Embeddings convert each video clip into a numeric vector that captures visual and semantic content.
Curator uses these vectors to:

* Remove near-duplicate clips during duplicate removal
* Enable similarity search and clustering
* Support downstream analysis such as caption verification

This guide uses the Cosmos-Embed1 embedding model family.

#### Cosmos-Embed1 (Recommended)

**Cosmos-Embed1 (default)**: Available in three variants (**cosmos-embed1-224p**, **cosmos-embed1-336p**, and **cosmos-embed1-448p**) that differ in input resolution and in the tradeoff between accuracy and VRAM usage. All variants are automatically downloaded to `MODEL_DIR` on first run.

| Model Variant          | Resolution | VRAM Usage | Speed   | Accuracy | Best For                                       |
| ---------------------- | ---------- | ---------- | ------- | -------- | ---------------------------------------------- |
| **cosmos-embed1-224p** | 224×224    | \~8GB      | Fastest | Good     | Large-scale processing, initial curation       |
| **cosmos-embed1-336p** | 336×336    | \~12GB     | Medium  | Better   | Balanced performance and quality               |
| **cosmos-embed1-448p** | 448×448    | \~16GB     | Slower  | Best     | High-quality embeddings, fine-grained matching |

**Model links:**

* [cosmos-embed1-224p on Hugging Face](https://huggingface.co/nvidia/cosmos-embed1-224p)
* [cosmos-embed1-336p on Hugging Face](https://huggingface.co/nvidia/cosmos-embed1-336p)
* [cosmos-embed1-448p on Hugging Face](https://huggingface.co/nvidia/cosmos-embed1-448p)

For this quickstart, the following steps set up support for **Cosmos-Embed1-224p**.

### Prepare Model Weights

For most use cases, you only need to create a model directory. The required model files are downloaded automatically on first run.

1. Create a model directory (set `MODEL_DIR` first, as described in Set Up Data Directories):

   ```bash
   mkdir -p "$MODEL_DIR"
   ```

   You can reuse the same model directory across runs.

2. No additional setup is required. The model is downloaded automatically when first used.

## Set Up Data Directories

Organize input videos and output locations before running the pipeline.

* **Local**: For local file processing.
Define paths like:

  ```bash
  DATA_DIR=/path/to/videos
  OUT_DIR=/path/to/output_clips
  MODEL_DIR=/path/to/models
  ```

* **S3**: For cloud storage (AWS S3, MinIO, etc.). Configure credentials in `~/.aws/credentials` and use `s3://` paths for `--video-dir` and `--output-clip-path`.

**S3 usage notes:**

* Input videos can be read from S3 paths
* Output clips can be written to S3 paths
* Model directory should remain local for performance
* Ensure IAM permissions allow read/write access to specified buckets

## Run the Splitting Pipeline Example

Use the example script from [https://github.com/NVIDIA-NeMo/Curator/tree/main/tutorials/video/getting-started](https://github.com/NVIDIA-NeMo/Curator/tree/main/tutorials/video/getting-started) to read videos, split into clips, and write outputs. This runs a Ray pipeline with `XennaExecutor` under the hood.

```bash
python tutorials/video/getting-started/video_split_clip_example.py \
  --video-dir "$DATA_DIR" \
  --model-dir "$MODEL_DIR" \
  --output-clip-path "$OUT_DIR" \
  --splitting-algorithm fixed_stride \
  --fixed-stride-split-duration 10.0 \
  --embedding-algorithm cosmos-embed1-224p \
  --transcode-encoder libopenh264 \
  --verbose
```

**What this command does:**

1. Reads all video files from `$DATA_DIR`
2. Splits each video into 10-second clips using fixed stride
3. Generates embeddings using the Cosmos-Embed1-224p model
4. Encodes clips using the libopenh264 encoder
5. Writes output clips and metadata to `$OUT_DIR`

**Using a config file**: The example script accepts many command-line arguments.
For complex configurations, you can store arguments in a file and pass them with the `@` prefix:

```bash
echo '--video-dir /data/videos
--output-clip-path /data/output
--splitting-algorithm fixed_stride
--fixed-stride-split-duration 10.0
--embedding-algorithm cosmos-embed1-224p
--transcode-encoder libopenh264' > my_config.txt

python tutorials/video/getting-started/video_split_clip_example.py @my_config.txt
```

### Configuration Options Reference

| Option                            | Values                                                           | Description                                  |
| --------------------------------- | ---------------------------------------------------------------- | -------------------------------------------- |
| **Splitting**                     |                                                                  |                                              |
| `--splitting-algorithm`           | `fixed_stride`, `transnetv2`                                     | Method for dividing videos into clips        |
| `--fixed-stride-split-duration`   | Float (seconds)                                                  | Clip length for fixed stride (default: 10.0) |
| `--transnetv2-frame-decoder-mode` | `pynvc`, `ffmpeg_gpu`, `ffmpeg_cpu`                              | Frame decoding method for TransNetV2         |
| **Embedding**                     |                                                                  |                                              |
| `--embedding-algorithm`           | `cosmos-embed1-224p`, `cosmos-embed1-336p`, `cosmos-embed1-448p` | Embedding model to use                       |
| **Encoding**                      |                                                                  |                                              |
| `--transcode-encoder`             | `h264_nvenc`, `libopenh264`, `libx264`                           | Video encoder for output clips               |
| `--transcode-use-hwaccel`         | Flag                                                             | Enable hardware acceleration for encoding    |
| **Optional Features**             |                                                                  |                                              |
| `--generate-captions`             | Flag                                                             | Generate text captions for each clip         |
| `--generate-previews`             | Flag                                                             | Create preview images for each clip          |
| `--verbose`                       | Flag                                                             | Enable detailed logging output               |

### Understanding Pipeline Output

After successful execution, the output directory contains:

```
$OUT_DIR/
├── clips/
│   ├── video1_clip_0000.mp4
│   ├── video1_clip_0001.mp4
│   └── ...
├── embeddings/
│   ├── video1_clip_0000.npy
│   ├── video1_clip_0001.npy
│   └── ...
├── metadata/
│   └── manifest.jsonl
└── previews/  (if --generate-previews enabled)
    ├── video1_clip_0000.jpg
    └── ...
```

**File descriptions:**

* **clips/**: Encoded video clips (MP4 format)
* **embeddings/**: NumPy arrays containing clip embeddings (for similarity search)
* **metadata/manifest.jsonl**: JSONL file with clip metadata (source and clip paths, timestamps, embedding paths)
* **previews/**: Thumbnail images for each clip (optional)

**Example manifest entry:**

```json
{
  "video_path": "/data/input_videos/video1.mp4",
  "clip_path": "/data/output_clips/clips/video1_clip_0000.mp4",
  "start_time": 0.0,
  "end_time": 10.0,
  "embedding_path": "/data/output_clips/embeddings/video1_clip_0000.npy",
  "preview_path": "/data/output_clips/previews/video1_clip_0000.jpg"
}
```

## Best Practices

### Data Preparation

* **Validate input videos**: Ensure videos are not corrupted before processing
* **Consistent formats**: Convert videos to a standard format (MP4 with H.264) for consistent results
* **Organize by content**: Group similar videos together for efficient processing

### Model Selection

* **Start with Cosmos-Embed1-224p**: The fastest variant, with the lowest VRAM usage, and well suited to initial experiments
* **Upgrade resolution as needed**: Use 336p or 448p only when higher precision is required
* **Monitor VRAM usage**: Check GPU memory with `nvidia-smi` during processing

### Pipeline Configuration

* **Enable verbose logging**: Use the `--verbose` flag for debugging and monitoring
* **Test on a small subset**: Run the pipeline on 5-10 videos before processing large datasets
* **Use GPU encoding**: Select the `h264_nvenc` encoder where NVENC is available for faster encoding
* **Save intermediate results**: Keep embeddings and metadata for downstream tasks

### Infrastructure

* **Use shared storage**: Mount a shared filesystem for multi-node processing
* **Allocate sufficient VRAM**: Plan for peak usage (captioning + embedding)
* **Monitor GPU utilization**: Use `nvidia-smi dmon` to track GPU usage during processing
* **Schedule long-running jobs**: Process large video datasets in batch jobs overnight

## Next Steps

Explore the [Video Curation documentation](/curate-video). For encoding guidance, refer to [Clip Encoding](/curate-video/process-data/transcoding).
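
As a quick check before moving on to downstream tasks, it can help to confirm that the manifest described under Understanding Pipeline Output lists the clips you expect. The sketch below is illustrative only: it assumes each line of `metadata/manifest.jsonl` is a JSON object with the `video_path`, `start_time`, and `end_time` fields shown in the example manifest entry, and the helper name `summarize_manifest` is hypothetical, not part of Curator.

```python
import json
from collections import defaultdict
from pathlib import Path


def summarize_manifest(manifest_path):
    """Return {video_path: (clip_count, total_clip_seconds)} for a manifest file.

    Assumes one JSON object per line with "video_path", "start_time",
    and "end_time" keys, as in the example manifest entry.
    """
    totals = defaultdict(lambda: [0, 0.0])
    with Path(manifest_path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between entries
            entry = json.loads(line)
            stats = totals[entry["video_path"]]
            stats[0] += 1  # one more clip for this source video
            stats[1] += entry["end_time"] - entry["start_time"]
    return {video: (count, seconds) for video, (count, seconds) in totals.items()}
```

Calling `summarize_manifest("/path/to/output_clips/metadata/manifest.jsonl")` returns one entry per source video, which makes it easy to spot videos that produced no clips or far fewer than their length suggests.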