---
description: >-
  Step-by-step guide to installing Curator and running your first video curation
  pipeline
categories:
  - getting-started
tags:
  - video-curation
  - installation
  - quickstart
  - gpu-accelerated
  - ray
  - python
personas:
  - data-scientist-focused
  - mle-focused
difficulty: beginner
content_type: tutorial
modality: video-only
---

# Get Started with Video Curation

This guide shows how to install Curator and run your first video curation pipeline.

The [example pipeline](#run-the-splitting-pipeline-example) processes a list of videos, splitting each into 10‑second clips using a fixed stride. It then generates clip‑level embeddings for downstream tasks such as duplicate removal and similarity search.

## Overview

This quickstart guide demonstrates how to:

1. **Install NeMo Curator** with video processing support
2. **Set up FFmpeg** with GPU-accelerated encoding
3. **Configure embedding models** (Cosmos-Embed1)
4. **Process videos** through a complete splitting and embedding pipeline
5. **Generate outputs** ready for duplicate removal, captioning, and model training

**What you build:** A video processing pipeline that:

* Splits videos into 10-second clips using fixed stride or scene detection
* Generates clip-level embeddings for similarity search and deduplication
* Optionally creates captions and preview images
* Outputs results in formats compatible with multimodal training workflows

## Prerequisites

### System Requirements

To use NeMo Curator's video curation capabilities, ensure your system meets these requirements:

#### Operating System

* **Ubuntu 24.04, 22.04, or 20.04** (required for GPU-accelerated video processing)
* Other Linux distributions may work but are not officially supported

#### Python Environment

* **Python 3.10, 3.11, or 3.12**
* **uv package manager** for dependency management
* **Git** for model and repository dependencies

#### GPU Requirements

* **NVIDIA GPU required** (CPU-only mode not supported for video processing)
* **Architecture**: Volta™ or newer (compute capability 7.0+)
  * Examples: V100, T4, RTX 2080+, A100, H100
* **CUDA**: Version 12.0 or above
* **VRAM**: Minimum requirements by configuration:
  * Basic splitting + embedding: ~16GB VRAM
  * Full pipeline (splitting + embedding + captioning): ~38GB VRAM
  * Reduced configuration (lower batch sizes, FP8): ~21GB VRAM
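
Before installing, you can confirm the GPU, driver, and available VRAM with a quick check (this assumes the NVIDIA driver is already installed; the `compute_cap` field requires a recent driver, and `nvcc` is only present if the CUDA toolkit is installed):

```bash
# GPU model, compute capability, driver version, and total VRAM
nvidia-smi --query-gpu=name,compute_cap,driver_version,memory.total --format=csv

# CUDA toolkit version (optional; only if nvcc is installed)
nvcc --version
```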

#### Software Dependencies

* **FFmpeg 8.0+** with H.264 encoding support
  * GPU encoder: `h264_nvenc` (recommended for performance)
  * CPU encoders: `libopenh264` or `libx264` (fallback options)

<Tip>
  If `uv` is not installed, refer to the [Installation Guide](/admin/installation) for setup instructions, or install it quickly with:

  ```bash
  curl -LsSf https://astral.sh/uv/0.8.22/install.sh | sh
  source $HOME/.local/bin/env
  ```
</Tip>

***

## Install

Create and activate a virtual environment, then choose an install option:

<Tabs>
  <Tab title="PyPI">
    ```bash
    uv pip install torch wheel_stub psutil setuptools setuptools_scm
    uv pip install --no-build-isolation "nemo-curator[video_cuda12]"
    ```
  </Tab>

  <Tab title="Source">
    ```bash
    git clone https://github.com/NVIDIA-NeMo/Curator.git
    cd Curator
    uv sync --extra video_cuda12 --all-groups
    source .venv/bin/activate
    ```
  </Tab>

  <Tab title="NeMo Curator Container">
    NeMo Curator is available as a standalone container:

    ```bash
    # Pull the container
    docker pull nvcr.io/nvidia/nemo-curator:{{ container_version }}

    # Run the container
    docker run --gpus all -it --rm nvcr.io/nvidia/nemo-curator:{{ container_version }}
    ```

    <Note>
      For details on container environments and configurations, see [Container Environments](/reference/infra/container-environments).
    </Note>
  </Tab>
</Tabs>
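
After installing, a quick import check confirms the package is available in the active environment (this assumes the distribution exposes the `nemo_curator` Python module; activate your virtual environment first):

```bash
python -c "import nemo_curator; print('nemo_curator imported successfully')"
```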

## Install FFmpeg and Encoders

Curator’s video pipelines rely on `FFmpeg` for decoding and encoding. If you plan to encode clips (for example, using `--transcode-encoder libopenh264` or `h264_nvenc`), install `FFmpeg` with the corresponding encoders.

<Tabs>
  <Tab title="Debian/Ubuntu (Script)">
    Use the maintained script in the repository to build and install `FFmpeg` with `libopenh264` and NVIDIA NVENC support. The script enables `--enable-libopenh264`, `--enable-cuda-nvcc`, and `--enable-libnpp`.

    * Script source: [docker/common/install_ffmpeg.sh](https://github.com/NVIDIA-NeMo/Curator/blob/main/docker/common/install_ffmpeg.sh)

    ```bash
    curl -fsSL https://raw.githubusercontent.com/NVIDIA-NeMo/Curator/main/docker/common/install_ffmpeg.sh -o install_ffmpeg.sh
    chmod +x install_ffmpeg.sh
    sudo bash install_ffmpeg.sh
    ```
  </Tab>

  <Tab title="Verify Installation">
    Confirm that `FFmpeg` is on your `PATH` and that at least one H.264 encoder is available:

    ```bash
    ffmpeg -hide_banner -version | head -n 5
    ffmpeg -encoders | grep -E "h264_nvenc|libopenh264|libx264" | cat
    ```

    If encoders are missing, reinstall `FFmpeg` with the required options or use the Debian/Ubuntu script above.
  </Tab>
</Tabs>

Refer to [Clip Encoding](/curate-video/process-data/transcoding) to choose encoders and verify NVENC support on your system.

## Configure Embedding Models

### Available Models

Embeddings convert each video clip into a numeric vector that captures visual and semantic content. Curator uses these vectors to:

* Remove near-duplicate clips during duplicate removal
* Enable similarity search and clustering
* Support downstream analysis such as caption verification

NeMo Curator supports multiple embedding model families. This quickstart uses Cosmos-Embed1:

#### Cosmos-Embed1 (Recommended)

**Cosmos-Embed1 (default)**: Available in three variants—**cosmos-embed1-224p**, **cosmos-embed1-336p**, and **cosmos-embed1-448p**—which differ in input resolution and accuracy/VRAM tradeoff. All variants are automatically downloaded to `MODEL_DIR` on first run.

| Model Variant          | Resolution | VRAM Usage | Speed   | Accuracy | Best For                                       |
| ---------------------- | ---------- | ---------- | ------- | -------- | ---------------------------------------------- |
| **cosmos-embed1-224p** | 224×224    | ~8GB       | Fastest | Good     | Large-scale processing, initial curation       |
| **cosmos-embed1-336p** | 336×336    | ~12GB      | Medium  | Better   | Balanced performance and quality               |
| **cosmos-embed1-448p** | 448×448    | ~16GB      | Slower  | Best     | High-quality embeddings, fine-grained matching |

**Model links:**

* [cosmos-embed1-224p on Hugging Face](https://huggingface.co/nvidia/cosmos-embed1-224p)
* [cosmos-embed1-336p on Hugging Face](https://huggingface.co/nvidia/cosmos-embed1-336p)
* [cosmos-embed1-448p on Hugging Face](https://huggingface.co/nvidia/cosmos-embed1-448p)

For this quickstart, the following steps set up support for **Cosmos-Embed1-224p**.

### Prepare Model Weights

For most use cases, you only need to create a model directory. The required model files will be downloaded automatically on first run.

1. Choose a location for model weights and create the directory:
   ```bash
   MODEL_DIR=/path/to/models
   mkdir -p "$MODEL_DIR"
   ```
   <Tip>
     You can reuse the same `$MODEL_DIR` across runs.
   </Tip>

2. No additional setup is required. The model will be downloaded automatically when first used.
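
Optionally, you can pre-fetch the weights into the model directory, for example on a machine where runtime network access is restricted. The sketch below uses `huggingface_hub` (install it with `pip install huggingface_hub`); the per-model subdirectory layout under `MODEL_DIR` is an assumption, and the automatic download on first run remains the simplest path:

```python
# Optional pre-download sketch (not required for the quickstart)
from huggingface_hub import snapshot_download

# Assumption: weights live in a per-model subdirectory under MODEL_DIR
snapshot_download(
    repo_id="nvidia/cosmos-embed1-224p",
    local_dir="/path/to/models/cosmos-embed1-224p",
)
```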

## Set Up Data Directories

Organize input videos and output locations before running the pipeline.

* **Local**: For local file processing. Define paths like:

  ```bash
  DATA_DIR=/path/to/videos
  OUT_DIR=/path/to/output_clips
  MODEL_DIR=/path/to/models
  ```

* **S3**: For cloud storage (AWS S3, MinIO, etc.). Configure credentials in `~/.aws/credentials` and use `s3://` paths for `--video-dir` and `--output-clip-path`.

**S3 usage notes:**

* Input videos can be read from S3 paths
* Output clips can be written to S3 paths
* Model directory should remain local for performance
* Ensure IAM permissions allow read/write access to specified buckets
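
With credentials configured in `~/.aws/credentials` (the standard `[default]` profile with `aws_access_key_id` and `aws_secret_access_key`), the path setup mirrors the local case; the bucket names below are placeholders:

```bash
# S3 input and output locations; keep the model directory on local disk
DATA_DIR=s3://my-bucket/raw-videos
OUT_DIR=s3://my-bucket/output-clips
MODEL_DIR=/path/to/models
```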

## Run the Splitting Pipeline Example

Use the example script from the [getting-started tutorial directory](https://github.com/NVIDIA-NeMo/Curator/tree/main/tutorials/video/getting-started) to read videos, split them into clips, and write outputs. Under the hood, this runs a Ray pipeline with `XennaExecutor`.

```bash
python tutorials/video/getting-started/video_split_clip_example.py \
  --video-dir "$DATA_DIR" \
  --model-dir "$MODEL_DIR" \
  --output-clip-path "$OUT_DIR" \
  --splitting-algorithm fixed_stride \
  --fixed-stride-split-duration 10.0 \
  --embedding-algorithm cosmos-embed1-224p \
  --transcode-encoder libopenh264 \
  --verbose
```

**What this command does:**

1. Reads all video files from `$DATA_DIR`
2. Splits each video into 10-second clips using fixed stride
3. Generates embeddings using Cosmos-Embed1-224p model
4. Encodes clips using libopenh264 codec
5. Writes output clips and metadata to `$OUT_DIR`

<Tip>
  **Using a config file**: The example script accepts many command-line arguments. For complex configurations, you can store arguments in a file and pass them with the `@` prefix:

  ```bash
  echo '--video-dir /data/videos
  --output-clip-path /data/output
  --splitting-algorithm fixed_stride
  --fixed-stride-split-duration 10.0
  --embedding-algorithm cosmos-embed1-224p
  --transcode-encoder libopenh264' > my_config.txt

  python tutorials/video/getting-started/video_split_clip_example.py @my_config.txt
  ```
</Tip>

### Configuration Options Reference

| Option                            | Values                                                           | Description                                  |
| --------------------------------- | ---------------------------------------------------------------- | -------------------------------------------- |
| **Splitting**                     |                                                                  |                                              |
| `--splitting-algorithm`           | `fixed_stride`, `transnetv2`                                     | Method for dividing videos into clips        |
| `--fixed-stride-split-duration`   | Float (seconds)                                                  | Clip length for fixed stride (default: 10.0) |
| `--transnetv2-frame-decoder-mode` | `pynvc`, `ffmpeg_gpu`, `ffmpeg_cpu`                              | Frame decoding method for TransNetV2         |
| **Embedding**                     |                                                                  |                                              |
| `--embedding-algorithm`           | `cosmos-embed1-224p`, `cosmos-embed1-336p`, `cosmos-embed1-448p` | Embedding model to use                       |
| **Encoding**                      |                                                                  |                                              |
| `--transcode-encoder`             | `h264_nvenc`, `libopenh264`, `libx264`                           | Video encoder for output clips               |
| `--transcode-use-hwaccel`         | Flag                                                             | Enable hardware acceleration for encoding    |
| **Optional Features**             |                                                                  |                                              |
| `--generate-captions`             | Flag                                                             | Generate text captions for each clip         |
| `--generate-previews`             | Flag                                                             | Create preview images for each clip          |
| `--verbose`                       | Flag                                                             | Enable detailed logging output               |
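
For example, combining options from the table above, a scene-detection run with GPU encoding and preview images might look like this (same script and environment variables as the earlier command):

```bash
python tutorials/video/getting-started/video_split_clip_example.py \
  --video-dir "$DATA_DIR" \
  --model-dir "$MODEL_DIR" \
  --output-clip-path "$OUT_DIR" \
  --splitting-algorithm transnetv2 \
  --transnetv2-frame-decoder-mode ffmpeg_gpu \
  --embedding-algorithm cosmos-embed1-224p \
  --transcode-encoder h264_nvenc \
  --transcode-use-hwaccel \
  --generate-previews \
  --verbose
```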

### Understanding Pipeline Output

After successful execution, the output directory will contain:

```
$OUT_DIR/
├── clips/
│   ├── video1_clip_0000.mp4
│   ├── video1_clip_0001.mp4
│   └── ...
├── embeddings/
│   ├── video1_clip_0000.npy
│   ├── video1_clip_0001.npy
│   └── ...
├── metadata/
│   └── manifest.jsonl
└── previews/  (if --generate-previews enabled)
    ├── video1_clip_0000.jpg
    └── ...
```

**File descriptions:**

* **clips/**: Encoded video clips (MP4 format)
* **embeddings/**: Numpy arrays containing clip embeddings (for similarity search)
* **metadata/manifest.jsonl**: JSONL file with clip metadata (clip paths, timestamps, and embedding paths)
* **previews/**: Thumbnail images for each clip (optional)

**Example manifest entry:**

```json
{
  "video_path": "/data/input_videos/video1.mp4",
  "clip_path": "/data/output_clips/clips/video1_clip_0000.mp4",
  "start_time": 0.0,
  "end_time": 10.0,
  "embedding_path": "/data/output_clips/embeddings/video1_clip_0000.npy",
  "preview_path": "/data/output_clips/previews/video1_clip_0000.jpg"
}
```
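
As a quick sanity check on these outputs, the sketch below reads the manifest and compares the first two clip embeddings with cosine similarity (it assumes each `.npy` file loads as a single embedding vector; adjust if your arrays carry an extra batch dimension):

```python
import json
import numpy as np

manifest_path = "/data/output_clips/metadata/manifest.jsonl"

# Load clip records from the JSONL manifest
with open(manifest_path) as f:
    clips = [json.loads(line) for line in f]

# Load the first two clip embeddings as flat vectors
a = np.load(clips[0]["embedding_path"]).ravel()
b = np.load(clips[1]["embedding_path"]).ravel()

# Cosine similarity: values close to 1.0 suggest near-duplicate clips
similarity = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"Cosine similarity between first two clips: {similarity:.3f}")
```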

## Best Practices

### Data Preparation

* **Validate input videos**: Ensure videos are not corrupted before processing
* **Consistent formats**: Convert videos to a standard format (MP4 with H.264) for consistent results
* **Organize by content**: Group similar videos together for efficient processing
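
One way to screen for corrupted inputs before a long run is to decode each file with `ffmpeg` and flag anything that reports errors (a minimal sketch; it fully decodes each file, so expect it to take a while on large datasets, and adjust the glob to match your formats):

```bash
for f in "$DATA_DIR"/*.mp4; do
  # Decode to a null sink and capture any decoder errors
  errors=$(ffmpeg -v error -i "$f" -f null - 2>&1)
  if [ -n "$errors" ]; then
    echo "possibly corrupted: $f"
  fi
done
```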

### Model Selection

* **Start with Cosmos-Embed1-224p**: Best balance of speed and quality for initial experiments
* **Upgrade resolution as needed**: Use 336p or 448p only when higher precision is required
* **Monitor VRAM usage**: Check GPU memory with `nvidia-smi` during processing

### Pipeline Configuration

* **Enable verbose logging**: Use `--verbose` flag for debugging and monitoring
* **Test on small subset**: Run pipeline on 5-10 videos before processing large datasets
* **Use GPU encoding**: Enable NVENC for significant performance improvements
* **Save intermediate results**: Keep embeddings and metadata for downstream tasks
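
To test on a small subset as suggested above, one option is to copy a handful of videos into a scratch directory and point `--video-dir` at it (a simple sketch that assumes `.mp4` inputs):

```bash
SUBSET_DIR=/tmp/video_subset
mkdir -p "$SUBSET_DIR"

# Copy the first five videos for a quick end-to-end test
find "$DATA_DIR" -maxdepth 1 -name "*.mp4" | head -n 5 | while read -r f; do
  cp "$f" "$SUBSET_DIR"/
done
```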

### Infrastructure

* **Use shared storage**: Mount shared filesystem for multi-node processing
* **Allocate sufficient VRAM**: Plan for peak usage (captioning + embedding)
* **Monitor GPU utilization**: Use `nvidia-smi dmon` to track GPU usage during processing
* **Schedule long-running jobs**: Process large video datasets in batch jobs overnight
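
For GPU monitoring during a run, `nvidia-smi dmon` samples utilization and memory over time, for example:

```bash
# Sample GPU utilization (u) and memory (m) every 5 seconds
nvidia-smi dmon -s um -d 5
```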

## Next Steps

Explore the [Video Curation documentation](/curate-video). For encoding guidance, refer to [Clip Encoding](/curate-video/process-data/transcoding).
