
> Step-by-step guide to installing Curator and running your first video curation pipeline

# Get Started with Video Curation

This guide shows how to install Curator and run your first video curation pipeline.

The [example pipeline](#run-the-splitting-pipeline-example) processes a list of videos, splitting each into 10‑second clips using a fixed stride. It then generates clip‑level embeddings for downstream tasks such as duplicate removal and similarity search.

## Overview

This quickstart guide demonstrates how to:

1. **Install NeMo Curator** with video processing support
2. **Set up FFmpeg** with GPU-accelerated encoding
3. **Configure embedding models** (Cosmos-Embed1)
4. **Process videos** through a complete splitting and embedding pipeline
5. **Generate outputs** ready for duplicate removal, captioning, and model training

**What you build:** A video processing pipeline that:

* Splits videos into clips, using either a fixed 10-second stride or scene detection
* Generates clip-level embeddings for similarity search and deduplication
* Optionally creates captions and preview images
* Outputs results in formats compatible with multimodal training workflows

## Prerequisites

### System Requirements

To use NeMo Curator's video curation capabilities, ensure your system meets these requirements:

#### Operating System

* **Ubuntu 24.04, 22.04, or 20.04** (required for GPU-accelerated video processing)
* Other Linux distributions may work but are not officially supported

#### Python Environment

* **Python 3.10, 3.11, or 3.12**
* **uv package manager** for dependency management
* **Git** for model and repository dependencies

#### GPU Requirements

* **NVIDIA GPU required** (CPU-only mode not supported for video processing)
* **Architecture**: Volta™ or newer (compute capability 7.0+)
  * Examples: V100, T4, RTX 2080+, A100, H100
* **CUDA**: Version 12.0 or above
* **VRAM**: Minimum requirements by configuration:
  * Basic splitting + embedding: \~16GB VRAM
  * Full pipeline (splitting + embedding + captioning): \~38GB VRAM
  * Reduced configuration (lower batch sizes, FP8): \~21GB VRAM
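
As a rough pre-flight check, the GPU requirements above can be compared against what `nvidia-smi` reports. The sketch below is not part of Curator. It assumes a driver recent enough that `nvidia-smi --query-gpu` supports the `compute_cap` field, and the helper names (`parse_gpu_rows`, `meets_requirements`, `query_gpus`) are hypothetical. The thresholds mirror the approximate figures listed above, so treat a near miss (for example, a card reporting slightly under 16 GB) as a prompt to check manually rather than a hard failure.

```python
import shutil
import subprocess

MIN_COMPUTE_CAPABILITY = 7.0  # Volta or newer, per the requirements above
MIN_VRAM_GB = 16.0            # approximate figure for basic splitting + embedding


def parse_gpu_rows(csv_text):
    """Parse 'name, memory.total (MiB), compute_cap' rows printed by nvidia-smi."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split from the right so a comma inside the GPU name cannot break parsing.
        name, mem_mib, cap = [field.strip() for field in line.rsplit(",", 2)]
        gpus.append({
            "name": name,
            "vram_gb": float(mem_mib) / 1024,
            "compute_cap": float(cap),
        })
    return gpus


def meets_requirements(gpu, min_cap=MIN_COMPUTE_CAPABILITY, min_vram_gb=MIN_VRAM_GB):
    """Return True when a parsed GPU row satisfies both thresholds."""
    return gpu["compute_cap"] >= min_cap and gpu["vram_gb"] >= min_vram_gb


def query_gpus():
    """Run nvidia-smi and return parsed rows, or [] if the query is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total,compute_cap",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    return parse_gpu_rows(result.stdout) if result.returncode == 0 else []
```

Calling `[gpu["name"] for gpu in query_gpus() if meets_requirements(gpu)]` lists the GPUs that pass both checks. Raise `min_vram_gb` if you plan to enable captioning.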

#### Software Dependencies

* **FFmpeg 8.0+** with one of the following encoders:
  * GPU encoder: `h264_nvenc` (recommended for performance; requires an NVENC-equipped GPU — note that A100 and H100 do **not** include NVENC)
  * CPU encoder: `libvpx-vp9` (for non-NVENC GPUs; produces VP9 in `.mp4`)

If `uv` is not installed, refer to the [Installation Guide](/get-started/installation) for setup instructions, or install it quickly with:

```bash
curl -LsSf https://astral.sh/uv/0.8.22/install.sh | sh
source $HOME/.local/bin/env
```

***

## Install

Create and activate a virtual environment, then choose one of the following install options.

**Install from PyPI:**

```bash
uv pip install torch wheel_stub psutil setuptools setuptools_scm
uv pip install --no-build-isolation "nemo-curator[video_cuda12]"
```

**Install from source:**

```bash
git clone https://github.com/NVIDIA-NeMo/Curator.git
cd Curator
uv sync --extra video_cuda12 --all-groups
source .venv/bin/activate
```

**Use the container:**

NeMo Curator is also available as a standalone container:

```bash
# Pull the container
docker pull nvcr.io/nvidia/nemo-curator:{{ container_version }}

# Run the container
docker run --gpus all -it --rm nvcr.io/nvidia/nemo-curator:{{ container_version }}
```

For details on container environments and configurations, see [Container Environments](/reference/infra/container-environments).

## Install FFmpeg and Encoders

Curator’s video pipelines rely on `FFmpeg` for decoding and encoding. If you plan to encode clips (using `--transcode-encoder h264_nvenc` or `--transcode-encoder libvpx-vp9`), install an `FFmpeg` build with NVENC and `libvpx-vp9` support.

The repository maintains a script that builds and installs `FFmpeg` with both encoders. The script enables `--enable-cuda-nvcc`, `--enable-libnpp`, and `--enable-libvpx`.

* Script source: [docker/common/install\_ffmpeg.sh](https://github.com/NVIDIA-NeMo/Curator/blob/main/docker/common/install_ffmpeg.sh)

```bash
curl -fsSL https://raw.githubusercontent.com/NVIDIA-NeMo/Curator/main/docker/common/install_ffmpeg.sh -o install_ffmpeg.sh
chmod +x install_ffmpeg.sh
sudo bash install_ffmpeg.sh
```

Confirm that `FFmpeg` is on your `PATH` and that at least one supported encoder is available:

```bash
ffmpeg -hide_banner -version | head -n 5
ffmpeg -hide_banner -encoders | grep -E "h264_nvenc|libvpx-vp9"
```

If encoders are missing, reinstall `FFmpeg` with the required options or use the Debian/Ubuntu script above.
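
The encoder check above can also be scripted, for example to choose a value for `--transcode-encoder` automatically. The following is a minimal sketch, not a Curator API: `available_video_encoders` and `pick_encoder` are hypothetical helpers that parse the text printed by `ffmpeg -encoders`. Note that this listing only shows which encoders were compiled into `FFmpeg`. It does not prove that the GPU has NVENC hardware, so on cards such as A100 or H100 `h264_nvenc` can appear in the list and still fail at run time.

```python
PREFERRED_ENCODERS = ("h264_nvenc", "libvpx-vp9")  # GPU encoder first, CPU fallback second


def available_video_encoders(encoders_output):
    """Collect video encoder names from the text printed by `ffmpeg -encoders`."""
    names = set()
    for line in encoders_output.splitlines():
        parts = line.split()
        # Encoder rows look like: " V....D h264_nvenc   NVIDIA NVENC H.264 encoder"
        if len(parts) >= 2 and parts[0].startswith("V"):
            names.add(parts[1])
    return names


def pick_encoder(encoders_output, preferred=PREFERRED_ENCODERS):
    """Return the first preferred encoder that FFmpeg lists, or None if none is present."""
    available = available_video_encoders(encoders_output)
    for name in preferred:
        if name in available:
            return name
    return None
```

To use it, capture the listing with `subprocess.run(["ffmpeg", "-hide_banner", "-encoders"], capture_output=True, text=True).stdout` and pass the string to `pick_encoder`.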

**Processing H.264/HEVC/AV1 inputs? You might still need a software decoder — even with NVENC/NVDEC.**

Curator runs `ffprobe` inside CPU-only Ray actors (`VideoReader`, `ClipWriter`) for metadata extraction. Those actors can't open NVDEC decoders, so without a software h264/hevc/av1 decoder your inputs are silently skipped (`SoftwareCodecMissingError` in the logs).

Run the bundled installer inside the container to add software decoder support — no image rebuild needed:

```bash
bash /opt/Curator/docker/common/install_h264_support.sh
```

See [Software H.264/HEVC/AV1 Codec Support](/get-started/installation#software-h264hevcav1-codec-support-advanced) for the full picture.

Refer to [Clip Encoding](/curate-video/process-data/transcoding) to choose encoders and verify NVENC support on your system.

### Available Models

Embeddings convert each video clip into a numeric vector that captures visual and semantic content. Curator uses these vectors to:

* Remove near-duplicate clips during duplicate removal
* Enable similarity search and clustering
* Support downstream analysis such as caption verification

This guide covers one embedding model family, Cosmos-Embed1:

#### Cosmos-Embed1 (Recommended)

Cosmos-Embed1 is the default embedding model. It is available in three variants (**cosmos-embed1-224p**, **cosmos-embed1-336p**, and **cosmos-embed1-448p**) that differ in input resolution and in the accuracy/VRAM tradeoff. Each variant is downloaded automatically to `MODEL_DIR` on first run.

| Model Variant          | Resolution | VRAM Usage | Speed   | Accuracy | Best For                                       |
| ---------------------- | ---------- | ---------- | ------- | -------- | ---------------------------------------------- |
| **cosmos-embed1-224p** | 224×224    | \~8GB      | Fastest | Good     | Large-scale processing, initial curation       |
| **cosmos-embed1-336p** | 336×336    | \~12GB     | Medium  | Better   | Balanced performance and quality               |
| **cosmos-embed1-448p** | 448×448    | \~16GB     | Slower  | Best     | High-quality embeddings, fine-grained matching |
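
To see which variants are candidates for a given GPU, the approximate VRAM figures in the table can be turned into a small lookup. This is an illustrative sketch only: `variants_that_fit` and the `headroom_gb` safety margin are hypothetical, and the numbers are the rough estimates from the table, not measured limits.

```python
# Approximate VRAM usage in GB, copied from the table above.
VARIANT_VRAM_GB = {
    "cosmos-embed1-224p": 8,
    "cosmos-embed1-336p": 12,
    "cosmos-embed1-448p": 16,
}


def variants_that_fit(free_vram_gb, headroom_gb=2.0):
    """Return variants whose estimated usage plus headroom fits, smallest first."""
    fitting = [
        (vram, name)
        for name, vram in VARIANT_VRAM_GB.items()
        if vram + headroom_gb <= free_vram_gb
    ]
    return [name for _, name in sorted(fitting)]
```

For example, `variants_that_fit(15)` returns the 224p and 336p variants. The result is an upper bound on what fits, not a recommendation to pick the largest model.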

**Model links:**

* [cosmos-embed1-224p on Hugging Face](https://huggingface.co/nvidia/cosmos-embed1-224p)
* [cosmos-embed1-336p on Hugging Face](https://huggingface.co/nvidia/cosmos-embed1-336p)
* [cosmos-embed1-448p on Hugging Face](https://huggingface.co/nvidia/cosmos-embed1-448p)

For this quickstart, the following steps set up support for **Cosmos-Embed1-224p**.

### Prepare Model Weights

For most use cases, you only need to create a model directory. The required model files are downloaded automatically on first run, so no additional setup is required.

Set `MODEL_DIR` to a local path and create the directory:

```bash
# Choose any local path; the same value is used as --model-dir later in this guide
MODEL_DIR=/path/to/models
mkdir -p "$MODEL_DIR"
```

You can reuse the same `MODEL_DIR` across runs.

## Set Up Data Directories

Organize input videos and output locations before running the pipeline.

* **Local**: For local file processing. Define paths like:

  ```bash
  DATA_DIR=/path/to/videos
  OUT_DIR=/path/to/output_clips
  MODEL_DIR=/path/to/models
  ```

* **S3**: For cloud storage (AWS S3, MinIO, etc.). Configure credentials in `~/.aws/credentials` and use `s3://` paths for `--video-dir` and `--output-clip-path`.

**S3 usage notes:**

* Input videos can be read from S3 paths
* Output clips can be written to S3 paths
* Model directory should remain local for performance
* Ensure IAM permissions allow read/write access to specified buckets

## Run the Splitting Pipeline Example

Use the example script from [https://github.com/NVIDIA-NeMo/Curator/tree/main/tutorials/video/getting-started](https://github.com/NVIDIA-NeMo/Curator/tree/main/tutorials/video/getting-started) to read videos, split into clips, and write outputs. This runs a Ray pipeline with `XennaExecutor` under the hood.

```bash
python tutorials/video/getting-started/video_split_clip_example.py \
  --video-dir "$DATA_DIR" \
  --model-dir "$MODEL_DIR" \
  --output-clip-path "$OUT_DIR" \
  --splitting-algorithm fixed_stride \
  --fixed-stride-split-duration 10.0 \
  --embedding-algorithm cosmos-embed1-224p \
  --transcode-encoder h264_nvenc \
  --verbose
```

**What this command does:**

1. Reads all video files from `$DATA_DIR`
2. Splits each video into 10-second clips using fixed stride
3. Generates embeddings using the Cosmos-Embed1-224p model
4. Encodes clips with the `h264_nvenc` encoder
5. Writes output clips and metadata to `$OUT_DIR`
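
To make step 2 concrete, the sketch below computes the clip boundaries that a fixed 10-second stride would produce for a video of a given length. It is an illustration of the idea, not Curator's implementation: `fixed_stride_spans` is a hypothetical helper, and this guide does not state whether the pipeline keeps or drops a trailing clip shorter than the stride, so the `keep_remainder` flag exposes both behaviors.

```python
def fixed_stride_spans(duration_s, clip_len_s=10.0, keep_remainder=True):
    """Return (start, end) spans in seconds covering a video with a fixed stride."""
    if clip_len_s <= 0:
        raise ValueError("clip_len_s must be positive")
    spans = []
    index = 0
    # Multiply by the index instead of accumulating to avoid float drift.
    while index * clip_len_s < duration_s:
        start = index * clip_len_s
        full_end = start + clip_len_s
        if full_end <= duration_s:
            spans.append((start, full_end))
        elif keep_remainder:
            # Trailing clip shorter than the stride (an assumption; see above).
            spans.append((start, duration_s))
        index += 1
    return spans
```

A 25-second video yields two full clips plus a 5-second remainder with `keep_remainder=True`, and only the two full clips with `keep_remainder=False`.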

**Using a config file**: The example script accepts many command-line arguments. For complex configurations, you can store arguments in a file and pass them with the `@` prefix:

```bash
echo '--video-dir /data/videos
--output-clip-path /data/output
--splitting-algorithm fixed_stride
--fixed-stride-split-duration 10.0
--embedding-algorithm cosmos-embed1-224p
--transcode-encoder h264_nvenc' > my_config.txt

python tutorials/video/getting-started/video_split_clip_example.py @my_config.txt
```
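
For readers curious how an `@` argument file can be handled in Python, the sketch below shows one way to do it with the standard library's `argparse`. It is an assumption about the mechanism, not a description of the example script's code. With stock `argparse`, `fromfile_prefix_chars="@"` reads each line of the file as a single argument, so a line such as `--video-dir /data/videos` needs the line-splitting override shown here. The parser defines only a few of the flags from this guide, and `ArgsFileParser`, `build_parser`, and `parse_from_file` are hypothetical names.

```python
import argparse
import tempfile
from pathlib import Path


class ArgsFileParser(argparse.ArgumentParser):
    """ArgumentParser that accepts 'flag value' pairs on one line of an @file."""

    def convert_arg_line_to_args(self, arg_line):
        # Default behavior treats the whole line as one argument; split on whitespace.
        return arg_line.split()


def build_parser():
    parser = ArgsFileParser(fromfile_prefix_chars="@")
    parser.add_argument("--video-dir")
    parser.add_argument("--output-clip-path")
    parser.add_argument("--splitting-algorithm", default="fixed_stride")
    parser.add_argument("--fixed-stride-split-duration", type=float, default=10.0)
    return parser


def parse_from_file(config_text):
    """Write config_text to a temporary file and parse it through the @ prefix."""
    with tempfile.TemporaryDirectory() as tmp:
        config_path = Path(tmp) / "my_config.txt"
        config_path.write_text(config_text, encoding="utf-8")
        return build_parser().parse_args([f"@{config_path}"])
```

Arguments given on the command line after the `@file` entry override values from the file, because `argparse` processes them in order.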

### Configuration Options Reference

| Option                            | Values                                                           | Description                                                                                                                                                                                                                                                                                                                                                      |
| --------------------------------- | ---------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Splitting**                     |                                                                  |                                                                                                                                                                                                                                                                                                                                                                  |
| `--splitting-algorithm`           | `fixed_stride`, `transnetv2`                                     | Method for dividing videos into clips                                                                                                                                                                                                                                                                                                                            |
| `--fixed-stride-split-duration`   | Float (seconds)                                                  | Clip length for fixed stride (default: 10.0)                                                                                                                                                                                                                                                                                                                     |
| `--transnetv2-frame-decoder-mode` | `pynvc`, `ffmpeg_gpu`, `ffmpeg_cpu`                              | Frame decoding method for TransNetV2                                                                                                                                                                                                                                                                                                                             |
| **Embedding**                     |                                                                  |                                                                                                                                                                                                                                                                                                                                                                  |
| `--embedding-algorithm`           | `cosmos-embed1-224p`, `cosmos-embed1-336p`, `cosmos-embed1-448p` | Embedding model to use                                                                                                                                                                                                                                                                                                                                           |
| **Encoding**                      |                                                                  |                                                                                                                                                                                                                                                                                                                                                                  |
| `--transcode-encoder`             | `h264_nvenc`, `libvpx-vp9`, `libopenh264`                        | Video encoder for output clips. Use `libvpx-vp9` (CPU) on GPUs without NVENC such as A100/H100. `libopenh264` is opt-in — run `install_h264_support.sh --with-libopenh264` inside the container or provide a system FFmpeg that includes it. See [Software H.264/HEVC/AV1 Codec Support](/get-started/installation#software-h264hevcav1-codec-support-advanced). |
| `--transcode-use-hwaccel`         | Flag                                                             | Enable hardware acceleration for encoding (only valid with `h264_nvenc`).                                                                                                                                                                                                                                                                                        |
| **Optional Features**             |                                                                  |                                                                                                                                                                                                                                                                                                                                                                  |
| `--generate-captions`             | Flag                                                             | Generate text captions for each clip                                                                                                                                                                                                                                                                                                                             |
| `--generate-previews`             | Flag                                                             | Create preview images for each clip                                                                                                                                                                                                                                                                                                                              |
| `--verbose`                       | Flag                                                             | Enable detailed logging output                                                                                                                                                                                                                                                                                                                                   |

### Understanding Pipeline Output

After successful execution, the output directory will contain:

```
$OUT_DIR/
├── clips/
│   ├── video1_clip_0000.mp4
│   ├── video1_clip_0001.mp4
│   └── ...
├── embeddings/
│   ├── video1_clip_0000.npy
│   ├── video1_clip_0001.npy
│   └── ...
├── metadata/
│   └── manifest.jsonl
└── previews/  (if --generate-previews enabled)
    ├── video1_clip_0000.jpg
    └── ...
```

**File descriptions:**

* **clips/**: Encoded video clips (MP4 format)
* **embeddings/**: Numpy arrays containing clip embeddings (for similarity search)
* **metadata/manifest.jsonl**: JSONL file with one record per clip (source and clip paths, start and end timestamps, and paths to the embedding and preview files)
* **previews/**: Thumbnail images for each clip (optional)
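
As a quick way to inspect the saved embeddings, the sketch below loads `.npy` files and flags pairs of clips whose cosine similarity exceeds a threshold. It is only an illustration of similarity search over the output files, not Curator's duplicate-removal workflow. It assumes NumPy is installed and that each file holds a single embedding vector (it is flattened on load in case of an extra leading dimension). The `0.95` threshold and the helper names are hypothetical.

```python
import numpy as np


def load_embedding(path):
    """Load one clip embedding and flatten it to a 1-D float32 vector."""
    return np.load(path).astype(np.float32).reshape(-1)


def cosine_similarity_matrix(embeddings):
    """Return the pairwise cosine similarity matrix for a list of 1-D vectors."""
    matrix = np.stack(embeddings)
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    unit = matrix / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


def near_duplicate_pairs(names, embeddings, threshold=0.95):
    """List (name_a, name_b, similarity) for pairs at or above the threshold."""
    sims = cosine_similarity_matrix(embeddings)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if sims[i, j] >= threshold:
                pairs.append((names[i], names[j], float(sims[i, j])))
    return pairs
```

A typical call collects the files with `sorted(Path(out_dir, "embeddings").glob("*.npy"))`, loads each with `load_embedding`, and passes the file names and vectors to `near_duplicate_pairs`. The pairwise loop is quadratic, so it suits small test runs rather than large datasets.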

**Example manifest entry:**

```json
{
  "video_path": "/data/input_videos/video1.mp4",
  "clip_path": "/data/output_clips/clips/video1_clip_0000.mp4",
  "start_time": 0.0,
  "end_time": 10.0,
  "embedding_path": "/data/output_clips/embeddings/video1_clip_0000.npy",
  "preview_path": "/data/output_clips/previews/video1_clip_0000.jpg"
}
```
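
A manifest in this shape can be read with the standard library alone. The sketch below assumes the field names shown in the example entry (`video_path`, `clip_path`, `start_time`, `end_time`) and groups clips by source video along with each clip's duration. `parse_manifest`, `read_manifest`, and `clips_per_video` are hypothetical helper names.

```python
import json
from collections import defaultdict
from pathlib import Path


def parse_manifest(lines):
    """Parse an iterable of JSONL lines into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def read_manifest(manifest_path):
    """Read metadata/manifest.jsonl from disk."""
    with Path(manifest_path).open(encoding="utf-8") as handle:
        return parse_manifest(handle)


def clips_per_video(entries):
    """Map each source video to a list of (clip_path, duration_seconds)."""
    grouped = defaultdict(list)
    for entry in entries:
        duration = entry["end_time"] - entry["start_time"]
        grouped[entry["video_path"]].append((entry["clip_path"], duration))
    return dict(grouped)
```

For example, `clips_per_video(read_manifest(f"{out_dir}/metadata/manifest.jsonl"))` gives a per-video summary that makes it easy to spot videos that produced no clips or unexpectedly short ones.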

## Best Practices

### Data Preparation

* **Validate input videos**: Ensure videos are not corrupted before processing
* **Consistent formats**: Convert videos to a standard format (MP4 with H.264) for consistent results
* **Organize by content**: Group similar videos together for efficient processing
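
One lightweight way to validate inputs before a run is to ask `ffprobe` for the first video stream of each file. The sketch below is a hedged example rather than a Curator feature: `ffprobe_command` and `has_video_stream` are hypothetical helpers, and the check only confirms that container metadata is readable and a video stream exists. It does not decode frames, so it can miss corruption that appears partway through a file.

```python
import subprocess


def ffprobe_command(video_path):
    """Build an ffprobe call that prints the codec and duration of the first video stream."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=codec_name,duration",
        "-of", "csv=p=0",
        str(video_path),
    ]


def has_video_stream(video_path, run=subprocess.run):
    """Return True when ffprobe exits cleanly and reports a video stream.

    The `run` parameter exists so the check can be exercised without ffprobe installed.
    A missing ffprobe binary raises FileNotFoundError rather than returning False.
    """
    result = run(ffprobe_command(video_path), capture_output=True, text=True, check=False)
    return result.returncode == 0 and bool(result.stdout.strip())
```

Files for which `has_video_stream` returns `False` are worth inspecting or re-encoding before they reach the pipeline.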

### Model Selection

* **Start with Cosmos-Embed1-224p**: The fastest variant with the lowest VRAM use (\~8GB), well suited to initial experiments
* **Upgrade resolution as needed**: Use 336p or 448p only when higher precision is required
* **Monitor VRAM usage**: Check GPU memory with `nvidia-smi` during processing

### Pipeline Configuration

* **Enable verbose logging**: Use `--verbose` flag for debugging and monitoring
* **Test on small subset**: Run pipeline on 5-10 videos before processing large datasets
* **Use GPU encoding**: Use `h264_nvenc` on NVENC-equipped GPUs for faster encoding, and fall back to `libvpx-vp9` on GPUs without NVENC such as A100 and H100
* **Save intermediate results**: Keep embeddings and metadata for downstream tasks
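
To test on a small subset, one option is to copy a handful of videos into a separate directory and point `--video-dir` at it. The sketch below shows this under stated assumptions: the helper names are hypothetical, and the extension list is a guess at common video formats, not the set of formats Curator reads.

```python
import shutil
from pathlib import Path

# Assumed set of video extensions; adjust to match your data.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv", ".avi", ".webm"}


def select_subset(paths, limit=5):
    """Return the first `limit` video paths in sorted order, ignoring other files."""
    videos = sorted(p for p in map(Path, paths) if p.suffix.lower() in VIDEO_EXTENSIONS)
    return videos[:limit]


def copy_subset(data_dir, subset_dir, limit=5):
    """Copy a small sample of videos from data_dir into subset_dir."""
    subset_dir = Path(subset_dir)
    subset_dir.mkdir(parents=True, exist_ok=True)
    chosen = select_subset(Path(data_dir).iterdir(), limit)
    for video in chosen:
        shutil.copy2(video, subset_dir / video.name)
    return chosen
```

After `copy_subset("/path/to/videos", "/path/to/videos_subset", limit=5)`, run the example script with `--video-dir /path/to/videos_subset` and check the outputs before scaling up.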

### Infrastructure

* **Use shared storage**: Mount shared filesystem for multi-node processing
* **Allocate sufficient VRAM**: Plan for peak usage (captioning + embedding)
* **Monitor GPU utilization**: Use `nvidia-smi dmon` to track GPU usage during processing
* **Schedule long-running jobs**: Process large video datasets in batch jobs overnight

## Next Steps

Explore the [Video Curation documentation](/curate-video). For encoding guidance, refer to [Clip Encoding](/curate-video/process-data/transcoding).