# Process Data

Use NeMo Curator stages to split videos into clips, encode them, generate embeddings or captions, and remove duplicates.

## How it Works

Create a `Pipeline` and add stages for clip extraction, optional re-encoding and filtering, embeddings or captions, previews, and writing outputs. Each stage is modular and configurable to match your quality and performance needs.
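To make the stage-chaining idea concrete, here is a minimal, library-free sketch. It does not use the NeMo Curator API: `SketchPipeline`, `motion_filter`, `add_caption_placeholder`, and the `motion_score` field are hypothetical stand-ins that only illustrate how configurable stages run in the order they are added.

```python
from dataclasses import dataclass, field


@dataclass
class SketchPipeline:
    """Hypothetical stand-in for a pipeline: runs stages in insertion order."""

    name: str
    stages: list = field(default_factory=list)

    def add_stage(self, stage):
        self.stages.append(stage)
        return self  # returning self allows chained add_stage calls

    def run(self, clips):
        # Each stage receives the previous stage's output.
        for stage in self.stages:
            clips = stage(clips)
        return clips


def motion_filter(min_score):
    """Configurable stage: keep clips whose motion score meets the threshold."""

    def stage(clips):
        return [c for c in clips if c["motion_score"] >= min_score]

    return stage


def add_caption_placeholder(clips):
    """Annotating stage; a real stage would call a captioning model here."""
    return [{**c, "caption": f"caption for {c['id']}"} for c in clips]


pipeline = SketchPipeline(name="video_sketch")
pipeline.add_stage(motion_filter(min_score=0.5)).add_stage(add_caption_placeholder)

clips = [
    {"id": "clip-0", "motion_score": 0.9},
    {"id": "clip-1", "motion_score": 0.1},
]
result = pipeline.run(clips)
print(result)
```

Because each stage only consumes and returns a list of clips, a stage can be swapped, reordered, or reconfigured without touching the others.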

## Processing Options

Choose from the following stages to split, encode, filter, embed, caption, preview, and remove duplicates in your videos:

- Split long videos into shorter clips using fixed stride or scene-change detection. Tags: `clips`, `fixed-stride`, `transnetv2`
- Encode clips to H.264 using CPU or GPU encoders and tune performance. Tags: `clips`, `h264_nvenc`
- Apply motion-based filtering and aesthetic filtering to improve dataset quality. Tags: `clips`, `frames`, `motion`, `aesthetic`
- Extract frames from clips or full videos for embeddings, filtering, and analysis. Tags: `frames`, `fps`
- Generate clip-level embeddings with Cosmos-Embed1 for search and duplicate removal. Tags: `clips`, `cosmos-embed1`
- Produce clip captions and optional preview images for review workflows. Tags: `clips`, `frames`, `captions`, `preview`
- Remove near-duplicates using semantic clustering and similarity with generated embeddings. Tags: `clips`, `semantic`, `pairwise`
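
Two of the options above can be illustrated without the library: fixed-stride splitting and pairwise similarity checks on embeddings. The sketch below is an assumption-laden illustration, not how NeMo Curator implements either stage. The function names, the `min_clip_s` parameter, the 0.95 threshold, and the toy two-dimensional embeddings are all hypothetical.

```python
from math import sqrt


def fixed_stride_spans(duration_s, stride_s, min_clip_s=0.0):
    """Return (start, end) spans that cover a video in fixed-length steps."""
    if stride_s <= 0:
        raise ValueError("stride_s must be positive")
    spans = []
    start = 0.0
    while start < duration_s:
        end = min(start + stride_s, duration_s)
        # Keep the trailing remainder only if it is long enough to be useful.
        if end - start >= min_clip_s:
            spans.append((start, end))
        start += stride_s
    return spans


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def near_duplicate_pairs(embeddings, threshold=0.95):
    """Return pairs of clip ids whose embeddings are at least `threshold` similar."""
    ids = list(embeddings)
    pairs = []
    for i, first in enumerate(ids):
        for second in ids[i + 1:]:
            if cosine_similarity(embeddings[first], embeddings[second]) >= threshold:
                pairs.append((first, second))
    return pairs


spans = fixed_stride_spans(duration_s=25.0, stride_s=10.0, min_clip_s=2.0)
print(spans)

toy_embeddings = {"a": [1.0, 0.0], "b": [0.99, 0.05], "c": [0.0, 1.0]}
duplicates = near_duplicate_pairs(toy_embeddings)
print(duplicates)
```

The all-pairs loop is quadratic in the number of clips, which is one reason a clustering step before pairwise comparison is attractive for large datasets.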

## Write Outputs

Persist clips, embeddings, previews, and metadata at the end of the pipeline using `ClipWriterStage`. Refer to [Save & Export](/curate-video/save-export) for directory layout and examples.

Example (place as the final stage):

```python
from nemo_curator.stages.video.io.clip_writer import ClipWriterStage

# Assumes `pipeline`, `OUT_DIR`, and `VIDEO_DIR` are defined earlier in your script.
pipeline.add_stage(
    ClipWriterStage(
        output_path=OUT_DIR,
        input_path=VIDEO_DIR,
        upload_clips=True,
        dry_run=False,
        generate_embeddings=True,
        generate_previews=False,
        generate_captions=False,
        embedding_algorithm="cosmos-embed1-224p",
        caption_models=[],
        enhanced_caption_models=[],
        verbose=True,
    )
)
```

Path helpers are available to resolve common locations (such as `clips/`, `filtered_clips/`, `previews/`, `metas/v0/`, and `ce1_embd_parquet/`).
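
As a rough picture of what resolving these locations involves, the sketch below joins the subdirectory names listed above onto an output root with `pathlib`. The `resolve_output_dirs` function and the dictionary keys are hypothetical and are not the library's path helpers; only the subdirectory names come from this page.

```python
from pathlib import Path

# Subdirectory names are taken from the text above.
# The keys and the helper itself are illustrative assumptions.
OUTPUT_SUBDIRS = {
    "clips": "clips",
    "filtered_clips": "filtered_clips",
    "previews": "previews",
    "metas": "metas/v0",
    "embeddings": "ce1_embd_parquet",
}


def resolve_output_dirs(output_path):
    """Map each label to its directory under the given output root."""
    root = Path(output_path)
    return {label: root / subdir for label, subdir in OUTPUT_SUBDIRS.items()}


dirs = resolve_output_dirs("/data/curated")
for label, path in dirs.items():
    print(f"{label}: {path.as_posix()}")
```

For the actual layout and the helper names the library provides, rely on the Save & Export page linked above.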