Process Data

Use NeMo Curator stages to split videos into clips, encode them, generate embeddings or captions, and remove duplicates.

How it Works

Create a Pipeline and add stages for clip extraction, optional re-encoding and filtering, embeddings or captions, previews, and writing outputs. Each stage is modular and configurable to match your quality and performance needs.
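The stage-by-stage composition described above can be illustrated with a small, library-free sketch. `ToyPipeline`, `ToyStage`, and the two stage functions are hypothetical stand-ins invented for this example; they are not NeMo Curator classes and only model the idea that each stage consumes the previous stage's output.

```python
from dataclasses import dataclass, field
from typing import Callable, List


# Hypothetical stand-ins for illustration only; NOT the NeMo Curator API.
@dataclass
class ToyStage:
    name: str
    fn: Callable[[list], list]


@dataclass
class ToyPipeline:
    stages: List[ToyStage] = field(default_factory=list)

    def add_stage(self, stage: ToyStage) -> "ToyPipeline":
        # Stages run in the order they are added.
        self.stages.append(stage)
        return self

    def run(self, items: list) -> list:
        # Feed each stage's output into the next stage.
        for stage in self.stages:
            items = stage.fn(items)
        return items


def split_into_clips(videos: list) -> list:
    # Pretend every video yields exactly two clips.
    return [f"{v}#clip{i}" for v in videos for i in range(2)]


def drop_second_clip(clips: list) -> list:
    # Placeholder filter: keep only the first clip of each video.
    return [c for c in clips if not c.endswith("#clip1")]


toy = ToyPipeline()
toy.add_stage(ToyStage("clip", split_into_clips))
toy.add_stage(ToyStage("filter", drop_second_clip))
result = toy.run(["a.mp4", "b.mp4"])
print(result)
```

Because every stage has the same list-in, list-out shape in this sketch, a stage can be swapped or removed without touching the others, which is the property the modular design relies on.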

Processing Options

Choose from the following stages to clip, encode, filter, extract frames from, embed, caption, preview, and deduplicate your videos:

Clip Videos

Split long videos into shorter clips using fixed stride or scene-change detection.

clips fixed-stride transnetv2

Video Clipping
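As a rough illustration of fixed-stride clipping, the sketch below computes clip time spans from a video duration. The function name `fixed_stride_spans` and its parameters are hypothetical, and keeping a shortened final clip is an assumption of this example; the source does not say how the clipping stage treats the tail of a video.

```python
def fixed_stride_spans(duration_s: float, clip_len_s: float, stride_s: float) -> list:
    """Return (start, end) spans in seconds, one span every `stride_s` seconds."""
    if clip_len_s <= 0 or stride_s <= 0:
        raise ValueError("clip_len_s and stride_s must be positive")
    spans = []
    index = 0
    while index * stride_s < duration_s:
        start = index * stride_s
        # Assumption: the last clip is truncated at the end of the video.
        end = min(start + clip_len_s, duration_s)
        spans.append((start, end))
        index += 1
    return spans


spans = fixed_stride_spans(duration_s=25.0, clip_len_s=10.0, stride_s=10.0)
print(spans)
```

Setting the stride smaller than the clip length would produce overlapping clips; setting it equal, as here, tiles the video without overlap.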

Encode Clips

Encode clips to H.264 using CPU or GPU encoders and tune performance.

clips h264_nvenc libopenh264

Clip Encoding

Filter Clips and Frames

Apply motion-based filtering and aesthetic filtering to improve dataset quality.

clips frames motion aesthetic

Filtering

Extract Frames

Extract frames from clips or full videos for embeddings, filtering, and analysis.

frames fps

Frame Extraction

Create Embeddings

Generate clip-level embeddings with InternVideo2 or Cosmos-Embed1 for search and duplicate removal.

clips internvideo2 cosmos-embed1

Embeddings

Create Captions & Preview

Produce clip captions and optional preview images for review workflows.

clips frames captions preview

Captions and Preview

Remove Duplicate Embeddings

Remove near-duplicate clips by applying semantic clustering and pairwise similarity to the generated embeddings.

clips semantic pairwise

Duplicate Removal
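To make the pairwise-similarity idea behind duplicate removal concrete, here is a minimal, library-free sketch. It is not the NeMo Curator implementation: it skips the clustering step, compares every embedding against the ones already kept, and uses an illustrative threshold of 0.95. The names `cosine` and `greedy_dedup` are hypothetical.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat zero vectors as dissimilar to everything.
    return dot / (norm_a * norm_b)


def greedy_dedup(embeddings: list, threshold: float = 0.95) -> list:
    """Return indices of embeddings to keep, dropping later near-duplicates."""
    kept = []
    for i, emb in enumerate(embeddings):
        # Keep this item only if it is below the threshold against all kept items.
        if all(cosine(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


embs = [[1.0, 0.0], [0.999, 0.01], [0.0, 1.0]]
kept = greedy_dedup(embs)
print(kept)
```

Comparing all pairs scales quadratically with the number of clips, which is one reason to cluster embeddings first and compare only within each cluster.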

Write Outputs

Persist clips, embeddings, previews, and metadata at the end of the pipeline using ClipWriterStage. Refer to Save & Export for directory layout and examples.

Example (place as the final stage):

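from nemo_curator.stages.video.io.clip_writer import ClipWriterStage

# `pipeline`, OUT_DIR, and VIDEO_DIR are assumed to be defined earlier in your script.
# Add the writer last so that it receives the outputs of all upstream stages.
pipeline.add_stage(
    ClipWriterStage(
        output_path=OUT_DIR,    # destination for the written outputs
        input_path=VIDEO_DIR,   # location of the source videos
        upload_clips=True,
        dry_run=False,
        generate_embeddings=True,
        generate_previews=False,
        generate_captions=False,
        embedding_algorithm="internvideo2",
        caption_models=[],           # empty because captions are disabled above
        enhanced_caption_models=[],
        verbose=True,
    )
)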

Path helpers are available to resolve common locations (such as clips/, filtered_clips/, previews/, metas/v0/, and iv2_embd_parquet/).