Process Data
Use NeMo Curator stages to split videos into clips, encode them, generate embeddings or captions, and remove duplicates.
How it Works
Create a Pipeline and add stages for clip extraction, optional re-encoding and filtering, embeddings or captions, previews, and writing outputs. Each stage is modular and configurable to match your quality and performance needs.
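The composition pattern described above can be sketched with a toy pipeline. Every name in this block (`Clip`, `ToyPipeline`, `drop_short_clips`, `tag_encoder`) is a hypothetical stand-in chosen for illustration and is not part of the NeMo Curator API; the sketch only shows how independent, configurable stages chain together.

```python
from dataclasses import dataclass, field
from typing import Callable, List


# Hypothetical stand-ins for illustration only; not NeMo Curator classes.
@dataclass
class Clip:
    source: str
    start_s: float
    end_s: float
    meta: dict = field(default_factory=dict)


# A stage is any callable that takes a list of clips and returns a list of clips.
Stage = Callable[[List[Clip]], List[Clip]]


@dataclass
class ToyPipeline:
    name: str
    stages: List[Stage] = field(default_factory=list)

    def add_stage(self, stage: Stage) -> "ToyPipeline":
        self.stages.append(stage)
        return self  # returning self allows chained add_stage calls

    def run(self, clips: List[Clip]) -> List[Clip]:
        # Stages run in the order they were added.
        for stage in self.stages:
            clips = stage(clips)
        return clips


def drop_short_clips(min_len_s: float) -> Stage:
    """Configurable filter stage: keep clips at least min_len_s long."""
    def stage(clips: List[Clip]) -> List[Clip]:
        return [c for c in clips if c.end_s - c.start_s >= min_len_s]
    return stage


def tag_encoder(codec: str) -> Stage:
    """Configurable stage that records the target codec in clip metadata."""
    def stage(clips: List[Clip]) -> List[Clip]:
        for c in clips:
            c.meta["codec"] = codec
        return clips
    return stage


pipeline = ToyPipeline("toy_video_pipeline")
pipeline.add_stage(drop_short_clips(2.0)).add_stage(tag_encoder("h264"))

result = pipeline.run([Clip("a.mp4", 0.0, 10.0), Clip("a.mp4", 10.0, 11.0)])
print([(c.start_s, c.end_s, c.meta) for c in result])
```

Because each stage only depends on the clip list it receives, a stage can be swapped or reconfigured without touching the others.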
Processing Options
Choose from the following stages to split, encode, filter, embed, caption, and preview your videos, and to remove duplicate clips:
Split long videos into shorter clips using fixed stride or scene-change detection. Tags: clips, fixed-stride, transnetv2
Encode clips to H.264 using CPU or GPU encoders and tune performance. Tags: clips, h264_nvenc, libopenh264
Apply motion-based filtering and aesthetic filtering to improve dataset quality. Tags: clips, frames, motion, aesthetic
Extract frames from clips or full videos for embeddings, filtering, and analysis. Tags: frames, fps
Generate clip-level embeddings with Cosmos-Embed1 for search and duplicate removal. Tags: clips, cosmos-embed1
Produce clip captions and optional preview images for review workflows. Tags: clips, frames, captions, preview
Remove near-duplicates using semantic clustering and similarity with generated embeddings. Tags: clips, semantic, pairwise
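Two of the ideas in the options above, fixed-stride splitting and pairwise similarity checks on embeddings, can be illustrated in plain Python. This is a simplified sketch under stated assumptions, not how the NeMo Curator stages are implemented: the function names, the `min_len_s` parameter, the greedy keep-first strategy, and the 0.95 threshold are all hypothetical choices made for the example.

```python
import math
from typing import List, Sequence, Tuple


def fixed_stride_spans(
    duration_s: float,
    clip_len_s: float,
    stride_s: float,
    min_len_s: float = 0.0,
) -> List[Tuple[float, float]]:
    """Return (start, end) spans that begin every stride_s seconds.

    The last span is truncated at the end of the video and dropped if it
    is shorter than min_len_s (a hypothetical parameter for this sketch).
    """
    if clip_len_s <= 0 or stride_s <= 0:
        raise ValueError("clip_len_s and stride_s must be positive")
    spans = []
    i = 0
    while i * stride_s < duration_s:
        start = i * stride_s  # multiply instead of accumulating to avoid drift
        end = min(start + clip_len_s, duration_s)
        if end - start >= min_len_s:
            spans.append((start, end))
        i += 1
    return spans


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dedup_pairwise(embeddings: List[Sequence[float]], threshold: float) -> List[int]:
    """Greedy near-duplicate removal: keep a clip only if its similarity
    to every clip kept so far is below threshold. Returns kept indices."""
    kept: List[int] = []
    for i, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


spans = fixed_stride_spans(25.0, clip_len_s=10.0, stride_s=10.0, min_len_s=6.0)
kept = dedup_pairwise([[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]], threshold=0.95)
print(spans, kept)
```

The all-pairs comparison is quadratic in the number of clips, which is one reason clustering embeddings first and comparing only within clusters scales better on large datasets.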
Write Outputs
Persist clips, embeddings, previews, and metadata at the end of the pipeline using ClipWriterStage. Refer to Save & Export for directory layout and examples.
Example (place as the final stage):
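A minimal sketch of why the writer goes last, using a toy writer rather than ClipWriterStage itself, whose constructor arguments are not shown in this section. `ToyClipWriter`, `add_duration`, and the dictionary-based clip records are hypothetical; only the `metas/v0/` directory name comes from this page.

```python
import json
import tempfile
from pathlib import Path


class ToyClipWriter:
    """Hypothetical stand-in for a writer stage; not ClipWriterStage."""

    def __init__(self, output_path: str) -> None:
        self.output_path = Path(output_path)

    def __call__(self, clips: list) -> list:
        # Persist one JSON metadata file per clip under metas/v0/.
        meta_dir = self.output_path / "metas" / "v0"
        meta_dir.mkdir(parents=True, exist_ok=True)
        for clip in clips:
            (meta_dir / f"{clip['id']}.json").write_text(json.dumps(clip))
        return clips


def add_duration(clips: list) -> list:
    """Example upstream stage that enriches each clip record."""
    return [{**c, "duration_s": c["end_s"] - c["start_s"]} for c in clips]


writer = ToyClipWriter(tempfile.mkdtemp())
stages = [add_duration, writer]  # the writer is the final stage

clips = [{"id": "clip-0", "start_s": 0.0, "end_s": 10.0}]
for stage in stages:
    clips = stage(clips)

meta_dir = writer.output_path / "metas" / "v0"
written = sorted(p.name for p in meta_dir.iterdir())
saved = json.loads((meta_dir / "clip-0.json").read_text())
print(written, saved)
```

Placing the writer last means everything earlier stages attached to a clip (here, `duration_s`) is already present when the record is persisted.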
Path helpers are available to resolve common locations (such as clips/, filtered_clips/, previews/, metas/v0/, and ce1_embd_parquet/).
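To illustrate what such helpers do, the sketch below maps a logical output kind to its subdirectory under an output root. The subdirectory names are the ones listed above; the function name `resolve_output_dir`, the keys of the mapping, and the `/data/out` root are hypothetical and do not reflect the actual helper names in NeMo Curator.

```python
from pathlib import Path

# Subdirectory names come from the text above; the keys are hypothetical labels.
_SUBDIRS = {
    "clips": "clips",
    "filtered_clips": "filtered_clips",
    "previews": "previews",
    "metas": "metas/v0",
    "embeddings": "ce1_embd_parquet",
}


def resolve_output_dir(output_root: str, kind: str) -> Path:
    """Return the directory for a given output kind under output_root."""
    if kind not in _SUBDIRS:
        raise ValueError(f"unknown output kind: {kind!r}")
    return Path(output_root) / _SUBDIRS[kind]


metas_dir = resolve_output_dir("/data/out", "metas")
embeddings_dir = resolve_output_dir("/data/out", "embeddings")
print(metas_dir.as_posix(), embeddings_dir.as_posix())
```

Centralizing the layout in one mapping keeps downstream readers and the writer in agreement about where each artifact lives.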