Process Data
Use NeMo Curator stages to split videos into clips, encode them, generate embeddings or captions, and remove duplicates.
How it Works
Create a Pipeline and add stages for clip extraction, optional re-encoding and filtering, embeddings or captions, previews, and writing outputs. Each stage is modular and configurable to match your quality and performance needs.
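The stage-by-stage composition described above can be illustrated with a toy sketch in plain Python. This is not the NeMo Curator API: `ToyPipeline`, `fixed_stride_spans`, and `drop_short_spans` are hypothetical stand-ins, and the 10-second stride and 6-second minimum length are arbitrary values. The sketch only shows how modular stages chain, with each stage consuming the previous stage's output.

```python
from typing import Callable, List, Tuple

Span = Tuple[float, float]  # (start_seconds, end_seconds) of one clip


class ToyPipeline:
    """Hypothetical stand-in for a pipeline; not the NeMo Curator class."""

    def __init__(self) -> None:
        self.stages: List[Callable] = []

    def add_stage(self, stage: Callable) -> "ToyPipeline":
        self.stages.append(stage)
        return self

    def run(self, data):
        # Feed each stage the output of the previous one.
        for stage in self.stages:
            data = stage(data)
        return data


def fixed_stride_spans(stride_s: float) -> Callable[[float], List[Span]]:
    """Build a stage that cuts a video duration into fixed-stride spans."""
    if stride_s <= 0:
        raise ValueError("stride_s must be positive")

    def stage(duration_s: float) -> List[Span]:
        spans: List[Span] = []
        start = 0.0
        while start < duration_s:
            # The last span is truncated at the end of the video.
            spans.append((start, min(start + stride_s, duration_s)))
            start += stride_s
        return spans

    return stage


def drop_short_spans(min_len_s: float) -> Callable[[List[Span]], List[Span]]:
    """Build a stage that keeps only spans of at least min_len_s seconds."""

    def stage(spans: List[Span]) -> List[Span]:
        return [(s, e) for s, e in spans if e - s >= min_len_s]

    return stage


toy = ToyPipeline()
toy.add_stage(fixed_stride_spans(stride_s=10.0))
toy.add_stage(drop_short_spans(min_len_s=6.0))
clips = toy.run(25.0)  # a 25-second video
```

Because each stage is a self-contained callable, a stage can be swapped or reconfigured without touching the others, which is the property the real stages are described as having.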
Processing Options
Choose from the following stages to split, encode, filter, embed, caption, preview, and remove duplicates in your videos:
Split long videos into shorter clips using fixed stride or scene-change detection.
Encode clips to H.264 using CPU or GPU encoders and tune performance.
Apply motion-based filtering and aesthetic filtering to improve dataset quality.
Extract frames from clips or full videos for embeddings, filtering, and analysis.
Generate clip-level embeddings with InternVideo2 or Cosmos-Embed1 for search and duplicate removal.
Produce clip captions and optional preview images for review workflows.
Remove near-duplicates using semantic clustering and similarity with generated embeddings.
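To make the duplicate-removal option concrete, here is a minimal sketch of similarity-based filtering on clip embeddings. It swaps in a greedy cosine-similarity threshold in place of the semantic clustering mentioned above. The function names, the 0.95 threshold, and the toy 3-dimensional vectors are all assumptions chosen for illustration, not values or calls from NeMo Curator.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def drop_near_duplicates(
    embeddings: Dict[str, Sequence[float]],
    threshold: float = 0.95,  # assumed cutoff, tune for your data
) -> List[str]:
    """Greedily keep a clip only if it is dissimilar to every kept clip."""
    kept: List[str] = []
    for clip_id, vec in embeddings.items():
        if all(cosine_similarity(vec, embeddings[k]) < threshold for k in kept):
            kept.append(clip_id)
    return kept


# Toy embeddings; real clip embeddings have far more dimensions.
embeddings = {
    "clip_a": [1.0, 0.0, 0.0],
    "clip_b": [0.99, 0.05, 0.0],  # nearly identical to clip_a
    "clip_c": [0.0, 1.0, 0.0],
}
unique_ids = drop_near_duplicates(embeddings)
```

The greedy pass compares every clip against all kept clips, so it scales quadratically. Clustering the embeddings first, as the semantic approach does, limits comparisons to clips within the same cluster.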
Write Outputs
Persist clips, embeddings, previews, and metadata at the end of the pipeline using ClipWriterStage. Refer to Save & Export for directory layout and examples.
Example (place as the final stage):
from nemo_curator.stages.video.io.clip_writer import ClipWriterStage

# Assumes `pipeline`, `OUT_DIR`, and `VIDEO_DIR` are defined earlier in your script.
pipeline.add_stage(
    ClipWriterStage(
        output_path=OUT_DIR,
        input_path=VIDEO_DIR,
        upload_clips=True,
        dry_run=False,
        generate_embeddings=True,
        generate_previews=False,
        generate_captions=False,
        embedding_algorithm="internvideo2",
        caption_models=[],
        enhanced_caption_models=[],
        verbose=True,
    )
)
Path helpers are available to resolve common locations (such as clips/, filtered_clips/, previews/, metas/v0/, and iv2_embd_parquet/).
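As an illustration of what such helpers resolve, the sketch below joins an output root with the subdirectory names listed above using `pathlib`. The function `resolve_output_dirs`, its dictionary keys, and the `/data/curated` root are hypothetical. The library's own helper names and signatures may differ.

```python
from pathlib import PurePosixPath
from typing import Dict

# Subdirectory names come from the list above; the keys are illustrative.
OUTPUT_SUBDIRS = {
    "clips": "clips",
    "filtered_clips": "filtered_clips",
    "previews": "previews",
    "metas": "metas/v0",
    "iv2_embeddings_parquet": "iv2_embd_parquet",
}


def resolve_output_dirs(output_root: str) -> Dict[str, PurePosixPath]:
    """Map each logical output name to its path under output_root."""
    root = PurePosixPath(output_root)
    return {name: root / sub for name, sub in OUTPUT_SUBDIRS.items()}


dirs = resolve_output_dirs("/data/curated")  # hypothetical output root
```

Keeping the layout in one mapping means downstream code refers to logical names such as `dirs["metas"]` instead of hard-coding path strings.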