Save and Export
NeMo Curator writes clips, metadata, previews, and embeddings to a structured output directory. Use this guide to add the writer to your pipeline, understand the directories it creates, and prepare artifacts for training.
Writer Stage
Use ClipWriterStage as the final stage in your pipeline.
Parameters
Output Directories
The writer produces these directories under output_path:
- clips/: Encoded clip media (.mp4).
- filtered_clips/: Media for filtered-out clips.
- previews/: Preview images (.webp).
- metas/v0/: Per-clip metadata (.json).
- ce1_embd/: Per-clip embeddings (.pickle).
- ce1_embd_parquet/: Parquet batches with columns id and embedding.
- processed_videos/, processed_clip_chunks/: Video-level metadata and per-chunk statistics.
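The layout above can be checked after a run with a few lines of standard-library Python. This is a minimal sketch, not part of NeMo Curator: the function names and the dictionary labels are hypothetical, and only the subdirectory names come from the list above.

```python
from pathlib import Path

# Subdirectory names come from the list above. The dictionary keys are
# illustrative labels chosen for this sketch, not NeMo Curator identifiers.
OUTPUT_SUBDIRS = {
    "clips": "clips",
    "filtered_clips": "filtered_clips",
    "previews": "previews",
    "metas": "metas/v0",
    "embeddings_pickle": "ce1_embd",
    "embeddings_parquet": "ce1_embd_parquet",
    "processed_videos": "processed_videos",
    "processed_clip_chunks": "processed_clip_chunks",
}


def expected_output_dirs(output_path):
    """Map each label to the directory expected under output_path."""
    root = Path(output_path)
    return {label: root / rel for label, rel in OUTPUT_SUBDIRS.items()}


def missing_output_dirs(output_path):
    """Return the sorted labels whose directory does not exist on disk."""
    dirs = expected_output_dirs(output_path)
    return sorted(label for label, path in dirs.items() if not path.is_dir())
```

A missing directory is not necessarily an error. For example, the embedding directories may be absent when the pipeline produced no embeddings.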
Per-Clip Metadata
Each clip writes a JSON file under metas/v0/ with clip- and window-level fields:
- Caption keys follow <model>_caption and <model>_enhanced_caption, based on caption_models and enhanced_caption_models.
- With dry_run=True, per-clip metadata is not written. Video- and chunk-level metadata are still written.
- The stage writes video-level metadata and per-chunk stats to processed_videos/ and processed_clip_chunks/.
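To prepare captions for training, the per-clip JSON files can be read back and scanned for the caption keys described above. The sketch below assumes only that the files live somewhere under metas/v0/ and that caption keys end in _caption. It makes no assumption about where in the JSON the captions are nested, so it searches recursively. Both function names are hypothetical.

```python
import json
from pathlib import Path


def iter_clip_metadata(output_path):
    """Yield (file stem, parsed JSON) for each metadata file under metas/v0/."""
    metas_dir = Path(output_path) / "metas" / "v0"
    # rglob tolerates nested subdirectories; the exact nesting is an assumption.
    for meta_file in sorted(metas_dir.rglob("*.json")):
        with meta_file.open(encoding="utf-8") as f:
            yield meta_file.stem, json.load(f)


def find_caption_fields(node):
    """Recursively collect (key, value) pairs whose key ends with '_caption'.

    This matches both <model>_caption and <model>_enhanced_caption keys.
    """
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key.endswith("_caption") and isinstance(value, str):
                found.append((key, value))
            else:
                found.extend(find_caption_fields(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_caption_fields(item))
    return found
```

After a run with dry_run=True, iter_clip_metadata yields nothing, because no per-clip metadata is written in that mode.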
Embeddings and Parquet Outputs
- When embeddings exist, the stage writes per-clip .pickle files under ce1_embd/.
- The stage also batches embeddings per clip chunk into Parquet files under ce1_embd_parquet/ with columns id and embedding.
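The per-clip pickle files can be loaded back with the standard library. This is a minimal sketch under stated assumptions: load_clip_embeddings is a hypothetical helper, the files are assumed to sit directly under ce1_embd/ with a .pickle extension, and the type of the unpickled object is not specified in this guide, so the function returns it unchanged.

```python
import pickle
from pathlib import Path


def load_clip_embeddings(output_path):
    """Return {file stem: unpickled object} for each file under ce1_embd/."""
    embd_dir = Path(output_path) / "ce1_embd"
    embeddings = {}
    for pkl_file in sorted(embd_dir.glob("*.pickle")):
        # Unpickling can execute arbitrary code: only load files that
        # your own pipeline wrote.
        with pkl_file.open("rb") as f:
            embeddings[pkl_file.stem] = pickle.load(f)
    return embeddings
```

For bulk loading, the Parquet batches under ce1_embd_parquet/ are usually the more convenient input, since any Parquet reader (for example pandas.read_parquet) can read the id and embedding columns without unpickling.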
Helpers
Resolve Paths Programmatically
Use helpers to construct paths consistently: