Curated Dataset Structure

Once a Dataset has been curated, you can download a ZIP file of the curated dataset or view it in your S3 storage location. The contents of a curated dataset are described below.

| File/Folder Name | Description |
| --- | --- |
| clips | Directory containing the curated clips |
| iv2_embd | Directory containing the embeddings for the clips |
| metas | Directory containing the captions for each clip. A clip and its captions share the same UUID in their file names. |
| previews | Directory containing the WebP previews of the curated clips |
| processed_clip_chunks | Directory containing processed video file segment names |
| processed_videos | Directory containing processed video file names |
| summary.json | JSON file containing a detailed metadata summary of the videos processed and clips generated, some statistics, and video/clip metadata |
| v0 | Directory containing the same captions as metas, but inside a single file for easy processing (i.e. v0/all_window_captions.json) |
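
As a rough illustration of how the entries listed above fit together, the following Python sketch checks that an extracted dataset contains the expected top-level entries, pairs each clip with its caption file by the UUID they share, and loads summary.json. It is a minimal sketch under stated assumptions, not part of the product: the helper names (`missing_entries`, `pair_clips_with_metas`, `load_summary`) are hypothetical, the UUID is assumed to be the file name without its extension, and no particular file extension or JSON schema is assumed.

```python
import json
from pathlib import Path

# Top-level entry names taken from the list of dataset contents above.
EXPECTED_DIRS = [
    "clips",
    "iv2_embd",
    "metas",
    "previews",
    "processed_clip_chunks",
    "processed_videos",
    "v0",
]
EXPECTED_FILES = ["summary.json"]


def missing_entries(root: Path) -> list[str]:
    """Return the expected top-level entries that are absent under root."""
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


def pair_clips_with_metas(root: Path) -> dict[str, tuple[Path, Path]]:
    """Match clip files to caption files by their shared file-name stem.

    Assumption: the shared UUID is the file name without its extension.
    rglob is used so the match still works if files sit in subfolders.
    """
    clips = {p.stem: p for p in (root / "clips").rglob("*") if p.is_file()}
    metas = {p.stem: p for p in (root / "metas").rglob("*") if p.is_file()}
    shared = sorted(clips.keys() & metas.keys())
    return {uuid: (clips[uuid], metas[uuid]) for uuid in shared}


def load_summary(root: Path):
    """Parse summary.json; its schema is not assumed here."""
    with open(root / "summary.json", encoding="utf-8") as fh:
        return json.load(fh)
```

For example, `missing_entries(Path("my_dataset"))` returns an empty list when every entry is present, where `my_dataset` is a placeholder for wherever the ZIP file was extracted. Clips without a matching caption file are simply left out of the pairing.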