Video Clipping
Split long videos into shorter clips for downstream processing.
How it Works
NeMo Curator provides two clipping stages: Fixed Stride and TransNetV2 scene-change detection.
- Use Fixed Stride to create uniform segments.
- Use TransNetV2 to cut at visual shot boundaries.
Before You Start
Ensure inputs contain video bytes and basic metadata. The clipping stages require video.source_bytes to be present, along with metadata that includes framerate and num_frames.
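As a quick illustration of this requirement, the sketch below checks a video record for the fields named above (source_bytes, framerate, num_frames). The dict-based record and the helper name has_required_inputs are hypothetical, for illustration only; the real stages operate on their own video objects.

```python
# Illustrative pre-flight check. The dict layout and helper name are
# hypothetical; only the field names come from the docs above.

def has_required_inputs(video: dict) -> bool:
    """Return True when a video record carries the bytes and metadata
    that the clipping stages need."""
    if not video.get("source_bytes"):
        return False
    metadata = video.get("metadata", {})
    return metadata.get("framerate") is not None and metadata.get("num_frames") is not None

ok = has_required_inputs({
    "source_bytes": b"\x00\x01",
    "metadata": {"framerate": 30.0, "num_frames": 900},
})
```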
Quickstart
Use either the pipeline stages or the example script flags to create clips.
Pipeline Stage
Script Flags
Clipping Options
Fixed Stride
The FixedStrideExtractorStage steps through the video duration by clip_stride_s, creating spans of length clip_len_s and truncating the final span at the video end when needed. It discards spans shorter than min_clip_length_s and appends Clip objects identified by the source video and frame indices.
If limit_clips is greater than 0 and the video already has clips, the stage skips processing entirely; the setting does not cap the number of clips generated within a single run.
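The stride-and-truncate behavior can be sketched in a few lines. The parameter names (clip_len_s, clip_stride_s, min_clip_length_s) come from the stage; the function fixed_stride_spans itself is an illustrative stand-in, not the stage's actual implementation.

```python
def fixed_stride_spans(duration_s, clip_len_s, clip_stride_s, min_clip_length_s):
    """Step through the duration by clip_stride_s, emitting (start, end)
    spans of clip_len_s, truncating the last span at the video end and
    dropping spans shorter than min_clip_length_s. Illustrative only."""
    spans = []
    start = 0.0
    while start < duration_s:
        end = min(start + clip_len_s, duration_s)  # truncate at video end
        if end - start >= min_clip_length_s:
            spans.append((start, end))
        start += clip_stride_s
    return spans

# A 10 s video with 4 s clips at a 4 s stride yields two full spans
# plus a truncated 2 s tail that survives the 2 s minimum.
spans = fixed_stride_spans(10.0, clip_len_s=4.0, clip_stride_s=4.0, min_clip_length_s=2.0)
```

With a 9 s video instead, the final 1 s remainder falls below min_clip_length_s and is dropped.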
TransNetV2 Scene-Change Detection
TransNetV2 is a shot-boundary detection model that identifies transitions between shots. The stage converts those transitions into scenes, applies length/crop rules, and emits clips aligned to scene boundaries.
Using extracted frames of size 27×48×3, the model predicts shot transitions. The stage converts the detected transitions into scenes and applies filtering: min_length_s, max_length_s with max_length_mode (“truncate” or “stride”), and optional crop_s trimmed from both ends. It creates Clip objects for the resulting spans, stops once it reaches limit_clips (when greater than 0), and releases the frames from memory after processing.
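To make the length/crop rules concrete, here is a sketch of how one scene might be filtered. The parameter names (min_length_s, max_length_s, max_length_mode, crop_s) come from the stage; the function filter_scene and its exact edge-case behavior (for example, dropping a strided remainder shorter than min_length_s) are assumptions, not the stage's verified semantics.

```python
def filter_scene(start_s, end_s, *, min_length_s, max_length_s,
                 max_length_mode, crop_s=0.0):
    """Apply scene filtering to one (start_s, end_s) span:
    trim crop_s from both ends, drop scenes under min_length_s,
    and handle over-long scenes per max_length_mode. Illustrative only."""
    start_s += crop_s
    end_s -= crop_s
    if end_s - start_s < min_length_s:
        return []                                  # too short: no clip
    if end_s - start_s <= max_length_s:
        return [(start_s, end_s)]                  # fits within the cap
    if max_length_mode == "truncate":
        return [(start_s, start_s + max_length_s)] # keep only the head
    # "stride": split the scene into consecutive max_length_s windows
    spans = []
    t = start_s
    while t < end_s:
        span_end = min(t + max_length_s, end_s)
        if span_end - t >= min_length_s:           # assumed: drop short tails
            spans.append((t, span_end))
        t += max_length_s
    return spans

# A 25 s scene with a 10 s cap, split in "stride" mode.
strided = filter_scene(0.0, 25.0, min_length_s=2.0, max_length_s=10.0,
                       max_length_mode="stride")
```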
- Run VideoFrameExtractionStage first to populate video.frame_array. Frames must be (27, 48, 3); the stage accepts arrays shaped (num_frames, 27, 48, 3) and transposes from (48, 27, 3) automatically.
- Configure TransNetV2 and run the stage in your pipeline to generate clips from the detected scenes.
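The frame-shape handling described above can be sketched as follows. The shapes come from the docs; the function normalize_frames is a hypothetical illustration, not the stage's actual code.

```python
import numpy as np

def normalize_frames(frames: np.ndarray) -> np.ndarray:
    """Accept a (num_frames, 27, 48, 3) array as-is, transpose
    (num_frames, 48, 27, 3) input into that layout, and reject
    anything else. Illustrative only."""
    if frames.ndim != 4 or frames.shape[-1] != 3:
        raise ValueError(f"expected (num_frames, H, W, 3), got {frames.shape}")
    if frames.shape[1:3] == (27, 48):
        return frames                          # already in model layout
    if frames.shape[1:3] == (48, 27):
        return frames.transpose(0, 2, 1, 3)    # swap H and W per frame
    raise ValueError(f"unsupported frame shape {frames.shape}")

normalized = normalize_frames(np.zeros((5, 48, 27, 3), dtype=np.uint8))
```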