Clipping | NeMo Curator

Split long videos into shorter clips for downstream processing.

How it Works

NeMo Curator provides two clipping stages: Fixed Stride and TransNetV2 scene-change detection.

Use Fixed Stride to create uniform segments.
Use TransNetV2 to cut at visual shot boundaries.

Before You Start

Ensure each input video carries its bytes and basic metadata: the clipping stages require video.source_bytes and metadata that includes framerate and num_frames.
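The requirement can be expressed as a precondition check. This is a minimal, hypothetical sketch: VideoStub and check_clipping_inputs are stand-ins invented for illustration, not NeMo Curator classes, and deriving the duration as num_frames / framerate is an assumption about how the stages use this metadata.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VideoStub:
    """Hypothetical stand-in for a video task; not the NeMo Curator Video class."""
    source_bytes: Optional[bytes] = None
    metadata: dict = field(default_factory=dict)


def check_clipping_inputs(video: VideoStub) -> float:
    """Return the video duration in seconds, or raise if clipping inputs are missing."""
    if not video.source_bytes:
        raise ValueError("video.source_bytes is missing")
    framerate = video.metadata.get("framerate")
    num_frames = video.metadata.get("num_frames")
    if not framerate or num_frames is None:
        raise ValueError("metadata must include framerate and num_frames")
    # Assumption: duration is derived from the frame count and the frame rate.
    return num_frames / framerate


video = VideoStub(source_bytes=b"\x00\x01", metadata={"framerate": 30.0, "num_frames": 900})
print(check_clipping_inputs(video))  # 30.0
```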

Quickstart

Use either the pipeline stages or the example script flags to create clips.

Pipeline Stage

```python
from nemo_curator.pipeline import Pipeline
from nemo_curator.stages.video.clipping.clip_extraction_stages import (
    FixedStrideExtractorStage,
)
from nemo_curator.stages.video.clipping.video_frame_extraction import (
    VideoFrameExtractionStage,
)
from nemo_curator.stages.video.clipping.transnetv2_extraction import (
    TransNetV2ClipExtractionStage,
)

pipe = Pipeline(name="clipping_examples")
# Add your video reader stage first so each video carries source_bytes
# and framerate/num_frames metadata (see Before You Start).

# Use one clipping method per pipeline; running both would produce two
# overlapping sets of clips. use_transnetv2 is an illustrative flag.
use_transnetv2 = False

if not use_transnetv2:
    # Fixed Stride: uniform 10 s clips, dropping spans shorter than 2 s
    pipe.add_stage(
        FixedStrideExtractorStage(
            clip_len_s=10.0,
            clip_stride_s=10.0,
            min_clip_length_s=2.0,
            limit_clips=0,
        )
    )
else:
    # TransNetV2 (requires full-video frame extraction first)
    pipe.add_stage(VideoFrameExtractionStage(decoder_mode="pynvc", verbose=True))
    pipe.add_stage(
        TransNetV2ClipExtractionStage(
            model_dir="/models",
            threshold=0.4,
            min_length_s=2.0,
            max_length_s=10.0,
            max_length_mode="stride",
            crop_s=0.5,
            gpu_memory_gb=10,
            limit_clips=-1,
            verbose=True,
        )
    )

pipe.run()
```

Clipping Options

Fixed Stride

The FixedStrideExtractorStage steps through the video duration in increments of clip_stride_s, creating a span of length clip_len_s at each step and truncating the final span at the end of the video when needed. It discards spans shorter than min_clip_length_s and appends a Clip object for each remaining span, identified by the source video and its start and end frame indices.

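```python
from nemo_curator.stages.video.clipping.clip_extraction_stages import FixedStrideExtractorStage

stage = FixedStrideExtractorStage(
    clip_len_s=10.0,        # length of each clip in seconds
    clip_stride_s=10.0,     # step between clip starts; equal to clip_len_s gives contiguous, non-overlapping clips
    min_clip_length_s=2.0,  # discard spans (such as a short final span) under 2 s
    limit_clips=0,          # 0 disables the skip-if-clips-exist check
)
```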

If limit_clips is greater than 0 and the Video already has clips, the stage skips processing. It does not cap the number of clips generated within the same run.
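The span logic and the limit_clips rule can be sketched in plain Python. This is an illustration of the behavior described above, not the stage's source: the function names are hypothetical, and rounding seconds to frame indices with round() is an assumption about how the stage maps spans to frames.

```python
from typing import List, Tuple

Span = Tuple[float, float]


def fixed_stride_spans(
    duration_s: float,
    clip_len_s: float,
    clip_stride_s: float,
    min_clip_length_s: float,
) -> List[Span]:
    """Step through the video by clip_stride_s, emitting spans of up to clip_len_s."""
    if clip_stride_s <= 0:
        raise ValueError("clip_stride_s must be positive")
    spans: List[Span] = []
    i = 0
    while i * clip_stride_s < duration_s:
        start = i * clip_stride_s  # multiply instead of accumulating to avoid float drift
        end = min(start + clip_len_s, duration_s)  # truncate the final span at the video end
        if end - start >= min_clip_length_s:  # drop spans shorter than the minimum
            spans.append((start, end))
        i += 1
    return spans


def spans_to_frames(spans: List[Span], framerate: float) -> List[Tuple[int, int]]:
    """Convert second-based spans to (start_frame, end_frame); rounding is an assumption."""
    return [(round(s * framerate), round(e * framerate)) for s, e in spans]


def should_skip(num_existing_clips: int, limit_clips: int) -> bool:
    """Documented rule: skip when limit_clips > 0 and the video already has clips."""
    return limit_clips > 0 and num_existing_clips > 0


spans = fixed_stride_spans(21.0, clip_len_s=10.0, clip_stride_s=10.0, min_clip_length_s=2.0)
print(spans)  # [(0.0, 10.0), (10.0, 20.0)]; the 1 s tail is dropped
print(spans_to_frames(spans, framerate=30.0))  # [(0, 300), (300, 600)]
```

Setting clip_stride_s smaller than clip_len_s produces overlapping clips; setting it larger leaves gaps between clips.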

TransNetV2 Scene-Change Detection

TransNetV2 is a shot-boundary detection model that identifies transitions between shots. The stage converts those transitions into scenes, applies length/crop rules, and emits clips aligned to scene boundaries.
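Converting per-frame transition probabilities into scenes is a thresholding step. The sketch below is a simplified, hypothetical version of that idea (loosely modeled on the predictions_to_scenes approach in the TransNetV2 reference code); it treats every frame above the threshold as a transition and excludes transition frames from scenes, and NeMo Curator's exact boundary handling may differ.

```python
from typing import List, Sequence, Tuple


def predictions_to_scenes(
    predictions: Sequence[float], threshold: float = 0.4
) -> List[Tuple[int, int]]:
    """Turn per-frame transition probabilities into inclusive (start, end) frame spans."""
    if len(predictions) == 0:
        return []
    scenes: List[Tuple[int, int]] = []
    start = None  # first frame of the scene currently being built, if any
    for i, p in enumerate(predictions):
        is_transition = p > threshold
        if not is_transition and start is None:
            start = i  # a new scene begins after a transition (or at frame 0)
        elif is_transition and start is not None:
            scenes.append((start, i - 1))  # the scene ends just before the transition
            start = None
    if start is not None:
        scenes.append((start, len(predictions) - 1))  # close the final scene
    if not scenes:
        # Every frame looked like a transition: fall back to one scene for the whole video.
        scenes.append((0, len(predictions) - 1))
    return scenes


probs = [0.0, 0.1, 0.0, 0.9, 0.8, 0.0, 0.0, 0.2]
print(predictions_to_scenes(probs, threshold=0.4))  # [(0, 2), (5, 7)]
```

Raising threshold makes the detector more conservative, yielding fewer, longer scenes.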

Using frames extracted at 27×48×3, the model predicts shot transitions, and the stage converts them into scenes. It then applies filtering: min_length_s, max_length_s with max_length_mode ("truncate" or "stride"), and an optional crop_s trimmed from both ends. It creates Clip objects for the resulting spans, stops once it reaches limit_clips (when limit_clips is greater than 0), and releases the frames from memory after processing.
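The filtering rules can be sketched on scenes expressed as (start_s, end_s) pairs. This is a hypothetical illustration, not the stage's implementation: filter_scenes is an invented name, and the order of the steps (crop, then minimum length, then maximum length), the meaning of "stride" as splitting into consecutive max_length_s chunks, and dropping a stride remainder shorter than min_length_s are all assumptions.

```python
from typing import List, Optional, Tuple

Span = Tuple[float, float]


def filter_scenes(
    scenes: List[Span],
    min_length_s: float,
    max_length_s: Optional[float],
    max_length_mode: str = "stride",
    crop_s: float = 0.0,
    limit_clips: int = -1,
) -> List[Span]:
    """Apply crop, minimum-length, and maximum-length rules to (start_s, end_s) scenes."""
    if max_length_mode not in ("truncate", "stride"):
        raise ValueError("max_length_mode must be 'truncate' or 'stride'")
    if max_length_s is not None and max_length_s <= 0:
        raise ValueError("max_length_s must be positive")
    clips: List[Span] = []
    for start, end in scenes:
        start, end = start + crop_s, end - crop_s  # trim crop_s from both ends
        if end - start < min_length_s:
            continue  # too short (after cropping) to keep
        if max_length_s is None or end - start <= max_length_s:
            pieces = [(start, end)]
        elif max_length_mode == "truncate":
            pieces = [(start, start + max_length_s)]  # keep only the first max_length_s
        else:
            # "stride": consecutive max_length_s chunks; a tail shorter than
            # min_length_s is dropped (assumed remainder handling).
            pieces = []
            t = start
            while end - t > 0 and end - t >= min_length_s:
                pieces.append((t, min(t + max_length_s, end)))
                t += max_length_s
        for piece in pieces:
            clips.append(piece)
            if 0 < limit_clips <= len(clips):
                return clips  # stop once limit_clips clips exist
    return clips


scenes = [(0.0, 1.0), (1.0, 26.0), (30.0, 38.0)]
print(filter_scenes(scenes, 2.0, 10.0, max_length_mode="stride", crop_s=0.5))
# [(1.5, 11.5), (11.5, 21.5), (21.5, 25.5), (30.5, 37.5)]
```

With max_length_mode="truncate", the same input keeps only (1.5, 11.5) from the long scene, trading coverage for fewer clips.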

Run VideoFrameExtractionStage first to populate video.frame_array.

```python
from nemo_curator.stages.video.clipping.video_frame_extraction import VideoFrameExtractionStage
from nemo_curator.stages.video.clipping.transnetv2_extraction import TransNetV2ClipExtractionStage

frame_extractor = VideoFrameExtractionStage(
    decoder_mode="pynvc",  # or "ffmpeg_gpu", "ffmpeg_cpu"
    verbose=True,
)
```

Each frame must be shaped (27, 48, 3), that is, 27 rows by 48 columns with 3 color channels. The stage accepts arrays shaped (num_frames, 27, 48, 3) and automatically transposes frames shaped (48, 27, 3).
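The shape rule can be illustrated with NumPy. This is a sketch of the described behavior rather than the stage's code; normalize_transnet_frames is a hypothetical helper, and it assumes the transpose swaps the two spatial axes.

```python
import numpy as np


def normalize_transnet_frames(frames: np.ndarray) -> np.ndarray:
    """Return frames shaped (num_frames, 27, 48, 3), transposing (num_frames, 48, 27, 3) if needed."""
    if frames.ndim != 4 or frames.shape[-1] != 3:
        raise ValueError(f"expected (num_frames, H, W, 3), got {frames.shape}")
    if frames.shape[1:3] == (27, 48):
        return frames  # already in the expected layout
    if frames.shape[1:3] == (48, 27):
        # Swap the two spatial axes so each frame becomes 27 x 48 x 3.
        return np.transpose(frames, (0, 2, 1, 3))
    raise ValueError(f"frames must be 27x48x3 or 48x27x3, got {frames.shape[1:]}")


frames = np.zeros((8, 48, 27, 3), dtype=np.uint8)
print(normalize_transnet_frames(frames).shape)  # (8, 27, 48, 3)
```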

Configure TransNetV2 and run the stage in your pipeline to generate clips from the detected scenes.

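```python
from nemo_curator.stages.video.clipping.transnetv2_extraction import TransNetV2ClipExtractionStage

transnet = TransNetV2ClipExtractionStage(
    model_dir="/models",       # directory containing the TransNetV2 model weights
    threshold=0.4,             # transition-probability threshold for a shot boundary
    min_length_s=2.0,          # drop scenes shorter than 2 s
    max_length_s=10.0,         # scenes longer than 10 s are handled by max_length_mode
    max_length_mode="stride",  # or "truncate"
    crop_s=0.5,                # trim 0.5 s from both ends of each scene
    gpu_memory_gb=10,          # GPU memory (GB) requested for the stage
    limit_clips=-1,            # values of 0 or less mean no limit
    verbose=True,
)
```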