nemo_curator.stages.video.clipping.transnetv2_extraction
Module Contents
Classes
Functions
API
Bases: ProcessingStage[VideoTask, VideoTask]
Stage for extracting video clips using TransNetV2.
This class processes video clips through a series of steps including shot detection, scene filtering, and clip assignment.
Download TransNetV2 weights on the node.
Create spans between a start and an end point.
Parameters:
start point.
end point.
maximum length of span.
minimum length of span.
Returns: list[list[int]]
list of spans.
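As a rough illustration of span creation, the sketch below splits a range into consecutive spans. The function name, the argument names, the half-open `[start, end)` convention, and the rule that a span shorter than the minimum length is dropped are all assumptions made for this example, not the library's confirmed behavior.

```python
from __future__ import annotations


def create_spans_sketch(
    start: int, end: int, max_length: int, min_length: int = 0
) -> list[list[int]]:
    """Split [start, end) into consecutive spans of at most max_length.

    Hypothetical helper: spans shorter than min_length are dropped.
    """
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    spans = []
    for span_start in range(start, end, max_length):
        span_end = min(span_start + max_length, end)
        # Assumption: a span that is too short is discarded, not merged.
        if span_end - span_start >= min_length:
            spans.append([span_start, span_end])
    return spans


print(create_spans_sketch(0, 25, max_length=10, min_length=6))
# [[0, 10], [10, 20]]
```

In this sketch the trailing span `[20, 25]` is discarded because its length (5) is below the minimum of 6.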
Crop scenes by removing frames from start and end.
Parameters:
integer 2D array like [[t0, t1], [t2, t3], …]
number of frames to crop from start and end of scene.
Returns: npt.NDArray[np.int32]
cropped scene array.
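Cropping can be pictured as shifting each scene's start forward and its end backward by the same number of frames. The following is a minimal NumPy sketch; the function and argument names are hypothetical, and it does not remove scenes that collapse to zero length.

```python
import numpy as np


def crop_scenes_sketch(scenes, crop_length: int) -> np.ndarray:
    """Remove crop_length frames from the start and end of every scene.

    Hypothetical helper: scenes is a 2D array like [[t0, t1], [t2, t3], ...].
    """
    scenes = np.asarray(scenes, dtype=np.int32).reshape(-1, 2)
    cropped = scenes.copy()
    cropped[:, 0] += crop_length  # move each start forward
    cropped[:, 1] -= crop_length  # move each end backward
    return cropped


print(crop_scenes_sketch([[0, 100], [100, 110]], crop_length=5).tolist())
# [[5, 95], [105, 105]]
```

The second scene ends up with zero length, which is the kind of scene a later filtering step would need to discard.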
Frames are fetched in batches of 100. The first batch is padded with copies of the first frame and the last batch with copies of the last frame.
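One way to realize this batching is a sliding window over a padded frame array. The sketch below assumes a 50-frame step and 25 frames of leading padding, values borrowed from the public TransNetV2 reference inference code; whether this module uses the same constants is an assumption, and the function name is hypothetical.

```python
import numpy as np


def iter_padded_batches(frames: np.ndarray, window: int = 100, step: int = 50):
    """Yield overlapping windows of `window` frames from a padded copy of frames.

    Assumed scheme: pad the start with the first frame and the end with the
    last frame so that every original frame falls in the centre of a window.
    """
    if len(frames) == 0:
        return
    pad_start = (window - step) // 2
    remainder = len(frames) % step
    pad_end = pad_start + step - (remainder if remainder else step)
    padded = np.concatenate(
        [
            np.repeat(frames[:1], pad_start, axis=0),  # copies of first frame
            frames,
            np.repeat(frames[-1:], pad_end, axis=0),  # copies of last frame
        ],
        axis=0,
    )
    for ptr in range(0, len(padded) - window + 1, step):
        yield padded[ptr : ptr + window]


# 120 dummy frames of size 2x2 whose pixel value equals the frame index.
demo = np.arange(120, dtype=np.uint8).reshape(120, 1, 1, 1) * np.ones(
    (1, 2, 2, 3), dtype=np.uint8
)
batches = list(iter_padded_batches(demo))
print(len(batches), batches[0].shape)
```

With 120 input frames this sketch yields three windows of 100 frames each; only the central 50 frames of each window would normally be kept as predictions.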
Filter scenes.
Parameters:
integer 2D array like [[t0, t1], [t2, t3], …]
optional minimum length of frames a scene can have.
optional maximum length of frames a scene can have.
how to deal with scenes that are above max length.
If truncate, each scene longer than max_length is truncated to max_length, if specified.
If stride, each scene longer than max_length is split into consecutive scenes of max_length frames until the end of the scene is reached.
If the final scene produced this way is shorter than min_length, it is dropped.
optional number of frames to crop from start and end of scene. If cropped scenes result in zero-length scenes, these will be filtered.
Returns: npt.NDArray[np.int32]
filtered scene array.
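The interaction between cropping, the two max-length modes, and the minimum length can be sketched as follows. Everything here is illustrative: the function and argument names, the order of operations (crop first, then split or truncate, then apply the minimum length), and the half-open treatment of scene boundaries are assumptions.

```python
import numpy as np


def filter_scenes_sketch(
    scenes, min_length=None, max_length=None, max_length_mode="stride", crop_length=None
) -> np.ndarray:
    """Hypothetical re-implementation of the filtering rules described above."""
    scenes = np.asarray(scenes, dtype=np.int32).reshape(-1, 2)
    if crop_length:
        scenes = scenes + np.array([crop_length, -crop_length], dtype=np.int32)
        scenes = scenes[scenes[:, 1] > scenes[:, 0]]  # drop zero-length scenes
    kept = []
    for start, end in scenes.tolist():
        if max_length is None or end - start <= max_length:
            pieces = [[start, end]]
        elif max_length_mode == "truncate":
            pieces = [[start, start + max_length]]
        elif max_length_mode == "stride":
            pieces = [
                [s, min(s + max_length, end)] for s in range(start, end, max_length)
            ]
        else:
            raise ValueError(f"unknown max_length_mode: {max_length_mode}")
        for s, e in pieces:
            if min_length is None or e - s >= min_length:
                kept.append([s, e])
    return np.array(kept, dtype=np.int32).reshape(-1, 2)


scenes = [[0, 25], [30, 33], [40, 100]]
print(filter_scenes_sketch(scenes, min_length=6, max_length=20).tolist())
# [[0, 20], [40, 60], [60, 80], [80, 100]]
print(
    filter_scenes_sketch(
        scenes, min_length=6, max_length=20, max_length_mode="truncate"
    ).tolist()
)
# [[0, 20], [40, 60]]
```

In stride mode the leftover piece `[20, 25]` is dropped because it is shorter than the minimum length, while truncate mode keeps only the first 20 frames of each long scene.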
Get predictions from the video frame array.
Parameters:
shot detection model.
uint8 array of shape (# frames, height, width, 3), with RGB channels.
probability threshold for shot detection.
Returns: npt.NDArray[np.uint8]
0/1 prediction array of shape (# frames, 1)
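The final thresholding step can be illustrated without a model. The sketch below takes per-frame transition probabilities that a shot detection model might produce and converts them to the 0/1 array described above; the function name and the strict `>` comparison are assumptions.

```python
import numpy as np


def threshold_probabilities(probabilities, threshold: float = 0.5) -> np.ndarray:
    """Turn per-frame probabilities into a uint8 array of shape (# frames, 1)."""
    probs = np.asarray(probabilities, dtype=np.float32).reshape(-1, 1)
    # Assumption: a frame counts as a transition only if strictly above threshold.
    return (probs > threshold).astype(np.uint8)


print(threshold_probabilities([0.1, 0.9, 0.5, 0.7]).tolist())
# [[0], [1], [0], [1]]
```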
Convert prediction array to scene array.
Parameters:
array of shape [# frames, 1]. Values are 1 if frame is a shot transition, and 0 if it’s not.
If no shot transitions are found, a single scene spanning the whole video is returned.
Returns: npt.NDArray[np.int32]
scene array of shape [# scenes, 2], where the value at each row is the start and end frame of the shot.
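The conversion from per-frame transition flags to scenes can be sketched as a single pass over the predictions. This version is modeled on the approach in the public TransNetV2 reference code; the function name and the exact boundary conventions (a scene ends at the first transition frame, and the next scene starts at the first non-transition frame) are assumptions rather than confirmed behavior of this module.

```python
import numpy as np


def predictions_to_scenes_sketch(predictions) -> np.ndarray:
    """Convert a 0/1 array of shape (# frames, 1) into [start, end] rows."""
    flags = np.asarray(predictions).reshape(-1).astype(np.uint8).tolist()
    last = len(flags) - 1
    scenes = []
    prev, start = 0, 0
    for i, cur in enumerate(flags):
        if prev == 1 and cur == 0:
            start = i  # a transition just ended, so a new scene begins
        if prev == 0 and cur == 1 and i != 0:
            scenes.append([start, i])  # a transition begins, closing the scene
        prev = cur
    if flags and prev == 0:
        scenes.append([start, last])  # close the scene that runs to the end
    if not scenes:
        # No usable transitions: one scene spanning the whole video.
        return np.array([[0, max(last, 0)]], dtype=np.int32)
    return np.array(scenes, dtype=np.int32)


preds = np.array([0, 0, 0, 1, 1, 0, 0, 0, 1, 0]).reshape(-1, 1)
print(predictions_to_scenes_sketch(preds).tolist())
# [[0, 3], [5, 8], [9, 9]]
print(predictions_to_scenes_sketch(np.zeros((5, 1))).tolist())
# [[0, 4]]
```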