nemo_curator.stages.video.clipping.transnetv2_extraction


Module Contents

Classes

Name | Description
--- | ---
TransNetV2ClipExtractionStage | Stage for extracting video clips using TransNetV2.

Functions

Name | Description
--- | ---
_create_spans | Create spans between a start and an end point.
_crop_scenes | Crop scenes by removing frames from start and end.
_get_batches | Yield batches of 100 frames, padding the first and last batches with copies of the first and last frame, respectively.
_get_filtered_scenes | Filter scenes by length, optionally cropping them and truncating or splitting scenes that exceed max_length.
_get_predictions | Get predictions from the video frame array.
_get_scenes | Convert prediction array to scene array.

API

class nemo_curator.stages.video.clipping.transnetv2_extraction.TransNetV2ClipExtractionStage(
model_dir: str = None,
threshold: float = 0.4,
min_length_s: float | None = 2.0,
max_length_s: float | None = 10.0,
max_length_mode: typing.Literal['truncate', 'stride'] = 'stride',
crop_s: float | None = 0.5,
entire_scene_as_clip: bool = True,
gpu_memory_gb: int = 10,
limit_clips: int = -1,
verbose: bool = False,
name: str = 'transnetv2_clip_extraction'
)
Dataclass

Bases: ProcessingStage[VideoTask, VideoTask]

Stage for extracting video clips using TransNetV2.

This class processes each input video through a series of steps, including shot detection, scene filtering, and clip assignment.

Attributes:

crop_s: float | None = 0.5
entire_scene_as_clip: bool = True
gpu_memory_gb: int = 10
limit_clips: int = -1
max_length_mode: Literal['truncate', 'stride'] = 'stride'
max_length_s: float | None = 10.0
min_length_s: float | None = 2.0
model_dir: str = None
name: str = 'transnetv2_clip_extraction'
threshold: float = 0.4
verbose: bool = False
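The stage takes its length and crop settings in seconds (min_length_s, max_length_s, crop_s), while the helper functions documented below take lengths in frames. A minimal sketch of that conversion, assuming the frame rate is known and durations are rounded to the nearest frame; seconds_to_frames is a hypothetical helper, not part of NeMo Curator, and the stage's actual conversion may differ:

```python
from __future__ import annotations


def seconds_to_frames(seconds: float | None, fps: float) -> int | None:
    """Convert a duration in seconds to a whole number of frames.

    Hypothetical helper: round-to-nearest is an assumption, not necessarily
    what TransNetV2ClipExtractionStage does internally.
    """
    if seconds is None:
        # None keeps the corresponding optional length/crop setting disabled.
        return None
    if fps <= 0:
        raise ValueError("fps must be positive")
    return int(round(seconds * fps))


# The stage defaults expressed in frames for a 30 fps video.
fps = 30.0
min_length = seconds_to_frames(2.0, fps)   # min_length_s
max_length = seconds_to_frames(10.0, fps)  # max_length_s
crop_length = seconds_to_frames(0.5, fps)  # crop_s
print(min_length, max_length, crop_length)  # 60 300 15
```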
nemo_curator.stages.video.clipping.transnetv2_extraction.TransNetV2ClipExtractionStage.__post_init__() -> None
nemo_curator.stages.video.clipping.transnetv2_extraction.TransNetV2ClipExtractionStage.inputs() -> tuple[list[str], list[str]]
nemo_curator.stages.video.clipping.transnetv2_extraction.TransNetV2ClipExtractionStage.outputs() -> tuple[list[str], list[str]]
nemo_curator.stages.video.clipping.transnetv2_extraction.TransNetV2ClipExtractionStage.process(
task: nemo_curator.tasks.video.VideoTask
) -> nemo_curator.tasks.video.VideoTask
nemo_curator.stages.video.clipping.transnetv2_extraction.TransNetV2ClipExtractionStage.setup(
worker_metadata: nemo_curator.backends.base.WorkerMetadata | None = None
) -> None
nemo_curator.stages.video.clipping.transnetv2_extraction.TransNetV2ClipExtractionStage.setup_on_node(
node_info: nemo_curator.backends.base.NodeInfo,
worker_metadata: nemo_curator.backends.base.WorkerMetadata
) -> None

Download TransNetV2 weights on the node.

nemo_curator.stages.video.clipping.transnetv2_extraction._create_spans(
start: int,
end: int,
max_length: int,
min_length: int | None
) -> list[list[int]]

Create spans between a start and an end point.

Parameters:

start
int

start point.

end
int

end point.

max_length
int

maximum length of span.

min_length
int | None

minimum length of span.

Returns: list[list[int]]

list of spans.
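One way to read _create_spans is as the splitting step behind the 'stride' mode of _get_filtered_scenes. The following is a hypothetical re-implementation under stated assumptions: spans are half-open [start, end) frame ranges, consecutive spans have max_length frames, and a trailing span shorter than min_length is dropped. It illustrates the idea and is not the library's code.

```python
from __future__ import annotations


def create_spans(start: int, end: int, max_length: int, min_length: int | None) -> list[list[int]]:
    """Split [start, end) into consecutive spans of at most max_length frames.

    Hypothetical sketch: a span shorter than min_length is dropped, mirroring
    the 'stride' behaviour described for _get_filtered_scenes.
    """
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    spans = []
    for s in range(start, end, max_length):
        e = min(s + max_length, end)
        # Every span but the last is max_length long, so a short span ends the loop.
        if min_length is not None and e - s < min_length:
            break
        spans.append([s, e])
    return spans


print(create_spans(0, 25, max_length=10, min_length=6))  # [[0, 10], [10, 20]]
```

With min_length=5 the trailing 5-frame span [20, 25] would be kept.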

nemo_curator.stages.video.clipping.transnetv2_extraction._crop_scenes(
scenes: numpy.typing.NDArray[numpy.int32],
crop_length: int
) -> numpy.typing.NDArray[numpy.int32]

Crop scenes by removing frames from start and end.

Parameters:

scenes
npt.NDArray[np.int32]

integer 2D array like [[t0, t1], [t2, t3], …]

crop_length
int

number of frames to crop from start and end of scene.

Returns: npt.NDArray[np.int32]

cropped scene array.
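Cropping can be expressed as a single broadcast addition over the (N, 2) scene array. A minimal NumPy sketch; crop_scenes is an illustrative stand-in for _crop_scenes, not its actual source. Note that a short scene can end up with start > end after cropping, which is the case _get_filtered_scenes later filters out:

```python
import numpy as np
import numpy.typing as npt


def crop_scenes(scenes: npt.NDArray[np.int32], crop_length: int) -> npt.NDArray[np.int32]:
    """Move each scene's start forward and its end backward by crop_length frames.

    Illustrative sketch; scenes is an (N, 2) array of [start, end] frame indices.
    """
    offsets = np.array([crop_length, -crop_length], dtype=np.int32)
    # Broadcasting adds the offsets to every [start, end] row.
    return (scenes + offsets).astype(np.int32)


scenes = np.array([[0, 100], [120, 130]], dtype=np.int32)
print(crop_scenes(scenes, 10).tolist())  # [[10, 90], [130, 120]]
```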

nemo_curator.stages.video.clipping.transnetv2_extraction._get_batches(
frames: numpy.typing.NDArray[numpy.uint8]
) -> collections.abc.Generator[numpy.typing.NDArray[numpy.uint8], None, None]

Yield batches of 100 frames, padding the first and last batches with copies of the first and last frame, respectively.
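The reference TransNetV2 inference code batches frames as overlapping 100-frame windows with a stride of 50: it prepends 25 copies of the first frame and appends enough copies of the last frame that the padded length is a multiple of 50. The sketch below follows that scheme under the assumption that _get_batches works the same way; iter_windows is a hypothetical name, and it yields windows without a leading batch dimension. Because each window's middle 50 frames (indices 25 to 74) map onto 50 consecutive original frames, keeping only those predictions covers every frame exactly once.

```python
from collections.abc import Generator

import numpy as np
import numpy.typing as npt


def iter_windows(
    frames: npt.NDArray[np.uint8],
) -> Generator[npt.NDArray[np.uint8], None, None]:
    """Yield overlapping 100-frame windows with a stride of 50.

    Sketch following the reference TransNetV2 windowing; NeMo Curator's
    _get_batches may differ in detail.
    """
    n = len(frames)
    pad_start = 25
    # Pad the end so the padded length is a multiple of 50 (and at least 100).
    pad_end = 25 + 50 - (n % 50 if n % 50 != 0 else 50)
    padded = np.concatenate(
        [
            np.repeat(frames[:1], pad_start, axis=0),  # copies of the first frame
            frames,
            np.repeat(frames[-1:], pad_end, axis=0),  # copies of the last frame
        ],
        axis=0,
    )
    for ptr in range(0, len(padded) - 100 + 1, 50):
        yield padded[ptr : ptr + 100]


# 120 tiny dummy frames whose pixel value equals the frame index.
frames = np.broadcast_to(np.arange(120, dtype=np.uint8)[:, None, None, None], (120, 2, 2, 3))
windows = list(iter_windows(frames))
print(len(windows), windows[0].shape)  # 3 (100, 2, 2, 3)
```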

nemo_curator.stages.video.clipping.transnetv2_extraction._get_filtered_scenes(
scenes: numpy.typing.NDArray[numpy.int32],
min_length: int | None = None,
max_length: int | None = None,
max_length_mode: typing.Literal['truncate', 'stride'] = 'truncate',
crop_length: int | None = None
) -> numpy.typing.NDArray[numpy.int32]

Filter scenes by length, optionally cropping them and truncating or splitting scenes that exceed max_length.

Parameters:

scenes
npt.NDArray[np.int32]

integer 2D array like [[t0, t1], [t2, t3], …]

min_length
int | None, defaults to None

optional minimum length of a scene, in frames.

max_length
int | None, defaults to None

optional maximum length of a scene, in frames.

max_length_mode
Literal['truncate', 'stride'], defaults to 'truncate'

how to handle scenes longer than max_length (only applies if max_length is specified). With 'truncate', each such scene is truncated to max_length frames. With 'stride', the scene is split into consecutive max_length-frame scenes until its end; if the final piece is shorter than min_length, it is dropped.

crop_length
int | None, defaults to None

optional number of frames to crop from the start and end of each scene. Scenes that become zero-length after cropping are filtered out.

Returns: npt.NDArray[np.int32]

filtered scene array.
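The rules above can be combined into one pass over the scenes. This is an illustrative re-implementation, not the library's source: it assumes half-open [start, end) scenes and applies the steps in the order crop, then truncate or stride, then the min_length check. filter_scenes is a hypothetical name.

```python
from __future__ import annotations

from typing import Literal

import numpy as np
import numpy.typing as npt


def filter_scenes(
    scenes: npt.NDArray[np.int32],
    min_length: int | None = None,
    max_length: int | None = None,
    max_length_mode: Literal["truncate", "stride"] = "truncate",
    crop_length: int | None = None,
) -> npt.NDArray[np.int32]:
    """Sketch of the documented filtering rules (order of steps is assumed)."""
    out: list[list[int]] = []
    for start, end in scenes.tolist():
        if crop_length is not None:
            start, end = start + crop_length, end - crop_length
        if end - start <= 0:
            continue  # drop empty (e.g. over-cropped) scenes
        if max_length is not None and end - start > max_length:
            if max_length_mode == "truncate":
                pieces = [[start, start + max_length]]
            else:  # "stride": consecutive max_length pieces until the scene ends
                pieces = [[s, min(s + max_length, end)] for s in range(start, end, max_length)]
        else:
            pieces = [[start, end]]
        # Pieces shorter than min_length (e.g. a short final stride piece) are dropped.
        out.extend(p for p in pieces if min_length is None or p[1] - p[0] >= min_length)
    return np.array(out, dtype=np.int32).reshape(-1, 2)


scenes = np.array([[0, 25], [30, 34], [40, 70]], dtype=np.int32)
print(filter_scenes(scenes, 5, 10, "stride", 1).tolist())
# [[1, 11], [11, 21], [41, 51], [51, 61], [61, 69]]
print(filter_scenes(scenes, 5, 10, "truncate", 1).tolist())
# [[1, 11], [41, 51]]
```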

nemo_curator.stages.video.clipping.transnetv2_extraction._get_predictions(
model: collections.abc.Callable[[torch.Tensor], torch.Tensor],
frames: numpy.typing.NDArray[numpy.uint8],
threshold: float
) -> numpy.typing.NDArray[numpy.uint8]

Get predictions from the video frame array.

Parameters:

model
Callable[[torch.Tensor], torch.Tensor]

shot detection model.

frames
npt.NDArray[np.uint8]

uint8 array of shape (# frames, height, width, 3), with RGB channels.

threshold
float

probability threshold for shot detection.

Returns: npt.NDArray[np.uint8]

0/1 prediction array of shape (# frames, 1)
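The docstring does not say how per-window model outputs become one prediction per frame. A minimal sketch, assuming the 100-frame, stride-50 windowing of the reference TransNetV2 code and a strict greater-than comparison against threshold: the middle 50 probabilities of each window are concatenated, trimmed to the frame count, and thresholded. The model call is left out so the example needs only NumPy; window_probs is a hypothetical stand-in for the per-window transition probabilities a forward pass would produce.

```python
import numpy as np
import numpy.typing as npt


def windows_to_predictions(
    window_probs: list[npt.NDArray[np.float32]], num_frames: int, threshold: float
) -> npt.NDArray[np.uint8]:
    """Stitch per-window probabilities into a (num_frames, 1) 0/1 array.

    Assumes 100-frame windows with stride 50, so keeping each window's
    middle 50 values covers every original frame exactly once.
    """
    middle = np.concatenate([p[25:75] for p in window_probs])[:num_frames]
    # A frame is marked as a shot transition when its probability exceeds the threshold.
    return (middle > threshold).astype(np.uint8).reshape(-1, 1)


# Three windows for a 120-frame video; one confident transition in window 1.
window_probs = [np.zeros(100, dtype=np.float32) for _ in range(3)]
window_probs[1][35] = 0.9  # middle slot 10 of window 1, i.e. frame 50 + 10
preds = windows_to_predictions(window_probs, num_frames=120, threshold=0.4)
print(preds.shape, int(preds.sum()), int(np.flatnonzero(preds[:, 0])[0]))  # (120, 1) 1 60
```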

nemo_curator.stages.video.clipping.transnetv2_extraction._get_scenes(
predictions: numpy.typing.NDArray[numpy.uint8],
entire_scene_as_clip: bool
) -> numpy.typing.NDArray[numpy.int32]

Convert prediction array to scene array.

Parameters:

predictions
npt.NDArray[np.uint8]

array of shape (# frames, 1). Values are 1 if the frame is a shot transition and 0 otherwise.

entire_scene_as_clip
bool

If no shot transitions are found, create a single scene spanning the whole video.

Returns: npt.NDArray[np.int32]

scene array of shape (# scenes, 2), where each row holds the start and end frame of one scene.
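A sketch of the prediction-to-scene conversion under an assumed convention: a scene is a maximal run of non-transition frames, and start and end are inclusive frame indices. The documented behaviour for a video without transitions is reproduced when entire_scene_as_clip is True; returning no scenes when it is False is an assumption, since that case is not documented. predictions_to_scenes is a hypothetical name, not the library function.

```python
import numpy as np
import numpy.typing as npt


def predictions_to_scenes(
    predictions: npt.NDArray[np.uint8], entire_scene_as_clip: bool
) -> npt.NDArray[np.int32]:
    """Turn a (num_frames, 1) 0/1 transition array into [start, end] scene rows."""
    flags = predictions.reshape(-1)
    num_frames = len(flags)
    if num_frames == 0:
        return np.empty((0, 2), dtype=np.int32)
    if not flags.any():
        # No transitions: optionally treat the whole video as one scene.
        if entire_scene_as_clip:
            return np.array([[0, num_frames - 1]], dtype=np.int32)
        return np.empty((0, 2), dtype=np.int32)  # assumed behaviour
    scenes = []
    start = None
    for i, is_transition in enumerate(flags):
        if not is_transition and start is None:
            start = i  # a new scene begins
        elif is_transition and start is not None:
            scenes.append([start, i - 1])  # scene ends just before the transition
            start = None
    if start is not None:
        scenes.append([start, num_frames - 1])  # last scene runs to the final frame
    return np.array(scenes, dtype=np.int32).reshape(-1, 2)


preds = np.array([0, 0, 0, 1, 1, 0, 0, 1, 0, 0], dtype=np.uint8).reshape(-1, 1)
print(predictions_to_scenes(preds, True).tolist())  # [[0, 2], [5, 6], [8, 9]]
```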