nemo_curator.models.transnetv2


Model for fast shot transition detection.

@article{soucek2020transnetv2,
  title={TransNet V2: An effective deep network architecture for fast shot transition detection},
  author={Sou{\v{c}}ek, Tom{\'a}{\v{s}} and Loko{\v{c}}, Jakub},
  year={2020},
  journal={arXiv preprint arXiv:2008.04838},
}

Module Contents

Classes

| Name | Description |
| --- | --- |
| ColorHistograms | Model for computing and comparing color histograms across video frames. |
| Conv3DConfigurable | Configurable 3D convolution layer with support for separable convolutions. |
| DilatedDCNNV2 | Dilated dense convolutional model with multiple dilation rates. |
| FrameSimilarity | Model for computing frame similarity features in video sequences. |
| StackedDDCNNV2 | Stacked dilated dense convolutional neural network for video feature extraction. |
| TransNetV2 | Interface for the TransNetV2 shot transition detection model. |
| _TransNetV2 | Internal PyTorch module implementing the TransNetV2 network. |

Data

_TRANSNETV2_MODEL_ID

_TRANSNETV2_MODEL_REVISION

_TRANSNETV2_MODEL_WEIGHTS

API

class nemo_curator.models.transnetv2.ColorHistograms(
lookup_window: int = 101,
output_dim: int | None = None
)

Bases: Module

Model for computing and comparing color histograms across video frames.

fc
nemo_curator.models.transnetv2.ColorHistograms.compute_color_histograms(
frames: torch.Tensor
) -> torch.Tensor
staticmethod

Compute color histograms for video frames.

Parameters:

frames
torch.Tensor

Input tensor of video frames.

Returns: torch.Tensor

Color histogram tensor.
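As a rough illustration of what a per-frame color histogram can look like, the NumPy sketch below quantizes each RGB channel to 3 bits, counts pixels in the resulting 8 x 8 x 8 = 512 joint bins, and L2-normalizes each histogram. The [batch, frames, height, width, 3] uint8 layout, the 512-bin scheme, and the helper name color_histograms are assumptions for this sketch, not a description of how compute_color_histograms is implemented.

```python
import numpy as np


def color_histograms(frames: np.ndarray, bits_per_channel: int = 3) -> np.ndarray:
    """Return L2-normalized joint RGB histograms of shape [batch, frames, bins].

    Sketch only: assumes uint8 frames laid out as [batch, frames, height, width, 3]
    and quantizes each channel to `bits_per_channel` bits (3 bits -> 8 levels,
    so 8 * 8 * 8 = 512 joint bins).
    """
    batch, n_frames, height, width, channels = frames.shape
    assert channels == 3, "expected RGB frames"
    shift = 8 - bits_per_channel
    # Keep the top bits of each channel, e.g. 0..255 -> 0..7 for 3 bits.
    q = frames.astype(np.int64) >> shift
    r, g, b = q[..., 0], q[..., 1], q[..., 2]
    # Combine the three quantized channels into a single joint bin index.
    bin_idx = (r << (2 * bits_per_channel)) + (g << bits_per_channel) + b
    n_bins = 1 << (3 * bits_per_channel)
    # Count pixels per bin separately for every (batch, frame) pair.
    flat = bin_idx.reshape(batch * n_frames, height * width)
    hist = np.stack([np.bincount(row, minlength=n_bins) for row in flat]).astype(np.float64)
    # L2-normalize so that dot products between histograms are cosine similarities.
    hist /= np.linalg.norm(hist, axis=1, keepdims=True)
    return hist.reshape(batch, n_frames, n_bins)
```

An all-black frame puts all of its mass in bin 0 and an all-white frame in bin 511, so their normalized histograms are orthogonal.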

nemo_curator.models.transnetv2.ColorHistograms.forward(
inputs: torch.Tensor
) -> torch.Tensor

Process input frames through the model.

Parameters:

inputs
torch.Tensor

Input tensor of video frames.

Returns: torch.Tensor

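Color-histogram similarity features for each frame, comparing it with the neighboring frames inside lookup_window (projected by fc to output_dim when output_dim is set).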
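The comparison step can be sketched in NumPy. With L2-normalized per-frame histograms, a dot product is a cosine similarity, and each frame keeps only its similarities to the lookup_window frames centered on it (zero where the window runs past the clip). The helper name windowed_similarities and the zero-padding choice are assumptions, and any learned projection through fc is omitted.

```python
import numpy as np


def windowed_similarities(features: np.ndarray, lookup_window: int = 101) -> np.ndarray:
    """Compare each frame with its neighbors inside a centered window.

    Sketch only: `features` is [batch, frames, dim] and assumed L2-normalized,
    so a dot product is a cosine similarity. Returns [batch, frames, lookup_window];
    positions that fall outside the clip are zero.
    """
    assert lookup_window % 2 == 1, "an odd window keeps the current frame centered"
    _, n_frames, _ = features.shape
    half = lookup_window // 2
    # Full frame-by-frame similarity matrix: [batch, frames, frames].
    sims = features @ features.transpose(0, 2, 1)
    # Zero-pad the "other frame" axis so every window fits.
    padded = np.pad(sims, ((0, 0), (0, 0), (half, half)))
    # For frame t, padded column t + k corresponds to frame t - half + k.
    rows = np.arange(n_frames)[:, None]
    cols = rows + np.arange(lookup_window)[None, :]
    return padded[:, rows, cols]
```

The middle column of the result is always each frame's similarity with itself.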

class nemo_curator.models.transnetv2.Conv3DConfigurable(
in_filters: int,
filters: int,
dilation_rate: int,
separable: bool = True,
use_bias: bool = True
)

Bases: Module

Configurable 3D convolution layer with support for separable convolutions.

layers = nn.ModuleList([conv1, conv2])
nemo_curator.models.transnetv2.Conv3DConfigurable.forward(
inputs: torch.Tensor
) -> torch.Tensor

Process input through the 3D convolutional layers.

Parameters:

inputs
torch.Tensor

Input tensor.

Returns: torch.Tensor

Processed tensor.
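Suppose the separable mode factorizes a full 3 x 3 x 3 kernel into a spatial (1, 3, 3) convolution followed by a dilated temporal (3, 1, 1) convolution, which would fit the two-entry layers list. The parameter savings and the padding needed to keep the frame count unchanged can then be checked with plain arithmetic. The kernel shapes, the intermediate width mid, and all helper names are assumptions for this sketch.

```python
def conv3d_param_count(in_ch: int, out_ch: int, kernel: tuple[int, int, int], bias: bool) -> int:
    """Weights (+ optional bias) of a dense Conv3d with the given kernel (T, H, W)."""
    kt, kh, kw = kernel
    return in_ch * out_ch * kt * kh * kw + (out_ch if bias else 0)


def temporal_output_length(length: int, kernel_t: int, dilation: int, padding: int) -> int:
    """Output frames of a stride-1 convolution along time (standard conv arithmetic)."""
    return length + 2 * padding - dilation * (kernel_t - 1)


def full_vs_separable_params(in_filters: int, filters: int, mid: int, use_bias: bool = True) -> tuple[int, int]:
    """Full 3x3x3 conv vs. a (1x3x3 spatial) -> (3x1x1 temporal) factorization.

    `mid` is the width of the intermediate tensor in the separable variant; it is a
    hypothetical parameter for this sketch, not an argument of Conv3DConfigurable.
    In this sketch only the second (temporal) conv carries a bias.
    """
    full = conv3d_param_count(in_filters, filters, (3, 3, 3), use_bias)
    separable = conv3d_param_count(in_filters, mid, (1, 3, 3), False) + conv3d_param_count(
        mid, filters, (3, 1, 1), use_bias
    )
    return full, separable
```

For example, with 64 input and output channels and mid = 64, the full kernel needs 110,656 parameters and the factorized pair 49,216. Padding the time axis by the dilation rate keeps a 100-frame clip at 100 frames.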

class nemo_curator.models.transnetv2.DilatedDCNNV2(
in_filters: int,
filters: int,
batch_norm: bool = True,
activation: collections.abc.Callable[[torch.Tensor], torch.Tensor] | None = None
)

Bases: Module

Dilated dense convolutional model with multiple dilation rates.

Conv3D_1
Conv3D_2
Conv3D_4
Conv3D_8
bn
nemo_curator.models.transnetv2.DilatedDCNNV2.forward(
inputs: torch.Tensor
) -> torch.Tensor

Process input through the dilated dense convolutional network.

Parameters:

inputs
torch.Tensor

Input tensor.

Returns: torch.Tensor

Processed tensor.
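The attribute names Conv3D_1, Conv3D_2, Conv3D_4, and Conv3D_8 suggest one branch per dilation rate. The NumPy sketch below shows the idea on the time axis only. A 3-tap kernel with dilation d, zero-padded by d, preserves the frame count and sees 2d + 1 frames (3, 5, 9, and 17 frames for the four rates), and the branch outputs are concatenated along the channel axis. Sharing one kernel across channels, concatenating on channels, and the helper names are simplifications assumed for illustration. The module's bn attribute and activation argument suggest that normalization and a nonlinearity follow the branches.

```python
import numpy as np


def dilated_temporal_conv(x: np.ndarray, kernel: np.ndarray, dilation: int) -> np.ndarray:
    """Stride-1, 3-tap temporal convolution of x [channels, frames], zero-padded by `dilation`.

    Sketch only: one shared kernel for every channel (a real layer learns a
    kernel per input/output channel pair).
    """
    assert kernel.shape == (3,)
    padded = np.pad(x, ((0, 0), (dilation, dilation)))
    n = x.shape[1]
    # Taps at t - dilation, t, t + dilation (cross-correlation, as in deep-learning convs).
    return (
        kernel[0] * padded[:, 0:n]
        + kernel[1] * padded[:, dilation : dilation + n]
        + kernel[2] * padded[:, 2 * dilation : 2 * dilation + n]
    )


def dilated_dense_block(x: np.ndarray, kernel: np.ndarray, rates: tuple[int, ...] = (1, 2, 4, 8)) -> np.ndarray:
    """Run one branch per dilation rate and concatenate the results along channels."""
    branches = [dilated_temporal_conv(x, kernel, d) for d in rates]
    return np.concatenate(branches, axis=0)
```

Feeding a single-frame impulse through the block shows each branch reaching d frames to either side, so if every branch emits filters channels, the block emits 4 * filters.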

class nemo_curator.models.transnetv2.FrameSimilarity(
in_filters: int,
similarity_dim: int = 128,
lookup_window: int = 101,
output_dim: int = 128,
use_bias: bool = False
)

Bases: Module

Model for computing frame similarity features in video sequences.

fc = nn.Linear(lookup_window, output_dim)
projection
nemo_curator.models.transnetv2.FrameSimilarity.forward(
inputs: torch.Tensor
) -> torch.Tensor

Process input frames through the model.

Parameters:

inputs
torch.Tensor

Input tensor of video frames.

Returns: torch.Tensor

Frame similarity features.
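A compact NumPy sketch of one plausible pipeline behind these attributes follows. It projects per-frame features (in_filters to similarity_dim, the role suggested by projection), L2-normalizes them, takes cosine similarities to the lookup_window frames around each frame, and then maps that window to output_dim with a linear layer and a ReLU. The weight arrays are stand-ins, the helper name is hypothetical, and the normalization, windowing, bias-free projection, and ReLU are assumptions rather than documented behavior.

```python
import numpy as np


def frame_similarity_features(
    features: np.ndarray,  # [batch, frames, in_filters]
    w_proj: np.ndarray,  # [in_filters, similarity_dim], stand-in weights
    w_fc: np.ndarray,  # [lookup_window, output_dim], stand-in weights
    b_fc: np.ndarray,  # [output_dim]
) -> np.ndarray:
    """Project, normalize, compare within a window, then map the window to output_dim."""
    lookup_window = w_fc.shape[0]
    half = lookup_window // 2
    _, n_frames, _ = features.shape
    # 1. Linear projection to the similarity space (bias omitted, assumed from use_bias=False).
    emb = features @ w_proj
    # 2. L2-normalize so dot products are cosine similarities.
    emb = emb / np.maximum(np.linalg.norm(emb, axis=-1, keepdims=True), 1e-12)
    # 3. Similarity of every frame with every other frame, then keep a centered window.
    sims = np.pad(emb @ emb.transpose(0, 2, 1), ((0, 0), (0, 0), (half, half)))
    rows = np.arange(n_frames)[:, None]
    window = sims[:, rows, rows + np.arange(lookup_window)[None, :]]
    # 4. Map lookup_window -> output_dim, followed by a ReLU.
    return np.maximum(window @ w_fc + b_fc, 0.0)
```

Because the final layer sees a fixed-width window rather than the whole clip, the output width stays output_dim regardless of clip length.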

class nemo_curator.models.transnetv2.StackedDDCNNV2(
in_filters: int,
n_blocks: int,
filters: int,
shortcut: bool = True,
pool_type: str = 'avg',
stochastic_depth_drop_prob: float = 0.0
)

Bases: Module

Stacked dilated dense convolutional neural network for video feature extraction.

DDCNN
pool
nemo_curator.models.transnetv2.StackedDDCNNV2.forward(
inputs: torch.Tensor
) -> torch.Tensor

Process input through the stacked dilated dense convolutional network.

Parameters:

inputs
torch.Tensor

Input tensor.

Returns: torch.Tensor

Processed tensor.
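The pool attribute and pool_type argument point to spatial downsampling between stacked blocks. Assuming 2 x 2 pooling over height and width with no pooling over time, the NumPy sketch below implements both pool types and tracks how the spatial size shrinks. The helper names, the 27 x 48 input size, and the three stacked layers in the check are assumptions for illustration.

```python
import numpy as np


def spatial_pool_2x2(x: np.ndarray, pool_type: str = "avg") -> np.ndarray:
    """Pool [channels, frames, height, width] over 2x2 spatial windows, keeping every frame.

    Odd trailing rows/columns are dropped, as with floor-mode pooling.
    """
    c, t, h, w = x.shape
    x = x[:, :, : h // 2 * 2, : w // 2 * 2].reshape(c, t, h // 2, 2, w // 2, 2)
    if pool_type == "avg":
        return x.mean(axis=(3, 5))
    if pool_type == "max":
        return x.max(axis=(3, 5))
    raise ValueError(f"unsupported pool_type: {pool_type!r}")


def spatial_sizes(height: int, width: int, n_layers: int) -> list[tuple[int, int]]:
    """Spatial size after each stacked layer, assuming one 2x2 pool per layer."""
    sizes = []
    for _ in range(n_layers):
        height, width = height // 2, width // 2
        sizes.append((height, width))
    return sizes
```

Under these assumptions, three 2 x 2 pools take a 27 x 48 frame to 13 x 24, then 6 x 12, and finally 3 x 6, while the number of frames is unchanged.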

class nemo_curator.models.transnetv2.TransNetV2(
model_dir: str | None = None
)

Bases: ModelInterface

Interface for TransNetV2 shot transition detection model.

model_id_names
list[str]

Get the model ID names.

nemo_curator.models.transnetv2.TransNetV2.__call__(
inputs: torch.Tensor
) -> torch.Tensor

Run the TransNetV2 model on a batch of video frames.

Parameters:

inputs
torch.Tensor

Tensor of shape [# batch, # frames, height, width, RGB].

Returns: torch.Tensor

Tensor of shape [# batch, # frames, 1] giving, for each frame, the probability that it is a shot transition.
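To turn the per-frame probabilities into scene ranges, one simple approach thresholds them and treats each run of non-transition frames as a scene. The helper predictions_to_scenes and the 0.5 threshold are hypothetical and not part of nemo_curator. For a single clip, pass the CPU NumPy array of out[0, :, 0]; the helper also flattens a [frames, 1] array.

```python
import numpy as np


def predictions_to_scenes(probs: np.ndarray, threshold: float = 0.5) -> list[tuple[int, int]]:
    """Turn per-frame transition probabilities into inclusive (start, end) frame ranges.

    Frames whose probability exceeds `threshold` are treated as transition frames and
    are excluded from every scene. Hypothetical helper, not part of nemo_curator.
    """
    is_transition = np.asarray(probs).reshape(-1) > threshold
    scenes = []
    start = None
    for i, flag in enumerate(is_transition):
        if not flag and start is None:
            start = i  # first frame of a new scene
        elif flag and start is not None:
            scenes.append((start, i - 1))  # scene ends just before the transition
            start = None
    if start is not None:
        scenes.append((start, len(is_transition) - 1))
    return scenes
```

For probabilities [0.1, 0.2, 0.9, 0.1, 0.1, 0.8, 0.7, 0.2], frames 2, 5, and 6 are transitions and the scenes are (0, 1), (3, 4), and (7, 7).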

nemo_curator.models.transnetv2.TransNetV2.download_weights_on_node(
model_dir: str
) -> None
classmethod

Download TransNetV2 weights on the node.

Parameters:

model_dir
str

Directory in which to save the model weights.

nemo_curator.models.transnetv2.TransNetV2.setup() -> None

Set up the TransNetV2 model interface.

class nemo_curator.models.transnetv2._TransNetV2(
rf: int = 16,
rl: int = 3,
rs: int = 2,
rd: int = 1024,
use_many_hot_targets: bool = True,
use_frame_similarity: bool = True,
use_color_histograms: bool = True,
use_mean_pooling: bool = False,
dropout_rate: float = 0.5
)

Bases: Module

SDDCNN
cls_layer1 = nn.Linear(rd, 1)
cls_layer2 = nn.Linear(rd, 1) if use_many_hot_targets else None
color_hist_layer
dropout
fc1 = nn.Linear(output_dim, rd)
frame_sim_layer
nemo_curator.models.transnetv2._TransNetV2.forward(
inputs: torch.Tensor
) -> torch.Tensor

Process input through the TransNetV2 model.

Parameters:

inputs
torch.Tensor

Input tensor of video frames.

Returns: torch.Tensor

Model predictions for shot transitions.
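The width of fc1's input (output_dim in nn.Linear(output_dim, rd)) can be worked out from the constructor flags. The sketch below makes several assumptions: rf is the base filter count, rl is the number of StackedDDCNNV2 layers with filters doubling per layer, each block has four dilation branches, the final spatial grid is 3 x 6, and the frame-similarity and color-histogram features are 128-dimensional (128 matches FrameSimilarity's default output_dim). fc1_input_dim is a hypothetical helper.

```python
def fc1_input_dim(
    rf: int = 16,
    rl: int = 3,
    use_frame_similarity: bool = True,
    use_color_histograms: bool = True,
    use_mean_pooling: bool = False,
    final_hw: tuple[int, int] = (3, 6),
    branches: int = 4,
    side_dim: int = 128,
) -> int:
    """Width of the concatenated per-frame feature vector fed to fc1 (sketch).

    Assumes filters double at every stacked layer, each dilated block concatenates
    `branches` outputs, and the final feature map is `final_hw` spatially.
    `side_dim` is the assumed width of the frame-similarity and color-histogram features.
    """
    channels = rf * 2 ** (rl - 1) * branches
    # Mean pooling collapses the spatial grid; otherwise the grid is flattened.
    dim = channels if use_mean_pooling else channels * final_hw[0] * final_hw[1]
    if use_frame_similarity:
        dim += side_dim
    if use_color_histograms:
        dim += side_dim
    return dim
```

With the defaults this gives 16 * 2**2 * 4 = 256 channels, 256 * 18 = 4608 flattened features, and 4608 + 128 + 128 = 4864 inputs to fc1.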

nemo_curator.models.transnetv2._TRANSNETV2_MODEL_ID: Final = 'Sn4kehead/TransNetV2'
nemo_curator.models.transnetv2._TRANSNETV2_MODEL_REVISION: Final = 'db6ceab'
nemo_curator.models.transnetv2._TRANSNETV2_MODEL_WEIGHTS: Final = 'transnetv2-pytorch-weights.pth'