nemo_curator.models.transnetv2

Model for fast shot transition detection.

@article{soucek2020transnetv2,
  title={TransNet V2: An effective deep network architecture for fast shot transition detection},
  author={Sou{\v{c}}ek, Tom{\'a}{\v{s}} and Loko{\v{c}}, Jakub},
  year={2020},
  journal={arXiv preprint arXiv:2008.04838},
}

Module Contents

Classes

Name	Description
`ColorHistograms`	Model for computing and comparing color histograms across video frames.
`Conv3DConfigurable`	Configurable 3D convolution layer with support for separable convolutions.
`DilatedDCNNV2`	Dilated dense convolutional model with multiple dilation rates.
`FrameSimilarity`	Model for computing frame similarity features in video sequences.
`StackedDDCNNV2`	Stacked dilated dense convolutional neural network for video feature extraction.
`TransNetV2`	Interface for TransNetV2 shot transition detection model.
`_TransNetV2`	PyTorch implementation of the TransNetV2 network.

Data

_TRANSNETV2_MODEL_ID

_TRANSNETV2_MODEL_REVISION

_TRANSNETV2_MODEL_WEIGHTS

API

class nemo_curator.models.transnetv2.ColorHistograms(
    lookup_window: int = 101,
    output_dim: int | None = None
)

Bases: Module

Model for computing and comparing color histograms across video frames.

nemo_curator.models.transnetv2.ColorHistograms.compute_color_histograms(
    frames: torch.Tensor
) -> torch.Tensor

staticmethod

Compute color histograms for video frames.

Parameters:

frames

torch.Tensor

Input tensor of video frames.

Returns: torch.Tensor

Color histogram tensor.
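Below is a minimal pure-Python sketch of what compute_color_histograms computes for one video. It assumes the binning used by the original TransNetV2 reference code: the top 3 bits of each 8-bit channel are kept, giving 8 × 8 × 8 = 512 bins, and each histogram is then L2-normalized. `frame_color_histogram` and `video_color_histograms` are hypothetical names. The real static method operates on a batched tensor of frames, not on Python lists.

```python
import math


def frame_color_histogram(pixels: list[tuple[int, int, int]]) -> list[float]:
    """L2-normalized 512-bin RGB histogram of one frame.

    Assumption: binning follows the TransNetV2 reference code (3 bits per
    channel); NeMo Curator's port may differ in details.
    """
    counts = [0] * 512  # 8 levels per channel -> 8 * 8 * 8 bins
    for r, g, b in pixels:
        # Keep the 3 most significant bits of each channel and pack them as RGB.
        bin_index = ((r >> 5) << 6) + ((g >> 5) << 3) + (b >> 5)
        counts[bin_index] += 1
    norm = math.sqrt(sum(c * c for c in counts))
    return [c / norm for c in counts] if norm > 0 else [0.0] * 512


def video_color_histograms(
    frames: list[list[tuple[int, int, int]]],
) -> list[list[float]]:
    """Histogram of every frame of one video (hypothetical helper)."""
    return [frame_color_histogram(frame) for frame in frames]
```

Because every histogram has unit L2 norm, the dot product of two histograms is their cosine similarity, which is what the similarity step compares.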

nemo_curator.models.transnetv2.ColorHistograms.forward(
    inputs: torch.Tensor
) -> torch.Tensor

Compute color histogram similarities between each frame and the frames within lookup_window of it.

Parameters:

inputs

torch.Tensor

Input tensor of video frames.

Returns: torch.Tensor

Per-frame color histogram similarity features.
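To illustrate the lookup window, here is a sketch of the similarity step. It assumes, as in the TransNetV2 reference code, that each frame's histogram is compared by dot product with the histograms of the `(lookup_window - 1) // 2` frames on either side, and that positions past the ends of the clip are zero-padded. `windowed_similarities` is a hypothetical helper working on plain lists.

```python
def windowed_similarities(
    histograms: list[list[float]], lookup_window: int = 101
) -> list[list[float]]:
    """For each frame t, dot products with frames t - half .. t + half.

    Assumption: lookup_window is odd and out-of-range neighbours are
    zero-padded, mirroring the TransNetV2 reference ColorHistograms.forward.
    """
    if lookup_window % 2 != 1:
        raise ValueError("lookup_window must be odd")
    half = (lookup_window - 1) // 2
    n = len(histograms)
    out = []
    for t in range(n):
        row = []
        for offset in range(-half, half + 1):
            j = t + offset
            if 0 <= j < n:
                row.append(sum(a * b for a, b in zip(histograms[t], histograms[j])))
            else:
                row.append(0.0)  # neighbour falls outside the clip
        out.append(row)
    return out
```

With the default `lookup_window=101`, each frame yields 101 similarity values. In the reference code, when `output_dim` is not None, a linear layer followed by a ReLU then maps these values to `output_dim` features.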

class nemo_curator.models.transnetv2.Conv3DConfigurable(
    in_filters: int,
    filters: int,
    dilation_rate: int,
    separable: bool = True,
    use_bias: bool = True
)

Bases: Module

Configurable 3D convolution layer with support for separable convolutions.

layers

= nn.ModuleList([conv1, conv2])

nemo_curator.models.transnetv2.Conv3DConfigurable.forward(
    inputs: torch.Tensor
) -> torch.Tensor

Process input through the 3D convolutional layers.

Parameters:

inputs

torch.Tensor

Input tensor.

Returns: torch.Tensor

Processed tensor.
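The sketch below makes the separable option concrete with shape and weight arithmetic. It assumes the (2+1)D factorization used by the TransNetV2 reference code. The two entries of `layers` would be a (1, 3, 3) spatial convolution to `2 * filters` channels, followed by a (3, 1, 1) temporal convolution. That temporal convolution is dilated by `dilation_rate` along time and padded by the same amount. The helper names are hypothetical. `conv_out_len` follows the output-size formula given in the PyTorch `Conv3d` documentation.

```python
def conv_out_len(size: int, kernel: int, dilation: int = 1, padding: int = 0, stride: int = 1) -> int:
    """Output length along one axis (PyTorch Conv3d output-size formula)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def weight_count(in_ch: int, out_ch: int, kernel: tuple[int, int, int]) -> int:
    """Number of weights in one Conv3d layer (biases ignored)."""
    kt, kh, kw = kernel
    return in_ch * out_ch * kt * kh * kw


def separable_weights(in_filters: int, filters: int) -> int:
    # Assumed layout: (1, 3, 3) spatial conv to 2 * filters channels,
    # then a (3, 1, 1) temporal conv down to `filters` channels.
    return weight_count(in_filters, 2 * filters, (1, 3, 3)) + weight_count(
        2 * filters, filters, (3, 1, 1)
    )


def full_weights(in_filters: int, filters: int) -> int:
    # Assumed non-separable variant: a single 3 x 3 x 3 kernel.
    return weight_count(in_filters, filters, (3, 3, 3))
```

Padding equal to the dilation keeps the number of frames unchanged, so, under these assumptions, the layer can be used at any dilation rate without shrinking the clip. For example, `conv_out_len(100, 3, dilation=8, padding=8)` is 100.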

class nemo_curator.models.transnetv2.DilatedDCNNV2(
    in_filters: int,
    filters: int,
    batch_norm: bool = True,
    activation: collections.abc.Callable[[torch.Tensor], torch.Tensor] | None = None
)

Bases: Module

Dilated dense convolutional model with multiple dilation rates.

Conv3D_1

Conv3D_2

Conv3D_4

Conv3D_8

nemo_curator.models.transnetv2.DilatedDCNNV2.forward(
    inputs: torch.Tensor
) -> torch.Tensor

Process input through the dilated dense convolutional network.

Parameters:

inputs

torch.Tensor

Input tensor.

Returns: torch.Tensor

Processed tensor.
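The attribute names `Conv3D_1` through `Conv3D_8` suggest four branches with temporal dilation rates 1, 2, 4, and 8. The sketch below computes how many frames each branch sees and how many channels the module emits. It assumes a temporal kernel size of 3 and that the branch outputs are concatenated along the channel axis, as in the TransNetV2 reference code. The function names are hypothetical.

```python
DILATION_RATES = (1, 2, 4, 8)  # inferred from the Conv3D_1 ... Conv3D_8 attribute names


def temporal_receptive_field(kernel: int, dilation: int) -> int:
    """Number of consecutive frames spanned by one dilated temporal kernel."""
    return dilation * (kernel - 1) + 1


def dilated_dcnn_summary(filters: int, kernel: int = 3) -> tuple[list[int], int]:
    """(receptive field per branch, output channels).

    Assumption: the four branches run in parallel on the same input and
    their `filters`-channel outputs are concatenated.
    """
    fields = [temporal_receptive_field(kernel, d) for d in DILATION_RATES]
    return fields, len(DILATION_RATES) * filters
```

Under these assumptions the widest branch spans 17 consecutive frames while still using only three temporal taps.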

class nemo_curator.models.transnetv2.FrameSimilarity(
    in_filters: int,
    similarity_dim: int = 128,
    lookup_window: int = 101,
    output_dim: int = 128,
    use_bias: bool = False
)

Bases: Module

Model for computing frame similarity features in video sequences.

= nn.Linear(lookup_window, output_dim)

projection

nemo_curator.models.transnetv2.FrameSimilarity.forward(
    inputs: torch.Tensor
) -> torch.Tensor

Process input frames through the model.

Parameters:

inputs

torch.Tensor

Input tensor of video frames.

Returns: torch.Tensor

Frame similarity features.
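Here is a sketch of the similarity computation on plain lists, assuming the TransNetV2 reference design. In that design, per-frame features are first projected to `similarity_dim` by a linear map without bias (`use_bias=False` is the documented default). The projected features are then L2-normalized and compared by dot product (cosine similarity) with the neighbouring frames inside `lookup_window`. The helper names and the toy weight matrix in the usage are hypothetical. The input is assumed to be already pooled to one feature vector per frame.

```python
import math


def project(vec: list[float], weight: list[list[float]]) -> list[float]:
    """Bias-free linear map: one output per row of `weight`."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def l2_normalize(vec: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def neighbour_similarities(
    features: list[list[float]], weight: list[list[float]], lookup_window: int = 101
) -> list[list[float]]:
    """Cosine similarity of each frame's projected features with its neighbours.

    Assumptions: lookup_window is odd and out-of-range neighbours give 0,
    as in the TransNetV2 reference code.
    """
    embedded = [l2_normalize(project(f, weight)) for f in features]
    half = lookup_window // 2
    n = len(embedded)
    return [
        [
            sum(a * b for a, b in zip(embedded[t], embedded[t + k])) if 0 <= t + k < n else 0.0
            for k in range(-half, half + 1)
        ]
        for t in range(n)
    ]
```

Because the vectors are normalized, frames 0 and 1 in a test such as `[[1, 0, 0], [2, 0, 0], [0, 1, 0]]` count as identical despite their different magnitudes. In the reference code, the `lookup_window` values per frame are then mapped to `output_dim` features by a linear layer with a ReLU.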

class nemo_curator.models.transnetv2.StackedDDCNNV2(
    in_filters: int,
    n_blocks: int,
    filters: int,
    shortcut: bool = True,
    pool_type: str = 'avg',
    stochastic_depth_drop_prob: float = 0.0
)

Bases: Module

Stacked dilated dense convolutional neural network for video feature extraction.

DDCNN

pool

nemo_curator.models.transnetv2.StackedDDCNNV2.forward(
    inputs: torch.Tensor
) -> torch.Tensor

Process input through the stacked dilated dense convolutional network.

Parameters:

inputs

torch.Tensor

Input tensor.

Returns: torch.Tensor

Processed tensor.
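To show how stacked stages shrink the frames, the sketch below tracks output shapes under the layout of the TransNetV2 reference code. In that layout, stage `i` uses `rf * 2**i` filters and emits `4 * filters` channels, one group per dilation branch. Each stage ends with a (1, 2, 2) pooling layer (`pool`) whose default stride floor-halves height and width. The 27 × 48 input resolution and the function name are assumptions taken from the reference model, not from this page.

```python
def pooled_size(size: int) -> int:
    # Assumed pooling kernel (1, 2, 2) with default stride: floor-halves H and W.
    return size // 2


def sddcnn_output_shapes(
    height: int, width: int, rf: int = 16, rl: int = 3
) -> list[tuple[int, int, int]]:
    """(channels, height, width) after each of `rl` StackedDDCNNV2 stages."""
    shapes = []
    for i in range(rl):
        filters = rf * 2**i  # assumed: filters double at each stage
        height, width = pooled_size(height), pooled_size(width)
        shapes.append((4 * filters, height, width))  # 4 dilated branches concatenated
    return shapes
```

With the `_TransNetV2` defaults `rf=16` and `rl=3`, the last stage would, under these assumptions, produce 256 channels on a 3 × 6 grid per frame.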

class nemo_curator.models.transnetv2.TransNetV2(
    model_dir: str | None = None
)

Bases: ModelInterface

Interface for TransNetV2 shot transition detection model.

model_id_names

list[str]

Get the model ID names.

nemo_curator.models.transnetv2.TransNetV2.__call__(
    inputs: torch.Tensor
) -> torch.Tensor

TransNetV2 model call.

Parameters:

inputs

torch.Tensor

Tensor of shape [# batch, # frames, height, width, RGB].

Returns: torch.Tensor

Tensor of shape [# batch, # frames, 1] holding, for each frame, the probability that it is a shot transition.
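The per-frame probabilities returned by `__call__` still have to be turned into shot boundaries. The sketch below shows one hypothetical post-processing step, which is not part of `nemo_curator`. Frames whose probability exceeds a threshold are treated as transitions, and each maximal run of remaining frames becomes one shot, reported as an inclusive `(start, end)` pair. The 0.5 threshold is an assumption.

```python
def predictions_to_shots(probs: list[float], threshold: float = 0.5) -> list[tuple[int, int]]:
    """Split one video's frames into shots at transition frames.

    Hypothetical helper: frames with probability > threshold are treated as
    transitions and excluded; every other run of frames forms one shot.
    """
    shots = []
    start = None
    for i, p in enumerate(probs):
        if p > threshold:  # a transition frame closes the current shot
            if start is not None:
                shots.append((start, i - 1))
                start = None
        elif start is None:  # first frame of a new shot
            start = i
    if start is not None:
        shots.append((start, len(probs) - 1))
    return shots
```

For the batched output, the helper would be applied per video after dropping the trailing size-1 axis, for example on `predictions[b, :, 0].tolist()`.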

nemo_curator.models.transnetv2.TransNetV2.download_weights_on_node(
    model_dir: str
) -> None

classmethod

Download TransNetV2 weights on the node.

Parameters:

model_dir

str

Directory in which to save the model weights.

nemo_curator.models.transnetv2.TransNetV2.setup() -> None

Set up the TransNetV2 model interface.

class nemo_curator.models.transnetv2._TransNetV2(
    rf: int = 16,
    rl: int = 3,
    rs: int = 2,
    rd: int = 1024,
    use_many_hot_targets: bool = True,
    use_frame_similarity: bool = True,
    use_color_histograms: bool = True,
    use_mean_pooling: bool = False,
    dropout_rate: float = 0.5
)

Bases: Module

SDDCNN

cls_layer1

= nn.Linear(rd, 1)

cls_layer2

= nn.Linear(rd, 1) if use_many_hot_targets else None

color_hist_layer

dropout

fc1

= nn.Linear(output_dim, rd)

frame_sim_layer

nemo_curator.models.transnetv2._TransNetV2.forward(
    inputs: torch.Tensor
) -> torch.Tensor

Process input through the TransNetV2 model.

Parameters:

inputs

torch.Tensor

Input tensor of video frames.

Returns: torch.Tensor

Model predictions for shot transitions.
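The following sketch works out the width of the per-frame vector that reaches `fc1 = nn.Linear(output_dim, rd)`, assuming the TransNetV2 reference layout. In that layout, features from the last convolutional stage have `4 * rf * 2**(rl - 1)` channels on a 3 × 6 grid, which corresponds to 27 × 48 input. These features are flattened and concatenated with 128-dimensional frame-similarity and color-histogram features when those layers are enabled. `fc1_in_features` is a hypothetical helper.

```python
def fc1_in_features(
    rf: int = 16,
    rl: int = 3,
    final_hw: tuple[int, int] = (3, 6),
    use_frame_similarity: bool = True,
    use_color_histograms: bool = True,
    similarity_features: int = 128,
    histogram_features: int = 128,
) -> int:
    """Assumed input width of fc1 for the given _TransNetV2 settings.

    Assumptions (TransNetV2 reference code): 27 x 48 frames are pooled to a
    3 x 6 grid, and both side branches emit 128 features per frame.
    """
    channels = 4 * rf * 2 ** (rl - 1)  # channels of the last stage
    dim = channels * final_hw[0] * final_hw[1]  # flattened CNN features per frame
    if use_frame_similarity:
        dim += similarity_features
    if use_color_histograms:
        dim += histogram_features
    return dim
```

`fc1` then projects this vector to `rd` units, and `cls_layer1` (plus `cls_layer2` when `use_many_hot_targets` is set) maps the result to a single score per frame.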

nemo_curator.models.transnetv2._TRANSNETV2_MODEL_ID: Final = 'Sn4kehead/TransNetV2'

nemo_curator.models.transnetv2._TRANSNETV2_MODEL_REVISION: Final = 'db6ceab'

nemo_curator.models.transnetv2._TRANSNETV2_MODEL_WEIGHTS: Final = 'transnetv2-pytorch-weights.pth'