---
layout: overview
slug: nemo-curator/nemo_curator/models/transnetv2
title: nemo_curator.models.transnetv2
---

Model for fast shot transition detection.

@article\{soucek2020transnetv2,
title=\{TransNet V2: An effective deep network architecture for fast shot transition detection},
author=\{Sou\{\v\{c}}ek, Tom\{'a}\{\v\{s}} and Loko\{\v\{c}}, Jakub},
year=\{2020},
journal=\{arXiv preprint arXiv:2008.04838},
}

## Module Contents

### Classes

| Name | Description |
| -------------------------------------------------------------------------- | -------------------------------------------------------------------------------- |
| [`ColorHistograms`](#nemo_curator-models-transnetv2-ColorHistograms) | Model for computing and comparing color histograms across video frames. |
| [`Conv3DConfigurable`](#nemo_curator-models-transnetv2-Conv3DConfigurable) | Configurable 3D convolution layer with support for separable convolutions. |
| [`DilatedDCNNV2`](#nemo_curator-models-transnetv2-DilatedDCNNV2) | Dilated dense convolutional model with multiple dilation rates. |
| [`FrameSimilarity`](#nemo_curator-models-transnetv2-FrameSimilarity) | Model for computing frame similarity features in video sequences. |
| [`StackedDDCNNV2`](#nemo_curator-models-transnetv2-StackedDDCNNV2) | Stacked dilated dense convolutional neural network for video feature extraction. |
| [`TransNetV2`](#nemo_curator-models-transnetv2-TransNetV2) | Interface for TransNetV2 shot transition detection model. |
| [`_TransNetV2`](#nemo_curator-models-transnetv2-_TransNetV2) | - |

### Data

[`_TRANSNETV2_MODEL_ID`](#nemo_curator-models-transnetv2-_TRANSNETV2_MODEL_ID)

[`_TRANSNETV2_MODEL_REVISION`](#nemo_curator-models-transnetv2-_TRANSNETV2_MODEL_REVISION)

[`_TRANSNETV2_MODEL_WEIGHTS`](#nemo_curator-models-transnetv2-_TRANSNETV2_MODEL_WEIGHTS)

### API

```python
class nemo_curator.models.transnetv2.ColorHistograms(
    lookup_window: int = 101,
    output_dim: int | None = None
)
```

**Bases:** `Module`

Model for computing and comparing color histograms across video frames.

```python
nemo_curator.models.transnetv2.ColorHistograms.compute_color_histograms(
    frames: torch.Tensor
) -> torch.Tensor
```

`staticmethod`

Compute color histograms for video frames.

**Parameters:** `frames`: Input tensor of video frames.

**Returns:** `torch.Tensor`

Color histogram tensor.

```python
nemo_curator.models.transnetv2.ColorHistograms.forward(
    inputs: torch.Tensor
) -> torch.Tensor
```

Process input frames through the model.

**Parameters:** `inputs`: Input tensor of video frames.

**Returns:** `torch.Tensor`

Model predictions for shot transitions.

```python
class nemo_curator.models.transnetv2.Conv3DConfigurable(
    in_filters: int,
    filters: int,
    dilation_rate: int,
    separable: bool = True,
    use_bias: bool = True
)
```

**Bases:** `Module`

Configurable 3D convolution layer with support for separable convolutions.

```python
nemo_curator.models.transnetv2.Conv3DConfigurable.forward(
    inputs: torch.Tensor
) -> torch.Tensor
```

Process input through the 3D convolutional layers.

**Parameters:** `inputs`: Input tensor.

**Returns:** `torch.Tensor`

Processed tensor.

```python
class nemo_curator.models.transnetv2.DilatedDCNNV2(
    in_filters: int,
    filters: int,
    batch_norm: bool = True,
    activation: collections.abc.Callable[[torch.Tensor], torch.Tensor] | None = None
)
```

**Bases:** `Module`

Dilated dense convolutional model with multiple dilation rates.
```python
nemo_curator.models.transnetv2.DilatedDCNNV2.forward(
    inputs: torch.Tensor
) -> torch.Tensor
```

Process input through the dilated dense convolutional network.

**Parameters:** `inputs`: Input tensor.

**Returns:** `torch.Tensor`

Processed tensor.

```python
class nemo_curator.models.transnetv2.FrameSimilarity(
    in_filters: int,
    similarity_dim: int = 128,
    lookup_window: int = 101,
    output_dim: int = 128,
    use_bias: bool = False
)
```

**Bases:** `Module`

Model for computing frame similarity features in video sequences.

```python
nemo_curator.models.transnetv2.FrameSimilarity.forward(
    inputs: torch.Tensor
) -> torch.Tensor
```

Process input frames through the model.

**Parameters:** `inputs`: Input tensor of video frames.

**Returns:** `torch.Tensor`

Frame similarity features.

```python
class nemo_curator.models.transnetv2.StackedDDCNNV2(
    in_filters: int,
    n_blocks: int,
    filters: int,
    shortcut: bool = True,
    pool_type: str = 'avg',
    stochastic_depth_drop_prob: float = 0.0
)
```

**Bases:** `Module`

Stacked dilated dense convolutional neural network for video feature extraction.

```python
nemo_curator.models.transnetv2.StackedDDCNNV2.forward(
    inputs: torch.Tensor
) -> torch.Tensor
```

Process input through the stacked dilated dense convolutional network.

**Parameters:** `inputs`: Input tensor.

**Returns:** `torch.Tensor`

Processed tensor.

```python
class nemo_curator.models.transnetv2.TransNetV2(
    model_dir: str | None = None
)
```

**Bases:** [ModelInterface](/nemo-curator/nemo_curator/models/base#nemo_curator-models-base-ModelInterface)

Interface for TransNetV2 shot transition detection model.

Get the model ID names.

```python
nemo_curator.models.transnetv2.TransNetV2.__call__(
    inputs: torch.Tensor
) -> torch.Tensor
```

TransNetV2 model call.

**Parameters:** `inputs`: Tensor of shape \[# batch, # frames, height, width, RGB].

**Returns:** `torch.Tensor`

Tensor of shape \[# batch, # frames, 1] holding, for each frame, the probability that it is a shot transition.
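Because `__call__` returns one transition probability per frame, a caller usually still needs to turn those probabilities into shot boundaries. The following is a minimal sketch of one way to do that in plain Python. The helper name `probabilities_to_shots` and the `threshold` default of 0.5 are assumptions made for illustration; neither comes from this module. It expects the probabilities for a single video as a flat sequence of floats (for example, one batch row of the output converted to a list).

```python
def probabilities_to_shots(probs, threshold=0.5):
    """Convert per-frame transition probabilities into (start, end) shot ranges.

    `probs` is a flat sequence of floats for one video. `threshold` is an
    assumed cut-off chosen for illustration, not a value defined by the module.
    Frame indices in the returned ranges are inclusive.
    """
    shots = []
    start = 0               # first frame of the shot currently being tracked
    in_transition = False   # whether the previous frame was a transition frame

    for i, p in enumerate(probs):
        is_transition = p > threshold
        if is_transition and not in_transition:
            # A transition begins: close the current shot if it has any frames.
            if i > start:
                shots.append((start, i - 1))
        elif not is_transition and in_transition:
            # The transition has ended: the next shot starts at this frame.
            start = i
        in_transition = is_transition

    # Close the final shot unless the video ends inside a transition.
    if not in_transition and len(probs) > start:
        shots.append((start, len(probs) - 1))
    return shots


example = [0.1, 0.2, 0.9, 0.8, 0.1, 0.1, 0.95, 0.2]
print(probabilities_to_shots(example))  # [(0, 1), (4, 5), (7, 7)]
```

Frames whose probability exceeds the threshold are treated as belonging to the transition itself and are excluded from both neighbouring shots. A different policy, such as assigning transition frames to the following shot, may suit some pipelines better.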
```python
nemo_curator.models.transnetv2.TransNetV2.download_weights_on_node(
    model_dir: str
) -> None
```

`classmethod`

Download TransNetV2 weights on the node.

**Parameters:** `model_dir`: Directory to save the model weights. If `None`, uses `self.model_dir`.

```python
nemo_curator.models.transnetv2.TransNetV2.setup() -> None
```

Set up the TransNetV2 model interface.

```python
class nemo_curator.models.transnetv2._TransNetV2(
    rf: int = 16,
    rl: int = 3,
    rs: int = 2,
    rd: int = 1024,
    use_many_hot_targets: bool = True,
    use_frame_similarity: bool = True,
    use_color_histograms: bool = True,
    use_mean_pooling: bool = False,
    dropout_rate: float = 0.5
)
```

**Bases:** `Module`

```python
nemo_curator.models.transnetv2._TransNetV2.forward(
    inputs: torch.Tensor
) -> torch.Tensor
```

Process input through the TransNetV2 model.

**Parameters:** `inputs`: Input tensor of video frames.

**Returns:** `torch.Tensor`

Model predictions for shot transitions.

```python
nemo_curator.models.transnetv2._TRANSNETV2_MODEL_ID: Final = 'Sn4kehead/TransNetV2'
```

```python
nemo_curator.models.transnetv2._TRANSNETV2_MODEL_REVISION: Final = 'db6ceab'
```

```python
nemo_curator.models.transnetv2._TRANSNETV2_MODEL_WEIGHTS: Final = 'transnetv2-pytorch-weights.pth'
```