nemo_curator.models.transnetv2
Model for fast shot transition detection.
@article{soucek2020transnetv2,
  title={TransNet V2: An effective deep network architecture for fast shot transition detection},
  author={Sou{\v{c}}ek, Tom{\'a}{\v{s}} and Loko{\v{c}}, Jakub},
  year={2020},
  journal={arXiv preprint arXiv:2008.04838},
}
Module Contents
Classes
Data
API
Bases: Module
Model for computing and comparing color histograms across video frames.
Compute color histograms for video frames.
Parameters:
Input tensor of video frames.
Returns: torch.Tensor
Color histogram tensor.
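As a rough illustration of what a per-frame color histogram can look like, the sketch below quantises each RGB channel into a small number of levels and counts pixels per combined bin. The function name `color_histograms`, the choice of 8 levels per channel (512 bins), and the L2 normalisation are assumptions made for this example; they are not taken from this module's documentation.

```python
import torch


def color_histograms(frames: torch.Tensor, bins_per_channel: int = 8) -> torch.Tensor:
    """Return one L2-normalised RGB histogram per frame (illustrative sketch).

    frames: uint8 tensor of shape [batch, time, height, width, 3].
    Result: float32 tensor of shape [batch, time, bins_per_channel ** 3].
    """
    if frames.dtype != torch.uint8 or frames.shape[-1] != 3:
        raise ValueError("expected uint8 frames with a trailing RGB axis")
    batch, time, height, width, _ = frames.shape
    n_bins = bins_per_channel ** 3

    # Quantise every channel value into the range 0..bins_per_channel-1.
    step = 256 // bins_per_channel
    q = torch.div(frames.long(), step, rounding_mode="floor")
    q = q.reshape(batch * time, height * width, 3)

    # Fold the three per-channel levels into a single bin index per pixel.
    idx = (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]

    # Count pixels per bin, one histogram row per frame.
    hist = torch.zeros(batch * time, n_bins, dtype=torch.float32)
    hist.scatter_add_(1, idx, torch.ones_like(idx, dtype=torch.float32))

    hist = torch.nn.functional.normalize(hist, p=2, dim=1)
    return hist.reshape(batch, time, n_bins)
```

Histograms built this way can be compared between neighbouring frames with a dot product, since every row has unit length.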
Process input frames through the model.
Parameters:
Input tensor of video frames.
Returns: torch.Tensor
Model predictions for shot transitions.
Bases: Module
Configurable 3D convolution layer with support for separable convolutions.
Process input through the 3D convolutional layers.
Parameters:
Input tensor.
Returns: torch.Tensor
Processed tensor.
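One common way to make a 3D convolution separable is to split it into a spatial convolution followed by a temporal one. The sketch below shows that idea under stated assumptions: the class name `SeparableConv3d`, its constructor arguments, and the intermediate channel width are hypothetical and are not the documented signature of the class described here.

```python
import torch
from torch import nn


class SeparableConv3d(nn.Module):
    """Illustrative 3D convolution with an optional (2+1)D factorisation.

    Input and output layout: [batch, channels, time, height, width].
    """

    def __init__(self, in_channels: int, out_channels: int,
                 dilation: int = 1, separable: bool = True) -> None:
        super().__init__()
        if separable:
            self.layers = nn.Sequential(
                # Spatial step: 1x3x3 kernel, no mixing across time.
                nn.Conv3d(in_channels, 2 * out_channels, kernel_size=(1, 3, 3),
                          padding=(0, 1, 1), bias=False),
                # Temporal step: 3x1x1 kernel, dilated along the time axis.
                nn.Conv3d(2 * out_channels, out_channels, kernel_size=(3, 1, 1),
                          dilation=(dilation, 1, 1), padding=(dilation, 0, 0)),
            )
        else:
            # Full 3x3x3 kernel, dilated along time only.
            self.layers = nn.Sequential(
                nn.Conv3d(in_channels, out_channels, kernel_size=3,
                          dilation=(dilation, 1, 1), padding=(dilation, 1, 1)),
            )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)
```

The padding is chosen so that the time, height, and width dimensions are preserved for any dilation rate, which keeps layers of this kind easy to stack.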
Bases: Module
Dilated dense convolutional model with multiple dilation rates.
Process input through the dilated dense convolutional network.
Parameters:
Input tensor.
Returns: torch.Tensor
Processed tensor.
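To make "multiple dilation rates" concrete, the following sketch runs several 3D convolutions in parallel, each with a different temporal dilation, and concatenates their outputs along the channel axis. The class name `DilatedBranches`, the default rates `(1, 2, 4, 8)`, and the ReLU activation are assumptions for illustration only.

```python
import torch
from torch import nn


class DilatedBranches(nn.Module):
    """Illustrative block: parallel 3D convolutions with different time dilations.

    Input layout:  [batch, in_channels, time, height, width].
    Output layout: [batch, filters * len(dilations), time, height, width].
    """

    def __init__(self, in_channels: int, filters: int,
                 dilations: tuple = (1, 2, 4, 8)) -> None:
        super().__init__()
        self.branches = nn.ModuleList([
            # Padding equals the dilation along time so the length is preserved.
            nn.Conv3d(in_channels, filters, kernel_size=3,
                      dilation=(d, 1, 1), padding=(d, 1, 1))
            for d in dilations
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each branch sees a different temporal receptive field.
        outputs = [branch(x) for branch in self.branches]
        return torch.relu(torch.cat(outputs, dim=1))
```

Larger dilation rates let a branch compare frames that are further apart without increasing the kernel size.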
Bases: Module
Model for computing frame similarity features in video sequences.
Process input frames through the model.
Parameters:
Input tensor of video frames.
Returns: torch.Tensor
Frame similarity features.
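A minimal sketch of one way to compute frame similarity features: take one feature vector per frame, compute cosine similarity between all frame pairs, and keep only a fixed window of neighbours around each frame. The function name `windowed_similarity` and the window size are hypothetical; the actual layer may differ, for example by projecting the features with learned weights first.

```python
import torch
import torch.nn.functional as F


def windowed_similarity(features: torch.Tensor, window: int = 5) -> torch.Tensor:
    """Cosine similarity of each frame to its temporal neighbours (sketch).

    features: [batch, time, dim], one feature vector per frame.
    Returns:  [batch, time, window]; entry k compares frame t with frame
              t + k - window // 2, and is 0 where that frame does not exist.
    """
    if window % 2 != 1:
        raise ValueError("window must be odd so it can be centred on a frame")
    batch, time, _ = features.shape
    half = window // 2

    # Pairwise cosine similarity between all frames: [batch, time, time].
    unit = F.normalize(features, p=2, dim=2)
    sim = torch.bmm(unit, unit.transpose(1, 2))

    # Zero-pad the last axis so windows near the clip edges stay in range.
    sim = F.pad(sim, (half, half))

    # For frame t, pick padded columns t .. t + window - 1.
    offsets = torch.arange(window, device=features.device).unsqueeze(0)
    cols = torch.arange(time, device=features.device).unsqueeze(1) + offsets
    cols = cols.unsqueeze(0).expand(batch, -1, -1)
    return sim.gather(2, cols)
```

The centre column of the result is each frame's similarity to itself, so a sharp drop in the columns on either side of the centre hints at a cut.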
Bases: Module
Stacked dilated dense convolutional neural network for video feature extraction.
Process input through the stacked dilated dense convolutional network.
Parameters:
Input tensor.
Returns: torch.Tensor
Processed tensor.
Bases: ModelInterface
Interface for TransNetV2 shot transition detection model.
Get the model ID names.
Run the TransNetV2 model on a batch of video frames.
Parameters:
Input tensor of shape [batch, frames, height, width, 3], with the RGB channels on the last axis.
Returns: torch.Tensor
Tensor of shape [batch, frames, 1] giving, for each frame, the probability that it is a shot transition.
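Per-frame probabilities usually need a post-processing step to become shot boundaries. The sketch below shows one simple approach for a single video: threshold the probabilities and treat every maximal run of non-transition frames as a shot. The helper `predictions_to_shots` and the 0.5 threshold are assumptions for illustration and are not part of this module.

```python
from typing import List, Sequence, Tuple


def predictions_to_shots(probabilities: Sequence[float],
                         threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Turn per-frame transition probabilities into (first, last) frame ranges.

    Frames whose probability exceeds `threshold` are treated as transitions
    and excluded from every shot. Ranges are inclusive.
    """
    shots: List[Tuple[int, int]] = []
    start = None  # index of the first frame of the shot being built
    for i, p in enumerate(probabilities):
        if p > threshold:
            # A transition frame closes the current shot, if one is open.
            if start is not None:
                shots.append((start, i - 1))
                start = None
        elif start is None:
            # First non-transition frame after a transition opens a new shot.
            start = i
    if start is not None:
        shots.append((start, len(probabilities) - 1))
    return shots
```

For a model output of shape [batch, frames, 1], one video's probabilities can be passed in as a flat list, for example `output[0, :, 0].tolist()`.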
Download TransNetV2 weights on the node.
Parameters:
Directory to save the model weights. If None, uses self.model_dir.
Set up the TransNetV2 model interface.
Bases: Module
Process input through the TransNetV2 model.
Parameters:
Input tensor of video frames.
Returns: torch.Tensor
Model predictions for shot transitions.