
# nemo_curator.stages.video.clipping.transnetv2_extraction

## Module Contents

### Classes

| Name                                                                                                                       | Description                                        |
| -------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------- |
| [`TransNetV2ClipExtractionStage`](#nemo_curator-stages-video-clipping-transnetv2_extraction-TransNetV2ClipExtractionStage) | Stage for extracting video clips using TransNetV2. |

### Functions

| Name                                                                                                     | Description                                                                                       |
| -------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------- |
| [`_create_spans`](#nemo_curator-stages-video-clipping-transnetv2_extraction-_create_spans)               | Create spans between a start and an end point.                                                    |
| [`_crop_scenes`](#nemo_curator-stages-video-clipping-transnetv2_extraction-_crop_scenes)                 | Crop scenes by removing frames from start and end.                                                |
| [`_get_batches`](#nemo_curator-stages-video-clipping-transnetv2_extraction-_get_batches)                 | Yield batches of 100 frames, padding the first and last batches with the first or last frame.     |
| [`_get_filtered_scenes`](#nemo_curator-stages-video-clipping-transnetv2_extraction-_get_filtered_scenes) | Filter scenes.                                                                                    |
| [`_get_predictions`](#nemo_curator-stages-video-clipping-transnetv2_extraction-_get_predictions)         | Get predictions from the video frame array.                                                       |
| [`_get_scenes`](#nemo_curator-stages-video-clipping-transnetv2_extraction-_get_scenes)                   | Convert prediction array to scene array.                                                          |

### API

<Anchor id="nemo_curator-stages-video-clipping-transnetv2_extraction-TransNetV2ClipExtractionStage">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    class nemo_curator.stages.video.clipping.transnetv2_extraction.TransNetV2ClipExtractionStage(
        model_dir: str = None,
        threshold: float = 0.4,
        min_length_s: float | None = 2.0,
        max_length_s: float | None = 10.0,
        max_length_mode: typing.Literal['truncate', 'stride'] = 'stride',
        crop_s: float | None = 0.5,
        entire_scene_as_clip: bool = True,
        gpu_memory_gb: int = 10,
        limit_clips: int = -1,
        verbose: bool = False,
        name: str = 'transnetv2_clip_extraction'
    )
    ```
  </CodeBlock>
</Anchor>

<Indent>
  <Badge>
    Dataclass
  </Badge>

  **Bases:** [ProcessingStage\[VideoTask, VideoTask\]](/nemo-curator/nemo_curator/stages/base#nemo_curator-stages-base-ProcessingStage)

  Stage for extracting video clips using TransNetV2.

  This class processes video clips through a series of steps including shot detection,
  scene filtering, and clip assignment.

  <ParamField path="crop_s" type="float | None = 0.5" />

  <ParamField path="entire_scene_as_clip" type="bool = True" />

  <ParamField path="gpu_memory_gb" type="int = 10" />

  <ParamField path="limit_clips" type="int = -1" />

  <ParamField path="max_length_mode" type="Literal['truncate', 'stride'] = 'stride'" />

  <ParamField path="max_length_s" type="float | None = 10.0" />

  <ParamField path="min_length_s" type="float | None = 2.0" />

  <ParamField path="model_dir" type="str = None" />

  <ParamField path="name" type="str = 'transnetv2_clip_extraction'" />

  <ParamField path="threshold" type="float = 0.4" />

  <ParamField path="verbose" type="bool = False" />

  <Anchor id="nemo_curator-stages-video-clipping-transnetv2_extraction-TransNetV2ClipExtractionStage-__post_init__">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.stages.video.clipping.transnetv2_extraction.TransNetV2ClipExtractionStage.__post_init__() -> None
      ```
    </CodeBlock>
  </Anchor>

  <Indent />

  <Anchor id="nemo_curator-stages-video-clipping-transnetv2_extraction-TransNetV2ClipExtractionStage-inputs">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.stages.video.clipping.transnetv2_extraction.TransNetV2ClipExtractionStage.inputs() -> tuple[list[str], list[str]]
      ```
    </CodeBlock>
  </Anchor>

  <Indent />

  <Anchor id="nemo_curator-stages-video-clipping-transnetv2_extraction-TransNetV2ClipExtractionStage-outputs">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.stages.video.clipping.transnetv2_extraction.TransNetV2ClipExtractionStage.outputs() -> tuple[list[str], list[str]]
      ```
    </CodeBlock>
  </Anchor>

  <Indent />

  <Anchor id="nemo_curator-stages-video-clipping-transnetv2_extraction-TransNetV2ClipExtractionStage-process">
    <CodeBlock links={{"nemo_curator.tasks.video.VideoTask":"/nemo-curator/nemo_curator/tasks/video#nemo_curator-tasks-video-VideoTask"}} showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.stages.video.clipping.transnetv2_extraction.TransNetV2ClipExtractionStage.process(
          task: nemo_curator.tasks.video.VideoTask
      ) -> nemo_curator.tasks.video.VideoTask
      ```
    </CodeBlock>
  </Anchor>

  <Indent />

  <Anchor id="nemo_curator-stages-video-clipping-transnetv2_extraction-TransNetV2ClipExtractionStage-setup">
    <CodeBlock links={{"nemo_curator.backends.base.WorkerMetadata":"/nemo-curator/nemo_curator/backends/base#nemo_curator-backends-base-WorkerMetadata"}} showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.stages.video.clipping.transnetv2_extraction.TransNetV2ClipExtractionStage.setup(
          worker_metadata: nemo_curator.backends.base.WorkerMetadata | None = None
      ) -> None
      ```
    </CodeBlock>
  </Anchor>

  <Indent />

  <Anchor id="nemo_curator-stages-video-clipping-transnetv2_extraction-TransNetV2ClipExtractionStage-setup_on_node">
    <CodeBlock links={{"nemo_curator.backends.base.NodeInfo":"/nemo-curator/nemo_curator/backends/base#nemo_curator-backends-base-NodeInfo","nemo_curator.backends.base.WorkerMetadata":"/nemo-curator/nemo_curator/backends/base#nemo_curator-backends-base-WorkerMetadata"}} showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.stages.video.clipping.transnetv2_extraction.TransNetV2ClipExtractionStage.setup_on_node(
          node_info: nemo_curator.backends.base.NodeInfo,
          worker_metadata: nemo_curator.backends.base.WorkerMetadata
      ) -> None
      ```
    </CodeBlock>
  </Anchor>

  <Indent>
    Download TransNetV2 weights on the node.
  </Indent>
</Indent>

<Anchor id="nemo_curator-stages-video-clipping-transnetv2_extraction-_create_spans">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.stages.video.clipping.transnetv2_extraction._create_spans(
        start: int,
        end: int,
        max_length: int,
        min_length: int | None
    ) -> list[list[int]]
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Create spans between a start and an end point.

  **Parameters:**

  <ParamField path="start" type="int">
    Start point.
  </ParamField>

  <ParamField path="end" type="int">
    End point.
  </ParamField>

  <ParamField path="max_length" type="int">
    Maximum length of a span.
  </ParamField>

  <ParamField path="min_length" type="int | None">
    Minimum length of a span.
  </ParamField>

  **Returns:** `list[list[int]]`

  List of spans.
</Indent>
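
As a rough illustration of the span creation described for `_create_spans`, the pure-Python sketch below splits a range into back-to-back spans. It is not the library's implementation: the name `create_spans_sketch`, the half-open `[start, end)` convention, and the rule that a trailing span shorter than `min_length` is dropped are assumptions (the last one inferred from the `stride` description of `_get_filtered_scenes`).

```python
from typing import Optional


def create_spans_sketch(
    start: int, end: int, max_length: int, min_length: Optional[int]
) -> list[list[int]]:
    """Split [start, end) into back-to-back spans of at most max_length."""
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    spans: list[list[int]] = []
    cursor = start
    while cursor < end:
        span_end = min(cursor + max_length, end)
        # Assumption: a trailing span shorter than min_length is dropped.
        if min_length is not None and span_end - cursor < min_length:
            break
        spans.append([cursor, span_end])
        cursor = span_end
    return spans


print(create_spans_sketch(0, 25, 10, None))  # [[0, 10], [10, 20], [20, 25]]
print(create_spans_sketch(0, 25, 10, 6))     # [[0, 10], [10, 20]]
```

With `min_length=6`, the 5-frame remainder `[20, 25]` is discarded instead of being emitted as a short span.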

<Anchor id="nemo_curator-stages-video-clipping-transnetv2_extraction-_crop_scenes">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.stages.video.clipping.transnetv2_extraction._crop_scenes(
        scenes: numpy.typing.NDArray[numpy.int32],
        crop_length: int
    ) -> numpy.typing.NDArray[numpy.int32]
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Crop scenes by removing frames from start and end.

  **Parameters:**

  <ParamField path="scenes" type="npt.NDArray[np.int32]">
    Integer 2D array like \[\[t0, t1], \[t2, t3], ...]
  </ParamField>

  <ParamField path="crop_length" type="int">
    Number of frames to crop from the start and end of each scene.
  </ParamField>

  **Returns:** `npt.NDArray[np.int32]`

  Cropped scene array.
</Indent>
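
The cropping step for `_crop_scenes` can be pictured with the short sketch below. It uses plain lists instead of NumPy arrays to stay dependency-free, and the name `crop_scenes_sketch` is hypothetical. Whether the real function also removes scenes that collapse after cropping is not stated here, so this sketch leaves them in place.

```python
def crop_scenes_sketch(scenes: list[list[int]], crop_length: int) -> list[list[int]]:
    """Move each scene start forward and each scene end backward by crop_length."""
    # Assumption: no filtering happens here; collapsed scenes are left for the caller.
    return [[start + crop_length, end - crop_length] for start, end in scenes]


print(crop_scenes_sketch([[0, 100], [100, 105]], 5))  # [[5, 95], [105, 100]]
```

The second scene ends up with its end before its start, which is the kind of degenerate scene that the `crop_length` description of `_get_filtered_scenes` says is filtered out.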

<Anchor id="nemo_curator-stages-video-clipping-transnetv2_extraction-_get_batches">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.stages.video.clipping.transnetv2_extraction._get_batches(
        frames: numpy.typing.NDArray[numpy.uint8]
    ) -> collections.abc.Generator[numpy.typing.NDArray[numpy.uint8], None, None]
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Yield batches of 100 frames, padding the first batch with copies of the first frame and the last batch with copies of the last frame.
</Indent>
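
One way to picture the batching described for `_get_batches` is the sketch below. Only the 100-frame window and the first/last-frame padding come from the description; the stride of 50 and the 25 frames of padding on each side are assumptions modeled on the public TransNetV2 reference inference code and may not match this module. The name `get_batches_sketch` is hypothetical, and frames are represented as list items rather than a NumPy array.

```python
from collections.abc import Iterator, Sequence

WINDOW = 100  # frames per batch, as stated in the description
STRIDE = 50   # assumption: consecutive windows overlap by half
CONTEXT = 25  # assumption: padding frames added before the first frame


def get_batches_sketch(frames: Sequence) -> Iterator[list]:
    """Yield overlapping 100-frame windows over a padded copy of frames."""
    if len(frames) == 0:
        return
    remainder = len(frames) % STRIDE
    # Pad the tail so the padded length is a whole number of strides plus context.
    pad_end = CONTEXT + (STRIDE - remainder if remainder else 0)
    padded = [frames[0]] * CONTEXT + list(frames) + [frames[-1]] * pad_end
    ptr = 0
    while ptr + WINDOW <= len(padded):
        yield padded[ptr:ptr + WINDOW]
        ptr += STRIDE


batches = list(get_batches_sketch(list(range(120))))
print(len(batches), len(batches[0]))  # 3 100
```

Under these assumptions, a 120-frame input produces three windows: the first begins with 25 copies of frame 0, and the last ends with repeated copies of frame 119.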

<Anchor id="nemo_curator-stages-video-clipping-transnetv2_extraction-_get_filtered_scenes">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.stages.video.clipping.transnetv2_extraction._get_filtered_scenes(
        scenes: numpy.typing.NDArray[numpy.int32],
        min_length: int | None = None,
        max_length: int | None = None,
        max_length_mode: typing.Literal['truncate', 'stride'] = 'truncate',
        crop_length: int | None = None
    ) -> numpy.typing.NDArray[numpy.int32]
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Filter scenes.

  **Parameters:**

  <ParamField path="scenes" type="npt.NDArray[np.int32]">
    Integer 2D array like \[\[t0, t1], \[t2, t3], ...]
  </ParamField>

  <ParamField path="min_length" type="int | None" default="None">
    Optional minimum number of frames a scene can have.
  </ParamField>

  <ParamField path="max_length" type="int | None" default="None">
    Optional maximum number of frames a scene can have.
  </ParamField>

  <ParamField path="max_length_mode" type="Literal['truncate', 'stride']" default="'truncate'">
    How to handle scenes longer than `max_length`.
    If `truncate`, each scene is truncated to `max_length`, if specified.
    If `stride`, each scene is split into consecutive scenes of `max_length` frames until the end of the scene is reached.
    If the final scene is shorter than `min_length`, it is dropped.
  </ParamField>

  <ParamField path="crop_length" type="int | None" default="None">
    Optional number of frames to crop from the start and end of each scene.
    Scenes that become zero-length after cropping are filtered out.
  </ParamField>

  **Returns:** `npt.NDArray[np.int32]`

  Filtered scene array.
</Indent>
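
To make the difference between `truncate` and `stride` concrete, here is a minimal pure-Python sketch of the filtering described for `_get_filtered_scenes`. The name `filter_scenes_sketch`, the use of lists instead of NumPy arrays, the half-open scene convention, and the order of operations (crop, then max-length handling, then min-length filtering) are all assumptions, not the library's confirmed behavior.

```python
from typing import Literal, Optional


def filter_scenes_sketch(
    scenes: list[list[int]],
    min_length: Optional[int] = None,
    max_length: Optional[int] = None,
    max_length_mode: Literal["truncate", "stride"] = "truncate",
    crop_length: Optional[int] = None,
) -> list[list[int]]:
    result = [list(scene) for scene in scenes]
    if crop_length is not None:
        # Crop both ends, then drop scenes that collapsed to zero length or less.
        result = [[s + crop_length, e - crop_length] for s, e in result]
        result = [[s, e] for s, e in result if e > s]
    if max_length is not None:
        if max_length_mode == "truncate":
            # Keep only the first max_length frames of each scene.
            result = [[s, min(e, s + max_length)] for s, e in result]
        elif max_length_mode == "stride":
            strided: list[list[int]] = []
            for s, e in result:
                cursor = s
                while cursor < e:
                    span_end = min(cursor + max_length, e)
                    # Drop a trailing piece shorter than min_length.
                    if min_length is not None and span_end - cursor < min_length:
                        break
                    strided.append([cursor, span_end])
                    cursor = span_end
            result = strided
        else:
            raise ValueError(f"unknown max_length_mode: {max_length_mode!r}")
    if min_length is not None:
        result = [[s, e] for s, e in result if e - s >= min_length]
    return result


scenes = [[0, 30], [30, 250], [250, 300]]
print(filter_scenes_sketch(scenes, 20, 100, "truncate", 10))
# [[40, 140], [260, 290]]
print(filter_scenes_sketch(scenes, 20, 100, "stride", 10))
# [[40, 140], [140, 240], [260, 290]]
```

In this example `truncate` keeps one clip per scene, while `stride` recovers a second 100-frame clip from the long middle scene. The first scene shrinks to 10 frames after cropping and is removed in both modes because it is shorter than `min_length`.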

<Anchor id="nemo_curator-stages-video-clipping-transnetv2_extraction-_get_predictions">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.stages.video.clipping.transnetv2_extraction._get_predictions(
        model: collections.abc.Callable[[torch.Tensor], torch.Tensor],
        frames: numpy.typing.NDArray[numpy.uint8],
        threshold: float
    ) -> numpy.typing.NDArray[numpy.uint8]
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Get predictions from the video frame array.

  **Parameters:**

  <ParamField path="model" type="Callable[[torch.Tensor], torch.Tensor]">
    Shot detection model.
  </ParamField>

  <ParamField path="frames" type="npt.NDArray[np.uint8]">
    uint8 array of shape (# frames, height, width, 3), with RGB channels.
  </ParamField>

  <ParamField path="threshold" type="float">
    Probability threshold for shot detection.
  </ParamField>

  **Returns:** `npt.NDArray[np.uint8]`

  0/1 prediction array of shape (# frames, 1).
</Indent>
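
The role of `threshold` in `_get_predictions` can be sketched as a simple binarization of per-frame transition probabilities. This covers only the final thresholding step, not model inference. The name `threshold_predictions_sketch` is hypothetical, plain lists stand in for tensors and arrays, and the strict `>` comparison is an assumption.

```python
def threshold_predictions_sketch(
    probabilities: list[float], threshold: float
) -> list[list[int]]:
    """Turn per-frame probabilities into a (# frames, 1) list of 0/1 flags."""
    # Assumption: a frame counts as a transition only when strictly above threshold.
    return [[1 if p > threshold else 0] for p in probabilities]


print(threshold_predictions_sketch([0.1, 0.4, 0.41, 0.9], 0.4))
# [[0], [0], [1], [1]]
```

Lowering `threshold` marks more frames as transitions and therefore tends to produce more, shorter scenes.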

<Anchor id="nemo_curator-stages-video-clipping-transnetv2_extraction-_get_scenes">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.stages.video.clipping.transnetv2_extraction._get_scenes(
        predictions: numpy.typing.NDArray[numpy.uint8],
        entire_scene_as_clip: bool
    ) -> numpy.typing.NDArray[numpy.int32]
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Convert prediction array to scene array.

  **Parameters:**

  <ParamField path="predictions" type="npt.NDArray[np.uint8]">
    Array of shape \[# frames, 1].
    Values are 1 if the frame is a shot transition, and 0 if it is not.
  </ParamField>

  <ParamField path="entire_scene_as_clip" type="bool">
    If `True` and *no* shot transitions are found, a single scene spanning the whole video is created.
  </ParamField>

  **Returns:** `npt.NDArray[np.int32]`

  Scene array of shape \[# scenes, 2], where each row holds the start and end frame of a scene.
</Indent>
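
The conversion from per-frame flags to scenes in `_get_scenes` can be illustrated with the sketch below. It is a simplified stand-in, not the library's code: the name `get_scenes_sketch`, the flattened list input, the half-open `[start, end)` scene convention, and the handling of edge cases (for example, a video whose every frame is flagged) are assumptions.

```python
def get_scenes_sketch(
    predictions: list[int], entire_scene_as_clip: bool
) -> list[list[int]]:
    """Group runs of non-transition frames (0) into [start, end) scenes."""
    if not any(predictions):
        # No transitions at all: optionally treat the whole video as one scene.
        if entire_scene_as_clip and len(predictions) > 0:
            return [[0, len(predictions)]]
        return []
    scenes: list[list[int]] = []
    start = 0
    previous = 0
    for index, current in enumerate(predictions):
        if previous == 1 and current == 0:
            start = index  # first frame after a transition opens a scene
        if previous == 0 and current == 1 and index != 0:
            scenes.append([start, index])  # a transition closes the open scene
        previous = current
    if previous == 0:
        scenes.append([start, len(predictions)])  # close the trailing scene
    return scenes


print(get_scenes_sketch([0, 0, 0, 1, 0, 0, 1, 1, 0, 0], True))
# [[0, 3], [4, 6], [8, 10]]
print(get_scenes_sketch([0, 0, 0, 0, 0], True))   # [[0, 5]]
print(get_scenes_sketch([0, 0, 0, 0, 0], False))  # []
```

The last two calls show the effect of `entire_scene_as_clip`: with no detected transitions, the video either becomes a single scene or yields no scenes at all.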