---
layout: overview
slug: nemo-curator/nemo_curator/utils/decoder_utils
title: nemo_curator.utils.decoder_utils
---

## Module Contents

### Classes

| Name                                                                                     | Description                                                                     |
| ---------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------- |
| [`FrameExtractionPolicy`](#nemo_curator-utils-decoder_utils-FrameExtractionPolicy)       | Policy for extracting frames from video content.                                |
| [`FrameExtractionSignature`](#nemo_curator-utils-decoder_utils-FrameExtractionSignature) | Configuration for frame extraction parameters.                                  |
| [`FramePurpose`](#nemo_curator-utils-decoder_utils-FramePurpose)                         | Purpose for extracting frames from video content.                               |
| [`Resolution`](#nemo_curator-utils-decoder_utils-Resolution)                             | Container for video frame dimensions.                                           |
| [`VideoMetadata`](#nemo_curator-utils-decoder_utils-VideoMetadata)                       | Metadata for video content including dimensions, timing, and codec information. |

### Functions

| Name                                                                                         | Description                                                                                |
| -------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------ |
| [`_make_video_stream`](#nemo_curator-utils-decoder_utils-_make_video_stream)                 | Convert various input types into a binary stream for video processing.                     |
| [`decode_video_cpu`](#nemo_curator-utils-decoder_utils-decode_video_cpu)                     | Decode video frames from a binary stream using PyAV with configurable frame rate sampling. |
| [`decode_video_cpu_frame_ids`](#nemo_curator-utils-decoder_utils-decode_video_cpu_frame_ids) | Decode video using PyAV frame ids.                                                         |
| [`extract_frames`](#nemo_curator-utils-decoder_utils-extract_frames)                         | Extract frames from a video into a numpy array.                                            |
| [`extract_video_metadata`](#nemo_curator-utils-decoder_utils-extract_video_metadata)         | Extract metadata from a video file using ffprobe.                                          |
| [`find_closest_indices`](#nemo_curator-utils-decoder_utils-find_closest_indices)             | Find the closest indices in src to each element in dst.                                    |
| [`get_avg_frame_rate`](#nemo_curator-utils-decoder_utils-get_avg_frame_rate)                 | Get the average frame rate of a video.                                                     |
| [`get_frame_count`](#nemo_curator-utils-decoder_utils-get_frame_count)                       | Get the total number of frames in a video file or stream.                                  |
| [`get_video_timestamps`](#nemo_curator-utils-decoder_utils-get_video_timestamps)             | Get timestamps for all frames in a video stream.                                           |
| [`sample_closest`](#nemo_curator-utils-decoder_utils-sample_closest)                         | Sample `src` at `sample_rate` rate and return the closest indices.                         |
| [`save_stream_position`](#nemo_curator-utils-decoder_utils-save_stream_position)             | Context manager that saves and restores stream position.                                   |

### API

<Anchor id="nemo_curator-utils-decoder_utils-FrameExtractionPolicy">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    class nemo_curator.utils.decoder_utils.FrameExtractionPolicy
    ```
  </CodeBlock>
</Anchor>

<Indent>
  **Bases:** `enum.Enum`

  Policy for extracting frames from video content.

  This enum defines different strategies for selecting frames from a video,
  including first frame, middle frame, last frame, or a sequence of frames.

  <ParamField path="first" type="= 0" />

  <ParamField path="last" type="= 2" />

  <ParamField path="middle" type="= 1" />

  <ParamField path="sequence" type="= 3" />
</Indent>

<Anchor id="nemo_curator-utils-decoder_utils-FrameExtractionSignature">
  <CodeBlock links={{"nemo_curator.utils.decoder_utils.FrameExtractionPolicy":"#nemo_curator-utils-decoder_utils-FrameExtractionPolicy"}} showLineNumbers={false} wordWrap={true}>
    ```python
    class nemo_curator.utils.decoder_utils.FrameExtractionSignature(
        extraction_policy: nemo_curator.utils.decoder_utils.FrameExtractionPolicy,
        target_fps: float
    )
    ```
  </CodeBlock>
</Anchor>

<Indent>
  <Badge>
    Dataclass
  </Badge>

  Configuration for frame extraction parameters.

  This class combines extraction policy and target frame rate into a single signature
  that can be used to identify and reproduce frame extraction settings.

  <ParamField path="extraction_policy" type="FrameExtractionPolicy" />

  <ParamField path="target_fps" type="float" />

  <Anchor id="nemo_curator-utils-decoder_utils-FrameExtractionSignature-to_str">
    <CodeBlock showLineNumbers={false} wordWrap={true}>
      ```python
      nemo_curator.utils.decoder_utils.FrameExtractionSignature.to_str() -> str
      ```
    </CodeBlock>
  </Anchor>

  <Indent>
    Convert frame extraction signature to string format.

    **Returns:** `str`

    String representation of extraction policy and target FPS.
  </Indent>
</Indent>

<Anchor id="nemo_curator-utils-decoder_utils-FramePurpose">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    class nemo_curator.utils.decoder_utils.FramePurpose
    ```
  </CodeBlock>
</Anchor>

<Indent>
  **Bases:** `enum.Enum`

  Purpose for extracting frames from video content.

  This enum defines different purposes for extracting frames from a video,
  including aesthetics and embeddings.

  <ParamField path="AESTHETICS" type="= 1" />

  <ParamField path="EMBEDDINGS" type="= 2" />
</Indent>

<Anchor id="nemo_curator-utils-decoder_utils-Resolution">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    class nemo_curator.utils.decoder_utils.Resolution()
    ```
  </CodeBlock>
</Anchor>

<Indent>
  **Bases:** `NamedTuple`

  Container for video frame dimensions.

  This class stores the height and width of video frames as a named tuple.

  <ParamField path="height" type="int" />

  <ParamField path="width" type="int" />
</Indent>

<Anchor id="nemo_curator-utils-decoder_utils-VideoMetadata">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    class nemo_curator.utils.decoder_utils.VideoMetadata(
        height: int = None,
        width: int = None,
        fps: float = None,
        num_frames: int = None,
        video_codec: str = None,
        pixel_format: str = None,
        video_duration: float = None,
        audio_codec: str = None,
        bit_rate_k: int = None
    )
    ```
  </CodeBlock>
</Anchor>

<Indent>
  <Badge>
    Dataclass
  </Badge>

  Metadata for video content including dimensions, timing, and codec information.

  This class stores essential video properties such as resolution, frame rate,
  duration, and encoding details.

  <ParamField path="audio_codec" type="str = None" />

  <ParamField path="bit_rate_k" type="int = None" />

  <ParamField path="fps" type="float = None" />

  <ParamField path="height" type="int = None" />

  <ParamField path="num_frames" type="int = None" />

  <ParamField path="pixel_format" type="str = None" />

  <ParamField path="video_codec" type="str = None" />

  <ParamField path="video_duration" type="float = None" />

  <ParamField path="width" type="int = None" />
</Indent>

<Anchor id="nemo_curator-utils-decoder_utils-_make_video_stream">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.decoder_utils._make_video_stream(
        data: pathlib.Path | str | typing.BinaryIO | bytes | io.BytesIO | io.BufferedReader
    ) -> typing.BinaryIO
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Convert various input types into a binary stream for video processing.

  This function handles different input types that could represent video
  data and converts them into a consistent BinaryIO interface that can be
  used for video processing operations.

  **Parameters:**

  <ParamField path="data" type="Path | str | BinaryIO | bytes | io.BytesIO | io.BufferedReader">
    The input video data, which can be one of:

    * Path or str: A path to a video file
    * bytes: Raw video data in bytes
    * io.BytesIO: An in-memory binary stream
    * io.BufferedReader: A buffered binary file reader
    * BinaryIO: Any binary stream
  </ParamField>

  **Returns:** `BinaryIO`

  A binary stream containing the video data

  **Raises:**

  * `ValueError`: If the input type is not one of the supported types
</Indent>
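
The normalization described above can be sketched with the standard library alone. This is an illustration, not the module's source: the name `to_binary_stream` is hypothetical, and treating any object with a `read` method as a stream is an assumption made for the sketch.

```python
import io
from pathlib import Path
from typing import BinaryIO


def to_binary_stream(data) -> BinaryIO:
    """Hypothetical stand-in: turn several input types into one binary stream."""
    if isinstance(data, (str, Path)):
        # Strings and Path objects are treated as file paths.
        # The caller is responsible for closing the returned file.
        return Path(data).open("rb")
    if isinstance(data, bytes):
        # Raw bytes are wrapped in an in-memory stream.
        return io.BytesIO(data)
    if hasattr(data, "read"):
        # Assumed: anything readable is already a binary stream.
        return data
    raise ValueError(f"Unsupported input type: {type(data).__name__}")


stream = to_binary_stream(b"\x00\x01\x02")
print(stream.read())  # b'\x00\x01\x02'
```

Returning existing streams unchanged means the caller keeps control of their position and lifetime.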

<Anchor id="nemo_curator-utils-decoder_utils-decode_video_cpu">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.decoder_utils.decode_video_cpu(
        data: pathlib.Path | str | typing.BinaryIO | bytes,
        sample_rate_fps: float,
        timestamps: numpy.typing.NDArray[numpy.float32] | None = None,
        start: float | None = None,
        stop: float | None = None,
        endpoint: bool = True,
        stream_idx: int = 0,
        video_format: str | None = None,
        num_threads: int = 1
    ) -> numpy.typing.NDArray[numpy.uint8]
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Decode video frames from a binary stream using PyAV with configurable frame rate sampling.

  This function decodes video frames from a binary stream at a specified
  frame rate. The frame rate does not need to match the input video's frame
  rate. It is possible to supersample a video as well as undersample.

  **Parameters:**

  <ParamField path="data" type="Path | str | BinaryIO | bytes">
    A file path, open file, io.BytesIO, or bytes object with the video data.
  </ParamField>

  <ParamField path="sample_rate_fps" type="float">
    Frame rate, in frames per second, at which to sample the video.
  </ParamField>

  <ParamField path="timestamps" type="npt.NDArray[np.float32] | None" default="None">
    Optional array of presentation timestamps for each frame
    in the video. If supplied, this array *must* be monotonically
    increasing. If not supplied, timestamps will be extracted from the
    video stream.
  </ParamField>

  <ParamField path="start" type="float | None" default="None">
    Optional start timestamp for frame extraction. If None, the
    first frame timestamp is used.
  </ParamField>

  <ParamField path="stop" type="float | None" default="None">
    Optional end timestamp for frame extraction. If None, the last
    frame timestamp is used.
  </ParamField>

  <ParamField path="endpoint" type="bool" default="True">
    If True, stop is the last sample. Otherwise, it is not included.
    Default is True.
  </ParamField>

  <ParamField path="stream_idx" type="int" default="0">
    PyAV index of the video stream to decode, usually 0.
  </ParamField>

  <ParamField path="video_format" type="str | None" default="None">
    Format of the video stream, such as "mp4" or "mkv".
    Leaving this as `None` is usually best.
  </ParamField>

  <ParamField path="num_threads" type="int" default="1">
    Number of threads to use for decoding.
  </ParamField>

  **Returns:** `npt.NDArray[np.uint8]`

  A numpy array of shape (num\_frames, height, width, channels) containing the decoded frames.

  **Raises:**

  * `ValueError`: If the sampled timestamps differ from source timestamps by more than
    the specified tolerance
</Indent>
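
A minimal usage sketch based only on the signature above. The wrapper name `sample_clip` and the file name in the comment are hypothetical, and the import sits inside the function so the sketch can be loaded where `nemo_curator` is not installed.

```python
from pathlib import Path


def sample_clip(path, fps=2.0, start=None, stop=None):
    """Hypothetical wrapper: decode a time window of a video at a fixed rate."""
    # Imported lazily so this sketch loads without the package installed.
    from nemo_curator.utils.decoder_utils import decode_video_cpu

    # endpoint=False leaves out a sample that would land exactly on `stop`.
    return decode_video_cpu(
        Path(path),
        sample_rate_fps=fps,
        start=start,
        stop=stop,
        endpoint=False,
    )


# Example call (assumes a local file named clip.mp4 exists):
# frames = sample_clip("clip.mp4", fps=2.0, start=1.0, stop=5.0)
# frames.shape is (num_frames, height, width, channels)
```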

<Anchor id="nemo_curator-utils-decoder_utils-decode_video_cpu_frame_ids">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.decoder_utils.decode_video_cpu_frame_ids(
        data: pathlib.Path | str | typing.BinaryIO | bytes,
        frame_ids: numpy.typing.NDArray[numpy.int32],
        counts: numpy.typing.NDArray[numpy.int32] | None = None,
        stream_idx: int = 0,
        video_format: str | None = None,
        num_threads: int = 1
    ) -> numpy.typing.NDArray[numpy.uint8]
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Decode video using PyAV frame ids.

  It is not recommended to use this function directly. Instead, use
  `decode_video_cpu`, which is timestamp-based. Timestamps are necessary for
  synchronizing sensors, like multiple cameras, or synchronizing video with
  GPS and LIDAR.

  **Parameters:**

  <ParamField path="data" type="Path | str | BinaryIO | bytes">
    A file path, open file, io.BytesIO, or bytes object with the video data.
  </ParamField>

  <ParamField path="frame_ids" type="npt.NDArray[np.int32]">
    List of frame ids to decode.
  </ParamField>

  <ParamField path="counts" type="npt.NDArray[np.int32] | None" default="None">
    List of counts for each frame id. It is possible that a frame id
    is repeated during supersampling, which can happen in videos with
    frame drops, or just due to clock drift between sensors.
  </ParamField>

  <ParamField path="stream_idx" type="int" default="0">
    PyAV index of the video stream to decode, usually 0.
  </ParamField>

  <ParamField path="video_format" type="str | None" default="None">
    Format of the video stream, such as "mp4" or "mkv".
    Leaving this as `None` is usually best.
  </ParamField>

  <ParamField path="num_threads" type="int" default="1">
    Number of threads to use for decoding.
  </ParamField>

  **Returns:** `npt.NDArray[np.uint8]`

  A numpy array of shape (frame\_count, height, width, channels) containing the decoded frames.
</Indent>
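
One way to read the `counts` argument is as a repeat count for each entry of `frame_ids`. The sketch below works on plain lists of ids rather than decoded frames; `expand_frames` is a hypothetical helper, and the exact way the module applies `counts` is an assumption here.

```python
def expand_frames(frame_ids, counts=None):
    """Repeat each frame id by its count (a count of 1 is assumed if counts is None)."""
    if counts is None:
        counts = [1] * len(frame_ids)
    if len(counts) != len(frame_ids):
        raise ValueError("frame_ids and counts must have the same length")
    expanded = []
    for frame_id, count in zip(frame_ids, counts):
        # A count above 1 means the same source frame fills several output slots.
        expanded.extend([frame_id] * count)
    return expanded


print(expand_frames([0, 1, 2], [1, 2, 1]))  # [0, 1, 1, 2]
```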

<Anchor id="nemo_curator-utils-decoder_utils-extract_frames">
  <CodeBlock links={{"nemo_curator.utils.decoder_utils.FrameExtractionPolicy":"#nemo_curator-utils-decoder_utils-FrameExtractionPolicy"}} showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.decoder_utils.extract_frames(
        video: pathlib.Path | str | typing.BinaryIO | bytes,
        extraction_policy: nemo_curator.utils.decoder_utils.FrameExtractionPolicy,
        sample_rate_fps: float = 1.0,
        target_res: tuple[int, int] = (-1, -1),
        num_threads: int = 1,
        stream_idx: int = 0,
        video_format: str | None = None
    ) -> numpy.typing.NDArray[numpy.uint8]
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Extract frames from a video into a numpy array.

  **Parameters:**

  <ParamField path="video" type="Path | str | BinaryIO | bytes">
    A file path, open file, io.BytesIO, or bytes object with the video data.
  </ParamField>

  <ParamField path="extraction_policy" type="FrameExtractionPolicy">
    The policy for extracting frames.
  </ParamField>

  <ParamField path="sample_rate_fps" type="float" default="1.0">
    Frame rate, in frames per second, at which to sample the video.
  </ParamField>

  <ParamField path="target_res" type="tuple[int, int]" default="(-1, -1)">
    The target resolution for the frames.
  </ParamField>

  <ParamField path="stream_idx" type="int" default="0">
    PyAV index of the video stream to decode, usually 0.
  </ParamField>

  <ParamField path="video_format" type="str | None" default="None">
    Format of the video stream, such as "mp4" or "mkv".
    Leaving this as `None` is usually best.
  </ParamField>

  <ParamField path="num_threads" type="int" default="1">
    Number of threads to use for decoding.
  </ParamField>

  **Returns:** `npt.NDArray[np.uint8]`

  A numpy array of shape (num\_frames, height, width, 3) containing the decoded frames.
</Indent>

<Anchor id="nemo_curator-utils-decoder_utils-extract_video_metadata">
  <CodeBlock links={{"nemo_curator.utils.decoder_utils.VideoMetadata":"#nemo_curator-utils-decoder_utils-VideoMetadata"}} showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.decoder_utils.extract_video_metadata(
        video: str | bytes
    ) -> nemo_curator.utils.decoder_utils.VideoMetadata
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Extract metadata from a video file using ffprobe.

  **Parameters:**

  <ParamField path="video" type="str | bytes">
    Path to video file or video data as bytes.
  </ParamField>

  **Returns:** `VideoMetadata`

  VideoMetadata object containing video properties.
</Indent>

<Anchor id="nemo_curator-utils-decoder_utils-find_closest_indices">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.decoder_utils.find_closest_indices(
        src: numpy.typing.NDArray[numpy.float32],
        dst: numpy.typing.NDArray[numpy.float32]
    ) -> numpy.typing.NDArray[numpy.int32]
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Find the closest indices in src to each element in dst.

  If an element in dst is equidistant from two elements in src, the left
  index in src is used.

  **Parameters:**

  <ParamField path="src" type="npt.NDArray[np.float32]">
    Monotonically increasing array of numbers to match dst against
  </ParamField>

  <ParamField path="dst" type="npt.NDArray[np.float32]">
    Monotonically increasing array of numbers to search for in src
  </ParamField>

  **Returns:** `npt.NDArray[np.int32]`

  Array of closest indices in src for each element in dst
</Indent>
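
The nearest-match rule, including the tie-break toward the left index, can be sketched in pure Python with `bisect`. The documented function works on NumPy arrays and its implementation is not shown on this page; `closest_indices` is a hypothetical name used for illustration.

```python
from bisect import bisect_left


def closest_indices(src, dst):
    """For each value in dst, return the index of the nearest value in sorted src."""
    if not src:
        raise ValueError("src must not be empty")
    result = []
    for value in dst:
        right = bisect_left(src, value)  # first index with src[right] >= value
        if right == 0:
            result.append(0)
        elif right == len(src):
            result.append(len(src) - 1)
        else:
            left = right - 1
            # "<=" sends an exact tie to the left neighbour.
            nearer_left = value - src[left] <= src[right] - value
            result.append(left if nearer_left else right)
    return result


print(closest_indices([0.0, 1.0, 2.0, 3.0], [0.5, 1.2, 2.9, 10.0]))  # [0, 1, 3, 3]
```

Here 0.5 is equidistant from 0.0 and 1.0, so index 0 is chosen, and 10.0 lies past the end of `src`, so it maps to the last index.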

<Anchor id="nemo_curator-utils-decoder_utils-get_avg_frame_rate">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.decoder_utils.get_avg_frame_rate(
        data: pathlib.Path | str | typing.BinaryIO | bytes,
        stream_idx: int = 0,
        video_format: str | None = None
    ) -> float
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Get the average frame rate of a video.

  **Parameters:**

  <ParamField path="data" type="Path | str | BinaryIO | bytes">
    A file path, open file, io.BytesIO, or bytes object with the video data.
  </ParamField>

  <ParamField path="stream_idx" type="int" default="0">
    Index of the video stream to decode, usually 0.
  </ParamField>

  <ParamField path="video_format" type="str | None" default="None">
    Format of the video stream, such as "mp4" or "mkv".
    Leaving this as `None` is usually best.
  </ParamField>

  **Returns:** `float`

  The average frame rate of the video.
</Indent>

<Anchor id="nemo_curator-utils-decoder_utils-get_frame_count">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.decoder_utils.get_frame_count(
        data: pathlib.Path | str | typing.BinaryIO | bytes,
        stream_idx: int = 0,
        video_format: str | None = None
    ) -> int
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Get the total number of frames in a video file or stream.

  **Parameters:**

  <ParamField path="data" type="Path | str | BinaryIO | bytes">
    A file path, open file, io.BytesIO, or bytes object with the video data.
  </ParamField>

  <ParamField path="stream_idx" type="int" default="0">
    Index of the video stream to read from. Defaults to 0,
    which is typically the main video stream.
  </ParamField>

  <ParamField path="video_format" type="str | None" default="None">
    Format of the video stream, such as "mp4" or "mkv".
    Leaving this as `None` is usually best.
  </ParamField>

  **Returns:** `int`

  The total number of frames in the video stream.
</Indent>

<Anchor id="nemo_curator-utils-decoder_utils-get_video_timestamps">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.decoder_utils.get_video_timestamps(
        data: pathlib.Path | str | typing.BinaryIO | bytes,
        stream_idx: int = 0,
        video_format: str | None = None
    ) -> numpy.typing.NDArray[numpy.float32]
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Get timestamps for all frames in a video stream.

  The file position will be moved as needed to get the timestamps.

  Note: the order in which frames appear in a video stream is not necessarily
  the order in which they are displayed, so timestamps are not necessarily
  monotonically increasing within a stream. This can happen when B-frames
  are present.

  This function returns presentation timestamps in monotonically
  increasing order.

  **Parameters:**

  <ParamField path="data" type="Path | str | BinaryIO | bytes">
    A file path, open file, io.BytesIO, or bytes object with the video data.
  </ParamField>

  <ParamField path="stream_idx" type="int" default="0">
    PyAV index of the video stream to decode, usually 0.
  </ParamField>

  <ParamField path="video_format" type="str | None" default="None">
    Format of the video stream, such as "mp4" or "mkv".
    Leaving this as `None` is usually best.
  </ParamField>

  **Returns:** `npt.NDArray[np.float32]`

  A numpy array of monotonically increasing timestamps.
</Indent>
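
To illustrate the note about B-frames, the sketch below uses made-up presentation timestamps (PTS) listed in decode order for an I, P, B, B pattern. The tick values and the time base are assumptions chosen for the example, not data from a real file.

```python
from fractions import Fraction

# Hypothetical PTS ticks in the order packets appear in the stream.
# The P-frame (3000) is stored before the two B-frames (1000, 2000) that
# are displayed ahead of it, so the sequence is not increasing.
decode_order_pts = [0, 3000, 1000, 2000, 6000, 4000, 5000]
time_base = Fraction(1, 30000)  # assumed seconds per tick

# Convert ticks to seconds, then sort into presentation order.
timestamps = sorted(float(pts * time_base) for pts in decode_order_pts)
print(timestamps[:4])
```

Sorting is what turns the stored order into the monotonically increasing sequence that timestamp-based sampling expects.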

<Anchor id="nemo_curator-utils-decoder_utils-sample_closest">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.decoder_utils.sample_closest(
        src: numpy.typing.NDArray[numpy.float32],
        sample_rate: float,
        start: float | None = None,
        stop: float | None = None,
        endpoint: bool = True,
        dedup: bool = True
    ) -> tuple[numpy.typing.NDArray[numpy.int32], numpy.typing.NDArray[numpy.int32], numpy.typing.NDArray[numpy.float32]]
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Sample `src` at `sample_rate` rate and return the closest indices.

  This function is meant to be used for sampling monotonically increasing
  numbers, like timestamps. This function can be used for synchronizing
  sensors, like multiple cameras, or synchronizing video with GPS and LIDAR.

  The first element sampled will be either src\[0] or the element closest
  to `start`.

  The last element sampled will be either src\[-1] or the element closest
  to `stop`. The last element is included only if it fits the sampling
  rate and `endpoint=True`.

  This function intentionally has no policy about distance from the closest
  elements in src to the sample elements. It will return the index of the
  closest element to the sample. It is up to the caller to define policy,
  which is why sample\_elements is returned.

  **Parameters:**

  <ParamField path="src" type="npt.NDArray[np.float32]">
    Monotonically increasing array of elements
  </ParamField>

  <ParamField path="sample_rate" type="float">
    Sampling rate
  </ParamField>

  <ParamField path="start" type="float | None" default="None">
    Start element (defaults to first element)
  </ParamField>

  <ParamField path="stop" type="float | None" default="None">
    End element (defaults to last element)
  </ParamField>

  <ParamField path="endpoint" type="bool" default="True">
    If True, `stop` can be the last sample, if it fits into
    the sample rate. If False, `stop` is not included in the output.
  </ParamField>

  <ParamField path="dedup" type="bool" default="True">
    Whether to deduplicate indices. Repeated indices will be
    reflected in the returned counts array.
  </ParamField>

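  **Returns:** `tuple[npt.NDArray[np.int32], npt.NDArray[np.int32], npt.NDArray[np.float32]]`

  Tuple of (indices, counts, sample\_elements), where counts\[i] is the number of times
  indices\[i] was sampled and sample\_elements holds the sampled values.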
</Indent>
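
The sampling behaviour described above can be approximated in pure Python: build a uniform grid from `start` to `stop`, snap each grid point to the nearest element of `src`, and optionally collapse repeated indices into counts. This is a sketch under stated assumptions, not the module's implementation. The names `nearest_index` and `sample_closest_sketch` are hypothetical, as is the small epsilon used to absorb float round-off.

```python
import math
from bisect import bisect_left


def nearest_index(src, value):
    """Index of the element of sorted src closest to value (ties go left)."""
    right = bisect_left(src, value)
    if right == 0:
        return 0
    if right == len(src):
        return len(src) - 1
    left = right - 1
    return left if value - src[left] <= src[right] - value else right


def sample_closest_sketch(src, sample_rate, start=None, stop=None,
                          endpoint=True, dedup=True):
    """Illustrative only: sample a uniform grid and snap it to src."""
    start = src[0] if start is None else start
    stop = src[-1] if stop is None else stop
    # Whole sampling intervals that fit between start and stop; the epsilon
    # guards against round-off when the span is an exact multiple.
    steps = math.floor((stop - start) * sample_rate + 1e-9)
    samples = [start + i / sample_rate for i in range(steps + 1)]
    # With endpoint=False, drop a final sample that lands on stop.
    if not endpoint and samples and math.isclose(samples[-1], stop, abs_tol=1e-9):
        samples.pop()
    indices = [nearest_index(src, s) for s in samples]
    if not dedup:
        return indices, [1] * len(indices), samples
    # Collapse runs of the same index and count how often each was hit.
    unique, counts = [], []
    for idx in indices:
        if unique and unique[-1] == idx:
            counts[-1] += 1
        else:
            unique.append(idx)
            counts.append(1)
    return unique, counts, samples


timestamps = [0.0, 0.5, 1.0, 1.5, 2.0]  # a 2 fps source
idx, cnt, grid = sample_closest_sketch(timestamps, sample_rate=4.0)
print(idx, cnt)  # [0, 1, 2, 3, 4] [2, 2, 2, 2, 1]
```

Sampling a 2 fps source at 4 fps hits most source frames twice, which is the supersampling case where counts exceed 1. Returning the grid lets the caller check how far each sample is from its matched element and apply its own tolerance.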

<Anchor id="nemo_curator-utils-decoder_utils-save_stream_position">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.decoder_utils.save_stream_position(
        stream: typing.BinaryIO
    ) -> collections.abc.Generator[typing.BinaryIO, None, None]
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Context manager that saves and restores stream position.
</Indent>
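
A context manager of this kind can be sketched with `contextlib`. The name `keep_position` is hypothetical, and the sketch assumes a seekable stream. It shows the general pattern rather than the module's own code.

```python
import io
from contextlib import contextmanager


@contextmanager
def keep_position(stream):
    """Remember the current offset and seek back to it on exit."""
    position = stream.tell()
    try:
        yield stream
    finally:
        # Restore the offset even if the body raised an exception.
        stream.seek(position)


buf = io.BytesIO(b"0123456789")
buf.seek(4)
with keep_position(buf) as s:
    s.seek(0)
    header = s.read(3)  # reading moves the position inside the block
print(header, buf.tell())  # b'012' 4
```

This pattern lets a helper inspect a stream, for example to read timestamps, without disturbing a caller that expects the position to be unchanged.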
