---
layout: overview
slug: nemo-curator/nemo_curator/utils/windowing_utils
title: nemo_curator.utils.windowing_utils
---

## Module Contents

### Classes

| Name                                                                     | Description                                                                  |
| ------------------------------------------------------------------------ | ---------------------------------------------------------------------------- |
| [`WindowFrameInfo`](#nemo_curator-utils-windowing_utils-WindowFrameInfo) | Container for frame window information, storing start and end frame indices. |

### Functions

| Name                                                                                       | Description                                                                                  |
| ------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------------- |
| [`ceil_by_factor`](#nemo_curator-utils-windowing_utils-ceil_by_factor)                     | Return the smallest integer greater than or equal to `number` that is divisible by `factor`. |
| [`compute_windows`](#nemo_curator-utils-windowing_utils-compute_windows)                   | Generate windows by splitting the video into segments of the specified size.                 |
| [`fetch_video`](#nemo_curator-utils-windowing_utils-fetch_video)                           | Load and preprocess video frames from a file.                                                |
| [`floor_by_factor`](#nemo_curator-utils-windowing_utils-floor_by_factor)                   | Return the largest integer less than or equal to `number` that is divisible by `factor`.     |
| [`read_video_cpu`](#nemo_curator-utils-windowing_utils-read_video_cpu)                     | Read video frames on the CPU using PyAV.                                                     |
| [`round_by_factor`](#nemo_curator-utils-windowing_utils-round_by_factor)                   | Return the closest integer to `number` that is divisible by `factor`.                        |
| [`smart_nframes`](#nemo_curator-utils-windowing_utils-smart_nframes)                       | Calculate the number of video frames to sample for model input.                              |
| [`smart_resize`](#nemo_curator-utils-windowing_utils-smart_resize)                         | Rescale image dimensions to multiples of `factor` within a pixel budget, keeping aspect ratio. |
| [`split_video_into_windows`](#nemo_curator-utils-windowing_utils-split_video_into_windows) | Calculate windows and return video inputs for a language model from input clips.             |

### Data

[`FPS`](#nemo_curator-utils-windowing_utils-FPS)

[`FPS_MAX_FRAMES`](#nemo_curator-utils-windowing_utils-FPS_MAX_FRAMES)

[`FPS_MIN_FRAMES`](#nemo_curator-utils-windowing_utils-FPS_MIN_FRAMES)

[`FRAME_FACTOR`](#nemo_curator-utils-windowing_utils-FRAME_FACTOR)

[`IMAGE_FACTOR`](#nemo_curator-utils-windowing_utils-IMAGE_FACTOR)

[`MAX_PIXELS`](#nemo_curator-utils-windowing_utils-MAX_PIXELS)

[`MAX_RATIO`](#nemo_curator-utils-windowing_utils-MAX_RATIO)

[`MIN_PIXELS`](#nemo_curator-utils-windowing_utils-MIN_PIXELS)

[`OPENAI_CLIP_MEAN`](#nemo_curator-utils-windowing_utils-OPENAI_CLIP_MEAN)

[`OPENAI_CLIP_STD`](#nemo_curator-utils-windowing_utils-OPENAI_CLIP_STD)

[`VIDEO_MAX_PIXELS`](#nemo_curator-utils-windowing_utils-VIDEO_MAX_PIXELS)

[`VIDEO_MIN_PIXELS`](#nemo_curator-utils-windowing_utils-VIDEO_MIN_PIXELS)

[`VIDEO_TOTAL_PIXELS`](#nemo_curator-utils-windowing_utils-VIDEO_TOTAL_PIXELS)

[`WINDOW_MIN_FRAMES`](#nemo_curator-utils-windowing_utils-WINDOW_MIN_FRAMES)

### API

<Anchor id="nemo_curator-utils-windowing_utils-WindowFrameInfo">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    class nemo_curator.utils.windowing_utils.WindowFrameInfo(
        start: int,
        end: int
    )
    ```
  </CodeBlock>
</Anchor>

<Indent>
  <Badge>
    Dataclass
  </Badge>

  Container for frame window information, storing start and end frame indices.

  This class represents a window of frames in a video, defined by its start and end frame positions.

  <ParamField path="end" type="int" />

  <ParamField path="start" type="int" />
</Indent>

<Anchor id="nemo_curator-utils-windowing_utils-ceil_by_factor">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.windowing_utils.ceil_by_factor(
        number: float,
        factor: int
    ) -> int
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Return the smallest integer greater than or equal to `number` that is divisible by `factor`.
</Indent>
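  A minimal sketch of the documented behavior, assuming plain float division followed by `math.ceil`. This illustrates the contract only; it is not the module's source, and it does not guard against precision loss for very large floats.

```python
import math


def ceil_by_factor(number: float, factor: int) -> int:
    """Smallest multiple of `factor` that is >= `number` (sketch)."""
    return math.ceil(number / factor) * factor


# Snap a 29-pixel side up to the next multiple of 28.
print(ceil_by_factor(29, 28))  # 56
```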

<Anchor id="nemo_curator-utils-windowing_utils-compute_windows">
  <CodeBlock links={{"nemo_curator.utils.windowing_utils.WindowFrameInfo":"#nemo_curator-utils-windowing_utils-WindowFrameInfo"}} showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.windowing_utils.compute_windows(
        total_frames: int,
        window_size: int = 128,
        remainder_threshold: int = 64
    ) -> list[nemo_curator.utils.windowing_utils.WindowFrameInfo]
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Generate windows by splitting the video into segments of the specified size.

  **Parameters:**

  <ParamField path="total_frames" type="int">
    Total number of frames in the video.
  </ParamField>

  <ParamField path="window_size" type="int" default="128">
    The size of each window in number of frames.
  </ParamField>

  <ParamField path="remainder_threshold" type="int" default="64">
    The minimum number of frames required to create a new window from the remainder.
  </ParamField>
</Indent>
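  The windowing rule can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: it assumes `end` is an inclusive frame index, and that a remainder shorter than `remainder_threshold` is folded into the last full window rather than dropped. `WindowFrameInfo` below is a local stand-in with the documented fields, and `compute_windows_sketch` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class WindowFrameInfo:
    """Local stand-in mirroring the documented fields (assumes inclusive end)."""

    start: int
    end: int


def compute_windows_sketch(
    total_frames: int, window_size: int = 128, remainder_threshold: int = 64
) -> list[WindowFrameInfo]:
    num_full = total_frames // window_size
    # One window per complete block of `window_size` frames.
    windows = [
        WindowFrameInfo(i * window_size, (i + 1) * window_size - 1)
        for i in range(num_full)
    ]
    remainder = total_frames - num_full * window_size
    if remainder > 0:
        if remainder >= remainder_threshold:
            # Enough leftover frames: give them their own window.
            windows.append(WindowFrameInfo(num_full * window_size, total_frames - 1))
        elif windows:
            # Assumption: a short remainder is merged into the last full window.
            windows[-1].end = total_frames - 1
    return windows


for w in compute_windows_sketch(300):
    print(w.start, w.end)  # 0 127, then 128 299
```

  With 300 frames and the default arguments, two full windows fit and the 44 leftover frames fall below the threshold, so the sketch stretches the second window to end at frame 299. A clip shorter than `remainder_threshold` yields no windows in this sketch; how the real function handles that case is not stated here.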

<Anchor id="nemo_curator-utils-windowing_utils-fetch_video">
  <CodeBlock links={{"nemo_curator.utils.windowing_utils.WindowFrameInfo":"#nemo_curator-utils-windowing_utils-WindowFrameInfo"}} showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.windowing_utils.fetch_video(
        video_path: str,
        sampling_fps: float = 2.0,
        window_range: list[nemo_curator.utils.windowing_utils.WindowFrameInfo] | None = None,
        do_preprocess: bool = False,
        preprocess_dtype: str = 'float32',
        num_frames_to_use: int = 0,
        flip_input: bool = False
    ) -> tuple[torch.Tensor, list[int]]
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Load and preprocess video frames from a file.

  **Parameters:**

  <ParamField path="video_path" type="str">
    Path to the video file.
  </ParamField>

  <ParamField path="sampling_fps" type="float" default="2.0">
    Target frames per second for sampling.
  </ParamField>

  <ParamField path="window_range" type="list[WindowFrameInfo] | None" default="None">
    List of frame windows to extract.
  </ParamField>

  <ParamField path="do_preprocess" type="bool" default="False">
    Whether to preprocess the frames.
  </ParamField>

  <ParamField path="preprocess_dtype" type="str" default="'float32'">
    Data type for preprocessing.
  </ParamField>

  <ParamField path="num_frames_to_use" type="int" default="0">
    Number of frames to extract (0 for all).
  </ParamField>

  <ParamField path="flip_input" type="bool" default="False">
    Whether to flip frames horizontally.
  </ParamField>

  **Returns:** `tuple[torch.Tensor, list[int]]`

  Tuple of (processed frames tensor, frame indices).
</Indent>

<Anchor id="nemo_curator-utils-windowing_utils-floor_by_factor">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.windowing_utils.floor_by_factor(
        number: float,
        factor: int
    ) -> int
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Return the largest integer less than or equal to `number` that is divisible by `factor`.
</Indent>
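  A minimal sketch of the documented behavior, assuming float division followed by `math.floor`; it illustrates the contract and is not the module's source.

```python
import math


def floor_by_factor(number: float, factor: int) -> int:
    """Largest multiple of `factor` that is <= `number` (sketch)."""
    return math.floor(number / factor) * factor


# Snap a 55-pixel side down to a multiple of 28.
print(floor_by_factor(55, 28))  # 28
```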

<Anchor id="nemo_curator-utils-windowing_utils-read_video_cpu">
  <CodeBlock links={{"nemo_curator.utils.windowing_utils.WindowFrameInfo":"#nemo_curator-utils-windowing_utils-WindowFrameInfo"}} showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.windowing_utils.read_video_cpu(
        video_path: str,
        fps: float,
        num_frames_to_use: int,
        window_range: list[nemo_curator.utils.windowing_utils.WindowFrameInfo]
    ) -> tuple[torch.Tensor, list[int]]
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Read video frames on the CPU using PyAV.

  **Parameters:**

  <ParamField path="video_path" type="str">
    Path to the video. Supports `file://`, `http://`, and `https://` URLs as well as local paths.
  </ParamField>

  <ParamField path="fps" type="float">
    Frame rate, in frames per second, used when sampling frames.
  </ParamField>

  <ParamField path="num_frames_to_use" type="int">
    Number of frames to use.
  </ParamField>

  <ParamField path="window_range" type="list[WindowFrameInfo]">
    Frame windows to read.
  </ParamField>

  **Returns:** `tuple[torch.Tensor, list[int]]`

  Tuple of (video tensor with shape `(T, C, H, W)`, list of frame indices).
</Indent>
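  One common way to turn a target frame count into concrete frame indices, used by Qwen2-VL's reference video readers via `torch.linspace(...).round()`, is to space the indices evenly over the range. The pure-Python sketch below shows that approach for one inclusive window; whether `read_video_cpu` selects indices exactly this way is an assumption, and `uniform_frame_indices` is a hypothetical helper.

```python
def uniform_frame_indices(start: int, end: int, nframes: int) -> list[int]:
    """Pick `nframes` indices spread evenly over the inclusive range [start, end]."""
    if nframes <= 0:
        raise ValueError("nframes must be positive")
    if nframes == 1:
        return [start]
    step = (end - start) / (nframes - 1)
    # round() mirrors the rounding step applied to linspace output.
    return [round(start + i * step) for i in range(nframes)]


print(uniform_frame_indices(0, 99, 4))  # [0, 33, 66, 99]
```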

<Anchor id="nemo_curator-utils-windowing_utils-round_by_factor">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.windowing_utils.round_by_factor(
        number: float,
        factor: int
    ) -> int
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Return the closest integer to `number` that is divisible by `factor`.
</Indent>
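  A minimal sketch, assuming the function divides, applies Python's built-in `round`, and multiplies back. Note that `round` uses round-half-to-even, so an exact tie such as 70 / 28 = 2.5 goes to the even multiple; whether the real function behaves the same on ties is an assumption.

```python
def round_by_factor(number: float, factor: int) -> int:
    """Multiple of `factor` closest to `number` (sketch; ties round half to even)."""
    return round(number / factor) * factor


print(round_by_factor(43, 28))  # 56
print(round_by_factor(70, 28))  # 56, because round(2.5) == 2
```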

<Anchor id="nemo_curator-utils-windowing_utils-smart_nframes">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.windowing_utils.smart_nframes(
        fps: float,
        total_frames: int,
        video_fps: float
    ) -> int
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Calculate the number of frames to sample from a video for model input.
</Indent>
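  The sketch below mirrors the frame-count logic of the `smart_nframes` helper in Qwen2-VL's `qwen_vl_utils`, adapted to the signature shown above. It assumes `FPS_MIN_FRAMES`, `FPS_MAX_FRAMES`, and `FRAME_FACTOR` (values copied from this page) play the same roles as in that helper; treat it as an illustration, not this module's source.

```python
import math

FPS_MIN_FRAMES = 4
FPS_MAX_FRAMES = 768
FRAME_FACTOR = 2


def round_by_factor(number: float, factor: int) -> int:
    return round(number / factor) * factor


def ceil_by_factor(number: float, factor: int) -> int:
    return math.ceil(number / factor) * factor


def floor_by_factor(number: float, factor: int) -> int:
    return math.floor(number / factor) * factor


def smart_nframes(fps: float, total_frames: int, video_fps: float) -> int:
    # Bounds on the sampled frame count, snapped to multiples of FRAME_FACTOR.
    min_frames = ceil_by_factor(FPS_MIN_FRAMES, FRAME_FACTOR)
    max_frames = floor_by_factor(min(FPS_MAX_FRAMES, total_frames), FRAME_FACTOR)
    # Frames needed to cover the clip's duration at the target sampling rate.
    nframes = total_frames / video_fps * fps
    nframes = min(max(nframes, min_frames), max_frames)
    nframes = round_by_factor(nframes, FRAME_FACTOR)
    if not FRAME_FACTOR <= nframes <= total_frames:
        raise ValueError(f"nframes={nframes} outside [{FRAME_FACTOR}, {total_frames}]")
    return nframes


# A 10-second clip at 30 fps, sampled at 2 fps.
print(smart_nframes(2.0, 300, 30.0))  # 20
```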

<Anchor id="nemo_curator-utils-windowing_utils-smart_resize">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.windowing_utils.smart_resize(
        height: int,
        width: int,
        factor: int = IMAGE_FACTOR,
        min_pixels: int = MIN_PIXELS,
        max_pixels: int = MAX_PIXELS
    ) -> tuple[int, int]
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Rescale image dimensions so that the following conditions are met:

  1. Both dimensions (height and width) are divisible by `factor`.
  2. The total number of pixels is within the range `[min_pixels, max_pixels]`.
  3. The aspect ratio of the image is maintained as closely as possible.
</Indent>
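  The three conditions can be met as in the sketch below, modeled on the `smart_resize` helper in Qwen2-VL's `qwen_vl_utils`: snap each side to a multiple of `factor`, then, if the pixel count is out of range, scale both sides by the same ratio before snapping again. The `MAX_RATIO` aspect check and the default constants are copied from this page, and that they are used this way here is an assumption; this is not the module's source.

```python
import math

IMAGE_FACTOR = 28
MIN_PIXELS = 4 * 28 * 28
MAX_PIXELS = 16384 * 28 * 28
MAX_RATIO = 200


def round_by_factor(number: float, factor: int) -> int:
    return round(number / factor) * factor


def ceil_by_factor(number: float, factor: int) -> int:
    return math.ceil(number / factor) * factor


def floor_by_factor(number: float, factor: int) -> int:
    return math.floor(number / factor) * factor


def smart_resize(
    height: int,
    width: int,
    factor: int = IMAGE_FACTOR,
    min_pixels: int = MIN_PIXELS,
    max_pixels: int = MAX_PIXELS,
) -> tuple[int, int]:
    if max(height, width) / min(height, width) > MAX_RATIO:
        raise ValueError("aspect ratio exceeds MAX_RATIO")
    # Snap each side to the nearest multiple of `factor`, never below one factor.
    h_bar = max(factor, round_by_factor(height, factor))
    w_bar = max(factor, round_by_factor(width, factor))
    if h_bar * w_bar > max_pixels:
        # Too many pixels: shrink both sides by the same ratio, then round down.
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = max(factor, floor_by_factor(height / beta, factor))
        w_bar = max(factor, floor_by_factor(width / beta, factor))
    elif h_bar * w_bar < min_pixels:
        # Too few pixels: grow both sides by the same ratio, then round up.
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = ceil_by_factor(height * beta, factor)
        w_bar = ceil_by_factor(width * beta, factor)
    return h_bar, w_bar


print(smart_resize(480, 640, min_pixels=128 * 28 * 28, max_pixels=768 * 28 * 28))  # (476, 644)
```

  Scaling both sides by one shared factor `beta` before snapping is what keeps the aspect ratio close to the original; snapping each side independently without that step could distort it.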

<Anchor id="nemo_curator-utils-windowing_utils-split_video_into_windows">
  <CodeBlock links={{"nemo_curator.utils.windowing_utils.WindowFrameInfo":"#nemo_curator-utils-windowing_utils-WindowFrameInfo"}} showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.windowing_utils.split_video_into_windows(
        mp4_bytes: bytes,
        window_size: int = 256,
        remainder_threshold: int = 128,
        sampling_fps: float = 2.0,
        model_does_preprocess: bool = False,
        preprocess_dtype: str = 'uint8',
        flip_input: bool = False,
        num_frames_to_use: int = 0,
        return_bytes: bool = False,
        return_video_frames: bool = True,
        num_threads: int = 1
    ) -> tuple[list[bytes], list[torch.Tensor | None], list[nemo_curator.utils.windowing_utils.WindowFrameInfo]]
    ```
  </CodeBlock>
</Anchor>

<Indent>
  Calculate windows and return video inputs for language model from input clips.

  Determines the windows for a clip, decodes the video in one pass, and returns processed frames
  for each window in a format suitable for the Qwen model.

  **Parameters:**

  <ParamField path="mp4_bytes" type="bytes">
    input video in bytes
  </ParamField>


  <ParamField path="preprocess_dtype" type="str" default="'uint8'">
    Data type to use for preprocessing the video/image inputs.
  </ParamField>

  <ParamField path="num_frames_to_use" type="int" default="0">
    Number of frames to extract from the video. If 0, uses all frames.
  </ParamField>

  <ParamField path="flip_input" type="bool" default="False">
    Whether to flip the input video/image horizontally.
  </ParamField>

  <ParamField path="return_bytes" type="bool" default="False">
    Whether to extract MP4 bytes for each window for use by `PreviewStage`.
  </ParamField>

  <ParamField path="model_does_preprocess" type="bool" default="False">
    Whether the model performs its own preprocessing.
  </ParamField>

  <ParamField path="num_threads" type="int" default="1">
    Number of threads to use.
  </ParamField>

  <ParamField path="remainder_threshold" type="int" default="128">
    The minimum number of frames required to create a new window from the remainder.
  </ParamField>

  <ParamField path="return_video_frames" type="bool" default="True">
    Whether to return the decoded frames for each window.
  </ParamField>

  <ParamField path="sampling_fps" type="float" default="2.0">
    Target frames per second for sampling.
  </ParamField>

  <ParamField path="window_size" type="int" default="256">
    The size of each window in number of frames.
  </ParamField>

  **Returns:** `tuple[list[bytes], list[torch.Tensor | None], list[WindowFrameInfo]]`

  Tuple containing:

  * `window_mp4_bytes`: MP4 bytes for each window; only used when the Preview stage is enabled.
  * `window_frames`: decoded, per-window processed frames ready for use by the Qwen model.
  * `window_info`: start and end frame indices for each window in the clip.
</Indent>

<Anchor id="nemo_curator-utils-windowing_utils-FPS">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.windowing_utils.FPS = 2.0
    ```
  </CodeBlock>
</Anchor>

<Anchor id="nemo_curator-utils-windowing_utils-FPS_MAX_FRAMES">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.windowing_utils.FPS_MAX_FRAMES = 768
    ```
  </CodeBlock>
</Anchor>

<Anchor id="nemo_curator-utils-windowing_utils-FPS_MIN_FRAMES">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.windowing_utils.FPS_MIN_FRAMES = 4
    ```
  </CodeBlock>
</Anchor>

<Anchor id="nemo_curator-utils-windowing_utils-FRAME_FACTOR">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.windowing_utils.FRAME_FACTOR = 2
    ```
  </CodeBlock>
</Anchor>

<Anchor id="nemo_curator-utils-windowing_utils-IMAGE_FACTOR">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.windowing_utils.IMAGE_FACTOR = 28
    ```
  </CodeBlock>
</Anchor>

<Anchor id="nemo_curator-utils-windowing_utils-MAX_PIXELS">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.windowing_utils.MAX_PIXELS = 16384 * 28 * 28
    ```
  </CodeBlock>
</Anchor>

<Anchor id="nemo_curator-utils-windowing_utils-MAX_RATIO">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.windowing_utils.MAX_RATIO = 200
    ```
  </CodeBlock>
</Anchor>

<Anchor id="nemo_curator-utils-windowing_utils-MIN_PIXELS">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.windowing_utils.MIN_PIXELS = 4 * 28 * 28
    ```
  </CodeBlock>
</Anchor>

<Anchor id="nemo_curator-utils-windowing_utils-OPENAI_CLIP_MEAN">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.windowing_utils.OPENAI_CLIP_MEAN = [0.48145466, 0.4578275, 0.40821073]
    ```
  </CodeBlock>
</Anchor>

<Anchor id="nemo_curator-utils-windowing_utils-OPENAI_CLIP_STD">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.windowing_utils.OPENAI_CLIP_STD = [0.26862954, 0.26130258, 0.27577711]
    ```
  </CodeBlock>
</Anchor>
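In CLIP-style preprocessing, `OPENAI_CLIP_MEAN` and `OPENAI_CLIP_STD` are per-channel (R, G, B) statistics applied after scaling 8-bit pixels to `[0, 1]`. The sketch below shows that standard normalization for a single pixel; that this module applies the constants exactly this way is an assumption, and `normalize_rgb` is a hypothetical helper.

```python
OPENAI_CLIP_MEAN = [0.48145466, 0.4578275, 0.40821073]
OPENAI_CLIP_STD = [0.26862954, 0.26130258, 0.27577711]


def normalize_rgb(pixel: tuple[int, int, int]) -> list[float]:
    """Scale a uint8 RGB pixel to [0, 1], then standardize each channel."""
    return [
        (value / 255.0 - mean) / std
        for value, mean, std in zip(pixel, OPENAI_CLIP_MEAN, OPENAI_CLIP_STD)
    ]


print(normalize_rgb((0, 0, 0)))  # black maps to -mean / std per channel
```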

<Anchor id="nemo_curator-utils-windowing_utils-VIDEO_MAX_PIXELS">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.windowing_utils.VIDEO_MAX_PIXELS = 768 * 28 * 28
    ```
  </CodeBlock>
</Anchor>

<Anchor id="nemo_curator-utils-windowing_utils-VIDEO_MIN_PIXELS">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.windowing_utils.VIDEO_MIN_PIXELS = 128 * 28 * 28
    ```
  </CodeBlock>
</Anchor>

<Anchor id="nemo_curator-utils-windowing_utils-VIDEO_TOTAL_PIXELS">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.windowing_utils.VIDEO_TOTAL_PIXELS = 24576 * 28 * 28
    ```
  </CodeBlock>
</Anchor>
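The pixel constants are written as multiples of `28 * 28`. In Qwen2-VL's preprocessing, one 28×28 region corresponds to one visual token (14×14 patches merged 2×2); assuming the same convention here, the leading multiplier reads as a token budget. The arithmetic below expands each constant; `blocks` is a hypothetical helper.

```python
BLOCK = 28  # side length, in pixels, that the constants are expressed in

MIN_PIXELS = 4 * 28 * 28
MAX_PIXELS = 16384 * 28 * 28
VIDEO_MIN_PIXELS = 128 * 28 * 28
VIDEO_MAX_PIXELS = 768 * 28 * 28
VIDEO_TOTAL_PIXELS = 24576 * 28 * 28


def blocks(pixels: int) -> int:
    """Number of 28x28 blocks covered by a pixel budget."""
    return pixels // (BLOCK * BLOCK)


for name, value in [
    ("MIN_PIXELS", MIN_PIXELS),
    ("MAX_PIXELS", MAX_PIXELS),
    ("VIDEO_MIN_PIXELS", VIDEO_MIN_PIXELS),
    ("VIDEO_MAX_PIXELS", VIDEO_MAX_PIXELS),
    ("VIDEO_TOTAL_PIXELS", VIDEO_TOTAL_PIXELS),
]:
    print(f"{name}: {value:,} pixels = {blocks(value):,} blocks of 28x28")
```

For example, `VIDEO_MAX_PIXELS` is 602,112 pixels (768 blocks), and `VIDEO_TOTAL_PIXELS` is exactly 32 times that per-frame maximum.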

<Anchor id="nemo_curator-utils-windowing_utils-WINDOW_MIN_FRAMES">
  <CodeBlock showLineNumbers={false} wordWrap={true}>
    ```python
    nemo_curator.utils.windowing_utils.WINDOW_MIN_FRAMES = 4
    ```
  </CodeBlock>
</Anchor>
