---
layout: overview
slug: nemo-curator/nemo_curator/utils/windowing_utils
title: nemo_curator.utils.windowing_utils
---
## Module Contents
### Classes
| Name | Description |
| ------------------------------------------------------------------------ | ---------------------------------------------------------------------------- |
| [`WindowFrameInfo`](#nemo_curator-utils-windowing_utils-WindowFrameInfo) | Container for frame window information, storing start and end frame indices. |
### Functions
| Name | Description |
| ------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------------- |
| [`ceil_by_factor`](#nemo_curator-utils-windowing_utils-ceil_by_factor) | Return the smallest integer greater than or equal to 'number' that is divisible by 'factor'. |
| [`compute_windows`](#nemo_curator-utils-windowing_utils-compute_windows) | Generate windows by splitting the video into segments of the specified size. |
| [`fetch_video`](#nemo_curator-utils-windowing_utils-fetch_video) | Load and preprocess video frames from a file. |
| [`floor_by_factor`](#nemo_curator-utils-windowing_utils-floor_by_factor) | Return the largest integer less than or equal to 'number' that is divisible by 'factor'. |
| [`read_video_cpu`](#nemo_curator-utils-windowing_utils-read_video_cpu) | Read video using PyAV. |
| [`round_by_factor`](#nemo_curator-utils-windowing_utils-round_by_factor) | Return the closest integer to 'number' that is divisible by 'factor'. |
| [`smart_nframes`](#nemo_curator-utils-windowing_utils-smart_nframes) | Calculate the number of frames for video used for model inputs. |
| [`smart_resize`](#nemo_curator-utils-windowing_utils-smart_resize) | Rescales the image so that the following conditions are met. |
| [`split_video_into_windows`](#nemo_curator-utils-windowing_utils-split_video_into_windows) | Calculate windows and return video inputs for language model from input clips. |
### Data
[`FPS`](#nemo_curator-utils-windowing_utils-FPS)
[`FPS_MAX_FRAMES`](#nemo_curator-utils-windowing_utils-FPS_MAX_FRAMES)
[`FPS_MIN_FRAMES`](#nemo_curator-utils-windowing_utils-FPS_MIN_FRAMES)
[`FRAME_FACTOR`](#nemo_curator-utils-windowing_utils-FRAME_FACTOR)
[`IMAGE_FACTOR`](#nemo_curator-utils-windowing_utils-IMAGE_FACTOR)
[`MAX_PIXELS`](#nemo_curator-utils-windowing_utils-MAX_PIXELS)
[`MAX_RATIO`](#nemo_curator-utils-windowing_utils-MAX_RATIO)
[`MIN_PIXELS`](#nemo_curator-utils-windowing_utils-MIN_PIXELS)
[`OPENAI_CLIP_MEAN`](#nemo_curator-utils-windowing_utils-OPENAI_CLIP_MEAN)
[`OPENAI_CLIP_STD`](#nemo_curator-utils-windowing_utils-OPENAI_CLIP_STD)
[`VIDEO_MAX_PIXELS`](#nemo_curator-utils-windowing_utils-VIDEO_MAX_PIXELS)
[`VIDEO_MIN_PIXELS`](#nemo_curator-utils-windowing_utils-VIDEO_MIN_PIXELS)
[`VIDEO_TOTAL_PIXELS`](#nemo_curator-utils-windowing_utils-VIDEO_TOTAL_PIXELS)
[`WINDOW_MIN_FRAMES`](#nemo_curator-utils-windowing_utils-WINDOW_MIN_FRAMES)
### API
```python
class nemo_curator.utils.windowing_utils.WindowFrameInfo(
start: int,
end: int
)
```
Dataclass
Container for frame window information, storing start and end frame indices.
This class represents a window of frames in a video, defined by its start and end frame positions.
```python
nemo_curator.utils.windowing_utils.ceil_by_factor(
number: float,
factor: int
) -> int
```
Return the smallest integer greater than or equal to 'number' that is divisible by 'factor'.
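To make the rounding rule concrete, here is a minimal sketch that reproduces the documented behavior with `math.ceil`. The name `ceil_by_factor_sketch` is hypothetical; this is an illustration that assumes a positive `factor`, not the library's source.

```python
import math


def ceil_by_factor_sketch(number: float, factor: int) -> int:
    """Smallest multiple of `factor` that is >= `number` (illustrative only)."""
    return math.ceil(number / factor) * factor


# 57 is not a multiple of 28, so it is rounded up to the next one.
print(ceil_by_factor_sketch(57, 28))  # 84
# A value that is already a multiple is returned unchanged.
print(ceil_by_factor_sketch(56, 28))  # 56
```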
```python
nemo_curator.utils.windowing_utils.compute_windows(
total_frames: int,
window_size: int = 128,
remainder_threshold: int = 64
) -> list[nemo_curator.utils.windowing_utils.WindowFrameInfo]
```
Generate windows by splitting the video into segments of the specified size.
**Parameters:**
* `total_frames`: Total number of frames in the video.
* `window_size`: The size of each window in number of frames.
* `remainder_threshold`: The minimum number of frames required to create a new window from the remainder.
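The interaction between the window size and the remainder threshold can be sketched as follows. This is one plausible reading of the parameter descriptions, not the library's source: the `Window` class and `compute_windows_sketch` are hypothetical names, end indices are assumed to be inclusive, and a remainder shorter than the threshold is assumed to be merged into the last full window.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Hypothetical stand-in for WindowFrameInfo."""

    start: int
    end: int  # assumed inclusive; the source does not say


def compute_windows_sketch(
    total_frames: int, window_size: int = 128, remainder_threshold: int = 64
) -> list[Window]:
    if total_frames <= 0:
        return []
    if total_frames <= window_size:
        # The whole video fits into a single window.
        return [Window(0, total_frames - 1)]
    full, remainder = divmod(total_frames, window_size)
    windows = [Window(i * window_size, (i + 1) * window_size - 1) for i in range(full)]
    if remainder >= remainder_threshold:
        # The remainder is long enough to become its own window.
        windows.append(Window(full * window_size, total_frames - 1))
    elif remainder > 0:
        # Assumption: a short remainder is absorbed by the last full window.
        windows[-1].end = total_frames - 1
    return windows


print(compute_windows_sketch(300))  # remainder 44 < 64: two windows
print(compute_windows_sketch(330))  # remainder 74 >= 64: three windows
```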
```python
nemo_curator.utils.windowing_utils.fetch_video(
video_path: str,
sampling_fps: float = 2.0,
window_range: list[nemo_curator.utils.windowing_utils.WindowFrameInfo] | None = None,
do_preprocess: bool = False,
preprocess_dtype: str = 'float32',
num_frames_to_use: int = 0,
flip_input: bool = False
) -> tuple[torch.Tensor, list[int]]
```
Load and preprocess video frames from a file.
**Parameters:**
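* `video_path`: Path to the video file.
* `sampling_fps`: Target frames per second for sampling.
* `window_range`: List of frame windows to extract.
* `do_preprocess`: Whether to preprocess the frames.
* `preprocess_dtype`: Data type for preprocessing.
* `num_frames_to_use`: Number of frames to extract (0 for all).
* `flip_input`: Whether to flip frames horizontally.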
**Returns:** `tuple[torch.Tensor, list[int]]`
Tuple of (processed frames tensor, frame indices).
```python
nemo_curator.utils.windowing_utils.floor_by_factor(
number: float,
factor: int
) -> int
```
Return the largest integer less than or equal to 'number' that is divisible by 'factor'.
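A small sketch of the same rule, using `FRAME_FACTOR = 2` from this module as the example factor. The name `floor_by_factor_sketch` is hypothetical and a positive `factor` is assumed; it is an illustration rather than the library's code.

```python
import math

FRAME_FACTOR = 2  # value listed in the Data section of this page


def floor_by_factor_sketch(number: float, factor: int) -> int:
    """Largest multiple of `factor` that is <= `number` (illustrative only)."""
    return math.floor(number / factor) * factor


# An odd frame count is trimmed down to the nearest even count.
print(floor_by_factor_sketch(7, FRAME_FACTOR))  # 6
# Values below the factor floor to zero.
print(floor_by_factor_sketch(27, 28))  # 0
```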
```python
nemo_curator.utils.windowing_utils.read_video_cpu(
video_path: str,
fps: float,
num_frames_to_use: int,
window_range: list[nemo_curator.utils.windowing_utils.WindowFrameInfo]
) -> tuple[torch.Tensor, list[int]]
```
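Read video using PyAV.
**Parameters:**
* `video_path`: Path to the video. Supports "file://", "http\://", and "https\://" URLs as well as local paths.
* `fps`: Frames per second.
* `num_frames_to_use`: Number of frames to use.
* `window_range`: Window range, given as a list of `WindowFrameInfo` entries.
**Returns:** `tuple[torch.Tensor, list[int]]`
Tuple whose first element is the video tensor with shape (T, C, H, W).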
```python
nemo_curator.utils.windowing_utils.round_by_factor(
number: float,
factor: int
) -> int
```
Return the closest integer to 'number' that is divisible by 'factor'.
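The description does not say what happens when `number` lies exactly halfway between two multiples. The sketch below uses Python's built-in `round`, which resolves exact ties toward the even quotient; whether the library behaves the same way is an assumption. `round_by_factor_sketch` is a hypothetical name.

```python
def round_by_factor_sketch(number: float, factor: int) -> int:
    """Multiple of `factor` closest to `number` (illustrative only)."""
    return round(number / factor) * factor


# 57 is closer to 56 than to 84.
print(round_by_factor_sketch(57, 28))  # 56
# 42 is exactly between 28 and 56; round(1.5) == 2, so the result is 56.
print(round_by_factor_sketch(42, 28))  # 56
# 70 is exactly between 56 and 84; round(2.5) == 2, so the result is again 56.
print(round_by_factor_sketch(70, 28))  # 56
```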
```python
nemo_curator.utils.windowing_utils.smart_nframes(
fps: float,
total_frames: int,
video_fps: float
) -> int
```
Calculate the number of frames for video used for model inputs.
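The description above does not spell out the calculation. The sketch below shows one plausible approach, assuming the count is derived from the clip duration times the target rate, clamped between `FPS_MIN_FRAMES` and `FPS_MAX_FRAMES`, and rounded down to a multiple of `FRAME_FACTOR`. The function name and the clamping order are assumptions, not the library's implementation.

```python
import math

# Values listed in the Data section of this page.
FPS_MIN_FRAMES = 4
FPS_MAX_FRAMES = 768
FRAME_FACTOR = 2


def smart_nframes_sketch(fps: float, total_frames: int, video_fps: float) -> int:
    # Bounds, aligned to FRAME_FACTOR (assumed behavior).
    min_frames = math.ceil(FPS_MIN_FRAMES / FRAME_FACTOR) * FRAME_FACTOR
    max_frames = math.floor(min(FPS_MAX_FRAMES, total_frames) / FRAME_FACTOR) * FRAME_FACTOR
    # Duration in seconds times the target sampling rate.
    nframes = total_frames / video_fps * fps
    # Clamp to the bounds and never exceed the frames that exist.
    nframes = min(max(nframes, min_frames), max_frames, total_frames)
    return math.floor(nframes / FRAME_FACTOR) * FRAME_FACTOR


# A 10-second clip at 30 fps, sampled at 2 fps.
print(smart_nframes_sketch(2.0, 300, 30.0))  # 20
# A very long clip is capped at FPS_MAX_FRAMES.
print(smart_nframes_sketch(2.0, 100000, 30.0))  # 768
```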
```python
nemo_curator.utils.windowing_utils.smart_resize(
height: int,
width: int,
factor: int = IMAGE_FACTOR,
min_pixels: int = MIN_PIXELS,
max_pixels: int = MAX_PIXELS
) -> tuple[int, int]
```
Rescales the image so that the following conditions are met:
1. Both dimensions (height and width) are divisible by 'factor'.
2. The total number of pixels is within the range \['min\_pixels', 'max\_pixels'].
3. The aspect ratio of the image is maintained as closely as possible.
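The three conditions above can be met with a procedure like the following sketch. It is modeled on a common approach (round each side to a multiple of `factor`, then rescale by a shared ratio if the pixel budget is violated) and is an assumption about how the function might work, not its actual source. The `MAX_RATIO` check is likewise assumed; this page lists the constant without stating how it is used.

```python
import math

# Values listed in the Data section of this page.
IMAGE_FACTOR = 28
MIN_PIXELS = 4 * 28 * 28
MAX_PIXELS = 16384 * 28 * 28
MAX_RATIO = 200


def smart_resize_sketch(
    height: int,
    width: int,
    factor: int = IMAGE_FACTOR,
    min_pixels: int = MIN_PIXELS,
    max_pixels: int = MAX_PIXELS,
) -> tuple[int, int]:
    # Assumed guard against extreme aspect ratios.
    if max(height, width) / min(height, width) > MAX_RATIO:
        raise ValueError("aspect ratio exceeds MAX_RATIO")
    # Condition 1: snap both sides to a multiple of `factor`.
    h_bar = max(factor, round(height / factor) * factor)
    w_bar = max(factor, round(width / factor) * factor)
    # Condition 2: scale both sides by the same ratio to fit the pixel budget,
    # which keeps the aspect ratio approximately intact (condition 3).
    if h_bar * w_bar > max_pixels:
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = math.floor(height / beta / factor) * factor
        w_bar = math.floor(width / beta / factor) * factor
    elif h_bar * w_bar < min_pixels:
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar


# A 1080p frame already fits the default budget; only the snapping applies.
print(smart_resize_sketch(1080, 1920))  # (1092, 1932)
# With a tighter budget, both sides shrink together.
print(smart_resize_sketch(1080, 1920, max_pixels=768 * 28 * 28))  # (560, 1008)
```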
```python
nemo_curator.utils.windowing_utils.split_video_into_windows(
mp4_bytes: bytes,
window_size: int = 256,
remainder_threshold: int = 128,
sampling_fps: float = 2.0,
model_does_preprocess: bool = False,
preprocess_dtype: str = 'uint8',
flip_input: bool = False,
num_frames_to_use: int = 0,
return_bytes: bool = False,
return_video_frames: bool = True,
num_threads: int = 1
) -> tuple[list[bytes], list[torch.Tensor | None], list[nemo_curator.utils.windowing_utils.WindowFrameInfo]]
```
Calculate windows and return video inputs for language model from input clips.
Processes the video to determine the windows for a clip, decodes it in one shot, and returns the processed frames
for each window in a format suitable for consumption by the Qwen model.
**Parameters:**
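* `mp4_bytes`: Input video in bytes.
* Frames per second of the input video. (No parameter in the signature above matches this entry.)
* `preprocess_dtype`: Data type to use for preprocessing the video/image inputs.
* `num_frames_to_use`: Number of frames to extract from the video. If 0, uses all frames.
* `flip_input`: Whether to flip the input video/image horizontally.
* `return_bytes`: Whether to extract MP4 bytes for each window for use by PreviewStage.
* `model_does_preprocess`: Whether the model does its own preprocessing.
* `num_threads`: Number of threads.
* `remainder_threshold`: Minimum number of remainder frames required to form a window.
* `return_video_frames`: Whether to return video frames.
* `sampling_fps`: Sampling rate in frames per second.
* `window_size`: Window size in frames.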
**Returns:** `tuple[list[bytes], list[torch.Tensor | None], list[WindowFrameInfo]]`
Tuple containing:
* "window\_mp4\_bytes": MP4 bytes corresponding to each window; only used when PreviewStage is enabled.
* "window\_frames": Decoded, per-window processed frames ready for use by the Qwen model.
* "window\_info": Start and end frame indices for each window in a clip.
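Because the three returned lists describe the same windows, a caller will typically walk them together. The sketch below shows one way to do that with stand-in data. `Window` and `summarize_windows` are hypothetical names, and the handling of a missing bytes entry (for example when `return_bytes` is `False`) is an assumption, since the source does not say whether that list is empty or padded.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Hypothetical stand-in for WindowFrameInfo."""

    start: int
    end: int


def summarize_windows(window_mp4_bytes, window_frames, window_info):
    """Pair up the three per-window lists into one record per window."""
    rows = []
    for i, info in enumerate(window_info):
        # Tolerate shorter lists (assumption about disabled outputs).
        frames = window_frames[i] if i < len(window_frames) else None
        clip = window_mp4_bytes[i] if i < len(window_mp4_bytes) else b""
        rows.append(
            {
                "start": info.start,
                "end": info.end,
                "has_frames": frames is not None,
                "num_bytes": len(clip),
            }
        )
    return rows


# Stand-in values shaped like the documented return tuple.
rows = summarize_windows([], [None, None], [Window(0, 255), Window(256, 400)])
for row in rows:
    print(row)
```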
```python
nemo_curator.utils.windowing_utils.FPS = 2.0
```
```python
nemo_curator.utils.windowing_utils.FPS_MAX_FRAMES = 768
```
```python
nemo_curator.utils.windowing_utils.FPS_MIN_FRAMES = 4
```
```python
nemo_curator.utils.windowing_utils.FRAME_FACTOR = 2
```
```python
nemo_curator.utils.windowing_utils.IMAGE_FACTOR = 28
```
```python
nemo_curator.utils.windowing_utils.MAX_PIXELS = 16384 * 28 * 28
```
```python
nemo_curator.utils.windowing_utils.MAX_RATIO = 200
```
```python
nemo_curator.utils.windowing_utils.MIN_PIXELS = 4 * 28 * 28
```
```python
nemo_curator.utils.windowing_utils.OPENAI_CLIP_MEAN = [0.48145466, 0.4578275, 0.40821073]
```
```python
nemo_curator.utils.windowing_utils.OPENAI_CLIP_STD = [0.26862954, 0.26130258, 0.27577711]
```
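This page lists the CLIP mean and standard deviation without stating how they are applied. Constants of this kind are conventionally used for per-channel normalization of RGB values scaled to [0, 1]; the sketch below shows that convention in plain Python. The helper `normalize_pixel` is hypothetical, and whether this module applies the constants in exactly this way is an assumption.

```python
OPENAI_CLIP_MEAN = [0.48145466, 0.4578275, 0.40821073]
OPENAI_CLIP_STD = [0.26862954, 0.26130258, 0.27577711]


def normalize_pixel(rgb):
    """Scale 0-255 RGB values to [0, 1], then subtract the mean and divide by the std."""
    return [
        (channel / 255.0 - mean) / std
        for channel, mean, std in zip(rgb, OPENAI_CLIP_MEAN, OPENAI_CLIP_STD)
    ]


# A pixel equal to the mean color maps to (approximately) zero in every channel.
mean_pixel = [m * 255.0 for m in OPENAI_CLIP_MEAN]
print(normalize_pixel(mean_pixel))
# A white pixel maps to positive values in every channel.
print(normalize_pixel([255, 255, 255]))
```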
```python
nemo_curator.utils.windowing_utils.VIDEO_MAX_PIXELS = 768 * 28 * 28
```
```python
nemo_curator.utils.windowing_utils.VIDEO_MIN_PIXELS = 128 * 28 * 28
```
```python
nemo_curator.utils.windowing_utils.VIDEO_TOTAL_PIXELS = 24576 * 28 * 28
```
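Each pixel budget on this page is written as a multiple of `28 * 28`, that is, `IMAGE_FACTOR` squared. The short sketch below evaluates the expressions and shows how many 28x28 blocks each budget corresponds to; the `budgets` dictionary is only an illustration built from the values listed here.

```python
IMAGE_FACTOR = 28
BLOCK_AREA = IMAGE_FACTOR * IMAGE_FACTOR  # 784 pixels per 28x28 block

# Expressions copied from the Data section of this page.
budgets = {
    "MIN_PIXELS": 4 * 28 * 28,
    "MAX_PIXELS": 16384 * 28 * 28,
    "VIDEO_MIN_PIXELS": 128 * 28 * 28,
    "VIDEO_MAX_PIXELS": 768 * 28 * 28,
    "VIDEO_TOTAL_PIXELS": 24576 * 28 * 28,
}

for name, pixels in budgets.items():
    print(f"{name}: {pixels} pixels = {pixels // BLOCK_AREA} blocks of 28x28")
```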
```python
nemo_curator.utils.windowing_utils.WINDOW_MIN_FRAMES = 4
```