---
layout: overview
slug: nemo-curator/nemo_curator/utils/windowing_utils
title: nemo_curator.utils.windowing_utils
---

## Module Contents

### Classes

| Name | Description |
| ---- | ----------- |
| [`WindowFrameInfo`](#nemo_curator-utils-windowing_utils-WindowFrameInfo) | Container for frame window information, storing start and end frame indices. |

### Functions

| Name | Description |
| ---- | ----------- |
| [`ceil_by_factor`](#nemo_curator-utils-windowing_utils-ceil_by_factor) | Return the smallest integer greater than or equal to `number` that is divisible by `factor`. |
| [`compute_windows`](#nemo_curator-utils-windowing_utils-compute_windows) | Generate windows by splitting the video into segments of the specified size. |
| [`fetch_video`](#nemo_curator-utils-windowing_utils-fetch_video) | Load and preprocess video frames from a file. |
| [`floor_by_factor`](#nemo_curator-utils-windowing_utils-floor_by_factor) | Return the largest integer less than or equal to `number` that is divisible by `factor`. |
| [`read_video_cpu`](#nemo_curator-utils-windowing_utils-read_video_cpu) | Read video using PyAV. |
| [`round_by_factor`](#nemo_curator-utils-windowing_utils-round_by_factor) | Return the closest integer to `number` that is divisible by `factor`. |
| [`smart_nframes`](#nemo_curator-utils-windowing_utils-smart_nframes) | Calculate the number of frames for video used for model inputs. |
| [`smart_resize`](#nemo_curator-utils-windowing_utils-smart_resize) | Rescales the image so that the following conditions are met. |
| [`split_video_into_windows`](#nemo_curator-utils-windowing_utils-split_video_into_windows) | Calculate windows and return video inputs for language model from input clips. |

### Data

* [`FPS`](#nemo_curator-utils-windowing_utils-FPS)
* [`FPS_MAX_FRAMES`](#nemo_curator-utils-windowing_utils-FPS_MAX_FRAMES)
* [`FPS_MIN_FRAMES`](#nemo_curator-utils-windowing_utils-FPS_MIN_FRAMES)
* [`FRAME_FACTOR`](#nemo_curator-utils-windowing_utils-FRAME_FACTOR)
* [`IMAGE_FACTOR`](#nemo_curator-utils-windowing_utils-IMAGE_FACTOR)
* [`MAX_PIXELS`](#nemo_curator-utils-windowing_utils-MAX_PIXELS)
* [`MAX_RATIO`](#nemo_curator-utils-windowing_utils-MAX_RATIO)
* [`MIN_PIXELS`](#nemo_curator-utils-windowing_utils-MIN_PIXELS)
* [`OPENAI_CLIP_MEAN`](#nemo_curator-utils-windowing_utils-OPENAI_CLIP_MEAN)
* [`OPENAI_CLIP_STD`](#nemo_curator-utils-windowing_utils-OPENAI_CLIP_STD)
* [`VIDEO_MAX_PIXELS`](#nemo_curator-utils-windowing_utils-VIDEO_MAX_PIXELS)
* [`VIDEO_MIN_PIXELS`](#nemo_curator-utils-windowing_utils-VIDEO_MIN_PIXELS)
* [`VIDEO_TOTAL_PIXELS`](#nemo_curator-utils-windowing_utils-VIDEO_TOTAL_PIXELS)
* [`WINDOW_MIN_FRAMES`](#nemo_curator-utils-windowing_utils-WINDOW_MIN_FRAMES)

### API

```python
class nemo_curator.utils.windowing_utils.WindowFrameInfo(
    start: int,
    end: int
)
```

Dataclass

Container for frame window information, storing start and end frame indices. This class represents a window of frames in a video, defined by its start and end frame positions.

```python
nemo_curator.utils.windowing_utils.ceil_by_factor(
    number: float,
    factor: int
) -> int
```

Return the smallest integer greater than or equal to `number` that is divisible by `factor`.

```python
nemo_curator.utils.windowing_utils.compute_windows(
    total_frames: int,
    window_size: int = 128,
    remainder_threshold: int = 64
) -> list[nemo_curator.utils.windowing_utils.WindowFrameInfo]
```

Generate windows by splitting the video into segments of the specified size.

**Parameters:**

* `total_frames`: Total number of frames in the video.
* `window_size`: The size of each window in number of frames.
* `remainder_threshold`: The minimum number of frames required to create a new window from the remainder.

```python
nemo_curator.utils.windowing_utils.fetch_video(
    video_path: str,
    sampling_fps: float = 2.0,
    window_range: list[nemo_curator.utils.windowing_utils.WindowFrameInfo] | None = None,
    do_preprocess: bool = False,
    preprocess_dtype: str = 'float32',
    num_frames_to_use: int = 0,
    flip_input: bool = False
) -> tuple[torch.Tensor, list[int]]
```

Load and preprocess video frames from a file.

**Parameters:**

* `video_path`: Path to the video file.
* `sampling_fps`: Target frames per second for sampling.
* `window_range`: List of frame windows to extract.
* `do_preprocess`: Whether to preprocess the frames.
* `preprocess_dtype`: Data type for preprocessing.
* `num_frames_to_use`: Number of frames to extract (0 for all).
* `flip_input`: Whether to flip frames horizontally.

**Returns:** `tuple[torch.Tensor, list[int]]`

Tuple of (processed frames tensor, frame indices).

```python
nemo_curator.utils.windowing_utils.floor_by_factor(
    number: float,
    factor: int
) -> int
```

Return the largest integer less than or equal to `number` that is divisible by `factor`.

```python
nemo_curator.utils.windowing_utils.read_video_cpu(
    video_path: str,
    fps: float,
    num_frames_to_use: int,
    window_range: list[nemo_curator.utils.windowing_utils.WindowFrameInfo]
) -> tuple[torch.Tensor, list[int]]
```

Read video using PyAV.

**Parameters:**

* `video_path`: Path to the video. Supports `file://`, `http://`, `https://`, and local paths.
* `fps`: Frames per second.
* `num_frames_to_use`: Number of frames to use.
* `window_range`: Window range.

**Returns:** `tuple[torch.Tensor, list[int]]`

`torch.Tensor`: the video tensor with shape (T, C, H, W).

```python
nemo_curator.utils.windowing_utils.round_by_factor(
    number: float,
    factor: int
) -> int
```

Return the closest integer to `number` that is divisible by `factor`.

```python
nemo_curator.utils.windowing_utils.smart_nframes(
    fps: float,
    total_frames: int,
    video_fps: float
) -> int
```

Calculate the number of frames for video used for model inputs.
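
The windowing behaviour described for `compute_windows` can be illustrated with a small standalone sketch. It is not the module's source: `Window` and `windows_sketch` are hypothetical names, and two behaviours are assumptions rather than documented facts, namely that `end` is an inclusive frame index and that a remainder shorter than `remainder_threshold` is folded into the preceding window instead of being dropped. `WINDOW_MIN_FRAMES` is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Hypothetical stand-in for WindowFrameInfo (start and end frame indices)."""
    start: int
    end: int  # assumed inclusive


def windows_sketch(total_frames: int, window_size: int = 128,
                   remainder_threshold: int = 64) -> list[Window]:
    """Split [0, total_frames) into windows of window_size frames."""
    if total_frames <= 0:
        return []

    windows: list[Window] = []
    start = 0
    # Emit as many full windows as fit.
    while total_frames - start >= window_size:
        windows.append(Window(start, start + window_size - 1))
        start += window_size

    remainder = total_frames - start
    if remainder == 0:
        return windows

    if remainder >= remainder_threshold or not windows:
        # Large enough remainder (or a video shorter than one window):
        # it becomes its own window.
        windows.append(Window(start, total_frames - 1))
    else:
        # Assumption: a short remainder extends the last full window.
        windows[-1].end = total_frames - 1
    return windows


print(windows_sketch(300))  # remainder of 44 frames is below the threshold
print(windows_sketch(330))  # remainder of 74 frames forms a third window
```

Under these assumptions, 300 frames yield two windows (0 to 127 and 128 to 299), while 330 frames yield three (0 to 127, 128 to 255, and 256 to 329).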
```python
nemo_curator.utils.windowing_utils.smart_resize(
    height: int,
    width: int,
    factor: int = IMAGE_FACTOR,
    min_pixels: int = MIN_PIXELS,
    max_pixels: int = MAX_PIXELS
) -> tuple[int, int]
```

Rescales the image so that the following conditions are met.

1. Both dimensions (height and width) are divisible by `factor`.
2. The total number of pixels is within the range \[`min_pixels`, `max_pixels`].
3. The aspect ratio of the image is maintained as closely as possible.

```python
nemo_curator.utils.windowing_utils.split_video_into_windows(
    mp4_bytes: bytes,
    window_size: int = 256,
    remainder_threshold: int = 128,
    sampling_fps: float = 2.0,
    model_does_preprocess: bool = False,
    preprocess_dtype: str = 'uint8',
    flip_input: bool = False,
    num_frames_to_use: int = 0,
    return_bytes: bool = False,
    return_video_frames: bool = True,
    num_threads: int = 1
) -> tuple[list[bytes], list[torch.Tensor | None], list[nemo_curator.utils.windowing_utils.WindowFrameInfo]]
```

Calculate windows and return video inputs for language model from input clips.

Processes the video to determine the windows for a clip, decodes it in one shot, and returns processed frames for each window in a format suitable for consumption by the Qwen model.

**Parameters:**

* `mp4_bytes`: Input video in bytes.
* Frames per second of the input video.
* `preprocess_dtype`: Data type to use for preprocessing the video/image inputs.
* `num_frames_to_use`: Number of frames to extract from the video. If 0, uses all frames.
* `flip_input`: Whether to flip the input video/image horizontally.
* `return_bytes`: Whether to extract mp4 bytes for each window for use by PreviewStage.
* `model_does_preprocess`: Whether the model does preprocessing.
* `num_threads`: Number of threads.
* `remainder_threshold`: Threshold for the remainder.
* `return_video_frames`: Whether to return video frames.
* `sampling_fps`: Sampling FPS.
* `window_size`: Window size.

**Returns:** `tuple[list[bytes], list[torch.Tensor | None], list[WindowFrameInfo]]`

Tuple containing:

* "window_mp4_bytes": mp4 bytes corresponding to each window; only used when the Preview stage is enabled
* "window_frames": decoded and per-window processed frames ready for use by the Qwen model
* "window info": start and end frame indices for each window in a clip

```python
nemo_curator.utils.windowing_utils.FPS = 2.0
```

```python
nemo_curator.utils.windowing_utils.FPS_MAX_FRAMES = 768
```

```python
nemo_curator.utils.windowing_utils.FPS_MIN_FRAMES = 4
```

```python
nemo_curator.utils.windowing_utils.FRAME_FACTOR = 2
```

```python
nemo_curator.utils.windowing_utils.IMAGE_FACTOR = 28
```

```python
nemo_curator.utils.windowing_utils.MAX_PIXELS = 16384 * 28 * 28
```

```python
nemo_curator.utils.windowing_utils.MAX_RATIO = 200
```

```python
nemo_curator.utils.windowing_utils.MIN_PIXELS = 4 * 28 * 28
```

```python
nemo_curator.utils.windowing_utils.OPENAI_CLIP_MEAN = [0.48145466, 0.4578275, 0.40821073]
```

```python
nemo_curator.utils.windowing_utils.OPENAI_CLIP_STD = [0.26862954, 0.26130258, 0.27577711]
```

```python
nemo_curator.utils.windowing_utils.VIDEO_MAX_PIXELS = 768 * 28 * 28
```

```python
nemo_curator.utils.windowing_utils.VIDEO_MIN_PIXELS = 128 * 28 * 28
```

```python
nemo_curator.utils.windowing_utils.VIDEO_TOTAL_PIXELS = 24576 * 28 * 28
```

```python
nemo_curator.utils.windowing_utils.WINDOW_MIN_FRAMES = 4
```
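
To make the three conditions listed for `smart_resize` concrete, the following standalone sketch shows one way they could be satisfied using the rounding helpers and the `IMAGE_FACTOR`, `MIN_PIXELS`, and `MAX_PIXELS` values documented above. It is an illustration under stated assumptions, not the module's implementation: the helper names ending in `_sketch` are hypothetical, the scaling strategy (round first, then rescale by a square-root factor when out of range) is an assumption, and `MAX_RATIO` is not modelled.

```python
import math

# Values copied from the documented constants.
IMAGE_FACTOR = 28
MIN_PIXELS = 4 * 28 * 28
MAX_PIXELS = 16384 * 28 * 28


def round_sketch(number: float, factor: int) -> int:
    """Closest multiple of factor (ties follow Python's round())."""
    return round(number / factor) * factor


def ceil_sketch(number: float, factor: int) -> int:
    """Smallest multiple of factor that is >= number."""
    return math.ceil(number / factor) * factor


def floor_sketch(number: float, factor: int) -> int:
    """Largest multiple of factor that is <= number."""
    return math.floor(number / factor) * factor


def resize_sketch(height: int, width: int, factor: int = IMAGE_FACTOR,
                  min_pixels: int = MIN_PIXELS,
                  max_pixels: int = MAX_PIXELS) -> tuple[int, int]:
    # Condition 1: snap both sides to a multiple of factor (at least one factor).
    h = max(factor, round_sketch(height, factor))
    w = max(factor, round_sketch(width, factor))
    # Condition 2: rescale both sides by the same ratio if out of range,
    # which keeps the aspect ratio approximately intact (condition 3).
    if h * w > max_pixels:
        beta = math.sqrt(height * width / max_pixels)
        h = floor_sketch(height / beta, factor)
        w = floor_sketch(width / beta, factor)
    elif h * w < min_pixels:
        beta = math.sqrt(min_pixels / (height * width))
        h = ceil_sketch(height * beta, factor)
        w = ceil_sketch(width * beta, factor)
    return h, w


print(resize_sketch(1080, 1920))  # (1092, 1932): already within range
print(resize_sketch(1080, 1920, max_pixels=768 * 28 * 28))  # shrunk to fit
```

With the default bounds a 1080 by 1920 frame only needs snapping to multiples of 28. Passing the smaller `VIDEO_MAX_PIXELS` value as `max_pixels` forces the downscaling branch.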