nemo_automodel.components.datasets.multimodal.transforms

Image transforms for BAGEL’s NaViT-style aspect-ratio-aware resize.

Module Contents

Classes

Name	Description
`ImageTransform`	Full BAGEL image transform: resize + to_tensor + normalize.
`MaxLongEdgeMinShortEdgeResize`	Resize so longest/shortest edges stay within bounds and both edges are stride-divisible.

Functions

Name	Description
`_require_torchvision`	-

API

class nemo_automodel.components.datasets.multimodal.transforms.ImageTransform(
    max_image_size,
    min_image_size,
    image_stride,
    max_pixels = 14 * 14 * 9 * 1024,
    image_mean = (0.5, 0.5, 0.5),
    image_std = (0.5, 0.5, 0.5)
)

Full BAGEL image transform: resize + to_tensor + normalize.

Used for both ViT input (stride=14) and VAE input (stride=16), each through its own instance. `stride` is exposed as an attribute so the dataset can compute patch counts without knowing the transform class.
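As a rough illustration of why the stride matters to the dataset, the sketch below counts patches for an image whose edges are already stride-divisible, and works out the patch budget implied by the default `max_pixels`. The helper `patch_count` is hypothetical and not part of this module; the image size is an arbitrary example.

```python
def patch_count(height: int, width: int, stride: int) -> int:
    """Count stride x stride patches in an image with stride-divisible edges.

    Hypothetical helper for illustration only; not part of the module.
    """
    if height % stride or width % stride:
        raise ValueError("both edges must be divisible by stride")
    return (height // stride) * (width // stride)


# Default taken from the ImageTransform signature.
DEFAULT_MAX_PIXELS = 14 * 14 * 9 * 1024

# A 448 x 672 image at stride 14 is a 32 x 48 grid of patches.
print(patch_count(448, 672, 14))        # 1536

# At stride 14, the default pixel budget corresponds to 9 * 1024 patches.
print(DEFAULT_MAX_PIXELS // (14 * 14))  # 9216
```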

normalize_transform

resize_transform

to_tensor_transform = tv_transforms.ToTensor()

nemo_automodel.components.datasets.multimodal.transforms.ImageTransform.__call__(
    img,
    img_num = 1
)
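The effect of the default `image_mean` and `image_std` can be checked by hand. The sketch below assumes that the to-tensor step scales 8-bit pixel values to the range [0, 1] (as torchvision's `ToTensor` does for 8-bit images) and that the normalize step computes `(x - mean) / std` per channel (as torchvision's `Normalize` does). Under those assumptions the defaults map pixel values into [-1, 1]. The function `normalize_pixel` is hypothetical and only mirrors the arithmetic for a single channel value.

```python
def normalize_pixel(value_uint8: int, mean: float = 0.5, std: float = 0.5) -> float:
    """Apply ToTensor-style scaling followed by Normalize-style standardization.

    Hypothetical single-value helper; it mirrors the arithmetic only.
    """
    scaled = value_uint8 / 255.0   # scale 0..255 to 0.0..1.0
    return (scaled - mean) / std   # shift and rescale by mean and std


print(normalize_pixel(0))    # -1.0
print(normalize_pixel(255))  # 1.0
```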

class nemo_automodel.components.datasets.multimodal.transforms.MaxLongEdgeMinShortEdgeResize(
    max_size: int,
    min_size: int,
    stride: int,
    max_pixels: int,
    interpolation = None,
    antialias = True
)

Bases: Module

Resize so longest/shortest edges stay within bounds and both edges are stride-divisible.

Parameters:

max_size

int

Maximum size for the longest edge.

min_size

int

Minimum size for the shortest edge.

stride

int

Value that both edges must be divisible by (for example, the ViT patch size).

max_pixels

int

Maximum total pixels for the full image.

interpolation

Defaults to None

Torchvision interpolation mode. If None, bicubic interpolation is used.

antialias

Defaults to True

Whether to apply antialiasing.
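One plausible way to turn these parameters into a target size is sketched below in plain Python. This is an assumption-laden illustration, not the class's confirmed logic: the function names, the order of the steps, and the round-to-nearest rule are all hypothetical, and the numeric bounds in the usage lines are arbitrary example values. Because edges are rounded to the nearest stride multiple, the bounds hold only approximately.

```python
def make_divisible(value: float, stride: int) -> int:
    """Round to the nearest multiple of stride, never below one stride."""
    return max(stride, int(round(value / stride)) * stride)


def apply_scale(width, height, scale, stride):
    """Scale both edges, then snap each to a stride multiple."""
    return make_divisible(width * scale, stride), make_divisible(height * scale, stride)


def target_size(width, height, *, max_size, min_size, stride, max_pixels):
    """Hypothetical target-size computation; returns (new_width, new_height)."""
    # 1. Shrink (never enlarge) so the longest edge fits within max_size.
    scale = min(max_size / max(width, height), 1.0)
    # 2. Enlarge if needed so the shortest edge reaches min_size.
    scale = max(scale, min_size / min(width, height))
    new_w, new_h = apply_scale(width, height, scale, stride)
    # 3. If the area exceeds the pixel budget, shrink both edges by the
    #    square root of the area ratio.
    if new_w * new_h > max_pixels:
        shrink = (max_pixels / (new_w * new_h)) ** 0.5
        new_w, new_h = apply_scale(new_w, new_h, shrink, stride)
    # 4. Step 2 can push a very elongated image past max_size; shrink again.
    if max(new_w, new_h) > max_size:
        shrink = max_size / max(new_w, new_h)
        new_w, new_h = apply_scale(new_w, new_h, shrink, stride)
    return new_w, new_h


# Example bounds (arbitrary values chosen for illustration).
bounds = dict(max_size=980, min_size=378, stride=14, max_pixels=14 * 14 * 9 * 1024)
print(target_size(1920, 1080, **bounds))  # (980, 546)
print(target_size(100, 200, **bounds))    # (378, 756)
```

In both examples the resulting edges are multiples of 14, and the aspect ratio is preserved only up to the rounding introduced by the stride constraint.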

nemo_automodel.components.datasets.multimodal.transforms.MaxLongEdgeMinShortEdgeResize._apply_scale(
    width,
    height,
    scale
)

nemo_automodel.components.datasets.multimodal.transforms.MaxLongEdgeMinShortEdgeResize._make_divisible(
    value,
    stride
)

nemo_automodel.components.datasets.multimodal.transforms.MaxLongEdgeMinShortEdgeResize.forward(
    img,
    img_num = 1
)

nemo_automodel.components.datasets.multimodal.transforms._require_torchvision()