nemo_automodel.components.datasets.multimodal.transforms
Image transforms for BAGEL’s NaViT-style aspect-ratio-aware resize.
Module Contents
Classes
Functions
API
Full BAGEL image transform: resize + to_tensor + normalize.
Separate instances are used for the ViT input (stride=14) and the VAE input
(stride=16). stride is exposed as an attribute so that the dataset can
compute patch counts without knowing the transform class.
normalize_transform
resize_transform
to_tensor_transform
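As a rough illustration of why the exposed stride attribute is useful, the sketch below computes a patch count from an already-resized image size and a stride. The helper name count_patches is hypothetical and is not part of this module; it only assumes that both edges are stride-divisible, as the resize step is described as guaranteeing.

```python
def count_patches(height: int, width: int, stride: int) -> int:
    """Hypothetical helper: number of stride x stride patches in an image.

    Assumes both edges are already divisible by `stride`.
    """
    if height % stride or width % stride:
        raise ValueError("both edges must be divisible by stride")
    # One patch per stride-sized cell along each axis.
    return (height // stride) * (width // stride)


# ViT-style input with stride=14 and VAE-style input with stride=16.
vit_patches = count_patches(224, 224, stride=14)
vae_patches = count_patches(512, 768, stride=16)
```

A dataset could call such a helper with transform.stride, which keeps the dataset code independent of the concrete transform class.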
Bases: Module
Resize an image so that its longest and shortest edges stay within the configured bounds and both edges are divisible by stride.
Parameters:
max_size
Maximum size for the longest edge.
min_size
Minimum size for the shortest edge.
stride
Value that both edge lengths must be divisible by (the ViT patch size).
max_pixels
Maximum total pixels for the full image.
interpolation
Torchvision interpolation mode (default bicubic).
antialias
Whether to apply antialiasing.
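To make the interaction of these parameters concrete, here is a minimal sketch of one way a stride-aligned target size could be derived from max_size, min_size, stride and max_pixels. It is not the implementation in this module: the function name stride_aligned_size, the order of the steps and the rounding policy are all assumptions chosen for illustration, and it assumes max_size is at least stride.

```python
import math


def stride_aligned_size(width, height, *, max_size, min_size, stride, max_pixels):
    """Hypothetical sketch: pick a (width, height) that honours the bounds."""
    # Shrink (never enlarge here) so the longest edge fits within max_size.
    scale = min(max_size / max(width, height), 1.0)
    # Enlarge if the shortest edge would otherwise fall below min_size.
    scale = max(scale, min_size / min(width, height))
    # Shrink further if the scaled area would exceed max_pixels.
    area = width * height * scale * scale
    if area > max_pixels:
        scale *= math.sqrt(max_pixels / area)

    def snap(edge):
        # Round to the nearest multiple of stride, clamped to [stride, max_size].
        multiples = round(edge * scale / stride)
        return min(max(multiples, 1), max_size // stride) * stride

    new_w, new_h = snap(width), snap(height)
    # Rounding up can overshoot max_pixels; trim the longer edge until it fits.
    while new_w * new_h > max_pixels and max(new_w, new_h) > stride:
        if new_w >= new_h:
            new_w -= stride
        else:
            new_h -= stride
    return new_w, new_h


wide = stride_aligned_size(1000, 500, max_size=980, min_size=224,
                           stride=14, max_pixels=10**7)
small = stride_aligned_size(100, 50, max_size=980, min_size=224,
                            stride=14, max_pixels=10**7)
capped = stride_aligned_size(1000, 1000, max_size=980, min_size=224,
                             stride=14, max_pixels=200_000)
```

In this sketch the bounds can conflict (for example, a very elongated image cannot satisfy both max_size and min_size), in which case the later steps win. The resulting size could then be passed as (height, width) to torchvision.transforms.functional.resize together with the interpolation and antialias settings.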