
# nemo_automodel.components.datasets.multimodal.transforms

Image transforms for BAGEL's NaViT-style aspect-ratio-aware resize.

## Module Contents

### Classes

| Name                                                                                                                       | Description                                                                              |
| -------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------- |
| [`ImageTransform`](#nemo_automodel-components-datasets-multimodal-transforms-ImageTransform)                               | Full BAGEL image transform: resize + to\_tensor + normalize.                             |
| [`MaxLongEdgeMinShortEdgeResize`](#nemo_automodel-components-datasets-multimodal-transforms-MaxLongEdgeMinShortEdgeResize) | Resize so longest/shortest edges stay within bounds and both edges are stride-divisible. |

### Functions

| Name                                                                                                     | Description |
| -------------------------------------------------------------------------------------------------------- | ----------- |
| [`_require_torchvision`](#nemo_automodel-components-datasets-multimodal-transforms-_require_torchvision) | -           |

### API

```python
class nemo_automodel.components.datasets.multimodal.transforms.ImageTransform(
    max_image_size,
    min_image_size,
    image_stride,
    max_pixels = 14 * 14 * 9 * 1024,
    image_mean = (0.5, 0.5, 0.5),
    image_std = (0.5, 0.5, 0.5)
)
```

Full BAGEL image transform: resize + to\_tensor + normalize.

Used for both ViT input (stride=14) and VAE input (stride=16), with a
separate instance for each. `stride` is exposed as an attribute so the
dataset can compute patch counts without knowing the transform class.
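As an illustration of how a dataset might use the `stride` attribute, the sketch below derives a patch grid from an already resized image. `patch_grid` is a hypothetical helper, not part of this module, and it assumes both edges are already stride-divisible, as the resize step is described as guaranteeing.

```python
# Default pixel budget from the ImageTransform signature.
DEFAULT_MAX_PIXELS = 14 * 14 * 9 * 1024


def patch_grid(height: int, width: int, stride: int) -> tuple[int, int]:
    """Return (rows, cols) of stride-sized patches (hypothetical helper)."""
    if height % stride or width % stride:
        raise ValueError("both edges must be divisible by the stride")
    return height // stride, width // stride


# A 980x490 (width x height) image at the ViT stride of 14.
rows, cols = patch_grid(height=490, width=980, stride=14)
num_patches = rows * cols

# With stride=14, the default max_pixels budget caps the patch count.
max_patches = DEFAULT_MAX_PIXELS // (14 * 14)
```

Under these assumptions, the default `max_pixels` corresponds to at most 9 * 1024 = 9216 patches at stride 14.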

```python
nemo_automodel.components.datasets.multimodal.transforms.ImageTransform.__call__(
    img,
    img_num = 1
)
```

```python
class nemo_automodel.components.datasets.multimodal.transforms.MaxLongEdgeMinShortEdgeResize(
    max_size: int,
    min_size: int,
    stride: int,
    max_pixels: int,
    interpolation = None,
    antialias = True
)
```

**Bases:** `Module`

Resize so longest/shortest edges stay within bounds and both edges are stride-divisible.

**Parameters:**

- `max_size` (`int`): Maximum size for the longest edge.
- `min_size` (`int`): Minimum size for the shortest edge.
- `stride` (`int`): Value both edges must be divisible by (ViT patch size).
- `max_pixels` (`int`): Maximum total pixels for the full image.
- `interpolation`: Torchvision interpolation mode (default bicubic).
- `antialias`: Whether to apply antialiasing.

```python
nemo_automodel.components.datasets.multimodal.transforms.MaxLongEdgeMinShortEdgeResize._apply_scale(
    width,
    height,
    scale
)
```

```python
nemo_automodel.components.datasets.multimodal.transforms.MaxLongEdgeMinShortEdgeResize._make_divisible(
    value,
    stride
)
```

```python
nemo_automodel.components.datasets.multimodal.transforms.MaxLongEdgeMinShortEdgeResize.forward(
    img,
    img_num = 1
)
```

```python
nemo_automodel.components.datasets.multimodal.transforms._require_torchvision()
```