
# nemo_automodel.components.datasets.multimodal.utils

Data utilities: tokenization, patchify, position IDs, attention masks.

## Module Contents

### Functions

| Name                                                                                                                                    | Description                                                                   |
| --------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------- |
| [`add_special_tokens`](#nemo_automodel-components-datasets-multimodal-utils-add_special_tokens)                                         | Add BAGEL's four special tokens to a Qwen2 tokenizer and return the ids.      |
| [`get_flattened_position_ids_extrapolate`](#nemo_automodel-components-datasets-multimodal-utils-get_flattened_position_ids_extrapolate) | Return flattened 2D patch position IDs by direct grid extrapolation.          |
| [`get_flattened_position_ids_interpolate`](#nemo_automodel-components-datasets-multimodal-utils-get_flattened_position_ids_interpolate) | Return flattened 2D patch position IDs by interpolating to the max grid.      |
| [`len2weight`](#nemo_automodel-components-datasets-multimodal-utils-len2weight)                                                         | Convert a sequence length into BAGEL's per-sample loss weight.                |
| [`patchify`](#nemo_automodel-components-datasets-multimodal-utils-patchify)                                                             | Patchify a CxHxW image tensor into `(H/p * W/p, p*p*C)`.                     |
| [`pil_img2rgb`](#nemo_automodel-components-datasets-multimodal-utils-pil_img2rgb)                                                       | Convert a PIL image to RGB, compositing transparent pixels on white.          |
| [`prepare_attention_mask_per_sample`](#nemo_automodel-components-datasets-multimodal-utils-prepare_attention_mask_per_sample)           | Build a per-sample additive float mask honoring causal / full / noise splits. |
| [`split_integer_exp_decay`](#nemo_automodel-components-datasets-multimodal-utils-split_integer_exp_decay)                               | Split an integer into random positive chunks with optional exponential decay. |

### API

```python
nemo_automodel.components.datasets.multimodal.utils.add_special_tokens(
    tokenizer
)
```

Add BAGEL's four special tokens to a Qwen2 tokenizer and return the ids.

**Returns:**

A tuple `(tokenizer, new_token_ids, num_new_tokens)`.

```python
nemo_automodel.components.datasets.multimodal.utils.get_flattened_position_ids_extrapolate(
    img_h,
    img_w,
    patch_size,
    max_num_patches_per_side
)
```

Return flattened 2D patch position IDs by direct grid extrapolation.
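
As a rough illustration of what "direct grid extrapolation" could mean, the sketch below gives each patch the row-major index it would have on a grid that is `max_num_patches_per_side` patches wide. The name `extrapolate_position_ids` and the plain-list return type are hypothetical; the library function may return a tensor and may differ in detail.

```python
def extrapolate_position_ids(img_h, img_w, patch_size, max_num_patches_per_side):
    """Hypothetical sketch: row-major patch IDs on the maximum grid."""
    num_patches_h = img_h // patch_size
    num_patches_w = img_w // patch_size
    # A patch at (row, col) keeps its own coordinates; only the row stride
    # comes from the maximum grid width.
    return [
        row * max_num_patches_per_side + col
        for row in range(num_patches_h)
        for col in range(num_patches_w)
    ]


ids = extrapolate_position_ids(img_h=4, img_w=6, patch_size=2, max_num_patches_per_side=4)
print(ids)  # [0, 1, 2, 4, 5, 6]
```

Under this reading, a small image occupies only the top-left corner of the maximum grid.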

```python
nemo_automodel.components.datasets.multimodal.utils.get_flattened_position_ids_interpolate(
    img_h,
    img_w,
    patch_size,
    max_num_patches_per_side
)
```

Return flattened 2D patch position IDs by interpolating to the max grid.
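
One plausible reading of "interpolating to the max grid" is that each patch's fractional coordinate is stretched over the full maximum grid and then bucketed to an integer cell. The sketch below follows that assumption with integer arithmetic; `interpolate_position_ids` is a hypothetical name, and the actual bucketing rule used by the library is not confirmed by this page.

```python
def interpolate_position_ids(img_h, img_w, patch_size, max_num_patches_per_side):
    """Hypothetical sketch: spread patch coordinates over the maximum grid."""
    num_patches_h = img_h // patch_size
    num_patches_w = img_w // patch_size
    # Patch i of n sits at fraction i/n of the side; its cell on the maximum
    # grid is floor(i / n * max), computed here without floating point.
    rows = [(i * max_num_patches_per_side) // num_patches_h for i in range(num_patches_h)]
    cols = [(j * max_num_patches_per_side) // num_patches_w for j in range(num_patches_w)]
    return [r * max_num_patches_per_side + c for r in rows for c in cols]


ids = interpolate_position_ids(img_h=4, img_w=4, patch_size=2, max_num_patches_per_side=4)
print(ids)  # [0, 2, 8, 10]
```

In contrast to extrapolation, a small image here spans the whole maximum grid, so its IDs are spaced out instead of clustered in one corner.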

```python
nemo_automodel.components.datasets.multimodal.utils.len2weight(
    x,
    loss_reduction = 'square'
)
```

Convert a sequence length into BAGEL's per-sample loss weight.
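
This page documents only the default `loss_reduction = 'square'` and does not give the weighting formulas. The sketch below assumes that `'square'` means the inverse square root of the length, and that `'token'` and `'sample'` modes also exist with weights `1` and `1 / length`. All three formulas, the extra mode names, and the name `length_to_weight` are assumptions for illustration only.

```python
import math


def length_to_weight(length, loss_reduction="square"):
    """Hypothetical sketch of a length-dependent per-sample loss weight."""
    if length == 0:
        return 0.0  # an empty sequence contributes nothing
    if loss_reduction == "token":
        return 1.0  # assumed: every token counts equally
    if loss_reduction == "sample":
        return 1.0 / length  # assumed: every sample counts equally
    if loss_reduction == "square":
        return 1.0 / math.sqrt(length)  # assumed: compromise between the two
    raise ValueError(f"unknown loss_reduction: {loss_reduction!r}")


print(length_to_weight(16))            # 0.25
print(length_to_weight(16, "sample"))  # 0.0625
```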

```python
nemo_automodel.components.datasets.multimodal.utils.patchify(
    image,
    patch_size
)
```

Patchify a CxHxW image tensor into `(H/p * W/p, p*p*C)`.
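
To make the output shape concrete, here is a dependency-free sketch in which nested lists stand in for the image tensor. It produces `H/p * W/p` rows of length `p*p*C`, as the summary states. The ordering of values inside each row (patch row, then patch column, then channel) is an assumption, and `patchify_lists` is a hypothetical name.

```python
def patchify_lists(image, patch_size):
    """Hypothetical sketch: image[c][y][x] -> list of flattened patches."""
    channels = len(image)
    height = len(image[0])
    width = len(image[0][0])
    p = patch_size
    if height % p or width % p:
        raise ValueError("height and width must be divisible by patch_size")
    patches = []
    for top in range(0, height, p):          # patches in row-major order
        for left in range(0, width, p):
            patches.append([
                image[c][top + dy][left + dx]
                for dy in range(p)
                for dx in range(p)
                for c in range(channels)      # channel varies fastest (assumed)
            ])
    return patches


# One channel, 2 x 4 pixels, split into two 2 x 2 patches.
image = [[[0, 1, 2, 3],
          [4, 5, 6, 7]]]
print(patchify_lists(image, 2))  # [[0, 1, 4, 5], [2, 3, 6, 7]]
```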

```python
nemo_automodel.components.datasets.multimodal.utils.pil_img2rgb(
    image
)
```

Convert a PIL image to RGB, compositing transparent pixels on white.
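
The per-pixel arithmetic behind "compositing transparent pixels on white" can be sketched without any imaging library. The helper `composite_on_white` below is hypothetical and operates on a single 8-bit RGBA tuple; it illustrates standard alpha blending against a white background, not the library's actual code path.

```python
def composite_on_white(rgba):
    """Hypothetical sketch: blend one 8-bit RGBA pixel over white."""
    r, g, b, a = rgba

    def blend(channel):
        # out = channel * alpha + white * (1 - alpha), with alpha = a / 255,
        # rounded to the nearest integer using integer arithmetic.
        return (channel * a + 255 * (255 - a) + 127) // 255

    return (blend(r), blend(g), blend(b))


print(composite_on_white((255, 0, 0, 0)))     # (255, 255, 255): fully transparent
print(composite_on_white((10, 20, 30, 255)))  # (10, 20, 30): fully opaque
print(composite_on_white((0, 0, 0, 128)))     # (127, 127, 127): half-transparent black
```

Compositing on white avoids the black fringes that can appear when the alpha channel is simply dropped.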

```python
nemo_automodel.components.datasets.multimodal.utils.prepare_attention_mask_per_sample(
    split_lens,
    attn_modes,
    device = 'cpu'
)
```

Build a per-sample additive float mask honoring causal / full / noise splits.
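
The sketch below shows one way such a mask could be assembled, using nested lists in place of a tensor. It assumes the following semantics, none of which are spelled out on this page: every split sees all earlier splits; a `causal` split is lower-triangular within itself; `full` and `noise` splits are bidirectional within themselves; and tokens in a `noise` split are hidden from every other split. Allowed positions get `0.0` and blocked positions get `-inf`. The name `build_additive_mask` is hypothetical.

```python
NEG_INF = float("-inf")


def build_additive_mask(split_lens, attn_modes):
    """Hypothetical sketch of a per-sample additive attention mask."""
    total = sum(split_lens)
    allowed = [[False] * total for _ in range(total)]
    spans = []
    start = 0
    for length, mode in zip(split_lens, attn_modes):
        if mode not in ("causal", "full", "noise"):
            raise ValueError(f"unknown attention mode: {mode!r}")
        end = start + length
        for q in range(start, end):
            for k in range(end):
                # Earlier splits are always visible; inside a causal split a
                # token sees only itself and earlier tokens.
                allowed[q][k] = k < start or mode != "causal" or k <= q
        spans.append((start, end, mode))
        start = end
    # Second pass: hide noise splits from every token outside them (assumed).
    for s, e, mode in spans:
        if mode != "noise":
            continue
        for q in range(total):
            if not s <= q < e:
                for k in range(s, e):
                    allowed[q][k] = False
    return [[0.0 if ok else NEG_INF for ok in row] for row in allowed]


mask = build_additive_mask([1, 2, 1], ["causal", "noise", "causal"])
print(mask[3])  # [0.0, -inf, -inf, 0.0]: the last token skips the noise split
```

An additive mask of this form is added to the attention logits before the softmax, so `-inf` entries receive zero attention weight.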

```python
nemo_automodel.components.datasets.multimodal.utils.split_integer_exp_decay(
    S,
    ng_sample_decay = 1.0
)
```

Split an integer into random positive chunks with optional exponential decay.
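
A minimal sketch of one possible scheme follows: first draw the number of chunks, with the probability of `n` chunks proportional to `decay ** (n - 1)`, then place random cut points. With `decay = 1.0` every chunk count is equally likely. The name `split_with_decay`, the `rng` parameter, the exact weighting, and the return type are assumptions; the library function may return additional values.

```python
import random


def split_with_decay(total, decay=1.0, rng=random):
    """Hypothetical sketch: split `total` into random positive chunks."""
    if total < 1:
        raise ValueError("total must be a positive integer")
    counts = list(range(1, total + 1))
    # Smaller decay values favor fewer chunks (assumed weighting).
    weights = [decay ** (n - 1) for n in counts]
    num_chunks = rng.choices(counts, weights=weights, k=1)[0]
    # Distinct interior cut points guarantee that every chunk is positive.
    cuts = sorted(rng.sample(range(1, total), num_chunks - 1))
    bounds = [0] + cuts + [total]
    return [b - a for a, b in zip(bounds, bounds[1:])]


chunks = split_with_decay(10, decay=0.5, rng=random.Random(0))
print(sum(chunks), all(c > 0 for c in chunks))  # 10 True
```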