nemo_automodel.components.datasets.multimodal.utils


Data utilities: tokenization, patchify, position IDs, attention masks.

Module Contents

Functions

| Name | Description |
| --- | --- |
| `add_special_tokens` | Add BAGEL’s four special tokens to a Qwen2 tokenizer and return the ids. |
| `get_flattened_position_ids_extrapolate` | Return flattened 2D patch position IDs by direct grid extrapolation. |
| `get_flattened_position_ids_interpolate` | Return flattened 2D patch position IDs by interpolating to the max grid. |
| `len2weight` | Convert a sequence length into BAGEL’s per-sample loss weight. |
| `patchify` | Patchify a CxHxW image tensor into `(H/p * W/p, p*p*C)`. |
| `pil_img2rgb` | Convert a PIL image to RGB, compositing transparent pixels on white. |
| `prepare_attention_mask_per_sample` | Build a per-sample additive float mask honoring causal / full / noise splits. |
| `split_integer_exp_decay` | Split an integer into random positive chunks with optional exponential decay. |

API

nemo_automodel.components.datasets.multimodal.utils.add_special_tokens(
tokenizer
)

Add BAGEL’s four special tokens to a Qwen2 tokenizer and return the ids.

Returns:

A tuple (tokenizer, new_token_ids, num_new_tokens).

nemo_automodel.components.datasets.multimodal.utils.get_flattened_position_ids_extrapolate(
img_h,
img_w,
patch_size,
max_num_patches_per_side
)

Return flattened 2D patch position IDs by direct grid extrapolation.
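As a rough illustration of what "direct grid extrapolation" could look like, the sketch below gives the patch at row r, column c the ID `r * max_num_patches_per_side + c`. The row-major indexing is an assumption, not something this page confirms; the name `sketch_position_ids_extrapolate` is hypothetical, and the sketch returns a plain list where the real function presumably returns a tensor.

```python
def sketch_position_ids_extrapolate(img_h, img_w, patch_size, max_num_patches_per_side):
    """Hypothetical stand-in: row-major IDs on a grid that is max_num_patches_per_side wide."""
    rows = img_h // patch_size  # number of patches along the height
    cols = img_w // patch_size  # number of patches along the width
    # Assumed scheme: each patch keeps its own (row, col) index, and the row
    # stride is the maximum grid width rather than the image's own width.
    return [r * max_num_patches_per_side + c for r in range(rows) for c in range(cols)]


# A 28x42 image with 14-pixel patches is a 2x3 grid; the stride is 4.
print(sketch_position_ids_extrapolate(28, 42, 14, 4))  # [0, 1, 2, 4, 5, 6]
```

Under this reading, an image larger than the maximum grid would simply produce IDs past the range seen for smaller images, which is presumably where the "extrapolate" name comes from.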

nemo_automodel.components.datasets.multimodal.utils.get_flattened_position_ids_interpolate(
img_h,
img_w,
patch_size,
max_num_patches_per_side
)

Return flattened 2D patch position IDs by interpolating to the max grid.
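One plausible reading of "interpolating to the max grid" is that each patch's fractional position along an axis is mapped to one of `max_num_patches_per_side` equal-width buckets, so every image spans the full ID range regardless of its size. The sketch below follows that assumption with integer arithmetic; the name `sketch_position_ids_interpolate` is hypothetical and the real function may bucket differently.

```python
def sketch_position_ids_interpolate(img_h, img_w, patch_size, max_num_patches_per_side):
    """Hypothetical stand-in: stretch the image's patch grid onto the max grid."""
    rows = img_h // patch_size
    cols = img_w // patch_size
    m = max_num_patches_per_side
    # Assumed scheme: patch i of n sits at fraction i/n of the axis, which
    # falls into bucket floor(i/n * m) on the max grid.
    row_bucket = [(r * m) // rows for r in range(rows)]
    col_bucket = [(c * m) // cols for c in range(cols)]
    return [rb * m + cb for rb in row_bucket for cb in col_bucket]


# A 2x2 patch grid stretched onto a 4x4 max grid lands on buckets 0 and 2.
print(sketch_position_ids_interpolate(28, 28, 14, 4))  # [0, 2, 8, 10]
```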

nemo_automodel.components.datasets.multimodal.utils.len2weight(
x,
loss_reduction = 'square'
)

Convert a sequence length into BAGEL’s per-sample loss weight.

nemo_automodel.components.datasets.multimodal.utils.patchify(
image,
patch_size
)

Patchify a CxHxW image tensor into `(H/p * W/p, p*p*C)`.
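To make the output shape concrete, here is a minimal pure-Python sketch that cuts a channel-first image (nested lists standing in for a tensor) into non-overlapping p x p patches and flattens each one to p*p*C values. The name `sketch_patchify` is hypothetical, and the ordering of values inside a patch (row, then column, then channel) is an assumption.

```python
def sketch_patchify(image, patch_size):
    """Hypothetical stand-in: image[c][y][x] -> list of H/p * W/p patches of p*p*C values."""
    p = patch_size
    c, h, w = len(image), len(image[0]), len(image[0][0])
    if h % p or w % p:
        raise ValueError("height and width must be divisible by patch_size")
    patches = []
    for top in range(0, h, p):        # patch rows, top to bottom
        for left in range(0, w, p):   # patch columns, left to right
            patches.append([
                image[ch][top + dy][left + dx]
                for dy in range(p)    # assumed order inside a patch:
                for dx in range(p)    # row, then column,
                for ch in range(c)    # then channel
            ])
    return patches


# A 1-channel 2x4 image with p = 2 gives 2 patches of 2*2*1 = 4 values each.
image = [[[0, 1, 2, 3],
          [4, 5, 6, 7]]]
print(sketch_patchify(image, 2))  # [[0, 1, 4, 5], [2, 3, 6, 7]]
```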

nemo_automodel.components.datasets.multimodal.utils.pil_img2rgb(
image
)

Convert a PIL image to RGB, compositing transparent pixels on white.

nemo_automodel.components.datasets.multimodal.utils.prepare_attention_mask_per_sample(
split_lens,
attn_modes,
device = 'cpu'
)

Build a per-sample additive float mask honoring causal / full / noise splits.
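The sketch below shows one way such an additive mask could be assembled, using nested lists in place of a tensor. The semantics are assumptions rather than documented behavior: every split sees all earlier tokens, a "causal" split is lower-triangular within itself, "full" and "noise" splits attend bidirectionally within themselves, and a "noise" split is hidden from every other split. The name `sketch_attention_mask` is hypothetical.

```python
NEG_INF = float("-inf")


def sketch_attention_mask(split_lens, attn_modes):
    """Hypothetical stand-in: 0.0 where attention is allowed, -inf where it is blocked."""
    n = sum(split_lens)
    allowed = [[False] * n for _ in range(n)]
    spans, start = [], 0
    for length, mode in zip(split_lens, attn_modes):
        if mode not in ("causal", "full", "noise"):
            raise ValueError(f"unknown attention mode: {mode}")
        end = start + length
        for q in range(start, end):
            for k in range(end):
                # Earlier splits are always visible; inside the split,
                # "causal" restricts keys to positions k <= q.
                allowed[q][k] = k < start or mode != "causal" or k <= q
        spans.append((start, end, mode))
        start = end
    for s, e, mode in spans:
        if mode == "noise":
            # Assumed rule: only the noise split itself may attend to its tokens.
            for q in range(n):
                for k in range(s, e):
                    allowed[q][k] = s <= q < e
    return [[0.0 if ok else NEG_INF for ok in row] for row in allowed]


mask = sketch_attention_mask([2, 2], ["causal", "full"])
print(mask[0])  # [0.0, -inf, -inf, -inf]
print(mask[2])  # [0.0, 0.0, 0.0, 0.0]
```

An additive mask like this is added to the attention logits before the softmax, so blocked positions receive zero probability.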

nemo_automodel.components.datasets.multimodal.utils.split_integer_exp_decay(
S,
ng_sample_decay = 1.0
)

Split an integer into random positive chunks with optional exponential decay.
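A minimal sketch of the idea, under stated assumptions: the number of chunks N is drawn with probability proportional to `decay ** (N - 1)` (uniform when the decay is 1.0), and the cut points are then sampled without replacement. The name `sketch_split_integer`, the `rng` parameter, and the return type (a list of chunk sizes only) are hypothetical and may not match the real function.

```python
import random


def sketch_split_integer(total, decay=1.0, rng=random):
    """Hypothetical stand-in: split `total` into random positive chunks that sum to it."""
    if total < 1:
        raise ValueError("total must be a positive integer")
    counts = list(range(1, total + 1))
    # Assumed weighting: decay < 1 favors fewer chunks, decay == 1 is uniform.
    weights = [decay ** i for i in range(total)]
    n_chunks = rng.choices(counts, weights=weights, k=1)[0]
    # Choose n_chunks - 1 distinct interior cut points, then measure the gaps.
    cuts = sorted(rng.sample(range(1, total), n_chunks - 1))
    bounds = [0] + cuts + [total]
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]


chunks = sketch_split_integer(10, decay=0.5, rng=random.Random(0))
print(chunks, sum(chunks))  # chunk sizes vary with the seed; the sum is always 10
```

Because the cut points are distinct integers strictly between 0 and the total, every chunk is at least 1, which matches the "positive chunks" wording above.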