nemo_automodel.components.datasets.vlm.pp_media#

Module Contents#

Functions#

chunk_vlm_media

Split VLM pixel values and media metadata into PP microbatch chunks.

chunk_step3_media

Chunk Step3-style image tensors for PP microbatches.

_select_image_grid

prepare_vlm_media_for_pp

Move VLM media tensors into pre-chunked PP media storage on the batch.

wrap_vlm_collate_for_pp

Wrap a VLM collate function so it prepares media tensors for PP.

stage_vlm_media_for_pp

Attach dataloader-prepared VLM media chunks to PP stage 0 for one schedule call.

Data#

API#

nemo_automodel.components.datasets.vlm.pp_media.VLM_PP_MEDIA_KEY#

'_vlm_pp_media_chunks'

nemo_automodel.components.datasets.vlm.pp_media._VLM_MEDIA_KEYS#

('pixel_values', 'patch_pixel_values', 'num_patches', 'patch_newline_mask', 'image_grid_hws', 'image…

nemo_automodel.components.datasets.vlm.pp_media.chunk_vlm_media(
pixel_values: torch.Tensor,
image_grid: torch.Tensor,
batch_size: int,
n_microbatches: int,
n_images_per_sample: torch.Tensor | None = None,
) -> tuple[list[torch.Tensor], list[torch.Tensor]]#

Split VLM pixel values and media metadata into PP microbatch chunks.

Handles four layouts:

  1. [N, C, H, W] with N == batch_size – one full image per sample.

  2. [N, max_patches, D] with N == batch_size – padded patches per image.

  3. Flat patches [total_patches, D] with per-sample media counts from n_images_per_sample.

  4. Flat patches with n_images == batch_size – legacy one-image-per-sample.
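As an illustration of layout 3, the following is a minimal, dependency-free sketch rather than the library's implementation. Plain Python lists stand in for tensors, the helper name `chunk_flat_patches` is hypothetical, and two things are assumed: each image contributes as many patch rows as the product of its grid entries, and `batch_size` divides evenly by `n_microbatches`.

```python
from math import prod


def chunk_flat_patches(patches, image_grid, n_images_per_sample, n_microbatches):
    """Hypothetical helper: split flat patch rows and grid rows by sample ownership."""
    batch_size = len(n_images_per_sample)
    if batch_size % n_microbatches != 0:
        raise ValueError("this sketch assumes batch_size is divisible by n_microbatches")
    samples_per_mb = batch_size // n_microbatches

    patch_chunks, grid_chunks = [], []
    img_cursor = 0    # next unread row of image_grid
    patch_cursor = 0  # next unread row of patches
    for mb in range(n_microbatches):
        start = mb * samples_per_mb
        # Number of images owned by the samples in this microbatch.
        n_imgs = sum(n_images_per_sample[start:start + samples_per_mb])
        grids = image_grid[img_cursor:img_cursor + n_imgs]
        # Assumption: patch rows per image == product of its grid entries.
        n_patches = sum(prod(g) for g in grids)
        patch_chunks.append(patches[patch_cursor:patch_cursor + n_patches])
        grid_chunks.append(grids)
        img_cursor += n_imgs
        patch_cursor += n_patches
    return patch_chunks, grid_chunks


# Four samples holding 1, 2, 0 and 1 images; nine patch rows in total.
image_grid = [(1, 2, 2), (1, 1, 2), (1, 2, 1), (1, 1, 1)]
patches = list(range(9))
patch_chunks, grid_chunks = chunk_flat_patches(
    patches, image_grid, n_images_per_sample=[1, 2, 0, 1], n_microbatches=2
)
```

Here the first microbatch receives the eight patch rows of samples 0 and 1, and the second receives the single row of sample 3, so the chunks are uneven in rows but aligned with samples.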

nemo_automodel.components.datasets.vlm.pp_media.chunk_step3_media(
pixel_values: torch.Tensor,
*,
batch_size: int,
n_microbatches: int,
num_patches: torch.Tensor | None = None,
patch_pixel_values: torch.Tensor | None = None,
patch_newline_mask: torch.Tensor | None = None,
) -> dict[str, list[torch.Tensor]]#

Chunk Step3-style image tensors for PP microbatches.

Step3 processors emit one full image per sample in pixel_values and a flat list of optional crop patches in patch_pixel_values. num_patches maps samples to the flat patch tensor.
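The relationship between the per-sample images, `num_patches`, and the flat patch list can be sketched as follows. This is a simplified illustration, not the library code: lists replace tensors, the helper name `chunk_step3_sketch` and the keys of the returned dict are assumptions, and `patch_newline_mask` is omitted.

```python
def chunk_step3_sketch(pixel_values, num_patches, patch_pixel_values, n_microbatches):
    """Hypothetical helper: chunk one image per sample plus a flat crop-patch list."""
    batch_size = len(pixel_values)
    if batch_size % n_microbatches != 0:
        raise ValueError("this sketch assumes batch_size is divisible by n_microbatches")
    per_mb = batch_size // n_microbatches

    # Assumed output layout: one list of per-microbatch chunks per media key.
    chunks = {"pixel_values": [], "num_patches": [], "patch_pixel_values": []}
    cursor = 0  # next unread row of the flat patch list
    for mb in range(n_microbatches):
        rows = slice(mb * per_mb, (mb + 1) * per_mb)
        counts = num_patches[rows]
        n = sum(counts)  # crop patches owned by this microbatch's samples
        chunks["pixel_values"].append(pixel_values[rows])
        chunks["num_patches"].append(counts)
        chunks["patch_pixel_values"].append(patch_pixel_values[cursor:cursor + n])
        cursor += n
    return chunks


step3_chunks = chunk_step3_sketch(
    pixel_values=["img0", "img1", "img2", "img3"],
    num_patches=[2, 0, 3, 1],
    patch_pixel_values=["p0a", "p0b", "p2a", "p2b", "p2c", "p3a"],
    n_microbatches=2,
)
```

The full images split evenly by row, while the crop patches follow the per-sample counts, so a sample with zero crops contributes nothing to its microbatch's patch chunk.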

nemo_automodel.components.datasets.vlm.pp_media._select_image_grid(
image_grid_hws: torch.Tensor | None,
image_grid_thw: torch.Tensor | None,
image_sizes: torch.Tensor | None,
image_position_ids: torch.Tensor | None,
) -> torch.Tensor | None#

nemo_automodel.components.datasets.vlm.pp_media.prepare_vlm_media_for_pp(
batch: collections.abc.MutableMapping[str, Any],
*,
batch_size: int,
n_microbatches: int,
) -> collections.abc.MutableMapping[str, Any]#

Move VLM media tensors into pre-chunked PP media storage on the batch.

This is intended to run from VLM collate/dataloader code when pipeline parallelism (PP) is enabled. The returned batch no longer carries the raw media tensors, which PyTorch PP would incorrectly chunk by row; instead it carries the per-microbatch media chunks under VLM_PP_MEDIA_KEY.
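The toy example below shows why an even row split goes wrong for flat media and what moving the media off the batch might look like. It is a sketch under stated assumptions: lists stand in for tensors, `naive_row_split` and `rows_per_sample` are hypothetical, and the structure stored under the key (a list of per-microbatch dicts) is assumed, since the real storage layout is not documented here.

```python
VLM_PP_MEDIA_KEY = "_vlm_pp_media_chunks"  # value as documented for this module


def naive_row_split(rows, n_chunks):
    """Hypothetical helper: equal-sized row split that ignores sample ownership."""
    size = -(-len(rows) // n_chunks)  # ceiling division
    return [rows[i * size:(i + 1) * size] for i in range(n_chunks)]


# Two samples: sample 0 owns six patch rows, sample 1 owns two.
batch = {
    "input_ids": [[1, 2, 3], [4, 5, 6]],
    "pixel_values": ["s0"] * 6 + ["s1"] * 2,
}

# An even split hands two of sample 0's rows to the second microbatch.
naive = naive_row_split(batch["pixel_values"], 2)

# Ownership-aware split: remove the raw media and store per-microbatch chunks.
rows_per_sample = [6, 2]  # hypothetical per-sample row counts
media = batch.pop("pixel_values")
chunks, cursor = [], 0
for n in rows_per_sample:
    chunks.append({"pixel_values": media[cursor:cursor + n]})  # assumed layout
    cursor += n
batch[VLM_PP_MEDIA_KEY] = chunks
```

After this step the batch holds only row-aligned tensors such as `input_ids` plus the pre-chunked media, so a row-wise splitter never sees the flat patch tensor.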

nemo_automodel.components.datasets.vlm.pp_media.wrap_vlm_collate_for_pp(
collate_fn: collections.abc.Callable[[Any], collections.abc.MutableMapping[str, Any]],
*,
n_microbatches: int,
) -> collections.abc.Callable[[Any], collections.abc.MutableMapping[str, Any]]#

Wrap a VLM collate function so it prepares media tensors for PP.
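The wrapping pattern can be sketched as below. This is not the library's implementation: `wrap_collate_sketch`, `toy_collate`, and `toy_prepare` are hypothetical stand-ins, and inferring the batch size from the number of collated examples is an assumption. Only the keyword-only `batch_size` and `n_microbatches` arguments of the preparation step mirror the documented signature of `prepare_vlm_media_for_pp`.

```python
from functools import wraps


def wrap_collate_sketch(collate_fn, prepare_fn, *, n_microbatches):
    """Hypothetical wrapper: collate first, then pre-chunk media for PP."""
    @wraps(collate_fn)
    def wrapped(examples):
        batch = collate_fn(examples)
        # Assumption: batch size equals the number of collated examples.
        return prepare_fn(batch, batch_size=len(examples), n_microbatches=n_microbatches)
    return wrapped


def toy_collate(examples):
    """Stand-in collate function producing a dict batch."""
    return {
        "input_ids": [e["input_ids"] for e in examples],
        "pixel_values": [e["pixel"] for e in examples],
    }


def toy_prepare(batch, *, batch_size, n_microbatches):
    """Stand-in for the preparation step: one image per sample, split by row."""
    per_mb = batch_size // n_microbatches
    media = batch.pop("pixel_values")
    batch["_vlm_pp_media_chunks"] = [
        media[i * per_mb:(i + 1) * per_mb] for i in range(n_microbatches)
    ]
    return batch


collate = wrap_collate_sketch(toy_collate, toy_prepare, n_microbatches=2)
out = collate([{"input_ids": [1], "pixel": "a"}, {"input_ids": [2], "pixel": "b"}])
```

Wrapping at the collate level keeps the training loop unchanged: the dataloader already yields batches whose media are chunked per microbatch.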

nemo_automodel.components.datasets.vlm.pp_media.stage_vlm_media_for_pp(
pp: Any,
model_parts: list[torch.nn.Module],
batch: collections.abc.MutableMapping[str, Any],
)#

Attach dataloader-prepared VLM media chunks to PP stage 0 for one schedule call.

nemo_automodel.components.datasets.vlm.pp_media.__all__#

['VLM_PP_MEDIA_KEY', 'chunk_vlm_media', 'prepare_vlm_media_for_pp', 'stage_vlm_media_for_pp', 'wrap_…