`nemo_automodel.components.datasets.vlm.pp_media`

Module Contents

Functions

`chunk_vlm_media`	Split VLM pixel values and media metadata into PP microbatch chunks.
`chunk_step3_media`	Chunk Step3-style image tensors for PP microbatches.
`_select_image_grid`
`prepare_vlm_media_for_pp`	Move VLM media tensors into pre-chunked PP media storage on the batch.
`wrap_vlm_collate_for_pp`	Wrap a VLM collate function so it prepares media tensors for PP.
`stage_vlm_media_for_pp`	Attach dataloader-prepared VLM media chunks to PP stage 0 for one schedule call.

Data

`VLM_PP_MEDIA_KEY`
`_VLM_MEDIA_KEYS`
`__all__`

API

nemo_automodel.components.datasets.vlm.pp_media.VLM_PP_MEDIA_KEY = '_vlm_pp_media_chunks'

Batch key under which prepare_vlm_media_for_pp stores the per-microbatch media chunks.

nemo_automodel.components.datasets.vlm.pp_media._VLM_MEDIA_KEYS = ('pixel_values', 'patch_pixel_values', 'num_patches', 'patch_newline_mask', 'image_grid_hws', 'image…

nemo_automodel.components.datasets.vlm.pp_media.chunk_vlm_media(pixel_values: torch.Tensor, image_grid: torch.Tensor, batch_size: int, n_microbatches: int, n_images_per_sample: torch.Tensor | None = None) → tuple[list[torch.Tensor], list[torch.Tensor]]

Split VLM pixel values and media metadata into PP microbatch chunks.

Handles four pixel_values layouts:

[N, C, H, W] with N == batch_size – one full image per sample.
[N, max_patches, D] with N == batch_size – padded patches per image.
Flat patches [total_patches, D] with per-sample media counts from n_images_per_sample.
Flat patches [total_patches, D] with n_images == batch_size – legacy layout with one image per sample.
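The flat-patch layout with n_images_per_sample is the case where a plain row split goes wrong: each sample owns a variable number of images, and each image a variable number of patch rows. The following is a minimal sketch of the index arithmetic under stated assumptions, not the module's implementation. It uses plain Python lists instead of tensors, the helper name `flat_patch_chunk_bounds` is hypothetical, batch_size is assumed to divide evenly by n_microbatches, and `patches_per_image` is assumed to be computed by the caller from image_grid (for example, as the product of each grid row's dimensions).

```python
from itertools import accumulate


def flat_patch_chunk_bounds(
    patches_per_image: list[int],
    n_images_per_sample: list[int],
    batch_size: int,
    n_microbatches: int,
) -> list[tuple[int, int, int, int]]:
    """Hypothetical helper: (img_start, img_end, patch_start, patch_end) per microbatch.

    Assumes images are stored sample by sample, both in the image-grid rows and
    in the flat patch tensor, and that batch_size divides evenly.
    """
    if batch_size % n_microbatches != 0:
        raise ValueError("batch_size must be divisible by n_microbatches")
    if len(n_images_per_sample) != batch_size:
        raise ValueError("expected one image count per sample")
    if sum(n_images_per_sample) != len(patches_per_image):
        raise ValueError("image counts do not match the number of images")

    samples_per_mb = batch_size // n_microbatches
    # Prefix sums: sample index -> first image row, image index -> first patch row.
    image_offsets = [0, *accumulate(n_images_per_sample)]
    patch_offsets = [0, *accumulate(patches_per_image)]

    bounds = []
    for mb in range(n_microbatches):
        first = mb * samples_per_mb
        img_start = image_offsets[first]
        img_end = image_offsets[first + samples_per_mb]
        bounds.append((img_start, img_end, patch_offsets[img_start], patch_offsets[img_end]))
    return bounds


# 4 samples with 1, 2, 0 and 1 images; the 4 images have 4, 6, 2 and 8 patches.
print(flat_patch_chunk_bounds([4, 6, 2, 8], [1, 2, 0, 1], batch_size=4, n_microbatches=2))
# [(0, 3, 0, 12), (3, 4, 12, 20)]
```

Under these assumptions, microbatch i would take image_grid[img_start:img_end] and pixel_values[patch_start:patch_end]. The legacy one-image-per-sample layout is the special case where every entry of n_images_per_sample is 1.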

nemo_automodel.components.datasets.vlm.pp_media.chunk_step3_media(pixel_values: torch.Tensor, *, batch_size: int, n_microbatches: int, num_patches: torch.Tensor | None = None, patch_pixel_values: torch.Tensor | None = None, patch_newline_mask: torch.Tensor | None = None) → dict[str, list[torch.Tensor]]

Chunk Step3-style image tensors for PP microbatches.

Step3 processors emit one full image per sample in pixel_values and a flat tensor of optional crop patches in patch_pixel_values. num_patches holds the per-sample patch counts that map each sample to its rows of the flat patch tensor.
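For the Step3 layout, the two media tensors need different split points: pixel_values is split by sample, while patch_pixel_values is split at prefix sums of num_patches. This hypothetical sketch assumes num_patches holds one crop-patch count per sample, uses Python lists and slice objects in place of tensors, and leaves out patch_newline_mask; `step3_chunk_slices` is an illustrative name, not part of this module.

```python
from itertools import accumulate


def step3_chunk_slices(
    num_patches: list[int], batch_size: int, n_microbatches: int
) -> dict[str, list[slice]]:
    """Hypothetical helper: per-microbatch slices for Step3-style media."""
    if batch_size % n_microbatches != 0:
        raise ValueError("batch_size must be divisible by n_microbatches")
    if len(num_patches) != batch_size:
        raise ValueError("expected one patch count per sample")

    per_mb = batch_size // n_microbatches
    offsets = [0, *accumulate(num_patches)]  # sample index -> first patch row
    slices: dict[str, list[slice]] = {"pixel_values": [], "patch_pixel_values": []}
    for mb in range(n_microbatches):
        lo, hi = mb * per_mb, (mb + 1) * per_mb
        # One full image per sample: slice pixel_values by sample index.
        slices["pixel_values"].append(slice(lo, hi))
        # Crop patches are flat: slice by cumulative per-sample patch counts.
        slices["patch_pixel_values"].append(slice(offsets[lo], offsets[hi]))
    return slices


patches = ["a", "b", "c", "d", "e", "f"]  # stand-in for patch_pixel_values rows
s = step3_chunk_slices([2, 0, 3, 1], batch_size=4, n_microbatches=2)
print([patches[sl] for sl in s["patch_pixel_values"]])
# [['a', 'b'], ['c', 'd', 'e', 'f']]
```

A sample with zero crop patches (the second one here) simply contributes an empty range, so it needs no special case.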

nemo_automodel.components.datasets.vlm.pp_media._select_image_grid(image_grid_hws: torch.Tensor | None, image_grid_thw: torch.Tensor | None, image_sizes: torch.Tensor | None, image_position_ids: torch.Tensor | None) → torch.Tensor | None

nemo_automodel.components.datasets.vlm.pp_media.prepare_vlm_media_for_pp(batch: collections.abc.MutableMapping[str, Any], *, batch_size: int, n_microbatches: int) → collections.abc.MutableMapping[str, Any]

Move VLM media tensors into pre-chunked PP media storage on the batch.

Call this from VLM collate or dataloader code when pipeline parallelism (PP) is enabled. PyTorch PP splits batch tensors by row (along dim 0), which is wrong for media tensors whose rows are images or patches rather than samples. The returned batch therefore no longer carries the raw media tensors; instead, VLM_PP_MEDIA_KEY holds the per-microbatch media chunks.
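The batch rewrite can be sketched as follows. This is an illustration under assumptions, not the module's code: the batch is a plain dict, only a hypothetical subset of _VLM_MEDIA_KEYS is handled (the full tuple is truncated in this listing), and the layout-specific splitting is delegated to a `chunker` callable that plays the role chunk_vlm_media or chunk_step3_media would. `prepare_media_sketch` and `per_sample_chunker` are hypothetical names; only the key value '_vlm_pp_media_chunks' comes from this page.

```python
from collections.abc import Callable, MutableMapping
from typing import Any

VLM_PP_MEDIA_KEY = "_vlm_pp_media_chunks"  # key value documented on this page
# Hypothetical subset of _VLM_MEDIA_KEYS; the full tuple is elided above.
MEDIA_KEYS = ("pixel_values", "num_patches", "image_grid_hws")

MediaChunker = Callable[[dict[str, Any], int, int], list[dict[str, Any]]]


def prepare_media_sketch(
    batch: MutableMapping[str, Any],
    *,
    batch_size: int,
    n_microbatches: int,
    chunker: MediaChunker,
) -> MutableMapping[str, Any]:
    """Hypothetical: move raw media out of `batch`, store pre-chunked media instead."""
    if batch_size % n_microbatches != 0:
        raise ValueError("batch_size must be divisible by n_microbatches")
    # Pop media so a dim-0 splitter never sees it.
    media = {k: batch.pop(k) for k in MEDIA_KEYS if k in batch}
    if not media:
        return batch  # in this sketch, a text-only batch is left unchanged
    chunks = chunker(media, batch_size, n_microbatches)
    if len(chunks) != n_microbatches:
        raise ValueError("chunker must return one media dict per microbatch")
    batch[VLM_PP_MEDIA_KEY] = chunks
    return batch


def per_sample_chunker(media, batch_size, n_microbatches):
    # Toy chunker for sample-aligned media (one image per sample).
    step = batch_size // n_microbatches
    return [{k: v[i:i + step] for k, v in media.items()} for i in range(0, batch_size, step)]


batch = {"input_ids": [[1], [2], [3], [4]], "pixel_values": ["a", "b", "c", "d"]}
prepare_media_sketch(batch, batch_size=4, n_microbatches=2, chunker=per_sample_chunker)
print(sorted(batch))  # ['_vlm_pp_media_chunks', 'input_ids']
```

Popping the media keys is the step that matters: whatever remains in the batch is safe for the schedule to split by row, while the media travels as a list with one entry per microbatch.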

nemo_automodel.components.datasets.vlm.pp_media.wrap_vlm_collate_for_pp(collate_fn: collections.abc.Callable[[Any], collections.abc.MutableMapping[str, Any]], *, n_microbatches: int) → collections.abc.Callable[[Any], collections.abc.MutableMapping[str, Any]]

Wrap a VLM collate function so it prepares media tensors for PP.
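The wrapper can be pictured as a thin decorator around the collate function. In the sketch below, `wrap_collate_sketch`, `toy_collate`, and `toy_prepare` are hypothetical; the prepare step is injected as a parameter so the block stays self-contained, and deriving batch_size from the number of collated examples is an assumption. The documented wrapper itself takes only collate_fn and n_microbatches.

```python
import functools
from collections.abc import Callable, MutableMapping
from typing import Any

Batch = MutableMapping[str, Any]


def wrap_collate_sketch(
    collate_fn: Callable[[Any], Batch],
    *,
    n_microbatches: int,
    prepare: Callable[..., Batch],
) -> Callable[[Any], Batch]:
    """Hypothetical: return a collate function that also pre-chunks media for PP."""

    @functools.wraps(collate_fn)
    def wrapped(examples: Any) -> Batch:
        batch = collate_fn(examples)
        # Assumption: the batch size equals the number of collated examples.
        return prepare(batch, batch_size=len(examples), n_microbatches=n_microbatches)

    return wrapped


def toy_collate(examples):
    # Stand-in for a real VLM collate function.
    return {"input_ids": [ex["ids"] for ex in examples]}


def toy_prepare(batch, *, batch_size, n_microbatches):
    # Stand-in for prepare_vlm_media_for_pp: records what it received.
    batch["seen"] = (batch_size, n_microbatches)
    return batch


collate = wrap_collate_sketch(toy_collate, n_microbatches=2, prepare=toy_prepare)
print(collate([{"ids": [1]}, {"ids": [2]}]))
# {'input_ids': [[1], [2]], 'seen': (2, 2)}
```

Wrapping the collate function keeps the media chunking in the data-loading path, which is where the prepare_vlm_media_for_pp docstring says it is meant to run.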

nemo_automodel.components.datasets.vlm.pp_media.stage_vlm_media_for_pp(pp: Any, model_parts: list[torch.nn.Module], batch: collections.abc.MutableMapping[str, Any])

Attach dataloader-prepared VLM media chunks to PP stage 0 for one schedule call.
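One way to picture the staging step is a per-call queue on the stage-0 module that each microbatch forward consumes in order. Everything named here is hypothetical (`ToyStage0`, `_pp_media_queue`, `stage_media_sketch`); the sketch skips locating stage 0 from pp and model_parts, and it assumes stage 0 runs its microbatch forwards in microbatch order.

```python
from collections import deque
from collections.abc import MutableMapping
from typing import Any

VLM_PP_MEDIA_KEY = "_vlm_pp_media_chunks"  # key value documented on this page


class ToyStage0:
    """Hypothetical stand-in for the module that runs PP stage 0."""

    def __init__(self) -> None:
        self._pp_media_queue: deque = deque()

    def forward(self, input_ids: list[list[int]]) -> tuple[int, dict[str, Any]]:
        # Each microbatch forward consumes the next pre-chunked media dict.
        media = self._pp_media_queue.popleft()
        return len(input_ids), media


def stage_media_sketch(stage0: ToyStage0, batch: MutableMapping[str, Any]) -> None:
    """Hypothetical: hand one schedule call's media chunks to stage 0."""
    # Pop the chunks so the schedule does not try to split them by row.
    chunks = batch.pop(VLM_PP_MEDIA_KEY, None)
    stage0._pp_media_queue = deque(chunks or [])


stage = ToyStage0()
batch = {
    "input_ids": [[1], [2], [3], [4]],
    VLM_PP_MEDIA_KEY: [{"pixel_values": ["a", "b"]}, {"pixel_values": ["c", "d"]}],
}
stage_media_sketch(stage, batch)
print(stage.forward(batch["input_ids"][:2]))  # (2, {'pixel_values': ['a', 'b']})
print(stage.forward(batch["input_ids"][2:]))  # (2, {'pixel_values': ['c', 'd']})
```

Replacing the queue on every call keeps media from one schedule call from leaking into the next.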

nemo_automodel.components.datasets.vlm.pp_media.__all__ = ['VLM_PP_MEDIA_KEY', 'chunk_vlm_media', 'prepare_vlm_media_for_pp', 'stage_vlm_media_for_pp', 'wrap_…
