nemo_automodel.components.datasets.vlm.pp_media#
Module Contents#
Functions#
chunk_vlm_media | Split VLM pixel values and media metadata into PP microbatch chunks.
chunk_step3_media | Chunk Step3-style image tensors for PP microbatches.
prepare_vlm_media_for_pp | Move VLM media tensors into pre-chunked PP media storage on the batch.
wrap_vlm_collate_for_pp | Wrap a VLM collate function so it prepares media tensors for PP.
stage_vlm_media_for_pp | Attach dataloader-prepared VLM media chunks to PP stage 0 for one schedule call.
Data#
API#
- nemo_automodel.components.datasets.vlm.pp_media.VLM_PP_MEDIA_KEY#
'_vlm_pp_media_chunks'
- nemo_automodel.components.datasets.vlm.pp_media._VLM_MEDIA_KEYS#
('pixel_values', 'patch_pixel_values', 'num_patches', 'patch_newline_mask', 'image_grid_hws', 'image…
- nemo_automodel.components.datasets.vlm.pp_media.chunk_vlm_media(
- pixel_values: torch.Tensor,
- image_grid: torch.Tensor,
- batch_size: int,
- n_microbatches: int,
- n_images_per_sample: torch.Tensor | None = None,
Split VLM pixel values and media metadata into PP microbatch chunks.
Handles four layouts:
  - [N, C, H, W] with N == batch_size – one full image per sample.
  - [N, max_patches, D] with N == batch_size – padded patches per image.
  - Flat patches [total_patches, D] with per-sample media counts from n_images_per_sample.
  - Flat patches with n_images == batch_size – legacy one-image-per-sample.
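To make the flat-patch layout concrete, here is a minimal sketch of the index bookkeeping it implies. It is not the library's implementation. It assumes that the product of each image_grid row gives that image's patch count and that batch_size divides evenly by n_microbatches. The helper name flat_patch_slices is hypothetical, and plain Python sequences stand in for tensors.

```python
import math


def flat_patch_slices(image_grid, n_images_per_sample, n_microbatches):
    """Return one ((img_start, img_end), (patch_start, patch_end)) pair per microbatch."""
    batch_size = len(n_images_per_sample)
    if batch_size % n_microbatches != 0:
        raise ValueError("batch_size must be divisible by n_microbatches")
    samples_per_mb = batch_size // n_microbatches

    # Assumption: the product of a grid row is that image's patch count.
    patches_per_image = [math.prod(row) for row in image_grid]

    slices = []
    img_start = patch_start = 0
    for mb in range(n_microbatches):
        first = mb * samples_per_mb
        # Images owned by the samples that fall into this microbatch.
        n_imgs = sum(n_images_per_sample[first:first + samples_per_mb])
        img_end = img_start + n_imgs
        patch_end = patch_start + sum(patches_per_image[img_start:img_end])
        slices.append(((img_start, img_end), (patch_start, patch_end)))
        img_start, patch_start = img_end, patch_end
    return slices


# Four samples holding 2, 0, 1 and 0 images, split into two microbatches.
grid = [(1, 2, 2), (1, 4, 2), (1, 2, 3)]  # 4, 8 and 6 patches
example_slices = flat_patch_slices(grid, [2, 0, 1, 0], n_microbatches=2)
```

In this example the two patch slices cover 12 and 6 rows. An even row-wise split of the 18-row tensor would give 9 and 9, which is why flat patches need per-sample counts rather than plain row chunking.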
- nemo_automodel.components.datasets.vlm.pp_media.chunk_step3_media(
- pixel_values: torch.Tensor,
- *,
- batch_size: int,
- n_microbatches: int,
- num_patches: torch.Tensor | None = None,
- patch_pixel_values: torch.Tensor | None = None,
- patch_newline_mask: torch.Tensor | None = None,
Chunk Step3-style image tensors for PP microbatches.
Step3 processors emit one full image per sample in pixel_values and a flat list of optional crop patches in patch_pixel_values. num_patches maps samples to the flat patch tensor.
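The mapping from num_patches to the flat patch tensor can be illustrated with prefix sums. This is a minimal sketch, assuming num_patches holds one crop count per sample and batch_size divides evenly by n_microbatches. The helper step3_patch_bounds is hypothetical and uses a plain list in place of a tensor.

```python
from itertools import accumulate


def step3_patch_bounds(num_patches, n_microbatches):
    """Return the (start, end) row range of the flat patch tensor per microbatch."""
    batch_size = len(num_patches)
    if batch_size % n_microbatches != 0:
        raise ValueError("batch_size must be divisible by n_microbatches")
    samples_per_mb = batch_size // n_microbatches

    # offsets[i] is the first flat-patch row that belongs to sample i.
    offsets = [0, *accumulate(num_patches)]
    return [
        (offsets[mb * samples_per_mb], offsets[(mb + 1) * samples_per_mb])
        for mb in range(n_microbatches)
    ]


# Samples with 3, 0, 2 and 4 crop patches, split into two microbatches.
bounds = step3_patch_bounds([3, 0, 2, 4], n_microbatches=2)
```

Under these assumptions, pixel_values itself can be chunked by row because it holds one image per sample, while patch_pixel_values would be sliced with the computed bounds.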
- nemo_automodel.components.datasets.vlm.pp_media._select_image_grid(
- image_grid_hws: torch.Tensor | None,
- image_grid_thw: torch.Tensor | None,
- image_sizes: torch.Tensor | None,
- image_position_ids: torch.Tensor | None,
- nemo_automodel.components.datasets.vlm.pp_media.prepare_vlm_media_for_pp(
- batch: collections.abc.MutableMapping[str, Any],
- *,
- batch_size: int,
- n_microbatches: int,
Move VLM media tensors into pre-chunked PP media storage on the batch.
This is intended to run from VLM collate/dataloader code when PP is enabled. The returned batch no longer carries raw media tensors that PyTorch PP would chunk by row incorrectly; instead it carries VLM_PP_MEDIA_KEY with per-microbatch media chunks.
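The effect on the batch can be sketched as follows. This is an illustration under stated assumptions, not the library's code: it handles only the one-row-per-sample layout, uses lists instead of tensors, and lists only some of the media keys shown above. The function name prepare_media_sketch is hypothetical.

```python
VLM_PP_MEDIA_KEY = "_vlm_pp_media_chunks"
# Partial key list, for illustration only.
MEDIA_KEYS = ("pixel_values", "image_grid_hws")


def prepare_media_sketch(batch, *, batch_size, n_microbatches):
    """Pop raw media entries and store per-microbatch chunks under one key."""
    if batch_size % n_microbatches != 0:
        raise ValueError("batch_size must be divisible by n_microbatches")
    media = {key: batch.pop(key) for key in MEDIA_KEYS if key in batch}
    if not media:
        return batch  # text-only batch: nothing to move

    per_mb = batch_size // n_microbatches
    # Assumption: every media value has exactly one row per sample.
    batch[VLM_PP_MEDIA_KEY] = [
        {key: value[mb * per_mb:(mb + 1) * per_mb] for key, value in media.items()}
        for mb in range(n_microbatches)
    ]
    return batch


toy_batch = {"input_ids": [[1], [2], [3], [4]], "pixel_values": ["a", "b", "c", "d"]}
prepared = prepare_media_sketch(toy_batch, batch_size=4, n_microbatches=2)
```

After the call, the raw pixel_values entry is gone, so a pipeline schedule that splits every top-level tensor by row never sees it.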
- nemo_automodel.components.datasets.vlm.pp_media.wrap_vlm_collate_for_pp(
- collate_fn: collections.abc.Callable[[Any], collections.abc.MutableMapping[str, Any]],
- *,
- n_microbatches: int,
Wrap a VLM collate function so it prepares media tensors for PP.
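A wrapper of this kind can be sketched as a small decorator-style function. The sketch below is hypothetical: the signature above takes no batch_size, so the example assumes it is inferred from the collated batch (here from the length of input_ids), and it takes the preparation step as an explicit argument so the block is self-contained.

```python
import functools


def wrap_collate_sketch(collate_fn, *, n_microbatches, prepare):
    """Return a collate function that post-processes batches with `prepare`."""

    @functools.wraps(collate_fn)
    def wrapped(examples):
        batch = collate_fn(examples)
        # Assumption: the batch size can be read from the text inputs.
        batch_size = len(batch["input_ids"])
        return prepare(batch, batch_size=batch_size, n_microbatches=n_microbatches)

    return wrapped


def toy_collate(examples):
    """Stand-in collate function that stacks token ids into a batch."""
    return {"input_ids": [example["ids"] for example in examples]}


def toy_prepare(batch, *, batch_size, n_microbatches):
    """Stand-in preparation step that only records what it was given."""
    batch["seen"] = (batch_size, n_microbatches)
    return batch


wrapped_collate = wrap_collate_sketch(toy_collate, n_microbatches=2, prepare=toy_prepare)
wrapped_batch = wrapped_collate([{"ids": [1]}, {"ids": [2]}, {"ids": [3]}, {"ids": [4]}])
```

Wrapping at the collate level keeps the media handling inside dataloader workers, so the training loop receives batches that are already safe to hand to a pipeline schedule.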
- nemo_automodel.components.datasets.vlm.pp_media.stage_vlm_media_for_pp(
- pp: Any,
- model_parts: list[torch.nn.Module],
- batch: collections.abc.MutableMapping[str, Any],
Attach dataloader-prepared VLM media chunks to PP stage 0 for one schedule call.
- nemo_automodel.components.datasets.vlm.pp_media.__all__#
['VLM_PP_MEDIA_KEY', 'chunk_vlm_media', 'prepare_vlm_media_for_pp', 'stage_vlm_media_for_pp', 'wrap_…