nemo_automodel.components.datasets.vlm.pp_media
Module Contents
API
Chunk Step3-style image tensors into pipeline-parallel (PP) microbatches.
Step3 processors emit one full image per sample in pixel_values and a
flat list of optional crop patches in patch_pixel_values. num_patches
records how many crop patches each sample contributes, which locates that
sample's rows in the flat patch tensor.
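A minimal sketch of the index arithmetic, assuming num_patches holds each sample's patch count and that microbatches are equal-sized slices of the batch. chunk_step3_media is a hypothetical name (the function's real name is not shown here), and plain Python sequences stand in for torch tensors so the slicing logic stays visible.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chunk_step3_media(
    pixel_values: Sequence[T],
    patch_pixel_values: Sequence[T],
    num_patches: Sequence[int],
    n_microbatches: int,
) -> List[Tuple[List[T], List[T], List[int]]]:
    """Split Step3-style media into (images, patches, num_patches) per microbatch.

    Sequences stand in for tensors: element i is row i along dim 0.
    """
    batch_size = len(pixel_values)
    if len(num_patches) != batch_size:
        raise ValueError("num_patches needs one entry per sample")
    if sum(num_patches) != len(patch_pixel_values):
        raise ValueError("num_patches does not cover patch_pixel_values")
    if batch_size % n_microbatches != 0:
        # Assumption: microbatches are equal-sized slices of the batch.
        raise ValueError("batch_size must be divisible by n_microbatches")

    mb_size = batch_size // n_microbatches
    # offsets[i] is the first patch row of sample i; offsets[-1] is the total.
    offsets = [0, *accumulate(num_patches)]

    chunks = []
    for mb in range(n_microbatches):
        first, last = mb * mb_size, (mb + 1) * mb_size  # samples [first, last)
        chunks.append((
            list(pixel_values[first:last]),                          # one image per sample
            list(patch_pixel_values[offsets[first]:offsets[last]]),  # those samples' patches
            list(num_patches[first:last]),                           # counts stay per sample
        ))
    return chunks
```

With four samples owning 2, 0, 1, and 3 patches and two microbatches, the first chunk gets the first two images and two patches, and the second gets the last two images and four patches. Cutting the patch tensor at prefix-sum offsets, rather than at a fixed row count, is what keeps each sample's patches with its image.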
Split VLM pixel values and media metadata into PP microbatch chunks.
Handles four layouts:
- [N, C, H, W] with N == batch_size: one full image per sample.
- [N, max_patches, D] with N == batch_size: padded patches per image.
- Flat patches [total_patches, D] with per-sample media counts from n_images_per_sample.
- Flat patches with n_images == batch_size: legacy one-image-per-sample.
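One way to handle the four layouts is to reduce each one to a per-sample count of dim-0 rows and then cut the rows at microbatch boundaries. This is a sketch, not the module's implementation. rows_per_sample and microbatch_row_ranges are hypothetical names. patches_per_image is an assumed input, since flat layouts need something that says how many patch rows each image owns. Microbatches are assumed to be equal-sized.

```python
from itertools import accumulate
from typing import List, Optional, Sequence, Tuple


def rows_per_sample(
    shape: Tuple[int, ...],
    batch_size: int,
    patches_per_image: Optional[Sequence[int]] = None,    # assumed input
    n_images_per_sample: Optional[Sequence[int]] = None,
) -> List[int]:
    """Return how many dim-0 rows of a media tensor belong to each sample."""
    n_rows = shape[0]
    if len(shape) in (3, 4):
        # [N, max_patches, D] or [N, C, H, W]: one row per sample.
        if n_rows != batch_size:
            raise ValueError("batched layouts expect N == batch_size")
        return [1] * batch_size
    if len(shape) != 2:
        raise ValueError(f"unsupported media shape {shape}")

    # Flat [total_patches, D]: per-image patch counts are required (assumption).
    if patches_per_image is None:
        raise ValueError("flat layouts need patches_per_image")
    if n_images_per_sample is None:
        # Legacy layout: exactly one image per sample.
        if len(patches_per_image) != batch_size:
            raise ValueError("legacy layout expects n_images == batch_size")
        n_images_per_sample = [1] * batch_size
    if len(n_images_per_sample) != batch_size:
        raise ValueError("n_images_per_sample needs one entry per sample")
    if sum(n_images_per_sample) != len(patches_per_image):
        raise ValueError("image counts do not match patches_per_image")

    # Sum the patch counts of each sample's consecutive images.
    counts, image = [], 0
    for n_images in n_images_per_sample:
        counts.append(sum(patches_per_image[image:image + n_images]))
        image += n_images
    if sum(counts) != n_rows:
        raise ValueError("patch counts do not cover the flat tensor")
    return counts


def microbatch_row_ranges(counts: Sequence[int], n_microbatches: int) -> List[Tuple[int, int]]:
    """Turn per-sample row counts into [start, end) row ranges per microbatch."""
    if len(counts) % n_microbatches != 0:
        raise ValueError("batch_size must be divisible by n_microbatches")
    mb_size = len(counts) // n_microbatches
    offsets = [0, *accumulate(counts)]
    return [(offsets[i * mb_size], offsets[(i + 1) * mb_size]) for i in range(n_microbatches)]
```

For example, a flat tensor of shape (10, D) holding three images of 4, 2, and 4 patches, where sample 0 owns two images and sample 1 owns one, gives per-sample counts [6, 4] and, for two microbatches, row ranges (0, 6) and (6, 10). For the batched layouts, the same helpers reduce to ordinary one-row-per-sample chunking.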
Move VLM media tensors into pre-chunked PP media storage on the batch.
This is intended to run from VLM collate/dataloader code when PP is enabled.
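The returned batch no longer carries raw media tensors, which PyTorch PP would
otherwise chunk by row along dim 0, a split that does not follow sample
boundaries when a sample owns a variable number of media rows. Instead, it
carries VLM_PP_MEDIA_KEY with per-microbatch media chunks.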
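The batch transformation can be sketched as a pure function on a dict batch. Several pieces are assumptions for illustration: the value given to VLM_PP_MEDIA_KEY (the module defines this key but its value is not shown), the MEDIA_KEYS tuple, and the names prechunk_media and split_evenly. A real chunk_fn would apply the layout-aware splitting described for the split function rather than the toy even split used here.

```python
from typing import Any, Callable, Dict, List

# Hypothetical value: the module defines VLM_PP_MEDIA_KEY, but its value is not shown here.
VLM_PP_MEDIA_KEY = "_vlm_pp_media"
# Assumed batch keys holding raw media tensors.
MEDIA_KEYS = ("pixel_values", "patch_pixel_values", "num_patches")

ChunkFn = Callable[[Dict[str, Any], int], List[Dict[str, Any]]]


def split_evenly(media: Dict[str, Any], n_microbatches: int) -> List[Dict[str, Any]]:
    """Toy chunker: assumes every media entry has one row per sample."""
    chunks: List[Dict[str, Any]] = [{} for _ in range(n_microbatches)]
    for key, rows in media.items():
        if len(rows) % n_microbatches != 0:
            raise ValueError(f"{key}: rows not divisible by n_microbatches")
        size = len(rows) // n_microbatches
        for i in range(n_microbatches):
            chunks[i][key] = list(rows[i * size:(i + 1) * size])
    return chunks


def prechunk_media(
    batch: Dict[str, Any], n_microbatches: int, chunk_fn: ChunkFn = split_evenly
) -> Dict[str, Any]:
    """Return a new batch whose raw media is replaced by per-microbatch chunks."""
    media = {k: batch[k] for k in MEDIA_KEYS if k in batch}
    if not media:
        return batch  # text-only batch: nothing for PP to mis-split
    # Copy the non-media fields so the caller's batch is left untouched.
    out = {k: v for k, v in batch.items() if k not in media}
    out[VLM_PP_MEDIA_KEY] = chunk_fn(media, n_microbatches)
    return out
```

Because the raw media keys are removed, the PP schedule only ever row-splits tensors that really are one row per sample, such as input_ids. The media travels alongside as a list with one entry per microbatch.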
Attach dataloader-prepared VLM media chunks to PP stage 0 for one schedule call.
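Scoping the attachment to one schedule call fits naturally as a context manager that sets the chunks before the call and always clears them afterwards. This is a sketch under stated assumptions. Stage0Model, attach_media, and the pp_media_chunks attribute are hypothetical. The code also assumes stage 0 runs forward on microbatches in order, so a simple counter selects the right chunk.

```python
from contextlib import contextmanager
from typing import Any, Iterator, List, Optional, Sequence, Tuple


class Stage0Model:
    """Hypothetical stand-in for the stage-0 module; consumes one chunk per forward."""

    def __init__(self) -> None:
        self.pp_media_chunks: Optional[List[Any]] = None
        self._next_chunk = 0

    def forward(self, microbatch_input: Any) -> Tuple[Any, Any]:
        media = None
        if self.pp_media_chunks is not None:
            # Assumption: stage 0 runs forward on microbatches in order 0..n-1.
            media = self.pp_media_chunks[self._next_chunk]
            self._next_chunk += 1
        return microbatch_input, media


@contextmanager
def attach_media(model: Stage0Model, chunks: Sequence[Any], is_first_stage: bool) -> Iterator[None]:
    """Expose media chunks to stage 0 for one schedule call, then clear them."""
    if not is_first_stage:
        yield  # later stages receive activations, not media
        return
    model.pp_media_chunks = list(chunks)
    model._next_chunk = 0
    try:
        yield
    finally:
        # Clear even if the schedule raises, so chunks never leak into the next step.
        model.pp_media_chunks = None
        model._next_chunk = 0
```

In a training loop, one call such as schedule.step(...) from torch.distributed.pipelining would run inside the with block. The try/finally guarantees that a failed step cannot leave stale media attached for the next batch.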
Wrap a VLM collate function so it prepares media tensors for PP.
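Wrapping leaves the model-specific collate untouched and adds the PP preparation as a post-processing pass. The sketch assumes the wrapper takes a preparation function and a microbatch count. wrap_collate_for_pp, toy_collate, and toy_prepare are hypothetical names, and the real module's signature is not shown here.

```python
import functools
from typing import Any, Callable, Dict, List

Batch = Dict[str, Any]


def wrap_collate_for_pp(
    collate_fn: Callable[[List[Any]], Batch],
    prepare_fn: Callable[[Batch, int], Batch],
    n_microbatches: int,
) -> Callable[[List[Any]], Batch]:
    """Return a collate function that also pre-chunks media for PP."""

    @functools.wraps(collate_fn)
    def wrapped(examples: List[Any]) -> Batch:
        batch = collate_fn(examples)
        # Media preparation runs in the dataloader, before the PP schedule sees the batch.
        return prepare_fn(batch, n_microbatches)

    return wrapped


def toy_collate(examples: List[Batch]) -> Batch:
    """Gather each field of the examples into a list."""
    return {key: [ex[key] for ex in examples] for key in examples[0]}


def toy_prepare(batch: Batch, n_microbatches: int) -> Batch:
    """Stand-in for media pre-chunking: only records the microbatch count."""
    return {**batch, "n_microbatches": n_microbatches}
```

The wrapped function is an ordinary callable, so it can be passed as collate_fn to torch.utils.data.DataLoader. With num_workers > 0, the preparation then runs in dataloader worker processes instead of on the training step itself.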