# nemo_automodel.components.datasets.vlm.pp_media

## Module Contents

### Functions

| Name                                                                                                    | Description                                                                      |
| ------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------- |
| [`_select_image_grid`](#nemo_automodel-components-datasets-vlm-pp_media-_select_image_grid)             | -                                                                                |
| [`chunk_step3_media`](#nemo_automodel-components-datasets-vlm-pp_media-chunk_step3_media)               | Chunk Step3-style image tensors for PP microbatches.                             |
| [`chunk_vlm_media`](#nemo_automodel-components-datasets-vlm-pp_media-chunk_vlm_media)                   | Split VLM pixel values and media metadata into PP microbatch chunks.             |
| [`prepare_vlm_media_for_pp`](#nemo_automodel-components-datasets-vlm-pp_media-prepare_vlm_media_for_pp) | Move VLM media tensors into pre-chunked PP media storage on the batch.           |
| [`stage_vlm_media_for_pp`](#nemo_automodel-components-datasets-vlm-pp_media-stage_vlm_media_for_pp)     | Attach dataloader-prepared VLM media chunks to PP stage 0 for one schedule call. |
| [`wrap_vlm_collate_for_pp`](#nemo_automodel-components-datasets-vlm-pp_media-wrap_vlm_collate_for_pp)   | Wrap a VLM collate function so it prepares media tensors for PP.                 |

### Data

[`VLM_PP_MEDIA_KEY`](#nemo_automodel-components-datasets-vlm-pp_media-VLM_PP_MEDIA_KEY)

[`_VLM_MEDIA_KEYS`](#nemo_automodel-components-datasets-vlm-pp_media-_VLM_MEDIA_KEYS)

[`__all__`](#nemo_automodel-components-datasets-vlm-pp_media-__all__)

### API

```python
nemo_automodel.components.datasets.vlm.pp_media._select_image_grid(
    image_grid_hws: torch.Tensor | None,
    image_grid_thw: torch.Tensor | None,
    image_sizes: torch.Tensor | None,
    image_position_ids: torch.Tensor | None
) -> torch.Tensor | None
```

```python
nemo_automodel.components.datasets.vlm.pp_media.chunk_step3_media(
    pixel_values: torch.Tensor,
    batch_size: int,
    n_microbatches: int,
    num_patches: torch.Tensor | None = None,
    patch_pixel_values: torch.Tensor | None = None,
    patch_newline_mask: torch.Tensor | None = None
) -> dict[str, list[torch.Tensor]]
```

Chunk Step3-style image tensors for PP microbatches.

Step3 processors emit one full image per sample in `pixel_values` and a
flat list of optional crop patches in `patch_pixel_values`. `num_patches`
maps samples to the flat patch tensor.
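The mapping from samples to the flat patch tensor can be sketched with plain Python integers. This is an illustration, not the module's implementation: it assumes `num_patches` holds one patch count per sample and that the batch splits evenly across microbatches. The helper name `step3_patch_slices` is hypothetical.

```python
from itertools import accumulate


def step3_patch_slices(num_patches, n_microbatches):
    """Return one (start, stop) row range into the flat patch tensor per microbatch.

    Assumption: num_patches[i] is the number of crop patches for sample i.
    """
    batch_size = len(num_patches)
    if batch_size % n_microbatches != 0:
        raise ValueError("batch_size must be divisible by n_microbatches")
    per_mb = batch_size // n_microbatches
    # offsets[i] is the first flat patch row that belongs to sample i.
    offsets = [0, *accumulate(num_patches)]
    return [
        (offsets[mb * per_mb], offsets[(mb + 1) * per_mb])
        for mb in range(n_microbatches)
    ]


# Four samples with 2, 0, 3, and 1 crop patches, split into two microbatches.
slices = step3_patch_slices([2, 0, 3, 1], n_microbatches=2)
```

Under these assumptions, a sample with zero patches contributes an empty range, so a microbatch can legitimately receive no rows from `patch_pixel_values`.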

```python
nemo_automodel.components.datasets.vlm.pp_media.chunk_vlm_media(
    pixel_values: torch.Tensor,
    image_grid: torch.Tensor,
    batch_size: int,
    n_microbatches: int,
    n_images_per_sample: torch.Tensor | None = None
) -> tuple[list[torch.Tensor], list[torch.Tensor]]
```

Split VLM pixel values and media metadata into PP microbatch chunks.

Handles four layouts:

1. `[N, C, H, W]` with `N == batch_size` -- one full image per sample.
2. `[N, max_patches, D]` with `N == batch_size` -- padded patches per image.
3. Flat patches `[total_patches, D]` with per-sample media counts from
   `n_images_per_sample`.
4. Flat patches with `n_images == batch_size` -- legacy one-image-per-sample.
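Layout 3 is the least obvious of the four, so a small sketch may help. It uses plain Python rather than tensors and makes two assumptions that the text above does not confirm: each `image_grid` row is a `(t, h, w)` triple whose product is that image's patch count, and the batch divides evenly across microbatches. The helper name `flat_patch_bounds` is hypothetical.

```python
from math import prod


def flat_patch_bounds(image_grid, n_images_per_sample, n_microbatches):
    """Return one (start, stop) patch-row range per microbatch.

    Assumption: prod(image_grid[i]) is the number of flat patch rows for image i.
    """
    batch_size = len(n_images_per_sample)
    if batch_size % n_microbatches != 0:
        raise ValueError("batch_size must be divisible by n_microbatches")
    per_mb = batch_size // n_microbatches

    bounds = []
    image_idx = 0  # next unconsumed row of image_grid
    patch_idx = 0  # next unconsumed row of the flat patch tensor
    for mb in range(n_microbatches):
        start = patch_idx
        for n_images in n_images_per_sample[mb * per_mb:(mb + 1) * per_mb]:
            for grid in image_grid[image_idx:image_idx + n_images]:
                patch_idx += prod(grid)
            image_idx += n_images
        bounds.append((start, patch_idx))
    return bounds


# Four samples holding 1, 2, 0, and 1 images respectively.
image_grid = [(1, 2, 2), (1, 4, 2), (1, 2, 4), (1, 2, 2)]
n_images_per_sample = [1, 2, 0, 1]
bounds = flat_patch_bounds(image_grid, n_images_per_sample, n_microbatches=2)
```

The point of the sketch is that the patch dimension cannot be split by row count alone: the first microbatch here owns 20 patch rows and the second only 4, even though each holds two samples.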

```python
nemo_automodel.components.datasets.vlm.pp_media.prepare_vlm_media_for_pp(
    batch: collections.abc.MutableMapping[str, typing.Any],
    batch_size: int,
    n_microbatches: int
) -> collections.abc.MutableMapping[str, typing.Any]
```

Move VLM media tensors into pre-chunked PP media storage on the batch.

This is intended to run from VLM collate/dataloader code when pipeline
parallelism (PP) is enabled. The returned batch no longer carries the raw
media tensors, which PyTorch PP would incorrectly chunk by row; instead it
carries per-microbatch media chunks under `VLM_PP_MEDIA_KEY`.
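The effect on the batch can be pictured with a toy dictionary. This is a sketch under stated assumptions, not the library's code: lists stand in for tensors, only one media key is handled, rows are split evenly, and the stored format (a dict of per-microbatch lists) is a guess. The helper name `move_media_sketch` is hypothetical; the key string is the documented value of `VLM_PP_MEDIA_KEY`.

```python
VLM_PP_MEDIA_KEY = "_vlm_pp_media_chunks"  # value documented for this module
MEDIA_KEYS = ("pixel_values",)  # illustrative subset, not the full key list


def move_media_sketch(batch, n_microbatches):
    """Pop raw media entries and store them pre-chunked under one key."""
    media = {key: batch.pop(key) for key in MEDIA_KEYS if key in batch}
    if not media:
        return batch  # text-only batch: nothing to move
    chunks = {}
    for key, rows in media.items():
        # Assumption: one row per sample, split evenly across microbatches.
        per_mb = len(rows) // n_microbatches
        chunks[key] = [
            rows[mb * per_mb:(mb + 1) * per_mb] for mb in range(n_microbatches)
        ]
    batch[VLM_PP_MEDIA_KEY] = chunks
    return batch


batch = {
    "input_ids": [[1], [2], [3], [4]],
    "pixel_values": ["img0", "img1", "img2", "img3"],
}
batch = move_media_sketch(batch, n_microbatches=2)
```

After the call, `pixel_values` is gone from the top level of the batch, so a generic row-wise chunker never sees it.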

```python
nemo_automodel.components.datasets.vlm.pp_media.stage_vlm_media_for_pp(
    pp: typing.Any,
    model_parts: list[torch.nn.Module],
    batch: collections.abc.MutableMapping[str, typing.Any]
)
```

Attach dataloader-prepared VLM media chunks to PP stage 0 for one schedule call.

```python
nemo_automodel.components.datasets.vlm.pp_media.wrap_vlm_collate_for_pp(
    collate_fn: collections.abc.Callable[[Any], collections.abc.MutableMapping[str, typing.Any]],
    n_microbatches: int
) -> collections.abc.Callable[[Any], collections.abc.MutableMapping[str, typing.Any]]
```

Wrap a VLM collate function so it prepares media tensors for PP.
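The wrapping pattern itself is ordinary function composition. The sketch below shows the general shape only: `wrap_collate_sketch`, `toy_collate`, and `toy_prepare` are hypothetical stand-ins, and deriving the batch size from the number of examples is an assumption, since the real wrapper's signature takes only `collate_fn` and `n_microbatches`.

```python
from functools import wraps


def wrap_collate_sketch(collate_fn, prepare_fn, n_microbatches):
    """Return a collate function that post-processes each batch for PP."""

    @wraps(collate_fn)
    def wrapped(examples):
        batch = collate_fn(examples)
        # Assumption: batch size equals the number of collated examples.
        return prepare_fn(batch, len(examples), n_microbatches)

    return wrapped


def toy_collate(examples):
    # Stand-in for a real VLM collate function.
    return {"input_ids": [example["input_ids"] for example in examples]}


def toy_prepare(batch, batch_size, n_microbatches):
    # Stand-in for the media preparation step; records its arguments only.
    batch["pp_info"] = (batch_size, n_microbatches)
    return batch


collate = wrap_collate_sketch(toy_collate, toy_prepare, n_microbatches=2)
batch = collate([{"input_ids": [1, 2]}, {"input_ids": [3, 4]}])
```

Doing the preparation inside the collate function means the work runs in dataloader workers rather than in the training loop, which matches the intent stated for `prepare_vlm_media_for_pp`.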

```python
nemo_automodel.components.datasets.vlm.pp_media.VLM_PP_MEDIA_KEY = '_vlm_pp_media_chunks'
```

```python
nemo_automodel.components.datasets.vlm.pp_media._VLM_MEDIA_KEYS = ('pixel_values', 'patch_pixel_values', 'num_patches', 'patch_newline_mask', 'ima...
```

```python
nemo_automodel.components.datasets.vlm.pp_media.__all__ = ['VLM_PP_MEDIA_KEY', 'chunk_vlm_media', 'prepare_vlm_media_for_pp', 'stage_vlm_m...
```