> For clean Markdown of any page, append .md to the page URL.
> For a complete documentation index, see https://docs.nvidia.com/nemo/automodel/llms.txt.
> For AI client integration (Claude Code, Cursor, etc.), connect to the MCP server at https://docs.nvidia.com/nemo/automodel/_mcp/server.

# nemo_automodel.components.datasets.vlm.mock

Mock VLM conversation dataset for benchmarking and testing.

Generates synthetic image(s) and minimal conversations in the standard
Automodel conversation format, compatible with `PreTokenizedDatasetWrapper`
and any HF `AutoProcessor` that supports the conversation schema.

The images are random-noise PIL images — no real data download is needed.
The processor / vision encoder processes them through the normal pipeline,
so this exercises the full VLM training path end-to-end.

When used with `pretokenize: true`, `truncate: true`, and `max_length`
in the dataset config, `PreTokenizedDatasetWrapper` tokenizes each sample
and truncates to exactly `max_length` tokens.  The mock response is
sized from `max_length` so that truncation always produces a full-length
sequence.

## Module Contents

### Functions

| Name                                                                                            | Description                                                        |
| ----------------------------------------------------------------------------------------------- | ------------------------------------------------------------------ |
| [`_generate_response`](#nemo_automodel-components-datasets-vlm-mock-_generate_response)         | Generate a dummy response of *num\_words* words from a fixed pool. |
| [`_make_random_image`](#nemo_automodel-components-datasets-vlm-mock-_make_random_image)         | Create a random-noise RGB PIL image.                               |
| [`build_mock_vlm_dataset`](#nemo_automodel-components-datasets-vlm-mock-build_mock_vlm_dataset) | Build a mock VLM dataset in Automodel conversation format.         |

### Data

[`_WORD_POOL`](#nemo_automodel-components-datasets-vlm-mock-_WORD_POOL)

### API

```python
nemo_automodel.components.datasets.vlm.mock._generate_response(
    rng: numpy.random.Generator,
    num_words: int
) -> str
```

Generate a dummy response of *num\_words* words from a fixed pool.

```python
nemo_automodel.components.datasets.vlm.mock._make_random_image(
    rng: numpy.random.Generator,
    size: typing.Tuple[int, int] = (256, 256)
) -> PIL.Image.Image
```

Create a random-noise RGB PIL image.

```python
nemo_automodel.components.datasets.vlm.mock.build_mock_vlm_dataset(
    num_samples: int = 10,
    num_images_per_sample: int = 1,
    image_size: typing.Tuple[int, int] = (256, 256),
    prompt: str = 'Describe this image.',
    responses: typing.Optional[typing.List[str]] = None,
    max_length: typing.Optional[int] = None,
    seed: int = 0,
    kwargs = {}
) -> list
```

Build a mock VLM dataset in Automodel conversation format.

Each sample is a dict with a `"conversation"` key whose value is a list
of user/assistant message dicts.  User messages contain one or more
`&#123;"type": "image", "image": &lt;PIL.Image&gt;&#125;` items followed by a text prompt.
Assistant messages contain a single text response.

This is the same format produced by `make_rdr_dataset`,
`make_unimm_chat_dataset`, and `make_meta_dataset`, so the returned
list can be fed directly to `PreTokenizedDatasetWrapper`.

When `max_length` is set and `responses` is `None`, each sample's
assistant response is generated with `max_length` words — guaranteed
to exceed `max_length` tokens so that `PreTokenizedDatasetWrapper`
with `truncate=True` produces exactly `max_length` tokens per sample.

**Parameters:**

Number of conversation examples to generate.

Number of random images per user turn.

`(width, height)` of each generated image.

Text prompt appended after the image(s) in the user turn.

Optional list of assistant responses.  Cycled over samples.

Target sequence length.  When set (and `responses` is
`None`), generates a response of `max_length` words per sample
so the tokenized sequence always exceeds `max_length` tokens.

Random seed for reproducibility.

**Returns:** `list`

A list of dicts, each with a single `"conversation"` key.

```python
nemo_automodel.components.datasets.vlm.mock._WORD_POOL = 'the image shows a landscape with mountains and rivers flowing through green val...
```