nemo_automodel.components.datasets.vlm.mock#
Mock VLM conversation dataset for benchmarking and testing.
Generates synthetic image(s) and minimal conversations in the standard
Automodel conversation format, compatible with PreTokenizedDatasetWrapper
and any HF AutoProcessor that supports the conversation schema.
The images are random-noise PIL images — no real data download is needed. The processor / vision encoder processes them through the normal pipeline, so this exercises the full VLM training path end-to-end.
When used with pretokenize: true, truncate: true, and max_length
in the dataset config, PreTokenizedDatasetWrapper tokenizes each sample
and truncates to exactly max_length tokens. The mock response is
sized from max_length so that truncation always produces a full-length
sequence.
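Under these assumptions, the pretokenization keys described above might appear in a dataset config like the following sketch. Only pretokenize, truncate, and max_length come from the text; the nesting and the _target_ key are assumptions about the config schema:

```yaml
# Illustrative config fragment (structure is an assumption;
# pretokenize / truncate / max_length keys are from the docs above)
dataset:
  _target_: nemo_automodel.components.datasets.vlm.mock.build_mock_vlm_dataset
  num_samples: 10
  image_size: [256, 256]
pretokenize: true
truncate: true
max_length: 512
```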
Module Contents#
Functions#
- _make_random_image – Create a random-noise RGB PIL image.
- _generate_response – Generate a dummy response of num_words words from a fixed pool.
- build_mock_vlm_dataset – Build a mock VLM dataset in Automodel conversation format.
Data#
API#
- nemo_automodel.components.datasets.vlm.mock._WORD_POOL#
‘split(…)’
- nemo_automodel.components.datasets.vlm.mock._make_random_image(
- rng: numpy.random.Generator,
- size: Tuple[int, int] = (256, 256),
- )#
Create a random-noise RGB PIL image.
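A minimal sketch of what a random-noise image helper like this might look like; the actual implementation in the module may differ:

```python
from typing import Tuple

import numpy as np
from PIL import Image


def make_random_image(
    rng: np.random.Generator, size: Tuple[int, int] = (256, 256)
) -> Image.Image:
    """Create a random-noise RGB PIL image of the given (width, height)."""
    width, height = size
    # uint8 noise in [0, 256); PIL expects a (height, width, channels) array.
    pixels = rng.integers(0, 256, size=(height, width, 3), dtype=np.uint8)
    return Image.fromarray(pixels, mode="RGB")


img = make_random_image(np.random.default_rng(0), size=(64, 48))
```

Note that PIL reports size as (width, height) while NumPy arrays are indexed (rows, columns), hence the swap when allocating the noise array.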
- nemo_automodel.components.datasets.vlm.mock._generate_response(rng: numpy.random.Generator, num_words: int) → str#
Generate a dummy response of num_words words from a fixed pool.
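A sketch of a word-pool response generator with this signature. The real _WORD_POOL contents are not shown in these docs, so the pool below is made up for illustration:

```python
import numpy as np

# Hypothetical word pool; the module's actual _WORD_POOL is not shown here.
WORD_POOL = "the quick brown fox jumps over a lazy dog".split()


def generate_response(rng: np.random.Generator, num_words: int) -> str:
    """Generate a dummy response of num_words words drawn from a fixed pool."""
    words = rng.choice(WORD_POOL, size=num_words)
    return " ".join(words)


text = generate_response(np.random.default_rng(0), 5)
```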
- nemo_automodel.components.datasets.vlm.mock.build_mock_vlm_dataset(
- *,
- num_samples: int = 10,
- num_images_per_sample: int = 1,
- image_size: Tuple[int, int] = (256, 256),
- prompt: str = 'Describe this image.',
- responses: Optional[List[str]] = None,
- max_length: Optional[int] = None,
- seed: int = 0,
- **kwargs,
- )#
Build a mock VLM dataset in Automodel conversation format.
Each sample is a dict with a "conversation" key whose value is a list of user/assistant message dicts. User messages contain one or more {"type": "image", "image": <PIL.Image>} items followed by a text prompt. Assistant messages contain a single text response.
This is the same format produced by make_rdr_dataset, make_unimm_chat_dataset, and make_meta_dataset, so the returned list can be fed directly to PreTokenizedDatasetWrapper.
When max_length is set and responses is None, each sample’s assistant response is generated with max_length words, which is guaranteed to exceed max_length tokens, so that PreTokenizedDatasetWrapper with truncate=True produces exactly max_length tokens per sample.
- Parameters:
num_samples – Number of conversation examples to generate.
num_images_per_sample – Number of random images per user turn.
image_size – (width, height) of each generated image.
prompt – Text prompt appended after the image(s) in the user turn.
responses – Optional list of assistant responses. Cycled over samples.
max_length – Target sequence length. When set (and responses is None), generates a response of max_length words per sample so the tokenized sequence always exceeds max_length tokens.
seed – Random seed for reproducibility.
- Returns:
A list of dicts, each with a single "conversation" key.