`nemo_automodel.datasets.vlm.utils`

Module Contents

Functions

`extract_skipped_token_ids`	Returns the IDs of tokens to mask in the labels.
`json2token`	Convert an ordered JSON object into a token sequence.
`process_text_batch`	Process a batch of texts and optionally images.

Data

`QWEN_TOKENS`
`LLAVA_TOKENS`
`LLAMA_TOKENS`
`GEMMA_TOKENS`
`PAD_TOKENS`

API

nemo_automodel.datasets.vlm.utils.QWEN_TOKENS: ['<|im_start|>', '<|im_end|>', '<|vision_start|>', '<|vision_end|>', '<|vision_pad|>', '<|image_pad|…

nemo_automodel.datasets.vlm.utils.LLAVA_TOKENS: [’’, ‘’]

nemo_automodel.datasets.vlm.utils.LLAMA_TOKENS: ['<|begin_of_text|>', '<|end_of_text|>', '<|finetune_right_pad_id|>', '<|step_id|>', '<|start_header…

nemo_automodel.datasets.vlm.utils.GEMMA_TOKENS: ['<image_soft_token>']

nemo_automodel.datasets.vlm.utils.PAD_TOKENS: set(…)

nemo_automodel.datasets.vlm.utils.extract_skipped_token_ids(processor)

Returns the IDs of tokens to mask in the labels.

Extracted from NeMo’s HFAutoModelForImageTextToText.extract_skipped_token_ids.
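Per the docstring, the returned IDs are the ones masked in the labels. The following is a minimal pure-Python sketch of that idea, not the module's implementation. It assumes the tokenizer exposes its added tokens as an ID-to-token mapping (Hugging Face tokenizers provide `added_tokens_decoder` for this) and that masked label positions are set to `-100`, the default `ignore_index` of PyTorch's `CrossEntropyLoss`. The function names, token IDs, and the token subset below are hypothetical.

```python
# Hypothetical subset of special tokens to skip (stand-in for PAD_TOKENS).
SKIP_CANDIDATES = {"<|im_start|>", "<|im_end|>", "<|image_pad|>", "<image_soft_token>"}

# Label value ignored by torch.nn.CrossEntropyLoss by default.
IGNORE_INDEX = -100


def find_skipped_token_ids(added_tokens: dict[int, str], skip_tokens: set[str]) -> list[int]:
    """Return the sorted IDs whose token string is in `skip_tokens`."""
    # str() also works for objects whose string form is the token text.
    return sorted(tid for tid, tok in added_tokens.items() if str(tok) in skip_tokens)


def mask_labels(input_ids: list[int], skipped_ids: list[int]) -> list[int]:
    """Copy input_ids into labels, replacing skipped token IDs with IGNORE_INDEX."""
    skip = set(skipped_ids)
    return [IGNORE_INDEX if tid in skip else tid for tid in input_ids]


# Usage with made-up IDs:
added = {100: "<|im_start|>", 101: "<|im_end|>", 102: "<|image_pad|>", 103: "<extra_0>"}
skipped = find_skipped_token_ids(added, SKIP_CANDIDATES)  # [100, 101, 102]
labels = mask_labels([100, 7, 8, 102, 101], skipped)      # [-100, 7, 8, -100, -100]
```

Masking these positions keeps the loss focused on the text the model should generate rather than on chat-template or image placeholder tokens.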

nemo_automodel.datasets.vlm.utils.json2token(obj, sort_json_key: bool = True)

Convert an ordered JSON object into a token sequence.

Taken from NeMo’s `automodel_datasets.py`.
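This page does not specify the token format `json2token` produces. The sketch below assumes a Donut-style scheme: each key `k` becomes a `<s_k>…</s_k>` pair around its recursively converted value, list items are joined with `<sep/>`, and `sort_json_key=True` emits keys in sorted order. The function name and the exact tag strings are hypothetical; treat this as an illustration of the recursive technique, not as the module's confirmed output.

```python
from typing import Any


def json_to_token_sketch(obj: Any, sort_json_key: bool = True) -> str:
    """Recursively flatten a JSON-like object into a tagged token string."""
    if isinstance(obj, dict):
        # Sorting makes the sequence independent of dict insertion order.
        keys = sorted(obj) if sort_json_key else list(obj)
        return "".join(
            f"<s_{k}>{json_to_token_sketch(obj[k], sort_json_key)}</s_{k}>" for k in keys
        )
    if isinstance(obj, list):
        # Separate list items with a dedicated separator token.
        return "<sep/>".join(json_to_token_sketch(item, sort_json_key) for item in obj)
    # Leaves (strings, numbers, booleans) are rendered as plain text.
    return str(obj)


# {"tags": ["a", "b"], "name": "cat"} becomes
# "<s_name>cat</s_name><s_tags>a<sep/>b</s_tags>" with sorting enabled.
```

Sorting keys gives the same target sequence for semantically identical records, which keeps training labels deterministic.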

nemo_automodel.datasets.vlm.utils.process_text_batch(processor, texts: list[str], images: list | None = None) → dict[str, torch.Tensor]

Process a batch of texts and optionally images.

Parameters:

processor – The processor to use for tokenization and image processing
texts – List of text strings to process
images – Optional list of images to process

Returns:

Dict mapping feature names to batched tensors.
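To illustrate the shape of a padded text batch, here is a toy sketch that right-pads tokenized texts into `input_ids` and `attention_mask`, the two fields a Hugging Face text tokenizer typically returns when padding is enabled. The whitespace "tokenizer", vocabulary, and IDs are hypothetical stand-ins, and plain lists replace tensors so the sketch runs without PyTorch. The real function delegates tokenization (and, when `images` is given, image processing) to `processor`, which may return additional keys.

```python
def toy_process_text_batch(
    texts: list[str], vocab: dict[str, int], pad_id: int = 0, unk_id: int = 1
) -> dict[str, list[list[int]]]:
    """Hypothetical stand-in: whitespace-tokenize texts and right-pad to equal length."""
    # Map each word to an ID; unknown words fall back to unk_id.
    tokenized = [[vocab.get(word, unk_id) for word in text.split()] for text in texts]
    max_len = max(len(ids) for ids in tokenized)

    input_ids, attention_mask = [], []
    for ids in tokenized:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        # 1 marks real tokens, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = toy_process_text_batch(
    ["describe the image", "hi"], {"describe": 2, "the": 3, "image": 4, "hi": 5}
)
# batch["input_ids"]      == [[2, 3, 4], [5, 0, 0]]
# batch["attention_mask"] == [[1, 1, 1], [1, 0, 0]]
```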
