nemo_automodel.datasets.vlm.utils

Module Contents

Functions

extract_skipped_token_ids

Returns list of tokens to mask in labels.

json2token

Convert an ordered JSON object into a token sequence.

process_text_batch

Process a batch of texts and optionally images.

Data

API

nemo_automodel.datasets.vlm.utils.QWEN_TOKENS

['<|im_start|>', '<|im_end|>', '<|vision_start|>', '<|vision_end|>', '<|vision_pad|>', '<|image_pad|…

nemo_automodel.datasets.vlm.utils.LLAVA_TOKENS

['', '']

nemo_automodel.datasets.vlm.utils.LLAMA_TOKENS

['<|begin_of_text|>', '<|end_of_text|>', '<|finetune_right_pad_id|>', '<|step_id|>', '<|start_header…

nemo_automodel.datasets.vlm.utils.GEMMA_TOKENS

['<image_soft_token>']

nemo_automodel.datasets.vlm.utils.PAD_TOKENS

'set(…)'

nemo_automodel.datasets.vlm.utils.extract_skipped_token_ids(processor)

Returns list of tokens to mask in labels.

Extracted from NeMo's HFAutoModelForImageTextToText.extract_skipped_token_ids
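To illustrate what "mask in labels" typically means, here is a minimal sketch that replaces every skipped token ID in a label sequence with -100, the default ignore_index of PyTorch's cross-entropy loss. The helper name mask_labels and the token IDs are hypothetical and are not part of this module; how the returned IDs are actually consumed downstream is an assumption.

```python
IGNORE_INDEX = -100  # value PyTorch's cross-entropy loss ignores by default


def mask_labels(input_ids, skipped_token_ids):
    """Return a copy of input_ids with skipped IDs replaced by IGNORE_INDEX.

    Hypothetical helper for illustration; not part of nemo_automodel.
    """
    skipped = set(skipped_token_ids)
    return [IGNORE_INDEX if tok in skipped else tok for tok in input_ids]


# Made-up IDs: 0 stands for a pad token, 9 for an image placeholder token.
labels = mask_labels([5, 9, 9, 17, 42, 0, 0], skipped_token_ids=[0, 9])
print(labels)  # [5, -100, -100, 17, 42, -100, -100]
```

In a real pipeline the skipped IDs would come from extract_skipped_token_ids(processor) rather than being written by hand.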

nemo_automodel.datasets.vlm.utils.json2token(obj, sort_json_key: bool = True)

Convert an ordered JSON object into a token sequence.

From NeMo's automodel_datasets.py
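The page does not show the output format, so the following is only a sketch of one common way to flatten JSON into a token string, in the style used by Donut-like document models. The tag shapes (<s_key>…</s_key>, <sep/>), the reverse key ordering, and the name json_to_token_sketch are all assumptions made for illustration; the real json2token may differ.

```python
def json_to_token_sketch(obj, sort_json_key=True):
    """Flatten nested JSON into a tagged string (hypothetical helper)."""
    if isinstance(obj, dict):
        # Assumed ordering: reverse-sorted keys when sorting is requested,
        # otherwise the dict's own insertion order.
        keys = sorted(obj, reverse=True) if sort_json_key else list(obj)
        return "".join(
            f"<s_{k}>{json_to_token_sketch(obj[k], sort_json_key)}</s_{k}>"
            for k in keys
        )
    if isinstance(obj, list):
        # List items are joined with an assumed separator tag.
        return "<sep/>".join(
            json_to_token_sketch(item, sort_json_key) for item in obj
        )
    # Scalars (str, int, float, bool, None) become plain text.
    return str(obj)


receipt = {"menu": [{"name": "latte", "price": 4}, {"name": "scone", "price": 3}]}
tokens = json_to_token_sketch(receipt)
print(tokens)
```

Sorting the keys makes the target string deterministic for a given object, which keeps training targets stable regardless of how the JSON was originally serialized.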

nemo_automodel.datasets.vlm.utils.process_text_batch(
processor,
texts: list[str],
images: list | None = None,
) -> dict[str, torch.Tensor]

Process a batch of texts and optionally images.

Parameters:
  • processor – The processor to use for tokenization and image processing

  • texts – List of text strings to process

  • images – Optional list of images to process

Returns:

Dict containing processed batch data
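As a rough sketch of the calling pattern described above, the example below passes texts to a processor in every case and adds images only when they are provided. ToyProcessor and process_text_batch_sketch are hypothetical stand-ins so that the example runs without any model download; a real processor would return torch.Tensor values rather than Python lists, and the actual keyword arguments used by process_text_batch are an assumption.

```python
class ToyProcessor:
    """Hypothetical stand-in for a multimodal processor; not a real library class."""

    def __call__(self, text, images=None, padding=True):
        # "Tokenize" by mapping each whitespace-separated word to its length.
        ids = [[len(word) for word in t.split()] for t in text]
        if padding:
            # Right-pad every row with 0 up to the longest row in the batch.
            width = max(len(row) for row in ids)
            ids = [row + [0] * (width - len(row)) for row in ids]
        batch = {"input_ids": ids}
        if images is not None:
            # Stand-in for image features; passed through unchanged.
            batch["pixel_values"] = list(images)
        return batch


def process_text_batch_sketch(processor, texts, images=None):
    """Call the processor with images only when they are provided."""
    if images is not None:
        return processor(text=texts, images=images, padding=True)
    return processor(text=texts, padding=True)


text_only = process_text_batch_sketch(ToyProcessor(), ["describe the image", "hello"])
with_images = process_text_batch_sketch(ToyProcessor(), ["a cat"], images=["<image-0>"])
print(sorted(text_only), sorted(with_images))
```

The text-only batch contains just the padded input_ids, while the second batch also carries a pixel_values entry, which mirrors the "optionally images" behavior in the description.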