nemo_automodel.components.datasets.vlm.utils
Module Contents
Functions
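| Function | Description |
| --- | --- |
| extract_skipped_token_ids | Returns the list of tokens to mask in labels. |
| json2token | Convert an ordered JSON object into a token sequence. |
| process_text_batch | Process a batch of texts and optionally images. |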
Data
API
- nemo_automodel.components.datasets.vlm.utils.QWEN_TOKENS
[‘<|im_start|>’, ‘<|im_end|>’, ‘<|vision_start|>’, ‘<|vision_end|>’, ‘<|vision_pad|>’, ‘<|image_pad|…
- nemo_automodel.components.datasets.vlm.utils.LLAVA_TOKENS
[‘<image>’, ‘<pad>’]
- nemo_automodel.components.datasets.vlm.utils.LLAMA_TOKENS
[‘<|begin_of_text|>’, ‘<|end_of_text|>’, ‘<|finetune_right_pad_id|>’, ‘<|step_id|>’, ‘<|start_header…
- nemo_automodel.components.datasets.vlm.utils.GEMMA_TOKENS
[‘<image_soft_token>’]
- nemo_automodel.components.datasets.vlm.utils.GEMMA_3N_TOKENS
[‘<image_soft_token>’, ‘<audio_soft_token>’, ‘<start_of_audio>’, ‘<start_of_image>’, ‘<end_of_audio>…
- nemo_automodel.components.datasets.vlm.utils.PAD_TOKENS
‘set(…)’
- nemo_automodel.components.datasets.vlm.utils.extract_skipped_token_ids(processor)
Returns the list of tokens to mask in labels.
Extracted from NeMo’s HFAutoModelForImageTextToText.extract_skipped_token_ids.
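As a rough illustration of how such a list is typically used: token IDs that should be skipped are commonly replaced with an ignore value in the labels so that they do not contribute to the loss. The sketch below is self-contained and does not call the library. `SKIP_TOKENS`, `find_skipped_ids`, `mask_labels`, and the toy `added_tokens` vocabulary are hypothetical names introduced for illustration, and the ignore value of -100 is an assumption based on the usual PyTorch cross-entropy convention.

```python
# Hypothetical subset of the special tokens listed in the constants above.
SKIP_TOKENS = {"<|im_start|>", "<|im_end|>", "<image_soft_token>"}

# Assumed ignore value (the default ignore_index of PyTorch's CrossEntropyLoss).
IGNORE_INDEX = -100


def find_skipped_ids(added_tokens: dict, skip_tokens: set) -> list:
    """Return the sorted IDs whose token string appears in skip_tokens."""
    return sorted(tid for tid, tok in added_tokens.items() if tok in skip_tokens)


def mask_labels(labels: list, skipped_ids: list) -> list:
    """Replace every skipped ID in labels with IGNORE_INDEX."""
    skipped = set(skipped_ids)
    return [IGNORE_INDEX if tid in skipped else tid for tid in labels]


# Toy vocabulary (made-up IDs): maps token ID -> token string.
added_tokens = {100: "<|im_start|>", 101: "<|im_end|>", 102: "<custom>"}

skipped_ids = find_skipped_ids(added_tokens, SKIP_TOKENS)
masked = mask_labels([100, 42, 43, 101], skipped_ids)
print(skipped_ids, masked)
```

Only the two IDs whose strings are in the skip set are masked; ordinary token IDs and the unrelated `<custom>` token pass through unchanged.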
- nemo_automodel.components.datasets.vlm.utils.json2token(obj, sort_json_key: bool = True)
Convert an ordered JSON object into a token sequence.
From NeMo’s automodel_datasets.py.
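To make the idea of flattening JSON into a token sequence concrete, here is a minimal sketch of one common convention used in document-parsing pipelines: each key becomes an opening and closing tag, and list items are joined with a separator. This is not the library's implementation. The function name `json2token_sketch`, the `<s_key>`/`</s_key>` tag format, the `<sep/>` separator, and the descending key order are all assumptions chosen for illustration.

```python
def json2token_sketch(obj, sort_json_key: bool = True) -> str:
    """Flatten a JSON-like object into a tagged string (illustrative only)."""
    if isinstance(obj, dict):
        # Assumed ordering: descending by key when sorting is requested,
        # otherwise the dict's own insertion order.
        keys = sorted(obj.keys(), reverse=True) if sort_json_key else list(obj.keys())
        return "".join(
            f"<s_{k}>{json2token_sketch(obj[k], sort_json_key)}</s_{k}>" for k in keys
        )
    if isinstance(obj, list):
        # Assumed separator between list items.
        return "<sep/>".join(json2token_sketch(item, sort_json_key) for item in obj)
    # Leaves (strings, numbers, booleans, None) are rendered with str().
    return str(obj)


example = {"name": "tea", "price": 3}
print(json2token_sketch(example))
print(json2token_sketch(example, sort_json_key=False))
```

Sorting the keys gives a deterministic target sequence regardless of how the input dictionary was built, which is the usual reason for exposing a flag such as `sort_json_key`.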
- nemo_automodel.components.datasets.vlm.utils.process_text_batch(processor, texts: list[str], images: list | None = None)
Process a batch of texts and optionally images.
- Parameters:
  - processor – The processor to use for tokenization and image processing.
  - texts – List of text strings to process.
  - images – Optional list of images to process.
- Returns:
  Dict containing processed batch data.
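The calling pattern described above can be sketched as follows. This is a hedged illustration and not the library's code: it assumes the processor is callable with `text`, `images`, and `padding` keyword arguments in the style of Hugging Face processors. `ToyProcessor` and `process_text_batch_sketch` are hypothetical names, and the toy processor encodes characters as code points purely so the example runs without any model download.

```python
from typing import Optional


class ToyProcessor:
    """Hypothetical stand-in for a multimodal processor; not a real library class."""

    def __call__(self, text, images=None, padding=True):
        # Toy tokenization: one ID per character, right-padded with 0.
        token_ids = [[ord(ch) for ch in t] for t in text]
        width = max(len(ids) for ids in token_ids)
        batch = {
            "input_ids": [ids + [0] * (width - len(ids)) for ids in token_ids],
            "attention_mask": [
                [1] * len(ids) + [0] * (width - len(ids)) for ids in token_ids
            ],
        }
        if images is not None:
            # A real processor would resize and normalize images here.
            batch["pixel_values"] = list(images)
        return batch


def process_text_batch_sketch(processor, texts: list, images: Optional[list] = None) -> dict:
    """Forward texts (and images, when given) to the processor in one call."""
    if images is not None:
        return processor(text=texts, images=images, padding=True)
    return processor(text=texts, padding=True)


text_only = process_text_batch_sketch(ToyProcessor(), ["hi", "hello"])
with_images = process_text_batch_sketch(ToyProcessor(), ["hi"], images=["img0"])
print(sorted(text_only), sorted(with_images))
```

Branching on `images is not None` keeps the text-only path from passing an image argument at all, which matters for processors that do not accept `images=None`.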