nemo_automodel.components.datasets.vlm.utils
Module Contents
Functions
- default_stop_tokens
- json2token: Convert an ordered JSON object into a token sequence.
- process_text_batch: Process a batch of texts and optionally images.
API
- nemo_automodel.components.datasets.vlm.utils.default_stop_tokens(processor) → Iterable[str]
- nemo_automodel.components.datasets.vlm.utils.json2token(obj, sort_json_key: bool = True)
Convert an ordered JSON object into a token sequence.
Taken from NeMo’s automodel_datasets.py.
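The page does not show the output format, so the following is only a minimal sketch of how a JSON-to-token conversion of this kind is commonly written, in the style used by Donut-like document models. The function name `json2token_sketch`, the `<s_key>...</s_key>` tag format, the `<sep/>` list separator, and the reverse key ordering are all assumptions for illustration, not the confirmed behavior of `json2token`.

```python
def json2token_sketch(obj, sort_json_key: bool = True) -> str:
    """Hypothetical illustration: flatten nested JSON into a tagged string."""
    if isinstance(obj, dict):
        # Assumption: keys are visited in reverse-sorted order when sorting is on,
        # otherwise in insertion order.
        keys = sorted(obj.keys(), reverse=True) if sort_json_key else list(obj.keys())
        return "".join(
            f"<s_{k}>{json2token_sketch(obj[k], sort_json_key)}</s_{k}>" for k in keys
        )
    if isinstance(obj, list):
        # Assumption: list items are joined with a separator token.
        return "<sep/>".join(json2token_sketch(item, sort_json_key) for item in obj)
    # Leaves (strings, numbers, booleans) are emitted as plain text.
    return str(obj)


example = {"menu": [{"name": "latte", "price": 4}, {"name": "tea", "price": 3}]}
tokens_sorted = json2token_sketch(example)
tokens_unsorted = json2token_sketch(example, sort_json_key=False)
print(tokens_sorted)
print(tokens_unsorted)
```

Sorting the keys gives every sample the same field order, so the target sequence stays the same even when the source dictionaries were built in different orders.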
- nemo_automodel.components.datasets.vlm.utils.process_text_batch(processor, texts: list[str], images: list | None = None)
Process a batch of texts and optionally images.
- Parameters:
processor – The processor to use for tokenization and image processing
texts – List of text strings to process
images – Optional list of images to process
- Returns:
Dict containing processed batch data
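To make the call pattern concrete, here is a rough sketch under stated assumptions. `StubProcessor` is a hypothetical stand-in invented for this example, not a real library class. `process_text_batch_sketch` and the returned keys (`input_ids`, `attention_mask`, `pixel_values`) are likewise assumptions about what such a batch could contain, not the documented return value.

```python
from typing import Any, Optional


class StubProcessor:
    """Hypothetical stand-in for a multimodal processor (illustration only)."""

    def __call__(self, text: list[str], images: Optional[list] = None) -> dict[str, Any]:
        # Toy tokenization: each whitespace-separated word becomes its length.
        rows = [[len(word) for word in t.split()] for t in text]
        width = max((len(r) for r in rows), default=0)
        batch: dict[str, Any] = {
            # Right-pad every row with 0 so all rows share the same length.
            "input_ids": [r + [0] * (width - len(r)) for r in rows],
            "attention_mask": [[1] * len(r) + [0] * (width - len(r)) for r in rows],
        }
        if images is not None:
            # Images are passed through untouched in this toy version.
            batch["pixel_values"] = list(images)
        return batch


def process_text_batch_sketch(
    processor, texts: list[str], images: Optional[list] = None
) -> dict[str, Any]:
    """Sketch: forward images to the processor only when they are provided."""
    if images is not None:
        batch = processor(text=texts, images=images)
    else:
        batch = processor(text=texts)
    return dict(batch)


text_only = process_text_batch_sketch(StubProcessor(), ["describe the image", "hello"])
with_images = process_text_batch_sketch(StubProcessor(), ["hello"], images=["img0"])
print(text_only)
print(with_images)
```

Because the `images` argument is optional, the same helper serves both text-only and image-text batches.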