bridge.data.vlm_datasets.collate#

Collation utilities for building VLM training batches from conversation examples.

Module Contents#

Functions#

_gather_assistant_text_segments

Extract assistant text segments from the structured conversation example.

create_multiturn_loss_mask_by_search

Tokenizer-agnostic masking via substring search of assistant texts.

phi4_mm_collate_fn

Collate function for Phi-4 MM model audio inputs.

qwen2_5_collate_fn

Collate function for Qwen2.5 VL model.

default_collate_fn

Default collate function for VLM models.

Data#

API#

bridge.data.vlm_datasets.collate.MISSING_QWEN_VL_UTILS_MSG#

‘qwen_vl_utils is required for Qwen2.5 VL processing. Please pip install qwen-vl-utils or provide c…’

bridge.data.vlm_datasets.collate._gather_assistant_text_segments(example: dict) → list[str]#

Extract assistant text segments from the structured conversation example.

The example schema is expected to be {"conversation": [{"role": …, "content": […]}, …]}, where content is a list of items like {"type": "text"|"image"|…, "text": "…"}. Returns a list of concatenated text strings, one per assistant turn.
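
A minimal illustrative sketch of this extraction, assuming only the schema described above; it is not the module's actual implementation and the name is hypothetical.

```python
# Illustrative sketch; assumes the conversation schema described above.
def gather_assistant_text_segments_sketch(example: dict) -> list[str]:
    segments: list[str] = []
    for turn in example.get("conversation", []):
        if turn.get("role") != "assistant":
            continue
        # Keep only the "text" items of this assistant turn and join them.
        texts = [
            item.get("text", "")
            for item in turn.get("content", [])
            if item.get("type") == "text"
        ]
        if texts:
            segments.append("".join(texts))
    return segments
```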

bridge.data.vlm_datasets.collate.create_multiturn_loss_mask_by_search(…)#

Tokenizer-agnostic masking via substring search of assistant texts.

  • The full conversation has already been tokenized with the processor, yielding input_ids

  • Extract the assistant text strings from the structured example

  • For each assistant text, tokenize it without special tokens and search for the resulting token span sequentially in input_ids

  • On a successful match, unmask that span; otherwise leave it masked (see the sketch below)
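
A minimal sketch of the search-and-unmask steps above. The argument names (input_ids as a 1-D token-id tensor, assistant_texts, tokenizer) and the _find_subsequence helper are illustrative assumptions; the actual signature of create_multiturn_loss_mask_by_search is not shown on this page.

```python
import torch


def _find_subsequence(haystack: list[int], needle: list[int], start: int) -> int:
    """Return the first index >= start where needle occurs in haystack, else -1."""
    if not needle:
        return -1
    for i in range(start, len(haystack) - len(needle) + 1):
        if haystack[i : i + len(needle)] == needle:
            return i
    return -1


def create_multiturn_loss_mask_sketch(input_ids, assistant_texts, tokenizer):
    ids = input_ids.tolist()
    mask = torch.zeros(len(ids), dtype=torch.bool)  # start fully masked
    cursor = 0
    for text in assistant_texts:
        # Tokenize the assistant text without special tokens, then search for
        # that token span sequentially in the already-tokenized conversation.
        span = tokenizer(text, add_special_tokens=False)["input_ids"]
        pos = _find_subsequence(ids, span, cursor)
        if pos == -1:
            continue  # no match: leave this turn masked
        mask[pos : pos + len(span)] = True  # unmask the matched span
        cursor = pos + len(span)
    return mask
```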

bridge.data.vlm_datasets.collate.phi4_mm_collate_fn(examples, processor)#

Collate function for Phi-4 MM model audio inputs.

bridge.data.vlm_datasets.collate.qwen2_5_collate_fn(
examples: list,
processor,
) → dict[str, torch.Tensor]#

Collate function for Qwen2.5 VL model.
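
For orientation, a hedged sketch of the usual Qwen2.5 VL batching pattern built on qwen_vl_utils.process_vision_info and the Hugging Face processor (see MISSING_QWEN_VL_UTILS_MSG above). The module's qwen2_5_collate_fn may additionally construct labels and loss masks, which this sketch omits; the function name below is hypothetical.

```python
# Hedged sketch of typical Qwen2.5 VL batching; labels/loss masks omitted.
from qwen_vl_utils import process_vision_info  # raises ImportError if not installed


def qwen2_5_collate_sketch(examples: list, processor):
    # "conversation" is the schema key described for this module's examples.
    conversations = [ex["conversation"] for ex in examples]
    texts = [
        processor.apply_chat_template(conv, tokenize=False, add_generation_prompt=False)
        for conv in conversations
    ]
    # qwen_vl_utils extracts the image/video inputs referenced by the conversations.
    image_inputs, video_inputs = process_vision_info(conversations)
    return processor(
        text=texts,
        images=image_inputs,
        videos=video_inputs,
        padding=True,
        return_tensors="pt",
    )
```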

bridge.data.vlm_datasets.collate.default_collate_fn(
examples: list,
processor,
) → dict[str, torch.Tensor]#

Default collate function for VLM models.
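
These collate functions take (examples, processor), while torch.utils.data.DataLoader expects a single-argument collate_fn, so the processor is typically bound first. A usage sketch; train_dataset and processor are placeholders assumed to exist.

```python
# Usage sketch; `train_dataset` (yielding conversation examples in the schema
# above) and `processor` (the model's Hugging Face processor) are placeholders.
from functools import partial

from torch.utils.data import DataLoader

from bridge.data.vlm_datasets.collate import default_collate_fn

loader = DataLoader(
    train_dataset,
    batch_size=4,
    shuffle=True,
    collate_fn=partial(default_collate_fn, processor=processor),
)
```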

bridge.data.vlm_datasets.collate.COLLATE_FNS#

None
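
The value is not rendered above. Assuming COLLATE_FNS is a registry mapping a model or processor identifier to one of the collate functions in this module, a lookup might look like the following; the keying scheme is an assumption.

```python
# Hedged sketch: assumes COLLATE_FNS maps a processor/model identifier to a
# collate function; the actual keys are not shown on this page.
from bridge.data.vlm_datasets.collate import COLLATE_FNS, default_collate_fn

# `processor` and `examples` (a list of conversation dicts) are placeholders.
collate_fn = COLLATE_FNS.get(type(processor).__name__, default_collate_fn)
batch = collate_fn(examples, processor)  # dict[str, torch.Tensor]
```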