bridge.data.vlm_datasets.conversation_dataset#

Core dataset types for conversation-style VLM examples.

Module Contents#

Classes#

VLMConversationDataset

Repeating wrapper over a list of HF-style conversation examples.

API#

class bridge.data.vlm_datasets.conversation_dataset.VLMConversationDataset(
base_examples: List[Dict[str, Any]],
target_length: int,
processor: Any,
collate_impl: Optional[Callable[[list, Any], Dict[str, torch.Tensor]]] = None,
)#

Bases: torch.utils.data.Dataset

Repeating wrapper over a list of HF-style conversation examples.

  • Each base example is expected to contain a “conversation” key following processor.apply_chat_template conventions. Optional modality fields like “audio” are passed through and consumed by the collate function.

  • Dataset length is set to a target length and indexes wrap around the underlying list to meet the requested size.

  • A collate_fn attribute is exposed so the framework can pass it to the DataLoader.

Initialization

__len__() int#
__getitem__(idx: int) Dict[str, Any]#