`bridge.data.vlm_datasets.conversation_dataset`#

Core dataset types for conversation-style VLM examples.

Module Contents#

Classes#

VLMConversationDataset

Repeating wrapper over a list of HF-style conversation examples.

API#

class bridge.data.vlm_datasets.conversation_dataset.VLMConversationDataset( base_examples: List[Dict[str, Any]], target_length: int, processor: Any, collate_impl: Optional[Callable[[list, Any], Dict[str, torch.Tensor]]] = None, )#

Bases: torch.utils.data.Dataset

Repeating wrapper over a list of HF-style conversation examples.

Each base example is expected to contain a “conversation” key following processor.apply_chat_template conventions. Optional modality fields like “audio” are passed through and consumed by the collate function.
Dataset length is set to a target length and indexes wrap around the underlying list to meet the requested size.
A collate_fn attribute is exposed so the framework can pass it to the DataLoader.

Initialization

__len__() → int#

__getitem__(idx: int) → Dict[str, Any]#

bridge.data.vlm_datasets.conversation_dataset#

Module Contents#

Classes#

API#

`bridge.data.vlm_datasets.conversation_dataset`#