bridge.data.vlm_datasets.conversation_dataset
#
Core dataset types for conversation-style VLM examples.
Module Contents#
Classes#
Repeating wrapper over a list of HF-style conversation examples. |
API#
- class bridge.data.vlm_datasets.conversation_dataset.VLMConversationDataset(
- base_examples: List[Dict[str, Any]],
- target_length: int,
- processor: Any,
- collate_impl: Optional[Callable[[list, Any], Dict[str, torch.Tensor]]] = None,
Bases:
torch.utils.data.Dataset
Repeating wrapper over a list of HF-style conversation examples.
Each base example is expected to contain a “conversation” key following processor.apply_chat_template conventions. Optional modality fields like “audio” are passed through and consumed by the collate function.
Dataset length is set to a target length and indexes wrap around the underlying list to meet the requested size.
A
collate_fn
attribute is exposed so the framework can pass it to the DataLoader.
Initialization
- __len__() int #
- __getitem__(idx: int) Dict[str, Any] #