`bridge.data.hf_datasets.conversation_dataset`#

Core dataset types for HF conversation-style examples.

Module Contents#

Classes#

ConversationDataset

Repeating wrapper over a list of HF-style conversation examples.

API#

class bridge.data.hf_datasets.conversation_dataset.ConversationDataset( base_examples: List[Dict[str, Any]], target_length: int, processor: Any, collate_impl: Optional[Callable[[list, Any], Dict[str, torch.Tensor]]] = None, pack_sequences: bool = False, pack_sequences_pad_to_multiple_of: int = 1, )#

Bases: torch.utils.data.Dataset

Repeating wrapper over a list of HF-style conversation examples.

Each base example is expected to contain a “conversation” key following processor.apply_chat_template conventions. Optional modality fields like “audio” are passed through and consumed by the collate function.
Dataset length is set to a target length and indexes wrap around the underlying list to meet the requested size.
A collate_fn attribute is exposed so the framework can pass it to the DataLoader.

Initialization

__len__() → int#

__getitem__(idx: int) → Dict[str, Any]#

bridge.data.hf_datasets.conversation_dataset#

Module Contents#

Classes#

API#

`bridge.data.hf_datasets.conversation_dataset`#