bridge.data.vlm_datasets.hf_dataset_makers#

Built-in maker functions that transform HuggingFace datasets into conversation-style examples consumable by VLM processors.

Module Contents#

Functions#

make_rdr_dataset

Load and preprocess the RDR dataset for image-to-text fine-tuning.

make_cord_v2_dataset

Load and preprocess the CORD-V2 dataset for image-to-text fine-tuning.

make_medpix_dataset

Load and preprocess the MedPix dataset for image-to-text fine-tuning.

make_cv17_dataset

Load and preprocess the CommonVoice 17 dataset for audio-to-text fine-tuning.

API#

bridge.data.vlm_datasets.hf_dataset_makers.make_rdr_dataset(
path_or_dataset: str = 'quintend/rdr-items',
split: str = 'train',
**kwargs,
) List[Dict[str, Any]]#

Load and preprocess the RDR dataset for image-to-text fine-tuning.

Returns a list of examples with a “conversation” field that includes an image and text.

bridge.data.vlm_datasets.hf_dataset_makers.make_cord_v2_dataset(
path_or_dataset: str = 'naver-clova-ix/cord-v2',
split: str = 'train',
**kwargs,
) List[Dict[str, Any]]#

Load and preprocess the CORD-V2 dataset for image-to-text fine-tuning.

bridge.data.vlm_datasets.hf_dataset_makers.make_medpix_dataset(
path_or_dataset: str = 'mmoukouba/MedPix-VQA',
split: str = 'train',
**kwargs,
) List[Dict[str, Any]]#

Load and preprocess the MedPix dataset for image-to-text fine-tuning.

bridge.data.vlm_datasets.hf_dataset_makers.make_cv17_dataset(
path_or_dataset: str = 'ysdede/commonvoice_17_tr_fixed',
split: str = 'train',
**kwargs,
) List[Dict[str, Any]]#

Load and preprocess the CommonVoice 17 dataset for audio-to-text fine-tuning.