bridge.data.vlm_datasets.hf_dataset_makers
#
Built-in maker functions that transform HuggingFace datasets into conversation-style examples consumable by VLM processors.
Module Contents#
Functions#
Load and preprocess the RDR dataset for image-to-text fine-tuning. |
|
Load and preprocess the CORD-V2 dataset for image-to-text fine-tuning. |
|
Load and preprocess the MedPix dataset for image-to-text fine-tuning. |
|
Load and preprocess the CommonVoice 17 dataset for audio-to-text fine-tuning. |
API#
- bridge.data.vlm_datasets.hf_dataset_makers.make_rdr_dataset(
- path_or_dataset: str = 'quintend/rdr-items',
- split: str = 'train',
- **kwargs,
Load and preprocess the RDR dataset for image-to-text fine-tuning.
Returns a list of examples with a “conversation” field that includes an image and text.
- bridge.data.vlm_datasets.hf_dataset_makers.make_cord_v2_dataset(
- path_or_dataset: str = 'naver-clova-ix/cord-v2',
- split: str = 'train',
- **kwargs,
Load and preprocess the CORD-V2 dataset for image-to-text fine-tuning.
- bridge.data.vlm_datasets.hf_dataset_makers.make_medpix_dataset(
- path_or_dataset: str = 'mmoukouba/MedPix-VQA',
- split: str = 'train',
- **kwargs,
Load and preprocess the MedPix dataset for image-to-text fine-tuning.
- bridge.data.vlm_datasets.hf_dataset_makers.make_cv17_dataset(
- path_or_dataset: str = 'ysdede/commonvoice_17_tr_fixed',
- split: str = 'train',
- **kwargs,
Load and preprocess the CommonVoice 17 dataset for audio-to-text fine-tuning.