`nemo_automodel.components.datasets.vlm.datasets`

Module Contents

Functions

`make_rdr_dataset`	Load and preprocess the RDR dataset for image-to-text fine-tuning.
`make_cord_v2_dataset`	Load and preprocess the CORD-V2 dataset for image-to-text fine-tuning.
`make_medpix_dataset`	Load and preprocess the MedPix dataset for image-to-text fine-tuning.
`make_cv17_dataset`	Load and preprocess the CommonVoice 17 dataset for audio-to-text fine-tuning.
`make_unimm_chat_dataset`	Load and preprocess the UniMM-Chat dataset for image-to-text fine-tuning.

API

`nemo_automodel.components.datasets.vlm.datasets.make_rdr_dataset(path_or_dataset='quintend/rdr-items', split='train', **kwargs)`

Load and preprocess the RDR dataset for image-to-text fine-tuning.

Parameters:

path_or_dataset (str) – Path or identifier for the RDR dataset.
split (str) – Dataset split to load.
**kwargs – Additional arguments.

Returns:

The processed dataset.

Return type:

Dataset
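The call pattern for the signature above can be sketched as follows. This is a minimal illustration, not part of the module: `load_rdr_dataset` and `MODULE_PATH` are hypothetical names introduced here, and the import is deferred so that the snippet can be defined without `nemo_automodel` installed. It assumes the package is importable and the dataset identifier is reachable at call time; the structure of the returned examples is not documented in this section, so the sketch does not inspect it.

```python
import importlib

# Module path as documented in this page's title.
MODULE_PATH = "nemo_automodel.components.datasets.vlm.datasets"


def load_rdr_dataset(path_or_dataset="quintend/rdr-items", split="train", **kwargs):
    """Hypothetical wrapper: call make_rdr_dataset with the documented arguments.

    The defaults mirror the documented signature. The module is imported
    lazily, so an ImportError surfaces only when this function is called.
    """
    module = importlib.import_module(MODULE_PATH)
    return module.make_rdr_dataset(
        path_or_dataset=path_or_dataset,
        split=split,
        **kwargs,
    )
```

Calling `load_rdr_dataset()` with no arguments is then equivalent to calling `make_rdr_dataset()` with its documented defaults.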

`nemo_automodel.components.datasets.vlm.datasets.make_cord_v2_dataset(path_or_dataset='naver-clova-ix/cord-v2', split='train', **kwargs)`

Load and preprocess the CORD-V2 dataset for image-to-text fine-tuning.

`nemo_automodel.components.datasets.vlm.datasets.make_medpix_dataset(path_or_dataset='medpix-dataset/medpix-dataset', split='train', **kwargs)`

Load and preprocess the MedPix dataset for image-to-text fine-tuning.

`nemo_automodel.components.datasets.vlm.datasets.make_cv17_dataset(path_or_dataset='ysdede/commonvoice_17_tr_fixed', split='train', **kwargs)`

Load and preprocess the CommonVoice 17 dataset for audio-to-text fine-tuning.

`nemo_automodel.components.datasets.vlm.datasets.make_unimm_chat_dataset(path_or_dataset='Yirany/UniMM-Chat', split='train', **kwargs)`

Load and preprocess the UniMM-Chat dataset for image-to-text fine-tuning.
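Because all five functions share the same `(path_or_dataset, split, **kwargs)` shape, one way to select among them at run time is a small name-based dispatcher. The sketch below is an illustration only: `FACTORY_NAMES`, `resolve_factory_name`, `build_dataset`, and the short keys are hypothetical and not provided by the module. Only the function names and the module path come from this page, and the module is imported lazily so the helper can be defined without `nemo_automodel` installed.

```python
import importlib

# Module path as documented in this page's title.
MODULE_PATH = "nemo_automodel.components.datasets.vlm.datasets"

# Hypothetical short keys mapped to the function names documented above.
FACTORY_NAMES = {
    "rdr": "make_rdr_dataset",
    "cord_v2": "make_cord_v2_dataset",
    "medpix": "make_medpix_dataset",
    "cv17": "make_cv17_dataset",
    "unimm_chat": "make_unimm_chat_dataset",
}


def resolve_factory_name(key):
    """Return the documented function name for a short key, or raise ValueError."""
    try:
        return FACTORY_NAMES[key]
    except KeyError:
        known = ", ".join(sorted(FACTORY_NAMES))
        raise ValueError(f"unknown dataset key {key!r}; expected one of: {known}")


def build_dataset(key, split="train", **kwargs):
    """Look up the factory by key and call it with the requested split.

    path_or_dataset is left to the factory's own default unless the caller
    passes it explicitly through kwargs.
    """
    module = importlib.import_module(MODULE_PATH)
    factory = getattr(module, resolve_factory_name(key))
    return factory(split=split, **kwargs)
```

For example, `build_dataset("cord_v2")` would call `make_cord_v2_dataset(split='train')`, which falls back to the documented default `path_or_dataset='naver-clova-ix/cord-v2'`.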