nemo_automodel.components.datasets.vlm.datasets

Module Contents

Functions

make_rdr_dataset

Load and preprocess the RDR dataset for image-to-text fine-tuning.

make_cord_v2_dataset

Load and preprocess the CORD-V2 dataset for image-to-text fine-tuning.

make_medpix_dataset

Load and preprocess the MedPix dataset for image-to-text fine-tuning.

make_cv17_dataset

Load and preprocess the CommonVoice 17 dataset for audio-to-text fine-tuning.

API

nemo_automodel.components.datasets.vlm.datasets.make_rdr_dataset(
path_or_dataset='quintend/rdr-items',
split='train',
**kwargs,
)

Load and preprocess the RDR dataset for image-to-text fine-tuning.

Parameters:
  • path_or_dataset (str) – Path or identifier for the RDR dataset.

  • split (str) – Dataset split to load.

  • **kwargs – Additional keyword arguments.

Returns:

The processed dataset.

Return type:

Dataset
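A minimal usage sketch for the signature above. The wrapper name `load_rdr` and the constant `RDR_DEFAULT_PATH` are hypothetical and not part of the library. Only the import path, the function name, and the two defaults are taken from this page. The import is done lazily inside the wrapper so that the snippet can be loaded on a machine where NeMo Automodel is not installed.

```python
from typing import Any

# Default identifier copied from the documented signature of make_rdr_dataset.
RDR_DEFAULT_PATH = "quintend/rdr-items"


def load_rdr(path_or_dataset: str = RDR_DEFAULT_PATH, split: str = "train", **kwargs: Any):
    """Hypothetical wrapper that forwards to make_rdr_dataset.

    Assumption: NeMo Automodel is installed and the dataset identifier is
    reachable when this function is actually called.
    """
    # Lazy import: nothing is imported from the library until the call happens.
    from nemo_automodel.components.datasets.vlm.datasets import make_rdr_dataset

    return make_rdr_dataset(path_or_dataset=path_or_dataset, split=split, **kwargs)


# Example call (commented out because it needs the library and the dataset):
# dataset = load_rdr(split="train")
# print(type(dataset).__name__)
```

Calling `load_rdr()` with no arguments should be equivalent to calling `make_rdr_dataset()` with its documented defaults. Any extra keyword arguments are passed through unchanged, since this page does not say which ones the function accepts.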

nemo_automodel.components.datasets.vlm.datasets.make_cord_v2_dataset(
path_or_dataset='naver-clova-ix/cord-v2',
split='train',
**kwargs,
)

Load and preprocess the CORD-V2 dataset for image-to-text fine-tuning.

nemo_automodel.components.datasets.vlm.datasets.make_medpix_dataset(
path_or_dataset='medpix-dataset/medpix-dataset',
split='train',
**kwargs,
)

Load and preprocess the MedPix dataset for image-to-text fine-tuning.

nemo_automodel.components.datasets.vlm.datasets.make_cv17_dataset(
path_or_dataset='ysdede/commonvoice_17_tr_fixed',
split='train',
**kwargs,
)

Load and preprocess the CommonVoice 17 dataset for audio-to-text fine-tuning.
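All four functions share the same `(path_or_dataset, split, **kwargs)` shape, so they can be selected by a short key. The sketch below is one possible way to do that. The names `FACTORIES`, `resolve_factory_name`, and `build_dataset` are hypothetical helpers invented for illustration. Only the module path, the four function names, and the modality labels come from this page.

```python
import importlib
from typing import Any

# Module documented on this page.
MODULE_PATH = "nemo_automodel.components.datasets.vlm.datasets"

# Short key -> (documented function name, task as described on this page).
FACTORIES = {
    "rdr": ("make_rdr_dataset", "image-to-text"),
    "cord_v2": ("make_cord_v2_dataset", "image-to-text"),
    "medpix": ("make_medpix_dataset", "image-to-text"),
    "cv17": ("make_cv17_dataset", "audio-to-text"),
}


def resolve_factory_name(key: str) -> str:
    """Map a short key to the documented function name, or raise ValueError."""
    if key not in FACTORIES:
        raise ValueError(f"unknown dataset key {key!r}; expected one of {sorted(FACTORIES)}")
    return FACTORIES[key][0]


def build_dataset(key: str, split: str = "train", **kwargs: Any):
    """Hypothetical dispatcher: import the module lazily and call the chosen function.

    Assumption: NeMo Automodel is installed when this is called. Omitting
    path_or_dataset lets each function fall back to its own documented default.
    """
    factory_name = resolve_factory_name(key)
    module = importlib.import_module(MODULE_PATH)
    factory = getattr(module, factory_name)
    return factory(split=split, **kwargs)


# Example call (commented out because it needs the library and the dataset):
# cord = build_dataset("cord_v2", split="train")
```

Keeping the key lookup separate from the import means an invalid key fails early with a clear message, before any attempt to load the library or download data.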