core.datasets.multimodal_dataset#

Module Contents#

Classes#

MultimodalDatasetConfig

Configuration object for Megatron Core Multimodal datasets.

MockMultimodalDataset

Mock multimodal dataset.

API#

class core.datasets.multimodal_dataset.MultimodalDatasetConfig#

Bases: megatron.core.datasets.gpt_dataset.GPTDatasetConfig

Configuration object for Megatron Core Multimodal datasets.

Note: This is unused at the moment and may be missing features. Follow-up changes will use this.

image_h: int = None#

Image height.

image_w: int = None#

Image width.

preprocess_func: Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]] = None#

Optional function to preprocess data samples for a specific model.

__post_init__() → None#

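The preprocess_func hook receives one sample dict and returns a (possibly modified) sample dict. The sketch below illustrates that contract; the function name and the normalization it performs are hypothetical, and plain Python lists stand in for torch.Tensor values so the sketch runs without torch installed.

```python
from typing import Any, Dict

def scale_image(sample: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical preprocessing step: normalize image values from
    # [0, 255] to [0.0, 1.0]. In practice sample["image"] would be a
    # torch.Tensor; a list is used here to keep the sketch dependency-free.
    sample["image"] = [v / 255.0 for v in sample["image"]]
    return sample

# The config would then be constructed with this hook alongside the
# inherited GPTDatasetConfig fields (random_seed, sequence_length,
# tokenizer, ...), which are omitted here:
# config = MultimodalDatasetConfig(
#     image_h=336,          # illustrative values
#     image_w=336,
#     preprocess_func=scale_image,
# )
```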
class core.datasets.multimodal_dataset.MockMultimodalDataset(
dataset: megatron.core.datasets.gpt_dataset.MockGPTLowLevelDataset,
dataset_path: Optional[str],
indices: numpy.ndarray,
num_samples: int,
index_split: megatron.core.datasets.utils.Split,
config: megatron.core.datasets.gpt_dataset.GPTDatasetConfig,
)#

Bases: megatron.core.datasets.gpt_dataset.MockGPTDataset

Mock multimodal dataset.

This is unused at the moment and may be missing features. Follow-up changes will use this.

Initialization

__getitem__(idx: int) → Dict[str, torch.Tensor]#

Return a sample that contains a dummy image, a text sequence, and the associated labels, loss mask, and attention mask.

Parameters:

idx (int) – The integer seed for mock data generation.

Returns:

The mock data.

Return type:

Dict[str, torch.Tensor]
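Note that idx acts as the seed for mock data generation, so indexing is deterministic: the same index always yields the same sample. A minimal sketch of that contract, with a hypothetical mock_sample function and plain Python lists standing in for the torch.Tensor values the real dataset returns:

```python
import random
from typing import Dict, List

def mock_sample(idx: int, image_h: int = 4, image_w: int = 4) -> Dict[str, List]:
    # idx doubles as the RNG seed, mirroring the __getitem__ contract above:
    # identical indices must produce identical mock samples.
    rng = random.Random(idx)
    image = [[rng.random() for _ in range(image_w)] for _ in range(image_h)]
    tokens = [rng.randrange(100) for _ in range(8)]  # dummy text sequence
    return {"image": image, "tokens": tokens}
```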