bridge.data.mimo.loaders
Data loader utilities for MIMO training.
Module Contents
Functions
build_mimo_data_loaders | Build MIMO data loaders with per-module DP settings.
API
bridge.data.mimo.loaders.build_mimo_data_loaders(
    cfg: megatron.bridge.training.config.ConfigContainer,
    train_state: megatron.bridge.training.state.TrainState,
    mimo_provider: megatron.bridge.training.config.DatasetProvider,
    train_samples: int,
    valid_samples: int,
    test_samples: int,
)
Build MIMO data loaders with per-module DP settings.
Creates data loaders with data-parallel (DP)-aware sampling based on the MIMO parallelism configuration. Only ranks that need data (those on the first or last pipeline-parallel (PP) stage) receive non-None loaders; all other ranks receive None.
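The rank gating and DP-aware sampling described above can be pictured with the following minimal sketch. It is not the library's implementation: the helper names (rank_needs_data, shard_sample_indices, indices_for_rank) and the strided split are hypothetical, chosen only to show the general technique of giving each DP rank a disjoint slice of samples and returning None on intermediate pipeline stages. The per-module DP sizes in the usage lines are likewise made-up values.

```python
from typing import List, Optional


def rank_needs_data(pp_rank: int, pp_size: int) -> bool:
    """Hypothetical check: only the first and last pipeline stages read data.

    In typical Megatron-style pipelines the first stage consumes inputs and
    the last stage consumes labels; middle stages receive activations instead.
    """
    return pp_rank == 0 or pp_rank == pp_size - 1


def shard_sample_indices(num_samples: int, dp_rank: int, dp_size: int) -> List[int]:
    """Hypothetical strided split: rank r takes r, r + dp_size, r + 2*dp_size, ...

    Ranks in the same DP group therefore see disjoint samples.
    """
    if not 0 <= dp_rank < dp_size:
        raise ValueError(f"dp_rank {dp_rank} out of range for dp_size {dp_size}")
    return list(range(dp_rank, num_samples, dp_size))


def indices_for_rank(
    num_samples: int, pp_rank: int, pp_size: int, dp_rank: int, dp_size: int
) -> Optional[List[int]]:
    # Mirrors the documented contract: ranks that need no data get None.
    if not rank_needs_data(pp_rank, pp_size):
        return None
    return shard_sample_indices(num_samples, dp_rank, dp_size)


# Assumed per-module DP sizes: e.g. a vision encoder with DP=4, a language model with DP=2.
print(indices_for_rank(8, pp_rank=0, pp_size=1, dp_rank=1, dp_size=4))  # [1, 5]
print(indices_for_rank(8, pp_rank=0, pp_size=1, dp_rank=1, dp_size=2))  # [1, 3, 5, 7]
print(indices_for_rank(8, pp_rank=1, pp_size=4, dp_rank=0, dp_size=2))  # None
```

A strided split is only one option; a contiguous block split works equally well as long as every sample lands on exactly one DP rank.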
- Parameters:
cfg – Configuration container with MimoModelProvider as cfg.model.
train_state – Current training state.
mimo_provider – MIMO dataset provider (e.g., MockMimoProvider) that implements a get_collate_fn() method.
train_samples – Number of training samples.
valid_samples – Number of validation samples.
test_samples – Number of test samples.
- Returns:
Tuple of (train_loader, valid_loader, test_loader). Returns (None, None, None) if this rank doesn’t need data.
- Raises:
ValueError – If cfg.model is not a MimoModelProvider, or if mimo_parallelism_config is None.
Example
```python
from megatron.bridge.data.mimo import MockMimoProvider, build_mimo_data_loaders

# Mock provider for a single vision modality (CLIP image processor, Llama-2 tokenizer).
provider = MockMimoProvider(
    seq_length=2048,
    processor_paths={"vision": "openai/clip-vit-large-patch14"},
    tokenizer_path="meta-llama/Llama-2-7b-hf",
    special_token_ids={"vision": 32000},
    modality_configs={"vision": {"type": "image", "width": 224, "height": 224}},
)

# cfg (a ConfigContainer with a MimoModelProvider as cfg.model) and
# train_state (the current TrainState) are assumed to exist already.
train_loader, valid_loader, test_loader = build_mimo_data_loaders(
    cfg, train_state, provider,
    train_samples=10000, valid_samples=1000, test_samples=1000,
)
```
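Because ranks that do not need data receive (None, None, None), code that consumes the loaders has to tolerate None. The sketch below shows one way to do that; make_iterator and next_batch are hypothetical helpers, and plain Python lists stand in for real loaders so the snippet runs without Megatron Bridge installed.

```python
from typing import Any, Iterable, Iterator, Optional


def make_iterator(loader: Optional[Iterable]) -> Optional[Iterator]:
    """Wrap a loader in an iterator, keeping None for ranks without data."""
    return iter(loader) if loader is not None else None


def next_batch(data_iter: Optional[Iterator]) -> Optional[Any]:
    """Fetch the next batch, or None on ranks that received no loader.

    In typical pipeline-parallel schedules every rank still runs each step;
    only the first and last stages actually pull batches from an iterator.
    """
    return next(data_iter) if data_iter is not None else None


# Stand-ins for the tuples build_mimo_data_loaders might return on two ranks.
data_rank = ([{"step": 0}, {"step": 1}], [{"step": 0}], [{"step": 0}])
middle_rank = (None, None, None)

for loaders in (data_rank, middle_rank):
    train_iter = make_iterator(loaders[0])
    print([next_batch(train_iter) for _ in range(2)])
# [{'step': 0}, {'step': 1}]
# [None, None]
```

Keeping the None check at the iterator boundary lets the training loop itself stay identical on every rank.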