`bridge.data.vlm_datasets.step37_flickr8k.packed_dataloader`#

Synchronous MixedPackedDataloader.

Instead of being a stateful __next__ iterator, this exposes __len__ and __getitem__(idx), so it plugs into mbridge’s MegatronPretrainingSampler + standard PyTorch DataLoader flow.

The internal schedule (sample order + non-truncation packing) is computed once at __init__ from fixed seeds, so the contents of the pack at a given idx are deterministic. Per-step ordering across the train loop may still differ because mbridge’s sampler shuffles pack indices independently, but each individual pack is reproducible.
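
The distinction between reproducible pack contents and a variable visiting order can be illustrated with a small stand-in. This is only a sketch: `ToyPackedDataset`, its fixed-size grouping, and the seed values are hypothetical and do not reflect the module's actual scheduling code.

```python
import random


class ToyPackedDataset:
    """Hypothetical stand-in: the schedule is built once from a fixed seed."""

    def __init__(self, num_samples: int, pack_size: int, seed: int = 0):
        order = list(range(num_samples))
        random.Random(seed).shuffle(order)  # fixed seed -> fixed sample order
        # Group the shuffled order into packs once; nothing is mutated later.
        self._packs = [
            order[i:i + pack_size] for i in range(0, num_samples, pack_size)
        ]

    def __len__(self) -> int:
        return len(self._packs)

    def __getitem__(self, idx: int) -> list[int]:
        return list(self._packs[idx])


ds_a = ToyPackedDataset(num_samples=10, pack_size=3)
ds_b = ToyPackedDataset(num_samples=10, pack_size=3)

# An external sampler may visit pack indices in any order...
visit_order = list(range(len(ds_a)))
random.Random(123).shuffle(visit_order)

# ...but each pack index still maps to the same contents.
by_index = {idx: ds_a[idx] for idx in visit_order}
```

Two instances built with the same seed agree pack by pack, regardless of the order in which a sampler requests the indices.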

Module Contents#

Classes#

MixedPackedDataloader

Map-style packed dataset.

API#

class bridge.data.vlm_datasets.step37_flickr8k.packed_dataloader.MixedPackedDataloader( datasets: list, epochs: list[float], max_length: int, oversize_policy: Literal[drop, extend] = 'extend', transform: Optional[collections.abc.Callable] = None, dataset_sampling: Union[Literal[sequential, random], list[Literal[sequential, random]]] = 'random', )#

Bases: torch.utils.data.Dataset

Map-style packed dataset.

Returns a fully assembled packed sample (already passed through transform) for each index. Used by Step37Flickr8kSFTDataProvider to feed mbridge’s standard MegatronPretrainingSampler + DataLoader.

Initialization

static _normalize_dataset_sampling( dataset_sampling: Union[Literal[sequential, random], list[Literal[sequential, random]]], num_datasets: int, ) → list[Literal[sequential, random]]#
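
The signature suggests that a single strategy or a per-dataset list is turned into one strategy per dataset. A minimal sketch of that idea follows; the function name `normalize_dataset_sampling`, the broadcasting behaviour, and the error handling are assumptions for illustration, not the method's documented behaviour.

```python
from typing import Literal, Union

Sampling = Literal["sequential", "random"]


def normalize_dataset_sampling(
    dataset_sampling: Union[Sampling, list[Sampling]],
    num_datasets: int,
) -> list[Sampling]:
    """Hypothetical sketch: broadcast one strategy or validate a list."""
    if isinstance(dataset_sampling, str):
        # Assumed: a single strategy applies to every dataset.
        strategies = [dataset_sampling] * num_datasets
    else:
        strategies = list(dataset_sampling)
    if len(strategies) != num_datasets:
        raise ValueError(
            f"expected {num_datasets} strategies, got {len(strategies)}"
        )
    for strategy in strategies:
        if strategy not in ("sequential", "random"):
            raise ValueError(f"unknown sampling strategy: {strategy!r}")
    return strategies
```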

static _build_in_domain_sampler( sampling_strategy: Literal[sequential, random], size: int, idx: int, ) → Union[megatron.bridge.data.vlm_datasets.step37_flickr8k.samplers.LoopedShuffleSampler, megatron.bridge.data.vlm_datasets.step37_flickr8k.samplers.LoopedSequentialSampler]#

_schedule_all( max_length: int, oversize_policy: str = 'drop', ) → tuple[list[tuple[int, int]], megatron.bridge.data.vlm_datasets.step37_flickr8k.packing.PackingResult]#
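
Non-truncation packing groups whole samples into packs of at most max_length tokens. The sketch below shows one greedy way to do that. It is not the module's algorithm: `pack_without_truncation` is a hypothetical helper, and the meanings given to the two policies ('drop' discards an oversize sample, 'extend' gives it its own longer pack) are assumptions inferred from the names.

```python
def pack_without_truncation(
    lengths: list[int],
    max_length: int,
    oversize_policy: str = "drop",
) -> list[list[int]]:
    """Greedy packing sketch; returns packs as lists of sample positions."""
    if oversize_policy not in ("drop", "extend"):
        raise ValueError(f"unknown oversize_policy: {oversize_policy!r}")
    packs: list[list[int]] = []
    current: list[int] = []
    current_len = 0
    for pos, length in enumerate(lengths):
        if length > max_length:
            if oversize_policy == "extend":
                # Assumed meaning: the sample gets its own, longer pack.
                packs.append([pos])
            # Assumed meaning of "drop": the sample is skipped entirely.
            continue
        if current and current_len + length > max_length:
            # The sample does not fit, so close the pack instead of truncating.
            packs.append(current)
            current, current_len = [], 0
        current.append(pos)
        current_len += length
    if current:
        packs.append(current)
    return packs


lengths = [4, 3, 9, 2, 5]
dropped = pack_without_truncation(lengths, max_length=8, oversize_policy="drop")
extended = pack_without_truncation(lengths, max_length=8, oversize_policy="extend")
```

With these toy lengths, the sample of length 9 is absent from `dropped` and forms a single-sample pack in `extended`.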

__len__() → int#

__getitem__(idx: int) → Any#

Assemble the pack at index idx without using a mutable internal cursor.

Returns the same result for a given idx on every call: the precomputed in-domain order selects the same samples, which are then run through transform.
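
A cursor-free lookup of this kind might look like the sketch below. The class `CursorFreePacks`, the layout of `pack_schedule` as lists of (dataset index, sample index) pairs, and the choice to apply transform to the whole pack are assumptions made for illustration.

```python
from typing import Any, Callable, Optional, Sequence


class CursorFreePacks:
    """Hypothetical sketch: every lookup is a pure function of idx."""

    def __init__(
        self,
        datasets: Sequence[Sequence[Any]],
        pack_schedule: list[list[tuple[int, int]]],
        transform: Optional[Callable[[list[Any]], Any]] = None,
    ):
        self.datasets = datasets
        # Precomputed once: each pack lists (dataset_idx, sample_idx) pairs.
        self.pack_schedule = pack_schedule
        self.transform = transform

    def __len__(self) -> int:
        return len(self.pack_schedule)

    def __getitem__(self, idx: int) -> Any:
        # No internal position is read or advanced here.
        samples = [self.datasets[d][s] for d, s in self.pack_schedule[idx]]
        if self.transform is not None:
            return self.transform(samples)
        return samples


toy = CursorFreePacks(
    datasets=[["a0", "a1", "a2"], ["b0", "b1"]],
    pack_schedule=[[(0, 2), (1, 0)], [(0, 0), (0, 1), (1, 1)]],
    transform=" ".join,
)
first_call = toy[1]
other_pack = toy[0]
second_call = toy[1]  # same idx after another access, same result
```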
