nemo_automodel.components.datasets.multimodal.distributed_iterable
DistributedIterableDataset base for BAGEL-style data pipelines.
Module Contents
Classes
Data
API
Bases: IterableDataset
Base class for rank/worker-aware iterable datasets.
Owns a private rng that is used only to shuffle file paths deterministically
in :meth:`set_epoch`. It is NOT used for per-sample randomness. Per-sample
randomness still goes through the Python global random module (see
:mod:`packing` for the reseed hook).
_drop_counters
data_paths_per_rank
num_files_per_rank
rng
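The split between the private rng and the global random module can be illustrated with a minimal sketch. This is not the library's implementation: the class name `SketchDistributedIterable`, the constructor parameters, the `seed + epoch` reseeding rule, and the strided per-rank sharding are all assumptions made for illustration. Only the attribute names `rng`, `data_paths_per_rank`, and `num_files_per_rank` and the method name `set_epoch` come from the documentation above.

```python
import random


class SketchDistributedIterable:
    """Hypothetical stand-in for a rank-aware dataset (not the library class)."""

    def __init__(self, data_paths, rank=0, world_size=1, seed=0):
        self.data_paths = list(data_paths)
        self.rank = rank
        self.world_size = world_size
        self.seed = seed
        # Private generator: its state is independent of the global
        # `random` module, so shuffling files never disturbs per-sample draws.
        self.rng = random.Random(seed)
        self.data_paths_per_rank = []
        self.num_files_per_rank = 0
        self.set_epoch(0)

    def set_epoch(self, epoch):
        # Assumption: reseed from (seed + epoch) so that every rank computes
        # the same file order for a given epoch.
        self.rng.seed(self.seed + epoch)
        paths = sorted(self.data_paths)
        self.rng.shuffle(paths)
        # Assumption: strided sharding, rank r takes every world_size-th file.
        self.data_paths_per_rank = paths[self.rank::self.world_size]
        self.num_files_per_rank = len(self.data_paths_per_rank)
```

Under these assumptions, two ranks built with the same seed see disjoint shards that together cover every file, and calling `set_epoch` leaves the state of the global `random` module untouched, which is why per-sample randomness needs its own reseed hook.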