nemo_automodel.components.datasets.llm.megatron.sampler
Module Contents
Classes
Functions
API
Base class for Megatron batch samplers.
Provides common validation and shared behavior for Megatron samplers. Implementations must yield lists of dataset indices that correspond to one micro-batch for a single data-parallel rank.
Parameters:
Total available samples in the dataset.
Number of samples per micro-batch on each data-parallel rank.
Rank id in the data-parallel group that this sampler will serve.
World size of the data-parallel group.
If True, drop incomplete batches. If False, implementations may yield a final partial micro-batch (subject to their constraints).
Effective global batch size across all data-parallel ranks; when provided, length is computed in global-batch units and converted to micro-batches.
If True and supported by the sampler, the last incomplete global batch is padded to global_batch_size when drop_last is False.
Return the number of micro-batches this sampler will yield.
If global_batch_size is provided, the length is computed in terms of
global batches and converted to micro-batches to align with training
loops that iterate by micro-batch.
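The conversion from global batches to micro-batches can be illustrated with a small standalone function. This is a minimal sketch under stated assumptions, not the library's actual code: the function name num_micro_batches is hypothetical, the sketch ignores any already-consumed samples, and the behavior when global_batch_size is omitted is an assumption.

```python
from typing import Optional


def num_micro_batches(
    total_samples: int,
    micro_batch_size: int,
    data_parallel_size: int,
    drop_last: bool = True,
    global_batch_size: Optional[int] = None,
) -> int:
    """Illustrative length computation (hypothetical helper)."""
    # Samples consumed across all ranks by one micro-batch step.
    samples_per_step = micro_batch_size * data_parallel_size

    if global_batch_size is None:
        # Assumption: without a global batch size, count steps directly.
        if drop_last:
            return total_samples // samples_per_step
        return -(-total_samples // samples_per_step)  # ceiling division

    # Count global batches first, dropping or keeping the trailing partial one.
    if drop_last:
        num_global_batches = total_samples // global_batch_size
    else:
        num_global_batches = -(-total_samples // global_batch_size)

    # Each global batch spans this many micro-batches on every rank.
    micro_batches_per_global = global_batch_size // samples_per_step
    return num_global_batches * micro_batches_per_global
```

For example, with 100 samples, a micro-batch size of 2, 4 data-parallel ranks, and a global batch size of 16, each global batch covers 2 micro-batches per rank, so 6 complete global batches correspond to 12 micro-batches.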
Bases: BaseMegatronSampler
Randomized sampler with per-epoch shuffling and per-rank slicing.
Uses a deterministic seed schedule (seed + epoch) to randomize indices
within each data-parallel shard (bucket). Notably, this sampler:
- Does not support padding the last global batch.
- Requires drop_last=True when the product micro_batch_size * data_parallel_size > 1.
Yield randomized micro-batches for this rank.
Each epoch shuffles indices within the per-rank bucket using
torch.randperm seeded by seed + epoch. The sampler then emits
contiguous micro-batches of size micro_batch_size for this rank.
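The per-epoch shuffle described above can be sketched as follows. This is an illustration only, under several assumptions: each rank is assumed to own one contiguous, equally sized bucket of indices; samples that do not fill a whole step are assumed to be dropped; and Python's random.Random stands in for torch.randperm so the example needs no third-party packages. The function name random_micro_batches and the parameter name data_parallel_rank are hypothetical.

```python
import random
from typing import Iterator, List


def random_micro_batches(
    total_samples: int,
    micro_batch_size: int,
    data_parallel_rank: int,
    data_parallel_size: int,
    seed: int,
    epoch: int,
) -> Iterator[List[int]]:
    """Yield shuffled micro-batches for one rank (hypothetical helper)."""
    # Assumption: drop trailing samples that do not fill a whole step.
    samples_per_step = micro_batch_size * data_parallel_size
    usable = total_samples - total_samples % samples_per_step

    # Assumption: each rank owns one contiguous bucket of equal size.
    bucket_size = usable // data_parallel_size
    bucket_start = data_parallel_rank * bucket_size

    # Seeding with seed + epoch makes the order reproducible for a given
    # epoch. random.Random is a stand-in for torch.randperm here.
    rng = random.Random(seed + epoch)
    order = list(range(bucket_size))
    rng.shuffle(order)
    indices = [bucket_start + i for i in order]

    # Emit contiguous slices of the shuffled bucket as micro-batches.
    for start in range(0, bucket_size, micro_batch_size):
        yield indices[start:start + micro_batch_size]
```

Because the generator is reseeded from seed + epoch on every call, restarting an epoch with the same arguments reproduces the same sequence of micro-batches, and different ranks draw from disjoint buckets.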
Return the number of micro-batches that will be produced.
Accounts for drop_last by excluding a trailing incomplete global batch.
When global_batch_size is provided, converts global batches to
micro-batches.
Bases: BaseMegatronSampler
Deterministic sequential sampler with per-rank slicing.
Iterates deterministically over sample indices, splits each global batch
across data-parallel ranks, and yields per-rank micro-batches. When
drop_last is False and pad_samples_to_global_batch_size is True, the
final global batch is padded to a full size so that all ranks emit complete
micro-batches.
Raises:
RuntimeError: If there are no samples left to consume.
Yield lists of indices forming per-rank micro-batches.
Iterates up to total_samples. Optionally pads
the last global batch when drop_last is False and
pad_samples_to_global_batch_size is True.
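The interaction between per-rank slicing, drop_last, and padding can be sketched as below. This is a simplified illustration, not the library's implementation: it treats one global batch as a single step of micro_batch_size * data_parallel_size samples (no gradient accumulation), the names sequential_micro_batches, data_parallel_rank, and pad_to_full are hypothetical, and the padding marker PAD_INDEX = -1 is an assumption since the pad value is not stated here.

```python
from typing import Iterator, List

# Hypothetical padding marker; the real sampler's pad value is not stated here.
PAD_INDEX = -1


def sequential_micro_batches(
    total_samples: int,
    micro_batch_size: int,
    data_parallel_rank: int,
    data_parallel_size: int,
    drop_last: bool = True,
    pad_to_full: bool = False,
) -> Iterator[List[int]]:
    """Yield sequential micro-batches for one rank (hypothetical helper)."""
    samples_per_step = micro_batch_size * data_parallel_size

    # Slice boundaries for this rank within one step's worth of indices.
    start_idx = data_parallel_rank * micro_batch_size
    end_idx = start_idx + micro_batch_size

    for step_start in range(0, total_samples, samples_per_step):
        step_end = min(step_start + samples_per_step, total_samples)
        step = list(range(step_start, step_end))

        if len(step) < samples_per_step:
            if drop_last:
                return  # discard the trailing incomplete batch
            if pad_to_full:
                # Pad so that every rank still receives a full micro-batch.
                step += [PAD_INDEX] * (samples_per_step - len(step))

        chunk = step[start_idx:end_idx]
        if chunk:
            yield chunk
```

With 10 samples, a micro-batch size of 2, and 2 ranks, the last step holds only indices 8 and 9. Without padding, rank 1 receives nothing for that step; with padding, rank 1 receives a micro-batch made entirely of the padding marker, so both ranks emit the same number of micro-batches.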
Return slice boundaries for this rank within a global batch.
Returns:
Tuple of (start_idx, end_idx) used to extract this rank’s
Factory for Megatron samplers.
Constructs and returns a Megatron-compatible sampler for a dataset of a given length and batch configuration. The returned sampler yields lists of indices per micro-batch for a single data-parallel rank.
Parameters:
Number of samples in the underlying dataset.
Number of samples per micro-batch on each data-parallel rank.
Effective global batch size across all data-parallel ranks (micro_batch_size * world_size * grad_accum).
Sampler type to construct. Supported values:
- “single”: Deterministic sequential sampling (MegatronPretrainingSampler).
- “cyclic”: Randomized per-epoch sampling (MegatronPretrainingRandomSampler).
The value “batch” is not supported in this implementation.
When True, drop a trailing incomplete batch.
When True and supported by the sampler, pad the final global batch to global_batch_size if drop_last is False.
Data-parallel rank id for this process.
Number of data-parallel ranks.
Returns: BaseMegatronSampler
Configured sampler instance for the requested type.
Raises:
Exception: If an unsupported dataloader_type is provided.
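The type dispatch and the global batch size arithmetic described for this factory can be sketched as follows. This is a hedged illustration rather than the factory itself: the helper names global_batch_size_for and resolve_sampler_name are hypothetical, and the mapping returns class names as strings so the example runs without the library installed.

```python
# Supported dataloader types and the sampler classes they select,
# as listed in the parameter description above.
SUPPORTED_TYPES = {
    "single": "MegatronPretrainingSampler",
    "cyclic": "MegatronPretrainingRandomSampler",
}


def global_batch_size_for(micro_batch_size: int, world_size: int, grad_accum: int) -> int:
    """Effective global batch size (hypothetical helper)."""
    return micro_batch_size * world_size * grad_accum


def resolve_sampler_name(dataloader_type: str) -> str:
    """Map a dataloader type to a sampler class name (hypothetical helper)."""
    try:
        return SUPPORTED_TYPES[dataloader_type]
    except KeyError:
        # Unsupported values, including "batch", are rejected.
        raise Exception(f"Unsupported dataloader_type: {dataloader_type!r}") from None
```

For instance, a micro-batch size of 2 on 4 data-parallel ranks with 3 gradient accumulation steps gives a global batch size of 24, and requesting the type "batch" raises an exception instead of returning a sampler.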