nemo_automodel.components.datasets.llm.megatron.sampler#
Module Contents#
Classes#

| Name | Description |
|---|---|
| BaseMegatronSampler | Base class for Megatron batch samplers. |
| MegatronPretrainingSampler | Deterministic sequential sampler with per-rank slicing. |
| MegatronPretrainingRandomSampler | Randomized sampler with per-epoch shuffling and per-rank slicing. |

Functions#

| Name | Description |
|---|---|
| create_megatron_sampler | Factory for Megatron samplers. |
API#
- class nemo_automodel.components.datasets.llm.megatron.sampler.BaseMegatronSampler(
- total_samples: int,
- micro_batch_size: int,
- data_parallel_rank: int,
- data_parallel_size: int,
- drop_last: bool = True,
- global_batch_size: Optional[int] = None,
- pad_samples_to_global_batch_size: Optional[bool] = False,
- )
Base class for Megatron batch samplers.
Provides common validation and shared behavior for Megatron samplers. Implementations must yield lists of dataset indices that correspond to one micro-batch for a single data-parallel rank.
- Parameters:
total_samples – Total available samples in the dataset.
micro_batch_size – Number of samples per micro-batch on each data-parallel rank.
data_parallel_rank – Rank id in the data-parallel group that this sampler will serve.
data_parallel_size – World size of the data-parallel group.
drop_last – If True, drop incomplete batches. If False, implementations may yield a final partial micro-batch (subject to their constraints).
global_batch_size – Effective global batch size across all data-parallel ranks; when provided, length is computed in global-batch units and converted to micro-batches.
pad_samples_to_global_batch_size – If True and supported by the sampler, the last incomplete global batch will be padded to `global_batch_size` when `drop_last` is False.
Initialization
- __len__()#
Return the number of micro-batches this sampler will yield.
If `global_batch_size` is provided, the length is computed in terms of global batches and converted to micro-batches to align with training loops that iterate by micro-batch.
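As a rough illustration of that conversion, here is a minimal sketch. It assumes the reported length is the per-rank micro-batch count and that each full global batch contributes `global_batch_size // (micro_batch_size * data_parallel_size)` micro-batches per rank; the exact rounding depends on `drop_last`.

```python
# Illustrative arithmetic only; the authoritative formula lives in the sampler itself.
total_samples = 1000
global_batch_size = 64       # assumed: micro_batch_size * data_parallel_size * grad_accum
micro_batch_size = 8
data_parallel_size = 4

num_global_batches = total_samples // global_batch_size          # 15 with drop_last=True
micro_batches_per_rank_per_global_batch = global_batch_size // (
    micro_batch_size * data_parallel_size
)                                                                # 2
print(num_global_batches * micro_batches_per_rank_per_global_batch)  # 30, i.e. len(sampler)
```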
- abstract __iter__()#
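A minimal sketch of the subclass contract: `__iter__` must yield lists of dataset indices, one list per micro-batch for this data-parallel rank. The toy class below is illustrative only and is not part of the module; it assumes the constructor arguments are stored as same-named attributes on the base class.

```python
from nemo_automodel.components.datasets.llm.megatron.sampler import BaseMegatronSampler


class ToyStridedSampler(BaseMegatronSampler):
    """Illustrative subclass; attribute names mirror the constructor args (assumed)."""

    def __iter__(self):
        batch = []
        # Walk this rank's strided shard of the dataset and emit fixed-size micro-batches.
        for idx in range(self.data_parallel_rank, self.total_samples, self.data_parallel_size):
            batch.append(idx)
            if len(batch) == self.micro_batch_size:
                yield batch
                batch = []
        # Emit a trailing partial micro-batch only when drop_last is disabled.
        if batch and not self.drop_last:
            yield batch
```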
- class nemo_automodel.components.datasets.llm.megatron.sampler.MegatronPretrainingSampler(
- total_samples: int,
- micro_batch_size: int,
- data_parallel_rank: int,
- data_parallel_size: int,
- drop_last: bool = True,
- global_batch_size: Optional[int] = None,
- pad_samples_to_global_batch_size: Optional[bool] = False,
- )
Bases:
nemo_automodel.components.datasets.llm.megatron.sampler.BaseMegatronSampler
Deterministic sequential sampler with per-rank slicing.
Iterates deterministically over sample indices, splits each global batch across data-parallel ranks, and yields per-rank micro-batches. When `drop_last` is False and `pad_samples_to_global_batch_size` is True, the final global batch is padded to a full size so that all ranks emit complete micro-batches.
- Raises:
RuntimeError – If there are no samples left to consume.
Initialization
- get_start_end_idx()#
Return slice boundaries for this rank within a global batch.
- Returns:
Tuple of `(start_idx, end_idx)` used to extract this rank's micro-batch from a concatenated global batch buffer.
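A small worked example of that slicing, assuming the global batch buffer is laid out rank-by-rank in contiguous `micro_batch_size`-wide slices:

```python
# Hypothetical values for illustration; the method derives these from the sampler's own state.
micro_batch_size = 4
data_parallel_rank = 1

start_idx = data_parallel_rank * micro_batch_size  # 4
end_idx = start_idx + micro_batch_size             # 8
# Rank 1 would read global_batch[4:8] as its micro-batch.
```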
- __iter__()#
Yield lists of indices forming per-rank micro-batches.
Iterates up to `total_samples`. Optionally pads the last global batch when `drop_last` is False and `pad_samples_to_global_batch_size` is True.
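A usage sketch pairing this sampler with a PyTorch `DataLoader`; the tiny `TensorDataset` below is a stand-in for a real dataset, and the batch sizes are arbitrary illustrative values.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

from nemo_automodel.components.datasets.llm.megatron.sampler import MegatronPretrainingSampler

# Stand-in map-style dataset purely for demonstration.
train_dataset = TensorDataset(torch.arange(100).unsqueeze(1))

sampler = MegatronPretrainingSampler(
    total_samples=len(train_dataset),
    micro_batch_size=4,
    data_parallel_rank=0,
    data_parallel_size=2,
    drop_last=True,
    global_batch_size=16,
)

# batch_sampler consumes the lists of indices yielded above, so each DataLoader
# batch corresponds to one per-rank micro-batch.
loader = DataLoader(train_dataset, batch_sampler=sampler, num_workers=2)
for micro_batch in loader:
    ...  # forward/backward on this rank's micro-batch
```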
- class nemo_automodel.components.datasets.llm.megatron.sampler.MegatronPretrainingRandomSampler(
- total_samples: int,
- micro_batch_size: int,
- data_parallel_rank: int,
- data_parallel_size: int,
- drop_last: bool = True,
- global_batch_size: Optional[int] = None,
- pad_samples_to_global_batch_size: Optional[bool] = False,
- seed: int = 0,
- )
Bases:
nemo_automodel.components.datasets.llm.megatron.sampler.BaseMegatronSampler
Randomized sampler with per-epoch shuffling and per-rank slicing.
Uses a deterministic seed schedule `seed + epoch` to randomize indices within each data-parallel shard (bucket). Notably, this sampler:
- Does not support padding the last global batch.
- Requires `drop_last=True` when the product `micro_batch_size * data_parallel_size > 1`.
Initialization
- __len__()#
Return the number of micro-batches that will be produced.
Accounts for `drop_last` by excluding a trailing incomplete global batch. When `global_batch_size` is provided, converts global batches to micro-batches.
- __iter__()#
Yield randomized micro-batches for this rank.
Each epoch shuffles indices within the per-rank bucket using `torch.randperm` seeded by `seed + epoch`. The sampler then emits contiguous micro-batches of size `micro_batch_size` for this rank.
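A minimal sketch of the seeding scheme described above, assuming a fresh `torch.Generator` seeded with `seed + epoch` drives the permutation of this rank's bucket (the bucket size here is a hypothetical value):

```python
import torch

seed, epoch = 0, 3
bucket_size = 8  # number of samples owned by this data-parallel rank (hypothetical)

g = torch.Generator()
g.manual_seed(seed + epoch)
shuffled = torch.randperm(bucket_size, generator=g).tolist()
print(shuffled)  # identical every time epoch 3 is replayed, on every rank
```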
- nemo_automodel.components.datasets.llm.megatron.sampler.create_megatron_sampler(
- dataset_len: int,
- micro_batch_size: int,
- global_batch_size: int,
- dataloader_type: Literal['single', 'cyclic'] = 'single',
- drop_last: bool = True,
- pad_samples_to_global_batch_size: bool = False,
- rank: int = 0,
- world_size: int = 1,
- )
Factory for Megatron samplers.
Constructs and returns a Megatron-compatible sampler for a dataset of a given length and batch configuration. The returned sampler yields lists of indices per micro-batch for a single data-parallel rank.
- Parameters:
dataset_len – Number of samples in the underlying dataset.
micro_batch_size – Number of samples per micro-batch on each data-parallel rank.
global_batch_size – Effective global batch size across all data-parallel ranks (`micro_batch_size * world_size * grad_accum`).
dataloader_type – Sampler type to construct. Supported values:
- "single": Deterministic sequential sampling (`MegatronPretrainingSampler`).
- "cyclic": Randomized per-epoch sampling (`MegatronPretrainingRandomSampler`).
The value "batch" is not supported in this implementation.
drop_last – When True, drop a trailing incomplete batch.
pad_samples_to_global_batch_size – When True and supported by the sampler, pad the final global batch to `global_batch_size` if `drop_last` is False.
rank – Data-parallel rank id for this process.
world_size – Number of data-parallel ranks.
- Returns:
Configured sampler instance for the requested type.
- Raises:
Exception – If an unsupported `dataloader_type` is provided.
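A factory usage sketch; in real training the `rank` and `world_size` values would come from the distributed setup rather than literals, and the toy dataset below stands in for a real one.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

from nemo_automodel.components.datasets.llm.megatron.sampler import create_megatron_sampler

train_dataset = TensorDataset(torch.arange(256).unsqueeze(1))  # stand-in dataset

batch_sampler = create_megatron_sampler(
    dataset_len=len(train_dataset),
    micro_batch_size=2,
    global_batch_size=32,
    dataloader_type="single",   # use "cyclic" for the randomized sampler
    drop_last=True,
    rank=0,
    world_size=4,
)

loader = DataLoader(train_dataset, batch_sampler=batch_sampler)
```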