nemo_automodel.components.datasets.llm.megatron.sampler

Module Contents

Classes

Name | Description
BaseMegatronSampler | Base class for Megatron batch samplers.
MegatronPretrainingRandomSampler | Randomized sampler with per-epoch shuffling and per-rank slicing.
MegatronPretrainingSampler | Deterministic sequential sampler with per-rank slicing.

Functions

Name | Description
create_megatron_sampler | Factory for Megatron samplers.

API

class nemo_automodel.components.datasets.llm.megatron.sampler.BaseMegatronSampler(
total_samples: int,
micro_batch_size: int,
data_parallel_rank: int,
data_parallel_size: int,
drop_last: bool = True,
global_batch_size: typing.Optional[int] = None,
pad_samples_to_global_batch_size: typing.Optional[bool] = False
)

Base class for Megatron batch samplers.

Provides common validation and shared behavior for Megatron samplers. Implementations must yield lists of dataset indices that correspond to one micro-batch for a single data-parallel rank.

Parameters:

total_samples
int

Total available samples in the dataset.

micro_batch_size
int

Number of samples per micro-batch on each data-parallel rank.

data_parallel_rank
int

Rank id in the data-parallel group that this sampler will serve.

data_parallel_size
int

World size of the data-parallel group.

drop_last
bool, defaults to True

If True, drop incomplete batches. If False, implementations may yield a final partial micro-batch (subject to their constraints).

global_batch_size
Optional[int], defaults to None

Effective global batch size across all data-parallel ranks; when provided, length is computed in global-batch units and converted to micro-batches.

pad_samples_to_global_batch_size
Optional[bool], defaults to False

If True and supported by the sampler, the last incomplete global batch will be padded to global_batch_size when drop_last is False.

micro_batch_times_data_parallel_size = self.micro_batch_size * data_parallel_size

nemo_automodel.components.datasets.llm.megatron.sampler.BaseMegatronSampler.__iter__()

Abstract method.

nemo_automodel.components.datasets.llm.megatron.sampler.BaseMegatronSampler.__len__()

Return the number of micro-batches this sampler will yield.

If global_batch_size is provided, the length is computed in terms of global batches and converted to micro-batches to align with training loops that iterate by micro-batch.
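
The length rule described above can be sketched in plain Python. This is a minimal illustration rather than the library's code: `sampler_len` is a hypothetical helper, and the branch taken when `global_batch_size` is None (rounding up to whole per-step chunks) is an assumption the text does not spell out.

```python
from typing import Optional


def sampler_len(
    total_samples: int,
    micro_batch_size: int,
    data_parallel_size: int,
    drop_last: bool = True,
    global_batch_size: Optional[int] = None,
) -> int:
    """Hypothetical helper: micro-batches one rank would iterate over."""
    # Samples consumed by all data-parallel ranks in one micro-batch step.
    per_step = micro_batch_size * data_parallel_size

    if global_batch_size is not None:
        if drop_last:
            # Ignore a trailing incomplete global batch.
            num_global_batches = total_samples // global_batch_size
        else:
            # Count the trailing incomplete global batch (ceiling division).
            num_global_batches = -(-total_samples // global_batch_size)
        # Convert global batches to micro-batches per rank.
        return num_global_batches * (global_batch_size // per_step)

    # Assumption: without a global batch size, count per-step chunks, rounding up.
    return -(-total_samples // per_step)


print(sampler_len(100, 2, 4, drop_last=True, global_batch_size=16))   # 12
print(sampler_len(100, 2, 4, drop_last=False, global_batch_size=16))  # 14
```

With 100 samples and a global batch of 16, six full global batches fit, and each one spans two micro-batches per rank (16 // (2 * 4)), which gives 12.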

class nemo_automodel.components.datasets.llm.megatron.sampler.MegatronPretrainingRandomSampler(
total_samples: int,
micro_batch_size: int,
data_parallel_rank: int,
data_parallel_size: int,
drop_last: bool = True,
global_batch_size: typing.Optional[int] = None,
pad_samples_to_global_batch_size: typing.Optional[bool] = False,
seed: int = 0
)

Bases: BaseMegatronSampler

Randomized sampler with per-epoch shuffling and per-rank slicing.

Uses a deterministic seed schedule (seed + epoch) to randomize indices within each data-parallel shard (bucket). Notably, this sampler:

  • Does not support padding the last global batch.
  • Requires drop_last=True when the product micro_batch_size * data_parallel_size > 1.

consumed_samples = 0

last_batch_size

nemo_automodel.components.datasets.llm.megatron.sampler.MegatronPretrainingRandomSampler.__iter__()

Yield randomized micro-batches for this rank.

Each epoch shuffles indices within the per-rank bucket using torch.randperm seeded by seed + epoch. The sampler then emits contiguous micro-batches of size micro_batch_size for this rank.
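
A rough sketch of this per-rank shuffling follows. To stay runnable without PyTorch it swaps `torch.randperm` for the standard library's `random.Random(seed + epoch).shuffle`, so the exact permutations differ from the real sampler. The function name `random_micro_batches` and the contiguous per-rank bucket layout are assumptions made for illustration.

```python
import random
from typing import Iterator, List


def random_micro_batches(
    total_samples: int,
    micro_batch_size: int,
    data_parallel_rank: int,
    data_parallel_size: int,
    seed: int = 0,
    epoch: int = 0,
) -> Iterator[List[int]]:
    """Hypothetical sketch: shuffled micro-batches for one rank and one epoch."""
    per_step = micro_batch_size * data_parallel_size
    # Assumption: each rank owns a contiguous bucket of indices, and samples
    # that do not fill a whole per-step chunk are dropped.
    bucket_size = (total_samples // per_step) * micro_batch_size
    start = data_parallel_rank * bucket_size

    # Seeding with seed + epoch makes every epoch's order reproducible.
    order = list(range(bucket_size))
    random.Random(seed + epoch).shuffle(order)

    # Emit consecutive runs of the shuffled order as micro-batches.
    for begin in range(0, bucket_size, micro_batch_size):
        yield [start + offset for offset in order[begin:begin + micro_batch_size]]


rank0 = list(random_micro_batches(20, 2, 0, 2, seed=0, epoch=0))
rank1 = list(random_micro_batches(20, 2, 1, 2, seed=0, epoch=0))
print(len(rank0), len(rank1))  # 5 5
```

In this sketch the two ranks never share an index, and rerunning with the same `seed` and `epoch` reproduces the same order.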

nemo_automodel.components.datasets.llm.megatron.sampler.MegatronPretrainingRandomSampler.__len__()

Return the number of micro-batches that will be produced.

Accounts for drop_last by excluding a trailing incomplete global batch. When global_batch_size is provided, converts global batches to micro-batches.

class nemo_automodel.components.datasets.llm.megatron.sampler.MegatronPretrainingSampler(
total_samples: int,
micro_batch_size: int,
data_parallel_rank: int,
data_parallel_size: int,
drop_last: bool = True,
global_batch_size: typing.Optional[int] = None,
pad_samples_to_global_batch_size: typing.Optional[bool] = False
)

Bases: BaseMegatronSampler

Deterministic sequential sampler with per-rank slicing.

Iterates deterministically over sample indices, splits each global batch across data-parallel ranks, and yields per-rank micro-batches. When drop_last is False and pad_samples_to_global_batch_size is True, the final global batch is padded to full size so that all ranks emit complete micro-batches.

Raises:

  • RuntimeError: If there are no samples left to consume.

nemo_automodel.components.datasets.llm.megatron.sampler.MegatronPretrainingSampler.__iter__()

Yield lists of indices forming per-rank micro-batches.

Iterates up to total_samples. Optionally pads the last global batch when drop_last is False and pad_samples_to_global_batch_size is True.
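
The sequential behavior might look roughly like the sketch below. It is an illustration under stated assumptions, not the library's implementation: `sequential_micro_batches` is a hypothetical function, the `pad_index` sentinel used for padded slots is an assumption (the text does not say how padding entries are represented), and the slice boundaries `rank * micro_batch_size` to `rank * micro_batch_size + micro_batch_size` are inferred from the description of per-rank slicing.

```python
from typing import Iterator, List, Optional


def sequential_micro_batches(
    total_samples: int,
    micro_batch_size: int,
    data_parallel_rank: int,
    data_parallel_size: int,
    drop_last: bool = True,
    global_batch_size: Optional[int] = None,
    pad_samples_to_global_batch_size: bool = False,
    pad_index: int = -1,  # assumption: sentinel marking padded slots
) -> Iterator[List[int]]:
    """Hypothetical sketch: in-order micro-batches for one rank."""
    per_step = micro_batch_size * data_parallel_size
    indices = list(range(total_samples))

    if not drop_last and pad_samples_to_global_batch_size:
        if global_batch_size is None:
            raise ValueError("padding requires global_batch_size")
        # Extend the index list to the next multiple of global_batch_size.
        indices += [pad_index] * (-len(indices) % global_batch_size)

    # This rank's slice within each chunk of per_step indices.
    start = data_parallel_rank * micro_batch_size
    end = start + micro_batch_size

    for begin in range(0, len(indices), per_step):
        chunk = indices[begin:begin + per_step]
        if len(chunk) < per_step and drop_last:
            break  # drop the trailing incomplete chunk
        # With drop_last=False and no padding, this slice may be short or empty.
        yield chunk[start:end]


print(list(sequential_micro_batches(10, 2, 0, 2)))  # [[0, 1], [4, 5]]
print(list(sequential_micro_batches(10, 2, 1, 2)))  # [[2, 3], [6, 7]]
```

With ten samples and chunks of four, indices 8 and 9 form an incomplete chunk and are dropped when `drop_last` is True. Enabling padding with `global_batch_size=4` instead extends the list to twelve entries so that both ranks emit a third micro-batch.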

nemo_automodel.components.datasets.llm.megatron.sampler.MegatronPretrainingSampler.get_start_end_idx()

Return slice boundaries for this rank within a global batch.

Returns:

Tuple of (start_idx, end_idx) used to extract this rank’s

nemo_automodel.components.datasets.llm.megatron.sampler.create_megatron_sampler(
dataset_len: int,
micro_batch_size: int,
global_batch_size: int,
dataloader_type: typing.Literal['single', 'cyclic'] = 'single',
drop_last: bool = True,
pad_samples_to_global_batch_size: bool = False,
rank: int = 0,
world_size: int = 1
) -> nemo_automodel.components.datasets.llm.megatron.sampler.BaseMegatronSampler

Factory for Megatron samplers.

Constructs and returns a Megatron-compatible sampler for a dataset of a given length and batch configuration. The returned sampler yields lists of indices per micro-batch for a single data-parallel rank.

Parameters:

dataset_len
int

Number of samples in the underlying dataset.

micro_batch_size
int

Number of samples per micro-batch on each data-parallel rank.

global_batch_size
int

Effective global batch size across all data-parallel ranks (micro_batch_size * world_size * grad_accum).

dataloader_type
Literal['single', 'cyclic'], defaults to 'single'

Sampler type to construct. Supported values:

  • “single”: Deterministic sequential sampling (MegatronPretrainingSampler).
  • “cyclic”: Randomized per-epoch sampling (MegatronPretrainingRandomSampler).

The value “batch” is not supported in this implementation.

drop_last
bool, defaults to True

When True, drop a trailing incomplete batch.

pad_samples_to_global_batch_size
bool, defaults to False

When True and supported by the sampler, pad the final global batch to global_batch_size if drop_last is False.

rank
int, defaults to 0

Data-parallel rank id for this process.

world_size
int, defaults to 1

Number of data-parallel ranks.

Returns: BaseMegatronSampler

Configured sampler instance for the requested type.

Raises:

  • Exception: If an unsupported dataloader_type is provided.
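
Because global_batch_size is described as micro_batch_size * world_size * grad_accum, it can help to check that relationship before building a sampler. The helper below is a hypothetical sketch and not part of the module. Raising ValueError on a mismatch is this sketch's choice and says nothing about what the factory does with inconsistent sizes.

```python
def grad_accum_steps(global_batch_size: int, micro_batch_size: int, world_size: int) -> int:
    """Hypothetical helper: gradient-accumulation steps implied by the batch sizes."""
    # Samples processed across all data-parallel ranks in one micro-batch step.
    per_step = micro_batch_size * world_size
    if global_batch_size % per_step != 0:
        raise ValueError(
            f"global_batch_size={global_batch_size} is not a multiple of "
            f"micro_batch_size * world_size = {per_step}"
        )
    return global_batch_size // per_step


print(grad_accum_steps(global_batch_size=32, micro_batch_size=2, world_size=4))  # 4
```

Once the sizes are consistent, the same micro_batch_size, global_batch_size, and world_size values can be passed to create_megatron_sampler along with the process's rank.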