nemo_automodel.components.datasets.llm.megatron.sampler

Module Contents

Classes

Name	Description
`BaseMegatronSampler`	Base class for Megatron batch samplers.
`MegatronPretrainingRandomSampler`	Randomized sampler with per-epoch shuffling and per-rank slicing.
`MegatronPretrainingSampler`	Deterministic sequential sampler with per-rank slicing.

Functions

Name	Description
`create_megatron_sampler`	Factory for Megatron samplers.

API

class nemo_automodel.components.datasets.llm.megatron.sampler.BaseMegatronSampler(
    total_samples: int,
    micro_batch_size: int,
    data_parallel_rank: int,
    data_parallel_size: int,
    drop_last: bool = True,
    global_batch_size: typing.Optional[int] = None,
    pad_samples_to_global_batch_size: typing.Optional[bool] = False
)

Base class for Megatron batch samplers.

Provides common validation and shared behavior for Megatron samplers. Implementations must yield lists of dataset indices that correspond to one micro-batch for a single data-parallel rank.

Parameters:

total_samples

int

Total available samples in the dataset.

micro_batch_size

int

Number of samples per micro-batch on each data-parallel rank.

data_parallel_rank

int

Rank id in the data-parallel group that this sampler will serve.

data_parallel_size

int

World size of the data-parallel group.

drop_last

bool, defaults to True

If True, drop incomplete batches. If False, implementations may yield a final partial micro-batch (subject to their constraints).

global_batch_size

Optional[int], defaults to None

Effective global batch size across all data-parallel ranks; when provided, length is computed in global-batch units and converted to micro-batches.

pad_samples_to_global_batch_size

Optional[bool], defaults to False

If True and supported by the sampler, the last incomplete global batch will be padded to global_batch_size when drop_last is False.

micro_batch_times_data_parallel_size

= self.micro_batch_size * data_parallel_size
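The class description mentions common validation without listing the individual checks. The following is a minimal sketch of the kind of argument checks such a base class might perform, returning the derived micro_batch_times_data_parallel_size value. The function name validate_sampler_args, the exact set of checks, and the messages are hypothetical; RuntimeError is used only because it is the error type documented for MegatronPretrainingSampler.

```python
from typing import Optional


def validate_sampler_args(
    total_samples: int,
    micro_batch_size: int,
    data_parallel_rank: int,
    data_parallel_size: int,
    global_batch_size: Optional[int] = None,
) -> int:
    """Hypothetical checks; returns micro_batch_size * data_parallel_size."""
    if total_samples <= 0:
        raise RuntimeError(f"no samples to consume: total_samples={total_samples}")
    if micro_batch_size <= 0 or data_parallel_size <= 0:
        raise RuntimeError("micro_batch_size and data_parallel_size must be positive")
    if not 0 <= data_parallel_rank < data_parallel_size:
        raise RuntimeError(
            f"data_parallel_rank={data_parallel_rank} outside [0, {data_parallel_size})"
        )
    # Samples consumed by one micro-batch step across all data-parallel ranks.
    mbs_times_dp = micro_batch_size * data_parallel_size
    # Assumed: a global batch must split evenly into such steps.
    if global_batch_size is not None and global_batch_size % mbs_times_dp != 0:
        raise RuntimeError(
            f"global_batch_size={global_batch_size} is not a multiple of "
            f"micro_batch_size * data_parallel_size = {mbs_times_dp}"
        )
    return mbs_times_dp
```

For example, validate_sampler_args(100, 2, 3, 4, global_batch_size=16) returns 8, while a rank of 4 in a group of size 4 is rejected.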

nemo_automodel.components.datasets.llm.megatron.sampler.BaseMegatronSampler.__iter__()

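Abstract. Subclasses must implement it to yield lists of dataset indices, one micro-batch for this data-parallel rank at a time.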

nemo_automodel.components.datasets.llm.megatron.sampler.BaseMegatronSampler.__len__()

Return the number of micro-batches this sampler will yield.

If global_batch_size is provided, the length is computed in terms of global batches and converted to micro-batches to align with training loops that iterate by micro-batch.
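The length rule above can be illustrated with a small per-rank calculation. This is a sketch under stated assumptions: num_micro_batches is a hypothetical helper, it ignores any samples already consumed, and the rounding used when global_batch_size is None (floor with drop_last, ceiling without) is assumed rather than taken from the source.

```python
from typing import Optional


def num_micro_batches(
    total_samples: int,
    micro_batch_size: int,
    data_parallel_size: int,
    drop_last: bool = True,
    global_batch_size: Optional[int] = None,
) -> int:
    """Hypothetical helper mirroring the __len__ description (per rank)."""
    mbs_times_dp = micro_batch_size * data_parallel_size
    if global_batch_size is not None:
        if drop_last:
            # Only complete global batches count.
            num_global_batches = total_samples // global_batch_size
        else:
            # A trailing partial global batch counts as one more (ceiling division).
            num_global_batches = -(-total_samples // global_batch_size)
        # Each global batch is consumed as this many micro-batch steps per rank.
        return num_global_batches * (global_batch_size // mbs_times_dp)
    # Assumed rounding when no global batch size is given.
    if drop_last:
        return total_samples // mbs_times_dp
    return -(-total_samples // mbs_times_dp)
```

With 100 samples, micro_batch_size=2, four ranks and global_batch_size=16, there are 6 complete global batches of 2 micro-batch steps each, so the count is 12 (14 when drop_last is False).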

class nemo_automodel.components.datasets.llm.megatron.sampler.MegatronPretrainingRandomSampler(
    total_samples: int,
    micro_batch_size: int,
    data_parallel_rank: int,
    data_parallel_size: int,
    drop_last: bool = True,
    global_batch_size: typing.Optional[int] = None,
    pad_samples_to_global_batch_size: typing.Optional[bool] = False,
    seed: int = 0
)

Bases: BaseMegatronSampler

Randomized sampler with per-epoch shuffling and per-rank slicing.

Uses a deterministic seed schedule seed + epoch to randomize indices within each data-parallel shard (bucket). Notably, this sampler:

Does not support padding the last global batch.
Requires drop_last=True when the product micro_batch_size * data_parallel_size > 1.

consumed_samples

= 0

last_batch_size

nemo_automodel.components.datasets.llm.megatron.sampler.MegatronPretrainingRandomSampler.__iter__()

Yield randomized micro-batches for this rank.

Each epoch shuffles indices within the per-rank bucket using torch.randperm seeded by seed + epoch. The sampler then emits contiguous micro-batches of size micro_batch_size for this rank.
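A minimal sketch of the per-rank bucket shuffle, assuming (as in Megatron-LM's sampler of the same name) that each rank owns a contiguous bucket of micro_batch_size * (total_samples // (micro_batch_size * data_parallel_size)) indices. To keep it dependency-free, random.Random(seed + epoch).shuffle stands in for torch.randperm with a seeded generator, so the permutation differs from the real sampler's; resuming from consumed_samples is omitted. random_rank_batches is a hypothetical name.

```python
import random
from typing import Iterator, List


def random_rank_batches(
    total_samples: int,
    micro_batch_size: int,
    data_parallel_rank: int,
    data_parallel_size: int,
    seed: int = 0,
    epoch: int = 0,
) -> Iterator[List[int]]:
    """Hypothetical sketch: shuffle this rank's bucket, then emit micro-batches."""
    mbs_times_dp = micro_batch_size * data_parallel_size
    # Assumed layout: one contiguous bucket per rank; the trailing remainder is dropped.
    bucket_size = (total_samples // mbs_times_dp) * micro_batch_size
    start_idx = data_parallel_rank * bucket_size
    # Stand-in for torch.randperm(bucket_size, generator=g) with g seeded by seed + epoch.
    order = list(range(bucket_size))
    random.Random(seed + epoch).shuffle(order)
    batch: List[int] = []
    for offset in order:
        batch.append(start_idx + offset)
        if len(batch) == micro_batch_size:
            yield batch
            batch = []
```

Because the seed depends only on seed + epoch, every rank reproduces the same permutation of offsets for a given epoch while drawing from its own disjoint bucket.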

nemo_automodel.components.datasets.llm.megatron.sampler.MegatronPretrainingRandomSampler.__len__()

Return the number of micro-batches that will be produced.

Accounts for drop_last by excluding a trailing incomplete global batch. When global_batch_size is provided, converts global batches to micro-batches.

class nemo_automodel.components.datasets.llm.megatron.sampler.MegatronPretrainingSampler(
    total_samples: int,
    micro_batch_size: int,
    data_parallel_rank: int,
    data_parallel_size: int,
    drop_last: bool = True,
    global_batch_size: typing.Optional[int] = None,
    pad_samples_to_global_batch_size: typing.Optional[bool] = False
)

Bases: BaseMegatronSampler

Deterministic sequential sampler with per-rank slicing.

Iterates deterministically over sample indices, splits each global batch across data-parallel ranks, and yields per-rank micro-batches. When drop_last is False and pad_samples_to_global_batch_size is True, the final global batch is padded to global_batch_size so that all ranks emit complete micro-batches.

Raises:

RuntimeError: If there are no samples left to consume.

nemo_automodel.components.datasets.llm.megatron.sampler.MegatronPretrainingSampler.__iter__()

Yield lists of indices forming per-rank micro-batches.

Iterates up to total_samples. Optionally pads the last global batch when drop_last is False and pad_samples_to_global_batch_size is True.
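The sequential iteration can be sketched as follows, under assumptions: indices are gathered in groups of micro_batch_size * data_parallel_size and each rank keeps its own slice of every group, as in Megatron-LM. Padding is deliberately omitted, and skipping an empty trailing share is a choice of this sketch, not documented behavior. sequential_rank_batches is a hypothetical name.

```python
from typing import Iterator, List


def sequential_rank_batches(
    total_samples: int,
    micro_batch_size: int,
    data_parallel_rank: int,
    data_parallel_size: int,
    drop_last: bool = True,
) -> Iterator[List[int]]:
    """Hypothetical sketch of sequential per-rank micro-batching (no padding)."""
    mbs_times_dp = micro_batch_size * data_parallel_size
    # This rank's slice inside each group of mbs_times_dp consecutive indices.
    start = data_parallel_rank * micro_batch_size
    end = start + micro_batch_size
    batch: List[int] = []
    for idx in range(total_samples):
        batch.append(idx)
        if len(batch) == mbs_times_dp:
            yield batch[start:end]
            batch = []
    # Leftover indices form a partial group; keep this rank's share unless dropped.
    if batch and not drop_last:
        tail = batch[start:end]
        if tail:  # this sketch skips an empty share instead of yielding []
            yield tail
```

With 10 samples, micro_batch_size=2 and two ranks, rank 1 receives [[2, 3], [6, 7]]; rank 0 with drop_last=False also receives the partial group's share, giving [[0, 1], [4, 5], [8, 9]].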

nemo_automodel.components.datasets.llm.megatron.sampler.MegatronPretrainingSampler.get_start_end_idx()

Return slice boundaries for this rank within a global batch.

Returns:

Tuple of (start_idx, end_idx) used to extract this rank’s micro-batch from the combined batch.
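A standalone sketch of the slice arithmetic, assuming (as in Megatron-LM) that the boundaries depend only on the rank and micro_batch_size and are applied to each group of micro_batch_size * data_parallel_size consecutive indices. The free function below is hypothetical; the documented method takes no arguments and reads these values from the sampler.

```python
from typing import List, Tuple


def get_start_end_idx(data_parallel_rank: int, micro_batch_size: int) -> Tuple[int, int]:
    """Hypothetical standalone version: this rank's slice of one step's indices."""
    start_idx = data_parallel_rank * micro_batch_size
    end_idx = start_idx + micro_batch_size
    return start_idx, end_idx


# One step across 4 ranks with micro_batch_size=2 covers 8 consecutive indices.
step_indices: List[int] = list(range(16, 24))
start, end = get_start_end_idx(data_parallel_rank=2, micro_batch_size=2)
rank_batch = step_indices[start:end]  # indices 20 and 21
```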

nemo_automodel.components.datasets.llm.megatron.sampler.create_megatron_sampler(
    dataset_len: int,
    micro_batch_size: int,
    global_batch_size: int,
    dataloader_type: typing.Literal['single', 'cyclic'] = 'single',
    drop_last: bool = True,
    pad_samples_to_global_batch_size: bool = False,
    rank: int = 0,
    world_size: int = 1
) -> nemo_automodel.components.datasets.llm.megatron.sampler.BaseMegatronSampler

Factory for Megatron samplers.

Constructs and returns a Megatron-compatible sampler for a dataset of a given length and batch configuration. The returned sampler yields lists of indices per micro-batch for a single data-parallel rank.

Parameters:

dataset_len

int

Number of samples in the underlying dataset.

micro_batch_size

int

Number of samples per micro-batch on each data-parallel rank.

global_batch_size

int

Effective global batch size across all data-parallel ranks (micro_batch_size * world_size * grad_accum).

dataloader_type

Literal['single', 'cyclic'], defaults to 'single'

Sampler type to construct. Supported values:

“single”: Deterministic sequential sampling (MegatronPretrainingSampler).
“cyclic”: Randomized per-epoch sampling (MegatronPretrainingRandomSampler).

The value “batch” is not supported in this implementation.

drop_last

bool, defaults to True

When True, drop a trailing incomplete batch.

pad_samples_to_global_batch_size

bool, defaults to False

When True and supported by the sampler, pad the final global batch to global_batch_size if drop_last is False.

rank

int, defaults to 0

Data-parallel rank id for this process.

world_size

int, defaults to 1

Number of data-parallel ranks.

Returns: BaseMegatronSampler

Configured sampler instance for the requested type.

Raises:

Exception: If an unsupported dataloader_type is provided.
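Since global_batch_size is documented as micro_batch_size * world_size * grad_accum, the factory arguments fix the number of gradient-accumulation steps. A minimal sketch of that arithmetic follows; plan_batches is hypothetical, the divisibility check is an assumption, and the step count assumes drop_last=True. The commented call only restates the documented signature.

```python
from typing import Dict


def plan_batches(
    dataset_len: int, micro_batch_size: int, global_batch_size: int, world_size: int
) -> Dict[str, int]:
    """Hypothetical helper: derive step counts from create_megatron_sampler arguments."""
    # Samples consumed by one micro-batch step across all data-parallel ranks.
    per_step = micro_batch_size * world_size
    if global_batch_size % per_step != 0:
        raise ValueError(
            f"global_batch_size={global_batch_size} must be a multiple of "
            f"micro_batch_size * world_size = {per_step}"
        )
    # global_batch_size = micro_batch_size * world_size * grad_accum
    grad_accum = global_batch_size // per_step
    # Complete global batches only, matching drop_last=True.
    optimizer_steps = dataset_len // global_batch_size
    return {
        "grad_accum": grad_accum,
        "optimizer_steps": optimizer_steps,
        "micro_batches_per_rank": optimizer_steps * grad_accum,
    }


plan = plan_batches(dataset_len=1000, micro_batch_size=4, global_batch_size=64, world_size=8)
# Each rank would then build its sampler with the documented factory, e.g.:
#   create_megatron_sampler(dataset_len=1000, micro_batch_size=4, global_batch_size=64,
#                           dataloader_type="single", rank=rank, world_size=8)
```

Here grad_accum is 2 and there are 15 complete global batches, so each rank should see 30 micro-batches, which is what len() of the returned sampler is expected to report if it follows the global-batch rule described for BaseMegatronSampler.__len__.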