# nemo_automodel.components.datasets.llm.megatron.sampler

## Module Contents

### Classes

| Name                                                                                                                            | Description                                                       |
| ------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------- |
| [`BaseMegatronSampler`](#nemo_automodel-components-datasets-llm-megatron-sampler-BaseMegatronSampler)                           | Base class for Megatron batch samplers.                           |
| [`MegatronPretrainingRandomSampler`](#nemo_automodel-components-datasets-llm-megatron-sampler-MegatronPretrainingRandomSampler) | Randomized sampler with per-epoch shuffling and per-rank slicing. |
| [`MegatronPretrainingSampler`](#nemo_automodel-components-datasets-llm-megatron-sampler-MegatronPretrainingSampler)             | Deterministic sequential sampler with per-rank slicing.           |

### Functions

| Name                                                                                                          | Description                    |
| ------------------------------------------------------------------------------------------------------------- | ------------------------------ |
| [`create_megatron_sampler`](#nemo_automodel-components-datasets-llm-megatron-sampler-create_megatron_sampler) | Factory for Megatron samplers. |

### API

```python
class nemo_automodel.components.datasets.llm.megatron.sampler.BaseMegatronSampler(
    total_samples: int,
    micro_batch_size: int,
    data_parallel_rank: int,
    data_parallel_size: int,
    drop_last: bool = True,
    global_batch_size: typing.Optional[int] = None,
    pad_samples_to_global_batch_size: typing.Optional[bool] = False
)
```

Base class for Megatron batch samplers.

Provides common validation and shared behavior for Megatron samplers.
Implementations must yield lists of dataset indices that correspond to
one micro-batch for a single data-parallel rank.

**Parameters:**

* `total_samples`: Total available samples in the dataset.
* `micro_batch_size`: Number of samples per micro-batch on each
  data-parallel rank.
* `data_parallel_rank`: Rank ID in the data-parallel group that this
  sampler will serve.
* `data_parallel_size`: World size of the data-parallel group.
* `drop_last`: If True, drop incomplete batches. If False, implementations
  may yield a final partial micro-batch (subject to their constraints).
* `global_batch_size`: Effective global batch size across all
  data-parallel ranks; when provided, length is computed in global-batch
  units and converted to micro-batches.
* `pad_samples_to_global_batch_size`: If True and supported by the
  sampler, the last incomplete global batch will be padded to
  `global_batch_size` when `drop_last` is False.
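The batch-size parameters fit together as `global_batch_size = micro_batch_size * data_parallel_size * grad_accum`, the same relation given for `create_megatron_sampler`. The sketch below computes how many micro-batches each rank contributes to one global batch. The function name is hypothetical and not part of this module, and it assumes `global_batch_size` must be an exact multiple of `micro_batch_size * data_parallel_size`.

```python
def micro_batches_per_global_batch(
    global_batch_size: int, micro_batch_size: int, data_parallel_size: int
) -> int:
    """Hypothetical helper: micro-batches each rank runs per global batch."""
    # Samples consumed across all data-parallel ranks in one micro-batch step.
    samples_per_step = micro_batch_size * data_parallel_size
    # Assumption: a global batch must split into a whole number of steps.
    if global_batch_size % samples_per_step != 0:
        raise ValueError(
            f"global_batch_size={global_batch_size} is not a multiple of "
            f"micro_batch_size * data_parallel_size = {samples_per_step}"
        )
    return global_batch_size // samples_per_step


# 4 samples per micro-batch on each of 8 ranks, 256 samples per optimizer step.
print(micro_batches_per_global_batch(256, 4, 8))  # 8
```

The result is the number of gradient-accumulation steps per optimizer step in this setup.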

```python
nemo_automodel.components.datasets.llm.megatron.sampler.BaseMegatronSampler.__iter__()
```

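Abstract. Subclasses must implement this method to yield lists of dataset
indices, each list forming one micro-batch for this data-parallel rank.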

```python
nemo_automodel.components.datasets.llm.megatron.sampler.BaseMegatronSampler.__len__()
```

Return the number of micro-batches this sampler will yield.

If `global_batch_size` is provided, the length is computed in terms of
global batches and converted to micro-batches to align with training
loops that iterate by micro-batch.
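A minimal sketch of the length rule described above. It assumes one micro-batch "step" consumes `micro_batch_size * data_parallel_size` samples across all ranks, and that `drop_last` decides whether a trailing partial global batch is counted. The function is hypothetical and mirrors only the documented behavior; the rounding in the branch without `global_batch_size` is an assumption.

```python
from typing import Optional


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without floating point."""
    return (a + b - 1) // b


def sampler_len(
    total_samples: int,
    micro_batch_size: int,
    data_parallel_size: int,
    drop_last: bool = True,
    global_batch_size: Optional[int] = None,
) -> int:
    """Hypothetical: number of micro-batches one rank yields."""
    samples_per_step = micro_batch_size * data_parallel_size
    if global_batch_size is not None:
        # Count whole global batches (plus a partial one unless drop_last)...
        if drop_last:
            num_global_batches = total_samples // global_batch_size
        else:
            num_global_batches = ceil_div(total_samples, global_batch_size)
        # ...then convert each global batch into per-rank micro-batches.
        return num_global_batches * (global_batch_size // samples_per_step)
    # Assumption: without a global batch size, count micro-batch steps directly.
    if drop_last:
        return total_samples // samples_per_step
    return ceil_div(total_samples, samples_per_step)


print(sampler_len(1000, 4, 8, drop_last=True, global_batch_size=256))   # 24
print(sampler_len(1000, 4, 8, drop_last=False, global_batch_size=256))  # 32
```

In the `drop_last=False` case of this sketch, the final global batch holds only 232 of its 256 samples yet still counts as 8 micro-batches per rank; that gap is what `pad_samples_to_global_batch_size` is meant to close.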

```python
class nemo_automodel.components.datasets.llm.megatron.sampler.MegatronPretrainingRandomSampler(
    total_samples: int,
    micro_batch_size: int,
    data_parallel_rank: int,
    data_parallel_size: int,
    drop_last: bool = True,
    global_batch_size: typing.Optional[int] = None,
    pad_samples_to_global_batch_size: typing.Optional[bool] = False,
    seed: int = 0
)
```

**Bases:** [BaseMegatronSampler](#nemo_automodel-components-datasets-llm-megatron-sampler-BaseMegatronSampler)

Randomized sampler with per-epoch shuffling and per-rank slicing.

Uses a deterministic seed schedule `seed + epoch` to randomize indices
within each data-parallel shard (bucket). Notably, this sampler:

* Does not support padding the last global batch.
* Requires `drop_last=True` when the product
  `micro_batch_size * data_parallel_size > 1`.

```python
nemo_automodel.components.datasets.llm.megatron.sampler.MegatronPretrainingRandomSampler.__iter__()
```

Yield randomized micro-batches for this rank.

Each epoch shuffles indices within the per-rank bucket using
`torch.randperm` seeded by `seed + epoch`. The sampler then emits
contiguous micro-batches of size `micro_batch_size` for this rank.
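The shuffling scheme can be sketched in plain Python. Two parts are assumptions rather than facts from this page: the bucket layout (each rank owns a contiguous run of `(total_samples // (micro_batch_size * data_parallel_size)) * micro_batch_size` indices, with the remainder left out), and the shuffle itself, which uses the standard-library `random.Random(seed + epoch).shuffle` as a stand-in for `torch.randperm`, so the exact permutation differs from the real sampler. The function name is hypothetical.

```python
import random
from typing import Iterator, List


def random_micro_batches(
    total_samples: int,
    micro_batch_size: int,
    data_parallel_rank: int,
    data_parallel_size: int,
    epoch: int = 0,
    seed: int = 0,
) -> Iterator[List[int]]:
    """Yield shuffled micro-batches of dataset indices for one data-parallel rank."""
    samples_per_step = micro_batch_size * data_parallel_size
    # Assumed layout: each rank owns one contiguous, equally sized bucket;
    # the trailing total_samples % samples_per_step indices are never used.
    bucket_size = (total_samples // samples_per_step) * micro_batch_size
    start_idx = data_parallel_rank * bucket_size

    # Stand-in for torch.randperm(bucket_size, generator=g) with g seeded by seed + epoch.
    offsets = list(range(bucket_size))
    random.Random(seed + epoch).shuffle(offsets)

    # Emit consecutive runs of micro_batch_size shuffled indices; bucket_size is a
    # multiple of micro_batch_size, so no partial micro-batch is left over.
    for i in range(0, bucket_size, micro_batch_size):
        yield [start_idx + off for off in offsets[i : i + micro_batch_size]]


for rank in range(2):
    print(rank, list(random_micro_batches(10, 2, rank, 2, epoch=0, seed=0)))
```

Reseeding with `seed + epoch` makes the order reproducible for a given epoch, while each epoch draws from a different seed.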

```python
nemo_automodel.components.datasets.llm.megatron.sampler.MegatronPretrainingRandomSampler.__len__()
```

Return the number of micro-batches that will be produced.

Accounts for `drop_last` by excluding a trailing incomplete global batch.
When `global_batch_size` is provided, converts global batches to
micro-batches.

```python
class nemo_automodel.components.datasets.llm.megatron.sampler.MegatronPretrainingSampler(
    total_samples: int,
    micro_batch_size: int,
    data_parallel_rank: int,
    data_parallel_size: int,
    drop_last: bool = True,
    global_batch_size: typing.Optional[int] = None,
    pad_samples_to_global_batch_size: typing.Optional[bool] = False
)
```

**Bases:** [BaseMegatronSampler](#nemo_automodel-components-datasets-llm-megatron-sampler-BaseMegatronSampler)

Deterministic sequential sampler with per-rank slicing.

Iterates deterministically over sample indices, splits each global batch
across data-parallel ranks, and yields per-rank micro-batches. When
`drop_last` is False and `pad_samples_to_global_batch_size` is True, the
final global batch is padded to `global_batch_size` so that all ranks emit
complete micro-batches.

**Raises:**

* `RuntimeError`: If there are no samples left to consume.

```python
nemo_automodel.components.datasets.llm.megatron.sampler.MegatronPretrainingSampler.__iter__()
```

Yield lists of indices forming per-rank micro-batches.

Iterates over sample indices up to `total_samples`, optionally padding
the last global batch when `drop_last` is False and
`pad_samples_to_global_batch_size` is True.
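A plain-Python sketch of the sequential iteration and padding, assuming indices are consumed in steps of `micro_batch_size * data_parallel_size` and that each rank takes its `micro_batch_size`-wide slice of every step. The function name and the `pad_index` placeholder (`-1`) are hypothetical, chosen only to make padded slots visible, and this sketch skips empty trailing slices. It raises `RuntimeError` when there is nothing to consume.

```python
from typing import Iterator, List, Optional


def sequential_micro_batches(
    total_samples: int,
    micro_batch_size: int,
    data_parallel_rank: int,
    data_parallel_size: int,
    drop_last: bool = True,
    global_batch_size: Optional[int] = None,
    pad_samples_to_global_batch_size: bool = False,
    pad_index: int = -1,  # hypothetical placeholder for padded slots
) -> Iterator[List[int]]:
    """Yield this rank's micro-batches of indices in dataset order."""
    if total_samples <= 0:
        raise RuntimeError("no samples left to consume")
    indices = list(range(total_samples))
    if not drop_last and pad_samples_to_global_batch_size:
        if global_batch_size is None:
            raise ValueError("padding requires global_batch_size")
        # Round the index list up to a whole number of global batches.
        indices += [pad_index] * (-total_samples % global_batch_size)

    samples_per_step = micro_batch_size * data_parallel_size
    # This rank's slice of each step.
    start = data_parallel_rank * micro_batch_size
    end = start + micro_batch_size
    for step_start in range(0, len(indices), samples_per_step):
        step = indices[step_start : step_start + samples_per_step]
        if len(step) < samples_per_step and drop_last:
            break  # drop the trailing incomplete step
        rank_slice = step[start:end]
        if rank_slice:  # a short final step may leave nothing for high ranks
            yield rank_slice


for rank in range(2):
    print(rank, list(sequential_micro_batches(
        10, 2, rank, 2, drop_last=False,
        global_batch_size=4, pad_samples_to_global_batch_size=True)))
# 0 [[0, 1], [4, 5], [8, 9]]
# 1 [[2, 3], [6, 7], [-1, -1]]
```

Without padding, rank 1 in this example would get only two micro-batches while rank 0 gets three; padding keeps the per-rank counts equal.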

```python
nemo_automodel.components.datasets.llm.megatron.sampler.MegatronPretrainingSampler.get_start_end_idx()
```

Return slice boundaries for this rank within a global batch.

**Returns:**

Tuple of `(start_idx, end_idx)` used to extract this rank's micro-batch
from the global batch.
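Assuming the boundaries are offsets into a batch that holds one micro-batch per rank, with rank `r` owning positions `r * micro_batch_size` up to (but excluding) `(r + 1) * micro_batch_size`, the computation reduces to the sketch below. The free function and its arguments are hypothetical; the documented method takes no arguments and reads these values from the sampler.

```python
from typing import List, Tuple


def get_start_end_idx(data_parallel_rank: int, micro_batch_size: int) -> Tuple[int, int]:
    """Hypothetical free-function version: this rank's [start, end) offsets."""
    start_idx = data_parallel_rank * micro_batch_size
    end_idx = start_idx + micro_batch_size  # exclusive upper bound
    return start_idx, end_idx


# Two ranks, micro_batch_size=3: one batch of 6 indices split between them.
batch: List[int] = [10, 11, 12, 13, 14, 15]
for rank in range(2):
    start, end = get_start_end_idx(rank, 3)
    print(rank, (start, end), batch[start:end])
# 0 (0, 3) [10, 11, 12]
# 1 (3, 6) [13, 14, 15]
```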

```python
nemo_automodel.components.datasets.llm.megatron.sampler.create_megatron_sampler(
    dataset_len: int,
    micro_batch_size: int,
    global_batch_size: int,
    dataloader_type: typing.Literal['single', 'cyclic'] = 'single',
    drop_last: bool = True,
    pad_samples_to_global_batch_size: bool = False,
    rank: int = 0,
    world_size: int = 1
) -> nemo_automodel.components.datasets.llm.megatron.sampler.BaseMegatronSampler
```

Factory for Megatron samplers.

Constructs and returns a Megatron-compatible sampler for a dataset of a
given length and batch configuration. The returned sampler yields lists of
indices per micro-batch for a single data-parallel rank.

**Parameters:**

* `dataset_len`: Number of samples in the underlying dataset.
* `micro_batch_size`: Number of samples per micro-batch on each
  data-parallel rank.
* `global_batch_size`: Effective global batch size across all
  data-parallel ranks (`micro_batch_size * world_size * grad_accum`).
* `dataloader_type`: Sampler type to construct; the value `"batch"` is
  not supported in this implementation. Supported values:
  * `"single"`: Deterministic sequential sampling
    (`MegatronPretrainingSampler`).
  * `"cyclic"`: Randomized per-epoch sampling
    (`MegatronPretrainingRandomSampler`).
* `drop_last`: When True, drop a trailing incomplete batch.
* `pad_samples_to_global_batch_size`: When True and supported by the
  sampler, pad the final global batch to `global_batch_size` if
  `drop_last` is False.
* `rank`: Data-parallel rank ID for this process.
* `world_size`: Number of data-parallel ranks.

**Returns:** `BaseMegatronSampler`

Configured sampler instance for the requested type.

**Raises:**

* `Exception`: If an unsupported `dataloader_type` is provided.
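The dispatch can be pictured with the sketch below, which returns the selected class name and the constructor keyword arguments instead of building a sampler. The function name `describe_megatron_sampler` is hypothetical, and the argument mapping (`dataset_len` to `total_samples`, `rank` to `data_parallel_rank`, `world_size` to `data_parallel_size`) is inferred from the signatures on this page rather than confirmed by it.

```python
from typing import Any, Dict, Tuple

# Supported dataloader types and the sampler class each one selects.
SAMPLER_CLASS_FOR_TYPE = {
    "single": "MegatronPretrainingSampler",
    "cyclic": "MegatronPretrainingRandomSampler",
}


def describe_megatron_sampler(
    dataset_len: int,
    micro_batch_size: int,
    global_batch_size: int,
    dataloader_type: str = "single",
    drop_last: bool = True,
    pad_samples_to_global_batch_size: bool = False,
    rank: int = 0,
    world_size: int = 1,
) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical: which sampler the factory picks and the kwargs it passes."""
    if dataloader_type not in SAMPLER_CLASS_FOR_TYPE:
        # "batch" and any other unknown value are rejected in this sketch.
        raise Exception(f"{dataloader_type} dataloader type is not supported.")
    kwargs = {
        "total_samples": dataset_len,
        "micro_batch_size": micro_batch_size,
        "data_parallel_rank": rank,
        "data_parallel_size": world_size,
        "drop_last": drop_last,
        "global_batch_size": global_batch_size,
        "pad_samples_to_global_batch_size": pad_samples_to_global_batch_size,
    }
    return SAMPLER_CLASS_FOR_TYPE[dataloader_type], kwargs


name, kwargs = describe_megatron_sampler(1000, 4, 256, "cyclic", rank=3, world_size=8)
print(name, kwargs["data_parallel_rank"], kwargs["data_parallel_size"])
# MegatronPretrainingRandomSampler 3 8
```

Because the returned sampler yields lists of indices per micro-batch, it belongs in the `batch_sampler=` argument of `torch.utils.data.DataLoader` rather than `sampler=`.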