# nemo_automodel.components.datasets.diffusion.sampler

## Module Contents

### Classes

| Name                                                                                                       | Description                    |
| ---------------------------------------------------------------------------------------------------------- | ------------------------------ |
| [`SequentialBucketSampler`](#nemo_automodel-components-datasets-diffusion-sampler-SequentialBucketSampler) | Production-grade bucket sampler with DDP support, deterministic shuffling, and lazy batch generation. |

### Data

[`logger`](#nemo_automodel-components-datasets-diffusion-sampler-logger)

### API

```python
class nemo_automodel.components.datasets.diffusion.sampler.SequentialBucketSampler(
    dataset: nemo_automodel.components.datasets.diffusion.base_dataset.BaseMultiresolutionDataset,
    base_batch_size: int = 32,
    base_resolution: typing.Tuple[int, int] = (512, 512),
    drop_last: bool = True,
    shuffle_buckets: bool = True,
    shuffle_within_bucket: bool = True,
    dynamic_batch_size: bool = False,
    seed: int = 42,
    num_replicas: typing.Optional[int] = None,
    rank: typing.Optional[int] = None
)
```

**Bases:** `Sampler[List[int]]`

Production-grade sampler that:

1. Supports Distributed Data Parallel (DDP) by splitting data across GPUs.
2. Shuffles deterministically via `torch.Generator`, so training is resumable.
3. Generates batches lazily, which saves RAM compared with pre-computing all batches.
4. Guarantees equal batch counts across all ranks, which prevents DDP deadlocks.

Batching behavior:

* Processes all images in bucket A before moving to bucket B.
* Shuffles samples within each bucket deterministically (see `shuffle_within_bucket`).
* Drops incomplete batches at the end of each bucket (see `drop_last`).
* Can use dynamic batch sizes based on resolution (see `dynamic_batch_size`).
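The bucket-sequential, lazy behavior described above can be illustrated with a minimal sketch. This is not the library's implementation: `iter_bucket_batches` is a hypothetical helper, `random.Random` stands in for `torch.Generator`, buckets are visited in sorted order rather than shuffled, and DDP splitting is omitted.

```python
import random
from typing import Dict, Iterator, List, Tuple

Resolution = Tuple[int, int]


def iter_bucket_batches(
    buckets: Dict[Resolution, List[int]],
    batch_size: int,
    seed: int = 42,
) -> Iterator[List[int]]:
    """Lazily yield batches, finishing one bucket before starting the next."""
    rng = random.Random(seed)  # stand-in for a seeded torch.Generator
    for resolution in sorted(buckets):
        indices = list(buckets[resolution])
        rng.shuffle(indices)  # deterministic shuffle within the bucket
        # Yield only full batches; the incomplete tail of the bucket is dropped.
        full = len(indices) - len(indices) % batch_size
        for start in range(0, full, batch_size):
            yield indices[start:start + batch_size]


# Hypothetical dataset: sample indices grouped by resolution bucket.
buckets = {(512, 512): [0, 1, 2, 3, 4], (768, 512): [5, 6, 7]}
batches = list(iter_bucket_batches(buckets, batch_size=2))
```

Because the function is a generator, only one bucket's index list is materialized at a time, and every batch contains samples of a single resolution.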

```python
nemo_automodel.components.datasets.diffusion.sampler.SequentialBucketSampler.__iter__() -> typing.Iterator[typing.List[int]]
```

```python
nemo_automodel.components.datasets.diffusion.sampler.SequentialBucketSampler.__len__() -> int
```

```python
nemo_automodel.components.datasets.diffusion.sampler.SequentialBucketSampler._calculate_total_batches() -> int
```

Calculate the total number of batches, ensuring that all ranks get the same count.
Each bucket is padded so that its size is divisible by `num_replicas * batch_size`.
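As a rough illustration of the padding arithmetic, the sketch below rounds a bucket's size up to a multiple of `num_replicas * batch_size` and derives the per-rank batch count. `padded_batches_per_rank` is a hypothetical helper, and the sketch ignores any interaction with `drop_last`, which the real method may handle differently.

```python
import math


def padded_batches_per_rank(bucket_size: int, batch_size: int, num_replicas: int) -> int:
    """Batches each rank receives from one bucket after padding (sketch)."""
    global_batch = num_replicas * batch_size
    # Round the bucket up to the next multiple of the global batch size.
    padded_size = math.ceil(bucket_size / global_batch) * global_batch
    # Every rank takes one batch out of each global batch.
    return padded_size // global_batch


# 100 samples, batch size 8, 4 ranks: padded to 128 samples, 4 batches per rank.
per_rank = padded_batches_per_rank(bucket_size=100, batch_size=8, num_replicas=4)
```

Since the padded size is an exact multiple of the global batch, no rank can run out of batches before the others, which is the property that avoids a DDP deadlock.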

```python
nemo_automodel.components.datasets.diffusion.sampler.SequentialBucketSampler._get_batch_size(
    resolution: typing.Tuple[int, int]
) -> int
```

Get the batch size for a resolution: dynamic or fixed, depending on the sampler's setting.
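The page does not state how the dynamic batch size is derived. One plausible rule, shown here purely as an assumption, scales `base_batch_size` by the ratio of the base resolution's pixel count to the bucket's pixel count. `get_batch_size_sketch` is a hypothetical function, not the library's method.

```python
from typing import Tuple


def get_batch_size_sketch(
    resolution: Tuple[int, int],
    base_batch_size: int = 32,
    base_resolution: Tuple[int, int] = (512, 512),
    dynamic_batch_size: bool = False,
) -> int:
    """Return a fixed batch size, or one scaled by pixel count (assumed rule)."""
    if not dynamic_batch_size:
        return base_batch_size
    base_pixels = base_resolution[0] * base_resolution[1]
    pixels = resolution[0] * resolution[1]
    # Assumption: keep pixels-per-batch roughly constant, with a floor of 1.
    return max(1, base_batch_size * base_pixels // pixels)


fixed = get_batch_size_sketch((1024, 1024))
scaled = get_batch_size_sketch((1024, 1024), dynamic_batch_size=True)
```

Under this assumed rule, a bucket with four times the pixels of the base resolution gets a quarter of the base batch size, which keeps memory use per batch roughly level.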

```python
nemo_automodel.components.datasets.diffusion.sampler.SequentialBucketSampler.get_batch_info(
    batch_idx: int
) -> typing.Dict
```

Get information about a specific batch.

Note: With lazy evaluation, we don't pre-compute batches,
so this returns bucket-level info for the estimated batch.
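One way to estimate which bucket a batch index falls into without pre-computing batches is to walk cumulative per-bucket batch counts. The sketch below assumes a fixed bucket order and uses hypothetical dictionary keys; the actual keys returned by `get_batch_info` are not documented here.

```python
from typing import Dict, List, Tuple


def locate_bucket(batch_idx: int, bucket_batch_counts: List[Tuple[str, int]]) -> Dict:
    """Map a global batch index to a bucket using cumulative batch counts."""
    offset = 0
    for name, count in bucket_batch_counts:
        if batch_idx < offset + count:
            # Keys are illustrative only, not the library's schema.
            return {"bucket": name, "batch_in_bucket": batch_idx - offset}
        offset += count
    raise IndexError(f"batch_idx {batch_idx} is out of range")


# Hypothetical per-bucket batch counts, in iteration order.
counts = [("512x512", 3), ("768x512", 2)]
info = locate_bucket(3, counts)
```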

```python
nemo_automodel.components.datasets.diffusion.sampler.SequentialBucketSampler.load_state_dict(
    state_dict: typing.Dict
) -> None
```

Restore sampler state; the next `__iter__` call will skip already-yielded batches.

```python
nemo_automodel.components.datasets.diffusion.sampler.SequentialBucketSampler.set_epoch(
    epoch: int
)
```

Set the current epoch. Crucial for reproducibility and for getting a different shuffle each epoch.
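A common pattern behind a `set_epoch` method is to derive the shuffle seed from the base seed plus the epoch number. The sketch below assumes that pattern and uses `random.Random` as a stand-in for `torch.Generator`; `epoch_permutation` is a hypothetical helper, and the library may combine seed and epoch differently.

```python
import random
from typing import List


def epoch_permutation(n: int, seed: int, epoch: int) -> List[int]:
    """Return a shuffle of range(n) that depends only on (seed, epoch)."""
    rng = random.Random(seed + epoch)  # assumed seeding scheme
    order = list(range(n))
    rng.shuffle(order)
    return order


epoch0 = epoch_permutation(100, seed=42, epoch=0)
epoch0_again = epoch_permutation(100, seed=42, epoch=0)
epoch1 = epoch_permutation(100, seed=42, epoch=1)
```

Every rank that uses the same seed and epoch computes the same permutation, so ranks stay consistent with each other while the order still changes from epoch to epoch.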

```python
nemo_automodel.components.datasets.diffusion.sampler.SequentialBucketSampler.state_dict() -> typing.Dict
```

Return sampler state for mid-epoch checkpointing.
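The `state_dict` / `load_state_dict` round trip can be pictured with a toy sampler that counts yielded batches and skips them on resume. `ResumableBatchSampler` and its state keys (`epoch`, `batches_yielded`) are hypothetical; the real sampler's state layout is not documented on this page.

```python
from typing import Dict, Iterator, List


class ResumableBatchSampler:
    """Toy sampler that tracks how many batches it has yielded."""

    def __init__(self, batches: List[List[int]]) -> None:
        self.batches = batches
        self.epoch = 0
        self.batches_yielded = 0  # progress within the current epoch
        self._skip = 0  # batches to skip on the next __iter__

    def __iter__(self) -> Iterator[List[int]]:
        start, self._skip = self._skip, 0  # the skip applies to one pass only
        self.batches_yielded = start
        for batch in self.batches[start:]:
            self.batches_yielded += 1
            yield batch

    def state_dict(self) -> Dict:
        return {"epoch": self.epoch, "batches_yielded": self.batches_yielded}

    def load_state_dict(self, state_dict: Dict) -> None:
        self.epoch = state_dict["epoch"]
        self._skip = state_dict["batches_yielded"]


# Consume one batch, checkpoint, then resume in a fresh sampler.
sampler = ResumableBatchSampler([[0, 1], [2, 3], [4, 5]])
first = next(iter(sampler))
checkpoint = sampler.state_dict()

resumed = ResumableBatchSampler([[0, 1], [2, 3], [4, 5]])
resumed.load_state_dict(checkpoint)
remaining = list(resumed)
```

For this to reproduce the original run in a shuffling sampler, the resumed sampler must also regenerate the same batch order, which is why the state would need to carry the epoch alongside the batch count.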

```python
nemo_automodel.components.datasets.diffusion.sampler.logger = logging.getLogger(__name__)
```