# nemo_automodel.components.datasets.reservoir_sampler

## Module Contents

### Classes

| Name                                                                                         | Description                                 |
| -------------------------------------------------------------------------------------------- | ------------------------------------------- |
| [`ReservoirSampler`](#nemo_automodel-components-datasets-reservoir_sampler-ReservoirSampler) | Streaming shuffle with a fixed-size buffer. |

### API

```python
class nemo_automodel.components.datasets.reservoir_sampler.ReservoirSampler(
    iterator: typing.Iterable[typing.Dict[str, typing.Any]],
    buffer_size: int,
    seed: typing.Optional[int] = None
)
```

Streaming shuffle with a fixed-size buffer.

This is a bounded-memory shuffling wrapper for streaming datasets/iterables.
It maintains a buffer of `buffer_size` items. Once the buffer is filled,
it repeatedly:

* picks a random buffer slot
* yields the item currently in that slot
* fills the slot with the next item from the underlying iterator

When the underlying iterator is exhausted, the remaining buffer items are
yielded.
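
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the library's implementation: the function name `buffered_shuffle`, the use of `random.Random` for seeding, the `ValueError` on a non-positive `buffer_size`, and the order in which the leftover buffer is drained are all assumptions made for the example.

```python
import random
from typing import Any, Dict, Iterable, Iterator, List, Optional


def buffered_shuffle(
    iterable: Iterable[Dict[str, Any]],
    buffer_size: int,
    seed: Optional[int] = None,
) -> Iterator[Dict[str, Any]]:
    """Hypothetical stand-in that mimics the documented buffer behavior."""
    if buffer_size <= 0:
        # Assumption: the real class may validate its arguments differently.
        raise ValueError("buffer_size must be positive")

    rng = random.Random(seed)  # private RNG, so a fixed seed gives a repeatable order
    buffer: List[Dict[str, Any]] = []

    for item in iterable:
        if len(buffer) < buffer_size:
            # Fill phase: nothing is yielded until the buffer is full.
            buffer.append(item)
            continue
        slot = rng.randrange(buffer_size)  # pick a random buffer slot
        evicted = buffer[slot]             # the item currently in that slot
        buffer[slot] = item                # refill the slot with the new item
        yield evicted

    # Source exhausted: yield whatever is left in the buffer.
    # Assumption: drained in slot order; the real drain order is not documented here.
    yield from buffer
```

Every input item is yielded exactly once, and memory use is bounded by `buffer_size` regardless of stream length. In this sketch an item cannot be yielded before `buffer_size` later items have been read, so a small buffer only shuffles locally; a larger buffer mixes more thoroughly at the cost of memory.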

```python
nemo_automodel.components.datasets.reservoir_sampler.ReservoirSampler.__getitem__(
    idx: int
) -> typing.Dict[str, typing.Any]
```

`__getitem__` is not supported by `ReservoirSampler`.

```python
nemo_automodel.components.datasets.reservoir_sampler.ReservoirSampler.__iter__() -> typing.Iterator[typing.Dict[str, typing.Any]]
```

Iterate over the underlying iterator, yielding its items in shuffled order via the buffer.

```python
nemo_automodel.components.datasets.reservoir_sampler.ReservoirSampler.__len__() -> int
```

`__len__` is not supported by `ReservoirSampler`.
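
Because neither indexing nor a length is available, consumers of a streaming wrapper like this one generally bound their reads by iteration alone. The sketch below shows one way to do that with `itertools.islice`. `StreamOnly` and `take` are hypothetical names, and `StreamOnly` is only a stand-in for any iterable-only object; it does not reproduce how `ReservoirSampler` itself responds to `len()` or indexing.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


class StreamOnly:
    """Hypothetical stand-in: iterable, but defines neither __len__ nor __getitem__."""

    def __init__(self, n: int) -> None:
        self.n = n

    def __iter__(self) -> Iterator[Dict[str, Any]]:
        for i in range(self.n):
            yield {"id": i}


def take(stream: Iterable[Dict[str, Any]], k: int) -> List[Dict[str, Any]]:
    """Return at most k items from an iterable without needing its length."""
    return list(islice(stream, k))


first_three = take(StreamOnly(10), 3)  # stops after three items
```

`islice` stops early without consuming the rest of the stream, and it returns fewer than `k` items if the stream ends first.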