nemo_automodel.components.datasets.reservoir_sampler
Module Contents
Classes
API
Streaming shuffle with a fixed-size buffer.
This is a bounded-memory shuffling wrapper for streaming datasets/iterables.
It maintains a buffer of buffer_size items. Once the buffer is filled,
it repeatedly:
- samples a random buffer slot
- yields the evicted item
- replaces it with the next item from the underlying iterator
When the underlying iterator is exhausted, the remaining buffer items are yielded.
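The steps above can be sketched as a small generator. This is only an illustration of the described behavior, not the module's actual code: the function name buffered_shuffle and the seed parameter are hypothetical, and shuffling the leftover items before draining them is an assumption, since the description does not say in which order the remaining buffer items are yielded.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def buffered_shuffle(
    items: Iterable[T], buffer_size: int, seed: Optional[int] = None
) -> Iterator[T]:
    """Hypothetical sketch of a bounded-memory streaming shuffle."""
    if buffer_size <= 0:
        raise ValueError("buffer_size must be a positive integer")

    rng = random.Random(seed)  # seed is an illustrative parameter
    buffer: List[T] = []

    for item in items:
        # Fill phase: nothing is yielded until the buffer is full.
        if len(buffer) < buffer_size:
            buffer.append(item)
            continue

        # Steady state: pick a random slot, evict its item, and
        # store the incoming item in that slot.
        slot = rng.randrange(buffer_size)
        evicted = buffer[slot]
        buffer[slot] = item
        yield evicted

    # Source exhausted: drain what is left. Shuffling first is an
    # assumption made for this sketch.
    rng.shuffle(buffer)
    yield from buffer


if __name__ == "__main__":
    # Every input item appears exactly once; only the order changes.
    print(list(buffered_shuffle(range(10), buffer_size=4, seed=0)))
```

Memory use is bounded by buffer_size regardless of stream length. The trade-off is that the shuffle is only approximate: an item cannot be yielded before the buffer has filled, so a larger buffer gives an ordering closer to a full shuffle.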
_buffer_size
ReservoirSampler does not support __getitem__.
Iterate over the underlying iterator and yield items sampled from the buffer.
ReservoirSampler does not support __len__.