nemo_automodel.components.datasets.lazy_mapped_dataset#
Module Contents#
Classes#
LazyMappedDataset | Dataset wrapper that applies a transform function on the fly instead of preprocessing the whole dataset upfront with .map(fn).
Data#
API#
- nemo_automodel.components.datasets.lazy_mapped_dataset.logger#
‘getLogger(…)’
- class nemo_automodel.components.datasets.lazy_mapped_dataset.LazyMappedDataset(dataset, map_fn, cache_size=10000)#
Bases: torch.utils.data.Dataset
Dataset wrapper that applies a transform function on the fly instead of preprocessing the whole dataset upfront with .map(fn).
- Parameters:
dataset – Any object that supports __len__ and __getitem__ (e.g. a Hugging Face datasets.Dataset).
map_fn – A callable that accepts a single example and returns the transformed example.
cache_size – Number of processed items to cache. Defaults to 10000. Set to 0 to disable caching or None to cache all items.
- Returns:
A map-style dataset that applies map_fn lazily on each item access.
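The behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not the NeMo Automodel implementation: the class name LazyMappedSketch, the helper tokenize, and the use of functools.lru_cache for the cache are assumptions made for the example.

```python
from functools import lru_cache


class LazyMappedSketch:
    """Illustrative stand-in (hypothetical), not the library class."""

    def __init__(self, dataset, map_fn, cache_size=10000):
        self.dataset = dataset
        self.map_fn = map_fn
        self.cache_size = cache_size
        self._build_get_item()

    def _build_get_item(self):
        def fetch(idx):
            # The transform runs only when an item is actually requested.
            return self.map_fn(self.dataset[idx])

        if self.cache_size == 0:
            # Caching disabled: every access recomputes the transform.
            self._get_item = fetch
        else:
            # maxsize=None makes the cache unbounded ("cache all").
            self._get_item = lru_cache(maxsize=self.cache_size)(fetch)

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        return self._get_item(idx)


calls = []  # records every time the transform really runs


def tokenize(example):
    # Hypothetical transform used only for this example.
    calls.append(example)
    return {"text": example, "n_chars": len(example)}


raw = ["alpha", "beta", "gamma"]
lazy = LazyMappedSketch(raw, tokenize, cache_size=2)
first = lazy[0]   # transform runs for index 0
again = lazy[0]   # served from the cache, transform does not run again
```

Nothing is transformed when the wrapper is constructed; work happens per access, and repeated accesses to a cached index reuse the stored result.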
Initialization
- _build_get_item() None#
Build the internal item accessor, with or without LRU caching.
- __getstate__() dict#
Returns picklable state by dropping the unpicklable _get_item function.
- __setstate__(state: dict) None#
Restores state and rebuilds _get_item after unpickling.
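The drop-and-rebuild pattern used by these two hooks can be sketched as follows. This is an assumption-laden illustration rather than the library's code: the class name PicklableLazySketch is hypothetical, and the hooks are called directly (pickle invokes them in the same way) so that the example does not depend on how the script is launched.

```python
import pickle
from functools import lru_cache


class PicklableLazySketch:
    """Illustrative stand-in (hypothetical), not the library class."""

    def __init__(self, dataset, map_fn, cache_size=10000):
        self.dataset = dataset
        self.map_fn = map_fn
        self.cache_size = cache_size
        self._build_get_item()

    def _build_get_item(self):
        def fetch(idx):
            return self.map_fn(self.dataset[idx])

        # A locally defined (and possibly lru_cache-wrapped) function
        # cannot be pickled, which is why it is excluded from the state.
        if self.cache_size == 0:
            self._get_item = fetch
        else:
            self._get_item = lru_cache(maxsize=self.cache_size)(fetch)

    def __getstate__(self):
        state = self.__dict__.copy()
        state.pop("_get_item", None)  # drop the unpicklable accessor
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._build_get_item()  # recreate the accessor (with an empty cache)

    def __getitem__(self, idx):
        return self._get_item(idx)


lazy = PicklableLazySketch(["a", "bb", "ccc"], len, cache_size=2)

state = lazy.__getstate__()
restored_state = pickle.loads(pickle.dumps(state))  # state survives pickling

clone = PicklableLazySketch.__new__(PicklableLazySketch)
clone.__setstate__(restored_state)
```

In this sketch the cache contents are not part of the pickled state, so the restored object starts with an empty cache and recomputes items on first access.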
- __len__() int#
Returns the number of items in the dataset.
- __getitem__(idx: int) Any#
Returns the transformed item at the given index.
- property cache_info: Any | None#
Returns LRU cache statistics, or None if caching is disabled.
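As a rough illustration of what such statistics can look like, the sketch below assumes the cache is built on functools.lru_cache, whose cache_info() returns a named tuple with hits, misses, maxsize, and currsize. The helper names build_accessor and cache_info_of are hypothetical and not part of the library.

```python
from functools import lru_cache


def build_accessor(dataset, map_fn, cache_size):
    """Hypothetical helper: item accessor with or without LRU caching."""
    def fetch(idx):
        return map_fn(dataset[idx])

    if cache_size == 0:
        return fetch  # plain function, no statistics available
    return lru_cache(maxsize=cache_size)(fetch)


def cache_info_of(accessor):
    """Return the accessor's LRU statistics, or None if it is uncached."""
    info = getattr(accessor, "cache_info", None)
    return info() if info is not None else None


def times_ten(x):
    return x * 10


cached = build_accessor([1, 2, 3], times_ten, cache_size=2)
cached(0)  # miss
cached(0)  # hit
cached(1)  # miss
stats = cache_info_of(cached)

uncached = build_accessor([1, 2, 3], times_ten, cache_size=0)
no_stats = cache_info_of(uncached)
```

A low hit count relative to misses suggests that items are rarely revisited while still cached, in which case a larger cache_size (or disabling the cache) may be worth considering.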
- __repr__() str#
Returns a string representation of the dataset.