nemo_automodel.components.datasets.lazy_mapped_dataset
Module Contents
API
Bases: Dataset
A dataset wrapper that applies a transform function on the fly, at item-access time, instead of preprocessing the whole dataset upfront with .map(fn).
Parameters:
Any object that supports __len__ and __getitem__ (e.g. a Hugging Face datasets.Dataset).
A callable that accepts a single example and returns the transformed example.
Number of processed items to keep in the LRU cache. Defaults to 10k (10,000) samples. Set to 0 to disable caching, or to None to cache every item without a size limit.
Returns:
A map-style dataset that applies map_fn lazily on each item access.
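The lazy behaviour described above can be illustrated with a minimal sketch. It assumes a plain Python list as the wrapped dataset and uses a hypothetical class name, LazyMapSketch; it is not the library's implementation and leaves out caching entirely. It only shows that map_fn runs when an item is indexed, not when the wrapper is constructed, which is what separates this approach from an eager .map(fn) pass over every sample.

```python
from typing import Any, Callable, Sequence


class LazyMapSketch:
    """Hypothetical wrapper: applies map_fn per access instead of upfront."""

    def __init__(self, dataset: Sequence[Any], map_fn: Callable[[Any], Any]) -> None:
        self.dataset = dataset
        self.map_fn = map_fn

    def __len__(self) -> int:
        return len(self.dataset)

    def __getitem__(self, idx: int) -> Any:
        # The transform is applied here, on demand, for this index only.
        return self.map_fn(self.dataset[idx])


calls = []


def tokenize(example: str) -> list:
    calls.append(example)  # record every invocation of the transform
    return example.split()


ds = LazyMapSketch(["a b", "c d e"], tokenize)
print(len(calls))  # 0: nothing is transformed at construction time
print(ds[1])       # ['c', 'd', 'e']
print(len(calls))  # 1: only the accessed item was transformed
```

The trade-off is that repeated access to the same index repeats the work, which is what the optional cache is meant to absorb.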
Return LRU cache statistics, or None if caching is disabled.
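LRU cache statistics of this kind are what Python's functools.lru_cache exposes through cache_info(), a named tuple with hits, misses, maxsize and currsize fields. The sketch below assumes the cache is built that way, which the text above does not confirm; the list data and the function get_item are illustrative only.

```python
import functools

data = ["x", "y"]


@functools.lru_cache(maxsize=2)
def get_item(idx: int) -> str:
    # Stand-in for "apply map_fn to the raw sample at idx".
    return data[idx].upper()


get_item(0)  # miss: computed and stored
get_item(0)  # hit: served from the cache
get_item(1)  # miss: computed and stored
info = get_item.cache_info()
print(info)  # CacheInfo(hits=1, misses=2, maxsize=2, currsize=2)
```

A high miss count relative to hits suggests the access pattern (for example, a shuffled sampler over a large dataset) gets little benefit from the cache.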
Returns the transformed item at the given index.
Returns a picklable state by dropping the unpicklable _get_item function.
Returns the number of items in the dataset.
Returns a string representation of the dataset.
Restores the state and rebuilds _get_item after unpickling.
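The drop-and-rebuild pickling pattern described in the state-handling methods can be sketched as follows. This is a minimal illustration, assuming the cached accessor is a functools.lru_cache wrapped around a bound method (such a wrapper cannot be pickled); the class PicklableLazyMap and the methods _compute and _build_get_item are hypothetical names, not the library's code. Pickling matters here because DataLoader worker processes may receive the dataset by pickling it.

```python
import functools
import pickle
from typing import Any, Callable, Sequence


def double(x: int) -> int:
    # Module-level function so that map_fn itself stays picklable.
    return 2 * x


class PicklableLazyMap:
    """Hypothetical lazy wrapper whose cached accessor is rebuilt after unpickling."""

    def __init__(self, data: Sequence[Any], map_fn: Callable[[Any], Any],
                 cache_size: int = 16) -> None:
        self.data = data
        self.map_fn = map_fn
        self.cache_size = cache_size
        self._build_get_item()

    def _compute(self, idx: int) -> Any:
        return self.map_fn(self.data[idx])

    def _build_get_item(self) -> None:
        # An lru_cache wrapper around a bound method is not picklable.
        self._get_item = functools.lru_cache(maxsize=self.cache_size)(self._compute)

    def __getitem__(self, idx: int) -> Any:
        return self._get_item(idx)

    def __getstate__(self) -> dict:
        state = self.__dict__.copy()
        state.pop("_get_item", None)  # drop the unpicklable cached accessor
        return state

    def __setstate__(self, state: dict) -> None:
        self.__dict__.update(state)
        self._build_get_item()  # start with a fresh, empty cache


ds = PicklableLazyMap([1, 2, 3], double)
print(ds[2])     # 6
clone = pickle.loads(pickle.dumps(ds))
print(clone[2])  # 6
```

One consequence of this design is that cached results are not shipped with the pickled object; each unpickled copy warms its own cache.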
Builds the internal item accessor, with or without LRU caching.
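How an accessor could be built for the three cache-size settings (0 disables caching, None caches everything, a positive integer bounds the LRU cache) is sketched below. It assumes functools.lru_cache as the cache; the class name CachedLazyMap and the helper _compute are hypothetical, and the 10_000 default simply mirrors the documented 10k default.

```python
import functools
from typing import Any, Callable, Optional, Sequence


class CachedLazyMap:
    """Hypothetical wrapper showing the three cache_size modes."""

    def __init__(self, data: Sequence[Any], map_fn: Callable[[Any], Any],
                 cache_size: Optional[int] = 10_000) -> None:
        self.data = data
        self.map_fn = map_fn
        self.cache_size = cache_size
        self._get_item = self._build_get_item()

    def _compute(self, idx: int) -> Any:
        return self.map_fn(self.data[idx])

    def _build_get_item(self) -> Callable[[int], Any]:
        if self.cache_size == 0:
            return self._compute  # caching disabled: transform on every access
        # maxsize=None gives an unbounded cache; an int bounds it with LRU eviction.
        return functools.lru_cache(maxsize=self.cache_size)(self._compute)

    def cache_info(self):
        # Only lru_cache wrappers carry statistics; a plain method does not.
        info = getattr(self._get_item, "cache_info", None)
        return info() if info is not None else None

    def __getitem__(self, idx: int) -> Any:
        return self._get_item(idx)

    def __len__(self) -> int:
        return len(self.data)


for size in (0, 2, None):
    ds = CachedLazyMap(["a", "b", "c"], str.upper, cache_size=size)
    ds[0]
    ds[0]
    print(size, ds.cache_info())
# 0 None
# 2 CacheInfo(hits=1, misses=1, maxsize=2, currsize=1)
# None CacheInfo(hits=1, misses=1, maxsize=None, currsize=1)
```

Returning the undecorated method for 0, rather than using lru_cache(maxsize=0), avoids per-call cache bookkeeping and lets the statistics method report None when caching is disabled.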