Data
- class nemo.collections.common.data.dataset.ConcatDataset(*args: Any, **kwargs: Any)
Bases: IterableDataset
A dataset that takes multiple datasets as input and samples from them according to the specified sampling technique.
- Parameters:
datasets (list) – A list of datasets to sample from.
shuffle (bool) – Whether to shuffle the individual datasets. Applies only to map-style (non-iterable) datasets. Defaults to True.
sampling_technique (str) – Sampling technique to choose which dataset to draw a sample from. Defaults to ‘temperature’. Currently supports ‘temperature’, ‘random’ and ‘round-robin’.
sampling_temperature (int) – Temperature value for sampling. Only used when sampling_technique = ‘temperature’. Defaults to 5.
sampling_scale – Scale factor used to upsample or downsample the dataset. Defaults to 1.
sampling_probabilities (list) – Probability values for sampling. Only used when sampling_technique = ‘random’.
seed – Optional value to seed the numpy RNG.
global_rank (int) – Worker rank, used for partitioning map style datasets. Defaults to 0.
world_size (int) – Total number of processes, used for partitioning map style datasets. Defaults to 1.
- get_iterable(dataset)
- static random_generator(datasets, **kwargs)
- static round_robin_generator(datasets, **kwargs)
- static temperature_generator(datasets, **kwargs)
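The three sampling techniques can be pictured as generators that yield the index of the dataset to draw from next. The following is a minimal standard-library sketch, not the NeMo implementation: the helper names (`temperature_probabilities`, `round_robin_indices`, `weighted_indices`) and the dataset sizes are hypothetical, and the temperature formula (size-proportional probabilities raised to the power 1 / temperature, then renormalized) is an assumption based on the common formulation of temperature sampling.

```python
import itertools
import random


def temperature_probabilities(lengths, temperature):
    """Size-proportional probabilities, flattened by a temperature (assumed formula)."""
    total = sum(lengths)
    scaled = [(n / total) ** (1.0 / temperature) for n in lengths]
    norm = sum(scaled)
    return [s / norm for s in scaled]


def round_robin_indices(num_datasets):
    """Yield 0, 1, ..., num_datasets - 1 in an endless cycle."""
    return itertools.cycle(range(num_datasets))


def weighted_indices(probabilities, seed=None):
    """Yield dataset indices drawn independently with the given probabilities."""
    rng = random.Random(seed)  # seeded for reproducibility
    population = range(len(probabilities))
    while True:
        yield rng.choices(population, weights=probabilities, k=1)[0]


lengths = [9000, 900, 100]  # hypothetical dataset sizes
probs_t1 = temperature_probabilities(lengths, temperature=1)
probs_t5 = temperature_probabilities(lengths, temperature=5)
first_six = list(itertools.islice(round_robin_indices(3), 6))
draws = list(itertools.islice(weighted_indices(probs_t5, seed=0), 1000))
```

Under this assumed formula, a temperature of 1 reproduces the size-proportional probabilities (0.9, 0.09, 0.01), while a temperature of 5 flattens them to roughly 0.49, 0.31 and 0.20, so the smaller datasets are drawn far more often than their size alone would suggest. Passing user-supplied values to `weighted_indices` corresponds to the role that sampling_probabilities plays for the ‘random’ technique.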
- class nemo.collections.common.data.dataset.ConcatMapDataset(*args: Any, **kwargs: Any)
Bases: Dataset
A dataset that takes multiple datasets as input and samples from them according to the specified sampling technique.
- Parameters:
datasets (list) – A list of datasets to sample from.
sampling_technique (str) – Sampling technique to choose which dataset to draw a sample from. Defaults to ‘temperature’. Currently supports ‘temperature’, ‘random’ and ‘round-robin’.
sampling_temperature (int) – Temperature value for sampling. Only used when sampling_technique = ‘temperature’. Defaults to 5.
sampling_probabilities (list) – Probability values for sampling. Only used when sampling_technique = ‘random’.
seed – Optional value to seed the numpy RNG.
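Because a map-style dataset must support random access by index, one way to combine several datasets is to precompute a table of (dataset_id, sample_id) pairs and look each request up in it. The sketch below illustrates that idea for a round-robin ordering only. The class `InterleavedMapDataset` is hypothetical and is not part of NeMo; it assumes every input dataset is non-empty and supports `len()` and integer indexing, and it wraps shorter datasets around until the longest has been visited once.

```python
class InterleavedMapDataset:
    """Hypothetical map-style concatenation backed by a precomputed index table."""

    def __init__(self, datasets):
        self.datasets = datasets
        longest = max(len(d) for d in datasets)
        # One (dataset_id, sample_id) pair per position. Shorter datasets
        # wrap around so that the longest one is visited exactly once.
        self.indices = [
            (dataset_id, step % len(dataset))
            for step in range(longest)
            for dataset_id, dataset in enumerate(datasets)
        ]

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, idx):
        dataset_id, sample_id = self.indices[idx]
        return self.datasets[dataset_id][sample_id]


combined = InterleavedMapDataset([["a0", "a1", "a2"], ["b0"]])
items = [combined[i] for i in range(len(combined))]
```

Here `items` interleaves the two inputs, repeating the single element of the shorter list. A ‘temperature’ or ‘random’ variant could fill the same table by drawing `dataset_id` from a seeded random generator instead of cycling.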