Data
- class nemo.collections.common.data.dataset.ConcatDataset(*args: Any, **kwargs: Any)
Bases: IterableDataset
A dataset that accepts multiple datasets as arguments and samples from them according to the specified sampling technique.
- Parameters
datasets (list) – A list of datasets to sample from.
shuffle (bool) – Whether to shuffle individual datasets. Only works with non-iterable datasets. Defaults to True.
sampling_technique (str) – Sampling technique to choose which dataset to draw a sample from. Defaults to ‘temperature’. Currently supports ‘temperature’, ‘random’ and ‘round-robin’.
sampling_temperature (int) – Temperature value for sampling. Only used when sampling_technique = ‘temperature’. Defaults to 5.
sampling_scale – Factor for upsampling or downsampling the dataset. Defaults to 1.
sampling_probabilities (list) – Probability values for sampling. Only used when sampling_technique = ‘random’.
seed – Optional value to seed the numpy RNG.
global_rank (int) – Worker rank, used for partitioning map-style datasets. Defaults to 0.
world_size (int) – Total number of processes, used for partitioning map-style datasets. Defaults to 1.
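To make the three sampling techniques concrete, here is a minimal plain-Python sketch. It is not NeMo's implementation: the helper names (temperature_probabilities, choose_datasets, partition_for_rank) are hypothetical, Python's random.Random stands in for the NumPy RNG, and it assumes the common temperature-sampling formulation in which dataset i, holding n_i of N total samples, is chosen with probability proportional to (n_i / N)^(1/T). The strided split in partition_for_rank is likewise only one plausible way to use global_rank and world_size on a map-style dataset.

```python
import random
from itertools import cycle
from typing import Iterator, List, Optional, Sequence


def temperature_probabilities(lengths: Sequence[int], temperature: float) -> List[float]:
    """Map dataset sizes n_i to probabilities proportional to (n_i / N) ** (1 / T)."""
    total = sum(lengths)
    scaled = [(n / total) ** (1.0 / temperature) for n in lengths]
    norm = sum(scaled)
    return [s / norm for s in scaled]


def choose_datasets(
    lengths: Sequence[int],
    technique: str = "temperature",
    temperature: float = 5,
    probabilities: Optional[Sequence[float]] = None,
    seed: Optional[int] = None,
) -> Iterator[int]:
    """Return an endless iterator of dataset indices (hypothetical helper)."""
    indices = list(range(len(lengths)))
    if technique == "round-robin":
        # Visit datasets in a fixed order: 0, 1, ..., k-1, 0, 1, ...
        return cycle(indices)
    if technique == "temperature":
        weights = temperature_probabilities(lengths, temperature)
    elif technique == "random":
        if probabilities is None or len(probabilities) != len(lengths):
            raise ValueError("'random' needs one probability per dataset")
        weights = list(probabilities)
    else:
        raise ValueError(f"unknown sampling technique: {technique}")

    rng = random.Random(seed)  # stand-in for the seeded NumPy RNG

    def draw() -> Iterator[int]:
        while True:
            # Weighted draw with replacement: dataset i is picked proportionally to weights[i].
            yield rng.choices(indices, weights=weights)[0]

    return draw()


def partition_for_rank(num_samples: int, global_rank: int, world_size: int) -> List[int]:
    """Strided split of a map-style dataset's indices: rank r gets r, r + W, r + 2W, ..."""
    return list(range(num_samples))[global_rank::world_size]
```

With two datasets of 9,000 and 1,000 samples, temperature_probabilities([9000, 1000], 5) gives roughly [0.61, 0.39], compared with [0.9, 0.1] for proportional sampling (T = 1). Higher temperatures therefore push the mixture toward uniform and upsample the smaller dataset.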
- class nemo.collections.common.data.dataset.ConcatMapDataset(*args: Any, **kwargs: Any)
Bases: Dataset
A dataset that accepts multiple datasets as arguments and samples from them according to the specified sampling technique.
- Parameters
datasets (list) – A list of datasets to sample from.
sampling_technique (str) – Sampling technique to choose which dataset to draw a sample from. Defaults to ‘temperature’. Currently supports ‘temperature’, ‘random’ and ‘round-robin’.
sampling_temperature (int) – Temperature value for sampling. Only used when sampling_technique = ‘temperature’. Defaults to 5.
sampling_probabilities (list) – Probability values for sampling. Only used when sampling_technique = ‘random’.
seed – Optional value to seed the numpy RNG.
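Because ConcatMapDataset is map-style, it must support indexing and a fixed length rather than an endless stream. One way to do that, sketched below under stated assumptions, is to precompute a table of (dataset index, sample index) pairs up front. The class SimpleConcatMap is hypothetical, not NeMo's code. The epoch length (the sum of the dataset sizes), the wrap-around when a dataset runs out, the temperature formula (n_i / N)^(1/T), and the use of random.Random instead of the NumPy RNG are all assumptions of this sketch.

```python
import random
from typing import Any, List, Optional, Sequence, Tuple


class SimpleConcatMap:
    """Hypothetical map-style concatenation backed by a precomputed (dataset, sample) index table."""

    def __init__(
        self,
        datasets: Sequence[Sequence[Any]],
        sampling_technique: str = "temperature",
        sampling_temperature: float = 5,
        sampling_probabilities: Optional[Sequence[float]] = None,
        seed: Optional[int] = None,
    ) -> None:
        lengths = [len(d) for d in datasets]
        if not lengths or min(lengths) == 0:
            raise ValueError("expects one or more non-empty datasets")
        self.datasets = datasets
        total = sum(lengths)  # assumption: one epoch = sum of dataset sizes
        rng = random.Random(seed)  # stand-in for the seeded NumPy RNG

        if sampling_technique == "round-robin":
            choices = [i % len(datasets) for i in range(total)]
        else:
            if sampling_technique == "temperature":
                weights = [(n / total) ** (1.0 / sampling_temperature) for n in lengths]
            elif sampling_technique == "random":
                if sampling_probabilities is None or len(sampling_probabilities) != len(datasets):
                    raise ValueError("'random' needs one probability per dataset")
                weights = list(sampling_probabilities)
            else:
                raise ValueError(f"unknown sampling technique: {sampling_technique}")
            # Draw a dataset for every slot; choices() does not require normalized weights.
            choices = rng.choices(range(len(datasets)), weights=weights, k=total)

        # Read each chosen dataset sequentially, wrapping around when it is exhausted.
        cursors = [0] * len(datasets)
        self.index: List[Tuple[int, int]] = []
        for d in choices:
            self.index.append((d, cursors[d] % lengths[d]))
            cursors[d] += 1

    def __len__(self) -> int:
        return len(self.index)

    def __getitem__(self, idx: int) -> Any:
        d, s = self.index[idx]
        return self.datasets[d][s]
```

For example, round-robin over ["a", "b"] and ["w", "x", "y", "z"] yields a, w, b, x, a, y: the shorter dataset wraps around so the table keeps its fixed length of six. Precomputing the table trades a little memory for O(1) random access, which is what a map-style sampler expects.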