core.datasets.utils
Module Contents
Classes
Functions
| compile_helpers | Compile C++ helper functions at runtime. Make sure this is invoked on a single process. |
| normalize | Do non-exponentiated normalization. |
| get_blend_from_list | Get the blended_megatron_dataset_config.BlendedMegatronDatasetConfig blend from the blend list. |
Data
API
- core.datasets.utils.logger
‘getLogger(…)’
- core.datasets.utils.compile_helpers()
Compile C++ helper functions at runtime. Make sure this is invoked on a single process.
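
One way to honor the single-process requirement is to gate the call on the process rank. The sketch below is illustrative only: `call_on_rank_zero` is a hypothetical helper that is not part of this module, and reading the rank from the `RANK` environment variable assumes a launcher such as torchrun has set it.

```python
import os
from typing import Callable, Optional


def call_on_rank_zero(fn: Callable[[], None], rank: Optional[int] = None) -> bool:
    """Run fn only on rank 0 and report whether it ran.

    Hypothetical helper, not part of core.datasets.utils.
    """
    if rank is None:
        # Assumption: the launcher exports RANK; default to 0 for single-process runs.
        rank = int(os.environ.get("RANK", "0"))
    if rank == 0:
        fn()
        return True
    return False
```

In a distributed job, the other ranks would typically wait at a barrier until rank 0 has finished compiling, so that no process tries to load the helpers before they exist.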
- core.datasets.utils.normalize(weights: List[float]) → List[float]
Do non-exponentiated normalization
- Parameters:
weights (List[float]) – The weights
- Returns:
The normalized weights
- Return type:
List[float]
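
As a rough illustration, "non-exponentiated" is read here as dividing each weight by the sum of all weights, as opposed to a softmax-style normalization. The function below, `normalize_sketch`, is a stand-in written for this example under that assumption and is not the library's implementation.

```python
from typing import List


def normalize_sketch(weights: List[float]) -> List[float]:
    """Scale weights so they sum to 1 by dividing each by the total.

    Illustrative stand-in; assumes the weights have a nonzero sum.
    """
    total = sum(weights)
    return [w / total for w in weights]


normalized = normalize_sketch([30.0, 70.0])
print(normalized)
```

The relative proportions of the inputs are preserved, so weights of 30 and 70 keep their 3:7 ratio after normalization.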
- core.datasets.utils.get_blend_from_list(blend: Optional[List[str]]) → Optional[Tuple[List[str], Optional[List[float]]]]
Get the blended_megatron_dataset_config.BlendedMegatronDatasetConfig blend from the blend list
- Parameters:
blend (Optional[List[str]]) – The blend list, which can be either (1) a list of prefixes, e.g. ["path/to/dataset_1_prefix", "path/to/dataset_2_prefix"], or (2) a flattened, zipped list of weights and prefixes, e.g. ["30", "path/to/dataset_1_prefix", "70", "path/to/dataset_2_prefix"].
- Returns:
The blend, consisting of a list of dataset prefixes and optionally a list of dataset weights, e.g. [["path/to/dataset_1_prefix", "path/to/dataset_2_prefix"], [30.0, 70.0]].
- Return type:
Optional[Tuple[List[str], Optional[List[float]]]]
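
The two accepted input shapes can be illustrated with a small sketch. `blend_from_list_sketch` is a hypothetical re-creation based only on the description above, not the library's code. In particular, the rule it uses to detect the zipped form (even length, with every even-indexed item parsing as a float) is an assumption.

```python
from typing import List, Optional, Tuple


def _is_number(text: str) -> bool:
    """Return True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def blend_from_list_sketch(
    blend: Optional[List[str]],
) -> Optional[Tuple[List[str], Optional[List[float]]]]:
    """Split a blend list into (prefixes, weights); weights is None if absent."""
    if blend is None:
        return None
    # Assumed detection rule: zipped form has weights at even indices.
    if len(blend) % 2 == 0 and all(_is_number(item) for item in blend[0::2]):
        weights = [float(item) for item in blend[0::2]]
        prefixes = [item.strip() for item in blend[1::2]]
        return prefixes, weights
    # Otherwise treat every item as a dataset prefix.
    return [item.strip() for item in blend], None


zipped = ["30", "path/to/dataset_1_prefix", "70", "path/to/dataset_2_prefix"]
print(blend_from_list_sketch(zipped))
```

Passing a plain list of prefixes to the same sketch returns those prefixes with `None` in place of the weights.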