nemo_automodel.components.datasets.llm.megatron.megatron_utils


Module Contents

Functions

Name | Description
--- | ---
compile_helper | Compile helper functions at runtime. Make sure this is invoked on a single process.
get_blend_from_list | Get the megatron.core.datasets.blended_megatron_dataset_config.BlendedMegatronDatasetConfig blend from the blend list.

Data

logger

API

nemo_automodel.components.datasets.llm.megatron.megatron_utils.compile_helper()

Compile helper functions at runtime. Make sure this is invoked on a single process.
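In a multi-process job, "invoked on a single process" usually means one rank does the work while the others wait. The sketch below shows that guard pattern. The wrapper name run_on_single_process is hypothetical (not part of nemo_automodel), and it assumes that global rank 0 should compile and that the launcher exports RANK, as torchrun does.

```python
import os
from typing import Callable, Optional


def run_on_single_process(
    fn: Callable[[], None],
    barrier: Optional[Callable[[], None]] = None,
    rank: Optional[int] = None,
) -> bool:
    """Hypothetical helper: call fn on global rank 0 only, then synchronize.

    Returns True on the rank that actually ran fn.
    """
    if rank is None:
        # Assumption: torchrun exports RANK per worker; default to 0 for a plain single-process run.
        rank = int(os.environ.get("RANK", "0"))
    ran = rank == 0
    if ran:
        fn()
    if barrier is not None:
        # Every rank waits here, so no rank uses the helpers before rank 0 has built them.
        barrier()
    return ran
```

In a torch.distributed job this would be called as run_on_single_process(compile_helper, torch.distributed.barrier) before any dataset is built. Putting the barrier outside the rank check matters, because every rank must reach it or the job deadlocks.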

nemo_automodel.components.datasets.llm.megatron.megatron_utils.get_blend_from_list(
blend: typing.Optional[typing.List[str]]
) -> typing.Optional[typing.Tuple[typing.List[str], typing.Optional[typing.List[float]]]]

Get the megatron.core.datasets.blended_megatron_dataset_config.BlendedMegatronDatasetConfig blend from the blend list.

Parameters:

blend
Optional[List[str]]

The blend list, which can be either (1) a list of prefixes, e.g. ["path/to/dataset_1_prefix", "path/to/dataset_2_prefix"], or (2) a flattened, zipped list of weights and prefixes, e.g. ["30", "path/to/dataset_1_prefix", "70", "path/to/dataset_2_prefix"].

Returns: Optional[Tuple[List[str], Optional[List[float]]]]

The blend, consisting of a list of dataset prefixes and, optionally, a list of dataset weights, e.g. (["path/to/dataset_1_prefix", "path/to/dataset_2_prefix"], [30.0, 70.0]).
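The two accepted input forms can be illustrated with a small parser. This is a minimal sketch, not the library's implementation: the name parse_blend is hypothetical, and the disambiguation rule (an odd-length list, or any non-numeric entry in a weight position, means "prefixes only") is an assumption about how the two forms are told apart.

```python
from typing import List, Optional, Tuple


def parse_blend(
    blend: Optional[List[str]],
) -> Optional[Tuple[List[str], Optional[List[float]]]]:
    """Hypothetical sketch: split a blend list into (prefixes, weights)."""
    if blend is None:
        return None
    # Assumption: an odd-length list cannot be (weight, prefix) pairs, so it is prefixes only.
    if len(blend) % 2 == 1:
        return [p.strip() for p in blend], None
    weights: List[float] = []
    prefixes: List[str] = []
    # Even-indexed entries are candidate weights, odd-indexed entries are prefixes.
    for raw_weight, raw_prefix in zip(blend[0::2], blend[1::2]):
        try:
            weights.append(float(raw_weight))
        except ValueError:
            # Assumption: a non-numeric "weight" means the whole list is plain prefixes.
            return [p.strip() for p in blend], None
        prefixes.append(raw_prefix.strip())
    return prefixes, weights


print(parse_blend(["30", "path/to/dataset_1_prefix", "70", "path/to/dataset_2_prefix"]))
print(parse_blend(["path/to/dataset_1_prefix", "path/to/dataset_2_prefix"]))
```

The first call yields the prefixes paired with [30.0, 70.0]. The second yields the same prefixes with None for the weights, which leaves weighting to the consumer of the blend.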

nemo_automodel.components.datasets.llm.megatron.megatron_utils.logger = logging.getLogger(__name__)