core.transformer.moe.upcycling_utils#

Helpers for converting a dense model to a MoE model at runtime.

Module Contents#

Functions#

_get_keys_endswith

Retrieve keys from the model that end with a specified suffix.

_find_submodule

Find a sub-module in the model.

_get_config

Get various parameters from the dense model's state dict and the MoE model.

_convert_to_moe_state_dict

Convert a dense model’s state_dict to a MoE model’s state_dict.

upcycle_state_dict

Convert a dense model’s state_dict to a MoE model’s state_dict.

load_and_upcycle_model

Load a dense model checkpoint and convert it to a MoE model.

Data#

API#

core.transformer.moe.upcycling_utils.ExpertsType#

‘Enum(…)’

core.transformer.moe.upcycling_utils.ActivationFuncName#

‘Enum(…)’

core.transformer.moe.upcycling_utils._get_keys_endswith(model, suffix)#

Retrieve keys from the model that end with a specified suffix.
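A minimal sketch of what a helper with this contract could look like, assuming it operates on the flat state_dict key space (the shipped implementation may differ; the suffix shown is illustrative):

```python
def _get_keys_endswith(model, suffix):
    # Scan the flat state_dict key space and keep every key that ends
    # with the given suffix, e.g. ".mlp.linear_fc1.weight".
    return [key for key in model.state_dict() if key.endswith(suffix)]
```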

core.transformer.moe.upcycling_utils._find_submodule(model, submodule_name)#

Find a sub-module in the model.
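One plausible implementation, assuming the lookup matches on the sub-module's attribute name via nn.Module.named_modules (the real helper may match on a different key, such as the class name):

```python
def _find_submodule(model, submodule_name):
    # Walk the module tree and return the first sub-module whose
    # attribute name matches the requested name, or None if absent.
    for name, module in model.named_modules():
        if name.split(".")[-1] == submodule_name:
            return module
    return None
```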

core.transformer.moe.upcycling_utils._get_config(moe_model, dense_model)#

Get various parameters from the dense model's state dict and the MoE model.
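The docstring does not enumerate the gathered parameters. As an illustration only, a helper like this typically pairs MoE hyper-parameters taken from the MoE model with the key layout of the dense state dict; every attribute path below (config, num_moe_experts, hidden_size) is an assumption:

```python
def _get_config(moe_model, dense_model):
    # Hypothetical sketch of the kind of values the conversion needs.
    dense_state_dict = dense_model.state_dict()
    return {
        "num_experts": moe_model.config.num_moe_experts,  # assumed attribute path
        "hidden_size": moe_model.config.hidden_size,      # assumed attribute path
        "dense_keys": list(dense_state_dict),             # keys to be remapped
    }
```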

core.transformer.moe.upcycling_utils._convert_to_moe_state_dict(moe_model, dense_model)#

Convert a dense model’s state_dict to a MoE model’s state_dict.

This function takes the state dictionary of a dense model and modifies it to fit the structure required by a Mixture of Experts model. It handles the necessary transformations for weights and biases specific to the MoE architecture.

Parameters:
  • moe_model (nn.Module) – The MoE model instance from which to get the submodule and state_dict; must be a model without FP16 and/or DDP wrappers.

  • dense_model (nn.Module) – The dense model instance whose state_dict is converted.

Returns:

The converted MoE model state_dict, ready for use in the MoE architecture.

Return type:

dict
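The core of the conversion is replicating the dense MLP weights into every expert, so the freshly upcycled MoE model starts out as a functional copy of the dense model (see https://arxiv.org/abs/2410.07524). A simplified, hedged sketch of that replication step; the helper name and shapes are illustrative, not the module's actual code:

```python
import torch

def replicate_dense_mlp_into_experts(dense_weight, num_experts):
    # Hypothetical helper: copy one dense MLP weight into every expert slot.
    # Illustrative shapes: [ffn_hidden, hidden] -> [num_experts, ffn_hidden, hidden].
    return dense_weight.unsqueeze(0).expand(num_experts, -1, -1).clone()

# Example: upcycle a dense fc1 weight into 8 experts.
dense_fc1 = torch.randn(4 * 1024, 1024)
expert_fc1 = replicate_dense_mlp_into_experts(dense_fc1, num_experts=8)
assert expert_fc1.shape == (8, 4 * 1024, 1024)
```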

core.transformer.moe.upcycling_utils.upcycle_state_dict(moe_model, dense_model)#

Convert a dense model’s state_dict to a MoE model’s state_dict.

This function converts the state_dict of a dense model into the state_dict of a MoE model, ensuring that the parameters are correctly mapped between the two models.

Parameters:
  • moe_model (nn.Module) – The MoE model instance; must be a model without FP16 and/or DDP wrappers.

  • dense_model (nn.Module) – The dense model instance.

Returns:

A dictionary containing the converted state_dict for the MoE model.

Return type:

dict
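A hedged usage sketch, assuming both models are already built, share the non-MLP architecture, and are not wrapped in FP16 or DDP. Whether the returned dict can be passed straight to load_state_dict depends on the actual return structure:

```python
# dense_model already holds trained weights; moe_model is freshly initialized.
moe_state_dict = upcycle_state_dict(moe_model, dense_model)
moe_model.load_state_dict(moe_state_dict, strict=True)  # assumes a flat state_dict
```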

core.transformer.moe.upcycling_utils.load_and_upcycle_model(
load_dense_ckpt_func,
moe_model,
dense_model,
strict=True,
load_args=(),
load_kwargs={},
)#

Load a dense model checkpoint and convert it to a MoE model.

This function loads a checkpoint for a dense model and converts it to the MoE model format, integrating the dense model's parameters into the MoE architecture. For more details, please refer to https://arxiv.org/abs/2410.07524.

Parameters:
  • load_dense_ckpt_func (callable) – The function to load the dense model checkpoint.

  • moe_model (nn.Module) – The MoE model instance.

  • dense_model (nn.Module) – The dense model instance.

  • strict (bool) – Whether to strictly load the state dictionary (default is True).

  • load_args (tuple) – Positional arguments to pass to the loading function.

  • load_kwargs (dict) – Keyword arguments to pass to the loading function.
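A hedged end-to-end sketch: load_dense_checkpoint and the checkpoint path are hypothetical, and the assumption that load_args/load_kwargs are forwarded verbatim to the loader should be checked against the actual implementation:

```python
import torch

def load_dense_checkpoint(model, path):
    # Hypothetical loader: read a dense checkpoint from disk into `model`.
    model.load_state_dict(torch.load(path, map_location="cpu"))

load_and_upcycle_model(
    load_dense_checkpoint,
    moe_model,
    dense_model,
    strict=True,
    load_args=(dense_model, "/path/to/dense_checkpoint.pt"),  # forwarded to the loader
)
```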