core.transformer.moe.upcycling_utils#
Helpers for converting a dense model to a MoE model at runtime.
Module Contents#
Functions#
| Function | Description |
|---|---|
| `_get_keys_endswith` | Retrieve keys from the model that end with a specified suffix. |
| `_find_submodule` | Find a sub-module in the model. |
| `_get_config` | Get various parameters from the dense model's state_dict and the MoE model. |
| `_convert_to_moe_state_dict` | Convert a dense model's state_dict to a MoE model's state_dict. |
| `upcycle_state_dict` | Convert a dense model's state_dict to a MoE model's state_dict. |
| `load_and_upcycle_model` | Load a dense model checkpoint and convert it to a MoE model. |
Data#
API#
- core.transformer.moe.upcycling_utils.ExpertsType#
‘Enum(…)’
- core.transformer.moe.upcycling_utils.ActivationFuncName#
‘Enum(…)’
- core.transformer.moe.upcycling_utils._get_keys_endswith(model, suffix)#
Retrieve keys from the model that end with a specified suffix.
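The suffix matching can be sketched on a plain dict standing in for a model state_dict (the key names below are illustrative, not the exact Megatron layout):

```python
def get_keys_endswith(state_dict, suffix):
    # Collect every key whose name ends with the given suffix,
    # e.g. the same MLP weight across all transformer layers.
    return [key for key in state_dict if key.endswith(suffix)]

dense_sd = {
    "decoder.layers.0.mlp.linear_fc1.weight": "W0",
    "decoder.layers.1.mlp.linear_fc1.weight": "W1",
    "decoder.layers.0.self_attention.linear_qkv.weight": "Wq",
}
get_keys_endswith(dense_sd, "mlp.linear_fc1.weight")
# → the two layer-wise linear_fc1 keys, in insertion order
```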
- core.transformer.moe.upcycling_utils._find_submodule(model, submodule_name)#
Find a sub-module in the model.
- core.transformer.moe.upcycling_utils._get_config(moe_model, dense_model)#
Get various parameters from the dense model's state_dict and the MoE model.
- core.transformer.moe.upcycling_utils._convert_to_moe_state_dict(moe_model, dense_model)#
Convert a dense model’s state_dict to a MoE model’s state_dict.
This function takes the state dictionary of a dense model and modifies it to fit the structure required by a Mixture of Experts model. It handles the necessary transformations for weights and biases specific to the MoE architecture.
- Parameters:
moe_model (nn.Module) – The MoE model instance from which to get the submodule and state_dict; must be a model without FP16 and/or DDP wrappers.
dense_model (nn.Module) – The dense model whose state_dict is converted.
- Returns:
The converted MoE model state_dict, ready for use in the MoE architecture.
- Return type:
dict
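The core of the transformation can be sketched in plain Python: every dense MLP weight is replicated into each expert slot, so all experts start as identical copies of the dense MLP. The key layout (`mlp.experts.local_experts.{i}`) and the list-valued state_dict below are illustrative stand-ins, not the exact Megatron structures:

```python
import copy

def convert_to_moe_state_dict(dense_sd, num_experts):
    # Replicate each dense MLP weight into every expert slot;
    # all other entries are carried over unchanged.
    moe_sd = {}
    for key, value in dense_sd.items():
        if ".mlp.linear_fc" in key:
            prefix, _, rest = key.partition(".mlp.")
            for e in range(num_experts):
                # Hypothetical MoE key layout, for illustration only.
                moe_sd[f"{prefix}.mlp.experts.local_experts.{e}.{rest}"] = copy.deepcopy(value)
        else:
            moe_sd[key] = value
    return moe_sd

dense_sd = {"decoder.layers.0.mlp.linear_fc1.weight": [1.0, 2.0]}
moe_sd = convert_to_moe_state_dict(dense_sd, num_experts=2)
# moe_sd now holds one copy of the dense weight per expert
```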
- core.transformer.moe.upcycling_utils.upcycle_state_dict(moe_model, dense_model)#
Convert a dense model’s state_dict to a MoE model’s state_dict.
This function facilitates the conversion of the state_dict from a dense model to a MoE model, ensuring that the parameters are correctly mapped for each model.
- Parameters:
moe_model (nn.Module) – The MoE model; must be a model without FP16 and/or DDP wrappers.
dense_model (nn.Module) – The dense model instance.
- Returns:
A dictionary containing the converted state_dict for the MoE model.
- Return type:
dict
- core.transformer.moe.upcycling_utils.load_and_upcycle_model(load_dense_ckpt_func, moe_model, dense_model, strict=True, load_args=(), load_kwargs={})#
Load a dense model checkpoint and convert it to a MoE model.
This function loads a checkpoint for a dense model and converts it to the MoE model format, allowing the dense model’s parameters to be integrated into the MoE architecture. For more details, please refer to https://arxiv.org/abs/2410.07524.
- Parameters:
load_dense_ckpt_func (callable) – The function to load the dense model checkpoint.
moe_model (nn.Module) – The MoE model instance.
dense_model (nn.Module) – The dense model instance.
strict (bool) – Whether to strictly load the state dictionary (default is True).
load_args (tuple) – Positional arguments to pass to the loading function.
load_kwargs (dict) – Keyword arguments to pass to the loading function.
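The load-then-convert flow can be sketched as follows. `fake_loader`, the `local_experts` key layout, and the plain-dict state_dicts are hypothetical stand-ins for the real checkpoint-loading function and Megatron modules:

```python
def load_and_upcycle(load_dense_ckpt_func, num_experts, load_args=(), load_kwargs=None):
    # Mirror of the documented flow: load the dense checkpoint via the
    # supplied callable, then expand its MLP weights into expert copies.
    dense_sd = load_dense_ckpt_func(*load_args, **(load_kwargs or {}))
    moe_sd = {}
    for key, value in dense_sd.items():
        if ".mlp." in key:
            head, _, tail = key.partition(".mlp.")
            for e in range(num_experts):
                moe_sd[f"{head}.mlp.experts.local_experts.{e}.{tail}"] = value
        else:
            moe_sd[key] = value
    return moe_sd

def fake_loader(path):
    # Stand-in for a real dense-checkpoint loading function.
    return {
        "decoder.layers.0.mlp.linear_fc1.weight": [1.0, 2.0],
        "decoder.final_layernorm.weight": [1.0],
    }

moe_sd = load_and_upcycle(fake_loader, num_experts=4, load_args=("ckpt/dense",))
# MLP weight expanded into 4 expert copies; layernorm carried over unchanged
```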