core.transformer.moe.upcycling_utils#

Helpers for converting a dense model to a MoE model at runtime.

Module Contents#

Functions#

_get_keys_endswith

Retrieve keys from the model that end with a specified suffix.

_find_submodule

Find a sub-module in the model.

_get_config

Get various parameters from the dense model's state dict and the MoE model.

_convert_to_moe_state_dict

Convert a dense model’s state_dict to a MoE model’s state_dict.

upcycle_state_dict

Convert a dense model’s state_dict to a MoE model’s state_dict.

load_and_upcycle_model

Load a dense model checkpoint and convert it to a MoE model.

Data#

API#

core.transformer.moe.upcycling_utils.ExpertsType#

‘Enum(…)’

core.transformer.moe.upcycling_utils.ActivationFuncName#

‘Enum(…)’

core.transformer.moe.upcycling_utils._get_keys_endswith(model, suffix)#

Retrieve keys from the model that end with a specified suffix.
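A minimal sketch of what a helper with this contract could look like, assuming it operates on the flat state_dict key space (the shipped implementation may differ; the suffix shown is illustrative):

```python
def _get_keys_endswith(model, suffix):
    # Scan the flat state_dict key space and keep every key that ends
    # with the given suffix, e.g. ".mlp.linear_fc1.weight".
    return [key for key in model.state_dict() if key.endswith(suffix)]
```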

core.transformer.moe.upcycling_utils._find_submodule(model, submodule_name)#

Find a sub-module in the model.
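One plausible implementation, assuming the lookup matches on the sub-module's attribute name via nn.Module.named_modules (the real helper may match on a different key, such as the class name):

```python
def _find_submodule(model, submodule_name):
    # Walk the module tree and return the first sub-module whose
    # attribute name matches the requested name, or None if absent.
    for name, module in model.named_modules():
        if name.split(".")[-1] == submodule_name:
            return module
    return None
```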

core.transformer.moe.upcycling_utils._get_config(moe_model, dense_model)#

Get various parameters from the dense model's state dict and the MoE model.
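The docstring does not enumerate the gathered parameters. As an illustration only, a helper like this typically pairs MoE hyper-parameters taken from the MoE model with the key layout of the dense state dict; every attribute path below (config, num_moe_experts, hidden_size) is an assumption:

```python
def _get_config(moe_model, dense_model):
    # Hypothetical sketch of the kind of values the conversion needs.
    dense_state_dict = dense_model.state_dict()
    return {
        "num_experts": moe_model.config.num_moe_experts,  # assumed attribute path
        "hidden_size": moe_model.config.hidden_size,      # assumed attribute path
        "dense_keys": list(dense_state_dict),             # keys to be remapped
    }
```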

core.transformer.moe.upcycling_utils._convert_to_moe_state_dict(moe_model, dense_model)#

Convert a dense model’s state_dict to a MoE model’s state_dict.

This function takes the state dictionary of a dense model and modifies it to fit the structure required by a Mixture of Experts model. It handles the necessary transformations for weights and biases specific to the MoE architecture.

Parameters:
  • moe_model (nn.Module) – The MoE model instance from which to get the submodule and state_dict; must be a model without FP16 and/or DDP wrappers.

  • dense_model (nn.Module) – The dense model instance whose state_dict is converted.

Returns:

The converted MoE model state_dict, ready for use in the MoE architecture.

Return type:

dict
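The core of the conversion is replicating the dense MLP weights into every expert, so the freshly upcycled MoE model starts out as a functional copy of the dense model (see https://arxiv.org/abs/2410.07524). A simplified, hedged sketch of that replication step; the helper name and shapes are illustrative, not the module's actual code:

```python
import torch

def replicate_dense_mlp_into_experts(dense_weight, num_experts):
    # Hypothetical helper: copy one dense MLP weight into every expert slot.
    # Illustrative shapes: [ffn_hidden, hidden] -> [num_experts, ffn_hidden, hidden].
    return dense_weight.unsqueeze(0).expand(num_experts, -1, -1).clone()

# Example: upcycle a dense fc1 weight into 8 experts.
dense_fc1 = torch.randn(4 * 1024, 1024)
expert_fc1 = replicate_dense_mlp_into_experts(dense_fc1, num_experts=8)
assert expert_fc1.shape == (8, 4 * 1024, 1024)
```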

core.transformer.moe.upcycling_utils.upcycle_state_dict(moe_model, dense_model)#

Convert a dense model’s state_dict to a MoE model’s state_dict.

This function converts the state_dict of a dense model into the state_dict of a MoE model, ensuring that the parameters are correctly mapped between the two models.

Parameters:
  • moe_model (nn.Module) – The MoE model instance; must be a model without FP16 and/or DDP wrappers.

  • dense_model (nn.Module) – The dense model instance.

Returns:

A dictionary containing the converted state_dict for the MoE model.

Return type:

dict
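A hedged usage sketch, assuming both models are already built, share the non-MLP architecture, and are not wrapped in FP16 or DDP. Whether the returned dict can be passed straight to load_state_dict depends on the actual return structure:

```python
# dense_model already holds trained weights; moe_model is freshly initialized.
moe_state_dict = upcycle_state_dict(moe_model, dense_model)
moe_model.load_state_dict(moe_state_dict, strict=True)  # assumes a flat state_dict
```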

core.transformer.moe.upcycling_utils.load_and_upcycle_model(
load_dense_ckpt_func,
moe_model,
dense_model,
strict=True,
load_args=(),
load_kwargs={},
)#

Load a dense model checkpoint and convert it to a MoE model.

This function loads a checkpoint for a dense model and converts it to the MoE model format, integrating the dense model's parameters into the MoE architecture. For more details, please refer to https://arxiv.org/abs/2410.07524.

Parameters:
  • load_dense_ckpt_func (callable) – The function to load the dense model checkpoint.

  • moe_model (nn.Module) – The MoE model instance.

  • dense_model (nn.Module) – The dense model instance.

  • strict (bool) – Whether to strictly load the state dictionary (default is True).

  • load_args (tuple) – Positional arguments to pass to the loading function.

  • load_kwargs (dict) – Keyword arguments to pass to the loading function.
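A hedged end-to-end sketch: load_dense_checkpoint and the checkpoint path are hypothetical, and the assumption that load_args/load_kwargs are forwarded verbatim to the loader should be checked against the actual implementation:

```python
import torch

def load_dense_checkpoint(model, path):
    # Hypothetical loader: read a dense checkpoint from disk into `model`.
    model.load_state_dict(torch.load(path, map_location="cpu"))

load_and_upcycle_model(
    load_dense_checkpoint,
    moe_model,
    dense_model,
    strict=True,
    load_args=(dense_model, "/path/to/dense_checkpoint.pt"),  # forwarded to the loader
)
```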