bridge.models.glm.glm_moe_mappings#

GLM MoE mapping helpers for fused expert weights in transformers 5.0+.

Module Contents#

Classes#

_LooseGatedMLPMapping

GLMExpertGateUpProjMapping

Mapping for fused expert gate+up projection weights.

GLMExpertDownProjMapping

Mapping for fused expert down projection weights.

Functions#

_select_expert_weight

_align_weight_to_shape

API#

bridge.models.glm.glm_moe_mappings._select_expert_weight(
hf_weights: torch.Tensor,
expert_idx: int,
) → torch.Tensor#
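The reference gives only the signature. A minimal sketch of what per-expert selection might look like, assuming the fused HF tensor stacks experts along dim 0 (that stacking layout is an assumption, not stated by this page):

```python
import torch


def select_expert_weight(hf_weights: torch.Tensor, expert_idx: int) -> torch.Tensor:
    """Hypothetical sketch: pull one expert's slice out of a fused weight.

    Assumes a fused tensor of shape [num_experts, ...]; a 2-D tensor is
    treated as already being a single expert's weight.
    """
    if hf_weights.dim() == 3:
        return hf_weights[expert_idx]
    return hf_weights  # already per-expert


# toy fused tensor: 4 experts, each [8, 16]
fused = torch.arange(4 * 8 * 16, dtype=torch.float32).reshape(4, 8, 16)
w2 = select_expert_weight(fused, 2)
```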
bridge.models.glm.glm_moe_mappings._align_weight_to_shape(
weight: torch.Tensor,
target_shape: torch.Size,
name: str,
) → torch.Tensor#
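Again only the signature is documented. A plausible sketch, assuming the helper's job is to reconcile transposed 2-D layouts between HF and Megatron checkpoints (the transpose-only behavior is an assumption; the `name` argument is presumably just for error messages):

```python
import torch


def align_weight_to_shape(
    weight: torch.Tensor, target_shape: torch.Size, name: str
) -> torch.Tensor:
    """Hypothetical sketch: return `weight` matching `target_shape`,
    transposing a 2-D tensor whose dims are merely swapped."""
    if weight.shape == target_shape:
        return weight
    if weight.dim() == 2 and tuple(weight.shape)[::-1] == tuple(target_shape):
        return weight.t().contiguous()
    raise ValueError(
        f"{name}: cannot align shape {tuple(weight.shape)} to {tuple(target_shape)}"
    )


# a [ffn, hidden] weight aligned to a [hidden, ffn] target
w = torch.randn(16, 8)
aligned = align_weight_to_shape(w, torch.Size([8, 16]), "linear_fc2")
```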
class bridge.models.glm.glm_moe_mappings._LooseGatedMLPMapping#

Bases: megatron.bridge.models.conversion.param_mapping.GatedMLPMapping

_validate_patterns(*args, **kwargs)#
class bridge.models.glm.glm_moe_mappings.GLMExpertGateUpProjMapping(
megatron_param: str,
hf_param: str,
permute_dims=None,
)#

Bases: megatron.bridge.models.conversion.param_mapping.AutoMapping

Mapping for fused expert gate+up projection weights.

Initialization

hf_to_megatron(
hf_weights: torch.Tensor,
megatron_module: torch.nn.Module,
) → torch.Tensor#
megatron_to_hf(
megatron_weights: torch.Tensor,
megatron_module: torch.nn.Module,
) → Dict[str, torch.Tensor]#
_validate_patterns(*args, **kwargs)#
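The two conversion methods above are documented without bodies. A hedged sketch of the gate+up round trip, assuming HF stores the fused per-expert weight as [hidden, 2*ffn] with the gate and up halves concatenated along the last dim, while Megatron's linear_fc1 expects [2*ffn, hidden] with gate stacked above up. Both layout choices are assumptions for illustration; note also that the real `megatron_to_hf` returns a `Dict[str, torch.Tensor]` keyed by HF parameter name, whereas this sketch returns the bare tensor:

```python
import torch


def gate_up_hf_to_megatron(hf_weights: torch.Tensor) -> torch.Tensor:
    """Hypothetical HF -> Megatron direction under the assumed layouts."""
    gate, up = hf_weights.chunk(2, dim=-1)          # each [hidden, ffn]
    return torch.cat([gate.t(), up.t()], dim=0).contiguous()  # [2*ffn, hidden]


def gate_up_megatron_to_hf(megatron_weights: torch.Tensor) -> torch.Tensor:
    """Inverse direction under the same layout assumptions."""
    gate, up = megatron_weights.chunk(2, dim=0)     # each [ffn, hidden]
    return torch.cat([gate.t(), up.t()], dim=-1).contiguous()  # [hidden, 2*ffn]


# toy round trip: hidden=16, ffn=4 (so the fused dim is 8)
hf_w = torch.randn(16, 8)
meg_w = gate_up_hf_to_megatron(hf_w)
```

The round trip must be lossless, which is a useful invariant to assert in any real mapping test.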
class bridge.models.glm.glm_moe_mappings.GLMExpertDownProjMapping(
megatron_param: str,
hf_param: str,
permute_dims=None,
)#

Bases: megatron.bridge.models.conversion.param_mapping.AutoMapping

Mapping for fused expert down projection weights.

Initialization

hf_to_megatron(
hf_weights: torch.Tensor,
megatron_module: torch.nn.Module,
) → torch.Tensor#
_validate_patterns(*args, **kwargs)#
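For the down projection, only `hf_to_megatron` is documented. A hedged sketch, assuming the fused HF tensor is [num_experts, ffn, hidden] and Megatron's linear_fc2 expects the transposed [hidden, ffn] per expert (the expert index would in practice be derived from the parameter name; both the layout and the `expert_idx` argument here are assumptions for illustration):

```python
import torch


def down_proj_hf_to_megatron(hf_weights: torch.Tensor, expert_idx: int) -> torch.Tensor:
    """Hypothetical sketch: slice one expert's down-projection out of the
    fused [num_experts, ffn, hidden] tensor and transpose it to the assumed
    Megatron [hidden, ffn] layout."""
    w = hf_weights[expert_idx]      # [ffn, hidden]
    return w.t().contiguous()       # [hidden, ffn]


# toy fused tensor: 4 experts, ffn=8, hidden=16
fused = torch.randn(4, 8, 16)
out = down_proj_hf_to_megatron(fused, 1)
```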