`nemo_automodel.components.models.glm4_moe.state_dict_adapter`#

Module Contents#

Classes#

Glm4MoeStateDictAdapter

Converts between HF GLM4-MoE checkpoints and our grouped-experts native format.

Data#

logger

API#

nemo_automodel.components.models.glm4_moe.state_dict_adapter.logger#: ‘getLogger(…)’

class nemo_automodel.components.models.glm4_moe.state_dict_adapter.Glm4MoeStateDictAdapter( config: Any, moe_config: nemo_automodel.components.moe.config.MoEConfig, backend: nemo_automodel.components.models.common.BackendConfig, dtype: torch.dtype = torch.float32, )#

Bases: nemo_automodel.components.moe.state_dict_mixin.MoESplitExpertsStateDictMixin, nemo_automodel.components.checkpoint.state_dict_adapter.StateDictAdapter

Converts between HF GLM4-MoE checkpoints and our grouped-experts native format.

GLM4-MoE HF experts use keys:

    model.layers.{L}.mlp.experts.{E}.gate_proj.weight
    model.layers.{L}.mlp.experts.{E}.up_proj.weight
    model.layers.{L}.mlp.experts.{E}.down_proj.weight
    model.layers.{L}.mlp.shared_experts.gate_proj.weight
    model.layers.{L}.mlp.shared_experts.up_proj.weight
    model.layers.{L}.mlp.shared_experts.down_proj.weight

Our native format groups them into:

    model.layers.{L}.mlp.experts.gate_and_up_projs  # [n_experts, dim, 2*moe_inter_dim]
    model.layers.{L}.mlp.experts.down_projs  # [n_experts, moe_inter_dim, dim]
    model.layers.{L}.mlp.shared_expert.gate_proj.weight
    model.layers.{L}.mlp.shared_expert.up_proj.weight
    model.layers.{L}.mlp.shared_expert.down_proj.weight
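The naming relationship between the two layouts can be sketched in a few lines of plain Python. This is only an illustration of the key convention listed above, not the adapter's implementation: `native_key_for` and `_EXPERT_KEY` are hypothetical names introduced here, and the sketch handles names only, not tensor data.

```python
import re

# Hypothetical helper for illustration; not part of nemo_automodel.
_EXPERT_KEY = re.compile(
    r"^(model\.layers\.\d+\.mlp)\.experts\.(\d+)\.(gate_proj|up_proj|down_proj)\.weight$"
)


def native_key_for(hf_key: str) -> str:
    """Return the native key that a given HF key contributes to."""
    m = _EXPERT_KEY.match(hf_key)
    if m:
        prefix, _expert_idx, proj = m.groups()
        # gate_proj and up_proj of every expert feed one grouped tensor;
        # down_proj of every expert feeds another.
        grouped = "down_projs" if proj == "down_proj" else "gate_and_up_projs"
        return f"{prefix}.experts.{grouped}"
    # Shared-expert weights stay per-projection; only the module name changes.
    return hf_key.replace(".mlp.shared_experts.", ".mlp.shared_expert.")


hf_keys = [
    "model.layers.3.mlp.experts.0.gate_proj.weight",
    "model.layers.3.mlp.experts.0.up_proj.weight",
    "model.layers.3.mlp.experts.1.down_proj.weight",
    "model.layers.3.mlp.shared_experts.up_proj.weight",
]
native_keys = [native_key_for(k) for k in hf_keys]
```

Several HF keys map to the same native key, which is why loading from HF has to collect all experts of a layer before the grouped tensor can be built.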

to_hf(state_dict: dict[str, Any], exclude_key_regex: Optional[str] = None, quantization: bool = False, **kwargs) → dict[str, Any]#
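The page does not document what `exclude_key_regex` does. One plausible reading of the name, shown here purely as an assumption, is that converted keys matching the pattern are left out of the returned dictionary. The sketch below uses a hypothetical stand-in function, `filter_keys`, with made-up example keys. Whether the real adapter matches from the start of the key (`re.match`) or anywhere in it (`re.search`) is not stated in the source.

```python
import re
from typing import Any, Optional


def filter_keys(
    state_dict: dict[str, Any], exclude_key_regex: Optional[str] = None
) -> dict[str, Any]:
    """Hypothetical stand-in: drop entries whose key matches the regex."""
    if exclude_key_regex is None:
        return dict(state_dict)
    pattern = re.compile(exclude_key_regex)
    # Assumption: matching is anchored at the start of the key (re.match semantics).
    return {k: v for k, v in state_dict.items() if not pattern.match(k)}


# Example keys and placeholder values, chosen only for illustration.
converted = {
    "model.embed_tokens.weight": 0,
    "model.layers.0.mlp.gate.weight": 1,
    "lm_head.weight": 2,
}
kept = filter_keys(converted, r"lm_head\.")
```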

from_hf(hf_state_dict: dict[str, Any], device_mesh: Optional[torch.distributed.device_mesh.DeviceMesh] = None, **kwargs) → dict[str, Any]#

convert_single_tensor_to_hf(fqn: str, tensor: Any, **kwargs) → list[tuple[str, Any]]#

Convert a single tensor from native format to HuggingFace format.

Parameters:

fqn – Fully qualified name of the tensor in native format
tensor – The tensor to convert
**kwargs – Additional arguments for conversion

Returns:

List of (fqn, tensor) tuples in HuggingFace format
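The return type is a list because one grouped native tensor can expand into many HF entries. The sketch below shows that expansion for `gate_and_up_projs`, using nested Python lists in place of tensors so it runs without PyTorch. It rests on two assumptions that the source does not confirm: that the gate half occupies the first `moe_inter_dim` columns of the last axis and the up half the rest, and that each HF weight uses the `[out_features, in_features]` layout of `torch.nn.Linear`. `split_gate_and_up` is a hypothetical name, not a library function.

```python
from typing import Any


def split_gate_and_up(fqn: str, grouped: list) -> list[tuple[str, Any]]:
    """Expand a grouped [n_experts, dim, 2*inter] array into per-expert HF entries."""
    prefix = fqn.removesuffix(".gate_and_up_projs")  # e.g. model.layers.0.mlp.experts
    out: list[tuple[str, Any]] = []
    for e, mat in enumerate(grouped):  # mat has shape [dim, 2*inter]
        inter = len(mat[0]) // 2
        # Transpose to [2*inter, dim] so each row is one output feature.
        rows = [list(col) for col in zip(*mat)]
        # Assumption: gate comes first, up second, along the original last axis.
        out.append((f"{prefix}.{e}.gate_proj.weight", rows[:inter]))
        out.append((f"{prefix}.{e}.up_proj.weight", rows[inter:]))
    return out


# Toy input: n_experts=2, dim=2, moe_inter_dim=1.
grouped = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
]
pairs = split_gate_and_up("model.layers.0.mlp.experts.gate_and_up_projs", grouped)
```

With this toy input, `pairs` holds four entries, a `gate_proj.weight` and an `up_proj.weight` for each of the two experts, each of shape `[1, 2]`.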
