nemo_automodel.components.models.glm4_moe.state_dict_adapter#
Module Contents#
Classes#
Glm4MoeStateDictAdapter: Converts between HF GLM4-MoE checkpoints and our grouped-experts native format.
Data#
API#
- nemo_automodel.components.models.glm4_moe.state_dict_adapter.logger#
'getLogger(…)'
- class nemo_automodel.components.models.glm4_moe.state_dict_adapter.Glm4MoeStateDictAdapter(
- config: Any,
- moe_config: nemo_automodel.components.moe.layers.MoEConfig,
- backend: nemo_automodel.components.moe.utils.BackendConfig,
- dtype: torch.dtype = torch.float32,
- )
Bases:
nemo_automodel.components.moe.state_dict_mixin.MoESplitExpertsStateDictMixin, nemo_automodel.components.checkpoint.state_dict_adapter.StateDictAdapter

Converts between HF GLM4-MoE checkpoints and our grouped-experts native format.
GLM4-MoE HF experts use keys:

model.layers.{L}.mlp.experts.{E}.gate_proj.weight
model.layers.{L}.mlp.experts.{E}.up_proj.weight
model.layers.{L}.mlp.experts.{E}.down_proj.weight
model.layers.{L}.mlp.shared_experts.gate_proj.weight
model.layers.{L}.mlp.shared_experts.up_proj.weight
model.layers.{L}.mlp.shared_experts.down_proj.weight
Our native format groups them into:

model.layers.{L}.mlp.experts.gate_and_up_projs  # [n_experts, dim, 2*moe_inter_dim]
model.layers.{L}.mlp.experts.down_projs         # [n_experts, moe_inter_dim, dim]
model.layers.{L}.mlp.shared_expert.gate_proj.weight
model.layers.{L}.mlp.shared_expert.up_proj.weight
model.layers.{L}.mlp.shared_expert.down_proj.weight
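The grouping above can be sketched as follows. This is a hedged illustration, not the adapter's actual implementation: the helper name `group_experts` and the handling of `nn.Linear` weight layouts ([out_features, in_features] in HF, transposed into the grouped layout) are assumptions for the example.

```python
import re
import torch

def group_experts(hf_state_dict: dict, n_experts: int) -> dict:
    """Illustrative sketch: stack per-expert HF weights into grouped tensors."""
    native = {}
    # Find layer indices that carry per-expert keys.
    layers = {m.group(1) for k in hf_state_dict
              if (m := re.match(r"model\.layers\.(\d+)\.mlp\.experts\.\d+\.", k))}
    for layer in sorted(layers, key=int):
        prefix = f"model.layers.{layer}.mlp.experts"
        gate_up, down = [], []
        for e in range(n_experts):
            gate = hf_state_dict[f"{prefix}.{e}.gate_proj.weight"]  # [moe_inter_dim, dim]
            up = hf_state_dict[f"{prefix}.{e}.up_proj.weight"]      # [moe_inter_dim, dim]
            # Transpose so each expert slice is [dim, 2*moe_inter_dim].
            gate_up.append(torch.cat([gate.t(), up.t()], dim=1))
            # Transpose down_proj [dim, moe_inter_dim] -> [moe_inter_dim, dim].
            down.append(hf_state_dict[f"{prefix}.{e}.down_proj.weight"].t())
        native[f"{prefix}.gate_and_up_projs"] = torch.stack(gate_up)
        native[f"{prefix}.down_projs"] = torch.stack(down)
    return native
```

Stacking along a new leading expert dimension yields the documented shapes: `gate_and_up_projs` is [n_experts, dim, 2*moe_inter_dim] and `down_projs` is [n_experts, moe_inter_dim, dim].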
Initialization
- to_hf(
- state_dict: dict[str, Any],
- exclude_key_regex: Optional[str] = None,
- quantization: bool = False,
- **kwargs,
- )
- from_hf(
- hf_state_dict: dict[str, Any],
- device_mesh: Optional[torch.distributed.device_mesh.DeviceMesh] = None,
- **kwargs,
- )
- convert_single_tensor_to_hf(
- fqn: str,
- tensor: Any,
- **kwargs,
- )
Convert a single tensor from native format to HuggingFace format.
- Parameters:
fqn – Fully qualified name of the tensor in native format
tensor – The tensor to convert
**kwargs – Additional arguments for conversion
- Returns:
List of (fqn, tensor) tuples in HuggingFace format
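A single-tensor native-to-HF conversion of this kind can be sketched as below. This is a hypothetical illustration of the return shape, not the adapter's code: the function name `split_gate_and_up` and the assumption that the two halves of the last dimension hold gate and up projections are made up for the example.

```python
import torch

def split_gate_and_up(fqn: str, tensor: torch.Tensor) -> list[tuple[str, torch.Tensor]]:
    """Split a grouped [n_experts, dim, 2*moe_inter_dim] tensor into per-expert HF keys."""
    prefix = fqn.rsplit(".gate_and_up_projs", 1)[0]
    half = tensor.shape[-1] // 2  # first half: gate_proj, second half: up_proj (assumed)
    out = []
    for e, w in enumerate(tensor):  # w: [dim, 2*moe_inter_dim]
        gate, up = w[:, :half], w[:, half:]
        # Transpose back to HF nn.Linear layout [moe_inter_dim, dim].
        out.append((f"{prefix}.{e}.gate_proj.weight", gate.t().contiguous()))
        out.append((f"{prefix}.{e}.up_proj.weight", up.t().contiguous()))
    return out
```

The list-of-tuples return mirrors the documented contract: one native FQN can fan out into many HF keys, one pair per expert and per projection.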