nemo_automodel.components.models.qwen3_moe.state_dict_adapter
Module Contents
Classes
Data
API
Bases: MoESplitExpertsStateDictMixin, StateDictAdapter
Converts between HF Qwen3-MoE checkpoints and our grouped-experts native format.
Convert a single grouped MoE LoRA tensor to PEFT ParamWrapper format.
ParamWrapper format stores fused 3-D expert LoRA parameters as 2-D tensors with the expert dimension folded into the rank dimension.
Shape mapping (automodel native -> ParamWrapper):
down_proj (outer wrapper, NO base_layer prefix — processed first alphabetically):
lora_down_B (E, r, H) -> lora_A.weight (r*E, H) reshape
lora_down_A (E, I, r) -> lora_B.weight (I, r*E) permute+reshape
gate_up_proj (inner wrapper, HAS base_layer. prefix):
lora_gate_and_up_B (E, r, 2*I) -> base_layer.lora_A.weight (r*E, 2*I) reshape
lora_gate_and_up_A (E, H, r) -> base_layer.lora_B.weight (H, r*E) permute+reshape
Returns: list[tuple[str, torch.Tensor]]
List containing one (fqn, tensor) tuple in ParamWrapper format.
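The down_proj shape mapping above can be sketched with plain tensor operations. This is a minimal illustration, not the adapter's actual code: the sizes are arbitrary, the tensors are random stand-ins, and it assumes the folded rank dimension is ordered expert-major (all r rows of expert 0, then expert 1, and so on), which is what a direct reshape of an (E, r, ...) tensor produces.

```python
import torch

# Illustrative sizes (assumptions): experts, LoRA rank, hidden, intermediate.
E, r, H, I = 4, 2, 8, 16

# Random stand-ins for the native grouped down_proj LoRA tensors.
lora_down_B = torch.randn(E, r, H)
lora_down_A = torch.randn(E, I, r)

# (E, r, H) -> (r*E, H): a plain reshape stacks each expert's r rows.
lora_A_weight = lora_down_B.reshape(E * r, H)

# (E, I, r) -> (I, E, r) -> (I, r*E): move I to the front, then merge (E, r).
lora_B_weight = lora_down_A.permute(1, 0, 2).reshape(I, E * r)

print(lora_A_weight.shape)  # torch.Size([8, 8])
print(lora_B_weight.shape)  # torch.Size([16, 8])

# Under the expert-major assumption, expert e occupies rows/columns
# [e*r, (e+1)*r) of the folded rank dimension.
e = 1
assert torch.equal(lora_A_weight[e * r:(e + 1) * r], lora_down_B[e])
assert torch.equal(lora_B_weight[:, e * r:(e + 1) * r], lora_down_A[e])
```

The gate_up_proj pair follows the same pattern, with 2*I in place of H for the reshape and H in place of I for the permute+reshape.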
Convert PEFT ParamWrapper LoRA keys to native grouped MoE LoRA format.
This is the reverse of _convert_lora_to_paramwrapper. It detects
ParamWrapper-format keys and converts them back to the 3-D grouped
tensors expected by GroupedExpertsLoRA.
Reverse transforms (down_proj is outer, gate_up_proj is inner):
experts.lora_A.weight (r*E, H) -> (E, r, H) = lora_down_B
experts.lora_B.weight (I, r*E) -> (E, I, r) = lora_down_A
experts.base_layer.lora_A.weight (r*E, 2*I) -> (E, r, 2*I) = lora_gate_and_up_B
experts.base_layer.lora_B.weight (H, r*E) -> (E, H, r) = lora_gate_and_up_A
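As a rough sketch of the reverse transforms for the gate_up_proj pair, the snippet below unfolds 2-D tensors back to 3-D and checks that folding them again reproduces the inputs. The sizes and tensors are made up for illustration, and the expert-major ordering of the folded rank dimension is an assumption rather than something the docstring states.

```python
import torch

# Illustrative sizes (assumptions): experts, LoRA rank, hidden, intermediate.
E, r, H, I = 4, 2, 8, 16

# Random stand-ins for the ParamWrapper-format 2-D tensors.
base_lora_A = torch.randn(r * E, 2 * I)  # experts.base_layer.lora_A.weight
base_lora_B = torch.randn(H, r * E)      # experts.base_layer.lora_B.weight

# (r*E, 2*I) -> (E, r, 2*I): split the folded rank dimension.
lora_gate_and_up_B = base_lora_A.reshape(E, r, 2 * I)

# (H, r*E) -> (H, E, r) -> (E, H, r): split, then move experts to the front.
lora_gate_and_up_A = base_lora_B.reshape(H, E, r).permute(1, 0, 2).contiguous()

# Round trip: folding the 3-D tensors again gives back the 2-D inputs.
refolded_A = lora_gate_and_up_B.reshape(E * r, 2 * I)
refolded_B = lora_gate_and_up_A.permute(1, 0, 2).reshape(H, E * r)
assert torch.equal(refolded_A, base_lora_A)
assert torch.equal(refolded_B, base_lora_B)
```

The down_proj pair unfolds the same way, using H and I in place of 2*I and H respectively.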
Convert a single tensor from native format to HuggingFace format.
When v4_compatible=False (the default), LoRA expert tensors are
emitted in PEFT v0.18+ ParamWrapper format so that
PeftModel.from_pretrained() can load them directly. When
v4_compatible=True, the legacy per-expert split is used instead
(via the parent mixin).
Parameters:
Fully qualified name of the tensor in native format
The tensor to convert
Additional arguments for conversion
Returns: list[tuple[str, Any]]
List of (fqn, tensor) tuples in HuggingFace format
Convert HF checkpoint to native format, handling ParamWrapper LoRA keys.
Before delegating to the parent _from_hf_w_merged_experts (which
handles legacy per-expert LoRA format), this method scans for
ParamWrapper-format LoRA keys and converts them back to the native
grouped format expected by GroupedExpertsLoRA.
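One way to picture the key scan is a suffix lookup over the four ParamWrapper key patterns listed earlier. The helper below is hypothetical (`native_lora_name` and the suffix table are not part of the module), and the example key prefixes are invented for illustration; the real adapter may detect keys differently.

```python
from typing import Optional

# Hypothetical lookup from ParamWrapper key suffix to native tensor name,
# taken from the reverse-transform list in the docstring.
PARAMWRAPPER_SUFFIXES = {
    "experts.base_layer.lora_A.weight": "lora_gate_and_up_B",
    "experts.base_layer.lora_B.weight": "lora_gate_and_up_A",
    "experts.lora_A.weight": "lora_down_B",
    "experts.lora_B.weight": "lora_down_A",
}


def native_lora_name(hf_key: str) -> Optional[str]:
    """Return the native tensor name for a ParamWrapper-style key, else None."""
    for suffix, native in PARAMWRAPPER_SUFFIXES.items():
        if hf_key.endswith(suffix):
            return native
    return None  # not ParamWrapper format; left for the legacy per-expert path


# Example keys with made-up prefixes.
print(native_lora_name("model.layers.0.mlp.experts.lora_A.weight"))
# lora_down_B
print(native_lora_name("model.layers.0.mlp.experts.base_layer.lora_B.weight"))
# lora_gate_and_up_A
print(native_lora_name("model.layers.0.mlp.experts.3.down_proj.lora_A.weight"))
# None
```

Keys that return None here would fall through to the parent's per-expert handling in this sketch.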