# nemo_automodel.components.models.qwen3_moe.state_dict_adapter

## Module Contents

### Classes

| Name                                                                                                                  | Description                                                                      |
| --------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------- |
| [`Qwen3MoeStateDictAdapter`](#nemo_automodel-components-models-qwen3_moe-state_dict_adapter-Qwen3MoeStateDictAdapter) | Converts between HF Qwen3-MoE checkpoints and our grouped-experts native format. |

### Data

[`_LORA_EXPERT_SUFFIXES`](#nemo_automodel-components-models-qwen3_moe-state_dict_adapter-_LORA_EXPERT_SUFFIXES)

[`logger`](#nemo_automodel-components-models-qwen3_moe-state_dict_adapter-logger)

### API

```python
class nemo_automodel.components.models.qwen3_moe.state_dict_adapter.Qwen3MoeStateDictAdapter(
    config: typing.Any,
    moe_config: nemo_automodel.components.moe.config.MoEConfig,
    backend: nemo_automodel.components.models.common.BackendConfig,
    dtype: torch.dtype = torch.float32
)
```

**Bases:** [MoESplitExpertsStateDictMixin](/nemo-automodel/nemo_automodel/components/moe/state_dict_mixin#nemo_automodel-components-moe-state_dict_mixin-MoESplitExpertsStateDictMixin), [StateDictAdapter](/nemo-automodel/nemo_automodel/components/checkpoint/state_dict_adapter#nemo_automodel-components-checkpoint-state_dict_adapter-StateDictAdapter)

Converts between HF Qwen3-MoE checkpoints and our grouped-experts native format.

```python
nemo_automodel.components.models.qwen3_moe.state_dict_adapter.Qwen3MoeStateDictAdapter._convert_lora_to_paramwrapper(
    fqn: str,
    tensor: torch.Tensor
) -> list[tuple[str, torch.Tensor]]
```

Convert a single grouped MoE LoRA tensor to PEFT ParamWrapper format.

ParamWrapper format stores fused 3-D expert LoRA parameters as 2-D
tensors with the expert dimension folded into the rank dimension.

Shape mapping (automodel native -> ParamWrapper):

down\_proj (outer wrapper, NO `base_layer` prefix — processed first alphabetically):

* `lora_down_B`  (E, r, H) -> `lora_A.weight`  (r\*E, H)  reshape
* `lora_down_A`  (E, I, r) -> `lora_B.weight`  (I, r\*E)  permute+reshape

gate\_up\_proj (inner wrapper, HAS `base_layer.` prefix):

* `lora_gate_and_up_B`  (E, r, 2\*I) -> `base_layer.lora_A.weight`  (r\*E, 2\*I)  reshape
* `lora_gate_and_up_A`  (E, H, r)   -> `base_layer.lora_B.weight`  (H, r\*E)    permute+reshape

**Returns:** `list[tuple[str, torch.Tensor]]`

List containing one `(fqn, tensor)` tuple in ParamWrapper format.
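The down\_proj shape mapping above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the adapter's actual code: the toy sizes are arbitrary, and it assumes the folded `r*E` axis is expert-major (expert `e`, rank `j` lands at index `e*r + j`), which is what a plain reshape of an `(E, r, ...)` tensor produces.

```python
import torch

# Toy sizes (assumed for illustration): experts, LoRA rank, hidden, intermediate.
E, r, H, I = 4, 2, 8, 6

# Native grouped tensors, shaped as listed in the mapping above.
lora_down_B = torch.randn(E, r, H)
lora_down_A = torch.randn(E, I, r)

# (E, r, H) -> (r*E, H): a plain reshape folds the expert axis into the rank axis.
pw_lora_A = lora_down_B.reshape(E * r, H)

# (E, I, r) -> (I, r*E): move the expert axis next to the rank axis, then fold.
pw_lora_B = lora_down_A.permute(1, 0, 2).reshape(I, E * r)

# Under the expert-major assumption, expert e / rank j sits at index e*r + j.
e, j = 3, 1
same_row = torch.equal(pw_lora_A[e * r + j], lora_down_B[e, j])
same_col = torch.equal(pw_lora_B[:, e * r + j], lora_down_A[e, :, j])
```

The permute is needed for the `A` tensors because their rank axis is last: without it, a reshape would interleave the intermediate and expert axes instead of folding experts into the rank.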

```python
nemo_automodel.components.models.qwen3_moe.state_dict_adapter.Qwen3MoeStateDictAdapter._convert_paramwrapper_to_native(
    state_dict: dict[str, typing.Any]
) -> dict[str, typing.Any]
```

Convert PEFT ParamWrapper LoRA keys to native grouped MoE LoRA format.

This is the reverse of `_convert_lora_to_paramwrapper`.  It detects
ParamWrapper-format keys and converts them back to the 3-D grouped
tensors expected by GroupedExpertsLoRA.

Reverse transforms (down\_proj is outer, gate\_up\_proj is inner):

* `experts.lora_A.weight`            (r\*E, H)   -> (E, r, H)    = lora\_down\_B
* `experts.lora_B.weight`            (I, r\*E)   -> (E, I, r)    = lora\_down\_A
* `experts.base_layer.lora_A.weight` (r\*E, 2\*I) -> (E, r, 2\*I)  = lora\_gate\_and\_up\_B
* `experts.base_layer.lora_B.weight` (H, r\*E)   -> (E, H, r)    = lora\_gate\_and\_up\_A
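As a rough sketch of the reverse transforms for the inner (gate\_up\_proj) wrapper, the snippet below unfolds 2-D ParamWrapper tensors back into 3-D grouped tensors and checks the round trip. It assumes the expert count `E` is known up front and that the folded axis is expert-major; the sizes and variable names are illustrative only.

```python
import torch

# Toy sizes (assumed for illustration): experts, LoRA rank, hidden, intermediate.
E, r, H, I = 4, 2, 8, 6

# 2-D ParamWrapper tensors as they might appear in a PEFT checkpoint.
pw_inner_A = torch.randn(E * r, 2 * I)  # experts.base_layer.lora_A.weight
pw_inner_B = torch.randn(H, E * r)      # experts.base_layer.lora_B.weight

# (r*E, 2*I) -> (E, r, 2*I): unfold the leading axis.
lora_gate_and_up_B = pw_inner_A.reshape(E, r, 2 * I)

# (H, r*E) -> (E, H, r): unfold the trailing axis, then bring experts to the front.
lora_gate_and_up_A = pw_inner_B.reshape(H, E, r).permute(1, 0, 2).contiguous()

# Re-applying the forward mapping should reproduce the ParamWrapper tensors.
restored_A = lora_gate_and_up_B.reshape(E * r, 2 * I)
restored_B = lora_gate_and_up_A.permute(1, 0, 2).reshape(H, E * r)
```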

```python
nemo_automodel.components.models.qwen3_moe.state_dict_adapter.Qwen3MoeStateDictAdapter.convert_single_tensor_to_hf(
    fqn: str,
    tensor: typing.Any,
    kwargs = {}
) -> list[tuple[str, typing.Any]]
```

Convert a single tensor from native format to HuggingFace format.

When `v4_compatible=False` (the default), LoRA expert tensors are
emitted in PEFT v0.18+ ParamWrapper format so that
`PeftModel.from_pretrained()` can load them directly.  When
`v4_compatible=True`, the legacy per-expert split is used instead
(via the parent mixin).

**Parameters:**

* `fqn`: Fully qualified name of the tensor in native format.
* `tensor`: The tensor to convert.
* `kwargs`: Additional arguments for conversion.

**Returns:** `list[tuple[str, Any]]`

List of (fqn, tensor) tuples in HuggingFace format
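The branching described for this method can be pictured with a small, hypothetical helper. The function name, the example key, and the idea that LoRA expert tensors are recognized by a name suffix are assumptions made for illustration; only the suffix values (from `_LORA_EXPERT_SUFFIXES`) and the `v4_compatible` behavior come from this page.

```python
# Suffix values copied from this module's _LORA_EXPERT_SUFFIXES constant.
LORA_EXPERT_SUFFIXES = (
    "lora_gate_and_up_A",
    "lora_gate_and_up_B",
    "lora_down_A",
    "lora_down_B",
)


def pick_lora_export_path(fqn: str, v4_compatible: bool = False) -> str:
    """Hypothetical helper: name the export path a native tensor would take."""
    if not fqn.endswith(LORA_EXPERT_SUFFIXES):
        # Not a grouped LoRA expert tensor (assumed to follow the regular path).
        return "regular conversion"
    if v4_compatible:
        return "legacy per-expert split"
    return "ParamWrapper"


# Hypothetical native key, used only to exercise the helper.
example_fqn = "model.layers.0.mlp.experts.lora_down_A"
default_path = pick_lora_export_path(example_fqn)
legacy_path = pick_lora_export_path(example_fqn, v4_compatible=True)
```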

```python
nemo_automodel.components.models.qwen3_moe.state_dict_adapter.Qwen3MoeStateDictAdapter.from_hf(
    hf_state_dict: dict[str, typing.Any],
    device_mesh: typing.Optional[torch.distributed.device_mesh.DeviceMesh] = None,
    kwargs = {}
) -> dict[str, typing.Any]
```

Convert HF checkpoint to native format, handling ParamWrapper LoRA keys.

Before delegating to the parent `_from_hf_w_merged_experts` (which
handles legacy per-expert LoRA format), this method scans for
ParamWrapper-format LoRA keys and converts them back to the native
grouped format expected by `GroupedExpertsLoRA`.
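One plausible way to scan for ParamWrapper-format keys is to match on the key suffixes listed under `_convert_paramwrapper_to_native`. The sketch below is an assumption about how such detection could work, not the adapter's actual logic; the helper name and the example key prefixes are hypothetical.

```python
from typing import Optional

# ParamWrapper key suffix -> native grouped LoRA tensor name (from the tables above).
PARAMWRAPPER_TO_NATIVE = {
    "experts.base_layer.lora_A.weight": "lora_gate_and_up_B",
    "experts.base_layer.lora_B.weight": "lora_gate_and_up_A",
    "experts.lora_A.weight": "lora_down_B",
    "experts.lora_B.weight": "lora_down_A",
}


def native_name_for(key: str) -> Optional[str]:
    """Hypothetical helper: return the native tensor name for a ParamWrapper key."""
    for suffix, native in PARAMWRAPPER_TO_NATIVE.items():
        if key == suffix or key.endswith("." + suffix):
            return native
    return None  # not a ParamWrapper key; leave it to the parent conversion


# Hypothetical checkpoint keys, used only to exercise the helper.
outer = native_name_for("model.layers.0.mlp.experts.lora_A.weight")
inner = native_name_for("model.layers.0.mlp.experts.base_layer.lora_B.weight")
legacy = native_name_for("model.layers.0.mlp.experts.3.down_proj.lora_A.weight")
```

Keys in the legacy per-expert layout carry an expert index and projection name before `lora_A`/`lora_B`, so they do not match any of the four suffixes and fall through to the parent handling.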

```python
nemo_automodel.components.models.qwen3_moe.state_dict_adapter.Qwen3MoeStateDictAdapter.to_hf(
    state_dict: dict[str, typing.Any],
    exclude_key_regex: typing.Optional[str] = None,
    quantization: bool = False,
    kwargs = {}
) -> dict[str, typing.Any]
```

```python
nemo_automodel.components.models.qwen3_moe.state_dict_adapter._LORA_EXPERT_SUFFIXES = ('lora_gate_and_up_A', 'lora_gate_and_up_B', 'lora_down_A', 'lora_down_B')
```

```python
nemo_automodel.components.models.qwen3_moe.state_dict_adapter.logger = logging.getLogger(__name__)
```