
# nemo_automodel.components.models.hy_mt2.state_dict_adapter

State dict conversion between the on-disk tencent/Hy-MT2-30B-A3B HF
checkpoint and Automodel's native (grouped-experts) format.

The on-disk key layout is identical to tencent/Hy3-preview because both
share `model_type: "hy_v3"` and `architectures: ["HYV3ForCausalLM"]`:

| On-disk HF key | Shape / note |
| --- | --- |
| `model.layers.{L}.mlp.expert_bias` | `[n_experts]` |
| `model.layers.{L}.mlp.router.gate.weight` | `[n_experts, hidden]` |
| `model.layers.{L}.mlp.experts.{E}.gate_proj.weight` | `[moe_inter, hidden]` |
| `model.layers.{L}.mlp.experts.{E}.up_proj.weight` | `[moe_inter, hidden]` |
| `model.layers.{L}.mlp.experts.{E}.down_proj.weight` | `[hidden, moe_inter]` |
| `model.layers.{L}.mlp.shared_mlp.{gate,up,down}_proj.weight` | shared expert |

Automodel native:

| Automodel native key | Shape / note |
| --- | --- |
| `model.layers.{L}.mlp.gate.e_score_correction_bias` | `[n_local]` |
| `model.layers.{L}.mlp.gate.weight` | `[n_experts, hidden]` |
| `model.layers.{L}.mlp.experts.gate_and_up_projs` | grouped |
| `model.layers.{L}.mlp.experts.down_projs` | grouped |
| `model.layers.{L}.mlp.shared_experts.{gate,up,down}_proj.weight` | shared expert |

This adapter handles three on-disk-specific renames plus the per-expert
split/merge (via `MoESplitExpertsStateDictMixin`). It is functionally a
clone of `HYV3StateDictAdapter`, but it is kept separate so that future
Hy-MT2-only key changes (e.g. an MTP / aux-head extension that Hy-MT2 ships
but Hy3-preview does not) can be added here without affecting Hy3-preview.
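
The rename step can be pictured as a small table of regex substitutions applied to each key. The sketch below is illustrative only: the first pattern matches the visible part of `_HF_TO_NATIVE_RENAMES`, while the other two patterns and the helper name `rename_hf_key` are assumptions inferred from the key layouts listed above, not the module's actual contents.

```python
import re

# First entry mirrors the visible start of _HF_TO_NATIVE_RENAMES.
# The remaining two entries are ASSUMED from the documented key layouts.
HF_TO_NATIVE = (
    (re.compile(r"\.mlp\.expert_bias$"), ".mlp.gate.e_score_correction_bias"),
    (re.compile(r"\.mlp\.router\.gate\.weight$"), ".mlp.gate.weight"),  # assumed
    (re.compile(r"\.mlp\.shared_mlp\."), ".mlp.shared_experts."),  # assumed
)


def rename_hf_key(key: str) -> str:
    """Hypothetical helper: apply the first matching rename, else return key as-is."""
    for pattern, replacement in HF_TO_NATIVE:
        new_key, n_subs = pattern.subn(replacement, key)
        if n_subs:
            return new_key
    return key
```

Per-expert keys such as `model.layers.{L}.mlp.experts.{E}.gate_proj.weight` pass through this step unchanged; they are handled by the split/merge logic instead.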

## Module Contents

### Classes

| Name                                                                                                         | Description                                                              |
| ------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------ |
| [`HyMT2StateDictAdapter`](#nemo_automodel-components-models-hy_mt2-state_dict_adapter-HyMT2StateDictAdapter) | Bridges Automodel native (grouped experts) and on-disk Hy-MT2 HF format. |

### Data

[`_HF_TO_NATIVE_RENAMES`](#nemo_automodel-components-models-hy_mt2-state_dict_adapter-_HF_TO_NATIVE_RENAMES)

[`_NATIVE_TO_HF_RENAMES`](#nemo_automodel-components-models-hy_mt2-state_dict_adapter-_NATIVE_TO_HF_RENAMES)

[`logger`](#nemo_automodel-components-models-hy_mt2-state_dict_adapter-logger)

### API

```python
class nemo_automodel.components.models.hy_mt2.state_dict_adapter.HyMT2StateDictAdapter(
    config: typing.Any,
    moe_config: nemo_automodel.components.moe.config.MoEConfig,
    backend: nemo_automodel.components.models.common.BackendConfig,
    dtype: torch.dtype = torch.bfloat16
)
```

**Bases:** [MoESplitExpertsStateDictMixin](/nemo-automodel/nemo_automodel/components/moe/state_dict_mixin#nemo_automodel-components-moe-state_dict_mixin-MoESplitExpertsStateDictMixin), [StateDictAdapter](/nemo-automodel/nemo_automodel/components/checkpoint/state_dict_adapter#nemo_automodel-components-checkpoint-state_dict_adapter-StateDictAdapter)

Bridges Automodel native (grouped experts) and on-disk Hy-MT2 HF format.

```python
nemo_automodel.components.models.hy_mt2.state_dict_adapter.HyMT2StateDictAdapter._is_mtp_key(
    key: str
) -> bool
```

Return `True` if `key` belongs to an MTP layer (layer index >= `num_hidden_layers`).

Hy-MT2-30B-A3B does not appear to ship MTP layers in its public
checkpoint, but the filter is kept as a defensive no-op so the
adapter remains symmetric with `HYV3StateDictAdapter`.
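
A minimal sketch of such a filter, assuming the layer index can be read from the `layers.{L}.` segment of the key. The regex and the free function `is_mtp_key` are hypothetical stand-ins; the real method presumably reads `num_hidden_layers` from the adapter's `config`.

```python
import re

# Captures the integer layer index in keys like "model.layers.12.mlp...".
_LAYER_RE = re.compile(r"(?:^|\.)layers\.(\d+)\.")


def is_mtp_key(key: str, num_hidden_layers: int) -> bool:
    """Hypothetical stand-in: True when the key's layer index is past the decoder stack."""
    match = _LAYER_RE.search(key)
    if match is None:
        # Keys without a layer index (embeddings, final norm, ...) are never MTP keys.
        return False
    return int(match.group(1)) >= num_hidden_layers
```

With a checkpoint that ships no extra layers, every key has an index below `num_hidden_layers`, so the filter drops nothing.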

```python
nemo_automodel.components.models.hy_mt2.state_dict_adapter.HyMT2StateDictAdapter.convert_single_tensor_to_hf(
    fqn: str,
    tensor: typing.Any,
    kwargs = {}
) -> list[tuple[str, typing.Any]]
```

Per-tensor variant of `to_hf` for streaming-save code paths.

```python
nemo_automodel.components.models.hy_mt2.state_dict_adapter.HyMT2StateDictAdapter.from_hf(
    hf_state_dict: dict[str, typing.Any],
    device_mesh: typing.Optional[torch.distributed.device_mesh.DeviceMesh] = None,
    kwargs = {}
) -> dict[str, typing.Any]
```

On-disk Hy-MT2 HF -> native: filter MTP, rename, then merge experts.
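
To illustrate the final merge step at the key level, the sketch below buckets per-expert HF keys by layer and projection, ordered by expert index. It only shows which on-disk keys would feed each grouped tensor. The actual tensor stacking, and the fusing of `gate_proj` and `up_proj` into `gate_and_up_projs`, is done by `MoESplitExpertsStateDictMixin` and is not reproduced here; `group_expert_keys` is a hypothetical name.

```python
import re
from collections import defaultdict

# Matches per-expert keys and captures (prefix, expert index, projection name).
_EXPERT_RE = re.compile(
    r"^(.*\.mlp\.experts)\.(\d+)\.(gate_proj|up_proj|down_proj)\.weight$"
)


def group_expert_keys(keys):
    """Hypothetical helper: bucket per-expert keys by (prefix, projection)."""
    buckets = defaultdict(dict)
    passthrough = []
    for key in keys:
        match = _EXPERT_RE.match(key)
        if match is None:
            passthrough.append(key)  # not a per-expert weight, left untouched
            continue
        prefix, expert, proj = match.group(1), int(match.group(2)), match.group(3)
        buckets[(prefix, proj)][expert] = key
    # Order each bucket by expert index so a later stack would be deterministic.
    grouped = {
        name: [by_expert[i] for i in sorted(by_expert)]
        for name, by_expert in buckets.items()
    }
    return grouped, passthrough
```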

```python
nemo_automodel.components.models.hy_mt2.state_dict_adapter.HyMT2StateDictAdapter.to_hf(
    state_dict: dict[str, typing.Any],
    exclude_key_regex: typing.Optional[str] = None,
    kwargs = {}
) -> dict[str, typing.Any]
```

Native -> on-disk Hy-MT2 HF: per-expert split + name renames.
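
The split direction fans one grouped entry out into one entry per expert. The sketch below shows only the name fan-out for `down_projs`, using a plain list as a stand-in for a tensor whose leading axis indexes experts. It assumes no expert parallelism (local index equals global expert index) and ignores any transpose the real adapter may apply; `split_grouped_down_projs` is a hypothetical helper, not part of the module.

```python
from typing import Any


def split_grouped_down_projs(fqn: str, grouped: list[Any]) -> list[tuple[str, Any]]:
    """Hypothetical helper: expand a grouped down_projs entry into per-expert HF keys."""
    suffix = ".mlp.experts.down_projs"
    if not fqn.endswith(suffix):
        # Any other tensor maps to a single (name, tensor) pair.
        return [(fqn, grouped)]
    prefix = fqn[: -len(suffix)]
    return [
        (f"{prefix}.mlp.experts.{expert}.down_proj.weight", per_expert)
        for expert, per_expert in enumerate(grouped)
    ]
```

The return type mirrors the `list[tuple[str, Any]]` shape documented for `convert_single_tensor_to_hf`, where a single native tensor may map to several on-disk entries.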

```python
nemo_automodel.components.models.hy_mt2.state_dict_adapter._HF_TO_NATIVE_RENAMES: tuple[tuple[Pattern[str], str], ...] = ((re.compile('\\.mlp\\.expert_bias$'), '.mlp.gate.e_score_correction_bias'), (re...
```

```python
nemo_automodel.components.models.hy_mt2.state_dict_adapter._NATIVE_TO_HF_RENAMES: tuple[tuple[Pattern[str], str], ...] = ((re.compile('\\.mlp\\.gate\\.e_score_correction_bias$'), '.mlp.expert_bias'), (...
```

```python
nemo_automodel.components.models.hy_mt2.state_dict_adapter.logger = logging.getLogger(__name__)
```