
# nemo_automodel.components.models.hy_v3.state_dict_adapter

State dict conversion between the on-disk tencent/Hy3-preview HF checkpoint
and Automodel's native (grouped-experts) format.

On-disk HF format (what the tencent/Hy3-preview safetensors actually contain):

| Key                                                          | Shape                                        |
| ------------------------------------------------------------ | -------------------------------------------- |
| `model.layers.{L}.mlp.expert_bias`                           | `[n_experts]`                                |
| `model.layers.{L}.mlp.router.gate.weight`                    | `[n_experts, hidden]`                        |
| `model.layers.{L}.mlp.experts.{E}.gate_proj.weight`          | `[moe_inter, hidden]`                        |
| `model.layers.{L}.mlp.experts.{E}.up_proj.weight`            | `[moe_inter, hidden]`                        |
| `model.layers.{L}.mlp.experts.{E}.down_proj.weight`          | `[hidden, moe_inter]`                        |
| `model.layers.{L}.mlp.shared_mlp.{gate,up,down}_proj.weight` | `[moe_inter, hidden]` / `[hidden, moe_inter]` |

Automodel native format (matches the rest of the MoE stack):

| Key                                                              | Shape                                |
| ---------------------------------------------------------------- | ------------------------------------ |
| `model.layers.{L}.mlp.gate.e_score_correction_bias`              | `[n_local]` (on Gate, not MoE)       |
| `model.layers.{L}.mlp.gate.weight`                               | `[n_experts, hidden]`                |
| `model.layers.{L}.mlp.experts.gate_and_up_projs`                 | `[n_local, hidden, 2*moe_inter]`     |
| `model.layers.{L}.mlp.experts.down_projs`                        | `[n_local, moe_inter, hidden]`       |
| `model.layers.{L}.mlp.shared_experts.{gate,up,down}_proj.weight` | unchanged shapes                     |
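
The shape change between the two layouts can be illustrated with plain nested lists. This is a minimal sketch, not the mixin's implementation: it assumes each expert's `gate_proj` and `up_proj` are transposed and then concatenated along the last dimension with gate first, which is consistent with the documented shapes but is not confirmed by this page. `transpose`, `group_experts`, and `shape` are hypothetical helper names.

```python
def transpose(matrix):
    """Transpose a 2-D nested list: [rows][cols] -> [cols][rows]."""
    return [list(col) for col in zip(*matrix)]


def group_experts(gate, up, down):
    """Group per-expert HF weights into the native layout (illustrative only).

    gate[e], up[e]: [moe_inter][hidden]; down[e]: [hidden][moe_inter].
    Returns (gate_and_up_projs, down_projs) with shapes
    [n_local][hidden][2 * moe_inter] and [n_local][moe_inter][hidden].
    """
    gate_and_up_projs = []
    for g, u in zip(gate, up):
        g_t, u_t = transpose(g), transpose(u)  # each [hidden][moe_inter]
        # Assumption: gate columns come first, then up columns.
        gate_and_up_projs.append([g_row + u_row for g_row, u_row in zip(g_t, u_t)])
    down_projs = [transpose(d) for d in down]
    return gate_and_up_projs, down_projs


def shape(nested):
    """Return the shape of a rectangular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return dims


# Tiny example sizes, chosen only to make the shapes easy to read.
n_local, hidden, moe_inter = 2, 3, 4
gate = [[[0.0] * hidden for _ in range(moe_inter)] for _ in range(n_local)]
up = [[[1.0] * hidden for _ in range(moe_inter)] for _ in range(n_local)]
down = [[[2.0] * moe_inter for _ in range(hidden)] for _ in range(n_local)]

gate_and_up_projs, down_projs = group_experts(gate, up, down)
```

With these sizes, `shape(gate_and_up_projs)` is `[2, 3, 8]` and `shape(down_projs)` is `[2, 4, 3]`, matching `[n_local, hidden, 2*moe_inter]` and `[n_local, moe_inter, hidden]`. The real adapter works on tensors rather than lists.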

Differences (vs. every other Automodel MoE adapter):

1. Per-expert split tensors -> grouped (handled by MoESplitExpertsStateDictMixin).
2. Three HYV3-specific name renames: expert\_bias \<-> gate.e\_score\_correction\_bias,
   router.gate.weight \<-> gate.weight, shared\_mlp.\* \<-> shared\_experts.\*.
3. MTP layers (indices >= num\_hidden\_layers) on disk must be filtered out on load.
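
The three renames listed above can be sketched as a pair of regex tables. Only the first HF-to-native pattern is visible in this page's truncated `_HF_TO_NATIVE_RENAMES` value; the remaining patterns, the table names `HF_TO_NATIVE_SKETCH` / `NATIVE_TO_HF_SKETCH`, and the helper `rename_key` are assumptions written to match the described renames, not the module's actual contents.

```python
import re

# Only the first HF -> native entry is taken from this page; the other
# entries are assumptions that follow the rename list.
HF_TO_NATIVE_SKETCH = (
    (re.compile(r"\.mlp\.expert_bias$"), ".mlp.gate.e_score_correction_bias"),
    (re.compile(r"\.mlp\.router\.gate\.weight$"), ".mlp.gate.weight"),
    (re.compile(r"\.mlp\.shared_mlp\."), ".mlp.shared_experts."),
)
NATIVE_TO_HF_SKETCH = (
    (re.compile(r"\.mlp\.gate\.e_score_correction_bias$"), ".mlp.expert_bias"),
    (re.compile(r"\.mlp\.gate\.weight$"), ".mlp.router.gate.weight"),
    (re.compile(r"\.mlp\.shared_experts\."), ".mlp.shared_mlp."),
)


def rename_key(key, renames):
    """Apply the first matching rename; return the key unchanged otherwise."""
    for pattern, replacement in renames:
        new_key, count = pattern.subn(replacement, key)
        if count:
            return new_key
    return key


hf_key = "model.layers.0.mlp.router.gate.weight"
native_key = rename_key(hf_key, HF_TO_NATIVE_SKETCH)
round_trip = rename_key(native_key, NATIVE_TO_HF_SKETCH)
```

Here `native_key` is `model.layers.0.mlp.gate.weight` and `round_trip` equals the original `hf_key`. Keys that match no pattern, such as attention weights, pass through unchanged.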

Why the renames live in the adapter rather than in the storage reader's key\_mapping:
`nemo_automodel/components/checkpoint/checkpointing.py:507` deliberately passes
`reader_key_mapping=None` when a model has a state\_dict\_adapter, to avoid
double-translation. The adapter's `to_hf` / `from_hf` must therefore produce keys
that match the actual on-disk strings.

## Module Contents

### Classes

| Name                                                                                                      | Description                                                                    |
| --------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------ |
| [`HYV3StateDictAdapter`](#nemo_automodel-components-models-hy_v3-state_dict_adapter-HYV3StateDictAdapter) | Bridges Automodel native (grouped experts) and tencent/Hy3-preview on-disk HF. |

### Data

[`_HF_TO_NATIVE_RENAMES`](#nemo_automodel-components-models-hy_v3-state_dict_adapter-_HF_TO_NATIVE_RENAMES)

[`_NATIVE_TO_HF_RENAMES`](#nemo_automodel-components-models-hy_v3-state_dict_adapter-_NATIVE_TO_HF_RENAMES)

[`logger`](#nemo_automodel-components-models-hy_v3-state_dict_adapter-logger)

### API

```python
class nemo_automodel.components.models.hy_v3.state_dict_adapter.HYV3StateDictAdapter(
    config: typing.Any,
    moe_config: nemo_automodel.components.moe.config.MoEConfig,
    backend: nemo_automodel.components.models.common.BackendConfig,
    dtype: torch.dtype = torch.bfloat16
)
```

**Bases:** [MoESplitExpertsStateDictMixin](/nemo-automodel/nemo_automodel/components/moe/state_dict_mixin#nemo_automodel-components-moe-state_dict_mixin-MoESplitExpertsStateDictMixin), [StateDictAdapter](/nemo-automodel/nemo_automodel/components/checkpoint/state_dict_adapter#nemo_automodel-components-checkpoint-state_dict_adapter-StateDictAdapter)

Bridges Automodel native (grouped experts) and tencent/Hy3-preview on-disk HF.

Inherits the per-expert split/merge logic from `MoESplitExpertsStateDictMixin`;
only the three HYV3-specific name renames and the MTP-layer filtering live here.

```python
nemo_automodel.components.models.hy_v3.state_dict_adapter.HYV3StateDictAdapter._is_mtp_key(
    key: str
) -> bool
```

Return True if *key* belongs to an MTP layer (index >= num\_hidden\_layers).
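
A minimal sketch of such a check follows. It is a standalone function rather than the method itself: the documented method takes only `key`, so `num_hidden_layers` is passed explicitly here, and both the regex and the name `is_mtp_key` are assumptions for illustration.

```python
import re

# Hypothetical pattern: captures the integer after "layers." in a key.
_LAYER_INDEX = re.compile(r"(?:^|\.)layers\.(\d+)\.")


def is_mtp_key(key, num_hidden_layers):
    """Return True if key names a layer whose index is >= num_hidden_layers."""
    match = _LAYER_INDEX.search(key)
    return match is not None and int(match.group(1)) >= num_hidden_layers


# With 4 regular layers (indices 0..3), index 4 would be an MTP layer.
keys = [
    "model.layers.3.mlp.expert_bias",
    "model.layers.4.mlp.expert_bias",
    "model.embed_tokens.weight",
]
kept = [k for k in keys if not is_mtp_key(k, num_hidden_layers=4)]
```

After filtering, `kept` holds the layer-3 key and the embedding key. Keys without a layer index are never treated as MTP keys.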

```python
nemo_automodel.components.models.hy_v3.state_dict_adapter.HYV3StateDictAdapter.convert_single_tensor_to_hf(
    fqn: str,
    tensor: typing.Any,
    kwargs = {}
) -> list[tuple[str, typing.Any]]
```

Per-tensor variant of `to_hf` (used by save paths that stream tensors).

Mirrors `to_hf` but operates on one (fqn, tensor) pair at a time:

1. Try the mixin's per-expert split, which returns multiple (key, tensor) pairs
   when *fqn* names a grouped expert tensor and `None` otherwise.
2. Apply the HYV3 name renames to the resulting keys.
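
The two steps can be sketched with stand-in helpers. Everything below is hypothetical: `split_grouped_expert` only imitates the mixin's contract (a list of pairs, or `None`) for `down_projs`, `rename_to_hf` shows a single rename, and tensors are replaced by placeholder strings so the example runs without PyTorch.

```python
def split_grouped_expert(fqn, tensor):
    """Stand-in for the mixin's per-expert split (hypothetical).

    Returns a list of (key, tensor) pairs for a grouped expert tensor,
    otherwise None. A real split would also transpose each slice.
    """
    if not fqn.endswith(".mlp.experts.down_projs"):
        return None
    prefix = fqn[: -len(".down_projs")]
    return [(f"{prefix}.{e}.down_proj.weight", t) for e, t in enumerate(tensor)]


def rename_to_hf(key):
    """Stand-in for the native -> HF renames (only one rename shown)."""
    return key.replace(".mlp.shared_experts.", ".mlp.shared_mlp.")


def convert_single_tensor_to_hf_sketch(fqn, tensor):
    # Step 1: try the per-expert split; fall back to the single input pair.
    pairs = split_grouped_expert(fqn, tensor)
    if pairs is None:
        pairs = [(fqn, tensor)]
    # Step 2: rename whichever keys came out of step 1.
    return [(rename_to_hf(key), value) for key, value in pairs]


grouped = convert_single_tensor_to_hf_sketch(
    "model.layers.0.mlp.experts.down_projs", ["e0", "e1"]
)
single = convert_single_tensor_to_hf_sketch(
    "model.layers.0.mlp.shared_experts.up_proj.weight", "w"
)
```

A grouped input yields one pair per expert, while any other input yields exactly one pair, which is why the return type is a list in both cases.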

```python
nemo_automodel.components.models.hy_v3.state_dict_adapter.HYV3StateDictAdapter.from_hf(
    hf_state_dict: dict[str, typing.Any],
    device_mesh: typing.Optional[torch.distributed.device_mesh.DeviceMesh] = None,
    kwargs = {}
) -> dict[str, typing.Any]
```

Convert the on-disk Tencent state dict to native format.

```python
nemo_automodel.components.models.hy_v3.state_dict_adapter.HYV3StateDictAdapter.to_hf(
    state_dict: dict[str, typing.Any],
    exclude_key_regex: typing.Optional[str] = None,
    kwargs = {}
) -> dict[str, typing.Any]
```

Convert native state dict back to the on-disk Tencent format.

```python
nemo_automodel.components.models.hy_v3.state_dict_adapter._HF_TO_NATIVE_RENAMES: tuple[tuple[Pattern[str], str], ...] = ((re.compile('\\.mlp\\.expert_bias$'), '.mlp.gate.e_score_correction_bias'), (re...
```

```python
nemo_automodel.components.models.hy_v3.state_dict_adapter._NATIVE_TO_HF_RENAMES: tuple[tuple[Pattern[str], str], ...] = ((re.compile('\\.mlp\\.gate\\.e_score_correction_bias$'), '.mlp.expert_bias'), (...
```

```python
nemo_automodel.components.models.hy_v3.state_dict_adapter.logger = logging.getLogger(__name__)
```