
# nemo_automodel.components.models.ling_v2.state_dict_adapter

HF \<-> NeMo state-dict adapter for BailingMoeV2 (Ling 2.0).

Handles the rename map between the HuggingFace checkpoint layout:

- `model.word_embeddings.weight`
- `model.layers.{N}.attention.query_key_value.weight` (fused `[Q | K | V]`)
- `model.layers.{N}.attention.dense.weight`
- `model.layers.{N}.attention.query_layernorm.weight`
- `model.layers.{N}.attention.key_layernorm.weight`
- `model.layers.{N}.mlp.gate.weight`
- `model.layers.{N}.mlp.gate.expert_bias`
- `model.layers.{N}.mlp.experts.{E}.{gate_proj,up_proj,down_proj}.weight`
- `model.layers.{N}.mlp.shared_experts.{gate_proj,up_proj,down_proj}.weight`

and the native NeMo layout used by this package:

- `model.embed_tokens.weight`
- `model.layers.{N}.self_attn.{q_proj,k_proj,v_proj,o_proj}.weight`
- `model.layers.{N}.self_attn.{q_norm,k_norm}.weight`
- `model.layers.{N}.mlp.gate.weight`
- `model.layers.{N}.mlp.gate.e_score_correction_bias`
- `model.layers.{N}.mlp.experts.{gate_and_up_projs,down_projs}`
- `model.layers.{N}.mlp.shared_experts.{gate_proj,up_proj,down_proj}.weight`

The per-expert grouping is delegated to `MoESplitExpertsStateDictMixin`; this
adapter only normalises the surrounding key names and splits the fused QKV.
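
The key normalisation described above can be pictured as an ordered list of substring replacements. The following is a minimal sketch, not the module's code: `EXAMPLE_PAIRS_HF_TO_NATIVE` and both helper functions are hypothetical names, and the pairs are inferred from the two layouts listed above rather than copied from `_RENAME_PAIRS_HF_TO_NATIVE`, whose full value is truncated on this page.

```python
# Hypothetical rename table inferred from the HF and native layouts above.
# It is an assumption, not a copy of _RENAME_PAIRS_HF_TO_NATIVE.
EXAMPLE_PAIRS_HF_TO_NATIVE: tuple[tuple[str, str], ...] = (
    ("model.word_embeddings.", "model.embed_tokens."),
    (".attention.dense.", ".self_attn.o_proj."),
    (".attention.query_layernorm.", ".self_attn.q_norm."),
    (".attention.key_layernorm.", ".self_attn.k_norm."),
    (".mlp.gate.expert_bias", ".mlp.gate.e_score_correction_bias"),
)


def example_rename_hf_to_native(key: str) -> str:
    """Apply each (hf, native) substring replacement in order."""
    for hf_part, native_part in EXAMPLE_PAIRS_HF_TO_NATIVE:
        key = key.replace(hf_part, native_part)
    return key


def example_rename_native_to_hf(key: str) -> str:
    """Apply the same table in the opposite direction."""
    for hf_part, native_part in EXAMPLE_PAIRS_HF_TO_NATIVE:
        key = key.replace(native_part, hf_part)
    return key
```

Keys that match no pair, such as `model.layers.{N}.mlp.gate.weight` or the shared-expert weights, pass through unchanged in this sketch, which mirrors the fact that both layouts list them under the same name.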

## Module Contents

### Classes

| Name                                                                                                                        | Description                                                 |
| --------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------- |
| [`BailingMoeV2StateDictAdapter`](#nemo_automodel-components-models-ling_v2-state_dict_adapter-BailingMoeV2StateDictAdapter) | State-dict adapter for BailingMoeV2 / Ling 2.0 checkpoints. |

### Functions

| Name                                                                                                        | Description |
| ----------------------------------------------------------------------------------------------------------- | ----------- |
| [`_rename_hf_to_native`](#nemo_automodel-components-models-ling_v2-state_dict_adapter-_rename_hf_to_native) | -           |
| [`_rename_native_to_hf`](#nemo_automodel-components-models-ling_v2-state_dict_adapter-_rename_native_to_hf) | -           |

### Data

[`_LAYER_QKV_RE`](#nemo_automodel-components-models-ling_v2-state_dict_adapter-_LAYER_QKV_RE)

[`_RENAME_PAIRS_HF_TO_NATIVE`](#nemo_automodel-components-models-ling_v2-state_dict_adapter-_RENAME_PAIRS_HF_TO_NATIVE)

### API

```python
class nemo_automodel.components.models.ling_v2.state_dict_adapter.BailingMoeV2StateDictAdapter(
    config: nemo_automodel.components.models.ling_v2.config.BailingMoeV2Config,
    moe_config: nemo_automodel.components.moe.config.MoEConfig,
    backend: nemo_automodel.components.models.common.BackendConfig,
    dtype: torch.dtype = torch.bfloat16
)
```

**Bases:** [MoESplitExpertsStateDictMixin](/nemo-automodel/nemo_automodel/components/moe/state_dict_mixin#nemo_automodel-components-moe-state_dict_mixin-MoESplitExpertsStateDictMixin), [StateDictAdapter](/nemo-automodel/nemo_automodel/components/checkpoint/state_dict_adapter#nemo_automodel-components-checkpoint-state_dict_adapter-StateDictAdapter)

State-dict adapter for BailingMoeV2 / Ling 2.0 checkpoints.

```python
nemo_automodel.components.models.ling_v2.state_dict_adapter.BailingMoeV2StateDictAdapter._split_fused_qkv_and_rename(
    hf_state_dict: dict[str, typing.Any]
) -> dict[str, typing.Any]
```

Split each fused `query_key_value` weight into q/k/v and apply renames.
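
As an illustration of the split, the sketch below slices a fused tensor into three contiguous row blocks. It is written under assumptions: the function name and the `q_rows` / `kv_rows` parameters are hypothetical, the blocks are assumed to be laid out as `[Q | K | V]` along dimension 0, and the real method presumably derives the block sizes from the model config rather than taking them as arguments. Only slicing is used, so the same code works on nested lists and on `torch.Tensor` values.

```python
from typing import Any


def example_split_fused_qkv(
    hf_state_dict: dict[str, Any],
    q_rows: int,
    kv_rows: int,
) -> dict[str, Any]:
    """Split every fused query_key_value entry into q/k/v along dim 0."""
    fused_marker = ".attention.query_key_value."
    out: dict[str, Any] = {}
    for key, value in hf_state_dict.items():
        if fused_marker not in key:
            out[key] = value  # non-fused entries are copied through untouched
            continue
        prefix, suffix = key.split(fused_marker, 1)
        # Assumed row layout: q_rows rows of Q, then kv_rows of K, then kv_rows of V.
        k_end = q_rows + kv_rows
        v_end = k_end + kv_rows
        out[f"{prefix}.self_attn.q_proj.{suffix}"] = value[:q_rows]
        out[f"{prefix}.self_attn.k_proj.{suffix}"] = value[q_rows:k_end]
        out[f"{prefix}.self_attn.v_proj.{suffix}"] = value[k_end:v_end]
    return out


# Toy input: 8 rows, of which 4 are assumed to belong to Q and 2 each to K and V.
fused = [[float(i), float(i)] for i in range(8)]
native = example_split_fused_qkv(
    {"model.layers.0.attention.query_key_value.weight": fused},
    q_rows=4,
    kv_rows=2,
)
```

With grouped-query attention the Q block is typically larger than the K and V blocks, which is why the sketch takes two sizes instead of dividing the fused tensor into thirds.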

```python
nemo_automodel.components.models.ling_v2.state_dict_adapter.BailingMoeV2StateDictAdapter.convert_single_tensor_to_hf(
    fqn: str,
    tensor: typing.Any,
    kwargs = {}
) -> list[tuple[str, typing.Any]]
```

Convert a single native tensor to HuggingFace format.

A `q_proj`, `k_proj`, or `v_proj` tensor cannot be re-fused without its
two siblings, so callers that need the fused weight should batch all three
through `to_hf` instead. This single-tensor path emits the per-projection HF
key (which is **not** the standard fused name) so that the value is not
silently dropped by DCP save paths that walk tensors one by one.
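
To make the sibling requirement concrete, here is a rough sketch of a batched re-fuse. Everything in it is illustrative: `example_fuse_qkv`, `list_concat`, and the `concat` parameter are hypothetical names, and the `[Q | K | V]` row order is assumed from the module description. With PyTorch tensors, `concat` could be `lambda parts: torch.cat(parts, dim=0)`.

```python
from typing import Any, Callable


def list_concat(parts: list[Any]) -> list[Any]:
    """Row-wise concatenation for nested lists (stand-in for a tensor concat)."""
    return [row for part in parts for row in part]


def example_fuse_qkv(
    native_state_dict: dict[str, Any],
    concat: Callable[[list[Any]], Any],
) -> dict[str, Any]:
    """Rebuild fused query_key_value entries from q/k/v siblings."""
    q_marker = ".self_attn.q_proj."
    out: dict[str, Any] = {}
    for key, value in native_state_dict.items():
        if ".self_attn.k_proj." in key or ".self_attn.v_proj." in key:
            continue  # handled together with the matching q_proj entry
        if q_marker not in key:
            out[key] = value
            continue
        prefix, suffix = key.split(q_marker, 1)
        k_key = f"{prefix}.self_attn.k_proj.{suffix}"
        v_key = f"{prefix}.self_attn.v_proj.{suffix}"
        # A missing sibling raises KeyError: one projection alone cannot be fused.
        siblings = [value, native_state_dict[k_key], native_state_dict[v_key]]
        out[f"{prefix}.attention.query_key_value.{suffix}"] = concat(siblings)
    return out
```

A one-tensor-at-a-time walk never has all three siblings in hand at once, which is the situation the per-projection fallback key is described as covering.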

```python
nemo_automodel.components.models.ling_v2.state_dict_adapter.BailingMoeV2StateDictAdapter.from_hf(
    hf_state_dict: dict[str, typing.Any],
    device_mesh: typing.Optional[torch.distributed.device_mesh.DeviceMesh] = None,
    kwargs = {}
) -> dict[str, typing.Any]
```

```python
nemo_automodel.components.models.ling_v2.state_dict_adapter.BailingMoeV2StateDictAdapter.to_hf(
    state_dict: dict[str, typing.Any],
    exclude_key_regex: typing.Optional[str] = None,
    quantization: bool = False,
    kwargs = {}
) -> dict[str, typing.Any]
```

```python
nemo_automodel.components.models.ling_v2.state_dict_adapter._rename_hf_to_native(
    key: str
) -> str
```

```python
nemo_automodel.components.models.ling_v2.state_dict_adapter._rename_native_to_hf(
    key: str
) -> str
```

```python
nemo_automodel.components.models.ling_v2.state_dict_adapter._LAYER_QKV_RE = re.compile('^(?P<prefix>(?:.*\\.)?layers\\.\\d+)\\.attention\\.query_key_value\\...
```
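
The value above is truncated, so the full pattern is not reproduced here. As a hedged illustration of how a named `prefix` group of this shape can be used, the sketch below compiles a hypothetical pattern, `EXAMPLE_LAYER_QKV_RE`, that stops at the separator following `query_key_value` and does not guess what comes after it in the real `_LAYER_QKV_RE`.

```python
import re
from typing import Optional

# Hypothetical pattern modelled on the visible part of the value above.
# The real _LAYER_QKV_RE continues past this point; its tail is not shown.
EXAMPLE_LAYER_QKV_RE = re.compile(
    r"^(?P<prefix>(?:.*\.)?layers\.\d+)\.attention\.query_key_value\."
)


def example_qkv_prefix(key: str) -> Optional[str]:
    """Return the layer prefix of a fused-QKV key, or None for other keys."""
    match = EXAMPLE_LAYER_QKV_RE.match(key)
    return match.group("prefix") if match else None
```

Capturing the layer prefix once makes it straightforward to build the sibling native keys by appending `.self_attn.q_proj.`, `.self_attn.k_proj.`, and `.self_attn.v_proj.` to the same string.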

```python
nemo_automodel.components.models.ling_v2.state_dict_adapter._RENAME_PAIRS_HF_TO_NATIVE: tuple[tuple[str, str], ...] = (('model.word_embeddings.', 'model.embed_tokens.'), ('.attention.dense.', '.self...
```