
# nemo_automodel.components.models.nemotron_v3.state_dict_adapter

## Module Contents

### Classes

| Name                                                                                                                        | Description                               |
| --------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------- |
| [`NemotronV3StateDictAdapter`](#nemo_automodel-components-models-nemotron_v3-state_dict_adapter-NemotronV3StateDictAdapter) | State dict adapter for NemotronV3 models. |

### Data

[`logger`](#nemo_automodel-components-models-nemotron_v3-state_dict_adapter-logger)

### API

```python
class nemo_automodel.components.models.nemotron_v3.state_dict_adapter.NemotronV3StateDictAdapter(
    config,
    moe_config: nemo_automodel.components.moe.config.MoEConfig,
    backend: nemo_automodel.components.models.common.BackendConfig,
    dtype: torch.dtype = torch.bfloat16
)
```

**Bases:** [MoESplitExpertsStateDictMixin](/nemo-automodel/nemo_automodel/components/moe/state_dict_mixin#nemo_automodel-components-moe-state_dict_mixin-MoESplitExpertsStateDictMixin), [StateDictAdapter](/nemo-automodel/nemo_automodel/components/checkpoint/state_dict_adapter#nemo_automodel-components-checkpoint-state_dict_adapter-StateDictAdapter)

State dict adapter for NemotronV3 models.

Converts between HuggingFace checkpoint format and internal NeMo format.

HF format uses 'backbone' prefix:

* backbone.embed\_tokens.weight
* backbone.layers.\{}.norm.weight
* backbone.layers.\{}.mixer.\* (mamba/attention/moe components)
* backbone.norm\_f.weight
* lm\_head.weight

Internal format uses 'model' prefix:

* model.embed\_tokens.weight
* model.layers.\{}.norm.weight
* model.layers.\{}.mixer.\* (mamba/attention/moe components)
* model.norm.weight
* lm\_head.weight
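The key renaming described by the two lists above can be sketched in plain Python. This is only an illustration: `hf_key_to_internal` and `internal_key_to_hf` are hypothetical helper names, not part of the adapter's API, and the sketch covers only the prefix and final-norm renames listed here.

```python
def hf_key_to_internal(key: str) -> str:
    """Map a HuggingFace-style key to the internal naming listed above."""
    if key.startswith("backbone."):
        key = "model." + key[len("backbone."):]
    # The final norm is named norm_f in HF and norm internally.
    if key == "model.norm_f.weight":
        key = "model.norm.weight"
    return key


def internal_key_to_hf(key: str) -> str:
    """Inverse mapping: internal naming back to HuggingFace-style keys."""
    if key == "model.norm.weight":
        return "backbone.norm_f.weight"
    if key.startswith("model."):
        return "backbone." + key[len("model."):]
    return key  # e.g. lm_head.weight is identical in both formats


hf_keys = [
    "backbone.embed_tokens.weight",
    "backbone.layers.0.norm.weight",
    "backbone.norm_f.weight",
    "lm_head.weight",
]
internal_keys = [hf_key_to_internal(k) for k in hf_keys]
round_trip = [internal_key_to_hf(k) for k in internal_keys]
```

Note that only the top-level `model.norm.weight` maps to `norm_f`; per-layer keys such as `model.layers.0.norm.weight` keep the `norm` name in both formats.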

NemotronV3 uses ReLU² activation (non-gated), so gate\_and\_up\_projs has
shape \[n\_experts, dim, inter\_dim] instead of \[n\_experts, dim, 2\*inter\_dim].
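To make the shape difference concrete, here is a small sketch in plain Python. The helper names `relu_squared` and `expert_proj_shape` and the example sizes are illustrative assumptions, not part of the adapter.

```python
def relu_squared(x: float) -> float:
    """ReLU² activation: max(x, 0) squared, with no separate gate branch."""
    return max(x, 0.0) ** 2


def expert_proj_shape(n_experts: int, dim: int, inter_dim: int, gated: bool) -> tuple:
    """Shape of the grouped expert projection.

    A gated activation needs both a gate and an up projection, hence
    2 * inter_dim; a non-gated activation such as ReLU² needs only one.
    """
    width = 2 * inter_dim if gated else inter_dim
    return (n_experts, dim, width)


# Hypothetical sizes, chosen only for illustration.
non_gated = expert_proj_shape(n_experts=8, dim=16, inter_dim=32, gated=False)
gated = expert_proj_shape(n_experts=8, dim=16, inter_dim=32, gated=True)
```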

Note: NemotronV3 layer paths use 'mixer' instead of 'mlp', so expert weights live under 'mixer.experts' rather than 'mlp.experts'.

```python
nemo_automodel.components.models.nemotron_v3.state_dict_adapter.NemotronV3StateDictAdapter.convert_single_tensor_to_hf(
    fqn: str,
    tensor: typing.Any,
    kwargs = {}
) -> list[tuple[str, typing.Any]]
```

Convert a single tensor from internal format to HuggingFace format.

**Parameters:**

* `fqn`: Fully qualified name of the tensor in internal format
* `tensor`: The tensor to convert
* `kwargs`: Additional arguments for conversion

**Returns:** `list[tuple[str, Any]]`

List of (fqn, tensor) tuples in HuggingFace format
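The return type is a list because one internal tensor can map to several HuggingFace entries, for example when a grouped expert tensor is split per expert. The sketch below illustrates that one-to-many shape only. `split_for_hf` and `to_hf_prefix` are hypothetical helpers, the per-expert key layout is an assumption, and plain lists stand in for tensors; the real adapter may also rename the weight itself.

```python
from typing import Any

EXPERTS_MARKER = ".mixer.experts."  # based on the 'mixer.experts' path used by NemotronV3


def to_hf_prefix(fqn: str) -> str:
    """Swap the internal 'model.' prefix for the HF 'backbone.' prefix."""
    if fqn.startswith("model."):
        return "backbone." + fqn[len("model."):]
    return fqn


def split_for_hf(fqn: str, tensor: Any) -> list[tuple[str, Any]]:
    """Return (fqn, tensor) pairs: one for plain tensors, one per expert otherwise."""
    if EXPERTS_MARKER not in fqn:
        return [(to_hf_prefix(fqn), tensor)]
    prefix, name = fqn.split(EXPERTS_MARKER, 1)
    # Assumed layout: the expert index is inserted right after 'experts.'.
    return [
        (f"{to_hf_prefix(prefix)}{EXPERTS_MARKER}{i}.{name}", per_expert)
        for i, per_expert in enumerate(tensor)
    ]


plain = split_for_hf("model.embed_tokens.weight", "emb")
experts = split_for_hf("model.layers.1.mixer.experts.gate_and_up_projs", ["e0", "e1"])
```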

```python
nemo_automodel.components.models.nemotron_v3.state_dict_adapter.NemotronV3StateDictAdapter.from_hf(
    hf_state_dict: dict[str, typing.Any],
    device_mesh: typing.Optional[torch.distributed.device_mesh.DeviceMesh] = None,
    kwargs = {}
) -> dict[str, typing.Any]
```

Convert HF checkpoint to internal format.

* Rename backbone → model
* Rename norm\_f → norm
* Aggregate per-expert weights into grouped tensors
* If device\_mesh is provided, only load experts needed for the current rank
* Process MTP keys (`mtp.layers.{i}.*`) separately, reusing the
  same MoE expert-merge logic for the MoE sublayer of each MTP depth.
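The expert aggregation and per-rank filtering steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the per-expert key pattern, the contiguous even split of experts across ranks, and the helper names `local_expert_range` and `group_experts` are all hypothetical, and Python lists stand in for stacked tensors.

```python
import re
from collections import defaultdict

# Hypothetical per-expert key layout; real checkpoint names may differ.
EXPERT_KEY = re.compile(r"^(?P<prefix>.+\.mixer\.experts)\.(?P<idx>\d+)\.(?P<name>.+)$")


def local_expert_range(n_experts: int, rank: int, world_size: int) -> range:
    """Experts owned by a rank, assuming a contiguous, even split."""
    per_rank = n_experts // world_size
    return range(rank * per_rank, (rank + 1) * per_rank)


def group_experts(state_dict: dict, keep: range) -> dict:
    """Merge per-expert entries into one grouped entry, keeping only local experts."""
    grouped = defaultdict(dict)
    out = {}
    for key, value in state_dict.items():
        m = EXPERT_KEY.match(key)
        if m is None:
            out[key] = value  # non-expert tensors pass through unchanged
            continue
        idx = int(m["idx"])
        if idx in keep:  # skip experts that belong to other ranks
            grouped[(m["prefix"], m["name"])][idx] = value
    for (prefix, name), by_idx in grouped.items():
        # A real adapter would stack tensors here; a sorted list stands in.
        out[f"{prefix}.{name}"] = [by_idx[i] for i in sorted(by_idx)]
    return out


checkpoint = {
    "model.embed_tokens.weight": "emb",
    "model.layers.1.mixer.experts.0.up_proj.weight": "e0",
    "model.layers.1.mixer.experts.1.up_proj.weight": "e1",
    "model.layers.1.mixer.experts.2.up_proj.weight": "e2",
    "model.layers.1.mixer.experts.3.up_proj.weight": "e3",
}
merged = group_experts(checkpoint, local_expert_range(4, rank=1, world_size=2))
```

With four experts split across two ranks, rank 1 keeps experts 2 and 3, so the grouped entry holds only those two values.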

**Parameters:**

* `hf_state_dict`: HuggingFace format state dict
* `device_mesh`: Optional device mesh for distributed expert loading
* `kwargs`: Additional arguments

**Returns:** `dict[str, Any]`

Internal format state dict

```python
nemo_automodel.components.models.nemotron_v3.state_dict_adapter.NemotronV3StateDictAdapter.to_hf(
    state_dict: dict[str, typing.Any],
    exclude_key_regex: typing.Optional[str] = None,
    kwargs = {}
) -> dict[str, typing.Any]
```

Convert from internal model state dict to HuggingFace format.

**Parameters:**

* `state_dict`: Internal format state dict
* `exclude_key_regex`: Optional regex pattern to exclude keys
* `kwargs`: Additional arguments

**Returns:** `dict[str, Any]`

HuggingFace format state dict
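As a rough illustration of how an `exclude_key_regex` filter could behave, the sketch below drops matching keys from a dictionary. `drop_excluded` is a hypothetical helper, and anchoring the pattern at the start of the key with `re.match` is an assumption; the adapter's actual matching semantics are not documented here.

```python
import re
from typing import Any, Optional


def drop_excluded(state_dict: dict[str, Any], exclude_key_regex: Optional[str] = None) -> dict[str, Any]:
    """Return a copy of state_dict without keys that match the pattern."""
    if exclude_key_regex is None:
        return dict(state_dict)
    pattern = re.compile(exclude_key_regex)
    # Assumption: the pattern is matched from the start of each key.
    return {k: v for k, v in state_dict.items() if not pattern.match(k)}


# Hypothetical keys, chosen only for illustration.
example = {"model.layers.0.norm.weight": 1, "model.layers.0.mixer._extra_state": 2}
filtered = drop_excluded(example, r".*_extra_state")
unfiltered = drop_excluded(example)
```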

```python
nemo_automodel.components.models.nemotron_v3.state_dict_adapter.logger = logging.getLogger(__name__)
```