
# nemo_automodel.components.models.nemotron_omni.state_dict_adapter

State dict adapter for NemotronOmni (NemotronH\_Nano\_Omni\_Reasoning\_V3) models.

Converts between HuggingFace checkpoint format and the custom Automodel format.

HF checkpoint key structure (from `model.safetensors.index.json`):

```text
# Vision encoder (RADIO) -- loaded as-is into self.vision_model
vision_model.radio_model.model.blocks.{N}.{...}
vision_model.radio_model.input_conditioner.norm_mean
vision_model.radio_model.input_conditioner.norm_std
vision_model.radio_model.model.patch_generator.{...}

# Vision projector -- loaded into self.vision_projector
HF:     mlp1.0.weight  (RMSNorm)
Custom: vision_projector.norm.weight
HF:     mlp1.1.weight  (Linear1)
Custom: vision_projector.linear1.weight
HF:     mlp1.3.weight  (Linear2)
Custom: vision_projector.linear2.weight

# Sound encoder (Parakeet) -- loaded into self.sound_encoder
HF:     sound_encoder.encoder.{...}
Custom: sound_encoder.{...}

# Sound projector -- loaded into self.sound_projection
HF:     sound_projection.norm.weight
Custom: sound_projection.norm.weight
HF:     sound_projection.linear1.weight
Custom: sound_projection.linear1.weight
HF:     sound_projection.linear2.weight
Custom: sound_projection.linear2.weight

# LLM (NemotronH) -- uses nemotron_v3 state_dict_adapter internally
HF:     language_model.backbone.embeddings.weight
Custom: language_model.model.embed_tokens.weight
HF:     language_model.backbone.layers.{N}.{...}
Custom: language_model.model.layers.{N}.{...}
HF:     language_model.backbone.norm_f.weight
Custom: language_model.model.norm.weight
HF:     language_model.lm_head.weight
Custom: language_model.lm_head.weight

For MoE layers in the LLM:
HF:     language_model.backbone.layers.{N}.mixer.experts.{E}.up_proj.weight   (split per-expert)
HF:     language_model.backbone.layers.{N}.mixer.experts.{E}.down_proj.weight
Custom: language_model.model.layers.{N}.mixer.experts.gate_and_up_projs       (merged)
Custom: language_model.model.layers.{N}.mixer.experts.down_projs
```
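The one-to-one renames in the table above can be written as a pure key-mapping function. This is an illustrative sketch only: `hf_key_to_custom`, `VISION_PROJ_HF_TO_CUSTOM`, and `LLM_PREFIX_RENAMES` are hypothetical names, not part of the Automodel API, and the per-expert MoE merge is omitted because it needs the tensors, not just the keys.

```python
# Hypothetical sketch of the HF -> custom key renames listed in the table above.

# Vision projector: explicit per-key table (mlp1 index -> submodule name).
VISION_PROJ_HF_TO_CUSTOM = {
    "mlp1.0.weight": "vision_projector.norm.weight",
    "mlp1.1.weight": "vision_projector.linear1.weight",
    "mlp1.3.weight": "vision_projector.linear2.weight",
}

# LLM: (HF prefix, custom prefix) pairs, checked in order; specific ones first.
LLM_PREFIX_RENAMES = [
    ("language_model.backbone.embeddings.", "language_model.model.embed_tokens."),
    ("language_model.backbone.norm_f.", "language_model.model.norm."),
    ("language_model.backbone.", "language_model.model."),
]


def hf_key_to_custom(key: str) -> str:
    """Map one HF checkpoint key to its custom Automodel name (sketch)."""
    if key in VISION_PROJ_HF_TO_CUSTOM:
        return VISION_PROJ_HF_TO_CUSTOM[key]
    if key.startswith("sound_encoder.encoder."):
        # Drop the extra "encoder." level: sound_encoder.encoder.X -> sound_encoder.X
        return "sound_encoder." + key[len("sound_encoder.encoder."):]
    for hf_prefix, custom_prefix in LLM_PREFIX_RENAMES:
        if key.startswith(hf_prefix):
            return custom_prefix + key[len(hf_prefix):]
    # vision_model.*, sound_projection.*, and language_model.lm_head.* pass through.
    return key
```

Order matters in `LLM_PREFIX_RENAMES`: if the generic `backbone.` prefix were checked first, `embeddings.weight` would end up as `language_model.model.embeddings.weight` instead of `embed_tokens.weight`.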

## Module Contents

### Classes

| Name                                                                                                                              | Description                                                                        |
| --------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------- |
| [`NemotronOmniStateDictAdapter`](#nemo_automodel-components-models-nemotron_omni-state_dict_adapter-NemotronOmniStateDictAdapter) | State dict adapter for NemotronOmni (NemotronH\_Nano\_Omni\_Reasoning\_V3) models. |

### Data

[`_VISION_PROJ_CUSTOM_TO_HF`](#nemo_automodel-components-models-nemotron_omni-state_dict_adapter-_VISION_PROJ_CUSTOM_TO_HF)

[`_VISION_PROJ_HF_TO_CUSTOM`](#nemo_automodel-components-models-nemotron_omni-state_dict_adapter-_VISION_PROJ_HF_TO_CUSTOM)

[`logger`](#nemo_automodel-components-models-nemotron_omni-state_dict_adapter-logger)

### API

```python
class nemo_automodel.components.models.nemotron_omni.state_dict_adapter.NemotronOmniStateDictAdapter(
    config,
    llm_config,
    moe_config: nemo_automodel.components.moe.config.MoEConfig,
    backend: nemo_automodel.components.models.common.BackendConfig,
    dtype: torch.dtype = torch.bfloat16
)
```

**Bases:** [StateDictAdapter](/nemo-automodel/nemo_automodel/components/checkpoint/state_dict_adapter#nemo_automodel-components-checkpoint-state_dict_adapter-StateDictAdapter)

State dict adapter for NemotronOmni (NemotronH\_Nano\_Omni\_Reasoning\_V3) models.

Handles conversion between HF checkpoint format and custom Automodel format.

The adapter delegates LLM key conversion to `NemotronV3StateDictAdapter`, which handles the `backbone` -> `model` renaming, `norm_f` -> `norm`, `embeddings` -> `embed_tokens`, and MoE expert merging. It converts the vision and audio components itself.
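The delegation can be pictured as a wrapper that forwards only `language_model.*` keys to an inner LLM adapter. This is a minimal sketch under stated assumptions: `OmniAdapterSketch` and `ToyLLMAdapter` are hypothetical stand-ins, and the idea that the inner adapter sees keys with the `language_model.` prefix stripped is our assumption, not something this page states.

```python
from typing import Any

LLM_PREFIX = "language_model."


class ToyLLMAdapter:
    """Hypothetical stand-in for the inner LLM adapter (only renames backbone -> model)."""

    def from_hf(self, sd: dict[str, Any]) -> dict[str, Any]:
        return {k.replace("backbone.", "model.", 1): v for k, v in sd.items()}


class OmniAdapterSketch:
    """Hypothetical wrapper that delegates only LLM keys to `llm_adapter`."""

    def __init__(self, llm_adapter: ToyLLMAdapter):
        self.llm_adapter = llm_adapter

    def from_hf(self, hf_sd: dict[str, Any]) -> dict[str, Any]:
        # Assumption: strip the "language_model." prefix before delegating ...
        llm_in = {k[len(LLM_PREFIX):]: v for k, v in hf_sd.items() if k.startswith(LLM_PREFIX)}
        others = {k: v for k, v in hf_sd.items() if not k.startswith(LLM_PREFIX)}
        llm_out = self.llm_adapter.from_hf(llm_in)
        # ... and restore it on the converted keys afterwards.
        out = {LLM_PREFIX + k: v for k, v in llm_out.items()}
        # Vision/audio keys pass through unchanged here; the real adapter converts them.
        out.update(others)
        return out
```

Keeping the LLM logic in a separate adapter means the Omni adapter only has to manage prefixes, and the same LLM conversion can be reused for the text-only NemotronH checkpoints.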

```python
nemo_automodel.components.models.nemotron_omni.state_dict_adapter.NemotronOmniStateDictAdapter.convert_single_tensor_to_hf(
    fqn: str,
    tensor: typing.Any,
    kwargs = {}
) -> list[tuple[str, typing.Any]]
```

Convert a single tensor from custom format to HF format.

**Parameters:**

- `fqn` (`str`): Fully qualified name of the tensor.
- `tensor` (`Any`): The tensor to convert.
- `kwargs`: Additional arguments.

**Returns:** `list[tuple[str, Any]]`

List of (fqn, tensor) tuples in HF format
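The return type is a list because one custom tensor can fan out into several HF tensors: per the key table, the merged `gate_and_up_projs` tensor corresponds to one `up_proj.weight` per expert. A minimal sketch, assuming the merged tensor is stacked along a leading expert dimension (modeled here as a plain Python list); whether the real layout also needs a transpose is not stated on this page, and `split_merged_experts` is hypothetical.

```python
import re
from typing import Any

# Matches the merged custom key and captures the layer index.
_MERGED_UP = re.compile(
    r"^language_model\.model\.layers\.(\d+)\.mixer\.experts\.gate_and_up_projs$"
)


def split_merged_experts(fqn: str, tensor: Any) -> list[tuple[str, Any]]:
    """Hypothetical: fan a stacked expert tensor out into per-expert HF keys."""
    m = _MERGED_UP.match(fqn)
    if m is None:
        # Not a merged expert tensor: one-to-one (key renaming omitted in this sketch).
        return [(fqn, tensor)]
    layer = m.group(1)
    prefix = f"language_model.backbone.layers.{layer}.mixer.experts"
    # Assumption: tensor[e] is expert e's weight, already in HF layout.
    return [(f"{prefix}.{e}.up_proj.weight", tensor[e]) for e in range(len(tensor))]
```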

```python
nemo_automodel.components.models.nemotron_omni.state_dict_adapter.NemotronOmniStateDictAdapter.from_hf(
    hf_state_dict: dict[str, typing.Any],
    device_mesh: typing.Optional[torch.distributed.device_mesh.DeviceMesh] = None,
    kwargs = {}
) -> dict[str, typing.Any]
```

Convert HF checkpoint state dict to custom Automodel format.

Steps:

1. Split the HF state dict by key prefix into the `vision_model`, `mlp1`, `sound_encoder`,
   `sound_projection`, and `language_model` components.
2. Convert vision projector keys (`mlp1.*` -> `vision_projector.*`).
3. Convert sound encoder keys (`sound_encoder.encoder.*` -> `sound_encoder.*`).
4. Pass `language_model` keys through `NemotronV3StateDictAdapter`.
5. Merge the converted components, together with the unchanged `vision_model` and
   `sound_projection` keys, into a single state dict.
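Steps 1 and 5 amount to bucketing keys by their top-level prefix and merging the converted buckets. A sketch of that bookkeeping with hypothetical names (`COMPONENT_PREFIXES`, `split_by_component`, `merge_components`); the per-bucket conversions of steps 2 to 4 are left out. This sketch raises on unrecognized keys and on key collisions; how the real adapter treats such cases is not documented here.

```python
from typing import Any

# Top-level key prefixes of the HF checkpoint, one bucket per component.
COMPONENT_PREFIXES = (
    "vision_model.",
    "mlp1.",
    "sound_encoder.",
    "sound_projection.",
    "language_model.",
)


def split_by_component(sd: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Step 1 (sketch): group keys by their top-level component prefix."""
    buckets: dict[str, dict[str, Any]] = {p: {} for p in COMPONENT_PREFIXES}
    for key, value in sd.items():
        prefix = next((p for p in COMPONENT_PREFIXES if key.startswith(p)), None)
        if prefix is None:
            raise KeyError(f"unexpected checkpoint key: {key}")
        buckets[prefix][key] = value
    return buckets


def merge_components(buckets: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Step 5 (sketch): merge converted buckets, refusing silent overwrites."""
    merged: dict[str, Any] = {}
    for bucket in buckets.values():
        clash = merged.keys() & bucket.keys()
        if clash:
            raise ValueError(f"duplicate keys after conversion: {sorted(clash)}")
        merged.update(bucket)
    return merged
```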

**Parameters:**

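- `hf_state_dict` (`dict[str, Any]`): HuggingFace-format state dict.
- `device_mesh` (`Optional[DeviceMesh]`, default `None`): Optional device mesh for distributed expert loading.
- `kwargs`: Additional arguments.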

**Returns:** `dict[str, Any]`

Custom format state dict

```python
nemo_automodel.components.models.nemotron_omni.state_dict_adapter.NemotronOmniStateDictAdapter.to_hf(
    state_dict: dict[str, typing.Any],
    exclude_key_regex: typing.Optional[str] = None,
    kwargs = {}
) -> dict[str, typing.Any]
```

Convert custom Automodel state dict to HF format.

Steps:

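1. Split the state dict into the same components by key prefix.
2. Convert vision projector keys back (`vision_projector.*` -> `mlp1.*`).
3. Convert sound encoder keys back (`sound_encoder.*` -> `sound_encoder.encoder.*`).
4. Pass `language_model` keys through `NemotronV3StateDictAdapter.to_hf`.
5. Merge the converted components, together with the unchanged `vision_model` and
   `sound_projection` keys, into a single HF state dict.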
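Steps 2 and 3 invert the renames applied when loading. A sketch of the reverse mapping for the non-LLM components; `custom_key_to_hf` and `VISION_PROJ_CUSTOM_TO_HF` are hypothetical names (the dictionary is spelled out from the key table, not copied from the module constant), and the LLM branch of step 4 is not modeled.

```python
# Hypothetical reverse of the vision-projector table (custom -> HF).
VISION_PROJ_CUSTOM_TO_HF = {
    "vision_projector.norm.weight": "mlp1.0.weight",
    "vision_projector.linear1.weight": "mlp1.1.weight",
    "vision_projector.linear2.weight": "mlp1.3.weight",
}


def custom_key_to_hf(key: str) -> str:
    """Map a non-LLM custom key back to its HF checkpoint name (sketch)."""
    if key in VISION_PROJ_CUSTOM_TO_HF:
        return VISION_PROJ_CUSTOM_TO_HF[key]
    if key.startswith("sound_encoder."):
        # Re-insert the "encoder." level that was dropped on load.
        return "sound_encoder.encoder." + key[len("sound_encoder."):]
    # vision_model.* and sound_projection.* keep their names; language_model.*
    # keys would go through the LLM adapter instead (not modeled here).
    return key
```

A useful property to test in a real conversion is the round trip: loading an HF key and saving it again should reproduce the original name.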

**Parameters:**

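- `state_dict` (`dict[str, Any]`): Custom-format state dict.
- `exclude_key_regex` (`Optional[str]`, default `None`): Optional regex pattern for keys to exclude.
- `kwargs`: Additional arguments.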

**Returns:** `dict[str, Any]`

HuggingFace format state dict
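This page does not say whether `exclude_key_regex` is applied with `re.match` (anchored at the start of the key) or `re.search`, nor whether it is checked before or after renaming. The sketch below assumes `re.match` on the output HF keys; `filter_keys` is hypothetical.

```python
import re
from typing import Any, Optional


def filter_keys(sd: dict[str, Any], exclude_key_regex: Optional[str] = None) -> dict[str, Any]:
    """Drop entries whose key matches `exclude_key_regex` (sketch; assumes re.match)."""
    if exclude_key_regex is None:
        return dict(sd)
    pattern = re.compile(exclude_key_regex)
    return {k: v for k, v in sd.items() if not pattern.match(k)}
```

Under this assumption, a pattern such as `r"sound_(encoder|projection)\."` would drop the audio weights from an export, while an unanchored fragment like `r"encoder"` would match nothing, because `re.match` only matches at the start of the key.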

```python
nemo_automodel.components.models.nemotron_omni.state_dict_adapter._VISION_PROJ_CUSTOM_TO_HF = {v: k for k, v in _VISION_PROJ_HF_TO_CUSTOM.items()}
```
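`_VISION_PROJ_CUSTOM_TO_HF` is built by plain dictionary inversion, which is lossless only if no two HF keys map to the same custom key; otherwise later entries silently overwrite earlier ones. A hypothetical guarded variant (`invert_mapping` is not part of the module) that fails loudly instead:

```python
from collections.abc import Hashable, Mapping


def invert_mapping(forward: Mapping[Hashable, Hashable]) -> dict:
    """Invert a one-to-one mapping, raising instead of silently dropping entries."""
    inverse = {v: k for k, v in forward.items()}
    if len(inverse) != len(forward):
        # Two forward keys shared a value, so one of them was overwritten.
        raise ValueError("mapping is not one-to-one; inversion would lose keys")
    return inverse
```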

```python
nemo_automodel.components.models.nemotron_omni.state_dict_adapter._VISION_PROJ_HF_TO_CUSTOM = {'mlp1.0.weight': 'vision_projector.norm.weight', 'mlp1.1.weight': 'vision_proje...
```

```python
nemo_automodel.components.models.nemotron_omni.state_dict_adapter.logger = logging.getLogger(__name__)
```