nemo_automodel.components.models.nemotron_omni.state_dict_adapter#

State dict adapter for NemotronOmni (NemotronH_Nano_Omni_Reasoning_V3) models.

Converts between HuggingFace checkpoint format and the custom Automodel format.

HF checkpoint key structure (from model.safetensors.index.json):

# Vision encoder (RADIO) -- loaded as-is into self.vision_model
vision_model.radio_model.model.blocks.{N}.{…}
vision_model.radio_model.input_conditioner.norm_mean
vision_model.radio_model.input_conditioner.norm_std
vision_model.radio_model.model.patch_generator.{…}

# Vision projector -- loaded into self.vision_projector
HF:     mlp1.0.weight  (RMSNorm)
Custom: vision_projector.norm.weight
HF:     mlp1.1.weight  (Linear1)
Custom: vision_projector.linear1.weight
HF:     mlp1.3.weight  (Linear2)
Custom: vision_projector.linear2.weight

# Sound encoder (Parakeet) -- loaded into self.sound_encoder
HF:     sound_encoder.encoder.{...}
Custom: sound_encoder.{...}

# Sound projector -- loaded into self.sound_projection
HF:     sound_projection.norm.weight
Custom: sound_projection.norm.weight
HF:     sound_projection.linear1.weight
Custom: sound_projection.linear1.weight
HF:     sound_projection.linear2.weight
Custom: sound_projection.linear2.weight

# LLM (NemotronH) -- uses nemotron_v3 state_dict_adapter internally
HF:     language_model.backbone.embeddings.weight
Custom: language_model.model.embed_tokens.weight
HF:     language_model.backbone.layers.{N}.{...}
Custom: language_model.model.layers.{N}.{...}
HF:     language_model.backbone.norm_f.weight
Custom: language_model.model.norm.weight
HF:     language_model.lm_head.weight
Custom: language_model.lm_head.weight

For MoE layers in the LLM:
HF:     language_model.backbone.layers.{N}.mixer.experts.{E}.up_proj.weight   (split per-expert)
HF:     language_model.backbone.layers.{N}.mixer.experts.{E}.down_proj.weight
Custom: language_model.model.layers.{N}.mixer.experts.gate_and_up_projs       (merged)
Custom: language_model.model.layers.{N}.mixer.experts.down_projs
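
The vision and sound renames listed above amount to a small lookup table plus one prefix rewrite. The snippet below is only an illustrative sketch: `HF_TO_CUSTOM_VISION_PROJ` and `rename_non_llm_key` are hypothetical names chosen for this example, not the module's own helpers, and the mapping is copied from the listing above.

```python
# Illustrative sketch only; the names below are hypothetical, not the adapter's API.
HF_TO_CUSTOM_VISION_PROJ = {
    "mlp1.0.weight": "vision_projector.norm.weight",
    "mlp1.1.weight": "vision_projector.linear1.weight",
    "mlp1.3.weight": "vision_projector.linear2.weight",
}

HF_SOUND_PREFIX = "sound_encoder.encoder."
CUSTOM_SOUND_PREFIX = "sound_encoder."


def rename_non_llm_key(hf_key: str) -> str:
    """Map one HF key of a vision or sound component to its custom name."""
    if hf_key in HF_TO_CUSTOM_VISION_PROJ:
        return HF_TO_CUSTOM_VISION_PROJ[hf_key]
    if hf_key.startswith(HF_SOUND_PREFIX):
        return CUSTOM_SOUND_PREFIX + hf_key[len(HF_SOUND_PREFIX):]
    # vision_model.* and sound_projection.* keep the same name in both formats.
    return hf_key
```

Keys under `language_model.` are deliberately left out of this sketch, since the listing says they are handled by the nemotron_v3 adapter.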

Module Contents#

Classes#

NemotronOmniStateDictAdapter

State dict adapter for NemotronOmni (NemotronH_Nano_Omni_Reasoning_V3) models.

Data#

API#

nemo_automodel.components.models.nemotron_omni.state_dict_adapter.logger#

‘getLogger(…)’

nemo_automodel.components.models.nemotron_omni.state_dict_adapter._VISION_PROJ_HF_TO_CUSTOM#

None

nemo_automodel.components.models.nemotron_omni.state_dict_adapter._VISION_PROJ_CUSTOM_TO_HF#

None

class nemo_automodel.components.models.nemotron_omni.state_dict_adapter.NemotronOmniStateDictAdapter(
config,
llm_config,
moe_config: nemo_automodel.components.moe.config.MoEConfig,
backend: nemo_automodel.components.models.common.BackendConfig,
dtype: torch.dtype = torch.bfloat16,
)#

Bases: nemo_automodel.components.checkpoint.state_dict_adapter.StateDictAdapter

State dict adapter for NemotronOmni (NemotronH_Nano_Omni_Reasoning_V3) models.

Handles conversion between HF checkpoint format and custom Automodel format.

The adapter delegates LLM key conversion to NemotronV3StateDictAdapter (which handles backbone->model renaming, norm_f->norm, embeddings->embed_tokens, and MoE expert merging) and handles vision/audio components directly.
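
The LLM key renames named in this description can be sketched as a plain string rewrite. This is an illustration under assumptions, not the code of NemotronV3StateDictAdapter: `rename_llm_key` is a hypothetical helper, it operates on the fully prefixed key names shown in the module docstring, and it omits MoE expert merging entirely.

```python
# Illustrative sketch only: mirrors the renames named above and is not
# NemotronV3StateDictAdapter itself. MoE expert merging is not covered here.
HF_LLM_PREFIX = "language_model.backbone."
CUSTOM_LLM_PREFIX = "language_model.model."


def rename_llm_key(hf_key: str) -> str:
    """Apply backbone->model, embeddings->embed_tokens, and norm_f->norm."""
    if not hf_key.startswith(HF_LLM_PREFIX):
        # For example, language_model.lm_head.weight is the same in both formats.
        return hf_key
    rest = hf_key[len(HF_LLM_PREFIX):]
    if rest == "embeddings.weight":
        rest = "embed_tokens.weight"
    elif rest == "norm_f.weight":
        rest = "norm.weight"
    return CUSTOM_LLM_PREFIX + rest
```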

Initialization

Initialize the state dict adapter.

Parameters:
  • config – Top-level NemotronOmni config

  • llm_config – LLM sub-config (NemotronHConfig)

  • moe_config – MoE configuration

  • backend – Backend configuration

  • dtype – Target dtype

from_hf(
hf_state_dict: dict[str, Any],
device_mesh: Optional[torch.distributed.device_mesh.DeviceMesh] = None,
**kwargs,
) -> dict[str, Any]#

Convert HF checkpoint state dict to custom Automodel format.

Steps:

  1. Separate the HF state dict into its vision_model, mlp1, sound_encoder, sound_projection, and language_model components

  2. Convert vision projector keys (mlp1.* -> vision_projector.*)

  3. Convert sound encoder keys (sound_encoder.encoder.* -> sound_encoder.*)

  4. Pass language_model keys through NemotronV3StateDictAdapter

  5. Merge the converted components back into a single state dict

Parameters:
  • hf_state_dict – HuggingFace format state dict

  • device_mesh – Optional device mesh for distributed expert loading

  • **kwargs – Additional arguments

Returns:

Custom format state dict
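
The five steps of from_hf can be sketched roughly as follows. This is a simplified illustration, not the adapter's implementation: `from_hf_sketch` and `VISION_PROJ_MAP` are hypothetical names, `convert_llm` is a stand-in callable for the delegated NemotronV3StateDictAdapter conversion, and device-mesh handling is omitted.

```python
from typing import Any, Callable

# Illustrative sketch only; every name here is hypothetical.
VISION_PROJ_MAP = {
    "mlp1.0.weight": "vision_projector.norm.weight",
    "mlp1.1.weight": "vision_projector.linear1.weight",
    "mlp1.3.weight": "vision_projector.linear2.weight",
}
HF_SOUND_PREFIX = "sound_encoder.encoder."


def from_hf_sketch(
    hf_state_dict: dict[str, Any],
    convert_llm: Callable[[dict[str, Any]], dict[str, Any]],
) -> dict[str, Any]:
    llm_part: dict[str, Any] = {}
    out: dict[str, Any] = {}
    for key, value in hf_state_dict.items():
        if key.startswith("language_model."):
            llm_part[key] = value  # step 1: set aside for step 4
        elif key in VISION_PROJ_MAP:
            out[VISION_PROJ_MAP[key]] = value  # step 2
        elif key.startswith(HF_SOUND_PREFIX):
            out["sound_encoder." + key[len(HF_SOUND_PREFIX):]] = value  # step 3
        else:
            out[key] = value  # vision_model.* and sound_projection.* pass through
    out.update(convert_llm(llm_part))  # steps 4 and 5
    return out
```

Passing the LLM conversion in as a callable keeps the sketch independent of the real delegate; in a quick experiment an identity function such as `lambda d: d` is enough to exercise the non-LLM paths.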

to_hf(
state_dict: dict[str, Any],
exclude_key_regex: Optional[str] = None,
**kwargs,
) -> dict[str, Any]#

Convert custom Automodel state dict to HF format.

Steps:

  1. Separate the custom state dict into its vision, sound, and language_model components

  2. Convert vision projector keys back (vision_projector.* -> mlp1.*)

  3. Convert sound encoder keys back (sound_encoder.* -> sound_encoder.encoder.*)

  4. Pass LLM keys through NemotronV3StateDictAdapter.to_hf

  5. Merge the converted components back into a single state dict

Parameters:
  • state_dict – Custom format state dict

  • exclude_key_regex – Optional regex pattern to exclude keys

  • **kwargs – Additional arguments

Returns:

HuggingFace format state dict
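
The reverse direction can be sketched in the same way. The snippet is an illustration under assumptions: `to_hf_sketch` and `CUSTOM_TO_HF_VISION_PROJ` are hypothetical names, `convert_llm_to_hf` stands in for NemotronV3StateDictAdapter.to_hf, and applying `exclude_key_regex` with `re.match` against the converted HF key names is a guess at the semantics rather than documented behavior.

```python
import re
from typing import Any, Callable, Optional

# Illustrative sketch only; every name here is hypothetical.
CUSTOM_TO_HF_VISION_PROJ = {
    "vision_projector.norm.weight": "mlp1.0.weight",
    "vision_projector.linear1.weight": "mlp1.1.weight",
    "vision_projector.linear2.weight": "mlp1.3.weight",
}


def to_hf_sketch(
    state_dict: dict[str, Any],
    convert_llm_to_hf: Callable[[dict[str, Any]], dict[str, Any]],
    exclude_key_regex: Optional[str] = None,
) -> dict[str, Any]:
    llm_part: dict[str, Any] = {}
    out: dict[str, Any] = {}
    for key, value in state_dict.items():
        if key.startswith("language_model."):
            llm_part[key] = value  # step 1: set aside for step 4
        elif key in CUSTOM_TO_HF_VISION_PROJ:
            out[CUSTOM_TO_HF_VISION_PROJ[key]] = value  # step 2
        elif key.startswith("sound_encoder."):
            out["sound_encoder.encoder." + key[len("sound_encoder."):]] = value  # step 3
        else:
            out[key] = value  # vision_model.* and sound_projection.* pass through
    out.update(convert_llm_to_hf(llm_part))  # steps 4 and 5
    if exclude_key_regex is not None:
        # ASSUMPTION: keys whose HF name matches the pattern are dropped.
        pattern = re.compile(exclude_key_regex)
        out = {k: v for k, v in out.items() if not pattern.match(k)}
    return out
```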

convert_single_tensor_to_hf(
fqn: str,
tensor: Any,
**kwargs,
) -> list[tuple[str, Any]]#

Convert a single tensor from custom format to HF format.

Parameters:
  • fqn – Fully qualified name of the tensor

  • tensor – The tensor to convert

  • **kwargs – Additional arguments

Returns:

List of (fqn, tensor) tuples in HF format
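
A list return type fits the key structure described in the module docstring, where one merged expert tensor in the custom format corresponds to several per-expert tensors in the HF format. The sketch below illustrates that shape only. `convert_single_tensor_sketch` is a hypothetical function, the assumption that the first axis of `down_projs` indexes experts is not confirmed by this page, and the real method may also transpose, gather sharded tensors, or rename keys that this sketch passes through unchanged.

```python
import re
from typing import Any

# Illustrative sketch only; every name here is hypothetical.
CUSTOM_TO_HF_VISION_PROJ = {
    "vision_projector.norm.weight": "mlp1.0.weight",
    "vision_projector.linear1.weight": "mlp1.1.weight",
    "vision_projector.linear2.weight": "mlp1.3.weight",
}
DOWN_PROJS = re.compile(
    r"^language_model\.model\.layers\.(\d+)\.mixer\.experts\.down_projs$"
)


def convert_single_tensor_sketch(fqn: str, tensor: Any) -> list[tuple[str, Any]]:
    if fqn in CUSTOM_TO_HF_VISION_PROJ:
        # One-to-one rename: a single-element list.
        return [(CUSTOM_TO_HF_VISION_PROJ[fqn], tensor)]
    match = DOWN_PROJS.match(fqn)
    if match:
        layer = match.group(1)
        # ASSUMPTION: iterating the merged tensor yields one slice per expert.
        return [
            (
                f"language_model.backbone.layers.{layer}"
                f".mixer.experts.{expert_idx}.down_proj.weight",
                expert_slice,
            )
            for expert_idx, expert_slice in enumerate(tensor)
        ]
    # Other renames are omitted from this sketch; the key passes through as-is.
    return [(fqn, tensor)]
```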