nemo_automodel.components.models.nemotron_omni.state_dict_adapter
State dict adapter for NemotronOmni (NemotronH_Nano_Omni_Reasoning_V3) models.
Converts between HuggingFace checkpoint format and the custom Automodel format.
HF checkpoint key structure (from model.safetensors.index.json):
Vision encoder (RADIO) — loaded as-is into self.vision_model
vision_model.radio_model.model.blocks.{N}.{…}
vision_model.radio_model.input_conditioner.norm_mean
vision_model.radio_model.input_conditioner.norm_std
vision_model.radio_model.model.patch_generator.{…}
Vision projector — loaded into self.vision_projector
HF: mlp1.0.weight (RMSNorm) -> Custom: vision_projector.norm.weight
HF: mlp1.1.weight (Linear1) -> Custom: vision_projector.linear1.weight
HF: mlp1.3.weight (Linear2) -> Custom: vision_projector.linear2.weight
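The vision projector renaming above can be sketched as a prefix lookup. This is a minimal illustration, not the adapter's actual code: the table name `VISION_PROJECTOR_HF_TO_CUSTOM` and the function `rename_vision_projector_key` are hypothetical, and only the three key pairs come from the listing.

```python
# Hypothetical lookup table built from the three key pairs listed above.
VISION_PROJECTOR_HF_TO_CUSTOM = {
    "mlp1.0.": "vision_projector.norm.",
    "mlp1.1.": "vision_projector.linear1.",
    "mlp1.3.": "vision_projector.linear2.",
}


def rename_vision_projector_key(hf_key: str) -> str:
    """Return the custom-format key, or the key unchanged if no prefix matches."""
    for hf_prefix, custom_prefix in VISION_PROJECTOR_HF_TO_CUSTOM.items():
        if hf_key.startswith(hf_prefix):
            return custom_prefix + hf_key[len(hf_prefix):]
    return hf_key
```

Matching on the prefix including the trailing dot keeps `mlp1.1.` from accidentally matching a longer index such as `mlp1.10.`.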
Sound encoder (Parakeet) — loaded into self.sound_encoder
HF: sound_encoder.encoder.{…} -> Custom: sound_encoder.{…}
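The sound encoder mapping only drops (or restores) one `encoder.` path segment. A minimal sketch of both directions follows; the function names are hypothetical, and the sample key in the usage note is invented for illustration.

```python
HF_SOUND_PREFIX = "sound_encoder.encoder."
CUSTOM_SOUND_PREFIX = "sound_encoder."


def sound_key_from_hf(hf_key: str) -> str:
    """HF -> custom: drop the extra 'encoder.' segment if present."""
    if hf_key.startswith(HF_SOUND_PREFIX):
        return CUSTOM_SOUND_PREFIX + hf_key[len(HF_SOUND_PREFIX):]
    return hf_key


def sound_key_to_hf(custom_key: str) -> str:
    """Custom -> HF: re-insert 'encoder.'. Assumes the input is a custom-format key."""
    if custom_key.startswith(CUSTOM_SOUND_PREFIX):
        return HF_SOUND_PREFIX + custom_key[len(CUSTOM_SOUND_PREFIX):]
    return custom_key
```

For a made-up key such as `sound_encoder.encoder.layers.0.weight`, `sound_key_from_hf` yields `sound_encoder.layers.0.weight`, and `sound_key_to_hf` maps it back. Note that `sound_key_to_hf` must only be given custom-format keys, since it cannot tell whether the segment was already restored.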
Sound projector — loaded into self.sound_projection
HF: sound_projection.norm.weight -> Custom: sound_projection.norm.weight
HF: sound_projection.linear1.weight -> Custom: sound_projection.linear1.weight
HF: sound_projection.linear2.weight -> Custom: sound_projection.linear2.weight
LLM (NemotronH) — uses nemotron_v3 state_dict_adapter internally
HF: language_model.backbone.embeddings.weight -> Custom: language_model.model.embed_tokens.weight
HF: language_model.backbone.layers.{N}.{…} -> Custom: language_model.model.layers.{N}.{…}
HF: language_model.backbone.norm_f.weight -> Custom: language_model.model.norm.weight
HF: language_model.lm_head.weight -> Custom: language_model.lm_head.weight
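The LLM key pairs above amount to three prefix rules, applied most-specific first. The sketch below only illustrates that ordering; in the real module this work is delegated to the nemotron_v3 adapter, and the names `LLM_RENAME_RULES` and `rename_llm_key` are hypothetical.

```python
# Ordered from most specific to least specific, so that the generic
# 'backbone.' rule does not pre-empt the embeddings and norm_f rules.
LLM_RENAME_RULES = [
    ("language_model.backbone.embeddings.", "language_model.model.embed_tokens."),
    ("language_model.backbone.norm_f.", "language_model.model.norm."),
    ("language_model.backbone.", "language_model.model."),
]


def rename_llm_key(hf_key: str) -> str:
    """Apply the first matching prefix rule; leave other keys (e.g. lm_head) alone."""
    for old_prefix, new_prefix in LLM_RENAME_RULES:
        if hf_key.startswith(old_prefix):
            return new_prefix + hf_key[len(old_prefix):]
    return hf_key
```

Keys under `language_model.lm_head.` match no rule and pass through unchanged, which agrees with the last pair in the listing.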
For MoE layers in the LLM:
HF: language_model.backbone.layers.{N}.mixer.experts.{E}.up_proj.weight (split per-expert)
HF: language_model.backbone.layers.{N}.mixer.experts.{E}.down_proj.weight
Custom: language_model.model.layers.{N}.mixer.experts.gate_and_up_projs (merged)
Custom: language_model.model.layers.{N}.mixer.experts.down_projs
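Before per-expert tensors can be merged, the split HF keys have to be grouped by layer and projection and ordered by expert index. The sketch below covers only that bookkeeping step and makes no claim about how the tensors themselves are combined (stacking order, transposes, and sharding are handled by the nemotron_v3 adapter and are not described here). The names `EXPERT_KEY` and `group_expert_keys` are hypothetical.

```python
import re
from collections import defaultdict

# Matches the per-expert HF keys shown above and captures the layer prefix,
# the expert index {E}, and the projection name.
EXPERT_KEY = re.compile(
    r"^(?P<prefix>language_model\.backbone\.layers\.\d+\.mixer\.experts)"
    r"\.(?P<expert>\d+)\.(?P<proj>up_proj|down_proj)\.weight$"
)


def group_expert_keys(hf_keys):
    """Map (layer prefix, projection) -> list of keys sorted by numeric expert index."""
    groups = defaultdict(dict)
    for key in hf_keys:
        match = EXPERT_KEY.match(key)
        if match:
            group = (match["prefix"], match["proj"])
            groups[group][int(match["expert"])] = key
    return {
        group: [by_expert[e] for e in sorted(by_expert)]
        for group, by_expert in groups.items()
    }
```

Sorting on the integer index matters: a plain string sort would place expert 10 before expert 2, which would silently permute experts in any merged tensor built from the result.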
API
Bases: StateDictAdapter
State dict adapter for NemotronOmni (NemotronH_Nano_Omni_Reasoning_V3) models.
Handles conversion between HF checkpoint format and custom Automodel format.
The adapter delegates LLM key conversion to NemotronV3StateDictAdapter (which handles backbone->model renaming, norm_f->norm, embeddings->embed_tokens, and MoE expert merging) and handles vision/audio components directly.
Convert a single tensor from custom format to HF format.
Parameters:
Fully qualified name of the tensor
The tensor to convert
Additional arguments
Returns: list[tuple[str, Any]]
List of (fqn, tensor) tuples in HF format
Convert HF checkpoint state dict to custom Automodel format.
Steps:
- Separate HF state dict into: vision_model, mlp1, sound_encoder, sound_projection, language_model components
- Convert vision projector keys (mlp1.* -> vision_projector.*)
- Convert sound encoder keys (sound_encoder.encoder.* -> sound_encoder.*)
- Pass language_model keys through NemotronV3StateDictAdapter
- Merge everything back
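The first step, separating the state dict into components, can be sketched as a partition by top-level key prefix. This is an illustration under assumptions: the helper name `split_by_component` and the catch-all `"other"` bucket are hypothetical, and the placeholder values in the usage note stand in for tensors.

```python
# Top-level prefixes taken from the component list in the steps above.
COMPONENT_PREFIXES = (
    "vision_model.",
    "mlp1.",
    "sound_encoder.",
    "sound_projection.",
    "language_model.",
)


def split_by_component(state_dict):
    """Partition a flat state dict into one sub-dict per component prefix."""
    buckets = {prefix: {} for prefix in COMPONENT_PREFIXES}
    buckets["other"] = {}  # hypothetical catch-all for unrecognized keys
    for key, value in state_dict.items():
        for prefix in COMPONENT_PREFIXES:
            if key.startswith(prefix):
                buckets[prefix][key] = value
                break
        else:
            buckets["other"][key] = value
    return buckets
```

Calling `split_by_component({"mlp1.0.weight": 0, "language_model.lm_head.weight": 1})` places the first entry in the `"mlp1."` bucket and the second in the `"language_model."` bucket. Each bucket can then be converted independently and the results merged with `dict.update`.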
Parameters:
HuggingFace format state dict
Optional device mesh for distributed expert loading
Additional arguments
Returns: dict[str, Any]
Custom format state dict
Convert custom Automodel state dict to HF format.
Steps:
- Separate state dict into components
- Convert vision projector keys back (vision_projector.* -> mlp1.*)
- Convert sound encoder keys back (sound_encoder.* -> sound_encoder.encoder.*)
- Pass LLM keys through NemotronV3StateDictAdapter.to_hf
- Merge everything back
Parameters:
Custom format state dict
Optional regex pattern to exclude keys
Additional arguments
Returns: dict[str, Any]
HuggingFace format state dict
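The optional exclusion pattern accepted when converting to HF format could behave like the filter below. This is a sketch under stated assumptions: the page does not say whether the adapter matches with `re.search` or `re.match`, or whether the pattern is tested against custom-format or HF-format keys. The version here assumes `re.search` against the final HF keys, and the helper name `drop_excluded_keys` is hypothetical.

```python
import re
from typing import Any, Optional


def drop_excluded_keys(
    hf_state_dict: dict[str, Any], exclude_key_regex: Optional[str] = None
) -> dict[str, Any]:
    """Return a copy of the state dict without keys matching the pattern.

    Assumption: the pattern may match anywhere in the key (re.search).
    """
    if exclude_key_regex is None:
        return dict(hf_state_dict)
    pattern = re.compile(exclude_key_regex)
    return {k: v for k, v in hf_state_dict.items() if not pattern.search(k)}
```

For instance, passing `r"^vision_model\."` would omit the vision encoder weights from the output while leaving every other component untouched.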