nemo_automodel.components.models.nemotron_omni.state_dict_adapter
State dict adapter for NemotronOmni (NemotronH_Nano_Omni_Reasoning_V3) models.
Converts between HuggingFace checkpoint format and the custom Automodel format.
HF checkpoint key structure (from model.safetensors.index.json):
# Vision encoder (RADIO) -- loaded as-is into self.vision_model
vision_model.radio_model.model.blocks.{N}.{...}
vision_model.radio_model.input_conditioner.norm_mean
vision_model.radio_model.input_conditioner.norm_std
vision_model.radio_model.model.patch_generator.{...}
# Vision projector -- loaded into self.vision_projector
HF: mlp1.0.weight (RMSNorm)
Custom: vision_projector.norm.weight
HF: mlp1.1.weight (Linear1)
Custom: vision_projector.linear1.weight
HF: mlp1.3.weight (Linear2)
Custom: vision_projector.linear2.weight
# Sound encoder (Parakeet) -- loaded into self.sound_encoder
HF: sound_encoder.encoder.{...}
Custom: sound_encoder.{...}
# Sound projector -- loaded into self.sound_projection
HF: sound_projection.norm.weight
Custom: sound_projection.norm.weight
HF: sound_projection.linear1.weight
Custom: sound_projection.linear1.weight
HF: sound_projection.linear2.weight
Custom: sound_projection.linear2.weight
# LLM (NemotronH) -- uses nemotron_v3 state_dict_adapter internally
HF: language_model.backbone.embeddings.weight
Custom: language_model.model.embed_tokens.weight
HF: language_model.backbone.layers.{N}.{...}
Custom: language_model.model.layers.{N}.{...}
HF: language_model.backbone.norm_f.weight
Custom: language_model.model.norm.weight
HF: language_model.lm_head.weight
Custom: language_model.lm_head.weight
For MoE layers in the LLM:
HF: language_model.backbone.layers.{N}.mixer.experts.{E}.up_proj.weight (split per-expert)
HF: language_model.backbone.layers.{N}.mixer.experts.{E}.down_proj.weight
Custom: language_model.model.layers.{N}.mixer.experts.gate_and_up_projs (merged)
Custom: language_model.model.layers.{N}.mixer.experts.down_projs
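The per-expert to merged key layout above can be illustrated with a minimal sketch. It shows only how keys might be grouped: tensors are replaced by plain Python values, and the real conversion (delegated to the nemotron_v3 adapter) may also stack, transpose, or shard the weights. The helper name `group_expert_keys` and the `_MERGED_NAME` table are hypothetical and are not part of this module.

```python
import re
from collections import defaultdict

# Matches HF per-expert keys such as
# language_model.backbone.layers.3.mixer.experts.7.up_proj.weight
_EXPERT_KEY = re.compile(
    r"^language_model\.backbone\.layers\.(\d+)\.mixer\.experts\.(\d+)"
    r"\.(up_proj|down_proj)\.weight$"
)

# Assumed pairing of HF projection names with the merged custom parameters.
_MERGED_NAME = {"up_proj": "gate_and_up_projs", "down_proj": "down_projs"}


def group_expert_keys(hf_state_dict):
    """Collect per-expert weights into one expert-ordered list per merged key."""
    buckets = defaultdict(dict)  # merged key -> {expert index: weight}
    for key, weight in hf_state_dict.items():
        match = _EXPERT_KEY.match(key)
        if match is None:
            continue  # non-expert keys are not handled in this sketch
        layer, expert, proj = int(match.group(1)), int(match.group(2)), match.group(3)
        merged_key = (
            f"language_model.model.layers.{layer}.mixer.experts.{_MERGED_NAME[proj]}"
        )
        buckets[merged_key][expert] = weight
    # Sort by expert index so list position E holds the weight of expert E.
    return {k: [v[e] for e in sorted(v)] for k, v in buckets.items()}


# Strings stand in for tensors; insertion order is deliberately shuffled.
hf = {
    "language_model.backbone.layers.0.mixer.experts.1.up_proj.weight": "up_e1",
    "language_model.backbone.layers.0.mixer.experts.0.up_proj.weight": "up_e0",
    "language_model.backbone.layers.0.mixer.experts.0.down_proj.weight": "down_e0",
    "language_model.backbone.layers.0.mixer.experts.1.down_proj.weight": "down_e1",
}
merged = group_expert_keys(hf)
```

With real tensors, each list would typically be combined into a single tensor with a leading expert dimension. The exact layout is decided by the delegated adapter, not by this sketch.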
Module Contents
Classes
NemotronOmniStateDictAdapter | State dict adapter for NemotronOmni (NemotronH_Nano_Omni_Reasoning_V3) models.
Data
logger
_VISION_PROJ_HF_TO_CUSTOM
_VISION_PROJ_CUSTOM_TO_HF
API
- nemo_automodel.components.models.nemotron_omni.state_dict_adapter.logger
'getLogger(...)'
- nemo_automodel.components.models.nemotron_omni.state_dict_adapter._VISION_PROJ_HF_TO_CUSTOM
None
- nemo_automodel.components.models.nemotron_omni.state_dict_adapter._VISION_PROJ_CUSTOM_TO_HF
None
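The two mapping tables are rendered as None on this page, so their actual contents are not shown. Going only by the vision projector mapping in the module docstring, a plausible shape is a forward dictionary plus its inverse. The contents below are an assumption reconstructed from that docstring, and `rename_keys` is a hypothetical helper.

```python
# Assumed contents, reconstructed from the module docstring. The real tables
# may map prefixes rather than full keys.
VISION_PROJ_HF_TO_CUSTOM = {
    "mlp1.0.weight": "vision_projector.norm.weight",     # RMSNorm
    "mlp1.1.weight": "vision_projector.linear1.weight",  # Linear1
    "mlp1.3.weight": "vision_projector.linear2.weight",  # Linear2
}
# The reverse direction is the inverse of the forward table.
VISION_PROJ_CUSTOM_TO_HF = {v: k for k, v in VISION_PROJ_HF_TO_CUSTOM.items()}


def rename_keys(state_dict, table):
    """Rename keys found in `table`; leave every other key untouched."""
    return {table.get(key, key): value for key, value in state_dict.items()}


# Integers stand in for tensors.
hf = {
    "mlp1.0.weight": 0,
    "mlp1.3.weight": 1,
    "vision_model.radio_model.input_conditioner.norm_mean": 2,
}
custom = rename_keys(hf, VISION_PROJ_HF_TO_CUSTOM)
round_trip = rename_keys(custom, VISION_PROJ_CUSTOM_TO_HF)
```

Deriving the reverse table from the forward one keeps the two directions from drifting apart, provided the forward mapping is one-to-one.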
- class nemo_automodel.components.models.nemotron_omni.state_dict_adapter.NemotronOmniStateDictAdapter(
- config,
- llm_config,
- moe_config: nemo_automodel.components.moe.config.MoEConfig,
- backend: nemo_automodel.components.models.common.BackendConfig,
- dtype: torch.dtype = torch.bfloat16,
- )
Bases: nemo_automodel.components.checkpoint.state_dict_adapter.StateDictAdapter
State dict adapter for NemotronOmni (NemotronH_Nano_Omni_Reasoning_V3) models.
Handles conversion between HF checkpoint format and custom Automodel format.
The adapter delegates LLM key conversion to NemotronV3StateDictAdapter (which handles backbone->model renaming, norm_f->norm, embeddings->embed_tokens, and MoE expert merging) and handles vision/audio components directly.
Initialization
Initialize the state dict adapter.
- Parameters:
config -- Top-level NemotronOmni config
llm_config -- LLM sub-config (NemotronHConfig)
moe_config -- MoE configuration
backend -- Backend configuration
dtype -- Target dtype
- from_hf(
- hf_state_dict: dict[str, Any],
- device_mesh: Optional[torch.distributed.device_mesh.DeviceMesh] = None,
- **kwargs,
- )
Convert HF checkpoint state dict to custom Automodel format.
Steps:
1. Separate the HF state dict into vision_model, mlp1, sound_encoder, sound_projection, and language_model components
2. Convert vision projector keys (mlp1.* -> vision_projector.*)
3. Convert sound encoder keys (sound_encoder.encoder.* -> sound_encoder.*)
4. Pass language_model keys through NemotronV3StateDictAdapter
5. Merge everything back
- Parameters:
hf_state_dict -- HuggingFace format state dict
device_mesh -- Optional device mesh for distributed expert loading
**kwargs -- Additional arguments
- Returns:
Custom format state dict
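The from_hf steps listed above can be sketched with plain dictionaries. This is a sketch under stated assumptions, not the adapter's implementation: `from_hf_sketch` and `convert_llm_keys` are hypothetical names, `convert_llm_keys` only mimics the top-level renames described for the delegated NemotronV3StateDictAdapter (no MoE merging), and the layer-level key suffixes in the example are made up for illustration.

```python
# Assumed vision projector table, taken from the module docstring.
VISION_PROJ = {
    "mlp1.0.weight": "vision_projector.norm.weight",
    "mlp1.1.weight": "vision_projector.linear1.weight",
    "mlp1.3.weight": "vision_projector.linear2.weight",
}
SOUND_HF_PREFIX = "sound_encoder.encoder."


def convert_llm_keys(llm_state_dict):
    """Stand-in for the delegated LLM conversion (top-level renames only)."""
    out = {}
    for key, value in llm_state_dict.items():
        key = key.replace("language_model.backbone.", "language_model.model.", 1)
        key = key.replace(
            "language_model.model.embeddings.", "language_model.model.embed_tokens.", 1
        )
        key = key.replace("language_model.model.norm_f.", "language_model.model.norm.", 1)
        out[key] = value
    return out


def from_hf_sketch(hf_state_dict):
    llm, out = {}, {}
    for key, value in hf_state_dict.items():
        if key.startswith("language_model."):
            llm[key] = value  # steps 1 and 4: set aside for the LLM adapter
        elif key.startswith("mlp1."):
            out[VISION_PROJ.get(key, key)] = value  # step 2
        elif key.startswith(SOUND_HF_PREFIX):
            out["sound_encoder." + key[len(SOUND_HF_PREFIX):]] = value  # step 3
        else:
            out[key] = value  # vision_model.* and sound_projection.* pass through
    out.update(convert_llm_keys(llm))  # step 5: merge everything back
    return out


# Strings stand in for tensors; the layer-level suffixes are hypothetical.
hf = {
    "vision_model.radio_model.input_conditioner.norm_mean": "a",
    "mlp1.1.weight": "b",
    "sound_encoder.encoder.layers.0.weight": "c",
    "sound_projection.linear1.weight": "d",
    "language_model.backbone.embeddings.weight": "e",
    "language_model.backbone.norm_f.weight": "f",
    "language_model.lm_head.weight": "g",
}
custom = from_hf_sketch(hf)
```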
- to_hf(
- state_dict: dict[str, Any],
- exclude_key_regex: Optional[str] = None,
- **kwargs,
- )
Convert custom Automodel state dict to HF format.
Steps:
1. Separate the state dict into components
2. Convert vision projector keys back (vision_projector.* -> mlp1.*)
3. Convert sound encoder keys back (sound_encoder.* -> sound_encoder.encoder.*)
4. Pass LLM keys through NemotronV3StateDictAdapter.to_hf
5. Merge everything back
- Parameters:
state_dict -- Custom format state dict
exclude_key_regex -- Optional regex pattern to exclude keys
**kwargs -- Additional arguments
- Returns:
HuggingFace format state dict
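As a rough illustration of the reverse direction, the sketch below restores the sound encoder prefix and applies an exclusion pattern. The page does not say how exclude_key_regex is matched, so two assumptions are made here: the pattern is matched with `re.match`, and it is applied to the converted (HF) key. `to_hf_sketch` is a hypothetical name, and the vision projector and LLM conversions are omitted for brevity.

```python
import re


def to_hf_sketch(state_dict, exclude_key_regex=None):
    pattern = re.compile(exclude_key_regex) if exclude_key_regex else None
    out = {}
    for key, value in state_dict.items():
        # sound_projection.* does not start with "sound_encoder.", so it is
        # left alone by this prefix rewrite.
        if key.startswith("sound_encoder."):
            key = "sound_encoder.encoder." + key[len("sound_encoder."):]
        # Assumption: exclusion is tested against the HF-format key.
        if pattern is not None and pattern.match(key):
            continue
        out[key] = value
    return out


# Integers stand in for tensors; the sound encoder suffix is hypothetical.
custom = {
    "sound_encoder.layers.0.weight": 1,
    "sound_projection.norm.weight": 2,
    "vision_model.radio_model.input_conditioner.norm_std": 3,
}
hf_all = to_hf_sketch(custom)
hf_no_vision = to_hf_sketch(custom, exclude_key_regex=r"vision_model\..*")
```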
- convert_single_tensor_to_hf(
- fqn: str,
- tensor: Any,
- **kwargs,
- )
Convert a single tensor from custom format to HF format.
- Parameters:
fqn -- Fully qualified name of the tensor
tensor -- The tensor to convert
**kwargs -- Additional arguments
- Returns:
List of (fqn, tensor) tuples in HF format
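One likely reason the return type is a list is that a single merged expert parameter in the custom format corresponds to many per-expert tensors in the HF format. The sketch below illustrates that fan-out with plain Python values. `convert_single_sketch` is a hypothetical name, the pairing of `gate_and_up_projs` with `up_proj` is an assumption based on the module docstring, and keys that are not merged expert parameters are passed through unchanged here, whereas the real method would also rename them.

```python
import re

# Matches merged custom keys such as
# language_model.model.layers.2.mixer.experts.down_projs
_MERGED = re.compile(
    r"^language_model\.model\.layers\.(\d+)\.mixer\.experts"
    r"\.(gate_and_up_projs|down_projs)$"
)
_HF_PROJ = {"gate_and_up_projs": "up_proj", "down_projs": "down_proj"}


def convert_single_sketch(fqn, tensor):
    """Return a list of (fqn, tensor) pairs in HF naming."""
    match = _MERGED.match(fqn)
    if match is None:
        return [(fqn, tensor)]  # one-to-one case: a single pair
    layer, proj = match.group(1), _HF_PROJ[match.group(2)]
    prefix = f"language_model.backbone.layers.{layer}.mixer.experts"
    # One-to-many case: one pair per expert, indexed along the first dimension.
    return [
        (f"{prefix}.{expert}.{proj}.weight", weight)
        for expert, weight in enumerate(tensor)
    ]


# A two-element list stands in for a tensor with a leading expert dimension.
single = convert_single_sketch("language_model.lm_head.weight", "w")
split = convert_single_sketch(
    "language_model.model.layers.2.mixer.experts.down_projs", ["e0", "e1"]
)
```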