nemo_automodel.components.models.nemotron_v3.state_dict_adapter
Module Contents
Classes
Data
API
Bases: MoESplitExpertsStateDictMixin, StateDictAdapter
State dict adapter for NemotronV3 models.
Converts between HuggingFace checkpoint format and internal NeMo format.
HF format uses the "backbone" prefix:
- backbone.embed_tokens.weight
- backbone.layers.{}.norm.weight
- backbone.layers.{}.mixer.* (mamba/attention/moe components)
- backbone.norm_f.weight
- lm_head.weight
Internal format uses the "model" prefix:
- model.embed_tokens.weight
- model.layers.{}.norm.weight
- model.layers.{}.mixer.* (mamba/attention/moe components)
- model.norm.weight
- lm_head.weight
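The two key layouts above differ only in the top-level prefix and the name of the final norm. A minimal sketch of that renaming, assuming plain string keys; the helper names `hf_key_to_internal` and `internal_key_to_hf` are hypothetical and are not part of the adapter's API:

```python
def hf_key_to_internal(key: str) -> str:
    """Map a HuggingFace-style key to the internal naming listed above."""
    # Hypothetical helper: swap the "backbone." prefix for "model.".
    if key.startswith("backbone."):
        key = "model." + key[len("backbone."):]
    # Only the final norm is renamed (norm_f -> norm); per-layer norms keep their name.
    if key == "model.norm_f.weight":
        key = "model.norm.weight"
    return key


def internal_key_to_hf(key: str) -> str:
    """Inverse mapping: internal naming back to HuggingFace-style naming."""
    if key == "model.norm.weight":
        key = "model.norm_f.weight"
    if key.startswith("model."):
        key = "backbone." + key[len("model."):]
    return key
```

Keys without either prefix, such as lm_head.weight, pass through both helpers unchanged.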
NemotronV3 uses ReLU² activation (non-gated), so gate_and_up_projs has shape [n_experts, dim, inter_dim] instead of [n_experts, dim, 2*inter_dim].
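The shape difference can be illustrated with a small sketch. The function names below are hypothetical, and the scalar `relu_squared` only shows the activation's definition, not how the model computes it:

```python
def relu_squared(x: float) -> float:
    """ReLU² on a scalar: clamp negatives to zero, then square."""
    return max(x, 0.0) ** 2


def grouped_up_proj_shape(n_experts: int, dim: int, inter_dim: int, gated: bool) -> tuple[int, int, int]:
    """Shape of the grouped up-projection tensor for all experts.

    A gated MLP stores gate and up projections side by side (2 * inter_dim);
    a non-gated MLP such as ReLU² stores only the up projection (inter_dim).
    """
    width = 2 * inter_dim if gated else inter_dim
    return (n_experts, dim, width)
```

With `gated=False`, the last dimension is `inter_dim`, matching the shape stated above.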
Note: NemotronV3 uses "mixer" instead of "mlp" in layer paths.
NemotronV3 uses "mixer.experts" instead of "mlp.experts".
NemotronV3 HF format uses the "backbone." prefix.
Convert a single tensor from internal format to HuggingFace format.
Parameters:
Fully qualified name of the tensor in internal format
The tensor to convert
Additional arguments for conversion
Returns: list[tuple[str, Any]]
List of (fqn, tensor) tuples in HuggingFace format
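The return type is a list because one internal tensor can expand into several HuggingFace entries, for example a grouped expert tensor split back into per-expert weights. A rough sketch of that idea, assuming a hypothetical grouped key of the form `<prefix>.mixer.experts.<name>` and using a plain list in place of a real tensor; the function name and key layout are illustrative, not the adapter's actual ones:

```python
from typing import Any


def split_grouped_tensor(fqn: str, tensor: Any) -> list[tuple[str, Any]]:
    """Return (fqn, tensor) pairs; grouped expert entries yield one pair per expert."""
    marker = ".mixer.experts."
    if marker not in fqn:
        # Non-expert tensors map one-to-one.
        return [(fqn, tensor)]
    prefix, name = fqn.split(marker, 1)
    # Assumption: the first axis of the grouped tensor indexes experts.
    return [(f"{prefix}{marker}{i}.{name}", w) for i, w in enumerate(tensor)]
```

The sketch omits the prefix renaming and any per-expert transposition the real conversion may perform.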
Convert HF checkpoint to internal format.
- Rename backbone → model
- Rename norm_f → norm
- Aggregate per-expert weights into grouped tensors
- If device_mesh is provided, only load experts needed for the current rank
- Process MTP keys (mtp.layers.{i}.*) separately, reusing the same MoE expert-merge logic for the MoE sublayer of each MTP depth.
Parameters:
HuggingFace format state dict
Optional device mesh for distributed expert loading
Additional arguments
Returns: dict[str, Any]
Internal format state dict
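The expert aggregation and rank-local loading steps can be sketched as follows. This is an illustration under stated assumptions, not the adapter's implementation: the per-expert key pattern, the `expert_range` argument (standing in for whatever the device mesh determines for the current rank), and the use of Python lists instead of stacked tensors are all hypothetical.

```python
import re
from typing import Any, Optional

# Hypothetical per-expert key pattern, for illustration only.
EXPERT_KEY = re.compile(r"^(?P<prefix>.*\.mixer\.experts)\.(?P<idx>\d+)\.(?P<name>.+)$")


def group_expert_weights(
    state_dict: dict[str, Any],
    expert_range: Optional[range] = None,
) -> dict[str, Any]:
    """Collect per-expert entries into one grouped entry per (layer, weight name)."""
    out: dict[str, Any] = {}
    grouped: dict[tuple[str, str], dict[int, Any]] = {}
    for key, value in state_dict.items():
        m = EXPERT_KEY.match(key)
        if m is None:
            out[key] = value  # non-expert keys pass through unchanged
            continue
        idx = int(m.group("idx"))
        # Skip experts that the current rank does not own.
        if expert_range is not None and idx not in expert_range:
            continue
        grouped.setdefault((m.group("prefix"), m.group("name")), {})[idx] = value
    for (prefix, name), by_idx in grouped.items():
        # Order by expert index; a real adapter would stack these into one tensor.
        out[f"{prefix}.{name}"] = [by_idx[i] for i in sorted(by_idx)]
    return out
```

Passing `expert_range=None` keeps every expert, which corresponds to loading without a device mesh.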
Convert from internal model state dict to HuggingFace format.
Parameters:
Internal format state dict
Optional regex pattern to exclude keys
Additional arguments
Returns: dict[str, Any]
HuggingFace format state dict
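The optional exclusion pattern can be pictured as a key filter applied during conversion. A minimal sketch, assuming the pattern is matched from the start of each key with `re.match`; whether the actual adapter matches internal or HuggingFace keys, and with which matching mode, is not stated here, and the function and parameter names are hypothetical:

```python
import re
from typing import Any, Optional


def filter_state_dict(
    state_dict: dict[str, Any],
    exclude_key_regex: Optional[str] = None,
) -> dict[str, Any]:
    """Return a copy of state_dict without keys matching the exclusion pattern."""
    if exclude_key_regex is None:
        return dict(state_dict)
    pattern = re.compile(exclude_key_regex)
    # Assumption: a key is dropped when the pattern matches at its start.
    return {k: v for k, v in state_dict.items() if not pattern.match(k)}
```

For example, the pattern `r".*_extra_state"` would drop any key that contains `_extra_state` while leaving weights untouched.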