nemo_automodel.components.models.qwen3_next.state_dict_adapter#

Module Contents#

Classes#

Qwen3NextStateDictAdapter

Converts between HF Qwen3Next checkpoints and our grouped-experts native format.

Data#

API#

nemo_automodel.components.models.qwen3_next.state_dict_adapter.logger#

'getLogger(…)'

class nemo_automodel.components.models.qwen3_next.state_dict_adapter.Qwen3NextStateDictAdapter(
config: Any,
moe_config: nemo_automodel.components.moe.layers.MoEConfig,
backend: nemo_automodel.components.moe.utils.BackendConfig,
dtype: torch.dtype = torch.float32,
)#

Bases: nemo_automodel.components.moe.state_dict_mixin.MoESplitExpertsStateDictMixin, nemo_automodel.components.checkpoint.state_dict_adapter.StateDictAdapter

Converts between HF Qwen3Next checkpoints and our grouped-experts native format.

Qwen3Next HF experts use keys:

model.layers.{L}.mlp.experts.{E}.gate_proj.weight
model.layers.{L}.mlp.experts.{E}.up_proj.weight
model.layers.{L}.mlp.experts.{E}.down_proj.weight

Our native format groups them into:

model.layers.{L}.mlp.experts.gate_and_up_projs  # [n_experts, dim, 2*moe_inter_dim]
model.layers.{L}.mlp.experts.down_projs  # [n_experts, moe_inter_dim, dim]

Qwen3Next HF shared experts use keys:

model.layers.{L}.mlp.shared_expert.gate_proj.weight
model.layers.{L}.mlp.shared_expert.up_proj.weight
model.layers.{L}.mlp.shared_expert.down_proj.weight

Our native format uses:

model.layers.{L}.mlp.shared_experts.gate_proj.weight  # Note: plural "shared_experts"
model.layers.{L}.mlp.shared_experts.up_proj.weight
model.layers.{L}.mlp.shared_experts.down_proj.weight
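The per-expert-to-grouped conversion described above can be sketched as follows. This is a minimal illustration with random tensors, not the adapter's actual implementation; the transposes assume HF `nn.Linear` weights are stored as `[out_features, in_features]`, so each per-expert `gate_proj`/`up_proj` is `[moe_inter_dim, dim]` and `down_proj` is `[dim, moe_inter_dim]`:

```python
import torch

n_experts, dim, moe_inter_dim = 4, 8, 16

# Per-expert HF-style weights (nn.Linear layout: [out_features, in_features]).
gate = [torch.randn(moe_inter_dim, dim) for _ in range(n_experts)]
up = [torch.randn(moe_inter_dim, dim) for _ in range(n_experts)]
down = [torch.randn(dim, moe_inter_dim) for _ in range(n_experts)]

# Grouped native layout: transpose each expert weight, concatenate gate and up
# along the last dimension, then stack across experts.
gate_and_up_projs = torch.stack(
    [torch.cat([g.t(), u.t()], dim=-1) for g, u in zip(gate, up)]
)  # [n_experts, dim, 2*moe_inter_dim]
down_projs = torch.stack([d.t() for d in down])  # [n_experts, moe_inter_dim, dim]
```

The stacked layout lets the grouped-experts MoE kernels address all experts in one tensor instead of iterating over per-expert parameters.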

Initialization

_apply_key_mapping(
state_dict: dict[str, Any],
mapping: dict[str, str],
) → dict[str, Any]#

Apply key substring mappings to state dict keys.

Parameters:
  • state_dict – State dict to apply mappings to

  • mapping – Dictionary mapping key substrings to their replacements

Returns:

New state dict with mapped keys
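The substring-mapping behavior can be sketched like this. The helper below is illustrative only (the actual method is private and may differ); it assumes plain string replacement applied to every key, with values carried over untouched:

```python
def apply_key_mapping(state_dict: dict, mapping: dict) -> dict:
    """Return a new dict whose keys have each mapped substring replaced."""
    out = {}
    for key, value in state_dict.items():
        new_key = key
        for old, new in mapping.items():
            new_key = new_key.replace(old, new)
        out[new_key] = value
    return out

sd = {"model.layers.0.mlp.shared_expert.gate_proj.weight": 1}
mapped = apply_key_mapping(sd, {".mlp.shared_expert.": ".mlp.shared_experts."})
# mapped now keys the weight under "model.layers.0.mlp.shared_experts.gate_proj.weight"
```

This is how the singular-to-plural `shared_expert`/`shared_experts` rename noted in the class docstring can be expressed as a single mapping entry.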

to_hf(
state_dict: dict[str, Any],
exclude_key_regex: Optional[str] = None,
quantization: bool = False,
**kwargs,
) → dict[str, Any]#
from_hf(
hf_state_dict: dict[str, Any],
device_mesh: Optional[torch.distributed.device_mesh.DeviceMesh] = None,
**kwargs,
) → dict[str, Any]#
convert_single_tensor_to_hf(
fqn: str,
tensor: Any,
**kwargs,
) → list[tuple[str, Any]]#

Convert a single tensor from native format to HuggingFace format.

Parameters:
  • fqn – Fully qualified name of the tensor in native format

  • tensor – The tensor to convert

  • **kwargs – Additional arguments for conversion

Returns:

List of (fqn, tensor) tuples in HuggingFace format
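A single grouped native tensor expands into several HF entries, which is why the method returns a list of tuples. The expansion for a `gate_and_up_projs` tensor might look like the following sketch (hypothetical shapes and key prefix; not the method's actual code):

```python
import torch

n_experts, dim, moe_inter_dim = 4, 8, 16
gate_and_up = torch.randn(n_experts, dim, 2 * moe_inter_dim)

# Split the grouped tensor back into per-expert HF-style (fqn, tensor) pairs.
prefix = "model.layers.0.mlp.experts"
entries = []
for e in range(n_experts):
    gate, up = gate_and_up[e].chunk(2, dim=-1)  # each [dim, moe_inter_dim]
    # Transpose back to nn.Linear layout [out_features, in_features].
    entries.append((f"{prefix}.{e}.gate_proj.weight", gate.t().contiguous()))
    entries.append((f"{prefix}.{e}.up_proj.weight", up.t().contiguous()))
```

Per-tensor conversion like this is useful for streaming checkpoint export, since each native tensor can be written out without materializing the whole HF state dict.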