nemo_automodel.components.models.qwen3_vl_moe.state_dict_adapter#

Module Contents#

Classes#

Qwen3VLMoeStateDictAdapter

Converts between HF Qwen3-VL-MoE checkpoints and grouped-experts native format.

API#

class nemo_automodel.components.models.qwen3_vl_moe.state_dict_adapter.Qwen3VLMoeStateDictAdapter(
config: Any,
moe_config: nemo_automodel.components.moe.config.MoEConfig,
backend: nemo_automodel.components.models.common.BackendConfig,
dtype: torch.dtype = torch.float32,
)#

Bases: nemo_automodel.components.checkpoint.state_dict_adapter.StateDictAdapter

Converts between HF Qwen3-VL-MoE checkpoints and grouped-experts native format.

HF checkpoint keys (already stacked, no .weight suffix):

model.language_model.layers.{L}.mlp.experts.gate_up_proj [n_experts, dim, 2*inter]
model.language_model.layers.{L}.mlp.experts.down_proj [n_experts, inter, dim]

Native format (identical shapes, different key names):

model.language_model.layers.{L}.mlp.experts.gate_and_up_projs
model.language_model.layers.{L}.mlp.experts.down_projs

Loading paths:

DCP path: to_hf renames native→HF, DCP loads into DTensors, from_hf renames HF→native. Tensors are DTensors throughout; only keys are renamed, with no tensor ops.

Init path: from_hf receives plain tensors from safetensors, slices each to the local EP shard, and wraps it in a DTensor via create_dtensor_from_local.
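Both directions of the conversion are pure key mappings over the expert-weight names listed above. A minimal sketch of the suffix swap, assuming a regex over the documented key pattern (the helper names here are illustrative, not the actual implementation):

```python
import re

# Suffix pairs taken from the key lists above: native name -> HF name.
_NATIVE_TO_HF = {
    "gate_and_up_projs": "gate_up_proj",
    "down_projs": "down_proj",
}
_HF_TO_NATIVE = {v: k for k, v in _NATIVE_TO_HF.items()}

# Matches the documented expert key layout; group 2 is the final suffix.
_EXPERT_KEY_RE = re.compile(
    r"^(model\.language_model\.layers\.\d+\.mlp\.experts\.)(\w+)$"
)


def rename_native_to_hf(key: str) -> str:
    """Swap the native expert suffix for the HF one; other keys pass through."""
    m = _EXPERT_KEY_RE.match(key)
    if m and m.group(2) in _NATIVE_TO_HF:
        return m.group(1) + _NATIVE_TO_HF[m.group(2)]
    return key


def rename_hf_to_native(key: str) -> str:
    """Inverse mapping, as used on the from_hf path."""
    m = _EXPERT_KEY_RE.match(key)
    if m and m.group(2) in _HF_TO_NATIVE:
        return m.group(1) + _HF_TO_NATIVE[m.group(2)]
    return key
```

Non-expert keys (embeddings, attention, norms) are left untouched by the mapping, which is why the DCP path involves no tensor operations at all.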

Initialization

to_hf(
state_dict: dict[str, Any],
exclude_key_regex: Optional[str] = None,
quantization: bool = False,
**kwargs,
) dict[str, Any]#

Rename native keys to HF keys. Tensors are passed through as-is (no communication).

from_hf(
hf_state_dict: dict[str, Any],
device_mesh: Optional[torch.distributed.device_mesh.DeviceMesh] = None,
**kwargs,
) dict[str, Any]#

Rename HF keys to native keys.

DTensors (DCP path): rename only, no tensor ops.

Plain tensors (init path): slice to the local EP shard and create a DTensor.
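On the init path, the slice over the leading expert dimension is a contiguous split across EP ranks. A hedged sketch of the shard arithmetic only (function names are assumptions; the real adapter wraps the resulting slice in a DTensor via create_dtensor_from_local):

```python
def local_expert_range(n_experts: int, ep_size: int, ep_rank: int) -> tuple[int, int]:
    """Half-open [start, end) range of expert indices owned by this EP rank.

    Assumes n_experts divides evenly across ranks; uneven splits would need
    extra handling not shown here.
    """
    assert n_experts % ep_size == 0, "uneven expert splits need extra handling"
    per_rank = n_experts // ep_size
    start = ep_rank * per_rank
    return start, start + per_rank


def shard_experts(stacked, ep_size: int, ep_rank: int):
    """Slice a stacked [n_experts, ...] weight to the local EP shard.

    Works on anything indexable along dim 0 (a torch.Tensor in practice).
    """
    start, end = local_expert_range(len(stacked), ep_size, ep_rank)
    return stacked[start:end]
```

For example, with 128 experts over 8 EP ranks, rank 7 owns experts 112..127 of both gate_up_proj and down_proj before the DTensor wrap.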

convert_single_tensor_to_hf(
fqn: str,
tensor: Any,
**kwargs,
) list[tuple[str, Any]]#

Rename a single native key to HF format. Tensor passed through as-is.
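For a single key, the conversion behaves like a one-entry to_hf: the key is renamed and the tensor object is returned untouched. A sketch of the expected result shape, using the suffix pairs from the key lists above (the function name here is illustrative, not the real method):

```python
def convert_single_native_key(fqn: str, tensor):
    """Return [(hf_fqn, tensor)]; only the key changes, the tensor is untouched."""
    # Suffix mapping from the native/HF key lists documented above.
    suffix_map = {
        ".mlp.experts.gate_and_up_projs": ".mlp.experts.gate_up_proj",
        ".mlp.experts.down_projs": ".mlp.experts.down_proj",
    }
    for native_sfx, hf_sfx in suffix_map.items():
        if fqn.endswith(native_sfx):
            return [(fqn[: -len(native_sfx)] + hf_sfx, tensor)]
    # Non-expert keys pass through unchanged.
    return [(fqn, tensor)]
```

Returning a list of (key, tensor) pairs matches the method's documented signature, which allows a converter to emit zero or several HF entries per native key even though this adapter always emits exactly one.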