nemo_automodel.components.models.qwen3_vl_moe.state_dict_adapter#
Module Contents#
Classes#
| Qwen3VLMoeStateDictAdapter | Converts between HF Qwen3-VL-MoE checkpoints and grouped-experts native format. |
API#
- class nemo_automodel.components.models.qwen3_vl_moe.state_dict_adapter.Qwen3VLMoeStateDictAdapter(
- config: Any,
- moe_config: nemo_automodel.components.moe.config.MoEConfig,
- backend: nemo_automodel.components.models.common.BackendConfig,
- dtype: torch.dtype = torch.float32,
- )

Bases: nemo_automodel.components.checkpoint.state_dict_adapter.StateDictAdapter

Converts between HF Qwen3-VL-MoE checkpoints and grouped-experts native format.
HF checkpoint keys (already stacked, no `.weight` suffix):
- `model.language_model.layers.{L}.mlp.experts.gate_up_proj` with shape [n_experts, dim, 2*inter]
- `model.language_model.layers.{L}.mlp.experts.down_proj` with shape [n_experts, inter, dim]

Native format (identical shapes, different key names):
- `model.language_model.layers.{L}.mlp.experts.gate_and_up_projs`
- `model.language_model.layers.{L}.mlp.experts.down_projs`

Loading paths:
- DCP path: `to_hf` renames native→HF, DCP loads into DTensors, and `from_hf` renames HF→native. Tensors are DTensors throughout, so only keys are renamed; there are no tensor ops.
- Init path: `from_hf` receives plain tensors from safetensors, slices each to the local EP shard, and wraps it in a DTensor via `create_dtensor_from_local`.
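The key mapping between the two layouts is a pure suffix rename. A minimal sketch of that mapping, using the key suffixes listed above (the helper names here are hypothetical, not the adapter's actual internals):

```python
# Expert-weight key suffixes: native name -> HF name (from the docstring above).
_NATIVE_TO_HF = {
    "gate_and_up_projs": "gate_up_proj",
    "down_projs": "down_proj",
}

def rename_native_to_hf(key: str) -> str:
    """Rename one native expert key to its HF equivalent; other keys pass through."""
    for native, hf in _NATIVE_TO_HF.items():
        if key.endswith(f".mlp.experts.{native}"):
            return key[: -len(native)] + hf
    return key

def rename_hf_to_native(key: str) -> str:
    """Inverse mapping: HF expert key back to the native name."""
    for native, hf in _NATIVE_TO_HF.items():
        if key.endswith(f".mlp.experts.{hf}"):
            return key[: -len(hf)] + native
    return key
```

For example, `rename_native_to_hf("model.language_model.layers.0.mlp.experts.gate_and_up_projs")` yields `"model.language_model.layers.0.mlp.experts.gate_up_proj"`; non-expert keys such as embeddings are returned unchanged.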
Initialization
- to_hf(
- state_dict: dict[str, Any],
- exclude_key_regex: Optional[str] = None,
- quantization: bool = False,
- **kwargs,
- )

Rename native keys to HF keys. Tensors are passed through as-is (no communication).
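A sketch of what a `to_hf`-style pass over a whole state dict could look like, including the `exclude_key_regex` filter from the signature (this is an illustration under those assumptions, not the adapter's actual implementation):

```python
import re
from typing import Any, Optional

def to_hf_sketch(
    state_dict: dict[str, Any],
    exclude_key_regex: Optional[str] = None,
) -> dict[str, Any]:
    """Rename native expert keys to HF names; values are returned untouched."""
    renames = {"gate_and_up_projs": "gate_up_proj", "down_projs": "down_proj"}
    out: dict[str, Any] = {}
    for key, tensor in state_dict.items():
        if exclude_key_regex and re.match(exclude_key_regex, key):
            continue  # drop keys the caller asked to exclude
        for native, hf in renames.items():
            if key.endswith(f".mlp.experts.{native}"):
                key = key[: -len(native)] + hf
                break
        out[key] = tensor  # same tensor object: no copy, no communication
    return out
```

Because values are re-inserted unmodified, this works identically whether they are DTensors (DCP path) or plain tensors.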
- from_hf(
- hf_state_dict: dict[str, Any],
- device_mesh: Optional[torch.distributed.device_mesh.DeviceMesh] = None,
- **kwargs,
- )

Rename HF keys to native keys.

- DTensors (DCP path): just rename keys; no tensor ops.
- Plain tensors (init path): slice each tensor to the local EP shard and wrap it in a DTensor.
- convert_single_tensor_to_hf(
- fqn: str,
- tensor: Any,
- **kwargs,
- )

Rename a single native key to HF format. The tensor is passed through as-is.