nemo_automodel.components.models.qwen3_vl_moe.state_dict_adapter
Module Contents
Classes
Qwen3VLMoeStateDictAdapter | Converts between HF Qwen3-VL checkpoints and the grouped-experts native format. HF Qwen3-VL checkpoints store expert weights aggregated across all experts.
API
- class nemo_automodel.components.models.qwen3_vl_moe.state_dict_adapter.Qwen3VLMoeStateDictAdapter(
    config: Any,
    moe_config: nemo_automodel.components.moe.layers.MoEConfig,
    backend: nemo_automodel.components.moe.utils.BackendConfig,
    dtype: torch.dtype = torch.float32,
  )
Bases: nemo_automodel.components.checkpoint.state_dict_adapter.StateDictAdapter
Converts between HF Qwen3-VL checkpoints and the grouped-experts native format. HF Qwen3-VL checkpoints store expert weights aggregated across all experts.
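To make "aggregated across all experts" concrete, here is a minimal sketch of the idea rather than the adapter's implementation. It assumes that an aggregated layout means one entry holding all experts along a leading expert index, in contrast to one entry per expert. The helper names `aggregate_experts` and `split_experts`, the key names, and the use of nested lists in place of tensors are all hypothetical; this page does not show the real key names or tensor layouts.

```python
from typing import Any


def aggregate_experts(state: dict[str, Any], prefix: str, name: str) -> dict[str, Any]:
    """Collect `{prefix}.{i}.{name}` entries into one list stored at `{prefix}.{name}`."""
    out: dict[str, Any] = {}
    experts: dict[int, Any] = {}
    for key, value in state.items():
        # Split "<prefix>.<index>.<name>" into ["<index>", "<name>"]; other keys give [].
        parts = key[len(prefix) + 1:].split(".", 1) if key.startswith(prefix + ".") else []
        if len(parts) == 2 and parts[0].isdigit() and parts[1] == name:
            experts[int(parts[0])] = value
        else:
            out[key] = value  # non-expert entries pass through unchanged
    if experts:
        # Order by expert index so position i in the list is expert i.
        out[f"{prefix}.{name}"] = [experts[i] for i in sorted(experts)]
    return out


def split_experts(state: dict[str, Any], prefix: str, name: str) -> dict[str, Any]:
    """Inverse of aggregate_experts: expand the aggregated entry into per-expert keys."""
    out = dict(state)
    stacked = out.pop(f"{prefix}.{name}", None)
    if stacked is not None:
        for i, value in enumerate(stacked):
            out[f"{prefix}.{i}.{name}"] = value
    return out


# Hypothetical keys; nested lists stand in for weight tensors.
per_expert = {
    "mlp.experts.0.w": [[1, 2]],
    "mlp.experts.1.w": [[3, 4]],
    "norm.weight": [1.0],
}
aggregated = aggregate_experts(per_expert, "mlp.experts", "w")
restored = split_experts(aggregated, "mlp.experts", "w")
```

With real tensors, the list would typically be replaced by a single stacked tensor whose first dimension indexes the experts.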
Initialization
- to_hf(
    state_dict: dict[str, Any],
    exclude_key_regex: Optional[str] = None,
    quantization: bool = False,
    **kwargs,
  )
- from_hf(
    hf_state_dict: dict[str, Any],
    device_mesh: Optional[torch.distributed.device_mesh.DeviceMesh] = None,
    **kwargs,
  )
- convert_single_tensor_to_hf(
    fqn: str,
    tensor: Any,
    **kwargs,
  )
Convert a single native tensor back to the aggregated HF format.
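The `exclude_key_regex` parameter of `to_hf` is not described on this page. One plausible reading, shown here purely as an assumption, is that keys matching the pattern are dropped from the converted state dict. The helper `filter_state_dict`, the choice of `re.match` semantics, and the key names below are hypothetical and are not taken from the library.

```python
import re
from typing import Any, Optional


def filter_state_dict(
    state_dict: dict[str, Any],
    exclude_key_regex: Optional[str] = None,
) -> dict[str, Any]:
    """Return a copy of state_dict without keys matched by exclude_key_regex.

    Assumption: the pattern is anchored at the start of the key (re.match).
    """
    if exclude_key_regex is None:
        return dict(state_dict)
    pattern = re.compile(exclude_key_regex)
    return {key: value for key, value in state_dict.items() if not pattern.match(key)}


# Hypothetical keys; integers stand in for tensors.
example = {
    "model.layers.0.mlp.experts.gate_up_proj": 1,
    "model.layers.0.self_attn.q_proj.weight": 2,
    "lm_head.weight": 3,
}
without_experts = filter_state_dict(example, r".*experts.*")
```

Returning a new dict rather than mutating the input keeps the caller's state dict intact, which matters if the same dict is reused after conversion.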