`nemo_automodel.components.models.qwen3_vl_moe.state_dict_adapter`#

Module Contents#

Classes#

Qwen3VLMoeStateDictAdapter

Converts between Hugging Face (HF) Qwen3-VL checkpoints and the native grouped-experts format. HF Qwen3-VL checkpoints store expert weights aggregated across all experts.

API#

class nemo_automodel.components.models.qwen3_vl_moe.state_dict_adapter.Qwen3VLMoeStateDictAdapter( config: Any, moe_config: nemo_automodel.components.moe.config.MoEConfig, backend: nemo_automodel.components.models.common.BackendConfig, dtype: torch.dtype = torch.float32, )#

Bases: nemo_automodel.components.checkpoint.state_dict_adapter.StateDictAdapter

Converts between Hugging Face (HF) Qwen3-VL checkpoints and the native grouped-experts format. HF Qwen3-VL checkpoints store expert weights aggregated across all experts.

Initialization

to_hf(state_dict: dict[str, Any], exclude_key_regex: Optional[str] = None, quantization: bool = False, **kwargs) → dict[str, Any]#
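This page does not describe how `to_hf` applies `exclude_key_regex`. The parameter name suggests that keys matching the pattern are left out of the returned state dict. The sketch below illustrates that reading only: `drop_excluded_keys` is a hypothetical helper, the keys are placeholders, and matching from the start of the key with `re.match` is an assumption, not confirmed adapter behavior.

```python
import re
from typing import Any, Optional


def drop_excluded_keys(
    state_dict: dict[str, Any],
    exclude_key_regex: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: return a copy without keys matching the pattern."""
    if exclude_key_regex is None:
        return dict(state_dict)
    pattern = re.compile(exclude_key_regex)
    # Assumption: a key is excluded when the pattern matches from its start.
    return {k: v for k, v in state_dict.items() if not pattern.match(k)}


# Placeholder keys and values, not real Qwen3-VL parameter names.
example = {
    "model.layers.0.weight": 1,
    "model.layers.0._extra_state": 2,
    "lm_head.weight": 3,
}
filtered = drop_excluded_keys(example, r".*_extra_state$")
print(sorted(filtered))
```

If the adapter behaves this way, passing `None` (the default) would keep every key.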

from_hf(hf_state_dict: dict[str, Any], device_mesh: Optional[torch.distributed.device_mesh.DeviceMesh] = None, **kwargs) → dict[str, Any]#

convert_single_tensor_to_hf(fqn: str, tensor: Any, **kwargs) → list[tuple[str, Any]]#

Convert a single native tensor back to the aggregated HF format.
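The return type of `convert_single_tensor_to_hf` is a list of `(name, tensor)` pairs, so one native entry can map to one or more HF entries. The toy sketch below mirrors that shape without importing the library. The key suffixes in `NATIVE_TO_HF_SUFFIX`, the functions `convert_single_entry` and `to_hf_sketch`, and the idea that a full conversion loops over entries one at a time are all assumptions made for illustration; the real adapter may rename, reshape, or transpose tensors differently.

```python
from typing import Any

# Hypothetical native-to-HF key suffixes, for illustration only.
NATIVE_TO_HF_SUFFIX = {
    "experts.gate_and_up_projs": "experts.gate_up_proj",
    "experts.down_projs": "experts.down_proj",
}


def convert_single_entry(fqn: str, tensor: Any) -> list[tuple[str, Any]]:
    """Toy stand-in that mirrors the documented return type."""
    for native_suffix, hf_suffix in NATIVE_TO_HF_SUFFIX.items():
        if fqn.endswith(native_suffix):
            # Swap the native suffix for the assumed HF suffix.
            hf_fqn = fqn[: -len(native_suffix)] + hf_suffix
            return [(hf_fqn, tensor)]
    # Keys without a rename rule pass through unchanged.
    return [(fqn, tensor)]


def to_hf_sketch(state_dict: dict[str, Any]) -> dict[str, Any]:
    """Build an HF-style dict by converting one entry at a time."""
    hf_state_dict: dict[str, Any] = {}
    for fqn, tensor in state_dict.items():
        hf_state_dict.update(convert_single_entry(fqn, tensor))
    return hf_state_dict
```

Returning a list rather than a single pair leaves room for conversions where one native tensor expands into several HF tensors.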
