nemo_automodel.components.models.qwen3_vl_moe.model#

Module Contents#

Classes#

Fp32SafeQwen3VLMoeTextRotaryEmbedding

Ensure the text rotary inv_freq buffer remains float32.

Fp32SafeQwen3VLMoeVisionRotaryEmbedding

Ensure the vision rotary inv_freq buffer remains float32.

Qwen3VLMoeModel

Qwen3VLMoeTextModelBackend

Qwen3-VL text decoder rebuilt on top of the Qwen3-MoE block implementation.

Qwen3VLMoeForConditionalGeneration

Qwen3-VL conditional generation model using the Qwen3-MoE backend components.

Data#

API#

class nemo_automodel.components.models.qwen3_vl_moe.model.Fp32SafeQwen3VLMoeTextRotaryEmbedding#

Bases: transformers.models.qwen3_vl_moe.modeling_qwen3_vl_moe.Qwen3VLMoeTextRotaryEmbedding

Ensure the text rotary inv_freq buffer remains float32.

_apply(fn: Any, recurse: bool = True)#
class nemo_automodel.components.models.qwen3_vl_moe.model.Fp32SafeQwen3VLMoeVisionRotaryEmbedding#

Bases: transformers.models.qwen3_vl_moe.modeling_qwen3_vl_moe.Qwen3VLMoeVisionRotaryEmbedding

Ensure the vision rotary inv_freq buffer remains float32.

_apply(fn: Any, recurse: bool = True)#
class nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeModel#

Bases: transformers.models.qwen3_vl_moe.modeling_qwen3_vl_moe.Qwen3VLMoeModel

property layers#
property embed_tokens#
property norm#
class nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeTextModelBackend(
config: transformers.models.qwen3_vl_moe.configuration_qwen3_vl_moe.Qwen3VLMoeTextConfig,
backend: nemo_automodel.components.moe.utils.BackendConfig,
*,
moe_config: nemo_automodel.components.moe.layers.MoEConfig | None = None,
)#

Bases: torch.nn.Module

Qwen3-VL text decoder rebuilt on top of the Qwen3-MoE block implementation.

Initialization

forward(
input_ids: torch.Tensor | None = None,
*,
inputs_embeds: torch.Tensor | None = None,
attention_mask: torch.Tensor | None = None,
position_ids: torch.Tensor | None = None,
cache_position: torch.Tensor | None = None,
visual_pos_masks: torch.Tensor | None = None,
deepstack_visual_embeds: list[torch.Tensor] | None = None,
padding_mask: torch.Tensor | None = None,
past_key_values: Any | None = None,
use_cache: bool | None = None,
**attn_kwargs: Any,
) → transformers.models.qwen3_vl_moe.modeling_qwen3_vl_moe.Qwen3VLMoeModelOutputWithPast#
_deepstack_process(
hidden_states: torch.Tensor,
visual_pos_masks: torch.Tensor | None,
visual_embeds: torch.Tensor,
) → torch.Tensor#
get_input_embeddings() → torch.nn.Module#
set_input_embeddings(value: torch.nn.Module) → None#
init_weights(buffer_device: torch.device | None = None) → None#
class nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeForConditionalGeneration(
config: transformers.models.qwen3_vl_moe.configuration_qwen3_vl_moe.Qwen3VLMoeConfig,
moe_config: nemo_automodel.components.moe.layers.MoEConfig | None = None,
backend: nemo_automodel.components.moe.utils.BackendConfig | None = None,
**kwargs,
)#

Bases: transformers.models.qwen3_vl_moe.modeling_qwen3_vl_moe.Qwen3VLMoeForConditionalGeneration, nemo_automodel.components.moe.fsdp_mixin.MoEFSDPSyncMixin

Qwen3-VL conditional generation model using the Qwen3-MoE backend components.

Initialization

classmethod from_config(
config: transformers.models.qwen3_vl_moe.configuration_qwen3_vl_moe.Qwen3VLMoeConfig,
moe_config: nemo_automodel.components.moe.layers.MoEConfig | None = None,
backend: nemo_automodel.components.moe.utils.BackendConfig | None = None,
**kwargs,
)#
classmethod from_pretrained(
pretrained_model_name_or_path: str,
*model_args,
**kwargs,
)#
forward(
input_ids: torch.Tensor | None = None,
*,
position_ids: torch.Tensor | None = None,
attention_mask: torch.Tensor | None = None,
padding_mask: torch.Tensor | None = None,
inputs_embeds: torch.Tensor | None = None,
cache_position: torch.Tensor | None = None,
**kwargs: Any,
)#
initialize_weights(
buffer_device: torch.device | None = None,
dtype: torch.dtype = torch.bfloat16,
) → None#
nemo_automodel.components.models.qwen3_vl_moe.model.ModelClass#

None