nemo_automodel.components.models.qwen3_vl_moe.model#
Module Contents#
Classes#
| Class | Description |
| --- | --- |
| Fp32SafeQwen3VLMoeTextRotaryEmbedding | Ensure inv_freq stays in float32 |
| Fp32SafeQwen3VLMoeVisionRotaryEmbedding | Ensure the vision rotary inv_freq buffer remains float32. |
| Qwen3VLMoeBlock | Qwen3-VL block adapter that accepts HF-style position embeddings. |
| Qwen3VLMoeModel | |
| Qwen3VLMoeTextModelBackend | Qwen3-VL text decoder rebuilt on top of the Qwen3-MoE block implementation. |
| Qwen3VLMoeForConditionalGeneration | Qwen3-VL conditional generation model using the Qwen3-MoE backend components. |
Data#
API#
- class nemo_automodel.components.models.qwen3_vl_moe.model.Fp32SafeQwen3VLMoeTextRotaryEmbedding#
Bases: transformers.models.qwen3_vl_moe.modeling_qwen3_vl_moe.Qwen3VLMoeTextRotaryEmbedding

Ensure inv_freq stays in float32.
- _apply(fn: Any, recurse: bool = True)#
- class nemo_automodel.components.models.qwen3_vl_moe.model.Fp32SafeQwen3VLMoeVisionRotaryEmbedding#
Bases: transformers.models.qwen3_vl_moe.modeling_qwen3_vl_moe.Qwen3VLMoeVisionRotaryEmbedding

Ensure the vision rotary inv_freq buffer remains float32.
- _apply(fn: Any, recurse: bool = True)#
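Both `Fp32Safe…` subclasses exist because `model.to(dtype)` and `model.half()` route every parameter and buffer through `torch.nn.Module._apply`, which would silently downcast the rotary `inv_freq` buffer and degrade position-encoding precision at long context lengths. Overriding `_apply` lets the subclass run the normal cast and then restore the buffer to float32. The following is a minimal, dependency-free sketch of that pattern only; `FakeBuffer`, `RotaryEmbedding`, and string-valued "dtypes" are illustrative stand-ins, not the real torch types:

```python
class FakeBuffer:
    """Stand-in for a tensor buffer; dtype is modeled as a string."""

    def __init__(self, dtype: str):
        self.dtype = dtype

    def to(self, dtype: str) -> "FakeBuffer":
        return FakeBuffer(dtype)


class RotaryEmbedding:
    """Base class: _apply casts every buffer, mimicking model.half()."""

    def __init__(self) -> None:
        self.inv_freq = FakeBuffer("float32")

    def _apply(self, fn, recurse: bool = True):
        self.inv_freq = fn(self.inv_freq)
        return self


class Fp32SafeRotaryEmbedding(RotaryEmbedding):
    """Run the normal cast, then force inv_freq back to float32."""

    def _apply(self, fn, recurse: bool = True):
        ret = super()._apply(fn, recurse)
        self.inv_freq = self.inv_freq.to("float32")  # undo the downcast
        return ret


plain = RotaryEmbedding()
plain._apply(lambda b: b.to("bfloat16"))
print(plain.inv_freq.dtype)  # bfloat16: precision silently lost

safe = Fp32SafeRotaryEmbedding()
safe._apply(lambda b: b.to("bfloat16"))
print(safe.inv_freq.dtype)  # float32: restored after the cast
```

The real classes call `super()._apply(fn, recurse)` on the transformers base class and re-cast the actual `inv_freq` tensor; the control flow is the same as above.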
- class nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeBlock(
- layer_idx: int,
- config: transformers.models.qwen3_moe.configuration_qwen3_moe.Qwen3MoeConfig,
- moe_config: nemo_automodel.components.moe.config.MoEConfig,
- backend: nemo_automodel.components.models.common.BackendConfig,
- )#

Bases: nemo_automodel.components.models.qwen3_moe.model.Block

Qwen3-VL block adapter that accepts HF-style position embeddings.
Initialization
- forward(
- x: torch.Tensor,
- *,
- freqs_cis: torch.Tensor | None = None,
- attention_mask: torch.Tensor | None = None,
- padding_mask: torch.Tensor | None = None,
- position_embeddings: tuple[torch.Tensor, torch.Tensor] | None = None,
- **attn_kwargs: Any,
- )#
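The adapter's job is bridging two calling conventions: HF code passes `position_embeddings` as a `(cos, sin)` tensor pair, while the underlying Qwen3-MoE block takes a single `freqs_cis` table. One common way rotary implementations relate the two is the complex-exponential form, where each position angle θ is stored as e^{iθ} = cos θ + i·sin θ. This is a hedged sketch of that conversion only; `cos_sin_to_freqs_cis` is a hypothetical helper, and plain Python lists stand in for tensors:

```python
import math

def cos_sin_to_freqs_cis(cos_row, sin_row):
    """Pack paired (cos, sin) rows into one complex-valued row."""
    return [complex(c, s) for c, s in zip(cos_row, sin_row)]

angles = [0.0, math.pi / 4, math.pi / 2]
cos_row = [math.cos(a) for a in angles]
sin_row = [math.sin(a) for a in angles]

freqs_cis = cos_sin_to_freqs_cis(cos_row, sin_row)

# Each entry is e^{i*theta}, so its magnitude is 1.
print([round(abs(z), 6) for z in freqs_cis])  # [1.0, 1.0, 1.0]
```

The actual adapter works on torch tensors and may use a different internal layout; consult the source before relying on this convention.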
- class nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeModel#
Bases: transformers.models.qwen3_vl_moe.modeling_qwen3_vl_moe.Qwen3VLMoeModel

- property layers#
- property embed_tokens#
- property norm#
- forward(
- input_ids=None,
- attention_mask=None,
- position_ids=None,
- past_key_values=None,
- inputs_embeds=None,
- pixel_values=None,
- pixel_values_videos=None,
- image_grid_thw=None,
- video_grid_thw=None,
- cache_position=None,
- **kwargs,
- )#
- class nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeTextModelBackend(
- config: transformers.models.qwen3_vl_moe.configuration_qwen3_vl_moe.Qwen3VLMoeTextConfig,
- backend: nemo_automodel.components.models.common.BackendConfig,
- *,
- moe_config: nemo_automodel.components.moe.config.MoEConfig | None = None,
- )#

Bases: torch.nn.Module

Qwen3-VL text decoder rebuilt on top of the Qwen3-MoE block implementation.
Initialization
- forward(
- input_ids: torch.Tensor | None = None,
- *,
- inputs_embeds: torch.Tensor | None = None,
- attention_mask: torch.Tensor | None = None,
- position_ids: torch.Tensor | None = None,
- cache_position: torch.Tensor | None = None,
- visual_pos_masks: torch.Tensor | None = None,
- deepstack_visual_embeds: list[torch.Tensor] | None = None,
- padding_mask: torch.Tensor | None = None,
- past_key_values: Any | None = None,
- use_cache: bool | None = None,
- **attn_kwargs: Any,
- )#
- _deepstack_process(
- hidden_states: torch.Tensor,
- visual_pos_masks: torch.Tensor | None,
- visual_embeds: torch.Tensor,
- )#
- get_input_embeddings() → torch.nn.Module#
- set_input_embeddings(value: torch.nn.Module) → None#
- init_weights(buffer_device: torch.device | None = None) → None#
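`_deepstack_process` is the step that folds vision-tower features back into the text stream: `visual_pos_masks` marks which sequence positions hold image/video tokens, and the corresponding `visual_embeds` are added to the hidden states at exactly those slots. A dependency-free sketch of that masked, residual-style merge, with Python lists standing in for torch tensors (the real method is tensor-based and shape-checked against the mask):

```python
def deepstack_process(hidden_states, visual_pos_masks, visual_embeds):
    """Add one visual embedding onto each mask-selected position."""
    if visual_pos_masks is None:
        return hidden_states  # text-only batch: nothing to merge
    merged = list(hidden_states)
    embeds = iter(visual_embeds)  # one embedding per True in the mask
    for i, is_visual in enumerate(visual_pos_masks):
        if is_visual:
            merged[i] = merged[i] + next(embeds)  # residual-style add
    return merged

hidden = [1.0, 2.0, 3.0, 4.0]
mask = [False, True, True, False]   # positions 1 and 2 hold visual tokens
visual = [10.0, 20.0]

print(deepstack_process(hidden, mask, visual))  # [1.0, 12.0, 23.0, 4.0]
```

The additive merge (rather than replacement) is why the signature takes the running `hidden_states`: each deepstack layer contributes on top of what the decoder has already computed.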
- class nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeForConditionalGeneration(
- config: transformers.models.qwen3_vl_moe.configuration_qwen3_vl_moe.Qwen3VLMoeConfig,
- moe_config: nemo_automodel.components.moe.config.MoEConfig | None = None,
- backend: nemo_automodel.components.models.common.BackendConfig | None = None,
- **kwargs,
- )#

Bases: nemo_automodel.components.models.common.hf_checkpointing_mixin.HFCheckpointingMixin, transformers.models.qwen3_vl_moe.modeling_qwen3_vl_moe.Qwen3VLMoeForConditionalGeneration, nemo_automodel.components.moe.fsdp_mixin.MoEFSDPSyncMixin

Qwen3-VL conditional generation model using the Qwen3-MoE backend components.
Initialization
- classmethod from_config(
- config: transformers.models.qwen3_vl_moe.configuration_qwen3_vl_moe.Qwen3VLMoeConfig,
- moe_config: nemo_automodel.components.moe.config.MoEConfig | None = None,
- backend: nemo_automodel.components.models.common.BackendConfig | None = None,
- **kwargs,
- )#
- classmethod from_pretrained(
- pretrained_model_name_or_path: str,
- *model_args,
- **kwargs,
- )#
- get_input_embeddings()#
- set_input_embeddings(value)#
- get_output_embeddings()#
- set_output_embeddings(new_embeddings)#
- forward(
- input_ids: torch.Tensor | None = None,
- *,
- position_ids: torch.Tensor | None = None,
- attention_mask: torch.Tensor | None = None,
- padding_mask: torch.Tensor | None = None,
- inputs_embeds: torch.Tensor | None = None,
- cache_position: torch.Tensor | None = None,
- **kwargs: Any,
- )#
- initialize_weights(
- buffer_device: torch.device | None = None,
- dtype: torch.dtype = torch.bfloat16,
- )#
- nemo_automodel.components.models.qwen3_vl_moe.model.ModelClass#
None