nemo_automodel.components.models.qwen3_vl_moe.model
Module Contents
Classes
| Name | Description |
|---|---|
| Fp32SafeQwen3VLMoeTextRotaryEmbedding | Ensure the text rotary inv_freq buffer stays in float32. |
| Fp32SafeQwen3VLMoeVisionRotaryEmbedding | Ensure the vision rotary inv_freq buffer remains float32. |
| Qwen3VLMoeBlock | Qwen3-VL block adapter that accepts HF-style position embeddings. |
| Qwen3VLMoeForConditionalGeneration | Qwen3-VL conditional generation model using the Qwen3-MoE backend components. |
| Qwen3VLMoeModel | - |
| Qwen3VLMoeTextModelBackend | Qwen3-VL text decoder rebuilt on top of the Qwen3-MoE block implementation. |
Data
| Name | Description |
|---|---|
| ModelClass | Alias for Qwen3VLMoeForConditionalGeneration. |
API
class nemo_automodel.components.models.qwen3_vl_moe.model.Fp32SafeQwen3VLMoeTextRotaryEmbedding()
Bases: Qwen3VLMoeTextRotaryEmbedding
Ensure the text rotary inv_freq buffer stays in float32.
nemo_automodel.components.models.qwen3_vl_moe.model.Fp32SafeQwen3VLMoeTextRotaryEmbedding._apply( fn: typing.Any, recurse: bool = True )
class nemo_automodel.components.models.qwen3_vl_moe.model.Fp32SafeQwen3VLMoeVisionRotaryEmbedding()
Bases: Qwen3VLMoeVisionRotaryEmbedding
Ensure the vision rotary inv_freq buffer remains float32.
nemo_automodel.components.models.qwen3_vl_moe.model.Fp32SafeQwen3VLMoeVisionRotaryEmbedding._apply( fn: typing.Any, recurse: bool = True )
class nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeBlock()
Bases: Block
Qwen3-VL block adapter that accepts HF-style position embeddings.
nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeBlock.forward( x: torch.Tensor, freqs_cis: torch.Tensor | None = None, attention_mask: torch.Tensor | None = None, padding_mask: torch.Tensor | None = None, position_embeddings: tuple[torch.Tensor, torch.Tensor] | None = None, attn_kwargs: typing.Any = {} ) -> torch.Tensor
class nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeForConditionalGeneration( config: transformers.models.qwen3_vl_moe.configuration_qwen3_vl_moe.Qwen3VLMoeConfig, moe_config: nemo_automodel.components.moe.config.MoEConfig | None = None, backend: nemo_automodel.components.models.common.BackendConfig | None = None, kwargs = {} )
Bases: HFCheckpointingMixin, HFQwen3VLMoeForConditionalGeneration, MoEFSDPSyncMixin
Qwen3-VL conditional generation model using the Qwen3-MoE backend components.
_pp_keep_self_forward: bool = True
lm_head
pad_token_id = pad_token_id if pad_token_id is not None else -1
state_dict_adapter
vocab_size = text_config.vocab_size
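The `pad_token_id` default listed above uses an explicit `is not None` check instead of `or`. A small sketch of why that distinction matters; `resolve_pad_token_id` is a hypothetical helper written for this illustration, not a function in the module:

```python
from typing import Optional


def resolve_pad_token_id(pad_token_id: Optional[int]) -> int:
    """Map a missing pad token to the sentinel -1; keep any real id, including 0."""
    return pad_token_id if pad_token_id is not None else -1


print(resolve_pad_token_id(None))  # -1
print(resolve_pad_token_id(0))     # 0
print(0 or -1)                     # -1
```

With `or`, a tokenizer whose pad token has id 0 would be treated as having no pad token at all, so the explicit `None` check is the safer form.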
nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeForConditionalGeneration.forward( input_ids: torch.Tensor | None = None, position_ids: torch.Tensor | None = None, attention_mask: torch.Tensor | None = None, padding_mask: torch.Tensor | None = None, inputs_embeds: torch.Tensor | None = None, cache_position: torch.Tensor | None = None, logits_to_keep: int | torch.Tensor = 0, output_hidden_states: bool | None = None, kwargs: typing.Any = {} )
nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeForConditionalGeneration.from_config( config: transformers.models.qwen3_vl_moe.configuration_qwen3_vl_moe.Qwen3VLMoeConfig, moe_config: nemo_automodel.components.moe.config.MoEConfig | None = None, backend: nemo_automodel.components.models.common.BackendConfig | None = None, kwargs = {} )
classmethod
nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeForConditionalGeneration.from_pretrained( pretrained_model_name_or_path: str, model_args = (), kwargs = {} )
classmethod
nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeForConditionalGeneration.get_input_embeddings()
nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeForConditionalGeneration.get_output_embeddings()
nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeForConditionalGeneration.initialize_weights( buffer_device: torch.device | None = None, dtype: torch.dtype = torch.bfloat16 ) -> None
nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeForConditionalGeneration.set_input_embeddings( value )
nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeForConditionalGeneration.set_output_embeddings( new_embeddings )
class nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeModel()
Bases: HFQwen3VLMoeModel
nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeModel.forward( input_ids = None, attention_mask = None, position_ids = None, past_key_values = None, inputs_embeds = None, pixel_values = None, pixel_values_videos = None, image_grid_thw = None, video_grid_thw = None, cache_position = None, kwargs = {} )
class nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeTextModelBackend( config: transformers.models.qwen3_vl_moe.configuration_qwen3_vl_moe.Qwen3VLMoeTextConfig, backend: nemo_automodel.components.models.common.BackendConfig, moe_config: nemo_automodel.components.moe.config.MoEConfig | None = None, moe_overrides: dict | None = None )
Bases: Module
Qwen3-VL text decoder rebuilt on top of the Qwen3-MoE block implementation.
embed_tokens
layers
moe_config = moe_config or MoEConfig(**moe_defaults)
norm
padding_idx = getattr(config, 'pad_token_id', None)
rotary_emb
vocab_size = config.vocab_size
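The `moe_config` and `padding_idx` defaults above follow two common fallback patterns: use the caller's object if one was passed, and read an optional config field without failing when it is absent. A self-contained sketch under stated assumptions: `ToyMoEConfig`, its two fields, the contents of `moe_defaults`, and `resolve_defaults` are all hypothetical stand-ins, not the real `MoEConfig` or the module's own logic.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Optional


@dataclass
class ToyMoEConfig:
    """Hypothetical stand-in for MoEConfig with two illustrative fields."""

    n_routed_experts: int
    n_activated_experts: int


def resolve_defaults(config, moe_config: Optional[ToyMoEConfig] = None):
    # Assumed shape of `moe_defaults`: values derived from the text config.
    moe_defaults = {
        "n_routed_experts": config.num_experts,
        "n_activated_experts": config.num_experts_per_tok,
    }
    # An explicit moe_config wins; otherwise build one from the defaults.
    resolved = moe_config or ToyMoEConfig(**moe_defaults)
    # getattr with a default tolerates configs that lack `pad_token_id`.
    padding_idx = getattr(config, "pad_token_id", None)
    return resolved, padding_idx


text_config = SimpleNamespace(num_experts=128, num_experts_per_tok=8)
resolved, padding_idx = resolve_defaults(text_config)
print(resolved.n_routed_experts, padding_idx)  # 128 None
```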
nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeTextModelBackend._deepstack_process( hidden_states: torch.Tensor, visual_pos_masks: torch.Tensor | None, visual_embeds: torch.Tensor ) -> torch.Tensor
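The signature of `_deepstack_process` suggests that visual features are merged into the hidden states at the positions marked by `visual_pos_masks`. The sketch below shows one way such a merge can work, modeled on the additive pattern used for DeepStack features in Hugging Face Qwen3-VL. It is an assumption about the behavior, not the code in this module: `deepstack_add` is a hypothetical free function, and the early return for a `None` mask is a guess based on the optional type in the signature.

```python
from typing import Optional

import torch


def deepstack_add(
    hidden_states: torch.Tensor,               # (batch, seq, hidden)
    visual_pos_masks: Optional[torch.Tensor],  # (batch, seq) bool, True at visual tokens
    visual_embeds: torch.Tensor,               # (num_visual_tokens, hidden)
) -> torch.Tensor:
    # Assumed behavior: nothing to merge when no mask is supplied.
    if visual_pos_masks is None:
        return hidden_states
    mask = visual_pos_masks.to(hidden_states.device)
    embeds = visual_embeds.to(hidden_states.device, hidden_states.dtype)
    # Work on a copy so the caller's tensor is not modified in place.
    out = hidden_states.clone()
    # Boolean indexing selects the visual rows; add the features to them.
    out[mask] = out[mask] + embeds
    return out


hidden = torch.zeros(1, 4, 2)
mask = torch.tensor([[False, True, True, False]])
visual = torch.ones(2, 2)
merged = deepstack_add(hidden, mask, visual)
print(merged[0, 1].tolist())  # [1.0, 1.0]
```

The number of `True` entries in the mask must equal the first dimension of `visual_embeds`; otherwise the indexed addition raises a shape error.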
nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeTextModelBackend.forward( input_ids: torch.Tensor | None = None, inputs_embeds: torch.Tensor | None = None, attention_mask: torch.Tensor | None = None, position_ids: torch.Tensor | None = None, cache_position: torch.Tensor | None = None, visual_pos_masks: torch.Tensor | None = None, deepstack_visual_embeds: list[torch.Tensor] | None = None, padding_mask: torch.Tensor | None = None, past_key_values: typing.Any | None = None, use_cache: bool | None = None, attn_kwargs: typing.Any = {} ) -> transformers.models.qwen3_vl_moe.modeling_qwen3_vl_moe.Qwen3VLMoeModelOutputWithPast
nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeTextModelBackend.get_input_embeddings() -> torch.nn.Module
nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeTextModelBackend.init_weights( buffer_device: torch.device | None = None ) -> None
nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeTextModelBackend.set_input_embeddings( value: torch.nn.Module ) -> None
nemo_automodel.components.models.qwen3_vl_moe.model.ModelClass = Qwen3VLMoeForConditionalGeneration