nemo_automodel.components.models.qwen3_vl_moe.model#
Module Contents#
Classes#
| Class | Description |
| --- | --- |
| Fp32SafeQwen3VLMoeTextRotaryEmbedding | Ensure inv_freq stays in float32 |
| Fp32SafeQwen3VLMoeVisionRotaryEmbedding | Ensure the vision rotary inv_freq buffer remains float32. |
| Qwen3VLMoeBlock | Qwen3-VL block adapter that accepts HF-style position embeddings. |
| Qwen3VLMoeModel | |
| Qwen3VLMoeTextModelBackend | Qwen3-VL text decoder rebuilt on top of the Qwen3-MoE block implementation. |
| Qwen3VLMoeForConditionalGeneration | Qwen3-VL conditional generation model using the Qwen3-MoE backend components. |
Data#
API#
- class nemo_automodel.components.models.qwen3_vl_moe.model.Fp32SafeQwen3VLMoeTextRotaryEmbedding#
Bases: transformers.models.qwen3_vl_moe.modeling_qwen3_vl_moe.Qwen3VLMoeTextRotaryEmbedding

Ensure inv_freq stays in float32.
- _apply(fn: Any, recurse: bool = True)#
- class nemo_automodel.components.models.qwen3_vl_moe.model.Fp32SafeQwen3VLMoeVisionRotaryEmbedding#
Bases: transformers.models.qwen3_vl_moe.modeling_qwen3_vl_moe.Qwen3VLMoeVisionRotaryEmbedding

Ensure the vision rotary inv_freq buffer remains float32.
- _apply(fn: Any, recurse: bool = True)#
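Both `Fp32Safe…` subclasses exist because `model.to(dtype)` and `model.half()` route every parameter and buffer through `torch.nn.Module._apply`, which would silently downcast the rotary `inv_freq` buffer and degrade position-encoding precision at long context lengths. Overriding `_apply` lets the subclass run the normal cast and then restore the buffer to float32. The following is a minimal, dependency-free sketch of that pattern only; `FakeBuffer`, `RotaryEmbedding`, and string-valued "dtypes" are illustrative stand-ins, not the real torch types:

```python
class FakeBuffer:
    """Stand-in for a tensor buffer; dtype is modeled as a string."""

    def __init__(self, dtype: str):
        self.dtype = dtype

    def to(self, dtype: str) -> "FakeBuffer":
        return FakeBuffer(dtype)


class RotaryEmbedding:
    """Base class: _apply casts every buffer, mimicking model.half()."""

    def __init__(self) -> None:
        self.inv_freq = FakeBuffer("float32")

    def _apply(self, fn, recurse: bool = True):
        self.inv_freq = fn(self.inv_freq)
        return self


class Fp32SafeRotaryEmbedding(RotaryEmbedding):
    """Run the normal cast, then force inv_freq back to float32."""

    def _apply(self, fn, recurse: bool = True):
        ret = super()._apply(fn, recurse)
        self.inv_freq = self.inv_freq.to("float32")  # undo the downcast
        return ret


plain = RotaryEmbedding()
plain._apply(lambda b: b.to("bfloat16"))
print(plain.inv_freq.dtype)  # bfloat16: precision silently lost

safe = Fp32SafeRotaryEmbedding()
safe._apply(lambda b: b.to("bfloat16"))
print(safe.inv_freq.dtype)  # float32: restored after the cast
```

The real classes call `super()._apply(fn, recurse)` on the transformers base class and re-cast the actual `inv_freq` tensor; the control flow is the same as above.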
- class nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeBlock(
- layer_idx: int,
- config: transformers.models.qwen3_moe.configuration_qwen3_moe.Qwen3MoeConfig,
- moe_config: nemo_automodel.components.moe.config.MoEConfig,
- backend: nemo_automodel.components.models.common.BackendConfig,
- )#

Bases: nemo_automodel.components.models.qwen3_moe.model.Block

Qwen3-VL block adapter that accepts HF-style position embeddings.
Initialization
- forward(
- x: torch.Tensor,
- *,
- freqs_cis: torch.Tensor | None = None,
- attention_mask: torch.Tensor | None = None,
- padding_mask: torch.Tensor | None = None,
- position_embeddings: tuple[torch.Tensor, torch.Tensor] | None = None,
- **attn_kwargs: Any,
- )#
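The adapter's job is bridging two calling conventions: HF code passes `position_embeddings` as a `(cos, sin)` tensor pair, while the underlying Qwen3-MoE block takes a single `freqs_cis` table. One common way rotary implementations relate the two is the complex-exponential form, where each position angle θ is stored as e^{iθ} = cos θ + i·sin θ. This is a hedged sketch of that conversion only; `cos_sin_to_freqs_cis` is a hypothetical helper, and plain Python lists stand in for tensors:

```python
import math

def cos_sin_to_freqs_cis(cos_row, sin_row):
    """Pack paired (cos, sin) rows into one complex-valued row."""
    return [complex(c, s) for c, s in zip(cos_row, sin_row)]

angles = [0.0, math.pi / 4, math.pi / 2]
cos_row = [math.cos(a) for a in angles]
sin_row = [math.sin(a) for a in angles]

freqs_cis = cos_sin_to_freqs_cis(cos_row, sin_row)

# Each entry is e^{i*theta}, so its magnitude is 1.
print([round(abs(z), 6) for z in freqs_cis])  # [1.0, 1.0, 1.0]
```

The actual adapter works on torch tensors and may use a different internal layout; consult the source before relying on this convention.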
- class nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeModel#
Bases: transformers.models.qwen3_vl_moe.modeling_qwen3_vl_moe.Qwen3VLMoeModel

- property layers#
- property embed_tokens#
- property norm#
- forward(
- input_ids=None,
- attention_mask=None,
- position_ids=None,
- past_key_values=None,
- inputs_embeds=None,
- pixel_values=None,
- pixel_values_videos=None,
- image_grid_thw=None,
- video_grid_thw=None,
- cache_position=None,
- **kwargs,
- )#
- class nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeTextModelBackend(
- config: transformers.models.qwen3_vl_moe.configuration_qwen3_vl_moe.Qwen3VLMoeTextConfig,
- backend: nemo_automodel.components.models.common.BackendConfig,
- *,
- moe_config: nemo_automodel.components.moe.config.MoEConfig | None = None,
- )#

Bases: torch.nn.Module

Qwen3-VL text decoder rebuilt on top of the Qwen3-MoE block implementation.
Initialization
- forward(
- input_ids: torch.Tensor | None = None,
- *,
- inputs_embeds: torch.Tensor | None = None,
- attention_mask: torch.Tensor | None = None,
- position_ids: torch.Tensor | None = None,
- cache_position: torch.Tensor | None = None,
- visual_pos_masks: torch.Tensor | None = None,
- deepstack_visual_embeds: list[torch.Tensor] | None = None,
- padding_mask: torch.Tensor | None = None,
- past_key_values: Any | None = None,
- use_cache: bool | None = None,
- **attn_kwargs: Any,
- )#
- _deepstack_process(
- hidden_states: torch.Tensor,
- visual_pos_masks: torch.Tensor | None,
- visual_embeds: torch.Tensor,
- )#
- get_input_embeddings() → torch.nn.Module#
- set_input_embeddings(value: torch.nn.Module) → None#
- init_weights(buffer_device: torch.device | None = None) → None#
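`_deepstack_process` is the step that folds vision-tower features back into the text stream: `visual_pos_masks` marks which sequence positions hold image/video tokens, and the corresponding `visual_embeds` are added to the hidden states at exactly those slots. A dependency-free sketch of that masked, residual-style merge, with Python lists standing in for torch tensors (the real method is tensor-based and shape-checked against the mask):

```python
def deepstack_process(hidden_states, visual_pos_masks, visual_embeds):
    """Add one visual embedding onto each mask-selected position."""
    if visual_pos_masks is None:
        return hidden_states  # text-only batch: nothing to merge
    merged = list(hidden_states)
    embeds = iter(visual_embeds)  # one embedding per True in the mask
    for i, is_visual in enumerate(visual_pos_masks):
        if is_visual:
            merged[i] = merged[i] + next(embeds)  # residual-style add
    return merged

hidden = [1.0, 2.0, 3.0, 4.0]
mask = [False, True, True, False]   # positions 1 and 2 hold visual tokens
visual = [10.0, 20.0]

print(deepstack_process(hidden, mask, visual))  # [1.0, 12.0, 23.0, 4.0]
```

The additive merge (rather than replacement) is why the signature takes the running `hidden_states`: each deepstack layer contributes on top of what the decoder has already computed.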
- class nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeForConditionalGeneration(
- config: transformers.models.qwen3_vl_moe.configuration_qwen3_vl_moe.Qwen3VLMoeConfig,
- moe_config: nemo_automodel.components.moe.config.MoEConfig | None = None,
- backend: nemo_automodel.components.models.common.BackendConfig | None = None,
- **kwargs,
- )#

Bases: nemo_automodel.components.models.common.hf_checkpointing_mixin.HFCheckpointingMixin, transformers.models.qwen3_vl_moe.modeling_qwen3_vl_moe.Qwen3VLMoeForConditionalGeneration, nemo_automodel.components.moe.fsdp_mixin.MoEFSDPSyncMixin

Qwen3-VL conditional generation model using the Qwen3-MoE backend components.
Initialization
- classmethod from_config(
- config: transformers.models.qwen3_vl_moe.configuration_qwen3_vl_moe.Qwen3VLMoeConfig,
- moe_config: nemo_automodel.components.moe.config.MoEConfig | None = None,
- backend: nemo_automodel.components.models.common.BackendConfig | None = None,
- **kwargs,
- )#
- classmethod from_pretrained(
- pretrained_model_name_or_path: str,
- *model_args,
- **kwargs,
- )#
- get_input_embeddings()#
- set_input_embeddings(value)#
- get_output_embeddings()#
- set_output_embeddings(new_embeddings)#
- forward(
- input_ids: torch.Tensor | None = None,
- *,
- position_ids: torch.Tensor | None = None,
- attention_mask: torch.Tensor | None = None,
- padding_mask: torch.Tensor | None = None,
- inputs_embeds: torch.Tensor | None = None,
- cache_position: torch.Tensor | None = None,
- **kwargs: Any,
- )#
- initialize_weights(
- buffer_device: torch.device | None = None,
- dtype: torch.dtype = torch.bfloat16,
- )#
- nemo_automodel.components.models.qwen3_vl_moe.model.ModelClass#
None