# nemo_automodel.components.models.qwen3_vl_moe.model

## Module Contents

### Classes

| Name                                                                                                                                      | Description                                                                   |
| ----------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------- |
| [`Fp32SafeQwen3VLMoeTextRotaryEmbedding`](#nemo_automodel-components-models-qwen3_vl_moe-model-Fp32SafeQwen3VLMoeTextRotaryEmbedding)     | Ensure inv\_freq stays in float32.                                            |
| [`Fp32SafeQwen3VLMoeVisionRotaryEmbedding`](#nemo_automodel-components-models-qwen3_vl_moe-model-Fp32SafeQwen3VLMoeVisionRotaryEmbedding) | Ensure the vision rotary inv\_freq buffer remains float32.                    |
| [`Qwen3VLMoeBlock`](#nemo_automodel-components-models-qwen3_vl_moe-model-Qwen3VLMoeBlock)                                                 | Qwen3-VL block adapter that accepts HF-style position embeddings.             |
| [`Qwen3VLMoeForConditionalGeneration`](#nemo_automodel-components-models-qwen3_vl_moe-model-Qwen3VLMoeForConditionalGeneration)           | Qwen3-VL conditional generation model using the Qwen3-MoE backend components. |
| [`Qwen3VLMoeModel`](#nemo_automodel-components-models-qwen3_vl_moe-model-Qwen3VLMoeModel)                                                 | -                                                                             |
| [`Qwen3VLMoeTextModelBackend`](#nemo_automodel-components-models-qwen3_vl_moe-model-Qwen3VLMoeTextModelBackend)                           | Qwen3-VL text decoder rebuilt on top of the Qwen3-MoE block implementation.   |

### Data

[`ModelClass`](#nemo_automodel-components-models-qwen3_vl_moe-model-ModelClass)

### API

```python
class nemo_automodel.components.models.qwen3_vl_moe.model.Fp32SafeQwen3VLMoeTextRotaryEmbedding()
```

**Bases:** `Qwen3VLMoeTextRotaryEmbedding`

Ensure inv\_freq stays in float32.

```python
nemo_automodel.components.models.qwen3_vl_moe.model.Fp32SafeQwen3VLMoeTextRotaryEmbedding._apply(
    fn: typing.Any,
    recurse: bool = True
)
```

```python
class nemo_automodel.components.models.qwen3_vl_moe.model.Fp32SafeQwen3VLMoeVisionRotaryEmbedding()
```

**Bases:** `Qwen3VLMoeVisionRotaryEmbedding`

Ensure the vision rotary inv\_freq buffer remains float32.

```python
nemo_automodel.components.models.qwen3_vl_moe.model.Fp32SafeQwen3VLMoeVisionRotaryEmbedding._apply(
    fn: typing.Any,
    recurse: bool = True
)
```

```python
class nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeBlock()
```

**Bases:** [Block](/nemo-automodel/nemo_automodel/components/models/qwen3_moe/model#nemo_automodel-components-models-qwen3_moe-model-Block)

Qwen3-VL block adapter that accepts HF-style position embeddings.

```python
nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeBlock.forward(
    x: torch.Tensor,
    freqs_cis: torch.Tensor | None = None,
    attention_mask: torch.Tensor | None = None,
    padding_mask: torch.Tensor | None = None,
    position_embeddings: tuple[torch.Tensor, torch.Tensor] | None = None,
    **attn_kwargs: typing.Any
) -> torch.Tensor
```

```python
class nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeForConditionalGeneration(
    config: transformers.models.qwen3_vl_moe.configuration_qwen3_vl_moe.Qwen3VLMoeConfig,
    moe_config: nemo_automodel.components.moe.config.MoEConfig | None = None,
    backend: nemo_automodel.components.models.common.BackendConfig | None = None,
    **kwargs
)
```

**Bases:** [HFCheckpointingMixin](/nemo-automodel/nemo_automodel/components/models/common/hf_checkpointing_mixin#nemo_automodel-components-models-common-hf_checkpointing_mixin-HFCheckpointingMixin), `HFQwen3VLMoeForConditionalGeneration`, [MoEFSDPSyncMixin](/nemo-automodel/nemo_automodel/components/moe/fsdp_mixin#nemo_automodel-components-moe-fsdp_mixin-MoEFSDPSyncMixin)

Qwen3-VL conditional generation model using the Qwen3-MoE backend components.

```python
nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeForConditionalGeneration.forward(
    input_ids: torch.Tensor | None = None,
    position_ids: torch.Tensor | None = None,
    attention_mask: torch.Tensor | None = None,
    padding_mask: torch.Tensor | None = None,
    inputs_embeds: torch.Tensor | None = None,
    cache_position: torch.Tensor | None = None,
    logits_to_keep: int | torch.Tensor = 0,
    output_hidden_states: bool | None = None,
    **kwargs: typing.Any
)
```
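The `logits_to_keep` parameter has the same name and type as the one in Hugging Face Transformers, where an integer `n` keeps the last `n` sequence positions, `0` keeps every position, and a tensor supplies explicit indices. Assuming this class follows that convention (this page does not confirm it), the selection logic can be sketched on plain Python lists. The helper `select_positions` is hypothetical.

```python
def select_positions(hidden_states, logits_to_keep=0):
    """Pick the sequence positions that would be projected to logits.

    ``hidden_states`` is a list with one entry per sequence position.
    """
    if isinstance(logits_to_keep, int):
        # slice(-0, None) equals slice(0, None), so 0 keeps every position.
        return hidden_states[slice(-logits_to_keep, None)]
    # Otherwise treat the argument as explicit position indices.
    return [hidden_states[i] for i in logits_to_keep]


positions = ["h0", "h1", "h2", "h3"]
keep_all = select_positions(positions)           # default 0: all positions
keep_last = select_positions(positions, 1)       # last position only
keep_some = select_positions(positions, [0, 2])  # explicit indices
```

Keeping only the last position is the common choice during generation, since only the next-token logits are needed.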

```python
nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeForConditionalGeneration.from_config(
    config: transformers.models.qwen3_vl_moe.configuration_qwen3_vl_moe.Qwen3VLMoeConfig,
    moe_config: nemo_automodel.components.moe.config.MoEConfig | None = None,
    backend: nemo_automodel.components.models.common.BackendConfig | None = None,
    **kwargs
)
```

classmethod

```python
nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeForConditionalGeneration.from_pretrained(
    pretrained_model_name_or_path: str,
    *model_args,
    **kwargs
)
```

classmethod

```python
nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeForConditionalGeneration.get_input_embeddings()
```

```python
nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeForConditionalGeneration.get_output_embeddings()
```

```python
nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeForConditionalGeneration.initialize_weights(
    buffer_device: torch.device | None = None,
    dtype: torch.dtype = torch.bfloat16
) -> None
```

```python
nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeForConditionalGeneration.set_input_embeddings(
    value
)
```

```python
nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeForConditionalGeneration.set_output_embeddings(
    new_embeddings
)
```

```python
class nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeModel()
```

**Bases:** `HFQwen3VLMoeModel`

```python
nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeModel.forward(
    input_ids = None,
    attention_mask = None,
    position_ids = None,
    past_key_values = None,
    inputs_embeds = None,
    pixel_values = None,
    pixel_values_videos = None,
    image_grid_thw = None,
    video_grid_thw = None,
    cache_position = None,
    **kwargs
)
```

```python
class nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeTextModelBackend(
    config: transformers.models.qwen3_vl_moe.configuration_qwen3_vl_moe.Qwen3VLMoeTextConfig,
    backend: nemo_automodel.components.models.common.BackendConfig,
    moe_config: nemo_automodel.components.moe.config.MoEConfig | None = None,
    moe_overrides: dict | None = None
)
```

**Bases:** `Module`

Qwen3-VL text decoder rebuilt on top of the Qwen3-MoE block implementation.

```python
nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeTextModelBackend._deepstack_process(
    hidden_states: torch.Tensor,
    visual_pos_masks: torch.Tensor | None,
    visual_embeds: torch.Tensor
) -> torch.Tensor
```
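The signature of `_deepstack_process` matches the upstream Hugging Face Qwen3-VL method of the same name, which adds visual embeddings to the hidden states at the positions flagged by `visual_pos_masks`. A minimal sketch of that masked addition follows, assuming this backend behaves the same way. The function `deepstack_add` is hypothetical, and returning the input unchanged when the mask is `None` is an assumption of the sketch, not documented behavior.

```python
import torch


def deepstack_add(hidden_states, visual_pos_masks, visual_embeds):
    """Add visual embeddings at the masked token positions (sketch only)."""
    if visual_pos_masks is None:
        # Assumption: with no mask there is nothing to merge.
        return hidden_states
    merged = hidden_states.clone()
    # The boolean mask selects (num_visual_tokens, hidden_size) rows.
    merged[visual_pos_masks] = merged[visual_pos_masks] + visual_embeds.to(merged.dtype)
    return merged


hidden = torch.zeros(1, 4, 2)                      # (batch, seq_len, hidden_size)
mask = torch.tensor([[False, True, True, False]])  # positions 1 and 2 are visual
embeds = torch.tensor([[1.0, 2.0], [3.0, 4.0]])    # one row per visual token
merged = deepstack_add(hidden, mask, embeds)
```

The number of rows in `visual_embeds` must equal the number of `True` entries in the mask, otherwise the indexed assignment raises a shape error.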

```python
nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeTextModelBackend.forward(
    input_ids: torch.Tensor | None = None,
    inputs_embeds: torch.Tensor | None = None,
    attention_mask: torch.Tensor | None = None,
    position_ids: torch.Tensor | None = None,
    cache_position: torch.Tensor | None = None,
    visual_pos_masks: torch.Tensor | None = None,
    deepstack_visual_embeds: list[torch.Tensor] | None = None,
    padding_mask: torch.Tensor | None = None,
    past_key_values: typing.Any | None = None,
    use_cache: bool | None = None,
    **attn_kwargs: typing.Any
) -> transformers.models.qwen3_vl_moe.modeling_qwen3_vl_moe.Qwen3VLMoeModelOutputWithPast
```

```python
nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeTextModelBackend.get_input_embeddings() -> torch.nn.Module
```

```python
nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeTextModelBackend.init_weights(
    buffer_device: torch.device | None = None
) -> None
```

```python
nemo_automodel.components.models.qwen3_vl_moe.model.Qwen3VLMoeTextModelBackend.set_input_embeddings(
    value: torch.nn.Module
) -> None
```

```python
nemo_automodel.components.models.qwen3_vl_moe.model.ModelClass = Qwen3VLMoeForConditionalGeneration
```