# nemo_automodel.components.models.qwen3_5_moe.model

Qwen3.5-MoE (VL) NeMo Automodel support.

## Module Contents

### Classes

| Name                                                                                                                                     | Description                                                                   |
| ---------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------- |
| [`Fp32SafeQwen3_5MoeTextRotaryEmbedding`](#nemo_automodel-components-models-qwen3_5_moe-model-Fp32SafeQwen3_5MoeTextRotaryEmbedding)     | Ensure inv\_freq stays in float32 across `.to(dtype)` calls.                  |
| [`Fp32SafeQwen3_5MoeVisionRotaryEmbedding`](#nemo_automodel-components-models-qwen3_5_moe-model-Fp32SafeQwen3_5MoeVisionRotaryEmbedding) | Ensure the vision rotary inv\_freq buffer remains float32.                    |
| [`Qwen3_5MoeBlock`](#nemo_automodel-components-models-qwen3_5_moe-model-Qwen3_5MoeBlock)                                                 | Block that uses the Qwen3.5-MoE native GatedDeltaNet (separate in\_proj\_qkv, in\_proj\_z, in\_proj\_b, in\_proj\_a). |
| [`Qwen3_5MoeCausalLMOutputWithPast`](#nemo_automodel-components-models-qwen3_5_moe-model-Qwen3_5MoeCausalLMOutputWithPast)               | Qwen3.5-MoE output extended with MTP auxiliary hidden states.                 |
| [`Qwen3_5MoeForConditionalGeneration`](#nemo_automodel-components-models-qwen3_5_moe-model-Qwen3_5MoeForConditionalGeneration)           | Qwen3.5-MoE VL conditional generation model using NeMo backend components.    |
| [`Qwen3_5MoeMTPSublayer`](#nemo_automodel-components-models-qwen3_5_moe-model-Qwen3_5MoeMTPSublayer)                                     | One full-attention Qwen3.5-MoE MTP sublayer.                                  |
| [`Qwen3_5MoeModel`](#nemo_automodel-components-models-qwen3_5_moe-model-Qwen3_5MoeModel)                                                 | Thin wrapper that exposes `language_model` internals as properties expected by the NeMo training loop (e.g. `model.layers`). |
| [`Qwen3_5MoeTextModelBackend`](#nemo_automodel-components-models-qwen3_5_moe-model-Qwen3_5MoeTextModelBackend)                           | Qwen3.5-MoE text decoder rebuilt on top of the Qwen3-Next Block.              |

### Functions

| Name                                                                                                                     | Description                                                       |
| ------------------------------------------------------------------------------------------------------------------------ | ----------------------------------------------------------------- |
| [`_default_init_device`](#nemo_automodel-components-models-qwen3_5_moe-model-_default_init_device)                       | -                                                                 |
| [`_freqs_cis_from_rotary`](#nemo_automodel-components-models-qwen3_5_moe-model-_freqs_cis_from_rotary)                   | -                                                                 |
| [`_make_missing`](#nemo_automodel-components-models-qwen3_5_moe-model-_make_missing)                                     | -                                                                 |
| [`_make_mtp_block_config`](#nemo_automodel-components-models-qwen3_5_moe-model-_make_mtp_block_config)                   | -                                                                 |
| [`_qwen3_5_moe_backend`](#nemo_automodel-components-models-qwen3_5_moe-model-_qwen3_5_moe_backend)                       | Return a Qwen3.5-MoE backend with TE fused RoPE disabled.         |
| [`_resolve_mtp_num_layers`](#nemo_automodel-components-models-qwen3_5_moe-model-_resolve_mtp_num_layers)                 | -                                                                 |
| [`_rolled_embed_inputs`](#nemo_automodel-components-models-qwen3_5_moe-model-_rolled_embed_inputs)                       | -                                                                 |
| [`_split_qwen3_5_moe_position_ids`](#nemo_automodel-components-models-qwen3_5_moe-model-_split_qwen3_5_moe_position_ids) | -                                                                 |
| [`build_mtp_config_from_hf`](#nemo_automodel-components-models-qwen3_5_moe-model-build_mtp_config_from_hf)               | Build Qwen3.5-MoE MTP runtime config from HF-style config fields. |
| [`build_qwen3_5_moe_mtp`](#nemo_automodel-components-models-qwen3_5_moe-model-build_qwen3_5_moe_mtp)                     | Construct Qwen3.5-MoE MTP blocks.                                 |

### Data

[`ModelClass`](#nemo_automodel-components-models-qwen3_5_moe-model-ModelClass)

[`_QWEN3_5_MOE_HF_AVAILABLE`](#nemo_automodel-components-models-qwen3_5_moe-model-_QWEN3_5_MOE_HF_AVAILABLE)

### API

```python
class nemo_automodel.components.models.qwen3_5_moe.model.Fp32SafeQwen3_5MoeTextRotaryEmbedding()
```

**Bases:** `Qwen3_5MoeTextRotaryEmbedding`

Ensure inv\_freq stays in float32 across `.to(dtype)` calls.

```python
nemo_automodel.components.models.qwen3_5_moe.model.Fp32SafeQwen3_5MoeTextRotaryEmbedding._apply(
    fn: typing.Any,
    recurse: bool = True
)
```

```python
class nemo_automodel.components.models.qwen3_5_moe.model.Fp32SafeQwen3_5MoeVisionRotaryEmbedding()
```

**Bases:** `Qwen3_5MoeVisionRotaryEmbedding`

Ensure the vision rotary inv\_freq buffer remains float32.

```python
nemo_automodel.components.models.qwen3_5_moe.model.Fp32SafeQwen3_5MoeVisionRotaryEmbedding._apply(
    fn: typing.Any,
    recurse: bool = True
)
```

```python
class nemo_automodel.components.models.qwen3_5_moe.model.Qwen3_5MoeBlock(
    layer_idx,
    config,
    moe_config,
    backend
)
```

**Bases:** [Block](/nemo-automodel/nemo_automodel/components/models/qwen3_next/model#nemo_automodel-components-models-qwen3_next-model-Block)

Block that uses the Qwen3.5-MoE native GatedDeltaNet (separate in\_proj\_qkv,
in\_proj\_z, in\_proj\_b, in\_proj\_a)

```python
nemo_automodel.components.models.qwen3_5_moe.model.Qwen3_5MoeBlock.forward(
    x: torch.Tensor,
    freqs_cis: torch.Tensor,
    attention_mask: torch.Tensor | None = None,
    padding_mask: torch.Tensor | None = None,
    position_ids: torch.Tensor | None = None,
    attn_kwargs: typing.Any = {}
) -> torch.Tensor
```

Mirror `Block.forward` but thread NEAT-packing kwargs into
`CPAwareGatedDeltaNet`.

The parent `Block.forward` calls `linear_attn` with only
`hidden_states` and `attention_mask`; for packed sequences the
gated\_delta\_rule kernel additionally needs `cu_seqlens` /
`indices` to reset state at document boundaries (issue #2131).
Both are derived once per forward pass from the indexed attention mask.
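As a rough illustration of how `cu_seqlens` and `indices` can be derived from an indexed mask, here is a plain-Python sketch. It assumes each token carries a 1-based document index and that `0` marks padding; the helper name `packing_metadata` is hypothetical, and the library's actual derivation (which operates on tensors) may differ.

```python
from itertools import accumulate, groupby


def packing_metadata(indexed_mask: list[list[int]]) -> tuple[list[int], list[int]]:
    """Return (indices, cu_seqlens) for a batch of packed rows.

    indices    : flattened positions of all non-padding tokens
    cu_seqlens : cumulative document lengths, starting at 0
    """
    indices: list[int] = []
    doc_lengths: list[int] = []
    row_len = len(indexed_mask[0])
    for row_idx, row in enumerate(indexed_mask):
        offset = 0
        # Consecutive tokens with the same index belong to one document.
        for doc_id, run in groupby(row):
            run_len = len(list(run))
            if doc_id != 0:  # assumption: 0 marks padding
                start = row_idx * row_len + offset
                indices.extend(range(start, start + run_len))
                doc_lengths.append(run_len)
            offset += run_len
    return indices, [0, *accumulate(doc_lengths)]


mask = [
    [1, 1, 1, 2, 2, 0],
    [1, 1, 2, 2, 2, 2],
]
indices, cu_seqlens = packing_metadata(mask)
print(cu_seqlens)  # [0, 3, 5, 7, 11]
```

Each entry in `cu_seqlens` marks a document boundary, which is where a recurrent kernel would need to reset its state.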

```python
nemo_automodel.components.models.qwen3_5_moe.model.Qwen3_5MoeBlock.init_weights(
    buffer_device: torch.device
)
```

```python
class nemo_automodel.components.models.qwen3_5_moe.model.Qwen3_5MoeCausalLMOutputWithPast(
    mtp_per_depth_h: list[torch.Tensor] | None = None,
    mtp_loss_scaling_factor: float | None = None
)
```

Dataclass

**Bases:** `CausalLMOutputWithPast`

Qwen3.5-MoE output extended with MTP auxiliary hidden states.

```python
class nemo_automodel.components.models.qwen3_5_moe.model.Qwen3_5MoeForConditionalGeneration(
    config: transformers.models.qwen3_5_moe.configuration_qwen3_5_moe.Qwen3_5MoeConfig,
    moe_config: nemo_automodel.components.moe.layers.MoEConfig | None = None,
    backend: nemo_automodel.components.models.common.BackendConfig | None = None,
    mtp_loss_scaling_factor: float = 0.1,
    num_nextn_predict_layers: int | None = None,
    kwargs = {}
)
```

**Bases:** [HFCheckpointingMixin](/nemo-automodel/nemo_automodel/components/models/common/hf_checkpointing_mixin#nemo_automodel-components-models-common-hf_checkpointing_mixin-HFCheckpointingMixin), `HFQwen3_5MoeForConditionalGeneration`, [MoEFSDPSyncMixin](/nemo-automodel/nemo_automodel/components/moe/fsdp_mixin#nemo_automodel-components-moe-fsdp_mixin-MoEFSDPSyncMixin)

Qwen3.5-MoE VL conditional generation model using NeMo backend components.

```python
nemo_automodel.components.models.qwen3_5_moe.model.Qwen3_5MoeForConditionalGeneration.forward(
    input_ids: torch.Tensor | None = None,
    position_ids: torch.Tensor | None = None,
    attention_mask: torch.Tensor | None = None,
    padding_mask: torch.Tensor | None = None,
    inputs_embeds: torch.Tensor | None = None,
    cache_position: torch.Tensor | None = None,
    logits_to_keep: typing.Union[int, torch.Tensor] = 0,
    output_hidden_states: typing.Optional[bool] = None,
    kwargs: typing.Any = {}
)
```

```python
nemo_automodel.components.models.qwen3_5_moe.model.Qwen3_5MoeForConditionalGeneration.from_config(
    config: transformers.models.qwen3_5_moe.configuration_qwen3_5_moe.Qwen3_5MoeConfig,
    moe_config: nemo_automodel.components.moe.layers.MoEConfig | None = None,
    backend: nemo_automodel.components.models.common.BackendConfig | None = None,
    kwargs = {}
)
```

classmethod

```python
nemo_automodel.components.models.qwen3_5_moe.model.Qwen3_5MoeForConditionalGeneration.from_pretrained(
    pretrained_model_name_or_path: str,
    model_args = (),
    kwargs = {}
)
```

classmethod

```python
nemo_automodel.components.models.qwen3_5_moe.model.Qwen3_5MoeForConditionalGeneration.initialize_weights(
    buffer_device: torch.device | None = None,
    dtype: torch.dtype = torch.bfloat16
) -> None
```

```python
nemo_automodel.components.models.qwen3_5_moe.model.Qwen3_5MoeForConditionalGeneration.prepare_model_inputs_for_cp(
    input_ids: torch.Tensor,
    attention_mask: torch.Tensor | None = None,
    position_ids: torch.Tensor | None = None,
    pixel_values: torch.Tensor | None = None,
    pixel_values_videos: torch.Tensor | None = None,
    image_grid_thw: torch.Tensor | None = None,
    image_grid_hws: torch.Tensor | None = None,
    video_grid_thw: torch.Tensor | None = None,
    mm_token_type_ids: torch.Tensor | None = None,
    kwargs: typing.Any = {}
) -> dict[str, torch.Tensor]
```

Build full-sequence multimodal embeddings and mRoPE positions before CP sharding.

```python
class nemo_automodel.components.models.qwen3_5_moe.model.Qwen3_5MoeMTPSublayer(
    layer_idx: int,
    config: transformers.models.qwen3_5_moe.configuration_qwen3_5_moe.Qwen3_5MoeTextConfig,
    moe_config: nemo_automodel.components.moe.layers.MoEConfig,
    backend: nemo_automodel.components.models.common.BackendConfig,
    has_fusion: bool = False,
    has_final_norm: bool = False,
    dtype: torch.dtype = torch.bfloat16
)
```

**Bases:** [Qwen3\_5MoeBlock](#nemo_automodel-components-models-qwen3_5_moe-model-Qwen3_5MoeBlock)

One full-attention Qwen3.5-MoE MTP sublayer.

```python
nemo_automodel.components.models.qwen3_5_moe.model.Qwen3_5MoeMTPSublayer.forward(
    hidden_states: torch.Tensor,
    embed_input: torch.Tensor | None = None,
    rotary_emb: torch.nn.Module,
    position_ids: torch.Tensor,
    attention_mask: torch.Tensor | None = None,
    padding_mask: torch.Tensor | None = None,
    attn_kwargs: typing.Any = {}
) -> torch.Tensor
```

```python
nemo_automodel.components.models.qwen3_5_moe.model.Qwen3_5MoeMTPSublayer.init_weights(
    buffer_device: torch.device
) -> None
```

```python
class nemo_automodel.components.models.qwen3_5_moe.model.Qwen3_5MoeModel()
```

**Bases:** `HFQwen3_5MoeModel`

Thin wrapper that exposes `language_model` internals as properties
expected by the NeMo training loop (e.g. `model.layers`).
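The property-forwarding pattern described above can be sketched with plain Python classes. Everything here is a hypothetical stand-in: `DemoLanguageModel`, `DemoModelWrapper`, and the `embed_tokens` property are illustrative names, and only `model.layers` is mentioned in the docstring.

```python
class DemoLanguageModel:
    """Hypothetical stand-in for the inner text decoder."""

    def __init__(self) -> None:
        self.layers = ["block_0", "block_1"]  # placeholders for decoder blocks
        self.embed_tokens = "embedding"  # placeholder for the embedding module


class DemoModelWrapper:
    """Exposes selected `language_model` attributes at the top level."""

    def __init__(self) -> None:
        self.language_model = DemoLanguageModel()

    @property
    def layers(self):
        # Forward to the wrapped decoder instead of storing a second reference.
        return self.language_model.layers

    @property
    def embed_tokens(self):
        return self.language_model.embed_tokens


model = DemoModelWrapper()
print(model.layers)
```

Because the properties read through to `language_model` on every access, code that iterates over `model.layers` always sees the wrapped decoder's current blocks.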

```python
nemo_automodel.components.models.qwen3_5_moe.model.Qwen3_5MoeModel.forward(
    input_ids = None,
    attention_mask = None,
    position_ids = None,
    past_key_values = None,
    inputs_embeds = None,
    pixel_values = None,
    pixel_values_videos = None,
    image_grid_thw = None,
    video_grid_thw = None,
    cache_position = None,
    kwargs = {}
)
```

```python
class nemo_automodel.components.models.qwen3_5_moe.model.Qwen3_5MoeTextModelBackend(
    config: transformers.models.qwen3_5_moe.configuration_qwen3_5_moe.Qwen3_5MoeTextConfig,
    backend: nemo_automodel.components.models.common.BackendConfig,
    moe_config: nemo_automodel.components.moe.layers.MoEConfig | None = None,
    moe_overrides: dict | None = None
)
```

**Bases:** `Module`

Qwen3.5-MoE text decoder rebuilt on top of the Qwen3-Next Block.

```python
nemo_automodel.components.models.qwen3_5_moe.model.Qwen3_5MoeTextModelBackend.forward(
    input_ids: torch.Tensor | None = None,
    inputs_embeds: torch.Tensor | None = None,
    attention_mask: torch.Tensor | None = None,
    position_ids: torch.Tensor | None = None,
    cache_position: torch.Tensor | None = None,
    padding_mask: torch.Tensor | None = None,
    past_key_values: typing.Any | None = None,
    use_cache: bool | None = None,
    attn_kwargs: typing.Any = {}
) -> transformers.models.qwen3_5_moe.modeling_qwen3_5_moe.Qwen3_5MoeModelOutputWithPast
```

```python
nemo_automodel.components.models.qwen3_5_moe.model.Qwen3_5MoeTextModelBackend.get_input_embeddings() -> torch.nn.Module
```

```python
nemo_automodel.components.models.qwen3_5_moe.model.Qwen3_5MoeTextModelBackend.init_weights(
    buffer_device: torch.device | None = None
) -> None
```

```python
nemo_automodel.components.models.qwen3_5_moe.model.Qwen3_5MoeTextModelBackend.set_input_embeddings(
    value: torch.nn.Module
) -> None
```

```python
nemo_automodel.components.models.qwen3_5_moe.model._default_init_device() -> torch.device
```

```python
nemo_automodel.components.models.qwen3_5_moe.model._freqs_cis_from_rotary(
    rotary_emb: torch.nn.Module,
    hidden_states: torch.Tensor,
    position_ids: torch.Tensor
) -> torch.Tensor
```

```python
nemo_automodel.components.models.qwen3_5_moe.model._make_missing(
    name: str
)
```

```python
nemo_automodel.components.models.qwen3_5_moe.model._make_mtp_block_config(
    config: transformers.models.qwen3_5_moe.configuration_qwen3_5_moe.Qwen3_5MoeTextConfig,
    layer_idx: int
) -> transformers.models.qwen3_5_moe.configuration_qwen3_5_moe.Qwen3_5MoeTextConfig
```

```python
nemo_automodel.components.models.qwen3_5_moe.model._qwen3_5_moe_backend(
    backend: nemo_automodel.components.models.common.BackendConfig | None = None
) -> nemo_automodel.components.models.common.BackendConfig
```

Return a Qwen3.5-MoE backend with TE fused RoPE disabled.

The Qwen3.5 full-attention blocks reuse Qwen3-Next attention, and VLM/packed
execution can present THD-shaped q/k tensors. TE fused RoPE expects 4D inputs
in this path, so use non-fused RoPE while preserving the rest of the backend.
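One way to disable a single option while preserving the rest of a backend is to copy the config with one field overridden. The sketch below uses a hypothetical `DemoBackendConfig` dataclass; its field names (`attn`, `linear`, `rope_fusion`) are assumptions for illustration and are not confirmed attributes of `BackendConfig`.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DemoBackendConfig:
    """Hypothetical stand-in for a backend config."""

    attn: str = "te"
    linear: str = "te"
    rope_fusion: bool = True  # assumed name for the fused-RoPE switch


def with_unfused_rope(backend: DemoBackendConfig | None = None) -> DemoBackendConfig:
    """Return a copy of `backend` with fused RoPE turned off."""
    backend = backend if backend is not None else DemoBackendConfig()
    # `replace` copies every field and overrides only the one named here.
    return replace(backend, rope_fusion=False)


original = DemoBackendConfig(attn="sdpa")
adjusted = with_unfused_rope(original)
print(adjusted)
```

Returning a copy leaves the caller's config untouched, so other models sharing the same backend object keep their fused RoPE setting.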

```python
nemo_automodel.components.models.qwen3_5_moe.model._resolve_mtp_num_layers(
    config: typing.Any,
    override: int | None = None
) -> int
```

```python
nemo_automodel.components.models.qwen3_5_moe.model._rolled_embed_inputs(
    inputs_embeds: torch.Tensor,
    num_depths: int
) -> tuple[torch.Tensor, ...]
```

```python
nemo_automodel.components.models.qwen3_5_moe.model._split_qwen3_5_moe_position_ids(
    position_ids: torch.Tensor | None,
    batch_size: int,
    seq_len: int,
    device: torch.device,
    cache_position: torch.Tensor | None = None
) -> torch.Tensor
```

```python
nemo_automodel.components.models.qwen3_5_moe.model.build_mtp_config_from_hf(
    config: typing.Any,
    loss_scaling_factor: float = 0.1,
    num_nextn_predict_layers: int | None = None
) -> nemo_automodel.components.models.common.mtp.MTPConfig
```

Build Qwen3.5-MoE MTP runtime config from HF-style config fields.

```python
nemo_automodel.components.models.qwen3_5_moe.model.build_qwen3_5_moe_mtp(
    config: transformers.models.qwen3_5_moe.configuration_qwen3_5_moe.Qwen3_5MoeTextConfig,
    mtp_config: nemo_automodel.components.models.common.mtp.MTPConfig,
    backend: nemo_automodel.components.models.common.BackendConfig,
    moe_config: nemo_automodel.components.moe.layers.MoEConfig,
    dtype: torch.dtype
) -> nemo_automodel.components.models.common.mtp.MTPModule
```

Construct Qwen3.5-MoE MTP blocks.

```python
nemo_automodel.components.models.qwen3_5_moe.model.ModelClass = Qwen3_5MoeForConditionalGeneration
```

```python
nemo_automodel.components.models.qwen3_5_moe.model._QWEN3_5_MOE_HF_AVAILABLE = True
```