# nemo_automodel.components.models.qwen3_next.layers

## Module Contents

### Classes

| Name                                                                                                           | Description                                                               |
| -------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------- |
| [`Qwen3NextAttention`](#nemo_automodel-components-models-qwen3_next-layers-Qwen3NextAttention)                 | -                                                                         |
| [`Qwen3NextFp32GatedDeltaNet`](#nemo_automodel-components-models-qwen3_next-layers-Qwen3NextFp32GatedDeltaNet) | Qwen3-Next GatedDeltaNet that computes the decay gate via an fp32 holder. |
| [`Qwen3NextRMSNorm`](#nemo_automodel-components-models-qwen3_next-layers-Qwen3NextRMSNorm)                     | -                                                                         |
| [`Qwen3NextSSMGate`](#nemo_automodel-components-models-qwen3_next-layers-Qwen3NextSSMGate)                     | Owns Qwen3-Next fp32 SSM-gating params and computes the decay gate.       |
| [`_SSMGateParam`](#nemo_automodel-components-models-qwen3_next-layers-_SSMGateParam)                           | Get-only descriptor exposing a param from `_fp32_params` when present.    |

### Functions

| Name                                                                                         | Description                                                       |
| -------------------------------------------------------------------------------------------- | ----------------------------------------------------------------- |
| [`_install_ssm_gate`](#nemo_automodel-components-models-qwen3_next-layers-_install_ssm_gate) | Move HF-created bare `A_log`/`dt_bias` into a native fp32 holder. |

### API

```python
class nemo_automodel.components.models.qwen3_next.layers.Qwen3NextAttention(
    config: transformers.models.qwen3_next.configuration_qwen3_next.Qwen3NextConfig,
    layer_idx: int,
    backend: nemo_automodel.components.models.common.BackendConfig
)
```

**Bases:** `Module`

```python
nemo_automodel.components.models.qwen3_next.layers.Qwen3NextAttention.forward(
    x: torch.Tensor,
    freqs_cis: torch.Tensor,
    attention_mask: torch.Tensor | None = None,
    attn_kwargs: typing.Any = {}
) -> torch.Tensor
```

```python
nemo_automodel.components.models.qwen3_next.layers.Qwen3NextAttention.init_weights(
    buffer_device: torch.device,
    init_std: float = 0.02
)
```

```python
class nemo_automodel.components.models.qwen3_next.layers.Qwen3NextFp32GatedDeltaNet(
    config: transformers.models.qwen3_next.configuration_qwen3_next.Qwen3NextConfig,
    layer_idx: int
)
```

**Bases:** `Qwen3NextGatedDeltaNet`

Qwen3-Next GatedDeltaNet that computes the decay gate via an fp32 holder.

HF's `Qwen3NextGatedDeltaNet` computes the gate inline as
`g = -exp(A_log) * softplus(a + dt_bias)` using the bare `A_log` / `dt_bias`
parameters. Both parameters need fp32 precision: `A_log` is exponentiated, so a
bf16 rounding error in it becomes a proportional error in the decay rate, which
the recurrence then compounds across the sequence.
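
The precision argument above can be illustrated with a minimal pure-Python sketch. It is not the library's code: `to_bf16`, `softplus`, and `decay_gate` are hypothetical helpers, the numeric values are made up, and the cumulative-decay step assumes the recurrence scales its state by `exp(g)` at every position with a constant gate, which is a simplification (in the model `a` varies per token).

```python
import math
import struct


def to_bf16(x: float) -> float:
    """Emulate rounding a finite float to bfloat16 (round-to-nearest-even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Keep the top 16 bits of the float32 pattern, rounding the dropped half.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding_bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def softplus(x: float) -> float:
    return math.log1p(math.exp(x))


def decay_gate(a_log: float, a: float, dt_bias: float) -> float:
    """g = -exp(A_log) * softplus(a + dt_bias) for one head and one token."""
    return -math.exp(a_log) * softplus(a + dt_bias)


# Hypothetical values for a single head.
a_log, a, dt_bias = -4.567, 0.3, 0.7

g_ref = decay_gate(a_log, a, dt_bias)
g_bf16 = decay_gate(to_bf16(a_log), a, dt_bias)  # A_log stored in bf16

# Per-step relative error of the gate caused by rounding A_log alone.
rel_err = abs(g_bf16 - g_ref) / abs(g_ref)

# Simplifying assumption: constant gate over T steps, so the cumulative
# decay is exp(T * g) and the per-step error is multiplied by T in the exponent.
T = 512
decay_ref = math.exp(T * g_ref)
decay_bf16 = math.exp(T * g_bf16)
cum_rel_err = abs(decay_bf16 - decay_ref) / decay_ref

print(f"per-step relative error:   {rel_err:.4%}")
print(f"cumulative relative error: {cum_rel_err:.4%}")
```

With these values `A_log = -4.567` rounds to `-4.5625` in bf16, and the error in the cumulative decay over `T` steps is larger than the per-step error in `g`, which is the compounding effect described above.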

The constructor moves those params into a native `_fp32_params` holder so they
are fp32-resident before any dtype cast or FSDP wrapping. The gate is computed
inside the holder's forward, which keeps the computation in fp32 and lets
FSDP's unshard/reshard and gradient reduce-scatter fire for that unit. This
subclass overrides `forward` to route the gate through `self._compute_gate(a)`
while reproducing the rest of HF's forward verbatim.

```python
nemo_automodel.components.models.qwen3_next.layers.Qwen3NextFp32GatedDeltaNet._compute_gate(
    a: torch.Tensor
) -> torch.Tensor
```

Compute the decay gate `g` in fp32, via the holder when it exists.

```python
nemo_automodel.components.models.qwen3_next.layers.Qwen3NextFp32GatedDeltaNet.forward(
    hidden_states: torch.Tensor,
    cache_params: typing.Any | None = None,
    attention_mask: torch.Tensor | None = None
)
```

```python
class nemo_automodel.components.models.qwen3_next.layers.Qwen3NextRMSNorm(
    dim: int,
    eps: float = 1e-06
)
```

**Bases:** `Module`

```python
nemo_automodel.components.models.qwen3_next.layers.Qwen3NextRMSNorm._norm(
    x
)
```

```python
nemo_automodel.components.models.qwen3_next.layers.Qwen3NextRMSNorm.extra_repr()
```

```python
nemo_automodel.components.models.qwen3_next.layers.Qwen3NextRMSNorm.forward(
    x
)
```

```python
nemo_automodel.components.models.qwen3_next.layers.Qwen3NextRMSNorm.reset_parameters()
```

```python
class nemo_automodel.components.models.qwen3_next.layers.Qwen3NextSSMGate(
    num_v_heads: int,
    dtype: torch.dtype = torch.float32
)
```

**Bases:** `Module`

Owns Qwen3-Next fp32 SSM-gating params and computes the decay gate.

```python
nemo_automodel.components.models.qwen3_next.layers.Qwen3NextSSMGate.forward(
    a: torch.Tensor
) -> torch.Tensor
```

```python
class nemo_automodel.components.models.qwen3_next.layers._SSMGateParam(
    name: str
)
```

Get-only descriptor exposing a param from `_fp32_params` when present.

```python
nemo_automodel.components.models.qwen3_next.layers._SSMGateParam.__get__(
    obj,
    owner = None
)
```
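
As a rough illustration of the get-only descriptor pattern, the sketch below uses plain Python objects instead of `torch.nn.Module`. `FallbackParam`, `Holder`, `Layer`, and the `_bare` fallback storage are hypothetical names introduced for the example; only the `_fp32_params` attribute name and the `__get__(obj, owner=None)` signature come from this page, and the fallback behavior is an assumption.

```python
class FallbackParam:
    """Get-only descriptor: serve `name` from `obj._fp32_params` when present."""

    def __init__(self, name: str):
        self.name = name

    def __get__(self, obj, owner=None):
        if obj is None:
            # Accessed on the class itself: return the descriptor object.
            return self
        holder = getattr(obj, "_fp32_params", None)
        if holder is not None:
            return getattr(holder, self.name)
        # Assumed fallback for the sketch: a privately stored bare value.
        return obj._bare[self.name]


class Holder:
    """Stand-in for the fp32 holder that owns the gating values."""

    def __init__(self, A_log: float, dt_bias: float):
        self.A_log = A_log
        self.dt_bias = dt_bias


class Layer:
    A_log = FallbackParam("A_log")
    dt_bias = FallbackParam("dt_bias")

    def __init__(self):
        self._bare = {"A_log": 0.5, "dt_bias": 1.0}
        self._fp32_params = None


layer = Layer()
before = layer.A_log  # no holder yet, so the bare value is returned
layer._fp32_params = Holder(A_log=0.25, dt_bias=2.0)
after = layer.A_log   # the holder now owns the value
```

Because the descriptor defines only `__get__`, existing code that reads `layer.A_log` keeps working after the value has moved into the holder. In plain Python an assignment such as `layer.A_log = x` would store an instance attribute that shadows the descriptor, so a get-only descriptor suits attributes that are read but not reassigned.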

```python
nemo_automodel.components.models.qwen3_next.layers._install_ssm_gate(
    mod: torch.nn.Module,
    fp32_dtype: torch.dtype = torch.float32
) -> nemo_automodel.components.models.qwen3_next.layers.Qwen3NextSSMGate
```

Move HF-created bare `A_log`/`dt_bias` into a native fp32 holder.
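
The idea of moving bare parameters into a holder module can be sketched as follows. This is an illustrative approximation under stated assumptions, not the actual implementation: `GateHolder`, `install_holder`, and `ToyLayer` are hypothetical stand-ins for `Qwen3NextSSMGate`, `_install_ssm_gate`, and the HF layer, and the copy-then-delete strategy is assumed rather than taken from the source.

```python
import torch
import torch.nn.functional as F
from torch import nn


class GateHolder(nn.Module):
    """Hypothetical fp32 holder for A_log / dt_bias that also computes the gate."""

    def __init__(self, num_v_heads: int, dtype: torch.dtype = torch.float32):
        super().__init__()
        self.A_log = nn.Parameter(torch.zeros(num_v_heads, dtype=dtype))
        self.dt_bias = nn.Parameter(torch.ones(num_v_heads, dtype=dtype))

    def forward(self, a: torch.Tensor) -> torch.Tensor:
        # Upcast the activation so the whole expression is evaluated in fp32.
        return -self.A_log.float().exp() * F.softplus(a.float() + self.dt_bias.float())


def install_holder(mod: nn.Module, dtype: torch.dtype = torch.float32) -> GateHolder:
    """Move bare A_log / dt_bias from `mod` into a child holder (sketch only)."""
    holder = GateHolder(mod.A_log.numel(), dtype=dtype)
    with torch.no_grad():
        holder.A_log.copy_(mod.A_log)
        holder.dt_bias.copy_(mod.dt_bias)
    # Drop the bare parameters so each value has a single owner.
    del mod.A_log
    del mod.dt_bias
    # Assigning an nn.Module attribute registers it as a submodule.
    mod._fp32_params = holder
    return holder


class ToyLayer(nn.Module):
    """Toy stand-in for a layer that creates bare gating parameters."""

    def __init__(self, num_v_heads: int = 4):
        super().__init__()
        self.A_log = nn.Parameter(torch.randn(num_v_heads))
        self.dt_bias = nn.Parameter(torch.randn(num_v_heads))


layer = ToyLayer()
holder = install_holder(layer)

a = torch.randn(2, 3, 4, dtype=torch.bfloat16)  # (batch, seq, heads)
g = holder(a)  # fp32 gate even though the activation is bf16
```

After installation the parameters appear under the `_fp32_params.` prefix in `layer.named_parameters()`, which gives a wrapper such as FSDP a separate module to treat as its own unit.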