# nemo_automodel.components.models.deepseek_v4.kernels.sparse_attention

Autograd wrapper for vendored Miles DeepSeek V4 sparse-attention kernels.

Attribution:

* Upstream project: Miles, [https://github.com/yueming-yuan/miles](https://github.com/yueming-yuan/miles)
* Upstream revision: e561465d0b9bbf06188b7a5e2020dc7fd691f732, deepseek-v4 branch
* Upstream license: Apache-2.0, copyright 2025 Zhipu AI
* Original source:
  [https://github.com/yueming-yuan/miles/blob/e561465d0b9bbf06188b7a5e2020dc7fd691f732/miles\_plugins/models/deepseek\_v4/ops/attention\_core.py](https://github.com/yueming-yuan/miles/blob/e561465d0b9bbf06188b7a5e2020dc7fd691f732/miles_plugins/models/deepseek_v4/ops/attention_core.py)

## Module Contents

### Classes

| Name                                                                                                                                                  | Description                                                                       |
| ----------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------- |
| [`DeepSeekV4SparseAttention`](#nemo_automodel-components-models-deepseek_v4-kernels-sparse_attention-DeepSeekV4SparseAttention)                       | TileLang sparse MQA attention with custom backward.                               |
| [`DeepSeekV4SparseAttentionHeadChunked`](#nemo_automodel-components-models-deepseek_v4-kernels-sparse_attention-DeepSeekV4SparseAttentionHeadChunked) | TileLang sparse attention with smaller head groups and fp32 KV-grad accumulation. |

### Functions

| Name                                                                                                                                            | Description                                                  |
| ----------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------ |
| [`sparse_attn_tilelang`](#nemo_automodel-components-models-deepseek_v4-kernels-sparse_attention-sparse_attn_tilelang)                           | Run vendored Miles DeepSeek V4 TileLang sparse attention.    |
| [`sparse_attn_tilelang_head_chunked`](#nemo_automodel-components-models-deepseek_v4-kernels-sparse_attention-sparse_attn_tilelang_head_chunked) | Run vendored Miles sparse attention in TileLang head chunks. |

### API

```python
class nemo_automodel.components.models.deepseek_v4.kernels.sparse_attention.DeepSeekV4SparseAttention()
```

**Bases:** `torch.autograd.Function`

TileLang sparse MQA attention with custom backward.

```python
nemo_automodel.components.models.deepseek_v4.kernels.sparse_attention.DeepSeekV4SparseAttention.backward(
    ctx,
    grad_output: torch.Tensor
) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor, None, None]
```

staticmethod

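Run the vendored sparse attention backward kernel. The returned tuple has one entry per `forward` input after `ctx`: gradients for `q`, `kv`, and `attn_sink`, then `None` for `topk_idxs` and `sm_scale`.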

```python
nemo_automodel.components.models.deepseek_v4.kernels.sparse_attention.DeepSeekV4SparseAttention.forward(
    ctx,
    q: torch.Tensor,
    kv: torch.Tensor,
    attn_sink: torch.Tensor,
    topk_idxs: torch.Tensor,
    sm_scale: float | None = None
) -> torch.Tensor
```

staticmethod

Run the vendored sparse attention forward kernel.
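
The pairing between the `forward` inputs and the `backward` return tuple follows the general `torch.autograd.Function` contract: `backward` returns one value per `forward` argument after `ctx`, with `None` for arguments that take no gradient. The following is a minimal sketch of that contract using a toy function; `ScaledSquare` and its arguments are hypothetical stand-ins and are not part of `nemo_automodel`.

```python
from typing import Optional

import torch


class ScaledSquare(torch.autograd.Function):
    """Toy stand-in (hypothetical): computes y = scale * x**2."""

    @staticmethod
    def forward(ctx, x: torch.Tensor, idxs: torch.Tensor, scale: Optional[float] = None):
        # Resolve the optional scale, mirroring an optional `sm_scale`-style argument.
        scale = 1.0 if scale is None else scale
        ctx.save_for_backward(x)
        ctx.scale = scale
        return scale * x * x

    @staticmethod
    def backward(ctx, grad_output: torch.Tensor):
        (x,) = ctx.saved_tensors
        grad_x = grad_output * 2.0 * ctx.scale * x
        # One entry per forward input after ctx: x gets a gradient,
        # the integer index tensor and the float scale get None.
        return grad_x, None, None


x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
idxs = torch.tensor([0, 1, 2])  # unused by the toy math; only mirrors an index input
y = ScaledSquare.apply(x, idxs, 0.5)
y.sum().backward()
```

Custom functions are invoked through `.apply(...)` rather than by calling `forward` directly, which is presumably why this module also exposes plain function wrappers.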

```python
class nemo_automodel.components.models.deepseek_v4.kernels.sparse_attention.DeepSeekV4SparseAttentionHeadChunked()
```

**Bases:** `torch.autograd.Function`

TileLang sparse attention with smaller head groups and fp32 KV-grad accumulation.

```python
nemo_automodel.components.models.deepseek_v4.kernels.sparse_attention.DeepSeekV4SparseAttentionHeadChunked.backward(
    ctx,
    grad_output: torch.Tensor
) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor, None, None, None]
```

staticmethod

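Run chunked backward and accumulate shared KV gradients in fp32. The returned tuple has one entry per `forward` input after `ctx`: gradients for `q`, `kv`, and `attn_sink`, then `None` for `topk_idxs`, `max_heads_per_kernel`, and `sm_scale`.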
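
Because the KV tensor is shared across heads, each head chunk contributes a partial gradient to the same KV tensor, and those partials have to be summed. The sketch below illustrates why summing in fp32 can help when the partials are low precision. It is not the vendored implementation: the bf16 input dtype, the tensor shapes, the final cast back to bf16, and the helper names are all assumptions made for illustration.

```python
import torch

torch.manual_seed(0)

# Hypothetical per-chunk KV gradients in bf16, standing in for what each
# chunked backward call might produce (shapes are illustrative only).
chunk_grads = [torch.randn(4, 8).to(torch.bfloat16) for _ in range(16)]


def accumulate_fp32(grads, out_dtype=torch.bfloat16):
    """Sum partial gradients in an fp32 buffer, then cast once at the end."""
    acc = torch.zeros_like(grads[0], dtype=torch.float32)
    for g in grads:
        acc += g.float()
    return acc.to(out_dtype)


def accumulate_native(grads):
    """Sum partial gradients directly in their own dtype (rounds at every step)."""
    acc = torch.zeros_like(grads[0])
    for g in grads:
        acc += g
    return acc


# float64 reference sum used only to measure rounding error.
exact = torch.stack([g.double() for g in chunk_grads]).sum(dim=0)
err_fp32 = (accumulate_fp32(chunk_grads).double() - exact).abs().max().item()
err_bf16 = (accumulate_native(chunk_grads).double() - exact).abs().max().item()
```

With fp32 accumulation the result is rounded to the low-precision dtype once, so `err_fp32` would typically be no larger than `err_bf16`, which incurs a rounding step on every addition.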

```python
nemo_automodel.components.models.deepseek_v4.kernels.sparse_attention.DeepSeekV4SparseAttentionHeadChunked.forward(
    ctx,
    q: torch.Tensor,
    kv: torch.Tensor,
    attn_sink: torch.Tensor,
    topk_idxs: torch.Tensor,
    max_heads_per_kernel: int,
    sm_scale: float | None = None
) -> torch.Tensor
```

staticmethod

Run the vendored sparse attention forward kernel over head chunks.
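
As a rough illustration of what splitting into head chunks means, the sketch below computes contiguous head ranges that are each at most `max_heads_per_kernel` wide. The helper `head_chunks` is hypothetical, and whether the vendored code uses contiguous ranges or accepts a head count that does not divide evenly is an assumption, not something this page confirms.

```python
def head_chunks(num_heads: int, max_heads_per_kernel: int) -> list[tuple[int, int]]:
    """Return [start, end) head ranges, each at most max_heads_per_kernel wide.

    Hypothetical helper for illustration; not part of nemo_automodel.
    """
    if max_heads_per_kernel <= 0:
        raise ValueError("max_heads_per_kernel must be positive")
    return [
        (start, min(start + max_heads_per_kernel, num_heads))
        for start in range(0, num_heads, max_heads_per_kernel)
    ]


even = head_chunks(64, 16)   # [(0, 16), (16, 32), (32, 48), (48, 64)]
ragged = head_chunks(10, 4)  # [(0, 4), (4, 8), (8, 10)]
```

Under this reading, each range would select a slice of the head dimension of `q` and `attn_sink` for one kernel launch, while `kv` and `topk_idxs` are reused unchanged across launches.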

```python
nemo_automodel.components.models.deepseek_v4.kernels.sparse_attention.sparse_attn_tilelang(
    q: torch.Tensor,
    kv: torch.Tensor,
    attn_sink: torch.Tensor,
    topk_idxs: torch.Tensor,
    sm_scale: float | None = None
) -> torch.Tensor
```

Run vendored Miles DeepSeek V4 TileLang sparse attention.
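
This page does not document tensor shapes or the exact math, so the following is only a naive dense PyTorch sketch of what sparse multi-query attention with an attention sink commonly looks like. Everything here is an assumption: the shapes (`q` as `[batch, seq, heads, dim]`, `kv` as `[batch, kv_len, dim]`, `attn_sink` as `[heads]`, `topk_idxs` as `[batch, seq, topk]`), the use of `-1` to mark unused index slots, the default scale of `dim ** -0.5`, the reuse of `kv` as both keys and values, and the treatment of the sink as an extra softmax logit with no value. `sparse_mqa_reference` is a hypothetical name, not a function in this module.

```python
import math
from typing import Optional

import torch


def sparse_mqa_reference(
    q: torch.Tensor,
    kv: torch.Tensor,
    attn_sink: torch.Tensor,
    topk_idxs: torch.Tensor,
    sm_scale: Optional[float] = None,
) -> torch.Tensor:
    """Naive reference under assumed semantics; see the surrounding text."""
    b, s, h, d = q.shape
    if sm_scale is None:
        sm_scale = 1.0 / math.sqrt(d)  # assumed default, not confirmed by the source

    valid = topk_idxs >= 0                      # assumed: -1 marks an unused slot
    idx = topk_idxs.clamp(min=0).long()
    batch_idx = torch.arange(b, device=q.device).view(b, 1, 1)
    selected = kv[batch_idx, idx].float()       # [b, s, topk, d], shared by all heads

    scores = torch.einsum("bshd,bstd->bsht", q.float(), selected) * sm_scale
    scores = scores.masked_fill(~valid.unsqueeze(2), float("-inf"))

    # The sink adds one extra logit per head that absorbs probability mass
    # but contributes no value vector.
    sink = attn_sink.float().view(1, 1, h, 1).expand(b, s, h, 1)
    probs = torch.softmax(torch.cat([scores, sink], dim=-1), dim=-1)

    out = torch.einsum("bsht,bstd->bshd", probs[..., :-1], selected)
    return out.to(q.dtype)


# Tiny CPU example: zero queries and a zero sink give uniform weights over
# the selected entries plus the sink.
q = torch.zeros(1, 2, 4, 8)
kv = torch.ones(1, 5, 8)
attn_sink = torch.zeros(4)
topk_idxs = torch.tensor([[[0, 1, 2], [3, 4, -1]]])
out = sparse_mqa_reference(q, kv, attn_sink, topk_idxs)
```

In this example the first query position attends to three entries plus the sink, and the second to two entries plus the sink, so the sink visibly reduces the total weight placed on the selected KV rows.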

```python
nemo_automodel.components.models.deepseek_v4.kernels.sparse_attention.sparse_attn_tilelang_head_chunked(
    q: torch.Tensor,
    kv: torch.Tensor,
    attn_sink: torch.Tensor,
    topk_idxs: torch.Tensor,
    max_heads_per_kernel: int,
    sm_scale: float | None = None
) -> torch.Tensor
```

Run vendored Miles sparse attention in TileLang head chunks.