# nemo_automodel.components.models.deepseek_v4.kernels.tilelang_sparse_mla_fwd

## Module Contents

### Functions

| Name                                                                                                                                 | Description                                    |
| ------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------- |
| [`sparse_mqa_fwd`](#nemo_automodel-components-models-deepseek_v4-kernels-tilelang_sparse_mla_fwd-sparse_mqa_fwd)                     | -                                              |
| [`sparse_mqa_fwd_interface`](#nemo_automodel-components-models-deepseek_v4-kernels-tilelang_sparse_mla_fwd-sparse_mqa_fwd_interface) | Forward interface for V4 sparse MQA attention. |

### API

```python
nemo_automodel.components.models.deepseek_v4.kernels.tilelang_sparse_mla_fwd.sparse_mqa_fwd(
    heads,
    dim,
    topk,
    sm_scale = None,
    block_I = 64,
    num_stages = 2,
    threads = 256
)
```
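`sparse_mqa_fwd` has no docstring. Its arguments (`heads`, `dim`, `topk`, tile size `block_I`, pipeline depth `num_stages`, thread count `threads`) look like compile-time kernel parameters, and `block_I` is presumably the number of top-k indices handled per tile. That reading is an assumption and is not confirmed by this page. The NumPy sketch below shows why tiling over the index dimension can be exact. The helper `blockwise_sink_attention_row` is hypothetical and not part of the library. It processes the gathered KV rows in chunks of `block_I` with an online softmax seeded by an attention-sink logit, and the result equals a single full softmax. It also assumes that the gathered `kv` rows serve as both keys and values, and that the sink adds to the softmax denominator without contributing a value.

```python
import numpy as np


def blockwise_sink_attention_row(q_row, kv_sel, sink, sm_scale, block_I=64):
    """Online-softmax attention for one (query, head) pair (hypothetical helper).

    q_row:  [D] query vector
    kv_sel: [topk, D] gathered KV rows, used as both keys and values (assumption)
    sink:   scalar sink logit for this head; assumed to join the softmax
            denominator but contribute no value
    """
    topk, dim = kv_sel.shape
    assert topk % block_I == 0, "sketch assumes topk is a multiple of block_I"
    m = sink   # running max, seeded with the sink logit
    l = 1.0    # running denominator: exp(sink - m) == 1 at the start
    acc = np.zeros(dim, dtype=np.float64)  # running unnormalized output
    for start in range(0, topk, block_I):
        blk = kv_sel[start:start + block_I]   # one tile of block_I gathered rows
        s = (blk @ q_row) * sm_scale          # [block_I] scaled logits
        m_new = max(m, s.max())
        correction = np.exp(m - m_new)        # rescale earlier partial sums
        p = np.exp(s - m_new)
        l = l * correction + p.sum()
        acc = acc * correction + p @ blk
        m = m_new
    return acc / l
```

Under these assumptions, the result does not depend on `block_I` beyond floating-point rounding. Tile size therefore affects only how work is split across stages and threads, not the math.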

```python
nemo_automodel.components.models.deepseek_v4.kernels.tilelang_sparse_mla_fwd.sparse_mqa_fwd_interface(
    q,
    kv,
    attn_sink,
    topk_idxs,
    sm_scale = None,
    block_I = 64,
    num_stages = 2,
    threads = 256
)
```

Forward interface for V4 sparse MQA attention.

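**Parameters:**

- **`q`**: `[B, S, H, D]` bf16. Query tensor.
- **`kv`**: `[B, S_kv, D]` bf16. Key/value tensor, shared across all `H` heads.
- **`attn_sink`**: `[H]` fp32. Attention-sink values, one per head.
- **`topk_idxs`**: `[B, S, topk]` int32. Indices into the `S_kv` dimension of `kv` selected for each query position.
- **`sm_scale`**: float or None. Softmax scale; defaults to `1/sqrt(D)`.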

**Returns:**

`[B, S, H, D]` bf16 attention output.
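A dense reference can help check results on small inputs. The following minimal NumPy sketch shows what the interface plausibly computes, based on the documented shapes. The helper name `sparse_mqa_fwd_reference` is hypothetical, and its semantics are assumptions, not confirmed by this page:

- `kv` serves as both keys and values for every head.
- Each head's `attn_sink` value acts as an extra softmax logit that takes probability mass but contributes no value.
- Every entry of `topk_idxs` is a valid index into `S_kv`, so the sketch applies no masking.

NumPy has no bfloat16, so the sketch computes in float32.

```python
import numpy as np


def sparse_mqa_fwd_reference(q, kv, attn_sink, topk_idxs, sm_scale=None):
    """Dense NumPy reference for the documented shapes (hypothetical helper).

    q:         [B, S, H, D]
    kv:        [B, S_kv, D]   shared by all heads; used as keys and values (assumption)
    attn_sink: [H]            per-head sink logit (assumed: denominator only)
    topk_idxs: [B, S, topk]   indices into the S_kv axis, assumed all valid
    returns    [B, S, H, D]
    """
    B, S, H, D = q.shape
    if sm_scale is None:
        sm_scale = 1.0 / np.sqrt(D)  # documented default
    q = q.astype(np.float32)
    kv = kv.astype(np.float32)
    # Gather the selected KV rows for each query position: [B, S, topk, D]
    b_idx = np.arange(B)[:, None, None]
    kv_sel = kv[b_idx, topk_idxs]
    # Scaled logits for every head: [B, S, H, topk]
    logits = np.einsum("bshd,bskd->bshk", q, kv_sel) * sm_scale
    # Append the per-head sink logit as an extra column: [B, S, H, topk + 1]
    sink = np.broadcast_to(attn_sink.astype(np.float32)[None, None, :, None], (B, S, H, 1))
    logits = np.concatenate([logits, sink], axis=-1)
    # Numerically stable softmax over the selected positions plus the sink
    logits -= logits.max(axis=-1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)
    # Drop the sink column: it absorbs probability mass but carries no value
    return np.einsum("bshk,bskd->bshd", probs[..., :-1], kv_sel)
```

If these assumptions match the kernel, comparing `sparse_mqa_fwd_interface(q, kv, attn_sink, topk_idxs)` against this reference on small random inputs, with a tolerance loose enough for bf16 output, is one way to sanity-check a build.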