nemo_automodel.components.models.deepseek_v4.kernels.tilelang_sparse_mla_fwd

Module Contents

Functions

Name                        Description
sparse_mqa_fwd              -
sparse_mqa_fwd_interface    Forward interface for V4 sparse MQA attention.

API

nemo_automodel.components.models.deepseek_v4.kernels.tilelang_sparse_mla_fwd.sparse_mqa_fwd(
heads,
dim,
topk,
sm_scale = None,
block_I = 64,
num_stages = 2,
threads = 256
)

nemo_automodel.components.models.deepseek_v4.kernels.tilelang_sparse_mla_fwd.sparse_mqa_fwd_interface(
q,
kv,
attn_sink,
topk_idxs,
sm_scale = None,
block_I = 64,
num_stages = 2,
threads = 256
)

Forward interface for V4 sparse MQA attention.

Parameters:

q

[B, S, H, D] bf16

kv

[B, S_kv, D] bf16

attn_sink

[H] fp32

topk_idxs

[B, S, topk] int32

sm_scale

float or None. Defaults to None, in which case 1/sqrt(D) is used.
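
The shape contract listed above can be checked on the host before the kernel is launched. The following is a minimal sketch in plain Python and is not part of this module: `check_sparse_mqa_shapes` is a hypothetical helper that takes shape tuples, verifies that they agree with the documented layouts, and resolves the documented `sm_scale` default of 1/sqrt(D).

```python
import math
from typing import Optional, Tuple


def check_sparse_mqa_shapes(
    q_shape: Tuple[int, ...],
    kv_shape: Tuple[int, ...],
    attn_sink_shape: Tuple[int, ...],
    topk_idxs_shape: Tuple[int, ...],
    sm_scale: Optional[float] = None,
) -> float:
    """Hypothetical helper (not in the library): validate shapes, return sm_scale."""
    if len(q_shape) != 4:
        raise ValueError(f"q must be [B, S, H, D], got {q_shape}")
    B, S, H, D = q_shape

    # kv is documented as [B, S_kv, D]: batch and feature dims must match q.
    if len(kv_shape) != 3 or kv_shape[0] != B or kv_shape[2] != D:
        raise ValueError(f"kv must be [B, S_kv, D] with B={B}, D={D}, got {kv_shape}")

    # attn_sink is documented as [H]: one value per query head.
    if tuple(attn_sink_shape) != (H,):
        raise ValueError(f"attn_sink must be [H] with H={H}, got {attn_sink_shape}")

    # topk_idxs is documented as [B, S, topk]: one index list per query token.
    if len(topk_idxs_shape) != 3 or tuple(topk_idxs_shape[:2]) != (B, S):
        raise ValueError(f"topk_idxs must be [B, S, topk], got {topk_idxs_shape}")

    # Documented default: sm_scale=None means 1/sqrt(D).
    return 1.0 / math.sqrt(D) if sm_scale is None else sm_scale


# Example: B=2, S=16, H=8, D=64, S_kv=128, topk=32.
scale = check_sparse_mqa_shapes((2, 16, 8, 64), (2, 128, 64), (8,), (2, 16, 32))
print(scale)  # 0.125, i.e. 1/sqrt(64)
```

Dtype checks (bf16, fp32, int32) are left out so that the sketch runs without a tensor library.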

Returns:

[B, S, H, D] bf16
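
For intuition about what a sparse MQA forward pass of this shape might compute, here is a slow, pure-Python sketch for a single query token. It is not the TileLang kernel and makes several assumptions that this page does not confirm: each `kv` row serves as both key and value and is shared by all heads, `attn_sink[h]` acts as an extra logit that enters only the softmax denominator, and every entry of `topk_idxs` is a valid row index. The function name `sparse_mqa_reference` is hypothetical.

```python
import math
from typing import List, Optional


def sparse_mqa_reference(
    q: List[List[float]],    # [H, D] query heads for one token
    kv: List[List[float]],   # [S_kv, D] rows shared by all heads (assumed K == V)
    attn_sink: List[float],  # [H] per-head sink logit
    topk_idxs: List[int],    # [topk] selected kv rows for this token
    sm_scale: Optional[float] = None,
) -> List[List[float]]:
    """Illustrative reference only; the real kernel's semantics may differ."""
    D = len(q[0])
    scale = 1.0 / math.sqrt(D) if sm_scale is None else sm_scale

    # Sparse step: attend only to the selected rows, not all S_kv rows.
    selected = [kv[i] for i in topk_idxs]

    out = []
    for h, q_h in enumerate(q):
        logits = [scale * sum(a * b for a, b in zip(q_h, row)) for row in selected]
        # Subtract the running maximum for numerical stability.
        m = max(logits + [attn_sink[h]])
        weights = [math.exp(x - m) for x in logits]
        # ASSUMPTION: the sink adds mass to the denominator but has no value row.
        denom = sum(weights) + math.exp(attn_sink[h] - m)
        out.append(
            [sum(w * row[d] for w, row in zip(weights, selected)) / denom for d in range(D)]
        )
    return out  # [H, D], same per-token shape as q


# One head, D=2, three kv rows, attending to rows 0 and 2.
result = sparse_mqa_reference(
    q=[[0.0, 0.0]],
    kv=[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    attn_sink=[0.0],
    topk_idxs=[0, 2],
)
```

With a zero query and a zero sink, the two selected rows and the sink each receive weight 1/3 under these assumptions, so the output is the sum of rows 0 and 2 divided by three. The output keeps the per-token [H, D] layout, matching the documented return shape.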