nemo_automodel.components.models.deepseek_v4.kernels.tilelang_sparse_mla_fwd
Module Contents
Functions
API
Forward interface for V4 sparse MQA attention.
Parameters:
    q: [B, S, H, D] bf16
    kv: [B, S_kv, D] bf16
    attn_sink: [H] fp32
    topk_idxs: [B, S, topk] int32
    sm_scale: float or None (defaults to 1/sqrt(D))
Returns:
    [B, S, H, D] bf16
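To make the shapes above concrete, here is a minimal pure-Python sketch of what a sparse MQA forward pass with these arguments might compute. It is not the TileLang kernel and does not call it. The name sparse_mqa_reference is hypothetical, and several semantics are assumptions that the docstring does not state: that kv serves as both keys and values (suggested by the matching last dimension D of kv and the output), that attn_sink is a per-head logit added to the softmax denominator without contributing a value, and that every entry of topk_idxs is a valid index into the S_kv axis. It works on nested lists in full float precision rather than bf16 tensors.

```python
import math


def sparse_mqa_reference(q, kv, attn_sink, topk_idxs, sm_scale=None):
    """Illustrative reference only (hypothetical helper, not the library API).

    q:         nested list, shape [B][S][H][D]
    kv:        nested list, shape [B][S_kv][D]; ASSUMED to act as keys and values
    attn_sink: list, shape [H]; ASSUMED to be an extra softmax logit per head
    topk_idxs: nested list, shape [B][S][topk]; ASSUMED all valid S_kv indices
    sm_scale:  float or None (None falls back to 1/sqrt(D), as documented)
    """
    D = len(q[0][0][0])
    scale = 1.0 / math.sqrt(D) if sm_scale is None else sm_scale

    out = []
    for b, q_b in enumerate(q):
        out_b = []
        for s, q_s in enumerate(q_b):
            # Gather the selected KV rows once; MQA shares them across heads.
            rows = [kv[b][j] for j in topk_idxs[b][s]]
            out_s = []
            for h, q_vec in enumerate(q_s):
                logits = [scale * sum(x * y for x, y in zip(q_vec, row))
                          for row in rows]
                # Subtract the max (sink included) for numerical stability.
                m = max(logits + [attn_sink[h]])
                weights = [math.exp(l - m) for l in logits]
                # The sink enlarges the denominator but adds no value vector.
                denom = sum(weights) + math.exp(attn_sink[h] - m)
                out_s.append([
                    sum(w * row[d] for w, row in zip(weights, rows)) / denom
                    for d in range(D)
                ])
            out_b.append(out_s)
        out.append(out_b)
    return out  # shape [B][S][H][D]
```

Under these assumptions, a query only attends to the topk rows named in topk_idxs, so rows of kv outside that selection never affect the output. A sink logit of negative infinity reduces the computation to an ordinary softmax over the selected rows.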