nemo_automodel.components.models.deepseek_v4.kernels.tilelang_sparse_mla_fwd

Module Contents

Functions

sparse_mqa_fwd

sparse_mqa_fwd_interface

Forward interface for V4 sparse MQA attention.

API

nemo_automodel.components.models.deepseek_v4.kernels.tilelang_sparse_mla_fwd.sparse_mqa_fwd(
    heads,
    dim,
    topk,
    sm_scale=None,
    block_I=64,
    num_stages=2,
    threads=256,
)

nemo_automodel.components.models.deepseek_v4.kernels.tilelang_sparse_mla_fwd.sparse_mqa_fwd_interface(
    q,
    kv,
    attn_sink,
    topk_idxs,
    sm_scale=None,
    block_I=64,
    num_stages=2,
    threads=256,
)

Forward interface for V4 sparse MQA attention.

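Parameters:
  • q – queries, [B, S, H, D] bf16

  • kv – key/value tensor shared across heads, [B, S_kv, D] bf16

  • attn_sink – per-head attention sink values, [H] fp32

  • topk_idxs – top-k indices selected for each query position, [B, S, topk] int32

  • sm_scale – softmax scale, float or None (None defaults to 1/sqrt(D))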

Returns:
  • out – attention output, [B, S, H, D] bf16

  • lse – log-sum-exp, [B, S, H] fp32
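
The shape contract documented for sparse_mqa_fwd_interface can be sketched as a small pre-flight check. This is a minimal illustration only, not part of the module: the helper name expected_sparse_mqa_shapes is hypothetical, it works on plain shape tuples rather than tensors, and it assumes nothing beyond the shapes and the 1/sqrt(D) default stated above.

```python
import math


def expected_sparse_mqa_shapes(q_shape, kv_shape, attn_sink_shape,
                               topk_idxs_shape, sm_scale=None):
    """Hypothetical helper: validate the documented input shapes and
    return (out_shape, lse_shape, sm_scale). It never calls the kernel."""
    B, S, H, D = q_shape            # q: [B, S, H, D]
    B_kv, S_kv, D_kv = kv_shape     # kv: [B, S_kv, D], no head dimension

    if (B_kv, D_kv) != (B, D):
        raise ValueError("kv must be [B, S_kv, D] with the same B and D as q")
    if tuple(attn_sink_shape) != (H,):
        raise ValueError("attn_sink must be [H]")
    if len(topk_idxs_shape) != 3 or tuple(topk_idxs_shape[:2]) != (B, S):
        raise ValueError("topk_idxs must be [B, S, topk]")

    # Documented default: sm_scale=None means 1/sqrt(D).
    if sm_scale is None:
        sm_scale = 1.0 / math.sqrt(D)

    # Documented outputs: out is [B, S, H, D], lse is [B, S, H].
    return (B, S, H, D), (B, S, H), sm_scale
```

With tensors in hand, the same check could be fed `tuple(q.shape)`, `tuple(kv.shape)`, and so on before calling the interface, so that a shape mismatch surfaces as a readable error rather than inside the kernel launch.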