nemo_automodel.components.models.deepseek_v4.kernels.tilelang_sparse_mla_fwd
Module Contents
Functions
sparse_mqa_fwd_interface | Forward interface for V4 sparse MQA attention.
API
- nemo_automodel.components.models.deepseek_v4.kernels.tilelang_sparse_mla_fwd.sparse_mqa_fwd(
    heads,
    dim,
    topk,
    sm_scale=None,
    block_I=64,
    num_stages=2,
    threads=256,
  )
- nemo_automodel.components.models.deepseek_v4.kernels.tilelang_sparse_mla_fwd.sparse_mqa_fwd_interface(
    q,
    kv,
    attn_sink,
    topk_idxs,
    sm_scale=None,
    block_I=64,
    num_stages=2,
    threads=256,
  )
Forward interface for V4 sparse MQA attention.
- Parameters:
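q – query tensor, [B, S, H, D], bf16
kv – key/value tensor shared across heads, [B, S_kv, D], bf16
attn_sink – per-head attention sink, [H], fp32
topk_idxs – top-k KV indices selected for each query position, [B, S, topk], int32
sm_scale – softmax scale, float or None (defaults to 1/sqrt(D))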
- Returns:
out – [B, S, H, D] bf16
lse – [B, S, H] fp32
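The shapes above suggest what each output element depends on, and a small reference can make that concrete. The following is a minimal pure-Python sketch for a single query position and a single head, not the kernel's implementation. The function name `sparse_mqa_reference` is hypothetical, and three behaviours are assumptions that this page does not confirm: each selected `kv` row serves as both key and value, the sink contributes only to the softmax denominator, and `lse` includes the sink term. It also ignores bf16 rounding and any special handling of invalid indices.

```python
import math


def sparse_mqa_reference(q, kv, sink, idxs, sm_scale=None):
    """Reference for ONE query position and ONE head, using plain lists.

    q:    query vector of length D
    kv:   S_kv rows of length D; a row acts as key AND value (assumption)
    sink: attention sink value for this head (assumption: denominator only)
    idxs: the top-k row indices of kv that this query may attend to
    """
    d = len(q)
    if sm_scale is None:
        sm_scale = 1.0 / math.sqrt(d)  # documented default: 1/sqrt(D)

    # Scaled dot-product scores over the selected rows only.
    scores = [sm_scale * sum(a * b for a, b in zip(q, kv[i])) for i in idxs]

    # Subtract the running maximum before exponentiating, for stability.
    m = max(scores + [sink])
    weights = [math.exp(s - m) for s in scores]

    # The sink adds probability mass but no value vector (assumption).
    denom = sum(weights) + math.exp(sink - m)

    out = [
        sum(w * kv[i][j] for w, i in zip(weights, idxs)) / denom
        for j in range(d)
    ]
    lse = m + math.log(denom)  # assumption: lse includes the sink term
    return out, lse


if __name__ == "__main__":
    kv = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    # Row 2 is never selected, so it cannot influence the result.
    out, lse = sparse_mqa_reference([1.0, 0.0], kv, sink=0.0, idxs=[0, 1])
    print(out, lse)
```

Under these assumptions, a finite sink makes the attention weights over the selected rows sum to less than one, and a sink of negative infinity reduces the computation to ordinary softmax attention over the top-k rows.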