nemo_automodel.components.models.deepseek_v4.kernels.tilelang_indexer

TileLang-based DSA Indexer for DeepSeek-V4.

Adapts GLM-5’s lighting_indexer to V4’s SBHD data layout and causal masking. Provides both a low-level per-sample interface and a batched autograd Function.

Module Contents

Classes

V4IndexerFunction

Autograd function for V4 tilelang indexer.

Functions

pytorch_extract_topk_scores

v4_lighting_indexer

Main entry point for V4 tilelang indexer.

API

nemo_automodel.components.models.deepseek_v4.kernels.tilelang_indexer.pytorch_extract_topk_scores(logits, topk_indices, dim=-1)
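
The source gives no description for pytorch_extract_topk_scores. Judging only by its name and signature, one plausible reading is that it gathers the entries of logits at topk_indices along dim. The sketch below illustrates that reading with torch.gather; the function name extract_topk_scores_sketch is hypothetical, and the real implementation may differ.

```python
import torch


def extract_topk_scores_sketch(logits, topk_indices, dim=-1):
    """Hypothetical stand-in: pick logits at the given indices along `dim`."""
    # torch.gather requires int64 indices, so int32 indices are cast first.
    return torch.gather(logits, dim, topk_indices.to(torch.int64))


logits = torch.tensor([[0.1, 0.9, 0.3, 0.7]])
topk_indices = torch.tensor([[1, 3]], dtype=torch.int32)
scores = extract_topk_scores_sketch(logits, topk_indices)
```

The output has the same shape as topk_indices, which matches the [batch, seqlen, topk] score shape documented for the indexer.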
class nemo_automodel.components.models.deepseek_v4.kernels.tilelang_indexer.V4IndexerFunction

Bases: torch.autograd.Function

Autograd function for V4 tilelang indexer.

Inputs are in V4’s native SBHD layout:

  • q – [seqlen, batch, heads, dim] bf16

  • k – [seqlen_kv, batch, dim] bf16

  • weights – [seqlen, batch, heads] fp32

static forward(
ctx,
index_q: torch.Tensor,
index_k: torch.Tensor,
weights: torch.Tensor,
compress_ratio: int,
topk: int,
topk_indices: torch.Tensor | None = None,
)
static backward(ctx, grad_scores, grad_indices)
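
The backward signature takes one gradient per forward output (grad_scores and grad_indices), which is the standard torch.autograd.Function contract for a forward that returns a (scores, indices) pair. The toy example below shows that pattern. ToyTopkFunction is a hypothetical illustration and not the V4 kernel: it uses plain torch.topk, marks the integer indices as non-differentiable, and scatters grad_scores back to the selected positions.

```python
import torch


class ToyTopkFunction(torch.autograd.Function):
    """Hypothetical toy with the same two-output shape as a top-k indexer."""

    @staticmethod
    def forward(ctx, logits, topk):
        scores, indices = logits.topk(topk, dim=-1)
        ctx.save_for_backward(indices)
        ctx.logits_shape = logits.shape
        # Integer indices carry no gradient.
        ctx.mark_non_differentiable(indices)
        return scores, indices

    @staticmethod
    def backward(ctx, grad_scores, grad_indices):
        # grad_indices is ignored: the indices output is non-differentiable.
        (indices,) = ctx.saved_tensors
        grad_logits = torch.zeros(
            ctx.logits_shape, dtype=grad_scores.dtype, device=grad_scores.device
        )
        # Route each score gradient back to the position it was selected from.
        grad_logits.scatter_(-1, indices, grad_scores)
        # One return value per forward input: logits, topk.
        return grad_logits, None


logits = torch.tensor([[0.1, 0.9, 0.3, 0.7]], requires_grad=True)
scores, indices = ToyTopkFunction.apply(logits, 2)
scores.sum().backward()
```

As with any torch.autograd.Function, the class is invoked through .apply rather than by calling forward directly.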
nemo_automodel.components.models.deepseek_v4.kernels.tilelang_indexer.v4_lighting_indexer(
index_q: torch.Tensor,
index_k: torch.Tensor,
weights: torch.Tensor,
compress_ratio: int,
topk: int,
topk_indices: torch.Tensor | None = None,
)

Main entry point for V4 tilelang indexer.

Parameters:
  • index_q – [seqlen, batch, heads, dim] bf16

  • index_k – [seqlen_kv, batch, dim] bf16

  • weights – [seqlen, batch, heads] fp32

  • compress_ratio – compression ratio (4 for C4 layers)

  • topk – number of top-k indices to select

  • topk_indices – optional pre-computed topk indices [batch, seqlen, topk] int32

Returns:
  • index_score – [batch, seqlen, topk] fp32

  • topk_indices – [batch, seqlen, topk] int32
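
To make the documented shapes concrete, here is a pure-PyTorch reference sketch that consumes SBHD inputs and produces [batch, seqlen, topk] outputs. It rests on several assumptions that the source does not confirm: the score is taken to be a head-weighted sum of ReLU(q · k), as in the published DeepSeek sparse-attention indexer; the causal mask is the plain lower-triangular one with seqlen_kv equal to seqlen; and compress_ratio is not modeled at all. The function name reference_indexer_sketch is hypothetical.

```python
import torch


def reference_indexer_sketch(index_q, index_k, weights, topk):
    """Hypothetical reference: SBHD inputs -> ([B, S, topk] fp32, [B, S, topk] int32)."""
    q = index_q.float()  # [seqlen, batch, heads, dim]
    k = index_k.float()  # [seqlen_kv, batch, dim]
    # Per-head dot products, moved to batch-first: [batch, seqlen, heads, seqlen_kv]
    logits = torch.einsum("sbhd,tbd->bsht", q, k)
    # weights: [seqlen, batch, heads] -> [batch, seqlen, heads, 1]
    w = weights.float().permute(1, 0, 2).unsqueeze(-1)
    # Assumed score form: sum over heads of w * ReLU(q . k)
    scores = (torch.relu(logits) * w).sum(dim=2)  # [batch, seqlen, seqlen_kv]
    # Plain causal mask (assumption: no compression, seqlen_kv == seqlen).
    seqlen, seqlen_kv = q.shape[0], k.shape[0]
    mask = torch.tril(torch.ones(seqlen, seqlen_kv, device=q.device)).bool()
    scores = scores.masked_fill(~mask, float("-inf"))
    # Early query positions have fewer than topk visible keys; their
    # surplus slots come back with a score of -inf.
    index_score, topk_indices = scores.topk(topk, dim=-1)
    return index_score, topk_indices.to(torch.int32)


seqlen, batch, heads, dim, topk = 8, 2, 4, 16, 4
index_q = torch.randn(seqlen, batch, heads, dim).to(torch.bfloat16)
index_k = torch.randn(seqlen, batch, dim).to(torch.bfloat16)
weights = torch.rand(seqlen, batch, heads, dtype=torch.float32)
index_score, topk_indices = reference_indexer_sketch(index_q, index_k, weights, topk)
```

A reference of this kind is mainly useful for checking layouts and output shapes on CPU; it says nothing about how the TileLang kernel handles compressed keys.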