nemo_automodel.components.models.deepseek_v4.kernels.tilelang_indexer
TileLang-based DSA Indexer for DeepSeek-V4.
Adapts GLM-5’s lighting_indexer to V4’s SBHD (sequence, batch, heads, dim) data layout and causal masking. Provides both a low-level per-sample interface and a batched autograd Function.
Module Contents
Classes
V4IndexerFunction | Autograd function for V4 tilelang indexer.
Functions
v4_lighting_indexer | Main entry point for V4 tilelang indexer.
API
- nemo_automodel.components.models.deepseek_v4.kernels.tilelang_indexer.pytorch_extract_topk_scores(logits, topk_indices, dim=-1)
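pytorch_extract_topk_scores is listed without a docstring. Judging only from its name and signature, it plausibly gathers the entries of logits at the positions given by topk_indices along dim. The sketch below illustrates that reading in plain PyTorch; `extract_topk_scores_sketch` is a hypothetical name and this is not the module's implementation.

```python
import torch


def extract_topk_scores_sketch(logits, topk_indices, dim=-1):
    """Hypothetical stand-in: pick logits at the given indices along `dim`."""
    # torch.gather requires an int64 index tensor, so int32 indices
    # (the dtype documented for topk_indices) are cast first.
    return torch.gather(logits, dim, topk_indices.long())


logits = torch.tensor([[0.1, 0.9, 0.5],
                       [2.0, 1.0, 3.0]])
topk_indices = torch.tensor([[1, 2],
                             [2, 0]], dtype=torch.int32)
scores = extract_topk_scores_sketch(logits, topk_indices)
```

Here each row of `scores` holds the logits selected by the matching row of `topk_indices`, so the output has the same shape as the index tensor.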
- class nemo_automodel.components.models.deepseek_v4.kernels.tilelang_indexer.V4IndexerFunction
Bases: torch.autograd.Function
Autograd function for V4 tilelang indexer.
Inputs are in V4’s native SBHD layout:
q: [seqlen, batch, heads, dim] bf16
k: [seqlen_kv, batch, dim] bf16
weights: [seqlen, batch, heads] fp32
- static forward(
- ctx,
- index_q: torch.Tensor,
- index_k: torch.Tensor,
- weights: torch.Tensor,
- compress_ratio: int,
- topk: int,
- topk_indices: torch.Tensor | None = None,
- )
- static backward(ctx, grad_scores, grad_indices)
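To make the SBHD shapes concrete, here is a dense pure-PyTorch reference for the index logits. It assumes the usual DSA indexer form, in which each query/key score is a head-weighted sum of ReLU-activated dot products; that formula, the relation seqlen_kv = seqlen // compress_ratio, and the name `reference_index_logits` are assumptions for illustration, not taken from this module. The TileLang kernel itself may compute and mask scores differently.

```python
import torch


def reference_index_logits(index_q, index_k, weights):
    """Dense reference (assumed formula): sum_h w[s,b,h] * relu(q[s,b,h] . k[t,b])."""
    # index_q: [seqlen, batch, heads, dim], index_k: [seqlen_kv, batch, dim]
    qk = torch.einsum("sbhd,tbd->bsht", index_q.float(), index_k.float())
    # weights: [seqlen, batch, heads] -> [batch, seqlen, heads, 1] for broadcasting
    w = weights.permute(1, 0, 2).unsqueeze(-1)
    # Reduce over heads -> [batch, seqlen, seqlen_kv]
    return (torch.relu(qk) * w).sum(dim=2)


torch.manual_seed(0)
seqlen, batch, heads, dim = 8, 2, 4, 16
compress_ratio = 4
seqlen_kv = seqlen // compress_ratio  # assumption: one key per compressed block

index_q = torch.randn(seqlen, batch, heads, dim, dtype=torch.bfloat16)
index_k = torch.randn(seqlen_kv, batch, dim, dtype=torch.bfloat16)
weights = torch.randn(seqlen, batch, heads, dtype=torch.float32)

logits = reference_index_logits(index_q, index_k, weights)  # [batch, seqlen, seqlen_kv]
```

Note that the inputs are sequence-first while the result is batch-first, matching the [batch, seqlen, topk] layout documented for the outputs.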
- nemo_automodel.components.models.deepseek_v4.kernels.tilelang_indexer.v4_lighting_indexer(
- index_q: torch.Tensor,
- index_k: torch.Tensor,
- weights: torch.Tensor,
- compress_ratio: int,
- topk: int,
- topk_indices: torch.Tensor | None = None,
- )
Main entry point for V4 tilelang indexer.
- Parameters:
index_q – [seqlen, batch, heads, dim] bf16
index_k – [seqlen_kv, batch, dim] bf16
weights – [seqlen, batch, heads] fp32
compress_ratio – compression ratio (4 for C4 layers)
topk – number of top-k indices to select
topk_indices – optional pre-computed topk indices [batch, seqlen, topk] int32
- Returns:
index_score – [batch, seqlen, topk] fp32
topk_indices – [batch, seqlen, topk] int32
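The documented output shapes can be illustrated with a small PyTorch sketch that applies a causal mask to dense index logits of shape [batch, seqlen, seqlen_kv] and then selects the top-k entries. The masking rule used here (a compressed key t is visible to query s only once all compress_ratio raw tokens of its block lie at or before s) is an assumption, as is the hypothetical name `causal_topk_sketch`; the real kernel's masking and padding conventions may differ.

```python
import torch


def causal_topk_sketch(logits, compress_ratio, topk):
    """Hypothetical sketch: causal mask over compressed keys, then top-k."""
    batch, seqlen, seqlen_kv = logits.shape
    q_pos = torch.arange(seqlen, device=logits.device).unsqueeze(1)  # [seqlen, 1]
    # Number of raw tokens covered up to the end of compressed block t.
    k_end = (torch.arange(seqlen_kv, device=logits.device).unsqueeze(0) + 1) * compress_ratio
    visible = k_end <= q_pos + 1  # [seqlen, seqlen_kv], assumed visibility rule
    masked = logits.masked_fill(~visible, float("-inf"))
    # This sketch requires topk <= seqlen_kv; masked slots keep a score of -inf.
    index_score, topk_indices = torch.topk(masked, topk, dim=-1)
    return index_score, topk_indices.to(torch.int32)


logits = torch.arange(2 * 8 * 2, dtype=torch.float32).reshape(2, 8, 2)
index_score, topk_indices = causal_topk_sketch(logits, compress_ratio=4, topk=1)
```

Queries that see fewer than topk compressed keys end up with -inf in the remaining score slots, so a caller would need to treat those positions as invalid.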