nemo_automodel.components.models.deepseek_v4.kernels.tilelang_indexer_fwd

Module Contents

Functions

tl_indexer_fwd_impl

clean_logits_

_make_causal_cu_seqlens

Generate cu_seqlens for causal masking on compressed KV positions.

indexer_fwd_interface

Forward interface matching GLM-5’s API but for a single batch element.

batched_indexer_fwd

Batched forward: loops over the batch dimension.

API

nemo_automodel.components.models.deepseek_v4.kernels.tilelang_indexer_fwd.tl_indexer_fwd_impl(
heads,
index_dim,
block_N=256,
num_stages=3,
threads=512,
block_Q=None,
)
nemo_automodel.components.models.deepseek_v4.kernels.tilelang_indexer_fwd.clean_logits_(threads: int = 512, block_K: int = 4096)
nemo_automodel.components.models.deepseek_v4.kernels.tilelang_indexer_fwd._make_causal_cu_seqlens(seq_len_q, seq_len_kv, compress_ratio, device)

Generate cu_seqlens for causal masking on compressed KV positions.

For a query at position p, the valid compressed groups are the half-open range [0, (p + 1) // compress_ratio).
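The range rule above can be illustrated with a minimal pure-Python sketch. It is not the module's implementation: the helper name `causal_kv_ranges` is hypothetical, it returns plain lists rather than device tensors, and clamping the end to `seq_len_kv` is an assumption that this page does not confirm.

```python
def causal_kv_ranges(seq_len_q, seq_len_kv, compress_ratio):
    """Return (starts, ends): query p may see compressed groups [starts[p], ends[p])."""
    # Every query's valid range starts at the first compressed group.
    starts = [0] * seq_len_q
    # End follows the documented rule (p + 1) // compress_ratio.
    # Assumption: clamp to seq_len_kv so the end never exceeds the available groups.
    ends = [min((p + 1) // compress_ratio, seq_len_kv) for p in range(seq_len_q)]
    return starts, ends


starts, ends = causal_kv_ranges(seq_len_q=8, seq_len_kv=2, compress_ratio=4)
print(ends)  # [0, 0, 0, 1, 1, 1, 1, 2]
```

With `compress_ratio=4`, the first three queries see no compressed group at all, because a group only becomes visible once all four of its source positions are at or before the query.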

nemo_automodel.components.models.deepseek_v4.kernels.tilelang_indexer_fwd.indexer_fwd_interface(
q,
kv,
weights,
cu_seqlen_ks,
cu_seqlen_ke,
clean_logits=True,
)

Forward interface matching GLM-5’s API but for a single batch element.

Parameters:
  • q – [seq_len, heads, index_dim] bf16

  • kv – [seq_len_kv, index_dim] bf16

  • weights – [seq_len, heads] fp32

  • cu_seqlen_ks – [seq_len] int32 — start of valid KV range per query

  • cu_seqlen_ke – [seq_len] int32 — end of valid KV range per query

Returns:

logits – [seq_len, seq_len_kv] fp32
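To make the role of `cu_seqlen_ks` and `cu_seqlen_ke` concrete, the sketch below expands per-query ranges into a boolean validity mask with the same [seq_len, seq_len_kv] layout as the returned logits. The helper `valid_kv_mask` is hypothetical, and treating the end index as exclusive is an assumption. How the kernel fills positions outside the range, and what `clean_logits` does with them, is not stated on this page.

```python
def valid_kv_mask(cu_seqlen_ks, cu_seqlen_ke, seq_len_kv):
    """mask[i][j] is True when KV position j lies in [ks[i], ke[i])."""
    # Assumption: the end index is exclusive (half-open range).
    return [
        [ks <= j < ke for j in range(seq_len_kv)]
        for ks, ke in zip(cu_seqlen_ks, cu_seqlen_ke)
    ]


mask = valid_kv_mask([0, 0, 0, 0], [0, 1, 1, 2], seq_len_kv=2)
for row in mask:
    print(row)
# [False, False]
# [True, False]
# [True, False]
# [True, True]
```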

nemo_automodel.components.models.deepseek_v4.kernels.tilelang_indexer_fwd.batched_indexer_fwd(q, k, weights, cu_seqlen_ks, cu_seqlen_ke)

Batched forward: loops over the batch dimension.

Parameters:
  • q – [seqlen, batch, heads, dim] bf16

  • k – [seqlen_kv, batch, dim] bf16

  • weights – [seqlen, batch, heads] fp32

  • cu_seqlen_ks – [seqlen] int32

  • cu_seqlen_ke – [seqlen] int32

Returns:

logits – [batch, seqlen, seqlen_kv] fp32
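The documented shapes suggest the loop slices the batch axis (axis 1 of each input) and collects one [seqlen, seqlen_kv] result per element. The following is a rough sketch of that pattern using nested lists instead of tensors. `batched_forward_sketch`, `per_element_fn`, and `toy_fn` are hypothetical names, and the toy function is a stand-in that does not compute real indexer logits.

```python
def batched_forward_sketch(per_element_fn, q, k, weights, cu_seqlen_ks, cu_seqlen_ke):
    """Loop over the batch axis and collect one result matrix per batch element."""
    batch = len(q[0])
    out = []
    for b in range(batch):
        q_b = [row[b] for row in q]        # [seqlen][heads][dim]
        k_b = [row[b] for row in k]        # [seqlen_kv][dim]
        w_b = [row[b] for row in weights]  # [seqlen][heads]
        # The range arrays have no batch axis, so they are passed through unchanged.
        out.append(per_element_fn(q_b, k_b, w_b, cu_seqlen_ks, cu_seqlen_ke))
    return out  # [batch][seqlen][seqlen_kv]


def toy_fn(q, kv, weights, ks, ke):
    # Placeholder, not the real kernel: echoes the first KV feature for every query.
    return [[kv_j[0] for kv_j in kv] for _ in q]


# Shapes: seqlen=2, batch=2, heads=1, dim=1, seqlen_kv=3.
q = [[[[1.0]] for _ in range(2)] for _ in range(2)]
k = [[[float(10 * b + j)] for b in range(2)] for j in range(3)]
weights = [[[1.0] for _ in range(2)] for _ in range(2)]

out = batched_forward_sketch(toy_fn, q, k, weights, [0, 0], [3, 3])
print(len(out), len(out[0]), len(out[0][0]))  # 2 2 3
```

Note that the output puts the batch axis first, while the inputs carry it as the second axis.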