nemo_automodel.components.models.deepseek_v4.kernels.tilelang_indexer_fwd
Module Contents
Functions
_make_causal_cu_seqlens | Generate cu_seqlens for causal masking on compressed KV positions.
indexer_fwd_interface | Forward interface matching GLM-5’s API but for a single batch element.
batched_indexer_fwd | Batched forward: loops over batch dim.
API
- nemo_automodel.components.models.deepseek_v4.kernels.tilelang_indexer_fwd.tl_indexer_fwd_impl(heads, index_dim, block_N=256, num_stages=3, threads=512, block_Q=None)
- nemo_automodel.components.models.deepseek_v4.kernels.tilelang_indexer_fwd.clean_logits_(threads: int = 512, block_K: int = 4096)
- nemo_automodel.components.models.deepseek_v4.kernels.tilelang_indexer_fwd._make_causal_cu_seqlens(seq_len_q, seq_len_kv, compress_ratio, device)
Generate cu_seqlens for causal masking on compressed KV positions.
For a query at position p, the valid compressed groups are the half-open range [0, (p + 1) // compress_ratio).
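The range rule above can be illustrated with a minimal pure-Python sketch. This is not the module's implementation: the helper name `causal_kv_ranges` is hypothetical, it returns plain lists rather than device tensors, and clamping the end to `seq_len_kv` is an assumption. It also assumes each compressed group covers `compress_ratio` consecutive tokens, so a group becomes visible once its last token is at or before the query position.

```python
def causal_kv_ranges(seq_len_q, seq_len_kv, compress_ratio):
    """Return (starts, ends); query p may attend to groups [starts[p], ends[p]).

    Hypothetical reference helper, not the library function.
    """
    # Every query's valid range starts at group 0.
    starts = [0] * seq_len_q
    # Number of fully completed groups at position p, clamped to the number
    # of compressed KV positions (the clamp is an assumption).
    ends = [min((p + 1) // compress_ratio, seq_len_kv) for p in range(seq_len_q)]
    return starts, ends


starts, ends = causal_kv_ranges(seq_len_q=8, seq_len_kv=2, compress_ratio=4)
print(starts)  # [0, 0, 0, 0, 0, 0, 0, 0]
print(ends)    # [0, 0, 0, 1, 1, 1, 1, 2]
```

With `compress_ratio=4`, the first three queries see no compressed group at all, and the second group only becomes visible to the query at position 7.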
- nemo_automodel.components.models.deepseek_v4.kernels.tilelang_indexer_fwd.indexer_fwd_interface(q, kv, weights, cu_seqlen_ks, cu_seqlen_ke, clean_logits=True)
Forward interface matching GLM-5’s API but for a single batch element.
- Parameters:
q – [seq_len, heads, index_dim] bf16
kv – [seq_len_kv, index_dim] bf16
weights – [seq_len, heads] fp32
cu_seqlen_ks – [seq_len] int32 — start of valid KV range per query
cu_seqlen_ke – [seq_len] int32 — end of valid KV range per query
- Returns:
logits – [seq_len, seq_len_kv] fp32
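To make the role of `cu_seqlen_ks` and `cu_seqlen_ke` concrete, here is a small pure-Python sketch of per-query range masking on a logits matrix. It is an illustration under assumptions, not the kernel's code: the helper name `mask_outside_ranges` is hypothetical, the range is treated as half-open `[ks, ke)`, and filling out-of-range entries with `-inf` is an assumption about what `clean_logits=True` does.

```python
NEG_INF = float("-inf")


def mask_outside_ranges(logits, ks, ke):
    """Return a copy of logits with entries outside [ks[i], ke[i]) set to -inf.

    logits: [seq_len][seq_len_kv] nested lists (stand-in for an fp32 tensor).
    ks, ke: [seq_len] start/end of the valid KV range for each query row.
    """
    return [
        [v if ks[i] <= j < ke[i] else NEG_INF for j, v in enumerate(row)]
        for i, row in enumerate(logits)
    ]


logits = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
masked = mask_outside_ranges(logits, ks=[0, 1], ke=[1, 3])
print(masked)  # [[1.0, -inf, -inf], [-inf, 5.0, 6.0]]
```

Under this reading, a later top-k or softmax over each row never selects a KV position outside the query's valid range.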
- nemo_automodel.components.models.deepseek_v4.kernels.tilelang_indexer_fwd.batched_indexer_fwd(q, k, weights, cu_seqlen_ks, cu_seqlen_ke)
Batched forward: loops over batch dim.
- Parameters:
q – [seqlen, batch, heads, dim] bf16
k – [seqlen_kv, batch, dim] bf16
weights – [seqlen, batch, heads] fp32
cu_seqlen_ks – [seqlen] int32
cu_seqlen_ke – [seqlen] int32
- Returns:
logits – [batch, seqlen, seqlen_kv] fp32
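The batch loop described above might look roughly like the following pure-Python sketch, which uses nested lists in place of tensors. The names `batched_forward` and `toy_single_fwd` are hypothetical; the toy function only stands in for a single-batch forward and returns zeros of the documented shape. The sketch assumes, as the parameter shapes suggest, that `cu_seqlen_ks` and `cu_seqlen_ke` are shared by all batch elements.

```python
def batched_forward(q, k, weights, ks, ke, single_fwd):
    """Loop over the batch dimension and stack per-element results.

    q: [seqlen][batch][heads][dim], k: [seqlen_kv][batch][dim],
    weights: [seqlen][batch][heads]; ks, ke: [seqlen], shared across the batch.
    """
    batch = len(q[0])
    out = []
    for b in range(batch):
        q_b = [q_t[b] for q_t in q]        # [seqlen][heads][dim]
        k_b = [k_s[b] for k_s in k]        # [seqlen_kv][dim]
        w_b = [w_t[b] for w_t in weights]  # [seqlen][heads]
        out.append(single_fwd(q_b, k_b, w_b, ks, ke))
    return out  # [batch][seqlen][seqlen_kv]


def toy_single_fwd(q_b, k_b, w_b, ks, ke):
    # Hypothetical stand-in for the single-batch forward: zeros of the right shape.
    return [[0.0 for _ in k_b] for _ in q_b]


seqlen, seqlen_kv, batch, heads, dim = 3, 4, 2, 1, 2
q = [[[[0.0] * dim for _ in range(heads)] for _ in range(batch)] for _ in range(seqlen)]
k = [[[0.0] * dim for _ in range(batch)] for _ in range(seqlen_kv)]
weights = [[[1.0] * heads for _ in range(batch)] for _ in range(seqlen)]
out = batched_forward(q, k, weights, [0] * seqlen, [seqlen_kv] * seqlen, toy_single_fwd)
print(len(out), len(out[0]), len(out[0][0]))  # 2 3 4
```

Note that the inputs are sequence-major while the output is batch-major, which matches the documented shapes.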