nemo_automodel.components.models.deepseek_v4.kernels.tilelang_indexer_fwd


Module Contents

Functions

| Name | Description |
| --- | --- |
| _make_causal_cu_seqlens | Generate cu_seqlens for causal masking on compressed KV positions. |
| batched_indexer_fwd | Batched forward: loops over batch dim. |
| clean_logits_ | - |
| indexer_fwd_interface | Forward interface matching GLM-5’s API but for a single batch element. |
| tl_indexer_fwd_impl | - |

API

nemo_automodel.components.models.deepseek_v4.kernels.tilelang_indexer_fwd._make_causal_cu_seqlens(
seq_len_q,
seq_len_kv,
compress_ratio,
device
)

Generate cu_seqlens for causal masking on compressed KV positions.

For query at position p, valid compressed groups are [0, (p+1) // compress_ratio).
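
The causal rule above can be sketched in plain Python. This is an illustrative reference only, not the module's implementation: the helper name `causal_kv_ranges` is hypothetical, the `device` argument is omitted, the start index is assumed to be 0 for every query, and clamping the end index to `seq_len_kv` is an assumption that the page does not confirm.

```python
def causal_kv_ranges(seq_len_q, seq_len_kv, compress_ratio):
    """Per-query [start, end) ranges over compressed KV groups.

    Hypothetical helper: the query at position p may attend to groups
    [0, (p + 1) // compress_ratio). Clamping to seq_len_kv is an assumption.
    """
    starts = [0] * seq_len_q
    ends = [min((p + 1) // compress_ratio, seq_len_kv) for p in range(seq_len_q)]
    return starts, ends


# Eight queries, compression ratio 4, two compressed KV groups.
starts, ends = causal_kv_ranges(seq_len_q=8, seq_len_kv=2, compress_ratio=4)
print(starts)  # [0, 0, 0, 0, 0, 0, 0, 0]
print(ends)    # [0, 0, 0, 1, 1, 1, 1, 2]
```

A group becomes visible only once all `compress_ratio` tokens it covers are at or before the query position, which is why the first three queries in this example see no groups at all.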

nemo_automodel.components.models.deepseek_v4.kernels.tilelang_indexer_fwd.batched_indexer_fwd(
q,
k,
weights,
cu_seqlen_ks,
cu_seqlen_ke
)

Batched forward pass that loops over the batch dimension.

Parameters:

q

[seqlen, batch, heads, dim] bf16

k

[seqlen_kv, batch, dim] bf16

weights

[seqlen, batch, heads] fp32

cu_seqlen_ks

[seqlen] int32

cu_seqlen_ke

[seqlen] int32

Returns:

[batch, seqlen, seqlen_kv] fp32
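
The shape bookkeeping implied by the parameters above can be sketched with nested Python lists. This is a layout illustration under stated assumptions, not the actual kernel path: `batched_loop` and `toy_single_fwd` are hypothetical names, the toy scorer is a stand-in for the real single-element forward, and sharing `cu_seqlen_ks` / `cu_seqlen_ke` across batch elements is inferred only from their `[seqlen]` shape.

```python
def toy_single_fwd(q, kv, weights, ks, ke):
    """Hypothetical stand-in scorer: head-weighted dot products, -inf outside [ks, ke)."""
    out = []
    for i, q_i in enumerate(q):
        row = []
        for j, k_j in enumerate(kv):
            if ks[i] <= j < ke[i]:
                row.append(sum(w * sum(a * b for a, b in zip(q_h, k_j))
                               for w, q_h in zip(weights[i], q_i)))
            else:
                row.append(float("-inf"))
        out.append(row)
    return out


def batched_loop(q, k, weights, ks, ke, single_fwd=toy_single_fwd):
    """q: [seqlen][batch][heads][dim], k: [seqlen_kv][batch][dim],
    weights: [seqlen][batch][heads]. Returns [batch][seqlen][seqlen_kv]."""
    batch = len(k[0])
    out = []
    for b in range(batch):
        q_b = [row[b] for row in q]        # [seqlen][heads][dim]
        k_b = [row[b] for row in k]        # [seqlen_kv][dim]
        w_b = [row[b] for row in weights]  # [seqlen][heads]
        out.append(single_fwd(q_b, k_b, w_b, ks, ke))
    return out


# seqlen=1, batch=2, heads=1, dim=1, seqlen_kv=2
q = [[[[1.0]], [[2.0]]]]
k = [[[3.0], [4.0]], [[5.0], [6.0]]]
weights = [[[1.0], [0.5]]]
logits = batched_loop(q, k, weights, ks=[0], ke=[2])
print(logits)  # [[[3.0, 5.0]], [[4.0, 6.0]]]
```

Note that the inputs carry the batch on the second axis while the output carries it on the first, matching the documented `[batch, seqlen, seqlen_kv]` return shape.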

nemo_automodel.components.models.deepseek_v4.kernels.tilelang_indexer_fwd.clean_logits_(
threads: int = 512,
block_K: int = 4096
)

nemo_automodel.components.models.deepseek_v4.kernels.tilelang_indexer_fwd.indexer_fwd_interface(
q,
kv,
weights,
cu_seqlen_ks,
cu_seqlen_ke,
clean_logits = True
)

Forward interface matching GLM-5’s API but for a single batch element.

Parameters:

q

[seq_len, heads, index_dim] bf16

kv

[seq_len_kv, index_dim] bf16

weights

[seq_len, heads] fp32

cu_seqlen_ks

[seq_len] int32 — start of valid KV range per query

cu_seqlen_ke

[seq_len] int32 — end of valid KV range per query

Returns:

[seq_len, seq_len_kv] fp32
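
To make the shapes and the per-query valid range concrete, here is a slow pure-Python reference. The page does not state the scoring formula, so this sketch assumes the rule commonly used by DeepSeek-style sparse-attention indexers: ReLU of each head's query-key dot product, weighted per head and summed over heads. Filling positions outside `[cu_seqlen_ks, cu_seqlen_ke)` with `-inf` is likewise an assumption about what `clean_logits = True` does. The name `indexer_logits_ref` is hypothetical.

```python
def indexer_logits_ref(q, kv, weights, cu_seqlen_ks, cu_seqlen_ke):
    """Assumed reference semantics, not the TileLang kernel.

    q:       [seq_len][heads][index_dim]
    kv:      [seq_len_kv][index_dim]
    weights: [seq_len][heads]
    Returns  [seq_len][seq_len_kv] logits.
    """
    neg_inf = float("-inf")
    logits = []
    for i, q_i in enumerate(q):
        row = []
        for j, k_j in enumerate(kv):
            # Outside the valid KV range for this query: masked (assumed -inf).
            if not (cu_seqlen_ks[i] <= j < cu_seqlen_ke[i]):
                row.append(neg_inf)
                continue
            score = 0.0
            for h, q_ih in enumerate(q_i):
                dot = sum(a * b for a, b in zip(q_ih, k_j))
                score += weights[i][h] * max(dot, 0.0)  # ReLU, then head weight
            row.append(score)
        logits.append(row)
    return logits


# seq_len=2, heads=2, index_dim=2, seq_len_kv=2
q = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [-1.0, 0.0]]]
kv = [[2.0, 3.0], [1.0, -1.0]]
weights = [[0.5, 1.0], [1.0, 2.0]]
print(indexer_logits_ref(q, kv, weights, [0, 0], [1, 2]))
# [[4.0, -inf], [5.0, 0.0]]
```

A reference like this is mainly useful for checking a fused kernel's output on tiny inputs; it makes no attempt to model bf16 inputs or fp32 accumulation.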

nemo_automodel.components.models.deepseek_v4.kernels.tilelang_indexer_fwd.tl_indexer_fwd_impl(
heads,
index_dim,
block_N = 256,
num_stages = 3,
threads = 512,
block_Q = None
)