nemo_automodel.components.models.deepseek_v4.optimized_kernels
Optional DeepSeek V4 optimized kernel dispatch.
The torch implementations below are kept as the numerical reference. Optional TileLang-backed paths are sourced from:
- Sinkhorn: imported from DeepSeek TileKernels tile_kernels.modeling.mhc.ops.sinkhorn_normalize. No TileKernels source is vendored in AutoModel. Upstream source: https://github.com/deepseek-ai/TileKernels/blob/36d9e45d38e204ebb87e6f6e833821eee0482fe5/tile_kernels/modeling/mhc/ops/sinkhorn.py Upstream license: MIT, copyright 2026 DeepSeek.
- Sparse attention and indexer: vendored/adapted Miles DeepSeek V4 ops in nemo_automodel.components.models.deepseek_v4.kernels. Upstream source: https://github.com/yueming-yuan/miles/tree/e561465d0b9bbf06188b7a5e2020dc7fd691f732/miles_plugins/models/deepseek_v4/ops Upstream license: Apache-2.0, copyright 2025 Zhipu AI. See nemo_automodel/components/models/deepseek_v4/kernels/__init__.py for the per-file attribution.
Those packages are imported with safe_import so environments without
TileLang still import the model and use the existing torch path.
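The optional-import fallback described above can be sketched in plain Python. This is a minimal illustration using only `importlib`; `try_import` and `pick_path` are hypothetical names, and the real `safe_import` helper may have a different signature and behavior.

```python
import importlib
from types import ModuleType
from typing import Optional, Tuple


def try_import(module_name: str) -> Tuple[bool, Optional[ModuleType]]:
    """Hypothetical helper: return (available, module) instead of raising."""
    try:
        return True, importlib.import_module(module_name)
    except ImportError:
        return False, None


# "tilelang" may or may not be installed; both outcomes are handled.
HAS_TILELANG, tilelang = try_import("tilelang")


def pick_path() -> str:
    # Take the optimized path only when the optional package imported cleanly.
    return "tilelang" if HAS_TILELANG else "torch"
```

Because the import failure is turned into a flag at module load time, the model module itself always imports, and the choice between paths is deferred to call time.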
Module Contents
Classes
_Dsv4TileKernelsSinkhorn | TileKernels Sinkhorn wrapper that accepts non-contiguous backward gradients.
Functions
is_dsv4_kernel_available | Return whether the optional TileLang kernel package for name is importable.
sinkhorn_normalize_torch | Torch reference for TileKernels MHC Sinkhorn normalization.
dsv4_sinkhorn_normalize | Normalize HyperConnection combination logits with torch or TileKernels.
build_dsv4_sparse_topk_indices | Build Miles-style top-k key indices for DSV4 local-window + compressed KV attention.
sparse_attention_torch | Miles sparse MQA torch reference.
dense_attention_topk_torch | Dense torch oracle for the Miles top-k sparse-attention contract.
dsv4_sparse_attention | Run DSV4 sparse attention through Miles TileLang kernels or torch fallback.
indexer_scores_torch | Torch reference for the Miles DSV4 C4 indexer score kernel.
extract_indexer_topk_scores_torch | Extract top-k score values, masking -1 entries with -inf.
dsv4_indexer_scores | Run DSV4 C4 indexer scores through Miles TileLang kernels or torch fallback.
dsv4_indexer_topk_scores | Run DSV4 C4 top-k indexer scores through Miles autograd kernels or torch fallback.
Data
Dsv4SparseAttentionBackend
Dsv4IndexerBackend
Dsv4SinkhornBackend
API
- nemo_automodel.components.models.deepseek_v4.optimized_kernels.Dsv4SparseAttentionBackend
None
- nemo_automodel.components.models.deepseek_v4.optimized_kernels.Dsv4IndexerBackend
None
- nemo_automodel.components.models.deepseek_v4.optimized_kernels.Dsv4SinkhornBackend
None
- nemo_automodel.components.models.deepseek_v4.optimized_kernels.is_dsv4_kernel_available(
- name: Literal['sinkhorn', 'sparse_attn', 'indexer'],
Return whether the optional TileLang kernel package for name is importable.
- nemo_automodel.components.models.deepseek_v4.optimized_kernels._all_cuda(*tensors: torch.Tensor) → bool
- nemo_automodel.components.models.deepseek_v4.optimized_kernels._should_use_tilelang(
- backend: str,
- *,
- available: bool,
- kernel_name: str,
- tensors: tuple[torch.Tensor, ...],
- require_bf16: bool = False,
- nemo_automodel.components.models.deepseek_v4.optimized_kernels.sinkhorn_normalize_torch(
- x: torch.Tensor,
- repeat: int,
- eps: float,
Torch reference for TileKernels MHC Sinkhorn normalization.
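For orientation, Sinkhorn normalization in general alternates row and column normalization of a positive matrix until it is close to doubly stochastic. The sketch below is a generic Sinkhorn-Knopp iteration, not the body of `sinkhorn_normalize_torch`; the order of normalizations, the way `eps` enters, and the initial exponentiation are all assumptions made for illustration.

```python
import torch


def sinkhorn_sketch(x: torch.Tensor, repeat: int, eps: float) -> torch.Tensor:
    """Generic Sinkhorn-Knopp over the last two dims (illustrative only)."""
    # Exponentiate with a max shift so every entry is positive and finite.
    p = torch.exp(x - x.amax(dim=(-2, -1), keepdim=True))
    for _ in range(repeat):
        p = p / (p.sum(dim=-1, keepdim=True) + eps)  # rows sum to ~1
        p = p / (p.sum(dim=-2, keepdim=True) + eps)  # columns sum to ~1
    return p


torch.manual_seed(0)
logits = torch.randn(2, 4, 4)
out = sinkhorn_sketch(logits, repeat=20, eps=1e-6)
```

After the final step the column sums are 1 up to `eps`, and the row sums approach 1 as `repeat` grows.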
- class nemo_automodel.components.models.deepseek_v4.optimized_kernels._Dsv4TileKernelsSinkhorn
Bases: torch.autograd.Function
TileKernels Sinkhorn wrapper that accepts non-contiguous backward gradients.
The upstream high-level wrapper launches the backward kernel with grad_output as-is. DSV4 consumes HC combinations through transposed matmul sites, so autograd can provide a transposed gradient layout. The low-level TileKernels backward kernel requires contiguous row-major inputs.
- static forward(
- ctx: torch.autograd.function.FunctionCtx,
- x: torch.Tensor,
- repeat: int,
- eps: float,
- static backward(
- ctx: torch.autograd.function.FunctionCtx,
- grad_output: torch.Tensor,
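The layout issue described for this class can be illustrated with a toy `torch.autograd.Function`. The sketch below is not the TileKernels wrapper: `ContiguousGradScale` is a hypothetical stand-in whose forward just doubles its input, and it only shows the assumed pattern of calling `.contiguous()` on `grad_output` before handing it to a layout-strict kernel.

```python
import torch


class ContiguousGradScale(torch.autograd.Function):
    """Illustrative stand-in: forward doubles x, backward wants row-major grads."""

    @staticmethod
    def forward(ctx, x: torch.Tensor) -> torch.Tensor:
        return x * 2.0

    @staticmethod
    def backward(ctx, grad_output: torch.Tensor) -> torch.Tensor:
        # Autograd may pass a strided view here (for example a transpose).
        # Materialize row-major memory before a kernel that requires it.
        grad_output = grad_output.contiguous()
        assert grad_output.is_contiguous()
        return grad_output * 2.0


x = torch.randn(3, 5, requires_grad=True)
y = ContiguousGradScale.apply(x)
# Consuming y through a transpose can yield a non-contiguous upstream gradient.
weights = torch.arange(15.0).reshape(5, 3)
loss = (y.t() * weights).sum()
loss.backward()
```

`.contiguous()` is a no-op when the gradient is already row-major, so the extra copy is only paid when the layout actually differs.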
- nemo_automodel.components.models.deepseek_v4.optimized_kernels._tile_kernels_sinkhorn_contiguous_grad(
- x: torch.Tensor,
- repeat: int,
- eps: float,
- nemo_automodel.components.models.deepseek_v4.optimized_kernels.dsv4_sinkhorn_normalize(
- x: torch.Tensor,
- *,
- backend: nemo_automodel.components.models.deepseek_v4.optimized_kernels.Dsv4SinkhornBackend,
- repeat: int,
- eps: float,
Normalize HyperConnection combination logits with torch or TileKernels.
- nemo_automodel.components.models.deepseek_v4.optimized_kernels.build_dsv4_sparse_topk_indices(
- *,
- batch_size: int,
- seq_len: int,
- key_len: int,
- window_size: int,
- device: torch.device,
- attention_mask: torch.Tensor | None = None,
- compress_ratio: int = 0,
- compressed_topk: torch.Tensor | None = None,
- n_pooled: int = 0,
Build Miles-style top-k key indices for DSV4 local-window + compressed KV attention.
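As a rough picture of the local-window part only, the sketch below builds causal window indices per query position and pads missing slots with -1, the masking value used by the sparse-attention reference. `local_window_indices` is a hypothetical helper; the real function also handles batching, `attention_mask`, and the compressed KV entries, and its index layout and ordering may differ.

```python
import torch


def local_window_indices(seq_len: int, window_size: int) -> torch.Tensor:
    """Hypothetical helper: causal local-window key indices, -1 where absent."""
    pos = torch.arange(seq_len).unsqueeze(1)        # [S, 1] query positions
    offs = torch.arange(window_size).unsqueeze(0)   # [1, W] window slots
    idx = pos - (window_size - 1) + offs            # oldest key first
    # Slots that would point before the start of the sequence are masked.
    return torch.where(idx >= 0, idx, torch.full_like(idx, -1))


idx = local_window_indices(seq_len=4, window_size=3)
# idx.tolist() == [[-1, -1, 0], [-1, 0, 1], [0, 1, 2], [1, 2, 3]]
```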
- nemo_automodel.components.models.deepseek_v4.optimized_kernels.sparse_attention_torch(
- q: torch.Tensor,
- kv: torch.Tensor,
- sinks: torch.Tensor,
- topk_idxs: torch.Tensor,
- sm_scale: float,
Miles sparse MQA torch reference.
- Parameters:
q – Query tensor with shape [B, S, H, D].
kv – Single-head KV tensor with shape [B, K, D].
sinks – Per-head attention sink logits with shape [H].
topk_idxs – Key indices with shape [B, S, K_top]; -1 masks an entry.
sm_scale – Attention scaling factor.
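The parameter shapes above suggest a gather-based computation, sketched here under stated assumptions. Two things are assumed and not confirmed by this page: the single-head `kv` tensor serves as both key and value, and each sink is one extra logit per head that absorbs probability mass without contributing a value. `sparse_mqa_sketch` is a hypothetical name, not the library function.

```python
import torch


def sparse_mqa_sketch(q, kv, sinks, topk_idxs, sm_scale):
    """Illustrative sparse MQA: q [B,S,H,D], kv [B,K,D], sinks [H], topk_idxs [B,S,T]."""
    B, S, H, D = q.shape
    T = topk_idxs.shape[-1]
    valid = topk_idxs >= 0                      # [B, S, T]; -1 marks a masked slot
    safe = topk_idxs.clamp(min=0)               # gather cannot take negative indices
    # Gather the selected KV rows for every query position: [B, S, T, D].
    gather_idx = safe.reshape(B, S * T, 1).expand(-1, -1, D)
    sel = kv.gather(1, gather_idx).reshape(B, S, T, D)
    scores = torch.einsum("bshd,bstd->bsht", q, sel) * sm_scale
    scores = scores.masked_fill(~valid.unsqueeze(2), float("-inf"))
    # Assumption: the sink is an extra per-head logit in the softmax denominator.
    sink = sinks.view(1, 1, H, 1).expand(B, S, H, 1).to(scores.dtype)
    probs = torch.softmax(torch.cat([scores, sink], dim=-1), dim=-1)[..., :T]
    # Assumption: kv doubles as the value tensor, shared across heads.
    return torch.einsum("bsht,bstd->bshd", probs, sel)


torch.manual_seed(0)
q = torch.randn(1, 3, 2, 4)
kv = torch.randn(1, 5, 4)
sinks = torch.zeros(2)
topk = torch.tensor([[[0, -1], [0, 1], [1, 2]]])
out = sparse_mqa_sketch(q, kv, sinks, topk, sm_scale=0.5)  # shape [1, 3, 2, 4]
```

Because the sink logit is always finite, a query whose slots are all -1 still gets a well-defined softmax and produces a zero output instead of NaN.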
- nemo_automodel.components.models.deepseek_v4.optimized_kernels.dense_attention_topk_torch(
- q: torch.Tensor,
- kv: torch.Tensor,
- sinks: torch.Tensor,
- topk_idxs: torch.Tensor,
- sm_scale: float,
Dense torch oracle for the Miles top-k sparse-attention contract.
- nemo_automodel.components.models.deepseek_v4.optimized_kernels.dsv4_sparse_attention(
- q: torch.Tensor,
- kv: torch.Tensor,
- sinks: torch.Tensor,
- topk_idxs: torch.Tensor,
- sm_scale: float,
- *,
- backend: nemo_automodel.components.models.deepseek_v4.optimized_kernels.Dsv4SparseAttentionBackend,
Run DSV4 sparse attention through Miles TileLang kernels or torch fallback.
- nemo_automodel.components.models.deepseek_v4.optimized_kernels.indexer_scores_torch(
- q: torch.Tensor,
- pooled_kv: torch.Tensor,
- weights: torch.Tensor,
- softmax_scale: float,
Torch reference for the Miles DSV4 C4 indexer score kernel.
- nemo_automodel.components.models.deepseek_v4.optimized_kernels.extract_indexer_topk_scores_torch(
- logits: torch.Tensor,
- topk_indices: torch.Tensor,
Extract top-k score values, masking -1 entries with -inf.
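The masking rule can be shown with a short gather. This is a minimal sketch that assumes the indices address the last dimension of `logits`; `extract_topk_scores_sketch` is a hypothetical name and the real function's shape contract is not stated on this page.

```python
import torch


def extract_topk_scores_sketch(logits: torch.Tensor, topk_indices: torch.Tensor) -> torch.Tensor:
    """Gather logits at topk_indices along the last dim; -1 slots become -inf."""
    invalid = topk_indices < 0
    safe = topk_indices.clamp(min=0)     # gather cannot take negative indices
    picked = logits.gather(-1, safe)
    return picked.masked_fill(invalid, float("-inf"))


logits = torch.tensor([[0.5, 2.0, -1.0, 3.0]])
topk = torch.tensor([[3, 1, -1]])
scores = extract_topk_scores_sketch(logits, topk)
# scores.tolist() == [[3.0, 2.0, -inf]]
```

Using -inf for masked slots means a later softmax over the extracted scores assigns them zero probability.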
- nemo_automodel.components.models.deepseek_v4.optimized_kernels.dsv4_indexer_scores(
- q: torch.Tensor,
- pooled_kv: torch.Tensor,
- weights: torch.Tensor,
- *,
- compress_ratio: int,
- softmax_scale: float,
- backend: nemo_automodel.components.models.deepseek_v4.optimized_kernels.Dsv4IndexerBackend,
Run DSV4 C4 indexer scores through Miles TileLang kernels or torch fallback.
- nemo_automodel.components.models.deepseek_v4.optimized_kernels.dsv4_indexer_topk_scores(
- q: torch.Tensor,
- pooled_kv: torch.Tensor,
- weights: torch.Tensor,
- topk_indices: torch.Tensor,
- *,
- compress_ratio: int,
- softmax_scale: float,
- backend: nemo_automodel.components.models.deepseek_v4.optimized_kernels.Dsv4IndexerBackend,
Run DSV4 C4 top-k indexer scores through Miles autograd kernels or torch fallback.