nemo_automodel.components.models.deepseek_v4.kernels


Vendored DeepSeek V4 TileLang kernels.

Miles DeepSeek V4 kernels

The vendored sparse attention and indexer kernels were adapted from the Miles DeepSeek V4 implementation:

Per-file source mapping:

===============================  ==============================================================
Local file                       Upstream file
===============================  ==============================================================
sparse_attention.py              miles_plugins/models/deepseek_v4/ops/attention_core.py
tilelang_indexer.py              miles_plugins/models/deepseek_v4/ops/kernel/tilelang_indexer.py
tilelang_indexer_bwd.py          miles_plugins/models/deepseek_v4/ops/kernel/tilelang_indexer_bwd.py
tilelang_indexer_fwd.py          miles_plugins/models/deepseek_v4/ops/kernel/tilelang_indexer_fwd.py
tilelang_sparse_mla_bwd.py       miles_plugins/models/deepseek_v4/ops/kernel/tilelang_sparse_mla_bwd.py
tilelang_sparse_mla_fwd.py       miles_plugins/models/deepseek_v4/ops/kernel/tilelang_sparse_mla_fwd.py
===============================  ==============================================================

Local modifications include adapting the kernels to AutoModel’s DeepSeek V4 tensor layouts, packed-sequence dispatch, optional backend selection, and forward/backward parity tests against the torch reference implementation.
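As a rough illustration of what a forward/backward parity test against a torch reference can look like, the sketch below compares outputs and input gradients of two implementations. It is not taken from the AutoModel test suite: `reference_attention`, `candidate_attention`, and `check_parity` are hypothetical names, and the "candidate" is a plain torch stand-in rather than one of the vendored TileLang kernels.

```python
import torch


def reference_attention(q, k, v):
    """Plain torch reference: dense softmax attention."""
    scores = q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v


def candidate_attention(q, k, v):
    """Hypothetical stand-in for a fused kernel (NOT a vendored TileLang kernel)."""
    scores = q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5
    scores = scores - scores.amax(dim=-1, keepdim=True)  # stabilize before exp
    weights = scores.exp()
    return (weights / weights.sum(dim=-1, keepdim=True)) @ v


def check_parity(candidate, reference, inputs, atol=1e-5, rtol=1e-4):
    """Compare forward outputs and input gradients of two implementations."""
    grad_out = None
    results = []
    for fn in (candidate, reference):
        # Fresh leaf tensors so gradients from the two runs do not mix.
        leaves = [x.detach().clone().requires_grad_(True) for x in inputs]
        out = fn(*leaves)
        if grad_out is None:
            grad_out = torch.randn_like(out)  # same upstream gradient for both
        out.backward(grad_out)
        results.append((out.detach(), [x.grad for x in leaves]))
    (out_c, grads_c), (out_r, grads_r) = results
    torch.testing.assert_close(out_c, out_r, atol=atol, rtol=rtol)
    for g_c, g_r in zip(grads_c, grads_r):
        torch.testing.assert_close(g_c, g_r, atol=atol, rtol=rtol)
    return True


torch.manual_seed(0)
q, k, v = (torch.randn(2, 4, 8, dtype=torch.float64) for _ in range(3))
parity_ok = check_parity(candidate_attention, reference_attention, (q, k, v))
```

Feeding both implementations the same upstream gradient keeps the backward comparison meaningful; a real kernel test would typically run on GPU in reduced precision and need looser tolerances.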

DeepSeek TileKernels

The Sinkhorn optimized path imports DeepSeek TileKernels at runtime. AutoModel does not vendor TileKernels source code.
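A runtime import of an optional dependency is commonly written as a guarded import. The sketch below shows that general pattern only. The module name `hypothetical_tile_kernels`, the `sinkhorn` attribute, and the pure-Python fallback are all assumptions made for illustration; they do not describe the real TileKernels API or how AutoModel behaves when the package is missing.

```python
import importlib


def load_optional_backend(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def sinkhorn_reference(matrix, num_iters=20):
    """Pure-Python Sinkhorn normalization: alternate row and column scaling."""
    rows = [list(r) for r in matrix]
    for _ in range(num_iters):
        rows = [[x / sum(r) for x in r] for r in rows]
        col_sums = [sum(col) for col in zip(*rows)]
        rows = [[x / s for x, s in zip(r, col_sums)] for r in rows]
    return rows


def sinkhorn(matrix, backend_module="hypothetical_tile_kernels"):
    """Use the optional backend when present, else the reference path."""
    backend = load_optional_backend(backend_module)  # resolved at call time
    if backend is not None and hasattr(backend, "sinkhorn"):
        # Hypothetical entry point; the real package may expose something else.
        return backend.sinkhorn(matrix)
    return sinkhorn_reference(matrix)


balanced = sinkhorn([[1.0, 2.0], [3.0, 4.0]])
```

Deferring the import to call time means the surrounding package can be imported without the optional dependency installed; only the optimized path requires it.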

Submodules