nemo_automodel.components.models.deepseek_v4.kernels.sparse_attention
Autograd wrapper for vendored Miles DeepSeek V4 sparse-attention kernels.
Attribution:
- Upstream project: Miles, https://github.com/yueming-yuan/miles
- Upstream revision: e561465d0b9bbf06188b7a5e2020dc7fd691f732, deepseek-v4 branch
- Upstream license: Apache-2.0, copyright 2025 Zhipu AI
- Original source: https://github.com/yueming-yuan/miles/blob/e561465d0b9bbf06188b7a5e2020dc7fd691f732/miles_plugins/models/deepseek_v4/ops/attention_core.py
Module Contents
Classes
Functions
API
Bases: Function
TileLang sparse MQA attention with custom backward.
staticmethod
Run the vendored sparse attention backward kernel.
staticmethod
Run the vendored sparse attention forward kernel.
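The forward/backward pair above follows the usual torch.autograd.Function pattern of two static methods. The sketch below illustrates only that pattern: it does not call the vendored TileLang kernels. The names `reference_sparse_mqa` and `ToySparseMQA`, the tensor layouts, and the use of one shared `kv` tensor as both keys and values are assumptions made for illustration, and the backward simply recomputes a dense PyTorch reference instead of running a dedicated kernel.

```python
import torch


def reference_sparse_mqa(q, kv, indices, scale):
    """Plain PyTorch stand-in for a sparse MQA kernel (hypothetical helper).

    q:       [S, H, D] queries, H heads per position
    kv:      [T, D]    one KV row per position, shared by all heads (MQA)
    indices: [S, K]    KV rows each query position attends to
    """
    gathered = kv[indices]  # [S, K, D]
    scores = torch.einsum("shd,skd->shk", q, gathered) * scale
    probs = scores.softmax(dim=-1)
    return torch.einsum("shk,skd->shd", probs, gathered)


class ToySparseMQA(torch.autograd.Function):
    """Hypothetical wrapper showing the forward/backward staticmethod layout."""

    @staticmethod
    def forward(ctx, q, kv, indices, scale):
        # Save the inputs the backward pass needs; non-tensor state goes on ctx.
        ctx.save_for_backward(q, kv, indices)
        ctx.scale = scale
        return reference_sparse_mqa(q, kv, indices, scale)

    @staticmethod
    def backward(ctx, grad_out):
        q, kv, indices = ctx.saved_tensors
        # Assumption: recompute the forward under autograd to obtain gradients.
        # A real wrapper would launch its backward kernel here instead.
        with torch.enable_grad():
            q_ = q.detach().requires_grad_(True)
            kv_ = kv.detach().requires_grad_(True)
            out = reference_sparse_mqa(q_, kv_, indices, ctx.scale)
            dq, dkv = torch.autograd.grad(out, (q_, kv_), grad_out)
        # One gradient per forward input; indices and scale get None.
        return dq, dkv, None, None


if __name__ == "__main__":
    q = torch.randn(4, 2, 8, requires_grad=True)
    kv = torch.randn(6, 8, requires_grad=True)
    indices = torch.randint(0, 6, (4, 3))
    out = ToySparseMQA.apply(q, kv, indices, 8 ** -0.5)
    out.sum().backward()
    print(out.shape, q.grad.shape, kv.grad.shape)
```

The backward returns one entry per forward argument, with `None` for the integer index tensor and the scalar scale, which do not receive gradients.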
Bases: Function
TileLang sparse attention with smaller head groups and fp32 KV-grad accumulation.
staticmethod
Run chunked backward and accumulate shared KV gradients in fp32.
staticmethod
Run the vendored sparse attention forward kernel over head chunks.
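Because all heads share the same KV rows in MQA, every head chunk contributes a partial gradient to the same KV tensor, and those partial gradients have to be summed. The sketch below shows one way the chunked forward and fp32 accumulation described above could look. It is an illustration under assumptions, not the vendored implementation: `reference_sparse_mqa`, `ToyChunkedSparseMQA`, the `head_chunk` argument, and the tensor layouts are hypothetical, and a dense PyTorch reference stands in for the TileLang kernels.

```python
import torch


def reference_sparse_mqa(q, kv, indices, scale):
    """Plain PyTorch stand-in for a sparse MQA kernel (hypothetical helper).

    q: [S, H, D], kv: [T, D] shared by all heads, indices: [S, K].
    """
    gathered = kv[indices]  # [S, K, D]
    scores = torch.einsum("shd,skd->shk", q, gathered) * scale
    probs = scores.softmax(dim=-1)
    return torch.einsum("shk,skd->shd", probs, gathered)


class ToyChunkedSparseMQA(torch.autograd.Function):
    """Hypothetical wrapper that processes heads in chunks of `head_chunk`."""

    @staticmethod
    def forward(ctx, q, kv, indices, scale, head_chunk):
        ctx.save_for_backward(q, kv, indices)
        ctx.scale = scale
        ctx.head_chunk = head_chunk
        # Run the (stand-in) kernel once per head chunk, then stitch heads back.
        outs = [
            reference_sparse_mqa(q_chunk, kv, indices, scale)
            for q_chunk in q.split(head_chunk, dim=1)
        ]
        return torch.cat(outs, dim=1)

    @staticmethod
    def backward(ctx, grad_out):
        q, kv, indices = ctx.saved_tensors
        # The shared KV gradient is accumulated in fp32 regardless of kv.dtype,
        # so summing many low-precision partial gradients loses less precision.
        dkv_fp32 = torch.zeros_like(kv, dtype=torch.float32)
        dq_chunks = []
        q_chunks = q.split(ctx.head_chunk, dim=1)
        g_chunks = grad_out.split(ctx.head_chunk, dim=1)
        for q_chunk, g_chunk in zip(q_chunks, g_chunks):
            # Assumption: recompute per chunk under autograd instead of
            # launching a dedicated backward kernel.
            with torch.enable_grad():
                q_ = q_chunk.detach().requires_grad_(True)
                kv_ = kv.detach().requires_grad_(True)
                out = reference_sparse_mqa(q_, kv_, indices, ctx.scale)
                dq_c, dkv_c = torch.autograd.grad(out, (q_, kv_), g_chunk)
            dq_chunks.append(dq_c)
            dkv_fp32 += dkv_c.float()
        dq = torch.cat(dq_chunks, dim=1)
        # Cast the accumulated gradient back to the KV dtype only at the end.
        return dq, dkv_fp32.to(kv.dtype), None, None, None


if __name__ == "__main__":
    q = torch.randn(4, 6, 8, requires_grad=True)
    kv = torch.randn(5, 8, requires_grad=True)
    indices = torch.randint(0, 5, (4, 3))
    out = ToyChunkedSparseMQA.apply(q, kv, indices, 8 ** -0.5, 2)
    out.sum().backward()
    print(out.shape, kv.grad.dtype)
```

Query gradients need no accumulation because each head belongs to exactly one chunk; only the shared KV gradient is summed across chunks.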
Run vendored Miles DeepSeek V4 TileLang sparse attention.
Run vendored Miles sparse attention in TileLang head chunks.