nemo_automodel.components.models.glm_moe_dsa.cp
Context-parallel helpers for GLM MoE DSA TileLang attention.
Module Contents
Functions
API
All-gather activation tensors across CP ranks while preserving autograd.
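As a rough illustration of what "preserving autograd" involves, the sketch below simulates an all-gather and its gradient rule in a single process with plain Python lists. It does not use torch.distributed and is not the module's implementation; the names all_gather_forward and all_gather_backward are hypothetical. The backward rule shown (each rank receives the sum, over all ranks, of the gradient slice that matches its own shard) is the standard adjoint of an all-gather; whether the real helper realizes it with a reduce-scatter is an assumption.

```python
def all_gather_forward(shards):
    """Simulate an all-gather: shards[r] is rank r's local activations.

    Every rank ends up with the concatenation of all shards, in rank order.
    """
    full = [x for shard in shards for x in shard]
    return [list(full) for _ in shards]


def all_gather_backward(grad_full_per_rank, shard_sizes):
    """Simulate the gradient of an all-gather.

    grad_full_per_rank[r] is the gradient rank r computed for the gathered
    tensor. Rank k's local gradient is the sum over all ranks of the slice
    that corresponds to rank k's shard.
    """
    local_grads = []
    offset = 0
    for size in shard_sizes:
        summed = [
            sum(grad[offset + i] for grad in grad_full_per_rank)
            for i in range(size)
        ]
        local_grads.append(summed)
        offset += size
    return local_grads


# Two simulated CP ranks, two activation values each.
shards = [[1.0, 2.0], [3.0, 4.0]]
gathered = all_gather_forward(shards)

# Each rank produces its own gradient for the full gathered tensor.
grad_full_per_rank = [[1.0, 1.0, 1.0, 1.0], [10.0, 20.0, 30.0, 40.0]]
local_grads = all_gather_backward(grad_full_per_rank, [len(s) for s in shards])
```

A plain collective call that returns only the gathered tensor would drop this backward step, so gradients from other ranks would never reach the local activations.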
Return whether a real GLM DSA CP process group is active.
Convert packed GLM DSA batches to THD and keep a contiguous query shard per CP rank.
GLM DSA sparse attention gathers K/V activations inside the model, so the
batch side only slices each rank's local query tokens. Alongside that slice
it carries the full packed-sequence cu_seqlens and the per-query global
token indices that TileLang’s causal top-k window needs.
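The slicing described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the module's code: the function names shard_queries and causal_key_range are hypothetical, the total token count is assumed to divide evenly by the CP world size (the real helper may pad instead), and lists stand in for tensors.

```python
from bisect import bisect_right


def shard_queries(cu_seqlens, cp_size, cp_rank):
    """Return one rank's contiguous query shard of a packed (THD) batch.

    cu_seqlens holds cumulative sequence boundaries, e.g. [0, 3, 8] for two
    packed sequences of lengths 3 and 5. It is left untouched, because keys
    and values are gathered in full elsewhere.
    """
    total = cu_seqlens[-1]
    if total % cp_size != 0:
        # Assumption of this sketch: no padding logic is modeled.
        raise ValueError("total tokens must be divisible by cp_size")
    per_rank = total // cp_size
    start = cp_rank * per_rank
    # Global position of every local query token in the packed sequence.
    global_idx = list(range(start, start + per_rank))
    # Index of the packed sequence each query token belongs to.
    seq_ids = [bisect_right(cu_seqlens, t) - 1 for t in global_idx]
    return global_idx, seq_ids


def causal_key_range(cu_seqlens, token_idx):
    """Half-open range of global key positions a query may attend to."""
    seq_id = bisect_right(cu_seqlens, token_idx) - 1
    return cu_seqlens[seq_id], token_idx + 1


cu_seqlens = [0, 3, 8]
rank0 = shard_queries(cu_seqlens, cp_size=2, cp_rank=0)
rank1 = shard_queries(cu_seqlens, cp_size=2, cp_rank=1)
```

Because each query keeps its global index, a causal window can be computed against the full key set even though the rank holds only a slice of the queries: the query at global position 5 belongs to the second sequence and may attend to keys 3 through 5.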