# nemo_automodel.components.models.glm_moe_dsa.cp

Context-parallel (CP) helpers for GLM MoE DSA TileLang attention.

## Module Contents

### Functions

| Name                                                                                                                            | Description                                                                          |
| ------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------ |
| [`_contiguous_cp_indices`](#nemo_automodel-components-models-glm_moe_dsa-cp-_contiguous_cp_indices)                             | -                                                                                    |
| [`_slice_thd_chunk_for_cp`](#nemo_automodel-components-models-glm_moe_dsa-cp-_slice_thd_chunk_for_cp)                           | -                                                                                    |
| [`glm_dsa_cp_all_gather`](#nemo_automodel-components-models-glm_moe_dsa-cp-glm_dsa_cp_all_gather)                               | All-gather activation tensors across CP ranks while preserving autograd.             |
| [`glm_dsa_cp_enabled`](#nemo_automodel-components-models-glm_moe_dsa-cp-glm_dsa_cp_enabled)                                     | Return whether a real GLM DSA CP process group is active.                            |
| [`make_glm_dsa_packed_cp_batch_and_ctx`](#nemo_automodel-components-models-glm_moe_dsa-cp-make_glm_dsa_packed_cp_batch_and_ctx) | Convert packed GLM DSA batches to THD and keep a contiguous query shard per CP rank. |

### API

```python
nemo_automodel.components.models.glm_moe_dsa.cp._contiguous_cp_indices(
    total_tokens: int,
    cp_size: int,
    cp_rank: int,
    device: torch.device
) -> torch.Tensor
```

```python
nemo_automodel.components.models.glm_moe_dsa.cp._slice_thd_chunk_for_cp(
    chunk: dict[str, torch.Tensor],
    cp_group,
    cp_size: int,
    cp_rank: int,
    padding_token_id: int
) -> dict[str, torch.Tensor]
```

```python
nemo_automodel.components.models.glm_moe_dsa.cp.glm_dsa_cp_all_gather(
    tensor: torch.Tensor,
    dim: int,
    cp_group
) -> torch.Tensor
```

All-gather activation tensors across CP ranks while preserving autograd.
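
One common way to keep an all-gather differentiable is to pair it with a reduce-scatter in the backward pass. This page does not say whether `glm_dsa_cp_all_gather` is implemented that way, so the sketch below only simulates that data movement in a single process, with plain Python lists standing in for 1-D tensors gathered along dimension 0. The helper names `all_gather_forward` and `all_gather_backward` are hypothetical and are not part of the module.

```python
def all_gather_forward(shards):
    """Simulate all-gather: every rank receives all shards, concatenated in rank order."""
    full = [value for shard in shards for value in shard]
    # One independent copy of the gathered tensor per rank.
    return [list(full) for _ in shards]


def all_gather_backward(grad_fulls, shard_sizes):
    """Simulate the matching backward pass (a reduce-scatter).

    grad_fulls[r] is rank r's gradient with respect to its gathered copy.
    The gradients are summed elementwise across ranks, then each rank keeps
    only the slice that corresponds to its own input shard.
    """
    summed = [sum(values) for values in zip(*grad_fulls)]
    grads, offset = [], 0
    for size in shard_sizes:
        grads.append(summed[offset:offset + size])
        offset += size
    return grads


# Two simulated CP ranks, each holding two activations.
shards = [[1.0, 2.0], [3.0, 4.0]]
gathered = all_gather_forward(shards)

# Each rank produces a gradient for the full gathered tensor.
grad_fulls = [[1.0, 1.0, 1.0, 1.0], [10.0, 20.0, 30.0, 40.0]]
local_grads = all_gather_backward(grad_fulls, [len(s) for s in shards])
```

The point of the backward step is that a local shard contributes to the output on every rank, so its gradient has to collect contributions from all of them.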

```python
nemo_automodel.components.models.glm_moe_dsa.cp.glm_dsa_cp_enabled(
    cp_group
) -> bool
```

Return whether a real GLM DSA CP process group is active.
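
As a rough illustration of what such a check could look like, the sketch below treats a group as "real" when it exists and spans more than one rank. That interpretation is an assumption, not something this page confirms. The function `cp_enabled_sketch` and the `_StubGroup` class are hypothetical; the stub only mimics the `size()` method that a `torch.distributed` process group exposes.

```python
class _StubGroup:
    """Hypothetical stand-in for a process group; only size() is modeled."""

    def __init__(self, world_size: int):
        self._world_size = world_size

    def size(self) -> int:
        return self._world_size


def cp_enabled_sketch(cp_group) -> bool:
    """Assumed semantics: CP is active only if the group has more than one rank."""
    return cp_group is not None and cp_group.size() > 1


no_group = cp_enabled_sketch(None)
single_rank = cp_enabled_sketch(_StubGroup(1))
four_ranks = cp_enabled_sketch(_StubGroup(4))
```

Under this reading, callers can skip the all-gather entirely when the check returns `False`.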

```python
nemo_automodel.components.models.glm_moe_dsa.cp.make_glm_dsa_packed_cp_batch_and_ctx(
    cp_mesh,
    tp_mesh,
    batch,
    loss_mask = None,
    padding_token_id: int = 0,
    num_chunks: int = 1,
    seq_lens_padding_value: int = -1000
)
```

Convert packed GLM DSA batches to THD and keep a contiguous query shard per CP rank.

GLM DSA sparse attention gathers K/V activations inside the model, so batch
preparation only slices each rank's local query tokens. Each shard also carries
the full packed-sequence `cu_seqlens` and the per-query global token indices
needed for TileLang's causal top-k window.
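
The sharding described above can be sketched with plain Python lists in place of tensors. This is an illustration under stated assumptions, not the module's implementation: it assumes the packed token count is already padded to a multiple of the CP size, and the names `contiguous_shard_bounds` and `local_query_view` are hypothetical.

```python
from bisect import bisect_right


def contiguous_shard_bounds(total_tokens: int, cp_size: int, cp_rank: int):
    """Return the [start, end) token range owned by cp_rank."""
    if total_tokens % cp_size != 0:
        raise ValueError("total_tokens must be divisible by cp_size")
    shard = total_tokens // cp_size
    start = cp_rank * shard
    return start, start + shard


def local_query_view(tokens, cu_seqlens, cp_size, cp_rank):
    """Slice the local queries but keep the full cu_seqlens and global indices."""
    start, end = contiguous_shard_bounds(len(tokens), cp_size, cp_rank)
    global_idx = list(range(start, end))
    # Sequence s owns the tokens with cu_seqlens[s] <= index < cu_seqlens[s + 1].
    seq_ids = [bisect_right(cu_seqlens, i) - 1 for i in global_idx]
    return {
        "tokens": tokens[start:end],  # local query shard only
        "global_idx": global_idx,     # position of each query in the packed batch
        "seq_ids": seq_ids,           # packed sequence each query belongs to
        "cu_seqlens": cu_seqlens,     # unsliced: boundaries of every sequence
    }


# Two packed sequences of lengths 3 and 5, split across two CP ranks.
tokens = list(range(100, 108))
cu_seqlens = [0, 3, 8]
rank0 = local_query_view(tokens, cu_seqlens, cp_size=2, cp_rank=0)
rank1 = local_query_view(tokens, cu_seqlens, cp_size=2, cp_rank=1)
```

Because rank 0's shard straddles the boundary between the two sequences, a query needs both its global index and the full `cu_seqlens` to know which gathered keys are causally visible to it.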