# nemo_automodel.components.attention.dflash_mask

DFlash sparse-attention masks (SDPA + FlexAttention).

Builds the DFlash block-diagonal attention masks (paper §4.2) so that
multi-anchor DFlash training (up to ~512 anchors per sequence; see paper
Appendix A.1) is tractable in memory.

KV layout:  `[ context (S tokens) | block_0 | block_1 | ... | block_{N-1} ]`
Q  layout:  `[ block_0 | block_1 | ... | block_{N-1} ]`

Each query in block *b* attends to:

1. context positions strictly less than `anchor[b]` (causal-style prefix)
2. its own block's noise positions (bidirectional in-block)
3. nothing else — other blocks are invisible
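The three rules above can be sketched as a plain-Python predicate for a single sample. This is an illustration only, not the module's code: the helper name `dflash_visible` and the toy sizes are assumptions, and every block is treated as valid (no `block_keep_mask`).

```python
def dflash_visible(q_idx, kv_idx, anchors, ctx_len, block_size):
    """Return True if query q_idx may attend to key/value kv_idx (one sample)."""
    b = q_idx // block_size  # block that owns this query
    if kv_idx < ctx_len:
        # Rule 1: context positions strictly before this block's anchor.
        return kv_idx < anchors[b]
    # Rule 2: the query's own block; rule 3 is the remaining False case.
    return (kv_idx - ctx_len) // block_size == b


# Toy sizes (assumed): S = 4 context tokens, N = 2 blocks of 2 tokens each.
anchors, ctx_len, block_size = [2, 3], 4, 2
q_len = len(anchors) * block_size
kv_len = ctx_len + q_len

rows = [
    "".join(
        "x" if dflash_visible(q, k, anchors, ctx_len, block_size) else "."
        for k in range(kv_len)
    )
    for q in range(q_len)
]
for row in rows:
    print(row)
# xx..xx..
# xx..xx..
# xxx...xx
# xxx...xx
```

Each printed row is one query; the first four columns are the context and the last four are the two blocks, so the block-diagonal structure shows up in the right half.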

The context is never queried *from*: the target LM is frozen and only its
hidden states are needed. Omitting the context from Q therefore halves the
attention compute compared with including context positions in Q.
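As a rough check of the saving, the sketch below compares dense score-matrix sizes with and without context rows in Q. It assumes cost is proportional to `Q_len * KV_len` and ignores mask sparsity; the helper name and the sizes are illustrative, not taken from the module. Under that assumption the ratio is exactly one half when the number of block tokens equals the context length.

```python
def q_cost_ratio(ctx_len, num_blocks, block_size):
    """Dense score entries without context in Q, relative to with context in Q."""
    block_tokens = num_blocks * block_size
    kv_len = ctx_len + block_tokens
    with_ctx_in_q = (ctx_len + block_tokens) * kv_len
    without_ctx_in_q = block_tokens * kv_len
    return without_ctx_in_q / with_ctx_in_q


# Assumed sizes: 4096 context tokens, 256 blocks of 16 tokens (4096 block tokens).
ratio = q_cost_ratio(ctx_len=4096, num_blocks=256, block_size=16)
print(ratio)  # 0.5
```

With more block tokens than context tokens the saving is smaller, and with fewer it is larger.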

## Module Contents

### Functions

| Name                                                                                                                  | Description                                                          |
| --------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------- |
| [`_get_compiled_create_block_mask`](#nemo_automodel-components-attention-dflash_mask-_get_compiled_create_block_mask) | Lazy-initialise a compiled `create_block_mask` and cache it.         |
| [`create_dflash_block_mask`](#nemo_automodel-components-attention-dflash_mask-create_dflash_block_mask)               | Build a sparse FlexAttention `BlockMask` for DFlash training.        |
| [`create_dflash_sdpa_mask`](#nemo_automodel-components-attention-dflash_mask-create_dflash_sdpa_mask)                 | Build a dense additive attention mask for the SDPA backend.          |

### Data

[`_compiled_create_block_mask`](#nemo_automodel-components-attention-dflash_mask-_compiled_create_block_mask)

### API

```python
nemo_automodel.components.attention.dflash_mask._get_compiled_create_block_mask()
```

Lazy-initialise a compiled `create_block_mask` and cache it.
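The lazy-initialise-and-cache pattern described here can be sketched without PyTorch. In the sketch, `expensive_compile` and `build_mask` are hypothetical stand-ins for `torch.compile` and `create_block_mask`; the real function body is not shown on this page, so this illustrates the pattern only.

```python
_compiled_fn = None  # module-level cache, analogous to `_compiled_create_block_mask`
compile_calls = 0    # counter used only to show that compilation happens once


def expensive_compile(fn):
    """Hypothetical stand-in for torch.compile."""
    global compile_calls
    compile_calls += 1
    return fn


def build_mask(n):
    """Hypothetical stand-in for create_block_mask."""
    return [True] * n


def get_compiled_fn():
    """Compile on first use, then return the cached callable."""
    global _compiled_fn
    if _compiled_fn is None:
        _compiled_fn = expensive_compile(build_mask)
    return _compiled_fn


first = get_compiled_fn()
second = get_compiled_fn()
print(first is second, compile_calls)  # True 1
```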

```python
nemo_automodel.components.attention.dflash_mask.create_dflash_block_mask(
    anchor_positions: torch.Tensor,
    block_keep_mask: torch.Tensor,
    ctx_len: int,
    block_size: int,
    device: torch.device,
    use_compile: bool = True
) -> 'BlockMask'
```

Build a sparse FlexAttention `BlockMask` for DFlash training.

See module docstring for the mask semantics. The returned `BlockMask` is
consumed directly by transformers' `flex_attention` backend when
`_attn_implementation="flex_attention"` is set on the draft model — pass
it via the `attention_mask` kwarg.

**Parameters:**

- `anchor_positions`: `[B, N]` anchor positions (long).
- `block_keep_mask`: `[B, N]` valid-anchor mask (bool).
- `ctx_len`: context length.
- `block_size`: block size.
- `device`: torch device.
- `use_compile`: cache and reuse a `torch.compile`'d `create_block_mask`
  across calls (default `True`). Set to `False` when running on PyTorch
  builds that hit Inductor errors during compile.

**Returns:** `'BlockMask'`

A `torch.nn.attention.flex_attention.BlockMask`.
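For orientation, here is a minimal sketch of a FlexAttention-style `mask_mod` that encodes the module-level rules. It is not the library's implementation: the factory name `make_dflash_mask_mod` is hypothetical, and the handling of `block_keep_mask` (queries in dropped blocks see nothing) is an assumption. The sketch evaluates the predicate densely on a toy input rather than building a `BlockMask`.

```python
import torch


def make_dflash_mask_mod(anchor_positions, block_keep_mask, ctx_len, block_size):
    """Return a mask_mod(b, h, q_idx, kv_idx) closure (hypothetical helper)."""

    def mask_mod(b, h, q_idx, kv_idx):
        q_block = q_idx // block_size
        in_ctx = kv_idx < ctx_len
        # Context prefix strictly before this block's anchor.
        ctx_visible = in_ctx & (kv_idx < anchor_positions[b, q_block])
        # Bidirectional attention inside the query's own block.
        kv_block = (kv_idx - ctx_len) // block_size
        own_block = (~in_ctx) & (kv_block == q_block)
        # Assumption: queries in dropped blocks attend to nothing.
        return (ctx_visible | own_block) & block_keep_mask[b, q_block]

    return mask_mod


# Toy input: B=1, N=2 blocks of size 2, S=4 context tokens; block 1 is dropped.
anchors = torch.tensor([[2, 3]])
keep = torch.tensor([[True, False]])
mask_mod = make_dflash_mask_mod(anchors, keep, ctx_len=4, block_size=2)

q = torch.arange(4)[:, None]   # [Q, 1]
kv = torch.arange(8)[None, :]  # [1, KV]
dense = mask_mod(torch.tensor(0), torch.tensor(0), q, kv)  # [Q, KV] bool
print(dense.int())
```

A closure of this shape is what `torch.nn.attention.flex_attention.create_block_mask(mask_mod, B, H, Q_LEN, KV_LEN, device=...)` expects as its first argument, with `Q_LEN = N * block_size` and `KV_LEN = ctx_len + N * block_size`.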

```python
nemo_automodel.components.attention.dflash_mask.create_dflash_sdpa_mask(
    anchor_positions: torch.Tensor,
    block_keep_mask: torch.Tensor,
    ctx_len: int,
    block_size: int,
    device: torch.device,
    dtype: torch.dtype
) -> torch.Tensor
```

Build a dense additive attention mask for the SDPA backend.

**Parameters:**

- `anchor_positions`: `[B, N]` anchor positions per sample (long).
- `block_keep_mask`: `[B, N]` per-sample valid-anchor mask (bool).
- `ctx_len`: context length `S`.
- `block_size`: block size.
- `device`: torch device.
- `dtype`: dtype for the additive mask (typically the model dtype).

**Returns:** `torch.Tensor`

`[B, 1, N*block_size, S + N*block_size]` float tensor: `0` at
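A dense additive mask with the documented shape could be built along the following lines. This is a sketch under stated assumptions, not the library's code: the function name `dense_dflash_additive_mask` is hypothetical, masked entries are filled with `torch.finfo(dtype).min` (the actual fill value is not stated on this page), and queries in dropped blocks are assumed to be fully masked.

```python
import torch


def dense_dflash_additive_mask(anchor_positions, block_keep_mask, ctx_len,
                               block_size, dtype=torch.float32):
    """Return a [B, 1, N*block_size, ctx_len + N*block_size] additive mask."""
    B, N = anchor_positions.shape
    device = anchor_positions.device
    q_len = N * block_size
    kv_len = ctx_len + q_len

    q_block = torch.arange(q_len, device=device) // block_size  # [Q]
    kv_idx = torch.arange(kv_len, device=device)                # [KV]
    in_ctx = kv_idx < ctx_len                                   # [KV]

    anchor_per_q = anchor_positions[:, q_block]                 # [B, Q]
    keep_per_q = block_keep_mask[:, q_block]                    # [B, Q]

    # Context prefix strictly before each query's anchor: [B, Q, KV].
    ctx_visible = in_ctx[None, None, :] & (kv_idx[None, None, :] < anchor_per_q[:, :, None])
    # Same-block (bidirectional) visibility: [1, Q, KV].
    kv_block = (kv_idx - ctx_len) // block_size
    own_block = (~in_ctx)[None, None, :] & (kv_block[None, None, :] == q_block[None, :, None])

    visible = (ctx_visible | own_block) & keep_per_q[:, :, None]

    mask = torch.zeros(B, 1, q_len, kv_len, dtype=dtype, device=device)
    # Assumption: masked entries use the most negative finite value of dtype.
    mask.masked_fill_(~visible[:, None], torch.finfo(dtype).min)
    return mask


anchors = torch.tensor([[2, 3]])
keep = torch.tensor([[True, True]])
mask = dense_dflash_additive_mask(anchors, keep, ctx_len=4, block_size=2)
print(mask.shape)  # torch.Size([1, 1, 4, 8])
```

A float mask like this is added to the attention scores when passed as `attn_mask` to `torch.nn.functional.scaled_dot_product_attention`.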

```python
nemo_automodel.components.attention.dflash_mask._compiled_create_block_mask = None
```