bridge.diffusion.common.dllm
Diffusion language model utilities: masking and block attention masks.
Module Contents
Functions
forward_process_simple_masking | Uniform random masking for diffusion LM training.
compute_block_mask | Compute the sbd_block_diff attention mask.
API
- bridge.diffusion.common.dllm.forward_process_simple_masking(input_ids, mask_token_id, eps=0.001, loss_mask=None, generator=None)
Uniform random masking for diffusion LM training.
For each sequence in the batch, sample a masking ratio t ~ U(eps, 1) and independently mask each token with probability t.
- Returns:
noisy_batch: input_ids with masked positions replaced by mask_token_id
masked_indices: boolean mask of shape (b, l)
p_mask: per-token masking probability of shape (b, l)
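The masking procedure described above can be sketched in PyTorch as follows. This is a minimal illustration, not the module's actual implementation: the function name `simple_masking_sketch` is hypothetical, `(b, l)` is assumed to mean (batch size, sequence length), and the handling of `loss_mask` (masking only positions where it is nonzero) is an assumption, since the docstring does not say how that argument is used.

```python
import torch


def simple_masking_sketch(input_ids, mask_token_id, eps=1e-3,
                          loss_mask=None, generator=None):
    """Hypothetical re-implementation of uniform random masking."""
    b, l = input_ids.shape
    device = input_ids.device

    # One masking ratio per sequence: t = eps + (1 - eps) * u, u ~ U[0, 1).
    u = torch.rand(b, generator=generator, device=device)
    t = eps + (1.0 - eps) * u

    # Every token in a sequence shares that sequence's masking probability.
    p_mask = t[:, None].expand(b, l)

    # Mask each token independently with probability t.
    masked_indices = torch.rand(b, l, generator=generator, device=device) < p_mask

    # ASSUMPTION: loss_mask restricts masking to positions where it is nonzero.
    if loss_mask is not None:
        masked_indices = masked_indices & loss_mask.bool()

    noisy_batch = torch.where(
        masked_indices, torch.full_like(input_ids, mask_token_id), input_ids
    )
    return noisy_batch, masked_indices, p_mask
```

Passing a seeded `torch.Generator` makes the sampled ratios and masks reproducible across runs.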
- bridge.diffusion.common.dllm.compute_block_mask(block_size, max_seq_length)
Compute the sbd_block_diff attention mask.
The semi-block-diffusion mask is composed of three sub-masks over a doubled sequence [xt | x0] of length 2*max_seq_length:
Block Diagonal (M_BD): self-attention within noised blocks (xt only)
Offset Block-Causal (M_OBC): cross-attention from xt to past x0 blocks
Fully Causal (M_FC): fully causal attention within x0
- Parameters:
block_size – Block size for block-based attention.
max_seq_length – Length of one half (xt or x0) of the sequence.
- Returns:
BlockMask for use with flex_attention.
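To make the three sub-masks concrete, the sketch below materializes a dense boolean mask over the doubled sequence [xt | x0]. It is an illustration under stated assumptions rather than the library's code: the function name `dense_sbd_mask_sketch` is hypothetical, "past x0 blocks" is read as strictly earlier blocks, and "fully causal" is read as token-level causality within x0. The real function returns a `BlockMask`, not a dense tensor.

```python
import torch


def dense_sbd_mask_sketch(block_size: int, max_seq_length: int) -> torch.Tensor:
    """Hypothetical dense version of the mask; True means attention is allowed."""
    n = max_seq_length
    idx = torch.arange(2 * n)
    q = idx[:, None]   # query positions, shape (2n, 1)
    kv = idx[None, :]  # key/value positions, shape (1, 2n)

    # The first n positions are xt (noised); the last n are x0 (clean).
    q_is_x0 = q >= n
    kv_is_x0 = kv >= n

    # Position and block index within each half.
    q_pos = torch.where(q_is_x0, q - n, q)
    kv_pos = torch.where(kv_is_x0, kv - n, kv)
    q_blk = q_pos // block_size
    kv_blk = kv_pos // block_size

    # M_BD: xt attends to xt within the same block.
    m_bd = ~q_is_x0 & ~kv_is_x0 & (q_blk == kv_blk)
    # M_OBC: xt attends to x0 tokens in strictly earlier blocks (assumed).
    m_obc = ~q_is_x0 & kv_is_x0 & (q_blk > kv_blk)
    # M_FC: x0 attends causally to x0 at token level (assumed).
    m_fc = q_is_x0 & kv_is_x0 & (q_pos >= kv_pos)

    return m_bd | m_obc | m_fc
```

A per-index predicate with the same logic could, in principle, be passed to `torch.nn.attention.flex_attention.create_block_mask` to obtain a sparse `BlockMask`; the dense form here is only meant for inspecting the pattern on small inputs.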