bridge.diffusion.common.dllm

Diffusion language model utilities: masking and block attention masks.

Module Contents

Functions

  • forward_process_simple_masking – Uniform random masking for diffusion LM training.

  • compute_block_mask – Compute the sbd_block_diff attention mask.

API

bridge.diffusion.common.dllm.forward_process_simple_masking(input_ids, mask_token_id, eps=0.001, loss_mask=None, generator=None)

Uniform random masking for diffusion LM training.

For each sequence in the batch, sample a masking ratio t ~ U(eps, 1) and independently mask each token with probability t.

Returns:
  • noisy_batch – input_ids with masked positions replaced by mask_token_id.

  • masked_indices – boolean mask of shape (b, l) marking the masked positions.

  • p_mask – per-token masking probability of shape (b, l).
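A minimal sketch of the masking procedure described above, assuming PyTorch tensors of shape (b, l) and following the common LLaDA-style recipe. The function name simple_masking_sketch is hypothetical, and the sketch ignores loss_mask because its semantics are not documented here; it illustrates the technique and is not the library's implementation.

```python
import torch


def simple_masking_sketch(input_ids, mask_token_id, eps=1e-3, generator=None):
    """Hypothetical sketch: t ~ U(eps, 1) per sequence, mask each token w.p. t.

    Does not handle loss_mask (its semantics are not documented here).
    """
    b, l = input_ids.shape
    # One masking ratio per sequence, mapped from U(0, 1) to U(eps, 1).
    t = torch.rand(b, generator=generator, device=input_ids.device)
    p_mask = ((1 - eps) * t + eps)[:, None].expand(b, l)
    # Mask each token independently with probability p_mask.
    u = torch.rand(b, l, generator=generator, device=input_ids.device)
    masked_indices = u < p_mask
    # Replace masked positions with the mask token; other tokens are unchanged.
    noisy_batch = input_ids.masked_fill(masked_indices, mask_token_id)
    return noisy_batch, masked_indices, p_mask


# Usage: a seeded generator makes the sampled masks reproducible.
gen = torch.Generator().manual_seed(0)
ids = torch.randint(5, 100, (2, 8), generator=gen)
noisy, masked, p = simple_masking_sketch(ids, mask_token_id=0, generator=gen)
```

Sampling t per sequence gives each sequence in the batch its own noise level. In LLaDA-style objectives, p_mask is commonly used to reweight the cross-entropy on masked positions by 1 / p_mask; whether this module's training loop does the same is not stated here.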

bridge.diffusion.common.dllm.compute_block_mask(block_size, max_seq_length)

Compute the sbd_block_diff attention mask.

The semi-block-diffusion mask is composed of three sub-masks over a doubled sequence [xt | x0] of length 2*max_seq_length:

  • Block Diagonal (M_BD): self-attention within noised blocks (xt only)

  • Offset Block-Causal (M_OBC): cross-attention from xt to past x0 blocks

  • Fully Causal (M_FC): fully causal attention within x0

Parameters:
  • block_size – Block size for block-based attention.

  • max_seq_length – Length of one half (xt or x0) of the sequence.

Returns:

BlockMask for use with flex_attention.
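To make the three sub-masks concrete, the sketch below expresses them as a flex_attention-style mask_mod(b, h, q_idx, kv_idx) over the doubled sequence, plus a helper that materializes it as a dense boolean matrix. It assumes that xt occupies positions 0 to max_seq_length - 1 and x0 occupies max_seq_length to 2 * max_seq_length - 1, that "past" x0 blocks in M_OBC means strictly earlier blocks, and that M_FC lets each x0 token attend to itself. make_sbd_mask_mod and dense_sbd_mask are hypothetical names, and the actual compute_block_mask may differ in these details.

```python
import torch


def make_sbd_mask_mod(block_size: int, max_seq_length: int):
    """Hypothetical mask_mod for [xt | x0], built from M_BD, M_OBC and M_FC."""
    n = max_seq_length

    def mask_mod(b, h, q_idx, kv_idx):
        q_is_x0 = q_idx >= n
        kv_is_x0 = kv_idx >= n
        # Block index within each half (xt: 0..n-1, x0: n..2n-1).
        q_block = torch.where(q_is_x0, q_idx - n, q_idx) // block_size
        kv_block = torch.where(kv_is_x0, kv_idx - n, kv_idx) // block_size

        # M_BD: xt attends to xt tokens in the same block.
        block_diagonal = ~q_is_x0 & ~kv_is_x0 & (q_block == kv_block)
        # M_OBC: xt attends to x0 tokens in strictly earlier blocks (assumed).
        offset_block_causal = ~q_is_x0 & kv_is_x0 & (q_block > kv_block)
        # M_FC: x0 attends causally within x0, including itself (assumed).
        fully_causal = q_is_x0 & kv_is_x0 & (kv_idx <= q_idx)

        return block_diagonal | offset_block_causal | fully_causal

    return mask_mod


def dense_sbd_mask(block_size: int, max_seq_length: int) -> torch.Tensor:
    """Evaluate the mask_mod on every (query, key) pair; True means "may attend"."""
    mask_mod = make_sbd_mask_mod(block_size, max_seq_length)
    idx = torch.arange(2 * max_seq_length)
    # Query indices vary down the rows, key/value indices across the columns.
    return mask_mod(0, 0, idx[:, None], idx[None, :])
```

With block_size=2 and max_seq_length=4, the dense mask has 22 allowed entries: 8 from M_BD, 4 from M_OBC and 10 from M_FC. In PyTorch 2.5 or later, a mask_mod of this form could be passed to torch.nn.attention.flex_attention.create_block_mask(mask_mod, B=None, H=None, Q_LEN=2 * max_seq_length, KV_LEN=2 * max_seq_length) to obtain a BlockMask; the dense form is only practical for small sequences.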