bridge.models.qwen_vl.modelling_qwen3_vl.attention#

Module Contents#

Classes#

Qwen3VLSelfAttention

Overrides the SelfAttention class; the only difference is that Qwen3VL uses apply_rotary_pos_emb_absolute instead of apply_rotary_pos_emb.

API#

class bridge.models.qwen_vl.modelling_qwen3_vl.attention.Qwen3VLSelfAttention#

Bases: megatron.core.transformer.attention.SelfAttention

Overrides the SelfAttention class; the only difference is that Qwen3VL uses apply_rotary_pos_emb_absolute instead of apply_rotary_pos_emb.
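The class docstring says the only change from the base SelfAttention is applying rotary embeddings by absolute position. The NumPy sketch below illustrates the standard rotate-half rotary formulation with cos/sin tables built directly from absolute position ids; the function names and the `10000.0` base are illustrative, and the real `apply_rotary_pos_emb_absolute` additionally handles batching, tensor layouts, and the multimodal position details not shown here.

```python
import numpy as np

def rotate_half(x):
    # Split the last dimension in half and rotate: (x1, x2) -> (-x2, x1).
    half = x.shape[-1] // 2
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([-x2, x1], axis=-1)

def apply_rotary_pos_emb_absolute(x, positions, base=10000.0):
    # Build cos/sin tables from the absolute position ids themselves,
    # then apply the rotate-half rotary formulation.
    dim = x.shape[-1]
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))  # [dim/2]
    angles = np.outer(positions, inv_freq)                   # [seq, dim/2]
    cos = np.concatenate([np.cos(angles), np.cos(angles)], axis=-1)
    sin = np.concatenate([np.sin(angles), np.sin(angles)], axis=-1)
    return x * cos + rotate_half(x) * sin
```

Because each feature pair is rotated by an angle (never scaled), the transform leaves vector norms unchanged, and position id 0 is the identity.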

forward(
hidden_states: torch.Tensor,
attention_mask: torch.Tensor,
key_value_states: torch.Tensor | None = None,
inference_context: megatron.core.transformer.attention.BaseInferenceContext | None = None,
rotary_pos_emb: torch.Tensor | tuple[torch.Tensor, torch.Tensor] | None = None,
rotary_pos_cos: torch.Tensor | None = None,
rotary_pos_sin: torch.Tensor | None = None,
attention_bias: torch.Tensor | None = None,
packed_seq_params: megatron.core.transformer.attention.PackedSeqParams | None = None,
sequence_len_offset: int | None = None,
*,
inference_params: megatron.core.transformer.attention.BaseInferenceContext | None = None,
rotary_pos_cos_sin: torch.Tensor | None = None,
) → tuple[torch.Tensor, torch.Tensor]#

Perform a forward pass through the attention module.

Parameters:
  • hidden_states (Tensor) – Hidden states.

  • attention_mask (Tensor) – Attention mask.

  • key_value_states (Optional[Tensor]) – Key/value states (for cross attention).

  • inference_context (Optional[BaseInferenceContext]) – Inference context that manages KV cache.

  • rotary_pos_emb (Optional[Union[Tensor, Tuple[Tensor, Tensor]]]) – Rotary embedding tensor(s).

  • rotary_pos_cos (Optional[Tensor]) – Rotary embedding cosine.

  • rotary_pos_sin (Optional[Tensor]) – Rotary embedding sine.

  • attention_bias (Optional[Tensor]) – Attention bias.

  • packed_seq_params (Optional[PackedSeqParams]) – Parameters used for THD format.

  • sequence_len_offset (Optional[int]) – Sequence length offset used for inference CUDA graphs.

  • inference_params (Optional[BaseInferenceContext]) – Deprecated keyword-only alias for inference_context.

  • rotary_pos_cos_sin (Optional[Tensor]) – Combined rotary cosine/sine tensor, a keyword-only alternative to passing rotary_pos_cos and rotary_pos_sin separately.

Returns:

(Tuple[Tensor, Tensor]) Attention output and bias.