bridge.models.qwen_vl.modelling_qwen3_vl.attention#
Module Contents#
Classes#
| Qwen3VLSelfAttention | Overrides the SelfAttention class; the difference is that Qwen3-VL uses apply_rotary_pos_emb_absolute instead of apply_rotary_pos_emb. |
API#
- class bridge.models.qwen_vl.modelling_qwen3_vl.attention.Qwen3VLSelfAttention#
Bases:
megatron.core.transformer.attention.SelfAttention

Overrides the SelfAttention class; the difference is that Qwen3-VL uses apply_rotary_pos_emb_absolute instead of apply_rotary_pos_emb.
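The docstring pins the only behavioral change to the RoPE call. As rough intuition for what an "absolute" variant can mean (a hypothetical sketch with assumed names and shapes, not this module's implementation), the rotary angle table is indexed by each token's absolute position id rather than by its offset within the current chunk:

```python
import torch

def rotate_half(x: torch.Tensor) -> torch.Tensor:
    # Standard RoPE helper: negate-and-swap the two halves of the head dim.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope_absolute(t: torch.Tensor, freqs: torch.Tensor,
                        position_ids: torch.Tensor) -> torch.Tensor:
    # Hypothetical illustration only. `t` is [seq, batch, heads, dim] in
    # Megatron's layout, `freqs` is a precomputed [max_position, dim] angle
    # table, and `position_ids` is [seq, batch] with absolute positions.
    angles = freqs[position_ids].unsqueeze(2)        # [seq, batch, 1, dim]
    return t * angles.cos() + rotate_half(t) * angles.sin()
```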
- forward(
- hidden_states: torch.Tensor,
- attention_mask: torch.Tensor,
- key_value_states: torch.Tensor | None = None,
- inference_context: megatron.core.transformer.attention.BaseInferenceContext | None = None,
- rotary_pos_emb: torch.Tensor | tuple[torch.Tensor, torch.Tensor] | None = None,
- rotary_pos_cos: torch.Tensor | None = None,
- rotary_pos_sin: torch.Tensor | None = None,
- attention_bias: torch.Tensor | None = None,
- packed_seq_params: megatron.core.transformer.attention.PackedSeqParams | None = None,
- sequence_len_offset: int | None = None,
- *,
- inference_params: megatron.core.transformer.attention.BaseInferenceContext | None = None,
- rotary_pos_cos_sin: torch.Tensor | None = None,
- )#
Perform a forward pass through the attention module.
- Parameters:
hidden_states (Tensor) – Hidden states.
attention_mask (Tensor) – Attention mask.
key_value_states (Optional[Tensor]) – Key/value states (for cross attention).
inference_context (Optional[BaseInferenceContext]) – Inference context that manages KV cache.
rotary_pos_emb (Optional[Union[Tensor, Tuple[Tensor, Tensor]]]) – Rotary embedding tensor(s).
rotary_pos_cos (Optional[Tensor]) – Rotary embedding cosine.
rotary_pos_sin (Optional[Tensor]) – Rotary embedding sine.
attention_bias (Optional[Tensor]) – Attention bias.
packed_seq_params (Optional[PackedSeqParams]) – Parameters used for THD format.
sequence_len_offset (Optional[int]) – Sequence length offset used for inference CUDA graphs.
- Returns:
(Tuple[Tensor, Tensor]) Attention output and bias.
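For orientation, a hedged usage sketch of the documented forward contract. Building the layer itself (TransformerConfig, submodules, layer spec) is elided, and passing attention_mask=None is an assumption that the attention backend supplies causal masking:

```python
import torch
from bridge.models.qwen_vl.modelling_qwen3_vl.attention import Qwen3VLSelfAttention

def run_self_attention(attn: Qwen3VLSelfAttention,
                       hidden_states: torch.Tensor,
                       rotary_pos_emb: torch.Tensor) -> torch.Tensor:
    # hidden_states follows Megatron's [sequence, batch, hidden] layout.
    output, bias = attn(
        hidden_states=hidden_states,
        attention_mask=None,  # assumption: causal mask handled by the backend
        rotary_pos_emb=rotary_pos_emb,
    )
    # Per the Returns section, forward yields (output, bias); the bias may be
    # None when the output projection carries no separate bias term.
    return output if bias is None else output + bias
```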