bridge.models.qwen_vl.modelling_qwen3_vl.utils

Module Contents

Functions

split_deepstack_embs

Split deepstack visual embeddings for tensor and context parallelism.

get_rope_index

Unlike the original implementation, Qwen3VL uses timestamps rather than absolute time position IDs.

API

bridge.models.qwen_vl.modelling_qwen3_vl.utils.split_deepstack_embs(
visual_pos_masks: torch.Tensor,
deepstack_visual_embeds: list[torch.Tensor],
tp_size: int = 1,
tp_rank: int = 0,
cp_size: int = 1,
cp_rank: int = 0,
)

Split deepstack visual embeddings for tensor and context parallelism.

Parameters:
  • visual_pos_masks – Visual position masks tensor

  • deepstack_visual_embeds – List of deepstack visual embeddings

  • tp_size – Tensor parallel size (default: 1)

  • tp_rank – Tensor parallel rank (default: 0)

  • cp_size – Context parallel size (default: 1)

  • cp_rank – Context parallel rank (default: 0)

Returns:

Split visual embeddings based on parallelism configuration
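
To make the idea of splitting concrete, the following is a minimal conceptual sketch, not the library's implementation. It assumes that the visual embeddings are packed in order, one row per `True` entry of the position mask, and that each rank owns one contiguous, equally sized chunk of the sequence; the actual function may partition the sequence differently (for example, with a load-balanced layout for context parallelism). The helper name `split_visual_embeds_for_rank` and its `num_ranks`/`rank` parameters are hypothetical, and plain Python lists stand in for tensors.

```python
def split_visual_embeds_for_rank(visual_mask, visual_embeds, num_ranks, rank):
    """Hypothetical helper: return the mask slice and embedding rows owned by `rank`.

    visual_mask:   list[bool], one entry per sequence position (True = visual token).
    visual_embeds: list of rows, one per True entry in visual_mask, in order.
    Assumption: each rank owns one contiguous chunk of the sequence.
    """
    seq_len = len(visual_mask)
    if seq_len % num_ranks != 0:
        raise ValueError("sequence length must be divisible by num_ranks")
    chunk = seq_len // num_ranks
    start, end = rank * chunk, (rank + 1) * chunk

    # Visual tokens that precede this chunk give the offset into the packed rows.
    offset = sum(visual_mask[:start])
    # Visual tokens inside this chunk give the number of rows this rank keeps.
    count = sum(visual_mask[start:end])
    return visual_mask[start:end], visual_embeds[offset:offset + count]


mask = [False, True, True, False, True, False, False, True]
embeds = [[0.1], [0.2], [0.3], [0.4]]  # one row per True entry in `mask`

rank0_mask, rank0_embeds = split_visual_embeds_for_rank(mask, embeds, num_ranks=2, rank=0)
rank1_mask, rank1_embeds = split_visual_embeds_for_rank(mask, embeds, num_ranks=2, rank=1)
```

In this sketch each rank ends up with a local mask and exactly the embedding rows that fall inside its chunk, so the local rows can still be scattered into the local hidden states by mask position.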

bridge.models.qwen_vl.modelling_qwen3_vl.utils.get_rope_index(
spatial_merge_size: int,
image_token_id: int,
video_token_id: int,
vision_start_token_id: int,
input_ids: Optional[torch.LongTensor] = None,
image_grid_thw: Optional[torch.LongTensor] = None,
video_grid_thw: Optional[torch.LongTensor] = None,
attention_mask: Optional[torch.Tensor] = None,
) → tuple[torch.Tensor, torch.Tensor]

Unlike the original implementation, Qwen3VL uses timestamps rather than absolute time position IDs.
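
As a rough illustration of what three-axis (temporal, height, width) position IDs for a block of vision tokens can look like, here is a minimal sketch under stated assumptions; it is not the function's actual code. It assumes each grid is given as `(t, h, w)` patch counts, that `spatial_merge_size` divides `h` and `w`, and that IDs are offset by the position where the vision block starts. One plausible reading of the timestamp scheme is that every video frame is handled as its own grid with `t = 1`, so the temporal index is constant within a frame; that reading is an assumption here. The helper name `vision_position_ids` and its `start` parameter are hypothetical, and plain Python lists stand in for tensors.

```python
def vision_position_ids(grid_t, grid_h, grid_w, spatial_merge_size, start=0):
    """Hypothetical helper: per-token (t, h, w) position IDs for one vision grid.

    The spatial grid is reduced by spatial_merge_size along height and width,
    and every index is shifted by `start`, the position of the block's first token.
    """
    merged_h = grid_h // spatial_merge_size
    merged_w = grid_w // spatial_merge_size

    t_ids, h_ids, w_ids = [], [], []
    for t in range(grid_t):
        for y in range(merged_h):
            for x in range(merged_w):
                t_ids.append(start + t)
                h_ids.append(start + y)
                w_ids.append(start + x)
    return t_ids, h_ids, w_ids


# A single frame (t = 1) of 4 x 4 patches, merged 2 x 2, starting at position 5.
t_ids, h_ids, w_ids = vision_position_ids(1, 4, 4, spatial_merge_size=2, start=5)
```

With `t = 1` the temporal axis carries no per-frame offset inside the block, which is consistent with ordering between frames being conveyed some other way, such as timestamps, instead of absolute temporal position IDs.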