bridge.models.qwen_vl.modelling_qwen3_vl.utils
Module Contents
Functions

| Function | Description |
|---|---|
| split_deepstack_embs | Split deepstack visual embeddings for tensor and context parallelism. |
| get_rope_index | Different from the original implementation, Qwen3VL uses timestamps rather than absolute time position ids. |
API
- bridge.models.qwen_vl.modelling_qwen3_vl.utils.split_deepstack_embs(
- visual_pos_masks: torch.Tensor,
- deepstack_visual_embeds: list[torch.Tensor],
- tp_size: int = 1,
- tp_rank: int = 0,
- cp_size: int = 1,
- cp_rank: int = 0,
)
Split deepstack visual embeddings for tensor and context parallelism.
- Parameters:
visual_pos_masks – Visual position masks tensor
deepstack_visual_embeds – List of deepstack visual embeddings
tp_size – Tensor parallel size (default: 1)
tp_rank – Tensor parallel rank (default: 0)
cp_size – Context parallel size (default: 1)
cp_rank – Context parallel rank (default: 0)
- Returns:
Split visual embeddings based on parallelism configuration
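A minimal usage sketch under stated assumptions: the tensor shapes, hidden size, and the single-value return binding below are illustrative only; the exact return structure (for example, whether the mask is returned alongside the embeddings) should be verified against the implementation.

```python
import torch

from bridge.models.qwen_vl.modelling_qwen3_vl.utils import split_deepstack_embs

# Illustrative shapes only: a sequence of 8 tokens, 4 of which are visual,
# and two deepstack layers of visual embeddings with hidden size 16.
visual_pos_masks = torch.tensor([False, True, True, False, True, True, False, False])
deepstack_visual_embeds = [torch.randn(4, 16) for _ in range(2)]

# Request the slice belonging to tensor-parallel rank 0 of 2, with no context
# parallelism. Per the docstring, the call returns the visual embeddings split
# for this rank; check the implementation for the exact return structure.
local_embeds = split_deepstack_embs(
    visual_pos_masks,
    deepstack_visual_embeds,
    tp_size=2,
    tp_rank=0,
    cp_size=1,
    cp_rank=0,
)
```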
- bridge.models.qwen_vl.modelling_qwen3_vl.utils.get_rope_index(
- spatial_merge_size: int,
- image_token_id: int,
- video_token_id: int,
- vision_start_token_id: int,
- input_ids: Optional[torch.LongTensor] = None,
- image_grid_thw: Optional[torch.LongTensor] = None,
- video_grid_thw: Optional[torch.LongTensor] = None,
- attention_mask: Optional[torch.Tensor] = None,
)
Different from the original implementation, Qwen3VL uses timestamps rather than absolute time position ids.
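A hedged usage sketch: the token ids, grid shape, and the assumption that the function follows the upstream Qwen-VL convention of returning position ids together with rope deltas are illustrative, not guaranteed by this module.

```python
import torch

from bridge.models.qwen_vl.modelling_qwen3_vl.utils import get_rope_index

# Placeholder token ids: <bos>, vision_start, four image tokens, <eos>.
# The id values below are made up and must be replaced with the tokenizer's
# real special-token ids.
VISION_START_ID, IMAGE_ID, VIDEO_ID = 151652, 151655, 151656
input_ids = torch.tensor(
    [[1, VISION_START_ID, IMAGE_ID, IMAGE_ID, IMAGE_ID, IMAGE_ID, 2]]
)
attention_mask = torch.ones_like(input_ids)

# One image with a (t=1, h=4, w=4) patch grid; with spatial_merge_size=2 this
# merges to 4 visual tokens, matching the four image tokens above.
image_grid_thw = torch.tensor([[1, 4, 4]])

# Assumed to return (position_ids, rope_deltas) as in the upstream Qwen-VL
# code; verify the exact return values against the implementation.
position_ids, rope_deltas = get_rope_index(
    spatial_merge_size=2,
    image_token_id=IMAGE_ID,
    video_token_id=VIDEO_ID,
    vision_start_token_id=VISION_START_ID,
    input_ids=input_ids,
    image_grid_thw=image_grid_thw,
    attention_mask=attention_mask,
)
```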