bridge.models.qwen_vl.modeling_qwen25_vl#
Module Contents#
Classes#
Qwen25VLModel – Qwen2.5 VL Model. (Based on GPT Transformer language model.)
Functions#
is_transformers_min_version – Check if a minimum version of transformers is installed.
API#
- bridge.models.qwen_vl.modeling_qwen25_vl.is_transformers_min_version(version)#
Check if a minimum version of transformers is installed.
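The module's actual implementation is not shown here; a minimal sketch of how such a check is commonly written, assuming `packaging` and `importlib.metadata` are available (the version string `"4.45.0"` below is purely illustrative):

```python
from importlib.metadata import version as installed_version

from packaging.version import Version


def is_transformers_min_version(version: str) -> bool:
    # Compare the installed transformers release against the requested minimum.
    return Version(installed_version("transformers")) >= Version(version)


# Usage: gate features that rely on a newer transformers API.
if is_transformers_min_version("4.45.0"):
    pass
```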
- class bridge.models.qwen_vl.modeling_qwen25_vl.Qwen25VLModel(
- config: megatron.bridge.models.gpt_provider.GPTModelProvider,
- pre_process: bool = True,
- post_process: bool = True,
- vp_stage: Optional[int] = None,
)#
Bases:
megatron.core.transformer.module.MegatronModule
Qwen2.5 VL Model. (Based on GPT Transformer language model.)
- Parameters:
config (GPTModelProvider) – Language model provider.
transformer_layer_spec (ModuleSpec) – Specifies the module to use for transformer layers.
vocab_size (int) – Vocabulary size.
max_sequence_length (int) – Maximum sequence length. This is used for positional embedding.
pre_process (bool, optional) – Include the embedding layer (used with pipeline parallelism). Defaults to True.
post_process (bool, optional) – Include an output layer (used with pipeline parallelism). Defaults to True.
fp16_lm_cross_entropy (bool, optional) – Move the unreduced cross-entropy loss calculation to fp16. Defaults to False.
parallel_output (bool, optional) – Do not gather the outputs; keep them split across tensor parallel ranks. Defaults to True.
share_embeddings_and_output_weights (bool, optional) – When True, input embeddings and output logit weights are shared. Defaults to False.
position_embedding_type (Literal['learned_absolute', 'rope'], optional) – Position embedding type. Defaults to 'learned_absolute'.
rotary_percent (float, optional) – Percent of the rotary dimension to use for rotary position embeddings. Ignored unless position_embedding_type is 'rope'. Defaults to 1.0.
rotary_base (int, optional) – Base period for rotary position embeddings. Ignored unless position_embedding_type is 'rope'. Defaults to 10000.
rope_scaling (bool, optional) – Toggle RoPE scaling.
rope_scaling_factor (float) – RoPE scaling factor. Defaults to 8.
scatter_embedding_sequence_parallel (bool, optional) – Whether embeddings should be scattered across the sequence parallel region. Defaults to True.
seq_len_interpolation_factor (Optional[float], optional) – Scale for linearly interpolating RoPE to longer sequences. The value must be a float larger than 1.0. Defaults to None.
pg_collection (ProcessGroupCollection) – Model communication process groups.
Initialization
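A minimal construction sketch based on the signature above. The GPTModelProvider fields shown are assumptions drawn from typical Megatron transformer configs (Qwen2.5-VL-7B-like sizes), not a verified recipe:

```python
from megatron.bridge.models.gpt_provider import GPTModelProvider

from bridge.models.qwen_vl.modeling_qwen25_vl import Qwen25VLModel

# Hypothetical provider configuration; field names and values follow common
# Megatron transformer configs and are illustrative only.
config = GPTModelProvider(
    num_layers=36,
    hidden_size=3584,
    num_attention_heads=28,
)

# pre_process/post_process control whether this pipeline stage owns the
# embedding layer and the output layer, per the parameter list above.
model = Qwen25VLModel(config=config, pre_process=True, post_process=True)
```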
- set_input_tensor(input_tensor) → None#
Set the input tensor for this model chunk; used by pipeline parallelism to pass activations from the previous stage in place of the embedding output.
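A hedged sketch of the pipeline-parallel plumbing, reusing the hypothetical `model` from the construction example; the (seq, batch, hidden) layout is the usual Megatron activation shape, assumed here rather than taken from this module's docs:

```python
import torch

# Hypothetical shapes for one micro-batch on an intermediate pipeline stage.
seq_len, micro_batch, hidden_size = 128, 1, 3584

# Placeholder for activations received from the previous stage; in a real run
# this buffer is filled by Megatron's point-to-point pipeline communication.
received = torch.zeros(seq_len, micro_batch, hidden_size)
model.set_input_tensor(received)
```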
- forward(
- input_ids: torch.LongTensor = None,
- attention_mask: Optional[torch.Tensor] = None,
- position_ids: Optional[torch.LongTensor] = None,
- inputs_embeds: Optional[torch.FloatTensor] = None,
- pixel_values: Optional[torch.Tensor] = None,
- pixel_values_videos: Optional[torch.FloatTensor] = None,
- image_grid_thw: Optional[torch.LongTensor] = None,
- video_grid_thw: Optional[torch.LongTensor] = None,
- second_per_grid_ts: Optional[torch.Tensor] = None,
- labels: torch.Tensor = None,
- inference_context: megatron.core.inference.contexts.BaseInferenceContext = None,
- packed_seq_params: megatron.core.packed_seq_params.PackedSeqParams = None,
- extra_block_kwargs: dict = None,
- runtime_gather_output: Optional[bool] = None,
- *,
- inference_params: Optional[megatron.core.inference.contexts.BaseInferenceContext] = None,
- loss_mask: Optional[torch.Tensor] = None,
)#
Forward pass of the Qwen2.5 VL model.
- Parameters:
image_grid_thw (torch.LongTensor of shape (num_images, 3), optional) – The temporal, height, and width of the feature shape of each image in the LLM.
video_grid_thw (torch.LongTensor of shape (num_videos, 3), optional) – The temporal, height, and width of the feature shape of each video in the LLM.
second_per_grid_ts (torch.Tensor of shape (num_videos), optional) – The time interval (in seconds) for each grid along the temporal dimension in the 3D position IDs.
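An illustrative call for a single image, reusing the hypothetical `model` from the construction example. The flattened-patch layout of `pixel_values` and the vocabulary bound are assumptions based on the Qwen2.5-VL processor convention, and the return-value note is the usual GPT-style Megatron behavior rather than something stated in this page:

```python
import torch

# Hypothetical pre-processed batch; shapes follow the parameter notes above.
input_ids = torch.randint(0, 151_000, (1, 128), dtype=torch.long)
pixel_values = torch.randn(256, 1176)         # assumed flattened-patch layout
image_grid_thw = torch.tensor([[1, 16, 16]])  # (num_images, 3): temporal, height, width

# With labels=None, GPT-style Megatron models typically return logits;
# passing labels instead typically yields the loss. Position IDs for the
# 3D RoPE are typically derived internally from the grid metadata.
output = model(
    input_ids=input_ids,
    attention_mask=None,
    pixel_values=pixel_values,
    image_grid_thw=image_grid_thw,
)
```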
- freeze(
- freeze_language_model: bool,
- freeze_vision_model: bool,
- freeze_vision_projection: bool,
)#
Freeze model modules.
Make specific modules non-trainable by setting requires_grad to False.
- Parameters:
freeze_language_model (bool) – Freeze the language model module.
freeze_vision_model (bool) – Freeze the vision model module (patch_embed and blocks).
freeze_vision_projection (bool) – Freeze the vision projection module (merger).
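Typical usage, following the signature above, for fine-tuning only the language model while keeping the vision tower fixed:

```python
# Freeze the vision model (patch_embed and blocks) and the vision
# projection (merger); leave the language model trainable.
model.freeze(
    freeze_language_model=False,
    freeze_vision_model=True,
    freeze_vision_projection=True,
)
```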