bridge.models.qwen_vl.qwen3_vl_step
Module Contents
Functions
| Function | Summary |
| --- | --- |
| get_batch_from_iterator | Get a batch of data from the iterator. |
| get_batch | Generate a batch. |
| pack_or_pad_batch_sequences | Pad or truncate the batch sequences to the target length, and build packed sequences. If is_qwen3vl, return bshd tokens to be compatible with the qwen3vl model; otherwise, return thd tokens and packed sequences. |
| forward_step | Forward training step. |
Data
API
- bridge.models.qwen_vl.qwen3_vl_step.logger
getLogger(...)
- bridge.models.qwen_vl.qwen3_vl_step.get_batch_from_iterator(
- data_iterator: Iterable,
- use_mtp: bool = False,
- skip_getting_attention_mask_from_dataset: bool = True,
- *,
- is_first_pp_stage: bool,
- is_last_pp_stage: bool,
- )
Get a batch of data from the iterator.
- Parameters:
data_iterator – The data iterator to get the batch from.
use_mtp – Whether Multi-Token Prediction layers are enabled.
skip_getting_attention_mask_from_dataset – If set, the dataset will pass a None attention mask.
is_first_pp_stage – Whether the current stage is the first pipeline stage.
is_last_pp_stage – Whether the current stage is the last pipeline stage.
- Returns:
A dictionary containing the batch data.
- Return type:
dict[str, torch.Tensor]
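As a rough illustration of the pipeline-stage filtering described above, here is a minimal pure-Python sketch. This is not the library implementation: the key names ("tokens", "labels", "loss_mask") and the drop-keys-per-stage behavior are assumptions about what first/last-stage batches typically need.

```python
# Minimal sketch, NOT the Megatron Bridge implementation.
# Assumes the iterator yields dict batches; key names are illustrative.

def get_batch_from_iterator(data_iterator, *, is_first_pp_stage, is_last_pp_stage):
    batch = dict(next(data_iterator))
    if not is_first_pp_stage:
        batch.pop("tokens", None)     # inputs are only consumed by the first stage
    if not is_last_pp_stage:
        batch.pop("labels", None)     # loss targets are only needed on the last stage
        batch.pop("loss_mask", None)
    return batch

it = iter([{"tokens": [1, 2, 3], "labels": [2, 3, 4], "loss_mask": [1, 1, 1]}])
batch = get_batch_from_iterator(it, is_first_pp_stage=True, is_last_pp_stage=False)
# batch keeps "tokens" but drops "labels" and "loss_mask"
```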
- bridge.models.qwen_vl.qwen3_vl_step.get_batch(
- data_iterator: Iterable,
- cfg: megatron.bridge.training.config.ConfigContainer,
- use_mtp: bool = False,
- *,
- is_first_pp_stage: bool,
- is_last_pp_stage: bool,
- )
Generate a batch.
- Parameters:
data_iterator – Input data iterator
cfg – Configuration container
use_mtp – Whether Multi-Token Prediction layers are enabled
is_first_pp_stage – Whether the current stage is the first pipeline stage
is_last_pp_stage – Whether the current stage is the last pipeline stage
- Returns:
The batch data for the current pipeline stage.
- Return type:
dict[str, torch.Tensor]
- bridge.models.qwen_vl.qwen3_vl_step.pack_or_pad_batch_sequences(
- tokens: torch.Tensor,
- labels: torch.Tensor,
- loss_mask: torch.Tensor,
- attention_mask: torch.Tensor,
- position_ids: torch.Tensor,
- this_pg_collection,
- use_fp8_padding: bool = False,
- force_to_pad_to_seq_len: bool = False,
- seq_length: int = None,
- )
Pad or truncate the batch sequences to the target length, and build packed sequences. If is_qwen3vl, return bshd tokens to be compatible with the qwen3vl model; otherwise, return thd tokens and packed sequences.
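The pad-or-truncate step can be sketched in plain Python. This is an illustration only: the real function operates on torch tensors and also builds packed-sequence metadata, and the multiple-of-16 alignment below is an assumption about what use_fp8_padding implies (FP8 kernels commonly require sequence lengths divisible by 16).

```python
def round_up(n: int, multiple: int) -> int:
    """Round n up to the next multiple (assumed FP8 alignment: 16)."""
    return ((n + multiple - 1) // multiple) * multiple

def pad_or_truncate(seq, target_len, pad_id=0, use_fp8_padding=False):
    if use_fp8_padding:
        target_len = round_up(target_len, 16)
    if len(seq) >= target_len:
        return seq[:target_len]                      # truncate to the target length
    return seq + [pad_id] * (target_len - len(seq))  # right-pad with pad_id

padded = pad_or_truncate([5, 6, 7], target_len=6)        # -> [5, 6, 7, 0, 0, 0]
truncated = pad_or_truncate([5, 6, 7, 8], target_len=2)  # -> [5, 6]
```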
- bridge.models.qwen_vl.qwen3_vl_step.forward_step(
- state: megatron.bridge.training.state.GlobalState,
- data_iterator: Iterable,
- model: megatron.core.models.gpt.GPTModel,
- return_schedule_plan: bool = False,
- )
Forward training step.
- Parameters:
state – Global state for the run
data_iterator – Input data iterator
model – The GPT Model
return_schedule_plan (bool) – Whether to return the schedule plan instead of the output tensor
- Returns:
tuple containing the output tensor and the loss function
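Returning the output together with a loss *function* (rather than a computed loss) lets the pipeline schedule apply the loss only on the last stage. A toy sketch of that convention, with plain Python lists and an illustrative stand-in model and loss:

```python
from functools import partial

def loss_func(loss_mask, output):
    # Illustrative loss: average the per-token values over unmasked positions.
    masked = [o * m for o, m in zip(output, loss_mask)]
    return sum(masked) / max(sum(loss_mask), 1)

def forward_step(batch, model):
    output = model(batch["tokens"])
    # Hand back a callable that still needs the output tensor; the
    # pipeline schedule invokes it on the last stage.
    return output, partial(loss_func, batch["loss_mask"])

toy_model = lambda tokens: [float(t) for t in tokens]  # stand-in for the GPT model
out, loss_fn = forward_step({"tokens": [1, 2, 3], "loss_mask": [1, 0, 1]}, toy_model)
loss = loss_fn(out)  # (1.0 + 3.0) / 2 == 2.0
```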