bridge.models.qwen_vl.qwen3_vl_step#

Module Contents#

Functions#

get_batch_from_iterator

Get a batch of data from the iterator.

get_batch

Generate a batch.

pack_or_pad_batch_sequences

Pad or truncate the batch sequences to the target length, and build packed sequences. For Qwen3-VL models, return BSHD-layout tokens for compatibility with the Qwen3-VL model; otherwise, return THD-layout tokens together with packed-sequence parameters.

forward_step

Forward training step.

Data#

API#

bridge.models.qwen_vl.qwen3_vl_step.logger#

‘getLogger(…)’

bridge.models.qwen_vl.qwen3_vl_step.get_batch_from_iterator(
data_iterator: Iterable,
use_mtp: bool = False,
skip_getting_attention_mask_from_dataset: bool = True,
*,
is_first_pp_stage: bool,
is_last_pp_stage: bool,
) → dict[str, Any]#

Get a batch of data from the iterator.

Parameters:
  • data_iterator – The data iterator to get the batch from.

  • use_mtp – Whether Multi-Token Prediction layers are enabled.

  • skip_getting_attention_mask_from_dataset – If set, the dataset will pass a None attention mask.

Returns:

A dictionary containing the batch data.

Return type:

dict[str, torch.Tensor]
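The batch-fetching behavior can be sketched as follows. This is a simplified, hypothetical illustration (the name `get_batch_sketch` and the plain-dict batch are invented for this example, not part of the module's API):

```python
from typing import Any, Iterator


def get_batch_sketch(
    data_iterator: Iterator[dict[str, Any]],
    skip_getting_attention_mask_from_dataset: bool = True,
) -> dict[str, Any]:
    # Pull the next sample from the data iterator.
    batch = next(data_iterator)
    # When attention-mask construction is skipped, the batch carries None
    # and downstream code is expected to build a causal mask itself.
    if skip_getting_attention_mask_from_dataset:
        batch["attention_mask"] = None
    return batch


batch = get_batch_sketch(
    iter([{"tokens": [1, 2, 3], "attention_mask": [[1, 1, 1]]}])
)
```

In the real function the returned values are `torch.Tensor` objects rather than Python lists, and additional keys (e.g. labels and loss mask) are present.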

bridge.models.qwen_vl.qwen3_vl_step.get_batch(
data_iterator: Iterable,
cfg: megatron.bridge.training.config.ConfigContainer,
use_mtp: bool = False,
*,
is_first_pp_stage: bool,
is_last_pp_stage: bool,
) → tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, Any]#

Generate a batch.

Parameters:
  • data_iterator – Input data iterator

  • cfg – Configuration container

  • use_mtp – Whether Multi-Token Prediction layers are enabled

  • is_first_pp_stage – Whether the current stage is the first stage

  • is_last_pp_stage – Whether the current stage is the last stage

Returns:

A tuple containing the batch tensors (tokens, labels, loss mask, attention mask, position IDs) and any additional per-batch data.

Return type:

tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, Any]
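The `is_first_pp_stage` / `is_last_pp_stage` flags follow a common pipeline-parallel pattern: only the first stage consumes input tokens and only the last stage needs the training targets. A hypothetical sketch of that trimming step (the helper `trim_batch_for_pp_stage` is invented for illustration):

```python
from typing import Any


def trim_batch_for_pp_stage(
    batch: dict[str, Any],
    *,
    is_first_pp_stage: bool,
    is_last_pp_stage: bool,
) -> dict[str, Any]:
    # Intermediate pipeline stages receive activations from the previous
    # stage rather than raw data, so drop what they do not need.
    if not is_first_pp_stage:
        batch["tokens"] = None
        batch["position_ids"] = None
    if not is_last_pp_stage:
        batch["labels"] = None
        batch["loss_mask"] = None
    return batch


b = trim_batch_for_pp_stage(
    {"tokens": [1], "position_ids": [0], "labels": [2], "loss_mask": [1]},
    is_first_pp_stage=True,
    is_last_pp_stage=False,
)
```

Here the first (but not last) stage keeps its inputs while the targets are cleared, since only the final stage computes the loss.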

bridge.models.qwen_vl.qwen3_vl_step.pack_or_pad_batch_sequences(
tokens: torch.Tensor,
labels: torch.Tensor,
loss_mask: torch.Tensor,
attention_mask: torch.Tensor,
position_ids: torch.Tensor,
this_pg_collection,
use_fp8_padding: bool = False,
force_to_pad_to_seq_len: bool = False,
seq_length: int = None,
) → tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, megatron.core.packed_seq_params.PackedSeqParams]#

Pad or truncate the batch sequences to the target length, and build packed sequences. For Qwen3-VL models, return BSHD-layout tokens for compatibility with the Qwen3-VL model; otherwise, return THD-layout tokens together with packed-sequence parameters.
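The pad-or-truncate core of this function can be sketched on plain Python lists (the real implementation operates on `torch.Tensor` batches). The `round_up` helper illustrates what an option like `use_fp8_padding` typically does; the multiple of 16 is an assumption here, not a documented value:

```python
def pad_or_truncate(seq: list[int], target_len: int, pad_id: int = 0) -> list[int]:
    # Clip sequences longer than the target; right-pad shorter ones.
    if len(seq) >= target_len:
        return seq[:target_len]
    return seq + [pad_id] * (target_len - len(seq))


def round_up(n: int, multiple: int) -> int:
    # FP8 kernels commonly require lengths divisible by a fixed multiple
    # (assumed to be 16 in this sketch), so the target is rounded up first.
    return ((n + multiple - 1) // multiple) * multiple


padded = pad_or_truncate([5, 6], 4)            # shorter: right-padded
clipped = pad_or_truncate([5, 6, 7, 8, 9], 4)  # longer: truncated
fp8_len = round_up(10, 16)                     # padded target length
```

Labels, loss mask, and position IDs are padded or truncated to the same target length so the batch stays aligned.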

bridge.models.qwen_vl.qwen3_vl_step.forward_step(
state: megatron.bridge.training.state.GlobalState,
data_iterator: Iterable,
model: megatron.core.models.gpt.GPTModel,
return_schedule_plan: bool = False,
) → tuple[torch.Tensor, functools.partial]#

Forward training step.

Parameters:
  • state – Global state for the run

  • data_iterator – Input data iterator

  • model – The GPT Model

  • return_schedule_plan (bool) – Whether to return the schedule plan instead of the output tensor

Returns:

A tuple containing the output tensor and a partially applied loss function.
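The return shape, a model output paired with a `functools.partial` loss function, follows the standard Megatron forward-step pattern: batch-dependent arguments (such as the loss mask) are bound now, and the trainer invokes the partial on the output later. A minimal hypothetical sketch (`forward_step_sketch` and `loss_func` are invented names, and the toy model stands in for the real GPT model):

```python
import functools
from typing import Any, Callable


def loss_func(loss_mask: list[int], output: list[float]) -> float:
    # Masked mean: average the per-token loss over unmasked positions only.
    kept = [o for o, m in zip(output, loss_mask) if m]
    return sum(kept) / max(len(kept), 1)


def forward_step_sketch(
    batch: dict[str, Any],
    model: Callable[[list[int]], list[float]],
) -> tuple[list[float], functools.partial]:
    output = model(batch["tokens"])
    # Bind the loss mask now; the training loop calls the partial
    # with the output tensor once it is available.
    return output, functools.partial(loss_func, batch["loss_mask"])


toy_model = lambda tokens: [float(t) for t in tokens]
out, loss_fn = forward_step_sketch(
    {"tokens": [1, 2, 3], "loss_mask": [1, 0, 1]}, toy_model
)
```

Calling `loss_fn(out)` then averages over the two unmasked positions.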