bridge.models.qwen_vl.qwen3_vl_step#

Module Contents#

Functions#

get_batch_from_iterator

Get a batch of data from the iterator.

get_batch

Generate a batch.

pack_or_pad_batch_sequences

Pad or truncate the batch sequences to the target length, and build packed sequences. For Qwen3-VL models, return BSHD-layout tokens for compatibility with the Qwen3-VL model; otherwise, return THD-layout tokens together with packed-sequence parameters.

forward_step

Forward training step.

Data#

API#

bridge.models.qwen_vl.qwen3_vl_step.logger#

‘getLogger(…)’

bridge.models.qwen_vl.qwen3_vl_step.get_batch_from_iterator(
data_iterator: Iterable,
use_mtp: bool = False,
skip_getting_attention_mask_from_dataset: bool = True,
*,
is_first_pp_stage: bool,
is_last_pp_stage: bool,
) → dict[str, Any]#

Get a batch of data from the iterator.

Parameters:
  • data_iterator – The data iterator to get the batch from.

  • use_mtp – Whether Multi-Token Prediction layers are enabled.

  • skip_getting_attention_mask_from_dataset – If set, the dataset will pass a None attention mask.

Returns:

A dictionary containing the batch data.

Return type:

dict[str, torch.Tensor]
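The batch-fetching behavior can be sketched as follows. This is a simplified, hypothetical illustration (the name `get_batch_sketch` and the plain-dict batch are invented for this example, not part of the module's API):

```python
from typing import Any, Iterator


def get_batch_sketch(
    data_iterator: Iterator[dict[str, Any]],
    skip_getting_attention_mask_from_dataset: bool = True,
) -> dict[str, Any]:
    # Pull the next sample from the data iterator.
    batch = next(data_iterator)
    # When attention-mask construction is skipped, the batch carries None
    # and downstream code is expected to build a causal mask itself.
    if skip_getting_attention_mask_from_dataset:
        batch["attention_mask"] = None
    return batch


batch = get_batch_sketch(
    iter([{"tokens": [1, 2, 3], "attention_mask": [[1, 1, 1]]}])
)
```

In the real function the returned values are `torch.Tensor` objects rather than Python lists, and additional keys (e.g. labels and loss mask) are present.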

bridge.models.qwen_vl.qwen3_vl_step.get_batch(
data_iterator: Iterable,
cfg: megatron.bridge.training.config.ConfigContainer,
use_mtp: bool = False,
*,
is_first_pp_stage: bool,
is_last_pp_stage: bool,
) → tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, Any]#

Generate a batch.

Parameters:
  • data_iterator – Input data iterator

  • cfg – Configuration container

  • use_mtp – Whether Multi-Token Prediction layers are enabled

  • is_first_pp_stage – Whether the current stage is the first stage

  • is_last_pp_stage – Whether the current stage is the last stage

Returns:

A tuple containing the batch tensors (tokens, labels, loss mask, attention mask, position IDs) and any additional per-batch data.

Return type:

tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, Any]
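The `is_first_pp_stage` / `is_last_pp_stage` flags follow a common pipeline-parallel pattern: only the first stage consumes input tokens and only the last stage needs the training targets. A hypothetical sketch of that trimming step (the helper `trim_batch_for_pp_stage` is invented for illustration):

```python
from typing import Any


def trim_batch_for_pp_stage(
    batch: dict[str, Any],
    *,
    is_first_pp_stage: bool,
    is_last_pp_stage: bool,
) -> dict[str, Any]:
    # Intermediate pipeline stages receive activations from the previous
    # stage rather than raw data, so drop what they do not need.
    if not is_first_pp_stage:
        batch["tokens"] = None
        batch["position_ids"] = None
    if not is_last_pp_stage:
        batch["labels"] = None
        batch["loss_mask"] = None
    return batch


b = trim_batch_for_pp_stage(
    {"tokens": [1], "position_ids": [0], "labels": [2], "loss_mask": [1]},
    is_first_pp_stage=True,
    is_last_pp_stage=False,
)
```

Here the first (but not last) stage keeps its inputs while the targets are cleared, since only the final stage computes the loss.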

bridge.models.qwen_vl.qwen3_vl_step.pack_or_pad_batch_sequences(
tokens: torch.Tensor,
labels: torch.Tensor,
loss_mask: torch.Tensor,
attention_mask: torch.Tensor,
position_ids: torch.Tensor,
this_pg_collection,
use_fp8_padding: bool = False,
force_to_pad_to_seq_len: bool = False,
seq_length: int = None,
) → tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, megatron.core.packed_seq_params.PackedSeqParams]#

Pad or truncate the batch sequences to the target length, and build packed sequences. For Qwen3-VL models, return BSHD-layout tokens for compatibility with the Qwen3-VL model; otherwise, return THD-layout tokens together with packed-sequence parameters.
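The pad-or-truncate core of this function can be sketched on plain Python lists (the real implementation operates on `torch.Tensor` batches). The `round_up` helper illustrates what an option like `use_fp8_padding` typically does; the multiple of 16 is an assumption here, not a documented value:

```python
def pad_or_truncate(seq: list[int], target_len: int, pad_id: int = 0) -> list[int]:
    # Clip sequences longer than the target; right-pad shorter ones.
    if len(seq) >= target_len:
        return seq[:target_len]
    return seq + [pad_id] * (target_len - len(seq))


def round_up(n: int, multiple: int) -> int:
    # FP8 kernels commonly require lengths divisible by a fixed multiple
    # (assumed to be 16 in this sketch), so the target is rounded up first.
    return ((n + multiple - 1) // multiple) * multiple


padded = pad_or_truncate([5, 6], 4)            # shorter: right-padded
clipped = pad_or_truncate([5, 6, 7, 8, 9], 4)  # longer: truncated
fp8_len = round_up(10, 16)                     # padded target length
```

Labels, loss mask, and position IDs are padded or truncated to the same target length so the batch stays aligned.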

bridge.models.qwen_vl.qwen3_vl_step.forward_step(
state: megatron.bridge.training.state.GlobalState,
data_iterator: Iterable,
model: megatron.core.models.gpt.GPTModel,
return_schedule_plan: bool = False,
) → tuple[torch.Tensor, functools.partial]#

Forward training step.

Parameters:
  • state – Global state for the run

  • data_iterator – Input data iterator

  • model – The GPT Model

  • return_schedule_plan (bool) – Whether to return the schedule plan instead of the output tensor

Returns:

A tuple containing the output tensor and a partially applied loss function.
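The return shape, a model output paired with a `functools.partial` loss function, follows the standard Megatron forward-step pattern: batch-dependent arguments (such as the loss mask) are bound now, and the trainer invokes the partial on the output later. A minimal hypothetical sketch (`forward_step_sketch` and `loss_func` are invented names, and the toy model stands in for the real GPT model):

```python
import functools
from typing import Any, Callable


def loss_func(loss_mask: list[int], output: list[float]) -> float:
    # Masked mean: average the per-token loss over unmasked positions only.
    kept = [o for o, m in zip(output, loss_mask) if m]
    return sum(kept) / max(len(kept), 1)


def forward_step_sketch(
    batch: dict[str, Any],
    model: Callable[[list[int]], list[float]],
) -> tuple[list[float], functools.partial]:
    output = model(batch["tokens"])
    # Bind the loss mask now; the training loop calls the partial
    # with the output tensor once it is available.
    return output, functools.partial(loss_func, batch["loss_mask"])


toy_model = lambda tokens: [float(t) for t in tokens]
out, loss_fn = forward_step_sketch(
    {"tokens": [1, 2, 3], "loss_mask": [1, 0, 1]}, toy_model
)
```

Calling `loss_fn(out)` then averages over the two unmasked positions.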