bridge.recipes.qwen_vl.qwen25_vl_dataset#

Module Contents#

Classes#

MockQwen25VLDataset

Mock vision-language dataset for Qwen2.5-VL that yields text+image samples.

MockQwen25VLDatasetProvider

DatasetProvider for a mock Qwen2.5-VL vision-language dataset.

API#

class bridge.recipes.qwen_vl.qwen25_vl_dataset.MockQwen25VLDataset(size: int, config: Any)#

Bases: torch.utils.data.Dataset

Mock vision-language dataset for Qwen2.5-VL that yields text+image samples.

Each sample contains:

  • tokens: torch.LongTensor [L]

  • labels: torch.LongTensor [L]

  • attention_mask: torch.BoolTensor [L] (all ones by default)

  • loss_mask: torch.FloatTensor [L]

  • position_ids: torch.LongTensor [L]

  • pixel_values: torch.FloatTensor [num_images, C, H, W]

  • image_grid_thw: torch.LongTensor [num_images, 3]

Initialization

__len__() → int#

_generate_random_image() → PIL.Image.Image#

_build_inputs() → Tuple[torch.Tensor, Optional[torch.Tensor], Optional[torch.Tensor]]#

__getitem__(idx: int) → Dict[str, torch.Tensor]#
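
The constructor's config argument is typed as Any and not documented further here; a plausible reading, given the provider below, is that the provider instance itself (carrying sequence_length, image_size, and the other fields listed under MockQwen25VLDatasetProvider) serves as the config. A minimal sketch of inspecting one sample under that assumption:

```python
from bridge.recipes.qwen_vl.qwen25_vl_dataset import (
    MockQwen25VLDataset,
    MockQwen25VLDatasetProvider,
)

# Assumption: the provider instance doubles as the `config` object; the
# signature above only types it as `Any`. Loading the HF processor named
# in `hf_model_path` may require network or cache access.
cfg = MockQwen25VLDatasetProvider(sequence_length=2048)
ds = MockQwen25VLDataset(size=8, config=cfg)

sample = ds[0]
# Keys and shapes follow the per-sample layout listed above.
assert sample["tokens"].shape == sample["labels"].shape         # both [L]
assert sample["pixel_values"].dim() == 4                        # [num_images, C, H, W]
assert tuple(sample["image_grid_thw"].shape) == (cfg.num_images, 3)
```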
class bridge.recipes.qwen_vl.qwen25_vl_dataset.MockQwen25VLDatasetProvider#

Bases: megatron.bridge.training.config.DatasetProvider

DatasetProvider for a mock Qwen2.5-VL vision-language dataset.

Builds train/valid/test datasets using an HF AutoProcessor and the MockQwen25VLDataset implementation.

sequence_length: int#

None

hf_model_path: str#

'Qwen/Qwen2.5-VL-3B-Instruct'

prompt: str#

'Describe this image.'

random_seed: int#

0

image_size: Tuple[int, int]#

(256, 256)

pad_to_max_length: bool#

True

create_attention_mask: bool#

True

skip_getting_attention_mask_from_dataset: bool#

True

num_images: int#

1

_processor: Optional[Any]#

None

build_datasets(context: megatron.bridge.training.config.DatasetBuildContext)#

Create mock Qwen2.5-VL datasets for train/valid/test splits.

Parameters:

context – Provides sample counts and an optional tokenizer.

Returns:

Tuple[Optional[Dataset], Optional[Dataset], Optional[Dataset]]
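
For reference, a hedged end-to-end sketch of driving build_datasets. The exact constructor arguments of DatasetBuildContext are an assumption here; the docs above only say it provides sample counts and an optional tokenizer:

```python
from bridge.recipes.qwen_vl.qwen25_vl_dataset import MockQwen25VLDatasetProvider
from megatron.bridge.training.config import DatasetBuildContext

provider = MockQwen25VLDatasetProvider(
    sequence_length=2048,  # no default is shown above, so set it explicitly
    hf_model_path="Qwen/Qwen2.5-VL-3B-Instruct",
    prompt="Describe this image.",
    image_size=(256, 256),
    num_images=1,
)

# Assumed field names for DatasetBuildContext; treat as illustrative only.
context = DatasetBuildContext(train_samples=128, valid_samples=16, test_samples=16)

train_ds, valid_ds, test_ds = provider.build_datasets(context)
print(len(train_ds), train_ds[0]["tokens"].shape)
```

The Optional return types suggest a split may come back as None when its sample count is zero, though that behavior is not spelled out above.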