bridge.recipes.qwen_vl.qwen25_vl_dataset#
Module Contents#
Classes#
| Class | Description |
| --- | --- |
| MockQwen25VLDataset | Mock vision-language dataset for Qwen2.5-VL that yields text+image samples. |
| MockQwen25VLDatasetProvider | DatasetProvider for a mock Qwen2.5-VL vision-language dataset. |
API#
- class bridge.recipes.qwen_vl.qwen25_vl_dataset.MockQwen25VLDataset(size: int, config: Any)#
Bases: torch.utils.data.Dataset

Mock vision-language dataset for Qwen2.5-VL that yields text+image samples.
Each sample contains:
- tokens: torch.LongTensor [L]
- labels: torch.LongTensor [L]
- attention_mask: torch.BoolTensor [L] (all ones by default)
- loss_mask: torch.FloatTensor [L]
- position_ids: torch.LongTensor [L]
- pixel_values: torch.FloatTensor [num_images, C, H, W]
- image_grid_thw: torch.LongTensor [num_images, 3]
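For reference, here is a minimal, self-contained sketch of the sample schema above. The concrete numbers are illustrative assumptions (L = 4096 and a 14-pixel patch grid for image_grid_thw), not values the class necessarily produces:

```python
import torch

# Illustrative shapes only: L = sequence length, one 256x256 RGB image.
L, num_images, C, H, W = 4096, 1, 3, 256, 256
sample = {
    "tokens": torch.zeros(L, dtype=torch.long),
    "labels": torch.zeros(L, dtype=torch.long),
    "attention_mask": torch.ones(L, dtype=torch.bool),
    "loss_mask": torch.ones(L, dtype=torch.float),
    "position_ids": torch.arange(L, dtype=torch.long),
    "pixel_values": torch.zeros(num_images, C, H, W),
    # [t, h, w] per image; a 14-pixel patch grid is an assumption here.
    "image_grid_thw": torch.tensor([[1, H // 14, W // 14]], dtype=torch.long),
}
```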
Initialization
- __len__() → int#
- _generate_random_image() → PIL.Image.Image#
- _build_inputs() → Tuple[torch.Tensor, Optional[torch.Tensor], Optional[torch.Tensor]]#
- __getitem__(idx: int) → Dict[str, torch.Tensor]#
- class bridge.recipes.qwen_vl.qwen25_vl_dataset.MockQwen25VLDatasetProvider#
Bases: megatron.bridge.training.config.DatasetProvider

DatasetProvider for a mock Qwen2.5-VL vision-language dataset.
Builds train/valid/test datasets using an HF AutoProcessor and the MockQwen25VLDataset implementation.
- sequence_length: int#
None
- hf_model_path: str#
'Qwen/Qwen2.5-VL-3B-Instruct'
- prompt: str#
'Describe this image.'
- random_seed: int#
0
- image_size: Tuple[int, int]#
(256, 256)
- pad_to_max_length: bool#
True
- create_attention_mask: bool#
True
- skip_getting_attention_mask_from_dataset: bool#
True
- num_images: int#
1
- _processor: Optional[Any]#
None
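A minimal configuration sketch for the fields above. Dataclass-style keyword construction is assumed here rather than documented; any field left out falls back to the default listed above:

```python
from bridge.recipes.qwen_vl.qwen25_vl_dataset import MockQwen25VLDatasetProvider

# sequence_length defaults to None, so it should be set explicitly.
provider = MockQwen25VLDatasetProvider(
    sequence_length=4096,
    hf_model_path="Qwen/Qwen2.5-VL-3B-Instruct",
    prompt="Describe this image.",
    image_size=(256, 256),
    num_images=1,
)
```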
- build_datasets(context: megatron.bridge.training.config.DatasetBuildContext) → Tuple[Optional[Dataset], Optional[Dataset], Optional[Dataset]]#
Create mock Qwen2.5-VL datasets for train/valid/test splits.
- Parameters:
context – Provides sample counts and optional tokenizer.
- Returns:
Tuple[Optional[Dataset], Optional[Dataset], Optional[Dataset]] – the (train, valid, test) datasets, in that order.
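Continuing the provider sketch above, a hedged usage example: the DatasetBuildContext is normally constructed by the training loop with the per-split sample counts and an optional tokenizer, so its construction is left abstract here:

```python
from typing import Optional, Tuple

from torch.utils.data import Dataset

from megatron.bridge.training.config import DatasetBuildContext


def make_splits(
    context: DatasetBuildContext,
) -> Tuple[Optional[Dataset], Optional[Dataset], Optional[Dataset]]:
    # `context` supplies sample counts and an optional tokenizer;
    # `provider` is the configured instance from the sketch above.
    return provider.build_datasets(context)
```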