bridge.models.qwen_omni.modeling_qwen3_omni.transformer_config#
Module Contents#
Classes#
Qwen3OmniTransformerConfig | Qwen3-Omni transformer config.
API#
- class bridge.models.qwen_omni.modeling_qwen3_omni.transformer_config.Qwen3OmniTransformerConfig#
Bases: megatron.bridge.models.qwen_vl.modelling_qwen3_vl.transformer_config.Qwen3VLTransformerConfig
Qwen3-Omni transformer config.
This config extends the Qwen3-VL language/vision path with Qwen3-Omni multimodal token ids and audio-related settings.
- vocab_size: int#
152064
- language_max_sequence_length: int#
32768
- patch_size: int#
16
- temporal_patch_size: int#
2
- spatial_merge_size: int#
2
- fp16_lm_cross_entropy: bool#
False
- rotary_percent: float#
1.0
- rotary_base: float#
1000000.0
- mrope_section: list[int]#
‘field(…)’
- image_token_id: int#
151655
- video_token_id: int#
151656
- audio_token_id: int#
151646
- vision_start_token_id: int#
151652
- vision_end_token_id: int#
151653
- audio_start_token_id: int#
151647
- audio_end_token_id: int#
151648
- position_id_per_seconds: int#
25
- seconds_per_chunk: int#
2
- qk_layernorm: bool#
True
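The defaults listed above can be illustrated with a small, self-contained sketch. It does not import the real class: `OmniConfigSketch`, `modality_positions`, and `position_ids_per_chunk` are hypothetical names introduced here, and only a handful of the documented defaults are mirrored (`mrope_section` is left out because its default is not shown on this page). The sketch assumes that the `*_token_id` fields mark placeholder positions in a tokenized input, and that `position_id_per_seconds` and `seconds_per_chunk` multiply to give the number of position ids spanned by one chunk; both readings are inferred from the field names, not confirmed by this page.

```python
from dataclasses import dataclass


@dataclass
class OmniConfigSketch:
    """Hypothetical stand-in mirroring a few documented defaults (not the real class)."""

    image_token_id: int = 151655
    video_token_id: int = 151656
    audio_token_id: int = 151646
    position_id_per_seconds: int = 25
    seconds_per_chunk: int = 2


def modality_positions(input_ids, cfg):
    """Return the indices of image, video, and audio placeholder tokens."""
    wanted = {
        "image": cfg.image_token_id,
        "video": cfg.video_token_id,
        "audio": cfg.audio_token_id,
    }
    return {
        name: [i for i, tok in enumerate(input_ids) if tok == token_id]
        for name, token_id in wanted.items()
    }


def position_ids_per_chunk(cfg):
    # Assumption inferred from the field names: position ids advance at a
    # fixed rate per second, and a chunk covers a fixed number of seconds.
    return cfg.position_id_per_seconds * cfg.seconds_per_chunk


cfg = OmniConfigSketch()

# Toy sequence: audio start, two audio placeholders, audio end,
# then vision start, one image placeholder, vision end.
ids = [1, 151647, 151646, 151646, 151648, 2, 151652, 151655, 151653]

positions = modality_positions(ids, cfg)
chunk_span = position_ids_per_chunk(cfg)
```

With the toy sequence, `positions["audio"]` holds the two indices between the audio start and end markers, and `chunk_span` is the product of the two timing defaults. When working with the actual config, read the values from a `Qwen3OmniTransformerConfig` instance instead of hard-coding them.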