bridge.models.qwen_omni.qwen3_omni_provider

Module Contents

Classes

Qwen3OmniModelProvider

Provider for Qwen3-Omni.

API

class bridge.models.qwen_omni.qwen3_omni_provider.Qwen3OmniModelProvider

Bases: megatron.bridge.models.gpt_provider.GPTModelProvider

Provider for Qwen3-Omni.

The current implementation focuses on the thinker side: multimodal training and checkpoint conversion paths. The talker and code2wav configurations are optional and default to None.

thinker_config: transformers.models.qwen3_omni_moe.configuration_qwen3_omni_moe.Qwen3OmniMoeThinkerConfig

'field(...)'

talker_config: transformers.models.qwen3_omni_moe.configuration_qwen3_omni_moe.Qwen3OmniMoeTalkerConfig | None

None

code2wav_config: transformers.models.qwen3_omni_moe.configuration_qwen3_omni_moe.Qwen3OmniMoeCode2WavConfig | None

None

pretrained_model_name: str

'Qwen/Qwen3-Omni-30B-A3B-Instruct'

normalization: str

'RMSNorm'

activation_func: Callable

None

gated_linear_unit: bool

True

add_bias_linear: bool

False

add_qkv_bias: bool

False

hidden_dropout: float

0.0

qk_layernorm: bool

True

moe_grouped_gemm: bool

True

moe_router_load_balancing_type: str

'aux_loss'

moe_aux_loss_coeff: float

0.001

moe_router_pre_softmax: bool

False

moe_token_dispatcher_type: str

'alltoall'

moe_permute_fusion: bool

True

image_token_id: int

151655

video_token_id: int

151656

audio_token_id: int

151646

vision_start_token_id: int

151652

vision_end_token_id: int

151653

audio_start_token_id: int

151647

audio_end_token_id: int

151648

bos_token_id: int

151643

eos_token_id: int

151645
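
The token ID defaults above can be used to locate multimodal placeholders in a tokenized sequence. The following is a minimal sketch, assuming the usual vision-language convention that placeholder tokens mark the positions later filled with encoder embeddings. The helper `modality_masks` and the example sequence are hypothetical and not part of this module.

```python
# Placeholder token IDs copied from the documented defaults of
# Qwen3OmniModelProvider. Everything else here is illustrative.
IMAGE_TOKEN_ID = 151655
VIDEO_TOKEN_ID = 151656
AUDIO_TOKEN_ID = 151646

PLACEHOLDER_IDS = {
    "image": IMAGE_TOKEN_ID,
    "video": VIDEO_TOKEN_ID,
    "audio": AUDIO_TOKEN_ID,
}


def modality_masks(input_ids):
    """Return one boolean mask per modality marking placeholder positions.

    Hypothetical helper: the provider itself does not expose this function.
    """
    return {
        name: [token == token_id for token in input_ids]
        for name, token_id in PLACEHOLDER_IDS.items()
    }


# Hypothetical sequence: BOS, vision start, two image placeholders,
# vision end, EOS (IDs taken from the defaults listed above).
example_ids = [151643, 151652, 151655, 151655, 151653, 151645]
masks = modality_masks(example_ids)
```

In this example only the two image placeholder positions are marked in `masks["image"]`; the video and audio masks are all False.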

language_max_sequence_length: int

32768

position_embedding_type: str

'mrope'

position_id_per_seconds: int

25

seconds_per_chunk: int

2
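
As a worked illustration of how the two temporal defaults relate, the sketch below assumes that `position_id_per_seconds` is the number of temporal position IDs assigned per second of media and that `seconds_per_chunk` is the chunk length in seconds, as the names suggest. The helper functions are hypothetical, and the provider's actual position-ID logic may differ.

```python
# Defaults copied from the documented fields above.
POSITION_ID_PER_SECONDS = 25
SECONDS_PER_CHUNK = 2


def positions_per_chunk(
    position_id_per_seconds=POSITION_ID_PER_SECONDS,
    seconds_per_chunk=SECONDS_PER_CHUNK,
):
    """Number of temporal position IDs spanned by one chunk (assumed meaning)."""
    return position_id_per_seconds * seconds_per_chunk


def seconds_to_position(seconds, position_id_per_seconds=POSITION_ID_PER_SECONDS):
    """Map a timestamp in seconds to an integer temporal position ID."""
    return int(seconds * position_id_per_seconds)


def chunk_index(temporal_position_id):
    """Index of the chunk containing a given temporal position ID."""
    return temporal_position_id // positions_per_chunk()
```

Under these assumptions, one chunk spans 25 × 2 = 50 temporal positions, so position 49 falls in chunk 0 and position 50 starts chunk 1.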

patch_size: int

16

temporal_patch_size: int

2

spatial_merge_size: int

2
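
The three patch defaults determine how many vision tokens an input produces. The sketch below assumes the patching convention commonly used in the Qwen-VL family: frames are grouped by `temporal_patch_size`, each frame is cut into `patch_size` × `patch_size` patches, and every `spatial_merge_size` × `spatial_merge_size` block of patches is merged into one token. Whether this provider's vision encoder follows exactly this scheme is not stated here, and `vision_token_count` is a hypothetical helper.

```python
def vision_token_count(
    frames,
    height,
    width,
    patch_size=16,          # documented default
    temporal_patch_size=2,  # documented default
    spatial_merge_size=2,   # documented default
):
    """Estimate merged vision tokens for a clip of `frames` frames (assumed scheme)."""
    if frames % temporal_patch_size:
        raise ValueError("frames must be a multiple of temporal_patch_size")
    if height % patch_size or width % patch_size:
        raise ValueError("height and width must be multiples of patch_size")

    grid_t = frames // temporal_patch_size
    grid_h = height // patch_size
    grid_w = width // patch_size
    if grid_h % spatial_merge_size or grid_w % spatial_merge_size:
        raise ValueError("patch grid must be divisible by spatial_merge_size")

    # Each spatial_merge_size x spatial_merge_size block becomes one token.
    return grid_t * grid_h * grid_w // spatial_merge_size**2


# Two 224x224 frames: 1 * 14 * 14 = 196 patches, merged 4-to-1.
tokens_small = vision_token_count(frames=2, height=224, width=224)
# Four 448x448 frames: 2 * 28 * 28 = 1568 patches, merged 4-to-1.
tokens_large = vision_token_count(frames=4, height=448, width=448)
```

With the defaults, the first call yields 49 tokens and the second 392.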

mrope_section: list[int]

'field(...)'

scatter_embedding_sequence_parallel: bool

False

freeze_language_model: bool

False

freeze_vision_model: bool

False

freeze_audio_model: bool

False

vit_gradient_checkpointing: bool

False

multimodal_attn_impl: str

'auto'

provide(pre_process=None, post_process=None, vp_stage=None)

provide_language_model(pre_process=None, post_process=None, vp_stage=None) → megatron.core.models.gpt.GPTModel