bridge.models.qwen3_asr.qwen3_asr_provider
Qwen3-ASR Model Provider configurations for Megatron-Core.
This module provides configuration classes for Qwen3-ASR automatic speech recognition models (audio + text input). The classes are compatible with Hugging Face's Qwen3-ASR model configurations.
Module Contents
Classes
Qwen3ASRModelProvider: Base model provider for Qwen3-ASR models. Inherits the language-model configuration from GPTModelProvider with Qwen3-specific defaults.
API
- class bridge.models.qwen3_asr.qwen3_asr_provider.Qwen3ASRModelProvider
Bases: megatron.bridge.models.gpt_provider.GPTModelProvider
Base model provider for Qwen3-ASR models. Inherits the language-model configuration from GPTModelProvider and applies Qwen3-specific defaults.
Key characteristics:
  - Audio is the only non-text modality (no vision or video input).
  - Qwen3-based LLM: qk_layernorm=True, no QKV bias, SwiGLU activation.
  - mrope_section: [24, 20, 20]
  - rotary_base: 5000000.0
  - Simple RoPE: the same position IDs are used for all three MRoPE dimensions.
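To make the "simple RoPE" point concrete, the sketch below compares MRoPE rotation angles with plain 1D RoPE when all three MRoPE axes share one set of position IDs. It assumes, as in Qwen2-VL-style MRoPE, that `mrope_section` partitions the rotary frequency indices among the three axes (24 + 20 + 20 = 64 frequencies) and uses a contiguous section-to-axis layout; all helper names are hypothetical and not Megatron Bridge APIs. Because every axis carries the same positions, the outcome does not depend on how the frequencies are split.

```python
# Sketch only: helper names are hypothetical, not Megatron Bridge APIs.
MROPE_SECTION = [24, 20, 20]  # provider default: (temporal, height, width)
ROTARY_BASE = 5_000_000.0     # provider default rotary_base


def inv_freqs(num_freqs: int, base: float) -> list[float]:
    """Standard RoPE inverse frequencies base^(-2i/d) with d = 2 * num_freqs."""
    dim = 2 * num_freqs
    return [base ** (-2 * i / dim) for i in range(num_freqs)]


def axis_of_frequency(section: list[int]) -> list[int]:
    """Assumed contiguous layout: frequency index -> MRoPE axis (0, 1 or 2)."""
    owner: list[int] = []
    for axis, width in enumerate(section):
        owner.extend([axis] * width)
    return owner


def mrope_angles(position_ids: list[list[int]], section: list[int], base: float) -> list[list[float]]:
    """Angle for token t and frequency f uses the position ID of the axis owning f."""
    freqs = inv_freqs(sum(section), base)
    owner = axis_of_frequency(section)
    seq_len = len(position_ids[0])
    return [
        [position_ids[owner[f]][t] * freqs[f] for f in range(len(freqs))]
        for t in range(seq_len)
    ]


def rope_1d_angles(seq_len: int, num_freqs: int, base: float) -> list[list[float]]:
    """Plain 1D RoPE: angle = position * inverse frequency."""
    freqs = inv_freqs(num_freqs, base)
    return [[t * freq for freq in freqs] for t in range(seq_len)]


seq_len = 6
shared = list(range(seq_len))
simple_position_ids = [shared, shared, shared]  # same IDs on all three axes

angles = mrope_angles(simple_position_ids, MROPE_SECTION, ROTARY_BASE)
reference = rope_1d_angles(seq_len, sum(MROPE_SECTION), ROTARY_BASE)
print(angles == reference)  # True: MRoPE collapses to ordinary 1D RoPE
```

The design consequence is that an audio + text model can keep the MRoPE code path (and its `mrope_section` layout) while behaving exactly like a standard 1D-RoPE language model.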
- thinker_config: megatron.bridge.models.qwen3_asr.hf_qwen3_asr.configuration_qwen3_asr.Qwen3ASRThinkerConfig
field(…)
- audio_token_id: int
151646
- audio_start_token_id: int
151647
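These two token IDs mark where audio enters the text sequence. A common pattern in Qwen-family multimodal models, assumed here rather than confirmed by this page, is that each audio placeholder token (`audio_token_id`) is replaced by one audio-encoder output embedding before the sequence reaches the language model. The pure-Python sketch below illustrates that merge; `merge_audio_embeddings` is a hypothetical helper, embeddings are plain lists rather than tensors, and the text token IDs 9707 and 13 are arbitrary.

```python
AUDIO_TOKEN_ID = 151646        # provider default audio_token_id
AUDIO_START_TOKEN_ID = 151647  # provider default audio_start_token_id


def merge_audio_embeddings(
    input_ids: list[int],
    text_embeds: list[list[float]],
    audio_embeds: list[list[float]],
) -> list[list[float]]:
    """Hypothetical helper: overwrite each audio placeholder with one audio embedding."""
    positions = [i for i, tok in enumerate(input_ids) if tok == AUDIO_TOKEN_ID]
    if len(positions) != len(audio_embeds):
        raise ValueError(
            f"{len(positions)} audio placeholders but {len(audio_embeds)} audio embeddings"
        )
    merged = [list(vec) for vec in text_embeds]  # copy so the input is not mutated
    for pos, vec in zip(positions, audio_embeds):
        merged[pos] = list(vec)
    return merged


# Toy sequence: text token, audio start marker, two audio placeholders, text token.
ids = [9707, AUDIO_START_TOKEN_ID, AUDIO_TOKEN_ID, AUDIO_TOKEN_ID, 13]
text = [[0.0, 0.0] for _ in ids]
audio = [[1.0, 2.0], [3.0, 4.0]]
print(merge_audio_embeddings(ids, text, audio))
# [[0.0, 0.0], [0.0, 0.0], [1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
```

The count check matters in practice: a mismatch between placeholders and encoder outputs usually means the tokenizer and the audio feature extractor disagree on the audio length.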
- activation_func: Callable
None
- gated_linear_unit: bool
True
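With `gated_linear_unit=True`, the MLP's first projection produces a gate half and an up half; the activation is applied only to the gate half before an element-wise product. With SiLU as the activation this is SwiGLU. The listing above renders the `activation_func` default as `None`, so the SiLU choice here comes from the "SwiGLU activation" characteristic rather than from the attribute entry. A minimal scalar sketch with hypothetical function names:

```python
import math


def silu(x: float) -> float:
    """SiLU / swish: x * sigmoid(x), using sigmoid(x) = 0.5 * (1 + tanh(x / 2)) for stability."""
    return x * 0.5 * (1.0 + math.tanh(0.5 * x))


def swiglu(fc1_out: list[float]) -> list[float]:
    """Gated linear unit with SiLU: split fc1 output in half and activate the gate half."""
    if len(fc1_out) % 2:
        raise ValueError("fc1 output must have an even width (gate half + up half)")
    half = len(fc1_out) // 2
    gate, up = fc1_out[:half], fc1_out[half:]
    return [silu(g) * u for g, u in zip(gate, up)]


print(swiglu([0.0, 1.0, 2.0, 3.0]))  # gate = [0.0, 1.0], up = [2.0, 3.0]
```

The gate half controls how much of each up-projected feature passes through, which is why the first projection must be twice as wide as the gated output.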
- add_qkv_bias: bool
False
- add_bias_linear: bool
False
- qk_layernorm: bool
True
0.0
- attention_softmax_in_fp32: bool
True
- attention_dropout: float
0.0
- position_embedding_type: str
'mrope'
- apply_rotary_pos_emb_in_fp32: bool
False
- mrope_section: list[int]
field(…)
- rotary_base: float
5000000.0
- scatter_embedding_sequence_parallel: bool
False
- freeze_language_model: bool
False
- freeze_audio_model: bool
False
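The field(…) defaults in this listing (for example thinker_config and mrope_section) are how a Python dataclass declares mutable defaults: a list or config object must come from a default_factory so that instances do not share one object. The stand-in below assumes the provider is a dataclass, as those defaults suggest, and mirrors only a few of its fields; `ASRProviderSketch` is hypothetical and is not the real class. It shows per-instance defaults and how the freeze flags can be overridden with `dataclasses.replace`.

```python
from dataclasses import dataclass, field, replace


@dataclass
class ASRProviderSketch:
    """Hypothetical stand-in mirroring a few Qwen3ASRModelProvider defaults."""

    position_embedding_type: str = "mrope"
    # A bare list default would raise ValueError; default_factory builds a fresh list.
    mrope_section: list[int] = field(default_factory=lambda: [24, 20, 20])
    rotary_base: float = 5000000.0
    freeze_language_model: bool = False
    freeze_audio_model: bool = False


base = ASRProviderSketch()
# Example: train only the language model and keep the audio model frozen.
lm_only = replace(base, freeze_audio_model=True)

print(lm_only.freeze_audio_model, lm_only.freeze_language_model)  # True False
```

Note that `replace` copies unchanged fields by reference, so `lm_only.mrope_section` is the same list object as `base.mrope_section`; only freshly constructed instances get their own list.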
- language_max_sequence_length: int
2048
- normalization: str
'RMSNorm'
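normalization='RMSNorm' selects root-mean-square normalization. Together with qk_layernorm=True, Qwen3-style attention normalizes each head's query and key vectors before the rotary embedding and the dot product. The sketch below assumes the same RMSNorm is used for that per-head query/key normalization, with one weight vector of size head_dim shared across heads; the epsilon value, unit weights, and function names are illustrative assumptions.

```python
import math


def rms_norm(x: list[float], weight: list[float], eps: float = 1e-6) -> list[float]:
    """RMSNorm: x / sqrt(mean(x^2) + eps) * weight (no mean subtraction, no bias)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * w for v, w in zip(x, weight)]


def qk_norm(heads: list[list[float]], weight: list[float]) -> list[list[float]]:
    """Normalize each head's query (or key) vector independently with a shared weight."""
    return [rms_norm(h, weight) for h in heads]


head_dim = 4
q_heads = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]  # two toy heads
normed = qk_norm(q_heads, weight=[1.0] * head_dim)
# Both heads point in the same direction, so after normalization they almost coincide.
print(normed[0], normed[1])
```

Normalizing queries and keys bounds the scale of the attention logits regardless of the projection magnitudes, which is the usual motivation for QK normalization.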
- persist_layer_norm: bool
True
- bias_activation_fusion: bool
True
- bias_dropout_fusion: bool
True
- masked_softmax_fusion: bool
False
- deallocate_pipeline_outputs: bool
True
- async_tensor_model_parallel_allreduce: bool
True
- distribute_saved_activations: bool
False
- cp_comm_type: str
'p2p'
- gradient_accumulation_fusion: bool
False
- provide(pre_process=None, post_process=None, vp_stage=None)
Provide a Qwen3-ASR model instance with audio and language components.
- provide_language_model(pre_process=None, post_process=None, vp_stage=None)
Provide just the language model component without audio.
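In Megatron-Core, pre_process and post_process tell a model chunk whether it owns the input embedding (first pipeline stage) and the output layer (last pipeline stage), and vp_stage identifies the virtual-pipeline chunk under an interleaved schedule. The hypothetical helper below sketches the usual way those flags follow from pipeline ranks before calling provide(...) or provide_language_model(...). In practice Megatron-Core derives them from its parallel state, and how this provider resolves the None defaults is not specified on this page.

```python
from typing import Optional


def stage_flags(
    pp_rank: int,
    pp_size: int,
    vp_stage: Optional[int] = None,
    vp_size: Optional[int] = None,
) -> tuple[bool, bool]:
    """Hypothetical: return (pre_process, post_process) for one model chunk."""
    if not 0 <= pp_rank < pp_size:
        raise ValueError("pp_rank must be in [0, pp_size)")
    if vp_stage is None or vp_size is None:
        # No virtual pipeline: the first rank embeds, the last rank produces logits.
        return pp_rank == 0, pp_rank == pp_size - 1
    # Interleaved schedule: only the first chunk on the first rank embeds, and only
    # the last chunk on the last rank owns the output layer.
    pre = pp_rank == 0 and vp_stage == 0
    post = pp_rank == pp_size - 1 and vp_stage == vp_size - 1
    return pre, post


for rank in range(4):
    print(rank, stage_flags(rank, pp_size=4))
# 0 (True, False)
# 1 (False, False)
# 2 (False, False)
# 3 (False, True)
```

With a single pipeline stage both flags are True, so one model chunk holds the embedding, every layer, and the output layer.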