bridge.models.qwen3_asr.qwen3_asr_provider#

Qwen3-ASR Model Provider configurations for Megatron-Core.

This module provides configuration classes for Qwen3-ASR automatic speech recognition models (audio + text) that are compatible with Hugging Face's Qwen3-ASR model configurations.

Module Contents#

Classes#

Qwen3ASRModelProvider

Base model provider for Qwen3-ASR models. Inherits its language model configuration from GPTModelProvider and applies Qwen3-specific defaults.

API#

class bridge.models.qwen3_asr.qwen3_asr_provider.Qwen3ASRModelProvider#

Bases: megatron.bridge.models.gpt_provider.GPTModelProvider

Base model provider for Qwen3-ASR models. Inherits its language model configuration from GPTModelProvider and applies Qwen3-specific defaults.

Key characteristics:

  • Audio-only (no vision, no video)

  • Qwen3-based LLM: qk_layernorm=True, no QKV bias, SwiGLU activation

  • mrope_section: [24, 20, 20]

  • rotary_base: 5000000.0

  • Simple RoPE: the same position IDs are used across all three MRoPE dimensions
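
The last three characteristics can be illustrated with a small sketch. It assumes a contiguous-chunk MRoPE layout (the first 24 rotary frequency pairs read the first position axis, the next 20 the second, the last 20 the third); the actual layout used by the implementation may differ. The helper names `inv_freq`, `mrope_angles`, and `rope_angles` are hypothetical and not part of this module. When all three axes carry identical position IDs, the MRoPE rotation angles reduce to those of ordinary 1D RoPE:

```python
# Values taken from the documented defaults of Qwen3ASRModelProvider.
MROPE_SECTION = [24, 20, 20]   # sums to 64 rotary frequency pairs
ROTARY_BASE = 5_000_000.0


def inv_freq(num_pairs, base):
    """Standard RoPE inverse frequencies: base^(-i / num_pairs)."""
    return [base ** (-i / num_pairs) for i in range(num_pairs)]


def mrope_angles(position_ids, section, base):
    """Rotation angles per token; position_ids holds one list per MRoPE axis."""
    freqs = inv_freq(sum(section), base)
    # Assumption: frequency i is driven by the axis whose chunk contains i.
    axis_of = [axis for axis, width in enumerate(section) for _ in range(width)]
    seq_len = len(position_ids[0])
    return [
        [position_ids[axis_of[i]][t] * freqs[i] for i in range(len(freqs))]
        for t in range(seq_len)
    ]


def rope_angles(positions, num_pairs, base):
    """Rotation angles for ordinary 1D RoPE."""
    freqs = inv_freq(num_pairs, base)
    return [[p * f for f in freqs] for p in positions]


positions = list(range(8))
same_ids = [positions, positions, positions]  # identical IDs on all 3 axes
mrope = mrope_angles(same_ids, MROPE_SECTION, ROTARY_BASE)
rope = rope_angles(positions, sum(MROPE_SECTION), ROTARY_BASE)
assert mrope == rope  # MRoPE collapses to plain RoPE
```

Because every axis supplies the same position, the choice of axis per frequency has no effect, which is why the section split does not matter for audio-only inputs in this sketch.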

thinker_config: megatron.bridge.models.qwen3_asr.hf_qwen3_asr.configuration_qwen3_asr.Qwen3ASRThinkerConfig#

‘field(…)’

audio_token_id: int#

151646

audio_start_token_id: int#

151647
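
As an illustration of how an audio placeholder token ID is commonly used in audio+text models, the sketch below replaces the embedding at each placeholder position with the next audio feature vector. This mirrors a common multimodal pattern and is an assumption about usage, not a description of this provider's internals; `merge_audio_embeddings` is a hypothetical helper, and placing `audio_start_token_id` directly before the placeholders is also assumed.

```python
AUDIO_TOKEN_ID = 151646        # documented default of audio_token_id
AUDIO_START_TOKEN_ID = 151647  # documented default of audio_start_token_id


def merge_audio_embeddings(input_ids, text_embeds, audio_embeds):
    """Swap each audio placeholder embedding for the next audio feature."""
    num_placeholders = sum(1 for tok in input_ids if tok == AUDIO_TOKEN_ID)
    if num_placeholders != len(audio_embeds):
        raise ValueError(
            f"{num_placeholders} placeholders but {len(audio_embeds)} audio features"
        )
    audio_iter = iter(audio_embeds)
    return [
        next(audio_iter) if tok == AUDIO_TOKEN_ID else emb
        for tok, emb in zip(input_ids, text_embeds)
    ]


# Toy sequence: text token, audio start marker, two placeholders, text token.
input_ids = [1, AUDIO_START_TOKEN_ID, AUDIO_TOKEN_ID, AUDIO_TOKEN_ID, 2]
text_embeds = [[0.1], [0.2], [0.0], [0.0], [0.5]]
audio_embeds = [[9.0], [8.0]]
merged = merge_audio_embeddings(input_ids, text_embeds, audio_embeds)
```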

activation_func: Callable#

None

gated_linear_unit: bool#

True
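
With `gated_linear_unit` enabled and a SiLU activation, the MLP computes SwiGLU. A minimal sketch, assuming the first projection's output is split into two equal halves with the activation applied to the first half (the helper names are hypothetical):

```python
import math


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def swiglu(fc1_out):
    """Gate the second half of fc1_out with SiLU of the first half."""
    half = len(fc1_out) // 2
    gate, up = fc1_out[:half], fc1_out[half:]
    return [silu(g) * u for g, u in zip(gate, up)]


out = swiglu([0.0, 1.0, 2.0, 3.0])  # gate = [0, 1], up = [2, 3]
```

The output has half the width of the input, which is why a gated MLP's first projection is typically twice as wide as an ungated one.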

add_qkv_bias: bool#

False

add_bias_linear: bool#

False

qk_layernorm: bool#

True

hidden_dropout: float#

0.0

attention_softmax_in_fp32: bool#

True

attention_dropout: float#

0.0

position_embedding_type: str#

‘mrope’

apply_rotary_pos_emb_in_fp32: bool#

False

mrope_section: list[int]#

‘field(…)’

rotary_base: float#

5000000.0

scatter_embedding_sequence_parallel: bool#

False

freeze_language_model: bool#

False

freeze_audio_model: bool#

False

language_max_sequence_length: int#

2048

normalization: str#

‘RMSNorm’
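
For reference, RMSNorm rescales each vector by its root mean square and applies a learned per-channel weight, without subtracting the mean. A minimal sketch, where the `eps` value is an assumption and is not taken from this class:

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Scale x to unit root mean square, then apply per-channel weights."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * w for v, w in zip(x, weight)]


normed = rms_norm([3.0, 4.0], [1.0, 1.0], eps=0.0)
```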

persist_layer_norm: bool#

True

bias_activation_fusion: bool#

True

bias_dropout_fusion: bool#

True

masked_softmax_fusion: bool#

False

deallocate_pipeline_outputs: bool#

True

async_tensor_model_parallel_allreduce: bool#

True

distribute_saved_activations: bool#

False

cp_comm_type: str#

‘p2p’

gradient_accumulation_fusion: bool#

False

provide(pre_process=None, post_process=None, vp_stage=None)#

Provide a Qwen3-ASR model instance with audio and language components.

provide_language_model(
pre_process=None,
post_process=None,
vp_stage=None,
) megatron.core.models.gpt.GPTModel#

Provide only the language model component, without the audio component.
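
In Megatron-Core, `pre_process` marks the model chunk that holds the input embedding and `post_process` marks the chunk that holds the output layer. The sketch below shows how these flags are typically derived from pipeline position; `stage_flags` is a hypothetical helper, and whether the provider infers the flags this way when they are left as `None` is an assumption.

```python
def stage_flags(pp_rank, pp_size, vp_stage=None, vp_size=None):
    """Return (pre_process, post_process) for one model chunk."""
    # Without virtual pipelining, each rank holds exactly one chunk.
    first_chunk = vp_stage is None or vp_stage == 0
    last_chunk = vp_stage is None or vp_stage == vp_size - 1
    pre_process = pp_rank == 0 and first_chunk
    post_process = pp_rank == pp_size - 1 and last_chunk
    return pre_process, post_process


# Four pipeline stages, no virtual pipelining.
flags = [stage_flags(rank, 4) for rank in range(4)]
```

Only the first stage embeds tokens and only the last stage produces logits; intermediate stages receive and emit hidden states.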