bridge.models.qwen_audio.qwen2_audio_provider#

Qwen2-Audio Model Provider configurations for Megatron-Core.

This module provides configuration classes for Qwen2-Audio models, compatible with HuggingFace’s Qwen2-Audio model configurations.

Reference: https://huggingface.co/Qwen/Qwen2-Audio-7B-Instruct

Qwen2-Audio Key Features:

  • Audio-language capabilities with separate language model and audio encoder

  • Whisper-like audio encoder for processing mel spectrograms

  • Based on Qwen2 language model architecture

Module Contents#

Classes#

Qwen2AudioModelProvider

Base model provider for Qwen2-Audio Models.

API#

class bridge.models.qwen_audio.qwen2_audio_provider.Qwen2AudioModelProvider#

Bases: megatron.bridge.models.gpt_provider.GPTModelProvider

Base model provider for Qwen2-Audio Models.

Qwen2-Audio is a multimodal model combining a Whisper-like audio encoder with a Qwen2 language model for audio understanding tasks.

References:

  • https://huggingface.co/Qwen/Qwen2-Audio-7B

  • https://huggingface.co/Qwen/Qwen2-Audio-7B-Instruct

Key Features:

  • Audio encoder based on Whisper architecture

  • Supports variable-length audio inputs via mel spectrograms

  • Multi-turn conversation with audio context
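To give a feel for how variable-length mel-spectrogram inputs translate into audio embeddings, here is a minimal sketch. It assumes the length arithmetic used by the HuggingFace Qwen2-Audio encoder (a stride-2 convolution followed by stride-2 average pooling) and Whisper-style features at 100 mel frames per second. The function name `audio_token_count` and the constant `MEL_FRAMES_PER_SECOND` are hypothetical and not part of this module.

```python
# Assumption: Whisper-style feature extraction yields 100 mel frames per second.
MEL_FRAMES_PER_SECOND = 100


def audio_token_count(num_mel_frames: int) -> int:
    """Estimate how many audio embeddings an encoder emits for a mel input.

    Assumption: one stride-2 convolution followed by a stride-2 average pool,
    mirroring the HuggingFace Qwen2-Audio encoder. This is an illustration,
    not the code used by Qwen2AudioModelProvider.
    """
    if num_mel_frames < 1:
        raise ValueError("num_mel_frames must be positive")
    after_conv = (num_mel_frames - 1) // 2 + 1  # stride-2 convolution
    after_pool = (after_conv - 2) // 2 + 1      # stride-2 average pooling
    return max(after_pool, 0)


for seconds in (1, 10, 30):
    frames = seconds * MEL_FRAMES_PER_SECOND
    print(f"{seconds:>2} s -> {frames} mel frames -> {audio_token_count(frames)} audio tokens")
```

Under these assumptions the audio sequence grows at roughly 25 embeddings per second of audio, so longer clips consume proportionally more of the language model's context.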

scatter_embedding_sequence_parallel: bool#

False

hf_config: Optional[Any]#

None

audio_token_id: int#

151646

bos_token_id: int#

151643

eos_token_id: int#

151645

pad_token_id: int#

151643
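The token IDs above can be illustrated with a short sketch of the general placeholder-substitution technique: positions holding `audio_token_id` are located in the token sequence and overwritten with audio embeddings. This is an assumption about how such IDs are typically used, not the actual `Qwen2AudioModel` code. The function `merge_audio_embeddings`, the toy token IDs 101 and 102, and the two-dimensional vectors are all hypothetical.

```python
from typing import List, Sequence

AUDIO_TOKEN_ID = 151646  # documented default of `audio_token_id`

Vector = List[float]


def merge_audio_embeddings(
    input_ids: Sequence[int],
    text_embeddings: Sequence[Vector],
    audio_embeddings: Sequence[Vector],
    audio_token_id: int = AUDIO_TOKEN_ID,
) -> List[Vector]:
    """Replace embeddings at audio placeholder positions with audio vectors.

    Hypothetical helper: the number of placeholder tokens must equal the
    number of audio embeddings, otherwise the inputs are inconsistent.
    """
    positions = [i for i, tok in enumerate(input_ids) if tok == audio_token_id]
    if len(positions) != len(audio_embeddings):
        raise ValueError(
            f"{len(positions)} audio placeholders but "
            f"{len(audio_embeddings)} audio embeddings"
        )
    merged = [list(vec) for vec in text_embeddings]  # copy; inputs stay untouched
    for pos, audio_vec in zip(positions, audio_embeddings):
        merged[pos] = list(audio_vec)
    return merged


# Toy sequence: two text tokens (arbitrary IDs) around two audio placeholders.
input_ids = [101, AUDIO_TOKEN_ID, AUDIO_TOKEN_ID, 102]
text_embeddings = [[0.0, 0.0] for _ in input_ids]
audio_embeddings = [[1.0, 1.0], [2.0, 2.0]]

merged = merge_audio_embeddings(input_ids, text_embeddings, audio_embeddings)
print(merged)
```

Raising on a count mismatch, rather than silently truncating, surfaces preprocessing errors where the number of placeholders does not match the encoder output length.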

freeze_language_model: bool#

False

freeze_audio_model: bool#

False

freeze_audio_projection: bool#

False
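The three `freeze_*` flags are independent switches, one per model component. The sketch below shows how such flags might map to trainable parameter groups. It uses a stand-in dataclass rather than the real provider, which requires Megatron-Core to be installed. The class `FreezeFlags`, the function `is_trainable`, and the parameter-name prefixes are hypothetical; the actual freezing logic lives in the model implementation and may differ.

```python
from dataclasses import dataclass


@dataclass
class FreezeFlags:
    """Stand-in mirroring the documented freeze flags and their defaults."""

    freeze_language_model: bool = False
    freeze_audio_model: bool = False
    freeze_audio_projection: bool = False


# Hypothetical parameter-name prefixes, chosen for illustration only.
_PREFIX_TO_FLAG = {
    "language_model.": "freeze_language_model",
    "audio_model.": "freeze_audio_model",
    "audio_projection.": "freeze_audio_projection",
}


def is_trainable(param_name: str, flags: FreezeFlags) -> bool:
    """Return False when the parameter belongs to a frozen component."""
    for prefix, flag_name in _PREFIX_TO_FLAG.items():
        if param_name.startswith(prefix):
            return not getattr(flags, flag_name)
    return True  # parameters outside the known components stay trainable


# Example: train only the projection between audio encoder and language model.
flags = FreezeFlags(freeze_language_model=True, freeze_audio_model=True)
names = [
    "language_model.decoder.layers.0.weight",
    "audio_model.layers.0.weight",
    "audio_projection.linear.weight",
]
trainable = [name for name in names if is_trainable(name, flags)]
print(trainable)
```

With all flags left at their default of `False`, every component remains trainable.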

provide(
pre_process=None,
post_process=None,
vp_stage=None,
) → megatron.bridge.models.qwen_audio.modeling_qwen2_audio.Qwen2AudioModel#

Provide a Qwen2AudioModel instance with audio and language components.

Parameters:
  • pre_process – Whether this is the first stage in pipeline parallelism

  • post_process – Whether this is the last stage in pipeline parallelism

  • vp_stage – Virtual pipeline stage number

Returns:

A Qwen2AudioModel instance that combines a HuggingFace audio encoder with a Megatron language model.
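The `pre_process` and `post_process` arguments mark the first and last pipeline stages. A minimal sketch of how these flags could be derived from a pipeline rank and an optional virtual stage follows, using the usual Megatron convention that the first stage holds the embedding and the last stage holds the output layer. The helper `stage_flags` and its `vp_size` argument are hypothetical; real code would query Megatron's parallel state instead of taking ranks as arguments.

```python
from typing import Optional, Tuple


def stage_flags(
    pp_rank: int,
    pp_size: int,
    vp_stage: Optional[int] = None,
    vp_size: Optional[int] = None,
) -> Tuple[bool, bool]:
    """Return (pre_process, post_process) for one model chunk.

    Hypothetical helper. With virtual (interleaved) pipelining, only the
    first chunk on the first rank is the true first stage, and only the
    last chunk on the last rank is the true last stage.
    """
    if not 0 <= pp_rank < pp_size:
        raise ValueError("pp_rank must satisfy 0 <= pp_rank < pp_size")
    pre_process = pp_rank == 0
    post_process = pp_rank == pp_size - 1
    if vp_size is not None and vp_size > 1:
        if vp_stage is None:
            raise ValueError("vp_stage is required when vp_size > 1")
        pre_process = pre_process and vp_stage == 0
        post_process = post_process and vp_stage == vp_size - 1
    return pre_process, post_process


# Four pipeline stages without virtual pipelining.
for rank in range(4):
    print(rank, stage_flags(rank, 4))
```

Without pipeline parallelism (`pp_size == 1`) the single stage is both first and last, so both flags are `True`.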

provide_language_model(
pre_process=None,
post_process=None,
vp_stage=None,
) → megatron.core.models.gpt.GPTModel#

Provide only the language model component, without the audio encoder.

Parameters:
  • pre_process – Whether this is the first stage in pipeline parallelism

  • post_process – Whether this is the last stage in pipeline parallelism

  • vp_stage – Virtual pipeline stage number

Returns:

A Megatron-Core GPTModel (MCoreGPTModel) instance containing the language model only.