bridge.models.qwen_audio.qwen2_audio_provider#
Qwen2-Audio Model Provider configurations for Megatron-Core.
This module provides configuration classes for Qwen2-Audio models, compatible with HuggingFace’s Qwen2-Audio model configurations.
Reference: https://huggingface.co/Qwen/Qwen2-Audio-7B-Instruct
Qwen2-Audio Key Features:
Audio-language capabilities with separate language model and audio encoder
Whisper-like audio encoder for processing mel spectrograms
Based on Qwen2 language model architecture
Module Contents#
Classes#
Qwen2AudioModelProvider | Base model provider for Qwen2-Audio Models.
API#
- class bridge.models.qwen_audio.qwen2_audio_provider.Qwen2AudioModelProvider#
Bases: megatron.bridge.models.gpt_provider.GPTModelProvider
Base model provider for Qwen2-Audio Models.
Qwen2-Audio is a multimodal model combining a Whisper-like audio encoder with a Qwen2 language model for audio understanding tasks.
Reference:
https://huggingface.co/Qwen/Qwen2-Audio-7B
https://huggingface.co/Qwen/Qwen2-Audio-7B-Instruct
Key Features:
Audio encoder based on Whisper architecture
Supports variable-length audio inputs via mel spectrograms
Multi-turn conversation with audio context
- scatter_embedding_sequence_parallel: bool#
False
- hf_config: Optional[Any]#
None
- audio_token_id: int#
151646
- bos_token_id: int#
151643
- eos_token_id: int#
151645
- pad_token_id: int#
151643
- freeze_language_model: bool#
False
- freeze_audio_model: bool#
False
- freeze_audio_projection: bool#
False
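The attribute defaults listed above can be mirrored in a small stand-in dataclass to illustrate how the freeze flags and token IDs might be used. This is a minimal sketch, not the library's implementation: `Qwen2AudioProviderSketch` and `find_audio_positions` are hypothetical names introduced for illustration, the real provider inherits many more fields from `GPTModelProvider`, and the idea that `audio_token_id` marks audio placeholder positions in the token sequence is an assumption.

```python
from dataclasses import dataclass, replace
from typing import Any, List, Optional


@dataclass
class Qwen2AudioProviderSketch:
    """Hypothetical stand-in mirroring only the fields documented above."""

    scatter_embedding_sequence_parallel: bool = False
    hf_config: Optional[Any] = None
    audio_token_id: int = 151646
    bos_token_id: int = 151643
    eos_token_id: int = 151645
    pad_token_id: int = 151643
    freeze_language_model: bool = False
    freeze_audio_model: bool = False
    freeze_audio_projection: bool = False


def find_audio_positions(input_ids: List[int], audio_token_id: int) -> List[int]:
    """Return the indices of audio placeholder tokens (assumed convention)."""
    return [i for i, tok in enumerate(input_ids) if tok == audio_token_id]


defaults = Qwen2AudioProviderSketch()

# Example override: freeze everything except the audio projection.
projection_only = replace(
    defaults, freeze_language_model=True, freeze_audio_model=True
)

# The IDs 100 and 200 are made-up text tokens for illustration only.
sample_ids = [
    defaults.bos_token_id,
    100,
    defaults.audio_token_id,
    defaults.audio_token_id,
    200,
    defaults.eos_token_id,
]
audio_positions = find_audio_positions(sample_ids, defaults.audio_token_id)
```

Note that the documented defaults give `pad_token_id` the same value as `bos_token_id` (151643), so code that masks padding by token ID alone would also mask the leading BOS token.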
- provide(pre_process=None, post_process=None, vp_stage=None)
Provide a Qwen2AudioModel instance with audio and language components.
- Parameters:
pre_process – Whether this is the first stage in pipeline parallelism
post_process – Whether this is the last stage in pipeline parallelism
vp_stage – Virtual pipeline stage number
- Returns:
Qwen2AudioModel instance with HF audio encoder and Megatron language model
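The parameter descriptions above tie `pre_process` and `post_process` to the first and last pipeline stages. The mapping can be sketched as follows, assuming a plain (non-virtual) pipeline in which stages are numbered from 0; `stage_flags` is a hypothetical helper for illustration, not a Megatron Bridge function, and it ignores `vp_stage`.

```python
from typing import Tuple


def stage_flags(pp_rank: int, pp_size: int) -> Tuple[bool, bool]:
    """Return (pre_process, post_process) for one pipeline stage.

    pre_process is True only on the first stage and post_process only on
    the last stage. Virtual pipeline stages are not considered here.
    """
    if not 0 <= pp_rank < pp_size:
        raise ValueError(f"pp_rank {pp_rank} is outside [0, {pp_size})")
    pre_process = pp_rank == 0
    post_process = pp_rank == pp_size - 1
    return pre_process, post_process


# Flags for each stage of a hypothetical 4-stage pipeline.
flags_by_rank = [stage_flags(rank, 4) for rank in range(4)]
```

With a single pipeline stage, both flags are true, since that stage is simultaneously the first and the last.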
- provide_language_model(pre_process=None, post_process=None, vp_stage=None)
Provide just the language model component without audio.
- Parameters:
pre_process – Whether this is the first stage in pipeline parallelism
post_process – Whether this is the last stage in pipeline parallelism
vp_stage – Virtual pipeline stage number
- Returns:
MCoreGPTModel instance (language model only)