bridge.models.qwen_audio.qwen2_audio_bridge#

Megatron Bridge for Qwen2-Audio Models.

This module provides the bridge implementation for converting Qwen2-Audio models between the HuggingFace format and the Megatron-Core format.

Supported models:

  • Qwen2-Audio-7B

  • Qwen2-Audio-7B-Instruct

Reference: https://huggingface.co/Qwen/Qwen2-Audio-7B-Instruct

Module Contents#

Classes#

Qwen2AudioBridge

Megatron Bridge for Qwen2-Audio Models.

API#

class bridge.models.qwen_audio.qwen2_audio_bridge.Qwen2AudioBridge#

Bases: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge

Megatron Bridge for Qwen2-Audio Models.

This bridge converts audio-language models between the HuggingFace Qwen2AudioForConditionalGeneration format and the Megatron-Core Qwen2AudioModel format.

The weight mappings handle:

  • Audio encoder weights (audio_tower)

  • Language model weights

  • Multimodal projector weights
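The three weight groups above can be told apart by parameter-name prefix. The following is a minimal sketch, assuming the HuggingFace prefixes `audio_tower.`, `language_model.`, and `multi_modal_projector.` from the weight structure documented for `mapping_registry()`. The `classify_param` helper and the group labels are hypothetical and are not part of Megatron Bridge.

```python
from collections import Counter

# Prefixes follow the HuggingFace weight structure documented for mapping_registry().
# The group labels and helpers below are illustrative only, not Megatron Bridge API.
GROUP_PREFIXES = {
    "audio_tower.": "audio_encoder",
    "language_model.": "language_model",
    "multi_modal_projector.": "projector",
}


def classify_param(name: str) -> str:
    """Return the weight group for a HuggingFace parameter name."""
    for prefix, group in GROUP_PREFIXES.items():
        if name.startswith(prefix):
            return group
    return "unknown"


def summarize(names):
    """Count how many parameter names fall into each group."""
    return Counter(classify_param(n) for n in names)


example_names = [
    "audio_tower.conv1.weight",
    "language_model.model.embed_tokens.weight",
    "language_model.lm_head.weight",
    "multi_modal_projector.linear.weight",
]
print(summarize(example_names))
```

A grouping like this can help when checking that a checkpoint contains all three groups before conversion; the real bridge resolves names through its mapping registry rather than a prefix table.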

Example:

    from megatron.bridge import AutoBridge

    bridge = AutoBridge.from_hf_pretrained("Qwen/Qwen2-Audio-7B-Instruct")
    provider = bridge.to_megatron_provider()

provider_bridge(hf_pretrained: megatron.bridge.models.hf_pretrained.vlm.PreTrainedVLM) → megatron.bridge.models.qwen_audio.qwen2_audio_provider.Qwen2AudioModelProvider#

Create a Qwen2AudioModelProvider from a HuggingFace pretrained model.

Parameters:

hf_pretrained – HuggingFace pretrained model

Returns:

Qwen2AudioModelProvider configured with the HF model’s parameters
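As a rough illustration of what configuring a provider from a HuggingFace model could involve, the sketch below copies a few fields from a HuggingFace-style text config into a stand-in dataclass. Everything here is an assumption: `SketchProvider` is a hypothetical stand-in for Qwen2AudioModelProvider, the key pairing is a guess based on common HuggingFace and Megatron-Core naming, and the numeric values are placeholders rather than the real Qwen2-Audio-7B configuration.

```python
from dataclasses import dataclass


@dataclass
class SketchProvider:
    """Hypothetical stand-in; not the real Qwen2AudioModelProvider."""

    num_layers: int
    hidden_size: int
    ffn_hidden_size: int
    num_attention_heads: int
    num_query_groups: int
    vocab_size: int


# Assumed pairing: HuggingFace config key -> provider field name.
HF_TO_PROVIDER = {
    "num_hidden_layers": "num_layers",
    "hidden_size": "hidden_size",
    "intermediate_size": "ffn_hidden_size",
    "num_attention_heads": "num_attention_heads",
    "num_key_value_heads": "num_query_groups",
    "vocab_size": "vocab_size",
}


def provider_from_text_config(text_config: dict) -> SketchProvider:
    """Build the stand-in provider, failing loudly on missing keys."""
    missing = [key for key in HF_TO_PROVIDER if key not in text_config]
    if missing:
        raise KeyError(f"missing config keys: {missing}")
    kwargs = {dst: text_config[src] for src, dst in HF_TO_PROVIDER.items()}
    return SketchProvider(**kwargs)


# Placeholder values chosen for readability, not taken from any real checkpoint.
toy_text_config = {
    "num_hidden_layers": 2,
    "hidden_size": 64,
    "intermediate_size": 256,
    "num_attention_heads": 4,
    "num_key_value_heads": 4,
    "vocab_size": 1000,
}
provider = provider_from_text_config(toy_text_config)
print(provider)
```

The actual `provider_bridge` method receives a PreTrainedVLM object rather than a plain dictionary, and it may set additional audio-encoder and projector options that this sketch does not attempt to model.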

mapping_registry() → megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry#

Return MegatronMappingRegistry containing parameter mappings for audio-language models.

HuggingFace weight structure:

  • language_model.model.embed_tokens.weight

  • language_model.model.layers.{i}.input_layernorm.weight

  • language_model.model.layers.{i}.self_attn.{q,k,v,o}_proj.weight

  • language_model.model.layers.{i}.post_attention_layernorm.weight

  • language_model.model.layers.{i}.mlp.{gate,up,down}_proj.weight

  • language_model.model.norm.weight

  • language_model.lm_head.weight

  • audio_tower.** (conv1, conv2, embed_positions, layers, layer_norm, avg_pooler)

  • multi_modal_projector.linear.weight

Returns:

MegatronMappingRegistry with all parameter mappings
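The `{i}` and `{q,k,v,o}` placeholders in the HuggingFace weight structure stand for the layer index and the projection name. The sketch below expands them into concrete parameter names for the language model. It is a standalone illustration: `language_model_param_names` is a hypothetical helper, it covers only the weight names listed above (any other tensors are out of scope), and it does not use or reproduce MegatronMappingRegistry.

```python
from typing import List


def language_model_param_names(num_layers: int) -> List[str]:
    """Expand the documented name templates for a given number of layers."""
    names = ["language_model.model.embed_tokens.weight"]
    for i in range(num_layers):
        prefix = f"language_model.model.layers.{i}"
        names.append(f"{prefix}.input_layernorm.weight")
        # {q,k,v,o}_proj expands to four attention projections.
        names.extend(f"{prefix}.self_attn.{p}_proj.weight" for p in ("q", "k", "v", "o"))
        names.append(f"{prefix}.post_attention_layernorm.weight")
        # {gate,up,down}_proj expands to three MLP projections.
        names.extend(f"{prefix}.mlp.{p}_proj.weight" for p in ("gate", "up", "down"))
    names.append("language_model.model.norm.weight")
    names.append("language_model.lm_head.weight")
    return names


# Two layers are enough to show the pattern; real models have many more.
for name in language_model_param_names(2):
    print(name)
```

Each layer contributes nine names under this expansion (two layer norms, four attention projections, three MLP projections), which gives a quick sanity check on the size of a name list for a known layer count.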