bridge.models.ministral3.ministral3_bridge#
Megatron Bridge for Ministral 3 Vision-Language Models.
This module provides the bridge implementation for converting between HuggingFace Ministral-3 models and Megatron-Core format.
Supported models:
Ministral-3-3B-Base-2512
Ministral-3-3B-Instruct-2512
Ministral-3-3B-Reasoning-2512
Ministral-3-8B-Base-2512
Ministral-3-8B-Instruct-2512
Ministral-3-8B-Reasoning-2512
Ministral-3-14B-Base-2512
Ministral-3-14B-Instruct-2512
Ministral-3-14B-Reasoning-2512
Reference: https://huggingface.co/mistralai/Ministral-3-3B-Base-2512
Module Contents#
Classes#
Ministral3Bridge | Megatron Bridge for Ministral 3 Vision-Language Models.
API#
- class bridge.models.ministral3.ministral3_bridge.Ministral3Bridge#
Bases: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge

Megatron Bridge for Ministral 3 Vision-Language Models.
This bridge handles conversion between HuggingFace Mistral3ForConditionalGeneration and Megatron-Core Ministral3Model format for vision-language models.
The weight mappings handle:
Vision model weights (vision encoder)
Language model weights
Multimodal projector weights
Special token embeddings
Example:

from megatron.bridge import AutoBridge
bridge = AutoBridge.from_hf_pretrained("mistralai/Ministral-3-3B-Base-2512")
provider = bridge.to_megatron_provider()
- provider_bridge(
- hf_pretrained: megatron.bridge.models.hf_pretrained.vlm.PreTrainedVLM,
- )#
Create a Ministral3ModelProvider from a HuggingFace pretrained VL model.
- Parameters:
hf_pretrained – HuggingFace pretrained VLM model
- Returns:
Ministral3ModelProvider configured with the HF model’s parameters
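As a rough illustration of what such a provider bridge does, the sketch below maps HuggingFace-style text-config fields onto a minimal provider config. The class and field names here are hypothetical (following common HF config conventions); the real Ministral3ModelProvider carries many more settings.

```python
from dataclasses import dataclass

# Hypothetical sketch: copy architecture hyperparameters from a
# HuggingFace-style text config dict into a provider-style config.
# This is NOT the actual Ministral3ModelProvider implementation.
@dataclass
class ProviderConfigSketch:
    hidden_size: int
    num_layers: int
    num_attention_heads: int

def config_from_hf(text_config: dict) -> ProviderConfigSketch:
    # HF configs conventionally use "num_hidden_layers" where
    # Megatron-style configs use "num_layers".
    return ProviderConfigSketch(
        hidden_size=text_config["hidden_size"],
        num_layers=text_config["num_hidden_layers"],
        num_attention_heads=text_config["num_attention_heads"],
    )
```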
- mapping_registry() megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry#
Return MegatronMappingRegistry containing parameter mappings for VL models.
HuggingFace weight structure:
language_model.model.embed_tokens.weight
language_model.model.layers.{i}.input_layernorm.weight
language_model.model.layers.{i}.self_attn.{q,k,v,o}_proj.weight
language_model.model.layers.{i}.post_attention_layernorm.weight
language_model.model.layers.{i}.mlp.{gate,up,down}_proj.weight
language_model.model.norm.weight
language_model.lm_head.weight
vision_tower.** (patch_conv, ln_pre, transformer layers)
multi_modal_projector.{norm,linear}.weight
- Returns:
MegatronMappingRegistry with all parameter mappings
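The name translation performed by such a registry can be sketched with a few regex rules over the HF weight structure listed above. The Megatron-side target names below are a plausible illustration only; the real MegatronMappingRegistry also handles fused QKV and gate/up projections and tensor-parallel sharding.

```python
import re

# Illustrative HF-name -> Megatron-name translation for a few of the
# parameters listed above. Target names are a sketch, not the exact
# output of the real mapping registry.
NAME_PATTERNS = [
    (re.compile(r"^language_model\.model\.embed_tokens\.weight$"),
     "embedding.word_embeddings.weight"),
    (re.compile(r"^language_model\.model\.layers\.(\d+)\.input_layernorm\.weight$"),
     r"decoder.layers.\1.input_layernorm.weight"),
    (re.compile(r"^language_model\.lm_head\.weight$"),
     "output_layer.weight"),
]

def translate_name(hf_name: str):
    """Return the Megatron-side name for hf_name, or None if unmapped."""
    for pattern, target in NAME_PATTERNS:
        if pattern.match(hf_name):
            # \1 in the target carries the layer index through.
            return pattern.sub(target, hf_name)
    return None
```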
- maybe_modify_loaded_hf_weight(
- hf_param: Union[str, dict[str, str]],
- hf_state_dict: Mapping[str, torch.Tensor],
- )#
Load HF weights and dequantize FP8 tensors on the fly.
Ministral-3-*-Instruct-2512 stores LM weights in FP8 (float8_e4m3fn) with separate weight_scale_inv scalar tensors. The true bfloat16 weight is:

w_bf16 = fp8_weight.to(bfloat16) * weight_scale_inv

This override applies dequantization transparently, so the bridge produces correct Megatron checkpoints without a separate preprocessing step.
- static _maybe_dequantize_fp8(
- weight: torch.Tensor,
- param_name: str,
- hf_state_dict: Mapping[str, torch.Tensor],
- )#
Dequantize weight if it is stored as FP8.
Looks up param_name + "_scale_inv" in hf_state_dict and applies:

w_bf16 = weight.to(bfloat16) * scale_inv
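The lookup-and-scale logic can be sketched as follows. This is a standalone illustration operating on a plain mapping of tensors, not the bridge's actual helper; for simplicity it accepts any input dtype rather than checking specifically for float8_e4m3fn.

```python
import torch
from typing import Mapping

def maybe_dequantize_fp8(weight: torch.Tensor, param_name: str,
                         state_dict: Mapping[str, torch.Tensor]) -> torch.Tensor:
    """If param_name has a companion '<name>_scale_inv' entry, treat the
    weight as quantized and dequantize it to bfloat16; otherwise return
    the weight unchanged."""
    scale_inv = state_dict.get(param_name + "_scale_inv")
    if scale_inv is None:
        return weight  # no scale stored: weight is not FP8-quantized
    # w_bf16 = weight.to(bfloat16) * scale_inv
    return weight.to(torch.bfloat16) * scale_inv.to(torch.bfloat16)
```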