bridge.models.gpt_oss.gpt_oss_bridge#
Module Contents#
Classes#
| GPTOSSBridge | Megatron Hub Bridge for GPT-OSS models. |
| GPTOSSMLPDownProjMapping | MLPDownProj for expert weights in GPT-OSS models. |
| GPTOSSMLPGateUpProjMapping | MLPGateUpProj for expert weights in GPT-OSS models. |
API#
- class bridge.models.gpt_oss.gpt_oss_bridge.GPTOSSBridge#
Bases:
megatron.bridge.models.conversion.model_bridge.MegatronModelBridge
Megatron Hub Bridge for GPT-OSS models.
As a user you would not use this bridge directly, but through AutoBridge.
Example:
    from megatron.bridge import AutoBridge
    bridge = AutoBridge.from_hf_pretrained("openai/gpt-oss-model")
    provider = bridge.to_megatron_provider()
- provider_bridge(hf_pretrained: megatron.bridge.models.hf_pretrained.causal_lm.PreTrainedCausalLM)
Convert HuggingFace config to GPTModelProvider.
- maybe_modify_loaded_hf_weight(hf_param: str | dict[str, str], hf_state_dict: Mapping[str, torch.Tensor])
Load weights from HuggingFace state dict with MXFP4 dequantization support.
Per-expert down_proj is square for GPT-OSS-20B/120B (hidden == intermediate), so the bridge cannot auto-detect orientation from shape alone. BF16 checkpoints (e.g. unsloth/gpt-oss-20b-BF16, and what transformers.GptOssForCausalLM produces at init) store it as [E, intermediate, hidden], matching gate_up_proj's [E, hidden, 2*intermediate] convention. MXFP4-dequantized weights come out as [E, hidden, intermediate]. Megatron's TERowParallelGroupedLinear expects per-expert (hidden, intermediate), so the BF16 path needs a transpose here while the MXFP4 path is already aligned. Without this, BF16 imports silently store down_proj in the wrong orientation and inference is broken.
gate_up_proj is handled directly in GPTOSSMLPGateUpProjMapping.hf_to_megatron via _align_expert_weight_to_shape, which auto-detects the orientation difference between BF16 checkpoints ([num_experts, hidden, 2*intermediate]) and MXFP4-dequantized checkpoints ([num_experts, 2*intermediate, hidden]).
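The shape ambiguity described above can be illustrated with a small, dependency-free sketch. The helper names `infer_transpose` and `transpose_per_expert` are hypothetical and are not part of the bridge; tensors are modeled as nested lists with toy dimensions, and the target shape used for `gate_up_proj` is an assumption made for illustration only.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def infer_transpose(
    per_expert_shape: Tuple[int, int], expected: Tuple[int, int]
) -> Optional[bool]:
    """Guess from shape alone whether a per-expert weight needs a transpose.

    Returns True or False when the shape decides it, and None when the matrix
    is square, because both orientations then have the same shape.
    """
    if per_expert_shape != expected and per_expert_shape[::-1] != expected:
        raise ValueError(f"shape {per_expert_shape} cannot be aligned to {expected}")
    rows, cols = per_expert_shape
    if rows == cols:
        return None  # ambiguous: shape carries no orientation information
    return per_expert_shape != expected


def transpose_per_expert(weights: List[Matrix]) -> List[Matrix]:
    """Swap the last two dims of a nested-list tensor shaped [E, rows, cols]."""
    return [[list(col) for col in zip(*expert)] for expert in weights]


hidden, intermediate = 4, 4  # toy sizes; square, like the down_proj case in the text

# down_proj: shape alone cannot tell [intermediate, hidden] from [hidden, intermediate].
down_proj_guess = infer_transpose((intermediate, hidden), (hidden, intermediate))

# gate_up_proj is not square, so shape is enough.
# Assumed target shape for this illustration: (2 * intermediate, hidden).
bf16_guess = infer_transpose((hidden, 2 * intermediate), (2 * intermediate, hidden))
mxfp4_guess = infer_transpose((2 * intermediate, hidden), (2 * intermediate, hidden))

# One expert with a 2x3 weight, transposed per expert to 3x2.
flipped = transpose_per_expert([[[1, 2, 3], [4, 5, 6]]])
```

In this sketch the square case yields `None`, so the decision has to come from something other than shape, which is consistent with the down_proj transpose being applied explicitly on the BF16 path rather than inferred.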
- mapping_registry() → megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry#
Return MegatronMappingRegistry containing parameter mappings from HF to Megatron format. Based on the GPT-OSS importer code provided.
- class bridge.models.gpt_oss.gpt_oss_bridge.GPTOSSMLPDownProjMapping(megatron_param: str, hf_param: str, permute_dims: Optional[Tuple[int, ...]] = None)
Bases:
megatron.bridge.models.conversion.param_mapping.AutoMapping
MLPDownProj for expert weights in GPT-OSS models.
Initialization
- is_grouped_export#
True
- property group_key: str#
- hf_to_megatron(hf_weights: torch.Tensor, megatron_module: torch.nn.Module)
- megatron_to_hf(megatron_weights: torch.Tensor, megatron_module: torch.nn.Module)
- class bridge.models.gpt_oss.gpt_oss_bridge.GPTOSSMLPGateUpProjMapping(megatron_param: str, hf_param: str, permute_dims: Optional[Tuple[int, ...]] = None)
Bases:
megatron.bridge.models.conversion.param_mapping.AutoMapping
MLPGateUpProj for expert weights in GPT-OSS models.
GPT-OSS uses alternating row interleaving for gate/up projections.
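As a rough illustration of alternating row interleaving, the sketch below merges gate and up rows in the order gate_0, up_0, gate_1, up_1, and so on, then splits them back. The function names `interleave_rows` and `uninterleave_rows` are hypothetical stand-ins, not the signatures of `_interleave` or `_uninterleave`, and rows are plain Python lists rather than tensors.

```python
from typing import List, Sequence, Tuple

Row = List[float]


def interleave_rows(gate: Sequence[Row], up: Sequence[Row]) -> List[Row]:
    """Merge two equally sized row blocks as gate_0, up_0, gate_1, up_1, ..."""
    if len(gate) != len(up):
        raise ValueError("gate and up must have the same number of rows")
    merged: List[Row] = []
    for gate_row, up_row in zip(gate, up):
        merged.append(gate_row)
        merged.append(up_row)
    return merged


def uninterleave_rows(rows: Sequence[Row]) -> Tuple[List[Row], List[Row]]:
    """Split interleaved rows back into (gate, up): even rows, then odd rows."""
    return list(rows[0::2]), list(rows[1::2])


gate = [[1.0, 1.0], [2.0, 2.0]]
up = [[10.0, 10.0], [20.0, 20.0]]

merged = interleave_rows(gate, up)
gate_back, up_back = uninterleave_rows(merged)
```

The round trip recovers the original blocks, which is the property a mapping needs in order to convert in both directions without loss.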
Initialization
- is_grouped_export#
True
- property group_key: str#
- static _interleave(gate_up_proj)#
- _uninterleave(elem)#
- hf_to_megatron(hf_weights: Union[torch.Tensor, Dict], megatron_module: torch.nn.Module)
- megatron_to_hf(megatron_weights: torch.Tensor, megatron_module: torch.nn.Module)