bridge.models.gpt_oss.gpt_oss_bridge#

Module Contents#

Classes#

GPTOSSBridge

Megatron Hub Bridge for GPT-OSS models.

GPTOSSMLPDownProjMapping

MLPDownProj for expert weights in GPT-OSS models.

GPTOSSMLPGateUpProjMapping

MLPGateUpProj for expert weights in GPT-OSS models.

Functions#

API#

class bridge.models.gpt_oss.gpt_oss_bridge.GPTOSSBridge#

Bases: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge

Megatron Hub Bridge for GPT-OSS models.

As a user, you would not use this bridge directly; instead, access it through AutoBridge.

.. rubric:: Example

```python
from megatron.bridge import AutoBridge

bridge = AutoBridge.from_hf_pretrained("openai/gpt-oss-model")
provider = bridge.to_megatron_provider()
```

provider_bridge(
hf_pretrained: megatron.bridge.models.hf_pretrained.causal_lm.PreTrainedCausalLM,
) → megatron.bridge.models.gpt_provider.GPTModelProvider#

Convert HuggingFace config to GPTModelProvider.

maybe_modify_loaded_hf_weight(
hf_param: str | dict[str, str],
hf_state_dict: Mapping[str, torch.Tensor],
) → torch.Tensor#

Load weights from HuggingFace state dict with MXFP4 dequantization support.

down_proj is handled in GPTOSSMLPDownProjMapping.

gate_up_proj is handled directly in GPTOSSMLPGateUpProjMapping.hf_to_megatron via _align_expert_weight_to_shape, which auto-detects the orientation difference between BF16 checkpoints ([num_experts, hidden, 2*intermediate]) and MXFP4-dequantized checkpoints ([num_experts, 2*intermediate, hidden]).
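
The orientation auto-detection described above can be illustrated with a standalone sketch. This is a hypothetical reimplementation, not the library's _align_expert_weight_to_shape; it assumes only that the two checkpoint layouts differ by a transpose of the last two dimensions, as the docstring states:

```python
import torch

def align_expert_weight_to_shape(weight: torch.Tensor, target_shape: tuple) -> torch.Tensor:
    """Return `weight` with its last two dims transposed if that is what it
    takes to match `target_shape`; otherwise raise (hypothetical sketch)."""
    if tuple(weight.shape) == tuple(target_shape):
        return weight
    transposed = weight.transpose(-1, -2)
    if tuple(transposed.shape) == tuple(target_shape):
        return transposed.contiguous()
    raise ValueError(f"Cannot align {tuple(weight.shape)} to {tuple(target_shape)}")

# BF16 checkpoint layout: [num_experts, hidden, 2*intermediate]
bf16_style = torch.randn(4, 8, 6)
# MXFP4-dequantized layout: [num_experts, 2*intermediate, hidden]
mxfp4_style = bf16_style.transpose(-1, -2)

target = (4, 6, 8)
assert align_expert_weight_to_shape(bf16_style, target).shape == torch.Size(target)
assert align_expert_weight_to_shape(mxfp4_style, target).shape == torch.Size(target)
```

Because both layouts share the leading num_experts dimension, a shape comparison alone is enough to disambiguate them whenever hidden differs from 2*intermediate.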

mapping_registry() → megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry#

Return a MegatronMappingRegistry containing the parameter mappings from HF to Megatron format, based on the GPT-OSS importer code.

class bridge.models.gpt_oss.gpt_oss_bridge.GPTOSSMLPDownProjMapping(
megatron_param: str,
hf_param: str,
permute_dims: Optional[Tuple[int, ...]] = None,
)#

Bases: megatron.bridge.models.conversion.param_mapping.AutoMapping

MLPDownProj for expert weights in GPT-OSS models.

Initialization

is_grouped_export#

True

property group_key: str#
hf_to_megatron(
hf_weights: torch.Tensor,
megatron_module: torch.nn.Module,
) → torch.Tensor#
megatron_to_hf(
megatron_weights: torch.Tensor,
megatron_module: torch.nn.Module,
) → Dict[str, torch.Tensor]#
class bridge.models.gpt_oss.gpt_oss_bridge.GPTOSSMLPGateUpProjMapping(
megatron_param: str,
hf_param: str,
permute_dims: Optional[Tuple[int, ...]] = None,
)#

Bases: megatron.bridge.models.conversion.param_mapping.AutoMapping

MLPGateUpProj for expert weights in GPT-OSS models.

GPT-OSS uses alternating row interleaving for gate/up projections.
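
The alternating row interleaving can be sketched with a standalone pair of helpers. The exact convention (gate rows at even indices, up rows at odd indices, converted to a concatenated [gate; up] layout) is an assumption for illustration; the class's _interleave/_uninterleave methods implement the real conversion:

```python
import torch

def uninterleave(gate_up: torch.Tensor) -> torch.Tensor:
    """Split alternating gate/up rows (assumed: gate at even indices, up at
    odd) into a concatenated [gate; up] layout along the row dimension."""
    gate = gate_up[..., 0::2, :]
    up = gate_up[..., 1::2, :]
    return torch.cat([gate, up], dim=-2)

def interleave(gate_up: torch.Tensor) -> torch.Tensor:
    """Inverse: re-interleave a concatenated [gate; up] tensor back into
    alternating rows."""
    half = gate_up.shape[-2] // 2
    gate, up = gate_up[..., :half, :], gate_up[..., half:, :]
    out = torch.empty_like(gate_up)
    out[..., 0::2, :] = gate
    out[..., 1::2, :] = up
    return out

# Round trip: uninterleaving then re-interleaving is the identity.
x = torch.arange(24, dtype=torch.float32).reshape(1, 6, 4)
assert torch.equal(interleave(uninterleave(x)), x)
```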

Initialization

is_grouped_export#

True

property group_key: str#
static _interleave(gate_up_proj)#
_uninterleave(elem)#
hf_to_megatron(
hf_weights: Union[torch.Tensor, Dict],
megatron_module: torch.nn.Module,
) → torch.Tensor#
megatron_to_hf(
megatron_weights: torch.Tensor,
megatron_module: torch.nn.Module,
) → Dict[str, torch.Tensor]#
bridge.models.gpt_oss.gpt_oss_bridge._dequantize_mxfp4(
blocks: torch.Tensor,
scales: torch.Tensor,
*,
dtype: torch.dtype = torch.bfloat16,
rows_per_chunk: int = 32768 * 1024,
) → torch.Tensor#
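
MXFP4 stores weights as 4-bit E2M1 floats packed two per byte, with one shared power-of-two (E8M0) scale per block. A hypothetical sketch of the dequantization follows, assuming blocks of shape [..., num_blocks, bytes_per_block], the low nibble decoded first, and scales stored with a bias of 127; the real function additionally processes rows in chunks (the rows_per_chunk parameter), which is omitted here:

```python
import torch

# The 16 representable FP4 (E2M1) values, indexed by the 4-bit code
# (per the OCP Microscaling Formats spec: codes 8-15 are negatives).
FP4_VALUES = torch.tensor(
    [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
     -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0]
)

def dequantize_mxfp4(
    blocks: torch.Tensor,       # uint8, [..., num_blocks, bytes_per_block]
    scales: torch.Tensor,       # uint8, [..., num_blocks], E8M0 biased by 127
    dtype: torch.dtype = torch.bfloat16,
) -> torch.Tensor:
    """Hypothetical sketch of MXFP4 dequantization (not the real helper)."""
    lo = (blocks & 0x0F).long()  # first FP4 code in each byte (assumed low-first)
    hi = (blocks >> 4).long()    # second FP4 code
    # Decode both nibbles and lay them out low, high, low, high, ...
    vals = torch.stack([FP4_VALUES[lo], FP4_VALUES[hi]], dim=-1)
    vals = vals.reshape(*blocks.shape[:-1], -1)
    # One power-of-two scale per block, broadcast over its values.
    scale = torch.exp2(scales.float() - 127.0)
    return (vals * scale.unsqueeze(-1)).to(dtype)

# One block of one byte 0x21: low nibble 1 -> 0.5, high nibble 2 -> 1.0,
# scale code 127 -> 2**0 = 1.
out = dequantize_mxfp4(
    torch.tensor([[0x21]], dtype=torch.uint8),
    torch.tensor([127], dtype=torch.uint8),
)
assert out.tolist() == [[0.5, 1.0]]
```

Chunking rows (as the real rows_per_chunk parameter suggests) bounds peak memory, since the decoded bfloat16 tensor is eight times larger than the packed uint8 blocks.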