`bridge.models.deepseek.deepseek_v3_bridge`

Module Contents

Classes

DeepSeekV3Bridge

Megatron Bridge for DeepSeek-V3.

Data

`__all__`
`_dequant_fp8_blockwise`

API

bridge.models.deepseek.deepseek_v3_bridge.__all__: ['DeepSeekV3Bridge', '_dequant_fp8_blockwise']

bridge.models.deepseek.deepseek_v3_bridge._dequant_fp8_blockwise: None

class bridge.models.deepseek.deepseek_v3_bridge.DeepSeekV3Bridge

Bases: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge

Megatron Bridge for DeepSeek-V3.

provider_bridge(hf_pretrained: megatron.bridge.models.hf_pretrained.causal_lm.PreTrainedCausalLM) → megatron.bridge.models.mla_provider.MLAModelProvider

classmethod megatron_to_hf_config(provider: megatron.bridge.models.mla_provider.MLAModelProvider) → dict

mapping_registry() → megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry

maybe_modify_loaded_hf_weight(hf_param: Union[str, dict[str, str]], hf_state_dict: Mapping[str, torch.Tensor]) → Union[torch.Tensor, dict[str, torch.Tensor]]

Load HF weights and dequantize FP8 tensors on the fly.

DeepSeek-V3 ships linear weights as float8_e4m3fn, with one scale factor per 128×128 block stored in <key>_scale_inv. The true bf16 weight is computed in fp32 and then cast to bf16:

w_bf16 = fp8_weight.float() * scale_inv_block

where scale_inv_block is the scale of the block that contains each element.

Without this override, the bridge would apply a bare .to(bf16) cast in ColumnParallelMapping.hf_to_megatron (param_mapping.py:905), discarding the per-block scales, and the resulting model would produce random-looking logits.
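A minimal sketch of the blockwise formula above, assuming that <key>_scale_inv has shape (ceil(rows / 128), ceil(cols / 128)), so each entry covers one 128×128 tile and the last tile along each axis may be ragged. The name dequant_fp8_blockwise_sketch is hypothetical and is not this module's _dequant_fp8_blockwise; it only illustrates the arithmetic.

```python
import torch

BLOCK = 128  # tile edge length from the description above (128x128 blocks)


def dequant_fp8_blockwise_sketch(
    weight: torch.Tensor,
    scale_inv: torch.Tensor,
    block: int = BLOCK,
    out_dtype: torch.dtype = torch.bfloat16,
) -> torch.Tensor:
    """Hypothetical helper: broadcast one scale per tile, multiply in fp32, cast.

    Assumes scale_inv has shape (ceil(rows / block), ceil(cols / block)).
    """
    rows, cols = weight.shape
    # Repeat each per-tile scale `block` times along both axes so every
    # element of the weight is paired with the scale of the tile containing it.
    expanded = scale_inv.float().repeat_interleave(block, dim=0)
    expanded = expanded.repeat_interleave(block, dim=1)
    # Trim the overhang created when rows/cols are not multiples of `block`.
    expanded = expanded[:rows, :cols]
    # Upcast to fp32 before multiplying (as in the formula), then cast down.
    return (weight.float() * expanded).to(out_dtype)
```

For a hypothetical 130×256 weight, scale_inv would be 2×2: rows 0 to 127 take the first row of scales and rows 128 to 129 the second. A bare .to(torch.bfloat16) on the FP8 tensor skips the multiplication entirely, which is the failure mode described above.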

static _maybe_dequantize_fp8(weight: torch.Tensor, param_name: str, hf_state_dict: Mapping[str, torch.Tensor]) → torch.Tensor

Dequantize weight if it is stored as FP8 with a matching *_scale_inv tensor.
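The guard described for _maybe_dequantize_fp8, together with the str-or-dict hf_param shapes accepted by maybe_modify_loaded_hf_weight, can be sketched as follows. This assumes the scale sits under the weight's own key plus the suffix _scale_inv (the <key>_scale_inv convention above) and that the FP8 dtype is torch.float8_e4m3fn. Both function names are hypothetical, and the real methods may differ in detail.

```python
from typing import Dict, Mapping, Union

import torch

# None on PyTorch builds without float8 support (added in PyTorch 2.1).
FP8_DTYPE = getattr(torch, "float8_e4m3fn", None)


def maybe_dequantize_fp8_sketch(
    weight: torch.Tensor,
    param_name: str,
    hf_state_dict: Mapping[str, torch.Tensor],
    block: int = 128,
) -> torch.Tensor:
    """Hypothetical: return weight unchanged unless it is FP8 with a scale."""
    scale_key = f"{param_name}_scale_inv"  # assumed <key>_scale_inv naming
    if FP8_DTYPE is None or weight.dtype != FP8_DTYPE or scale_key not in hf_state_dict:
        return weight
    rows, cols = weight.shape
    # Expand one scale per block x block tile to element granularity, then trim.
    scale = hf_state_dict[scale_key].float()
    scale = scale.repeat_interleave(block, 0).repeat_interleave(block, 1)[:rows, :cols]
    return (weight.float() * scale).to(torch.bfloat16)


def load_hf_weight_sketch(
    hf_param: Union[str, Dict[str, str]],
    hf_state_dict: Mapping[str, torch.Tensor],
) -> Union[torch.Tensor, Dict[str, torch.Tensor]]:
    """Hypothetical: mirror the str / dict[str, str] forms of hf_param."""
    if isinstance(hf_param, str):
        return maybe_dequantize_fp8_sketch(hf_state_dict[hf_param], hf_param, hf_state_dict)
    # Dict form (assumed): several HF keys, e.g. separate projections, that
    # feed one Megatron parameter; dequantize each tensor independently.
    return {
        role: maybe_dequantize_fp8_sketch(hf_state_dict[key], key, hf_state_dict)
        for role, key in hf_param.items()
    }
```

Non-FP8 tensors and FP8 tensors without a scale pass through untouched, so the override is a no-op for checkpoints that were already converted to bf16.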

maybe_modify_converted_hf_weight(task: megatron.bridge.models.conversion.model_bridge.WeightConversionTask, converted_weights_dict: Dict[str, torch.Tensor], hf_state_dict: Mapping[str, torch.Tensor]) → Dict[str, torch.Tensor]

Add the rotary embedding inverse frequency parameter if needed.
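For reference, the standard (unscaled) RoPE inverse frequencies are base^(-2i/d) for i = 0, 1, ..., d/2 - 1, where d is the rotary dimension. The sketch below adds such a tensor only when the converted dict lacks it. The key name, rotary dimension, and base are illustrative assumptions, and DeepSeek-V3's config enables YaRN rope scaling, which adjusts these frequencies, so the actual method may compute a different tensor.

```python
from typing import Dict

import torch


def add_inv_freq_sketch(
    converted: Dict[str, torch.Tensor],
    key: str,
    rotary_dim: int,
    base: float = 10000.0,
) -> Dict[str, torch.Tensor]:
    """Hypothetical: insert unscaled RoPE inverse frequencies under `key` if missing."""
    if key in converted:
        return converted  # already present; leave the existing tensor alone
    # Exponents 0, 2/d, 4/d, ..., (d-2)/d -> one frequency per rotated pair.
    exponents = torch.arange(0, rotary_dim, 2, dtype=torch.float32) / rotary_dim
    inv_freq = 1.0 / (base ** exponents)
    return {**converted, key: inv_freq}
```

With rotary_dim=64 this yields 32 frequencies, starting at 1.0 and decreasing geometrically.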
