bridge.models.deepseek.deepseek_v3_bridge

Module Contents

Classes

DeepSeekV3Bridge

Megatron Bridge for DeepSeek-V3.

API

class bridge.models.deepseek.deepseek_v3_bridge.DeepSeekV3Bridge

Bases: megatron.bridge.models.conversion.model_bridge.MegatronModelBridge

Megatron Bridge for DeepSeek-V3.

provider_bridge(
    hf_pretrained: megatron.bridge.models.hf_pretrained.causal_lm.PreTrainedCausalLM,
) -> megatron.bridge.models.mla_provider.MLAModelProvider

mapping_registry() -> megatron.bridge.models.conversion.mapping_registry.MegatronMappingRegistry

maybe_modify_converted_hf_weight(
    task: megatron.bridge.models.conversion.model_bridge.WeightConversionTask,
    converted_weights_dict: Dict[str, torch.Tensor],
    hf_state_dict: Mapping[str, torch.Tensor],
) -> Dict[str, torch.Tensor]

Add rotary embedding inverse frequency parameter if needed.
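
The sketch below illustrates what adding a rotary embedding inverse frequency parameter could look like. It is not the bridge's implementation: the helper names `rotary_inv_freq` and `add_inv_freq_if_needed`, the key name, the default base of 10000, and the reading of "if needed" (the Hugging Face state dict has the key but the converted weights do not) are all assumptions made for illustration. Plain Python lists stand in for `torch.Tensor` so the example runs without extra dependencies. The only part taken as given is the standard RoPE formula, `inv_freq[i] = base ** (-2i / dim)`.

```python
from typing import Dict, List, Mapping


def rotary_inv_freq(dim: int, base: float = 10000.0) -> List[float]:
    """Standard RoPE inverse frequencies: base ** (-2i / dim) for i in [0, dim / 2)."""
    if dim <= 0 or dim % 2 != 0:
        raise ValueError("dim must be a positive even integer")
    return [base ** (-(2 * i) / dim) for i in range(dim // 2)]


def add_inv_freq_if_needed(
    converted_weights: Mapping[str, object],
    hf_state_dict: Mapping[str, object],
    key: str,
    dim: int,
    base: float = 10000.0,
) -> Dict[str, object]:
    """Hypothetical helper: return a copy of converted_weights with `key` added.

    Assumption: the parameter is "needed" when the HF state dict contains
    `key` and the converted weights do not already provide it.
    """
    result = dict(converted_weights)
    if key in hf_state_dict and key not in result:
        result[key] = rotary_inv_freq(dim, base)
    return result


# Usage with a made-up key name and rotary dimension.
example_key = "layers.0.rotary_emb.inv_freq"  # hypothetical key, for illustration only
patched = add_inv_freq_if_needed(
    converted_weights={"layers.0.weight": [0.0]},
    hf_state_dict={example_key: None, "layers.0.weight": [0.0]},
    key=example_key,
    dim=8,
)
print(sorted(patched))
```

Returning a copy rather than mutating the input keeps the sketch free of side effects; whether the real method mutates `converted_weights_dict` in place is not stated in this reference.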