bridge.recipes.deepseek.deepseek_v4#
Module Contents#
Functions#
_deepseek_v4_mxfp8_quant_recipe | Use MXFP8 for training and BF16 for DSv4 validation/evaluation paths.
deepseek_v4_flash_pretrain_config | Return the DeepSeek-V4-Flash Blackwell pre-training base config.
deepseek_v4_flash_pretrain_mxfp8_config | Return the DeepSeek-V4-Flash Adam + MXFP8 pre-training config.
deepseek_v4_flash_pretrain_muon_config | Return the DeepSeek-V4-Flash BF16 Muon pre-training config.
deepseek_v4_flash_sft_config | DeepSeek-V4-Flash full SFT, MTP enabled, Hopper-safe.
deepseek_v4_flash_no_mtp_sft_config | DeepSeek-V4-Flash full SFT with the MTP layer disabled, Hopper-safe.
Data#
API#
- bridge.recipes.deepseek.deepseek_v4._deepseek_v4_mxfp8_quant_recipe() megatron.core.quantization.quant_config.RecipeConfig#
Use MXFP8 for training and BF16 for DSv4 validation/evaluation paths.
- bridge.recipes.deepseek.deepseek_v4.deepseek_v4_flash_pretrain_config() megatron.bridge.training.config.ConfigContainer#
Return the DeepSeek-V4-Flash Blackwell pre-training base config.
Recommended Blackwell baseline: TP=1, PP=4, EP=8, CP=1.
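As a rough sanity check on the baseline above, the sketch below computes the smallest GPU count compatible with TP=1, PP=4, EP=8, CP=1. It assumes the usual Megatron-Core layout rules (world size divisible by TP x PP x CP and by expert-TP x EP x PP, with expert tensor parallelism equal to TP). The helper name `min_world_size` is hypothetical and is not part of this module.

```python
from math import lcm


def min_world_size(tp: int, pp: int, ep: int, cp: int) -> int:
    """Smallest world size satisfying both assumed layout constraints."""
    # Ranks occupied by one replica of the dense (non-expert) layers.
    dense_block = tp * pp * cp
    # Ranks occupied by one replica of the expert layers
    # (assumption: expert tensor parallelism equals TP).
    expert_block = tp * ep * pp
    return lcm(dense_block, expert_block)


baseline = dict(tp=1, pp=4, ep=8, cp=1)
gpus = min_world_size(**baseline)
print(gpus)  # 32 under the assumptions stated above
```

Under these assumptions the baseline needs a multiple of 32 GPUs; larger jobs add data-parallel replicas on top.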
- bridge.recipes.deepseek.deepseek_v4.deepseek_v4_flash_pretrain_mxfp8_config() megatron.bridge.training.config.ConfigContainer#
Return the DeepSeek-V4-Flash Adam + MXFP8 pre-training config.
- bridge.recipes.deepseek.deepseek_v4.deepseek_v4_flash_pretrain_muon_config() megatron.bridge.training.config.ConfigContainer#
Return the DeepSeek-V4-Flash BF16 Muon pre-training config.
- bridge.recipes.deepseek.deepseek_v4.DEEPSEEK_V4_FLASH_HF_PATH#
'deepseek-ai/DeepSeek-V4-Flash'
- bridge.recipes.deepseek.deepseek_v4.deepseek_v4_flash_sft_config(
- hf_path: str = DEEPSEEK_V4_FLASH_HF_PATH,
DeepSeek-V4-Flash full SFT, MTP enabled, Hopper-safe.
Runs unchanged on Hopper (H100/H200) and Blackwell (B200/GB200). Fused mHC is enabled only on Blackwell. Full-parameter training on unpacked (SBHD) sequences with Adam/bf16. Set checkpoint.pretrained_checkpoint to the imported Megatron checkpoint to fine-tune real weights; hf_path overrides the HF model id (e.g. a toy model in tests).
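The checkpoint override described above might look like the following minimal sketch. So that it runs without Megatron Bridge installed, a `SimpleNamespace` stands in for the `ConfigContainer` that `deepseek_v4_flash_sft_config()` returns. The helper `point_at_checkpoint` and the path are hypothetical; only the `checkpoint.pretrained_checkpoint` attribute comes from the docstring.

```python
from types import SimpleNamespace


def point_at_checkpoint(cfg, checkpoint_dir: str):
    """Set checkpoint.pretrained_checkpoint on a config-like object and return it."""
    cfg.checkpoint.pretrained_checkpoint = checkpoint_dir
    return cfg


# Stand-in for the real ConfigContainer; only the one attribute
# discussed above is modelled here.
cfg = SimpleNamespace(checkpoint=SimpleNamespace(pretrained_checkpoint=None))

# Placeholder path: replace with the directory of the imported Megatron checkpoint.
cfg = point_at_checkpoint(cfg, "/path/to/imported_megatron_checkpoint")
print(cfg.checkpoint.pretrained_checkpoint)
```

With the real recipe, the same assignment would presumably be applied to the object returned by `deepseek_v4_flash_sft_config()` before training starts.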
- bridge.recipes.deepseek.deepseek_v4.deepseek_v4_flash_no_mtp_sft_config(
- hf_path: str = DEEPSEEK_V4_FLASH_HF_PATH,
DeepSeek-V4-Flash full SFT with the MTP layer disabled, Hopper-safe.
Same as deepseek_v4_flash_sft_config but drops the Multi-Token Prediction layer (fused mHC only on Blackwell, bf16, SBHD).
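Since the two SFT recipes differ only in whether the MTP layer is kept, a launcher could choose between them with a single flag. The sketch below returns only the recipe name, so it runs without the library. The helper `sft_recipe_name` is hypothetical, and resolving the name against the installed module is left to the caller.

```python
def sft_recipe_name(use_mtp: bool) -> str:
    """Return the name of the SFT recipe function matching the MTP choice."""
    if use_mtp:
        return "deepseek_v4_flash_sft_config"
    return "deepseek_v4_flash_no_mtp_sft_config"


for flag in (True, False):
    print(flag, sft_recipe_name(flag))
```

A caller with the package installed could then look the name up on the recipes module with `getattr` and call the result.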