`bridge.recipes.deepseek.deepseek_v4`

Module Contents

Functions

`_deepseek_v4_mxfp8_quant_recipe`	Use MXFP8 for training and BF16 for DSv4 validation/evaluation paths.
`deepseek_v4_flash_pretrain_config`	Return the DeepSeek-V4-Flash Blackwell pre-training base config.
`deepseek_v4_flash_pretrain_mxfp8_config`	Return the DeepSeek-V4-Flash Adam + MXFP8 pre-training config.
`deepseek_v4_flash_pretrain_muon_config`	Return the DeepSeek-V4-Flash BF16 Muon pre-training config.
`deepseek_v4_flash_sft_config`	DeepSeek-V4-Flash full SFT, MTP enabled, Hopper-safe.
`deepseek_v4_flash_no_mtp_sft_config`	DeepSeek-V4-Flash full SFT with the MTP layer disabled, Hopper-safe.

Data

DEEPSEEK_V4_FLASH_HF_PATH

API

bridge.recipes.deepseek.deepseek_v4._deepseek_v4_mxfp8_quant_recipe() → megatron.core.quantization.quant_config.RecipeConfig

Use MXFP8 for training and BF16 for the DeepSeek-V4 (DSv4) validation/evaluation paths.
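MXFP8 refers to the OCP Microscaling (MX) format: each block of 32 values shares one power-of-two (E8M0) scale, and the elements themselves are stored as FP8. The pure-Python sketch below simulates quantizing and dequantizing one block, showing the per-block rounding that the training path runs under while validation/evaluation stays in BF16. It assumes the E4M3 element type (MX also allows E5M2, and which one this recipe selects is not stated here). It illustrates the format only; it is not the Megatron-Core or Transformer Engine kernel, and all function names are hypothetical.

```python
import math
from typing import List, Tuple

BLOCK_SIZE = 32           # MX block size: 32 elements share one scale
E4M3_MAX = 448.0          # largest finite FP8 E4M3 magnitude
E4M3_EMAX = 8             # exponent of the largest E4M3 power of two (2**8 = 256)
E4M3_MIN_NORMAL_EXP = -6  # smallest normal E4M3 exponent
E4M3_MANTISSA_BITS = 3    # assumption: E4M3 elements (MX also defines E5M2)


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest FP8 E4M3 value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    mag = abs(x)
    # Normal values use their own exponent; subnormals share the minimum exponent.
    exp = max(math.floor(math.log2(mag)), E4M3_MIN_NORMAL_EXP)
    step = 2.0 ** (exp - E4M3_MANTISSA_BITS)
    q = round(mag / step) * step  # round-half-to-even on the mantissa grid
    return math.copysign(min(q, E4M3_MAX), x)


def mxfp8_quantize(block: List[float]) -> Tuple[int, List[float]]:
    """Hypothetical helper: return (shared_exponent, fp8_elements) for one block."""
    if len(block) > BLOCK_SIZE:
        raise ValueError(f"block has {len(block)} values; max is {BLOCK_SIZE}")
    amax = max((abs(v) for v in block), default=0.0)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    # Power-of-two scale that puts the block maximum near the top of E4M3's range.
    shared_exp = math.floor(math.log2(amax)) - E4M3_EMAX
    scale = 2.0 ** shared_exp
    return shared_exp, [round_to_e4m3(v / scale) for v in block]


def mxfp8_dequantize(shared_exp: int, elems: List[float]) -> List[float]:
    """Hypothetical helper: multiply each FP8 element back by the shared scale."""
    scale = 2.0 ** shared_exp
    return [e * scale for e in elems]


if __name__ == "__main__":
    values = [i / 10 for i in range(1, 33)]
    exp, q = mxfp8_quantize(values)
    restored = mxfp8_dequantize(exp, q)
    worst = max(abs(r - v) / abs(v) for r, v in zip(restored, values))
    print(f"shared exponent: {exp}, worst relative error: {worst:.4f}")
```

Because the scale is a pure power of two, rescaling is exact; all of the error comes from rounding each element to 3 mantissa bits, which bounds the relative error of normal values at 2**-4.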

bridge.recipes.deepseek.deepseek_v4.deepseek_v4_flash_pretrain_config() → megatron.bridge.training.config.ConfigContainer

Return the DeepSeek-V4-Flash Blackwell pre-training base config.

Recommended Blackwell baseline parallelism: TP=1, PP=4, EP=8, CP=1 (tensor-, pipeline-, expert-, and context-parallel sizes).
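As a rough sizing aid, the sketch below checks which GPU counts can host the recommended layout. It assumes Megatron-Core's usual decomposition: the dense part needs the world size to be a multiple of TP * CP * PP (the quotient is the data-parallel size), and the MoE part needs it to be a multiple of ETP * EP * PP. Expert tensor parallelism (ETP) is assumed to be 1 here, which matches TP=1. The `ParallelLayout` class is hypothetical and not part of this recipe.

```python
from dataclasses import dataclass
from math import lcm


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical container for the recommended Blackwell baseline."""
    tp: int = 1   # tensor-parallel size
    pp: int = 4   # pipeline-parallel size
    ep: int = 8   # expert-parallel size
    cp: int = 1   # context-parallel size
    etp: int = 1  # expert tensor-parallel size (assumed 1, equal to TP here)

    def min_world_size(self) -> int:
        dense = self.tp * self.cp * self.pp     # ranks per dense model replica
        expert = self.etp * self.ep * self.pp   # ranks per expert replica
        return lcm(dense, expert)

    def data_parallel_size(self, world_size: int) -> int:
        if world_size % self.min_world_size():
            raise ValueError(
                f"world size {world_size} is not a multiple of {self.min_world_size()}"
            )
        return world_size // (self.tp * self.cp * self.pp)


if __name__ == "__main__":
    layout = ParallelLayout()               # TP=1, PP=4, EP=8, CP=1
    print(layout.min_world_size())          # 32
    print(layout.data_parallel_size(64))    # 16
```

Under these assumptions the smallest valid job is 32 GPUs (for example, four 8-GPU nodes), which yields a data-parallel size of 8.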

bridge.recipes.deepseek.deepseek_v4.deepseek_v4_flash_pretrain_mxfp8_config() → megatron.bridge.training.config.ConfigContainer

Return the DeepSeek-V4-Flash Adam + MXFP8 pre-training config.

bridge.recipes.deepseek.deepseek_v4.deepseek_v4_flash_pretrain_muon_config() → megatron.bridge.training.config.ConfigContainer

Return the DeepSeek-V4-Flash BF16 Muon pre-training config.

bridge.recipes.deepseek.deepseek_v4.DEEPSEEK_V4_FLASH_HF_PATH

Default Hugging Face model id for the `hf_path` argument of the SFT recipes: `'deepseek-ai/DeepSeek-V4-Flash'`

bridge.recipes.deepseek.deepseek_v4.deepseek_v4_flash_sft_config(hf_path: str = DEEPSEEK_V4_FLASH_HF_PATH) → megatron.bridge.training.config.ConfigContainer

DeepSeek-V4-Flash full SFT, MTP enabled, Hopper-safe.

Runs unchanged on Hopper (H100/H200) and Blackwell (B200/GB200); fused mHC is enabled only on Blackwell. The recipe performs full-parameter training on unpacked (SBHD) sequences with Adam in BF16. To fine-tune real weights, set `checkpoint.pretrained_checkpoint` to the imported Megatron checkpoint. `hf_path` overrides the Hugging Face model id (for example, to point at a toy model in tests).

bridge.recipes.deepseek.deepseek_v4.deepseek_v4_flash_no_mtp_sft_config(hf_path: str = DEEPSEEK_V4_FLASH_HF_PATH) → megatron.bridge.training.config.ConfigContainer

DeepSeek-V4-Flash full SFT with the MTP layer disabled, Hopper-safe.

Same as `deepseek_v4_flash_sft_config` but drops the Multi-Token Prediction (MTP) layer. Everything else is unchanged: fused mHC only on Blackwell, BF16, and SBHD sequences.
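Both SFT recipes share the same hardware rule: they run unchanged on Hopper and Blackwell, and only fused mHC is switched on for Blackwell. The sketch below illustrates one way such a gate could be keyed on CUDA compute capability. How the recipe actually detects the GPU is not documented here, so the helper name and the capability check (Hopper H100/H200 report 9.0; datacenter Blackwell B200/GB200 report 10.x) are assumptions.

```python
from typing import Tuple

# Assumption: datacenter Blackwell (B200/GB200) reports compute capability 10.x,
# while Hopper (H100/H200) reports 9.0.
BLACKWELL_MAJOR = 10


def use_fused_mhc(compute_capability: Tuple[int, int]) -> bool:
    """Hypothetical gate: enable fused mHC only on Blackwell-class GPUs."""
    major, _minor = compute_capability
    return major == BLACKWELL_MAJOR


if __name__ == "__main__":
    # With PyTorch and a GPU, the tuple could come from
    # torch.cuda.get_device_capability(), which returns (major, minor).
    for name, cc in {"H100": (9, 0), "B200": (10, 0)}.items():
        print(name, use_fused_mhc(cc))
```

Gating a single fused kernel this way keeps the rest of the config identical across GPU generations, which is what lets one recipe stay Hopper-safe.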
