bridge.recipes.deepseek.deepseek_v4

Module Contents

Functions

_deepseek_v4_mxfp8_quant_recipe

Use MXFP8 for training and BF16 for DSv4 validation/evaluation paths.

deepseek_v4_flash_pretrain_config

Return the DeepSeek-V4-Flash Blackwell pre-training base config.

deepseek_v4_flash_pretrain_mxfp8_config

Return the DeepSeek-V4-Flash Adam + MXFP8 pre-training config.

deepseek_v4_flash_pretrain_muon_config

Return the DeepSeek-V4-Flash BF16 Muon pre-training config.

deepseek_v4_flash_sft_config

DeepSeek-V4-Flash full SFT, MTP enabled, Hopper-safe.

deepseek_v4_flash_no_mtp_sft_config

DeepSeek-V4-Flash full SFT with the MTP layer disabled, Hopper-safe.

Data

DEEPSEEK_V4_FLASH_HF_PATH

API

bridge.recipes.deepseek.deepseek_v4._deepseek_v4_mxfp8_quant_recipe() → megatron.core.quantization.quant_config.RecipeConfig

Use MXFP8 for training and BF16 for DSv4 validation/evaluation paths.

bridge.recipes.deepseek.deepseek_v4.deepseek_v4_flash_pretrain_config() → megatron.bridge.training.config.ConfigContainer

Return the DeepSeek-V4-Flash Blackwell pre-training base config.

Recommended Blackwell baseline: TP=1, PP=4, EP=8, CP=1.
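The baseline above fixes four parallelism degrees but not the GPU count. The sketch below works out the smallest job that fits it. It assumes the usual Megatron-Core layout, where the world size must be divisible by both TP × CP × PP (dense layers) and ETP × EP × PP (expert layers), and it assumes an expert tensor parallelism (ETP) of 1. The helper names are illustrative and not part of this module.

```python
from math import lcm


def min_world_size(tp: int, pp: int, cp: int, ep: int, etp: int = 1) -> int:
    """Smallest GPU count divisible by both the dense and expert group sizes.

    Assumption: world size must be a multiple of tp*cp*pp (dense layers)
    and of etp*ep*pp (expert layers), as in the usual Megatron-Core layout.
    """
    dense_group = tp * cp * pp
    expert_group = etp * ep * pp
    return lcm(dense_group, expert_group)


def data_parallel_size(world_size: int, tp: int, pp: int, cp: int) -> int:
    """Data-parallel degree left over after TP, PP and CP are assigned."""
    model_parallel = tp * pp * cp
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by TP*PP*CP={model_parallel}"
        )
    return world_size // model_parallel


# Recommended Blackwell baseline from the docstring: TP=1, PP=4, EP=8, CP=1.
gpus = min_world_size(tp=1, pp=4, cp=1, ep=8)
dp = data_parallel_size(gpus, tp=1, pp=4, cp=1)

print(gpus)  # 32
print(dp)    # 8
```

Under these assumptions the baseline needs a multiple of 32 GPUs, and at exactly 32 GPUs the data-parallel degree of 8 matches EP=8.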

bridge.recipes.deepseek.deepseek_v4.deepseek_v4_flash_pretrain_mxfp8_config() → megatron.bridge.training.config.ConfigContainer

Return the DeepSeek-V4-Flash Adam + MXFP8 pre-training config.

bridge.recipes.deepseek.deepseek_v4.deepseek_v4_flash_pretrain_muon_config() → megatron.bridge.training.config.ConfigContainer

Return the DeepSeek-V4-Flash BF16 Muon pre-training config.

bridge.recipes.deepseek.deepseek_v4.DEEPSEEK_V4_FLASH_HF_PATH

'deepseek-ai/DeepSeek-V4-Flash'

bridge.recipes.deepseek.deepseek_v4.deepseek_v4_flash_sft_config(
hf_path: str = DEEPSEEK_V4_FLASH_HF_PATH,
) → megatron.bridge.training.config.ConfigContainer

DeepSeek-V4-Flash full SFT, MTP enabled, Hopper-safe.

Runs unchanged on Hopper (H100/H200) and Blackwell (B200/GB200); fused mHC is enabled only on Blackwell. Performs full-parameter training on unpacked (SBHD) sequences with Adam in bf16. To fine-tune real weights, set checkpoint.pretrained_checkpoint to the imported Megatron checkpoint. hf_path overrides the HF model id (e.g. a toy model in tests).
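As a minimal sketch of the workflow described above, the helper below builds a config from a recipe function and points checkpoint.pretrained_checkpoint at imported weights. The helper prepare_sft_config, the stand-in recipe, and the checkpoint path are hypothetical. The stand-in exists only so the snippet runs without Megatron Bridge installed, and it mimics just the one attribute named in the docstring. The import path in the trailing comment is inferred from the return type's package name and is an assumption.

```python
from types import SimpleNamespace


def prepare_sft_config(recipe_fn, pretrained_checkpoint, hf_path=None):
    """Call a recipe function and attach the checkpoint to fine-tune from."""
    # Only pass hf_path when overriding, so the recipe's own default is kept.
    cfg = recipe_fn() if hf_path is None else recipe_fn(hf_path=hf_path)
    cfg.checkpoint.pretrained_checkpoint = pretrained_checkpoint
    return cfg


def _stand_in_recipe(hf_path="deepseek-ai/DeepSeek-V4-Flash"):
    # Hypothetical stand-in for deepseek_v4_flash_sft_config: it exposes only
    # the checkpoint.pretrained_checkpoint attribute mentioned in the docs.
    return SimpleNamespace(
        hf_path=hf_path,
        checkpoint=SimpleNamespace(pretrained_checkpoint=None),
    )


# Placeholder path; replace with the imported Megatron checkpoint directory.
cfg = prepare_sft_config(_stand_in_recipe, "/path/to/megatron_checkpoint")

# Overriding the HF model id, e.g. with a toy model in tests.
toy_cfg = prepare_sft_config(
    _stand_in_recipe, "/path/to/toy_checkpoint", hf_path="my-org/toy-model"
)

# With Megatron Bridge installed, the real recipe would be passed instead
# (import path assumed, not confirmed by this page):
# from megatron.bridge.recipes.deepseek.deepseek_v4 import (
#     deepseek_v4_flash_sft_config,
# )
# cfg = prepare_sft_config(deepseek_v4_flash_sft_config, "/path/to/megatron_checkpoint")
```

Passing the recipe as an argument keeps the same helper usable with deepseek_v4_flash_no_mtp_sft_config, since both recipes take the same hf_path parameter.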

bridge.recipes.deepseek.deepseek_v4.deepseek_v4_flash_no_mtp_sft_config(
hf_path: str = DEEPSEEK_V4_FLASH_HF_PATH,
) → megatron.bridge.training.config.ConfigContainer

DeepSeek-V4-Flash full SFT with the MTP layer disabled, Hopper-safe.

Same as deepseek_v4_flash_sft_config, but drops the Multi-Token Prediction (MTP) layer. The other settings are unchanged: fused mHC only on Blackwell, bf16, unpacked (SBHD) sequences.