bridge.recipes.deepseek.deepseek_v4

Module Contents

Functions

set_deepseek_v4_pipeline_model_parallel_layout

Set an even DSv4 pipeline layout with MTP and loss on the last stage.

_deepseek_v4_mxfp8_quant_recipe

Use MXFP8 for training and BF16 for DSv4 validation/evaluation paths.

deepseek_v4_flash_pretrain_config

Return the DeepSeek-V4-Flash Blackwell pre-training base config.

deepseek_v4_flash_pretrain_mxfp8_config

Return the DeepSeek-V4-Flash Adam + MXFP8 pre-training config.

deepseek_v4_flash_pretrain_muon_config

Return the DeepSeek-V4-Flash BF16 Muon pre-training config.

API

bridge.recipes.deepseek.deepseek_v4.set_deepseek_v4_pipeline_model_parallel_layout(
    model_cfg: megatron.bridge.models.GPTModelProvider,
) → None

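Set an even pipeline-parallel layout on model_cfg for DeepSeek-V4 (DSv4). The decoder layers are spread evenly across pipeline stages, and the multi-token prediction (MTP) layers and the loss are placed on the last stage.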
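The even-split rule can be sketched in plain Python. This is a minimal illustration, not the recipe's implementation. Several details are assumptions: the helper name even_pp_layout, the remainder policy (extra layers go to the earliest stages), and the per-stage entry names "embedding", "decoder", "mtp", and "loss", which are modeled on Megatron-style list-of-lists layouts.

```python
from typing import List


def even_pp_layout(num_layers: int, pp_size: int) -> List[List[str]]:
    """Hypothetical sketch: split decoder layers evenly across pipeline stages.

    Assumptions (not taken from the recipe): remainder layers go to the
    earliest stages, the embedding sits on the first stage, and the MTP
    layers and the loss sit on the last stage.
    """
    if pp_size < 1 or num_layers < pp_size:
        raise ValueError("need pp_size >= 1 and at least one decoder layer per stage")

    base, remainder = divmod(num_layers, pp_size)
    layout: List[List[str]] = []
    for stage in range(pp_size):
        # The first `remainder` stages each take one extra decoder layer.
        count = base + (1 if stage < remainder else 0)
        layout.append(["decoder"] * count)

    layout[0].insert(0, "embedding")    # input embedding on the first stage
    layout[-1].extend(["mtp", "loss"])  # MTP layers and loss on the last stage
    return layout


if __name__ == "__main__":
    for stage, entries in enumerate(even_pp_layout(10, 4)):
        print(stage, entries)
```

With 10 decoder layers and 4 stages, this sketch assigns 3, 3, 2, and 2 decoder layers. The two smaller stages come last, which partly offsets the extra MTP and loss work on the final stage. The actual recipe may balance stages differently.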

bridge.recipes.deepseek.deepseek_v4._deepseek_v4_mxfp8_quant_recipe() → megatron.core.quantization.quant_config.RecipeConfig

Return a quantization recipe that uses MXFP8 for training and BF16 for the DSv4 validation and evaluation paths.
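Conceptually, the recipe switches numeric format by execution path. The sketch below models that with a hypothetical PATH_PRECISION table and a precision_for helper. These names and the "train", "validation", and "evaluation" keys are illustrative assumptions. They do not reflect the fields of megatron.core's RecipeConfig.

```python
from enum import Enum


class Precision(Enum):
    MXFP8 = "mxfp8"
    BF16 = "bf16"


# Hypothetical mapping from execution path to numeric format
# (illustrative only; this is not the RecipeConfig schema).
PATH_PRECISION = {
    "train": Precision.MXFP8,       # training forward/backward in MXFP8
    "validation": Precision.BF16,   # validation passes stay in BF16
    "evaluation": Precision.BF16,   # evaluation passes stay in BF16
}


def precision_for(path: str) -> Precision:
    """Return the assumed numeric format for an execution path."""
    try:
        return PATH_PRECISION[path]
    except KeyError:
        raise ValueError(f"unknown execution path: {path!r}") from None
```

One plausible motivation, not stated in the source, is that running validation and evaluation in BF16 keeps MXFP8 rounding out of those forward passes and the metrics they report.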

bridge.recipes.deepseek.deepseek_v4.deepseek_v4_flash_pretrain_config() → megatron.bridge.training.config.ConfigContainer

Return the DeepSeek-V4-Flash Blackwell pre-training base config.

Recommended Blackwell baseline: TP=1, PP=4, EP=8, CP=1 (tensor-, pipeline-, expert-, and context-parallel sizes, respectively).
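The baseline implies a minimum GPU count. The sketch below assumes the usual Megatron-Core divisibility rules:

- The dense data-parallel size is world_size / (TP × PP × CP).
- The expert data-parallel size is world_size / (ETP × EP × PP).

Expert tensor parallelism (ETP) is assumed to be 1, and the helper name parallel_sizes is hypothetical.

```python
from typing import Tuple


def parallel_sizes(
    world_size: int, tp: int, pp: int, ep: int, cp: int, etp: int = 1
) -> Tuple[int, int]:
    """Return (data_parallel_size, expert_data_parallel_size).

    Assumed Megatron-Core-style rules: dense ranks factor as TP x CP x PP x DP,
    and expert ranks factor as ETP x EP x PP x expert-DP. ETP defaults to 1,
    which is an assumption, not a documented recipe setting.
    """
    dense = tp * pp * cp    # ranks consumed by dense model parallelism
    expert = etp * ep * pp  # ranks consumed by expert model parallelism
    if world_size % dense or world_size % expert:
        raise ValueError(
            f"world_size={world_size} must be divisible by {dense} and {expert}"
        )
    return world_size // dense, world_size // expert


if __name__ == "__main__":
    # Documented Blackwell baseline: TP=1, PP=4, EP=8, CP=1.
    print(parallel_sizes(32, tp=1, pp=4, ep=8, cp=1))
```

Under these assumptions, the smallest world size that satisfies both rules is 32 GPUs (for example, four 8-GPU nodes). That gives a dense data-parallel size of 8 and an expert data-parallel size of 1.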

bridge.recipes.deepseek.deepseek_v4.deepseek_v4_flash_pretrain_mxfp8_config() → megatron.bridge.training.config.ConfigContainer

Return the DeepSeek-V4-Flash Adam + MXFP8 pre-training config.

bridge.recipes.deepseek.deepseek_v4.deepseek_v4_flash_pretrain_muon_config() → megatron.bridge.training.config.ConfigContainer

Return the DeepSeek-V4-Flash BF16 Muon pre-training config.
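The three entry points read as one base config plus two variants that differ in optimizer and training precision. The sketch below models that relationship with a hypothetical frozen RecipeSketch dataclass and dataclasses.replace. It assumes two things the source does not state: that the variants are derived from the base, and that they keep its parallelism. The field names are illustrative, not ConfigContainer fields.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class RecipeSketch:
    """Hypothetical stand-in for a recipe config (not ConfigContainer)."""
    tp: int
    pp: int
    ep: int
    cp: int
    optimizer: Optional[str] = None  # left unset in the base sketch
    precision: Optional[str] = None  # left unset in the base sketch


def flash_base() -> RecipeSketch:
    # Parallel sizes taken from the documented Blackwell baseline.
    return RecipeSketch(tp=1, pp=4, ep=8, cp=1)


def flash_adam_mxfp8() -> RecipeSketch:
    # Assumed: variant = base + optimizer/precision overrides.
    return replace(flash_base(), optimizer="adam", precision="mxfp8")


def flash_muon_bf16() -> RecipeSketch:
    # Assumed: variant = base + optimizer/precision overrides.
    return replace(flash_base(), optimizer="muon", precision="bf16")
```

Building each variant from a fresh base and overriding only the fields that differ keeps the shared settings in one place. The frozen dataclass also prevents a variant from mutating the base it was built from.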