bridge.recipes.deepseek.deepseek_v4
Module Contents
Functions
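set_deepseek_v4_pipeline_model_parallel_layout | Set an even DSv4 pipeline layout with MTP and loss on the last stage.
_deepseek_v4_mxfp8_quant_recipe | Use MXFP8 for training and BF16 for DSv4 validation/evaluation paths.
deepseek_v4_flash_pretrain_config | Return the DeepSeek-V4-Flash Blackwell pre-training base config.
deepseek_v4_flash_pretrain_mxfp8_config | Return the DeepSeek-V4-Flash Adam + MXFP8 pre-training config.
deepseek_v4_flash_pretrain_muon_config | Return the DeepSeek-V4-Flash BF16 Muon pre-training config.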
API
- bridge.recipes.deepseek.deepseek_v4.set_deepseek_v4_pipeline_model_parallel_layout(
- model_cfg: megatron.bridge.models.GPTModelProvider,
Set an even DSv4 pipeline layout with MTP and loss on the last stage.
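As a rough illustration of what "even layout with MTP and loss on the last stage" could mean, the sketch below splits the decoder layers evenly across pipeline stages, puts the embedding on the first stage, and appends the MTP and loss layers to the last stage. It is not this function's implementation: `even_pipeline_layout`, its parameters, the layer labels, and the layer count of 8 are all hypothetical and chosen only for the example.

```python
from typing import List


def even_pipeline_layout(num_layers: int, pp_size: int, mtp_layers: int = 1) -> List[List[str]]:
    """Hypothetical helper: even decoder split, embedding first, MTP + loss last."""
    if pp_size < 1:
        raise ValueError("pp_size must be at least 1")
    if num_layers % pp_size != 0:
        raise ValueError("num_layers must be divisible by pp_size for an even split")

    per_stage = num_layers // pp_size
    # Every stage gets the same number of decoder layers.
    layout = [["decoder"] * per_stage for _ in range(pp_size)]
    # The embedding sits in front of the first stage's decoder layers.
    layout[0].insert(0, "embedding")
    # MTP layers and the loss are appended to the last stage.
    layout[-1].extend(["mtp"] * mtp_layers)
    layout[-1].append("loss")
    return layout


# Illustrative numbers only; 8 is not the model's real layer count.
layout = even_pipeline_layout(num_layers=8, pp_size=4)
for stage, layers in enumerate(layout):
    print(stage, layers)
```

Placing MTP and loss together on the final stage keeps the multi-token-prediction head next to the loss computation it feeds, at the cost of making the last stage slightly heavier than the others.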
- bridge.recipes.deepseek.deepseek_v4._deepseek_v4_mxfp8_quant_recipe() -> megatron.core.quantization.quant_config.RecipeConfig
Use MXFP8 for training and BF16 for DSv4 validation/evaluation paths.
- bridge.recipes.deepseek.deepseek_v4.deepseek_v4_flash_pretrain_config() -> megatron.bridge.training.config.ConfigContainer
Return the DeepSeek-V4-Flash Blackwell pre-training base config.
Recommended Blackwell baseline: TP=1, PP=4, EP=8, CP=1.
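To get a feel for the hardware footprint of this baseline, the sketch below estimates the smallest GPU count that fits the stated parallel sizes. It assumes the usual Megatron-style constraint that the world size must be divisible by both TP × CP × PP (dense layers) and expert-TP × EP × PP (expert layers); `min_world_size` and the `etp` parameter are hypothetical names for this example, and the recipe itself may impose further requirements.

```python
from math import lcm


def min_world_size(tp: int, pp: int, ep: int, cp: int, etp: int = 1) -> int:
    """Smallest GPU count divisible by both the dense and expert parallel groupings."""
    dense_group = tp * cp * pp      # GPUs spanned by one dense model replica
    expert_group = etp * ep * pp    # GPUs spanned by one expert-parallel replica
    return lcm(dense_group, expert_group)


gpus = min_world_size(tp=1, pp=4, ep=8, cp=1)
data_parallel = gpus // (1 * 1 * 4)  # world size / (TP * CP * PP)
print(gpus, data_parallel)  # 32 8
```

Under these assumptions the baseline needs at least 32 GPUs, with a data-parallel size of 8 over which the 8-way expert parallelism is laid out.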
- bridge.recipes.deepseek.deepseek_v4.deepseek_v4_flash_pretrain_mxfp8_config() -> megatron.bridge.training.config.ConfigContainer
Return the DeepSeek-V4-Flash Adam + MXFP8 pre-training config.
- bridge.recipes.deepseek.deepseek_v4.deepseek_v4_flash_pretrain_muon_config() -> megatron.bridge.training.config.ConfigContainer
Return the DeepSeek-V4-Flash BF16 Muon pre-training config.