bridge.recipes.deepseek.deepseek_v3
Module Contents
Functions
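| Function | Description |
| --- | --- |
| set_deepseek_v3_pipeline_model_parallel_layout | Set the DeepSeek-V3 pipeline model parallel layout. |
| deepseek_v3_pretrain_config | Return a pre-training config for DeepSeek-V3 (671B). |
| deepseek_v3_pretrain_config_32nodes | Return a pre-training config for DeepSeek-V3 (671B) sized for the minimal node count of 32. |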
API
- bridge.recipes.deepseek.deepseek_v3.set_deepseek_v3_pipeline_model_parallel_layout(
      model_cfg: megatron.bridge.models.GPTModelProvider,
      layout: Optional[Union[str, List[List[str]]]] = None,
  )
Set the DeepSeek-V3 pipeline model parallel layout.
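The `layout` argument accepts either a string or a list of per-stage lists of layer names. As a hedged illustration of the list form, the sketch below spreads a decoder stack evenly over the pipeline stages and renders it in a compact string notation that, to our understanding, follows Megatron-Core's convention (`E` embedding, `t` transformer layer, `m` multi-token-prediction layer, `L` loss, `|` stage boundary). The token names (`"embedding"`, `"decoder"`, `"mtp"`, `"loss"`), the symbol mapping, and both helpers (`build_even_layout`, `to_layout_string`) are hypothetical and are not part of this module's API.

```python
from typing import List


def build_even_layout(num_layers: int, pp_size: int, vp_size: int = 1,
                      mtp_layers: int = 0) -> List[List[str]]:
    """Hypothetical helper: spread decoder layers over pp_size * vp_size stages.

    The first stage also holds the embedding and the last stage holds the MTP
    layers and the loss, so leftover layers are given to middle stages first.
    """
    num_stages = pp_size * vp_size
    if num_stages < 1:
        raise ValueError("pp_size * vp_size must be >= 1")
    base, extra = divmod(num_layers, num_stages)
    counts = [base] * num_stages
    # Stable sort: middle stages (key False) come before the first/last stages (key True).
    order = sorted(range(num_stages), key=lambda s: s in (0, num_stages - 1))
    for stage in order[:extra]:
        counts[stage] += 1
    layout = [["decoder"] * n for n in counts]
    layout[0].insert(0, "embedding")
    layout[-1].extend(["mtp"] * mtp_layers + ["loss"])
    return layout


# Assumed single-letter codes for the string form of a layout.
SYMBOLS = {"embedding": "E", "decoder": "t", "mtp": "m", "loss": "L"}


def to_layout_string(layout: List[List[str]]) -> str:
    """Render a list-of-lists layout as '|'-separated stages of symbols."""
    return "|".join("".join(SYMBOLS[name] for name in stage) for stage in layout)


print(to_layout_string(build_even_layout(8, pp_size=4, mtp_layers=1)))
# Ett|tt|tt|ttmL

# DeepSeek-V3 has 61 decoder layers plus one MTP layer.
v3_layout = build_even_layout(61, pp_size=16, mtp_layers=1)
print([stage.count("decoder") for stage in v3_layout])
```

For 61 layers on PP=16 this places four decoder layers on stages 1 through 13 and three on stages 0, 14, and 15, keeping the stages that also carry the embedding or the MTP layer and loss lighter.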
- bridge.recipes.deepseek.deepseek_v3.deepseek_v3_pretrain_config() -> megatron.bridge.training.config.ConfigContainer
Return a pre-training config for DeepSeek-V3 (671B).
Recommended parallelism: TP=2, PP=16, EP=64 (tensor, pipeline, and expert parallel sizes).
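As a rough sizing aid, the recommended sizes can be turned into a minimum GPU count. This sketch is not part of the module and rests on stated assumptions: dense layers are sharded over TP × CP × PP × DP ranks and MoE experts over ETP × EP × PP × EDP ranks of the same world (our understanding of Megatron-Core's MoE layout), with context parallelism (CP) and expert tensor parallelism (ETP) both 1, and 8 GPUs per node. The helpers `min_world_size` and `derived_dp_sizes` are hypothetical.

```python
from math import lcm


def min_world_size(tp: int, pp: int, ep: int, etp: int = 1, cp: int = 1) -> int:
    """Smallest GPU count that both the dense and the expert layouts divide evenly.

    Assumes dense ranks = TP * CP * PP * DP and expert ranks = ETP * EP * PP * EDP.
    """
    dense_group = tp * cp * pp    # GPUs per dense data-parallel replica
    expert_group = etp * ep * pp  # GPUs per expert data-parallel replica
    return lcm(dense_group, expert_group)


def derived_dp_sizes(world_size: int, tp: int, pp: int, ep: int,
                     etp: int = 1, cp: int = 1) -> tuple[int, int]:
    """Return (dense DP size, expert DP size) for a given world size."""
    dense_group = tp * cp * pp
    expert_group = etp * ep * pp
    if world_size % dense_group or world_size % expert_group:
        raise ValueError(
            f"world size {world_size} must be divisible by {dense_group} and {expert_group}"
        )
    return world_size // dense_group, world_size // expert_group


GPUS_PER_NODE = 8  # assumption: 8-GPU nodes

world = min_world_size(tp=2, pp=16, ep=64)
print(world, world // GPUS_PER_NODE)                # 1024 128
print(derived_dp_sizes(world, tp=2, pp=16, ep=64))  # (32, 1)
```

Under these assumptions, TP=2, PP=16, EP=64 needs at least 1,024 GPUs (128 nodes), which leaves a dense data-parallel size of 32 and an expert data-parallel size of 1.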
- bridge.recipes.deepseek.deepseek_v3.deepseek_v3_pretrain_config_32nodes() -> megatron.bridge.training.config.ConfigContainer
Return a pre-training config for DeepSeek-V3 (671B) sized for the minimal node count of 32.
Recommended parallelism: TP=2, PP=8, EP=32. Uses full activation recomputation to reduce memory usage.
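A quick consistency check of the 32-node recommendation, assuming 8 GPUs per node, a context parallel size of 1, and an expert tensor parallel size (ETP) of 1 (none of which this recipe states), with dense layers sharded as TP × PP × DP and experts as ETP × EP × PP × EDP:

```latex
\begin{aligned}
N_{\mathrm{GPU}} &= 32 \times 8 = 256 \\
\mathrm{DP} &= \frac{N_{\mathrm{GPU}}}{\mathrm{TP} \cdot \mathrm{PP}} = \frac{256}{2 \cdot 8} = 16 \\
\mathrm{EDP} &= \frac{N_{\mathrm{GPU}}}{\mathrm{ETP} \cdot \mathrm{EP} \cdot \mathrm{PP}} = \frac{256}{1 \cdot 32 \cdot 8} = 1
\end{aligned}
```

Under these assumptions the layout divides all 256 GPUs evenly, giving 16 dense data-parallel replicas and a single expert data-parallel replica.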