bridge.recipes.deepseek.deepseek_v3#

Module Contents#

Functions#

set_deepseek_v3_pipeline_model_parallel_layout

Set the DeepSeek-V3 pipeline model parallel layout.

deepseek_v3_pretrain_config

Return a pre-training config for DeepSeek-V3 (671B).

deepseek_v3_pretrain_config_32nodes

Return a pre-training config for DeepSeek-V3 (671B) with minimal nodes (32).

API#

bridge.recipes.deepseek.deepseek_v3.set_deepseek_v3_pipeline_model_parallel_layout(
model_cfg: megatron.bridge.models.GPTModelProvider,
layout: Optional[Union[str, List[List[str]]]] = None,
) → None#

Set the DeepSeek-V3 pipeline model parallel layout.
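Per the signature above, `layout` may be passed as a nested list of strings, one inner list per pipeline stage. A minimal sketch of that shape, assuming illustrative layer tokens (`"embedding"`, `"decoder"`, `"loss"` are hypothetical names, not taken from this page):

```python
from typing import List

# Hypothetical pipeline-parallel layout: one inner list per pipeline
# stage, each entry naming a layer placed on that stage. The token
# names below are illustrative assumptions.
layout: List[List[str]] = [
    ["embedding"] + ["decoder"] * 4,  # stage 0: embedding + 4 decoder layers
    ["decoder"] * 5,                  # stage 1
    ["decoder"] * 5,                  # stage 2
    ["decoder"] * 4 + ["loss"],       # stage 3: last decoder layers + loss head
]


def decoder_layers_per_stage(layout: List[List[str]]) -> List[int]:
    """Count the decoder layers assigned to each pipeline stage."""
    return [sum(1 for name in stage if name == "decoder") for stage in layout]


print(decoder_layers_per_stage(layout))  # [4, 5, 5, 4]
```

Passing `layout=None` leaves the provider's default placement in effect; a string form is also accepted per the `Union[str, List[List[str]]]` annotation.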

bridge.recipes.deepseek.deepseek_v3.deepseek_v3_pretrain_config() → megatron.bridge.training.config.ConfigContainer#

Return a pre-training config for DeepSeek-V3 (671B).

Recommended parallelism: TP=2, PP=16, EP=64.
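A minimal usage sketch, assuming the module is importable under the `megatron.bridge` namespace shown in the signature (the commented-out override field is a hypothetical illustration, not a documented attribute):

```python
from megatron.bridge.recipes.deepseek.deepseek_v3 import deepseek_v3_pretrain_config

# Build the recommended DeepSeek-V3 (671B) pre-training config
# (TP=2, PP=16, EP=64); returns a ConfigContainer.
cfg = deepseek_v3_pretrain_config()

# Fields can typically be overridden before launching training;
# the attribute path below is an assumption for illustration only.
# cfg.train.train_iters = 1_000_000
```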

bridge.recipes.deepseek.deepseek_v3.deepseek_v3_pretrain_config_32nodes() → megatron.bridge.training.config.ConfigContainer#

Return a pre-training config for DeepSeek-V3 (671B) with minimal nodes (32).

Recommended parallelism: TP=2, PP=8, EP=32. Uses full recompute for memory efficiency.
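The node-count arithmetic behind the two recommended layouts can be sketched as follows, assuming 8 GPUs per node (an assumption typical of H100/H200 systems, not stated on this page):

```python
GPUS_PER_NODE = 8  # assumption: typical 8-GPU nodes


def gpus_per_pipeline_replica(tp: int, pp: int) -> int:
    """Each pipeline of the model spans TP * PP GPUs; in Megatron-style
    layouts, expert parallelism (EP) is carved out of the remaining
    data-parallel dimension rather than adding GPUs here."""
    return tp * pp


# deepseek_v3_pretrain_config: TP=2, PP=16, EP=64
full = gpus_per_pipeline_replica(tp=2, pp=16)   # 32 GPUs -> 4 nodes per pipeline

# deepseek_v3_pretrain_config_32nodes: TP=2, PP=8, EP=32
small = gpus_per_pipeline_replica(tp=2, pp=8)   # 16 GPUs -> 2 nodes per pipeline

# At 32 nodes (256 GPUs), the minimal-node config leaves
# 256 // 16 = 16 data-parallel replicas.
dp_32nodes = (32 * GPUS_PER_NODE) // small
print(full, small, dp_32nodes)  # 32 16 16
```

The smaller footprint per pipeline is what lets the 32-node recipe fit, at the cost of the full activation recompute noted above.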