bridge.recipes.deepseek.deepseek_v3

Module Contents

Functions

_build_standalone_mtp_layout

set_deepseek_v3_pipeline_model_parallel_layout

Set the DeepSeek-V3 pipeline model parallel layout.

deepseek_v3_pretrain_config

Return a pre-training config for DeepSeek-V3 (671B).

deepseek_v3_pretrain_config_32nodes

Return a pre-training config for DeepSeek-V3 (671B) using the minimum number of nodes (32).

API

bridge.recipes.deepseek.deepseek_v3._build_standalone_mtp_layout(
num_decoder_layers: int,
total_stages: int,
mtp_layers: int,
) → list[list[str]]

bridge.recipes.deepseek.deepseek_v3.set_deepseek_v3_pipeline_model_parallel_layout(
model_cfg: megatron.bridge.models.GPTModelProvider,
layout: str | list[list[str]] | None = None,
*,
mtp_standalone: bool = False,
) → None

Set the DeepSeek-V3 pipeline model parallel layout.

Parameters:
  • model_cfg – DeepSeek-V3 model configuration to update.

  • layout – Explicit pipeline layout. When provided, this overrides the predefined layouts.

  • mtp_standalone – If True, place the MTP layers in their own penultimate PP/VPP stage and the loss in the final stage. Defaults to False, which colocates MTP with the loss, matching existing recipes.
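
To make the standalone-MTP arrangement described above concrete, here is a minimal sketch of a layout in the list[list[str]] form, with one inner list per PP/VPP stage. It is not the implementation of _build_standalone_mtp_layout: the function name sketch_standalone_mtp_layout, the layer tokens ("embedding", "decoder", "mtp", "loss"), and the even spread of decoder layers over the earlier stages are all assumptions made for illustration. The example values (61 decoder layers, 16 stages, 1 MTP layer) are likewise illustrative.

```python
def sketch_standalone_mtp_layout(
    num_decoder_layers: int,
    total_stages: int,
    mtp_layers: int,
) -> list[list[str]]:
    """Hypothetical sketch: MTP in the penultimate stage, loss in the final stage."""
    if total_stages < 3:
        raise ValueError("need at least one decoder stage plus the MTP and loss stages")

    # Every stage except the last two holds decoder layers (assumption).
    decoder_stages = total_stages - 2
    if num_decoder_layers < decoder_stages:
        raise ValueError("fewer decoder layers than decoder stages")

    # Spread decoder layers as evenly as possible; earlier stages take the remainder.
    base, extra = divmod(num_decoder_layers, decoder_stages)

    layout: list[list[str]] = []
    for stage in range(decoder_stages):
        layers = ["decoder"] * (base + (1 if stage < extra else 0))
        if stage == 0:
            # The embedding sits in front of the first decoder layers.
            layers = ["embedding"] + layers
        layout.append(layers)

    layout.append(["mtp"] * mtp_layers)  # standalone penultimate stage
    layout.append(["loss"])              # final stage holds only the loss
    return layout


# Illustrative values only.
layout = sketch_standalone_mtp_layout(num_decoder_layers=61, total_stages=16, mtp_layers=1)
for index, stage in enumerate(layout):
    print(index, stage)
```

A list built this way could in principle be passed as the layout argument, which overrides the predefined layouts. Whether these exact token names are accepted should be checked against the library.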

bridge.recipes.deepseek.deepseek_v3.deepseek_v3_pretrain_config() → megatron.bridge.training.config.ConfigContainer

Return a pre-training config for DeepSeek-V3 (671B).

Recommended parallelism: TP=2, PP=16, EP=64.

bridge.recipes.deepseek.deepseek_v3.deepseek_v3_pretrain_config_32nodes() → megatron.bridge.training.config.ConfigContainer

Return a pre-training config for DeepSeek-V3 (671B) using the minimum number of nodes (32).

Recommended parallelism: TP=2, PP=8, EP=32. Uses full activation recomputation to reduce memory usage.
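
The recommended parallelism settings for the two recipes can be related to a GPU count with a little arithmetic. The sketch below rests on several assumptions that the reference above does not state: the world size must be a multiple of both TP × PP × CP and ETP × EP × PP (as Megatron-Core appears to require), context parallelism and expert tensor parallelism are both 1, and each node has 8 GPUs. The helper min_world_size is hypothetical and not part of the library.

```python
from math import lcm


def min_world_size(tp: int, pp: int, ep: int, cp: int = 1, etp: int = 1) -> int:
    """Smallest GPU count divisible by both the dense and the expert group sizes.

    Assumes world_size % (tp * pp * cp) == 0 and world_size % (etp * ep * pp) == 0.
    """
    dense_group = tp * pp * cp
    expert_group = etp * ep * pp
    return lcm(dense_group, expert_group)


GPUS_PER_NODE = 8  # assumption; not stated in the reference

recipes = {
    "deepseek_v3_pretrain_config": {"tp": 2, "pp": 16, "ep": 64},
    "deepseek_v3_pretrain_config_32nodes": {"tp": 2, "pp": 8, "ep": 32},
}

min_gpus = {name: min_world_size(**cfg) for name, cfg in recipes.items()}
for name, gpus in min_gpus.items():
    print(f"{name}: at least {gpus} GPUs ({gpus // GPUS_PER_NODE} nodes of {GPUS_PER_NODE})")
```

Under these assumptions, TP=2, PP=8, EP=32 works out to 256 GPUs, which matches 32 nodes of 8 GPUs each, and TP=2, PP=16, EP=64 works out to 1024 GPUs.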