bridge.recipes.deepseek.deepseek_v3
Module Contents
Functions
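set_deepseek_v3_pipeline_model_parallel_layout() | Set the DeepSeek-V3 pipeline model parallel layout.
deepseek_v3_pretrain_config() | Return a pre-training config for DeepSeek-V3 (671B).
deepseek_v3_pretrain_config_32nodes() | Return a pre-training config for DeepSeek-V3 (671B) sized for the minimum node count (32 nodes).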
API
- bridge.recipes.deepseek.deepseek_v3._build_standalone_mtp_layout(num_decoder_layers: int, total_stages: int, mtp_layers: int)
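This private helper has no docstring. The sketch below shows one plausible way to build a "standalone MTP" layout from its three arguments in the list-of-lists form accepted by `layout`. It is an illustration under assumptions, not the helper's implementation: the function name `standalone_mtp_layout_sketch`, the layer names `"embedding"`, `"decoder"`, `"mtp"` and `"loss"`, the even split of decoder layers, and the reading of `total_stages` as the total PP×VPP stage count are all hypothetical.

```python
def standalone_mtp_layout_sketch(
    num_decoder_layers: int, total_stages: int, mtp_layers: int
) -> list[list[str]]:
    """Hypothetical sketch: decoder layers spread over the first stages,
    MTP layers alone in the penultimate stage, loss alone in the last stage."""
    if mtp_layers < 1:
        raise ValueError("a standalone MTP stage needs at least one MTP layer")
    if total_stages < 3:
        raise ValueError("need at least one decoder stage plus the MTP and loss stages")
    decoder_stages = total_stages - 2  # the last two stages are reserved for MTP and loss
    if num_decoder_layers < decoder_stages:
        raise ValueError("every decoder stage must receive at least one layer")

    # Assumed even split: the first `extra` stages take one additional layer.
    base, extra = divmod(num_decoder_layers, decoder_stages)
    layout: list[list[str]] = []
    for stage in range(decoder_stages):
        stage_layers = ["decoder"] * (base + (1 if stage < extra else 0))
        if stage == 0:
            stage_layers.insert(0, "embedding")  # embedding lives on the first stage
        layout.append(stage_layers)

    layout.append(["mtp"] * mtp_layers)  # standalone penultimate stage
    layout.append(["loss"])              # loss alone in the final stage
    return layout


print(standalone_mtp_layout_sketch(4, 4, 1))
# [['embedding', 'decoder', 'decoder'], ['decoder', 'decoder'], ['mtp'], ['loss']]
```

The real helper may balance stages differently, for example by giving the first stage fewer decoder layers to offset the embedding.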
- bridge.recipes.deepseek.deepseek_v3.set_deepseek_v3_pipeline_model_parallel_layout(model_cfg: megatron.bridge.models.GPTModelProvider, layout: str | list[list[str]] | None = None, *, mtp_standalone: bool = False)
Set the DeepSeek-V3 pipeline model parallel layout.
- Parameters:
model_cfg – DeepSeek-V3 model configuration to update.
layout – Explicit pipeline layout. When provided, this overrides the predefined layouts.
mtp_standalone – If True, place the MTP (multi-token prediction) layers in a standalone penultimate PP/VPP stage and the loss in the final stage. If False (the default), MTP layers are colocated with the loss in the final stage, matching the existing recipes.
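The `layout` parameter accepts either a string or a list of per-stage lists. The sketch below converts the list form into a compact string so the effect of `mtp_standalone` is easy to see. It assumes Megatron-Core's pipeline layout notation (`E` embedding, `t` transformer decoder layer, `m` MTP layer, `L` loss, `|` stage separator) and the hypothetical layer names `"embedding"`, `"decoder"`, `"mtp"` and `"loss"`; `layout_to_string` is illustrative and not part of this module.

```python
# Assumed mapping from layer names to Megatron-Core layout symbols.
SYMBOLS = {"embedding": "E", "decoder": "t", "mtp": "m", "loss": "L"}


def layout_to_string(layout: list[list[str]]) -> str:
    """Hypothetical helper: render a per-stage layout as a compact string."""
    stages = []
    for stage in layout:
        try:
            stages.append("".join(SYMBOLS[name] for name in stage))
        except KeyError as err:
            raise ValueError(f"unknown layer type: {err.args[0]!r}") from None
    return "|".join(stages)


# Colocated (mtp_standalone=False): MTP and loss share the final stage.
print(layout_to_string([["embedding", "decoder"], ["decoder", "decoder"], ["decoder", "mtp", "loss"]]))
# Et|tt|tmL

# Standalone (mtp_standalone=True): MTP gets its own penultimate stage.
print(layout_to_string([["embedding", "decoder"], ["decoder", "decoder"], ["mtp"], ["loss"]]))
# Et|tt|m|L
```

The only difference between the two modes is the tail of the layout: `mL` in one stage versus `m|L` across two stages, so the standalone mode needs one extra PP/VPP stage.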
- bridge.recipes.deepseek.deepseek_v3.deepseek_v3_pretrain_config() → megatron.bridge.training.config.ConfigContainer
Return a pre-training config for DeepSeek-V3 (671B).
Recommended parallelism: TP=2, PP=16, EP=64.
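A quick way to see what the recommended sizes imply for cluster size is to compute the smallest world size they divide. The sketch assumes Megatron-Core's MoE parallel folding, where dense layers require the world size to be a multiple of TP×CP×PP and expert layers require a multiple of ETP×EP×PP. It also assumes CP=1, ETP=1 and 8 GPUs per node; none of these values are stated in the docstring, and `min_world_size` is a hypothetical helper.

```python
from math import lcm


def min_world_size(tp: int, pp: int, ep: int, cp: int = 1, etp: int = 1) -> int:
    """Hypothetical helper: smallest GPU count divisible by both parallel groupings."""
    # Dense/attention layers: world = TP * CP * PP * DP.
    dense_group = tp * cp * pp
    # MoE layers with parallel folding: world = ETP * EP * PP * EDP.
    expert_group = etp * ep * pp
    return lcm(dense_group, expert_group)


GPUS_PER_NODE = 8  # assumption: 8 GPUs per node

gpus = min_world_size(tp=2, pp=16, ep=64)
print(gpus, gpus // GPUS_PER_NODE)
# 1024 128
```

Under these assumptions, EP×PP = 1024 dominates, so TP=2, PP=16, EP=64 needs at least 1024 GPUs (128 nodes). Applying the same arithmetic to TP=2, PP=8, EP=32 gives 256 GPUs, or 32 nodes.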
- bridge.recipes.deepseek.deepseek_v3.deepseek_v3_pretrain_config_32nodes() → megatron.bridge.training.config.ConfigContainer
Return a pre-training config for DeepSeek-V3 (671B) sized for the minimum node count (32 nodes).
Recommended parallelism: TP=2, PP=8, EP=32. Uses full recompute for memory efficiency.
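To see why 32 nodes is the floor for this variant, the sketch below derives the data-parallel (DP) and expert data-parallel (EDP) sizes from a node count. It assumes 8 GPUs per node, CP=1, ETP=1, and Megatron-Core's MoE parallel folding (world = TP×CP×PP×DP = ETP×EP×PP×EDP). `parallel_sizes` is a hypothetical helper, not part of this module.

```python
def parallel_sizes(
    num_nodes: int, gpus_per_node: int, tp: int, pp: int, ep: int, cp: int = 1, etp: int = 1
) -> tuple[int, int]:
    """Hypothetical helper: return (DP, EDP) or raise if the world size does not fit."""
    world_size = num_nodes * gpus_per_node
    dense_group = tp * cp * pp    # GPUs holding one replica of the dense layers
    expert_group = etp * ep * pp  # GPUs holding one replica of all experts
    if world_size % dense_group or world_size % expert_group:
        raise ValueError(
            f"world size {world_size} must be divisible by both "
            f"TP*CP*PP={dense_group} and ETP*EP*PP={expert_group}"
        )
    return world_size // dense_group, world_size // expert_group


dp, edp = parallel_sizes(32, 8, tp=2, pp=8, ep=32)
print(dp, edp)
# 16 1
```

With 32 nodes the expert group (EP×PP = 256 GPUs) fills the whole cluster, so EDP is 1; at 16 nodes (128 GPUs) the experts no longer fit and the helper raises `ValueError`.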