bridge.perf_recipes.llama.gb200.llama3
GB200 performance recipes for Llama 3.
Module Contents
Functions
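llama3_8b_pretrain_8gpu_gb200_bf16_config | Llama3 8B pretrain: 8× GB200, BF16, CUDA graph local.
llama3_8b_pretrain_8gpu_gb200_fp8cs_config | Llama3 8B pretrain: 8× GB200, FP8 current-scaling, CUDA graph local.
llama3_8b_pretrain_8gpu_gb200_fp8mx_config | Llama3 8B pretrain: 8× GB200, MXFP8, CUDA graph local.
llama3_8b_pretrain_8gpu_gb200_nvfp4_config | Llama3 8B pretrain: 8× GB200, NVFP4.
llama3_70b_pretrain_64gpu_gb200_bf16_config | Llama3 70B pretrain: 64× GB200, BF16, FSDP, GBS=256.
llama3_70b_pretrain_64gpu_gb200_fp8cs_config | Llama3 70B pretrain: 64× GB200, FP8 current-scaling, FSDP, GBS=256.
llama3_70b_pretrain_64gpu_gb200_fp8mx_config | Llama3 70B pretrain: 64× GB200, MXFP8, TP=2 PP=4, GBS=256.
llama3_70b_pretrain_64gpu_gb200_nvfp4_config | Llama3 70B pretrain: 64× GB200, NVFP4, TP=2 PP=4, GBS=256.
llama3_8b_sft_8gpu_gb200_bf16_config | Llama3 8B SFT: 8× GB200, BF16, seq_length=16384.
llama3_8b_sft_8gpu_gb200_fp8cs_config | Llama3 8B SFT: 8× GB200, FP8 current-scaling, seq_length=16384.
llama3_70b_sft_32gpu_gb200_bf16_config | Llama3 70B SFT: 32× GB200, BF16, PP=8 VP=10.
llama3_70b_sft_32gpu_gb200_fp8cs_config | Llama3 70B SFT: 32× GB200, FP8 current-scaling, PP=8 VP=10.
llama3_70b_lora_8gpu_gb200_bf16_config | Llama3 70B LoRA: 8× GB200, BF16, GBS=64, seq_length=2048.
llama3_70b_lora_8gpu_gb200_fp8cs_config | Llama3 70B LoRA: 8× GB200, FP8 current-scaling, PP=2.
llama3_70b_lora_8gpu_gb200_fp8mx_config | Llama3 70B LoRA: 8× GB200, MXFP8, PP=2.
llama3_8b_sft_8gpu_gb200_fp8mx_config | Llama3 8B SFT: 8× GB200, MXFP8 (same layout as the FP8 current-scaling recipe).
llama3_70b_sft_32gpu_gb200_fp8mx_config | Llama3 70B SFT: 32× GB200, MXFP8 (same layout as the FP8 current-scaling recipe).
llama3_8b_pretrain_32gpu_gb200_bf16_config | Llama3 8B pretrain: 32× GB200, BF16, legacy-scaled GBS.
llama3_8b_pretrain_32gpu_gb200_fp8cs_config | Llama3 8B pretrain: 32× GB200, FP8 current-scaling, legacy-scaled GBS.
llama3_70b_pretrain_32gpu_gb200_bf16_config | Llama3 70B pretrain: 32× GB200, BF16, legacy-scaled GBS.
llama3_70b_pretrain_32gpu_gb200_fp8cs_config | Llama3 70B pretrain: 32× GB200, FP8 current-scaling, legacy-scaled GBS.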
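Several of the summaries above quote a GPU count together with TP, PP, and GBS values. As a rough worked example of how those numbers relate, the sketch below derives the data-parallel size and the number of sequences each data-parallel rank processes per step. It relies on the usual Megatron-style relationship (world size = TP × PP × CP × DP) and assumes context parallelism of 1 wherever a summary does not state it; the helper names are hypothetical and are not part of this module.

```python
def data_parallel_size(num_gpus: int, tp: int = 1, pp: int = 1, cp: int = 1) -> int:
    """Return the data-parallel size implied by a GPU count and model-parallel sizes.

    Assumption: world size = TP * PP * CP * DP, with CP defaulting to 1
    because the summaries above do not state it.
    """
    model_parallel = tp * pp * cp
    if num_gpus % model_parallel != 0:
        raise ValueError(
            f"{num_gpus} GPUs are not divisible by TP*PP*CP = {model_parallel}"
        )
    return num_gpus // model_parallel


def sequences_per_dp_rank(global_batch_size: int, dp: int) -> int:
    """Return how many sequences each data-parallel rank handles per optimizer step."""
    if global_batch_size % dp != 0:
        raise ValueError(
            f"GBS {global_batch_size} is not divisible by DP size {dp}"
        )
    return global_batch_size // dp


# Values taken from the 70B MXFP8 pretrain summary: 64 GPUs, TP=2, PP=4, GBS=256.
dp = data_parallel_size(64, tp=2, pp=4)
per_rank = sequences_per_dp_rank(256, dp)
print(f"DP={dp}, sequences per DP rank per step={per_rank}")
```

Under these assumptions the 70B MXFP8 pretrain layout works out to a data-parallel size of 8, with 32 sequences per data-parallel rank per step. The values actually set inside the recipe may differ if other parallelism dimensions are enabled.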
API
- bridge.perf_recipes.llama.gb200.llama3.llama3_8b_pretrain_8gpu_gb200_bf16_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 8B pretrain: 8× GB200, BF16, CUDA graph local.
- bridge.perf_recipes.llama.gb200.llama3.llama3_8b_pretrain_8gpu_gb200_fp8cs_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 8B pretrain: 8× GB200, FP8 current-scaling, CUDA graph local.
- bridge.perf_recipes.llama.gb200.llama3.llama3_8b_pretrain_8gpu_gb200_fp8mx_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 8B pretrain: 8× GB200, MXFP8, CUDA graph local.
- bridge.perf_recipes.llama.gb200.llama3.llama3_8b_pretrain_8gpu_gb200_nvfp4_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 8B pretrain: 8× GB200, NVFP4.
- bridge.perf_recipes.llama.gb200.llama3.llama3_70b_pretrain_64gpu_gb200_bf16_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 70B pretrain: 64× GB200, BF16, FSDP, GBS=256.
- bridge.perf_recipes.llama.gb200.llama3.llama3_70b_pretrain_64gpu_gb200_fp8cs_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 70B pretrain: 64× GB200, FP8 current-scaling, FSDP, GBS=256.
- bridge.perf_recipes.llama.gb200.llama3.llama3_70b_pretrain_64gpu_gb200_fp8mx_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 70B pretrain: 64× GB200, MXFP8, TP=2 PP=4, GBS=256.
- bridge.perf_recipes.llama.gb200.llama3.llama3_70b_pretrain_64gpu_gb200_nvfp4_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 70B pretrain: 64× GB200, NVFP4, TP=2 PP=4, GBS=256.
- bridge.perf_recipes.llama.gb200.llama3.llama3_8b_sft_8gpu_gb200_bf16_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 8B SFT: 8× GB200, BF16, seq_length=16384.
- bridge.perf_recipes.llama.gb200.llama3.llama3_8b_sft_8gpu_gb200_fp8cs_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 8B SFT: 8× GB200, FP8 current-scaling, seq_length=16384.
- bridge.perf_recipes.llama.gb200.llama3.llama3_70b_sft_32gpu_gb200_bf16_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 70B SFT: 32× GB200, BF16, PP=8 VP=10.
- bridge.perf_recipes.llama.gb200.llama3.llama3_70b_sft_32gpu_gb200_fp8cs_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 70B SFT: 32× GB200, FP8 current-scaling, PP=8 VP=10.
- bridge.perf_recipes.llama.gb200.llama3.llama3_70b_lora_8gpu_gb200_bf16_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 70B LoRA: 8× GB200, BF16, GBS=64, seq_length=2048.
- bridge.perf_recipes.llama.gb200.llama3.llama3_70b_lora_8gpu_gb200_fp8cs_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 70B LoRA: 8× GB200, FP8 current-scaling, PP=2.
- bridge.perf_recipes.llama.gb200.llama3.llama3_70b_lora_8gpu_gb200_fp8mx_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 70B LoRA: 8× GB200, MXFP8, PP=2.
- bridge.perf_recipes.llama.gb200.llama3.llama3_8b_sft_8gpu_gb200_fp8mx_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 8B SFT: 8× GB200, MXFP8 (same layout as the FP8 current-scaling recipe).
- bridge.perf_recipes.llama.gb200.llama3.llama3_70b_sft_32gpu_gb200_fp8mx_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 70B SFT: 32× GB200, MXFP8 (same layout as the FP8 current-scaling recipe).
- bridge.perf_recipes.llama.gb200.llama3.llama3_8b_pretrain_32gpu_gb200_bf16_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 8B pretrain: 32× GB200, BF16, legacy-scaled GBS.
- bridge.perf_recipes.llama.gb200.llama3.llama3_8b_pretrain_32gpu_gb200_fp8cs_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 8B pretrain: 32× GB200, FP8 current-scaling, legacy-scaled GBS.
- bridge.perf_recipes.llama.gb200.llama3.llama3_70b_pretrain_32gpu_gb200_bf16_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 70B pretrain: 32× GB200, BF16, legacy-scaled GBS.
- bridge.perf_recipes.llama.gb200.llama3.llama3_70b_pretrain_32gpu_gb200_fp8cs_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 70B pretrain: 32× GB200, FP8 current-scaling, legacy-scaled GBS.
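The function names listed above appear to follow a regular pattern: `llama3_<size>_<task>_<N>gpu_gb200_<precision>_config`. As a minimal sketch of how that convention could be used to pick a recipe programmatically, the snippet below splits a name into its parts. The `RecipeName` type and `parse_recipe_name` helper are hypothetical illustrations, not part of this module, and the pattern is inferred only from the names on this page.

```python
import re
from typing import NamedTuple


class RecipeName(NamedTuple):
    """Fields encoded in a recipe function name (hypothetical helper type)."""
    model_size: str   # e.g. "8b" or "70b"
    task: str         # "pretrain", "sft", or "lora"
    num_gpus: int     # GPU count the recipe targets
    precision: str    # e.g. "bf16", "fp8cs", "fp8mx", "nvfp4"


# Pattern inferred from the names in this page; other modules may differ.
_RECIPE_PATTERN = re.compile(
    r"^llama3_(?P<size>\d+b)_(?P<task>pretrain|sft|lora)_"
    r"(?P<gpus>\d+)gpu_gb200_(?P<precision>[a-z0-9]+)_config$"
)


def parse_recipe_name(name: str) -> RecipeName:
    """Split a recipe function name into model size, task, GPU count, and precision."""
    match = _RECIPE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognised GB200 Llama3 recipe name: {name!r}")
    return RecipeName(
        model_size=match["size"],
        task=match["task"],
        num_gpus=int(match["gpus"]),
        precision=match["precision"],
    )


info = parse_recipe_name("llama3_70b_lora_8gpu_gb200_fp8mx_config")
print(info)
```

A parser like this can help filter recipes by task or precision before calling one of them; the call itself still returns the `ConfigContainer` documented above.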