bridge.perf_recipes.llama.h100.llama3
H100 performance recipes for Llama 3.
Module Contents
Functions
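| Function | Description |
| --- | --- |
| llama3_8b_pretrain_8gpu_h100_bf16_config | Llama3 8B pretrain: 8× H100, BF16, CP=2. |
| llama3_8b_pretrain_8gpu_h100_fp8cs_config | Llama3 8B pretrain: 8× H100, FP8 current-scaling, recompute. |
| llama3_70b_pretrain_64gpu_h100_bf16_config | Llama3 70B pretrain: 64× H100, BF16, TP=4 PP=4 CP=2, GBS=256. |
| llama3_70b_pretrain_64gpu_h100_fp8cs_config | Llama3 70B pretrain: 64× H100, FP8 current-scaling, TP=4 PP=8, GBS=256. |
| llama3_8b_sft_8gpu_h100_bf16_config | Llama3 8B SFT: 8× H100, BF16, seq_length=4096. |
| llama3_8b_sft_8gpu_h100_fp8cs_config | Llama3 8B SFT: 8× H100, FP8 current-scaling, seq_length=4096. |
| llama3_70b_sft_32gpu_h100_bf16_config | Llama3 70B SFT: 32× H100, BF16, TP=4 PP=4 VP=5. |
| llama3_70b_sft_32gpu_h100_fp8cs_config | Llama3 70B SFT: 32× H100, FP8 current-scaling, TP=4 PP=4 VP=5. |
| llama3_70b_lora_8gpu_h100_bf16_config | Llama3 70B LoRA: 8× H100, BF16, PP=4 VP=20, recompute. |
| llama3_70b_lora_8gpu_h100_fp8cs_config | Llama3 70B LoRA: 8× H100, FP8 current-scaling, TP=2 PP=4 VP=20. |
| llama3_8b_pretrain_64gpu_h100_bf16_config | Llama3 8B pretrain: 64× H100, BF16, legacy-scaled GBS. |
| llama3_8b_pretrain_64gpu_h100_fp8cs_config | Llama3 8B pretrain: 64× H100, FP8 current-scaling, legacy-scaled GBS. |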
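The function names in this module appear to follow a fixed pattern: model, parameter count, task, GPU count, hardware, precision, then a `_config` suffix. As a minimal sketch, assuming that pattern holds for every recipe listed here, a name can be decoded with a regular expression. `RecipeName` and `parse_recipe_name` are hypothetical helpers written for illustration; they are not part of the module.

```python
import re
from typing import NamedTuple


class RecipeName(NamedTuple):
    """Fields encoded in a recipe function name (hypothetical helper type)."""

    model: str
    size: str
    task: str
    gpus: int
    hardware: str
    precision: str


# Assumed pattern: <model>_<size>b_<task>_<N>gpu_<hardware>_<precision>_config
_RECIPE_PATTERN = re.compile(
    r"^(?P<model>[a-z0-9]+)_(?P<size>\d+b)_(?P<task>[a-z]+)_"
    r"(?P<gpus>\d+)gpu_(?P<hardware>[a-z0-9]+)_(?P<precision>[a-z0-9]+)_config$"
)


def parse_recipe_name(name: str) -> RecipeName:
    """Split a recipe function name into its components."""
    match = _RECIPE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognised recipe name: {name!r}")
    fields = match.groupdict()
    return RecipeName(
        model=fields["model"],
        size=fields["size"],
        task=fields["task"],
        gpus=int(fields["gpus"]),  # the only numeric field
        hardware=fields["hardware"],
        precision=fields["precision"],
    )


recipe = parse_recipe_name("llama3_70b_lora_8gpu_h100_fp8cs_config")
print(recipe.task, recipe.gpus, recipe.precision)
```

A helper like this can be handy for filtering recipes by task or precision, for example when building a benchmark matrix from the function names alone.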
API
- bridge.perf_recipes.llama.h100.llama3.llama3_8b_pretrain_8gpu_h100_bf16_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 8B pretrain: 8× H100, BF16, CP=2.
- bridge.perf_recipes.llama.h100.llama3.llama3_8b_pretrain_8gpu_h100_fp8cs_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 8B pretrain: 8× H100, FP8 current-scaling, recompute.
- bridge.perf_recipes.llama.h100.llama3.llama3_70b_pretrain_64gpu_h100_bf16_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 70B pretrain: 64× H100, BF16, TP=4 PP=4 CP=2, GBS=256.
- bridge.perf_recipes.llama.h100.llama3.llama3_70b_pretrain_64gpu_h100_fp8cs_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 70B pretrain: 64× H100, FP8 current-scaling, TP=4 PP=8, GBS=256.
- bridge.perf_recipes.llama.h100.llama3.llama3_8b_sft_8gpu_h100_bf16_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 8B SFT: 8× H100, BF16, seq_length=4096.
- bridge.perf_recipes.llama.h100.llama3.llama3_8b_sft_8gpu_h100_fp8cs_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 8B SFT: 8× H100, FP8 current-scaling, seq_length=4096.
- bridge.perf_recipes.llama.h100.llama3.llama3_70b_sft_32gpu_h100_bf16_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 70B SFT: 32× H100, BF16, TP=4 PP=4 VP=5.
- bridge.perf_recipes.llama.h100.llama3.llama3_70b_sft_32gpu_h100_fp8cs_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 70B SFT: 32× H100, FP8 current-scaling, TP=4 PP=4 VP=5.
- bridge.perf_recipes.llama.h100.llama3.llama3_70b_lora_8gpu_h100_bf16_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 70B LoRA: 8× H100, BF16, PP=4 VP=20, recompute.
- bridge.perf_recipes.llama.h100.llama3.llama3_70b_lora_8gpu_h100_fp8cs_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 70B LoRA: 8× H100, FP8 current-scaling, TP=2 PP=4 VP=20.
- bridge.perf_recipes.llama.h100.llama3.llama3_8b_pretrain_64gpu_h100_bf16_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 8B pretrain: 64× H100, BF16, legacy-scaled GBS.
- bridge.perf_recipes.llama.h100.llama3.llama3_8b_pretrain_64gpu_h100_fp8cs_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 8B pretrain: 64× H100, FP8 current-scaling, legacy-scaled GBS.
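The parallelism settings in the descriptions above determine how many data-parallel replicas a recipe runs. The sketch below applies the usual Megatron-style relation, data-parallel size = GPUs / (TP × PP × CP), to the 70B BF16 pretrain recipe. `data_parallel_size` is a hypothetical helper, not part of this module, and any degree a description does not list is assumed to be 1, which the source does not confirm.

```python
def data_parallel_size(world_size: int, tp: int = 1, pp: int = 1, cp: int = 1) -> int:
    """Return the number of data-parallel replicas for a given GPU count.

    tp, pp and cp are the tensor-, pipeline- and context-parallel degrees.
    Virtual pipeline stages (VP) interleave on the same GPUs, so they do
    not enter this calculation.
    """
    model_parallel = tp * pp * cp
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by TP*PP*CP={model_parallel}"
        )
    return world_size // model_parallel


# Llama3 70B pretrain, BF16: 64 GPUs, TP=4 PP=4 CP=2, GBS=256 (from the description).
dp = data_parallel_size(64, tp=4, pp=4, cp=2)
sequences_per_replica = 256 // dp  # share of the global batch per replica per step

print(dp, sequences_per_replica)  # prints "2 128"
```

Under these assumptions the recipe runs 2 data-parallel replicas, each handling 128 of the 256 sequences in a global batch. How those 128 sequences split into micro-batches is not stated in the descriptions.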