bridge.perf_recipes.llama.h100.llama3

H100 performance recipes for Llama 3.

Module Contents

Functions

llama3_8b_pretrain_8gpu_h100_bf16_config

Llama3 8B pretrain: 8× H100, BF16, CP=2.

llama3_8b_pretrain_8gpu_h100_fp8cs_config

Llama3 8B pretrain: 8× H100, FP8 current-scaling, recompute.

llama3_70b_pretrain_64gpu_h100_bf16_config

Llama3 70B pretrain: 64× H100, BF16, TP=4 PP=4 CP=2, GBS=256.

llama3_70b_pretrain_64gpu_h100_fp8cs_config

Llama3 70B pretrain: 64× H100, FP8 current-scaling, TP=4 PP=8, GBS=256.

llama3_8b_sft_8gpu_h100_bf16_config

Llama3 8B SFT: 8× H100, BF16, seq_length=4096.

llama3_8b_sft_8gpu_h100_fp8cs_config

Llama3 8B SFT: 8× H100, FP8 current-scaling, seq_length=4096.

llama3_70b_sft_32gpu_h100_bf16_config

Llama3 70B SFT: 32× H100, BF16, TP=4 PP=4 VP=5.

llama3_70b_sft_32gpu_h100_fp8cs_config

Llama3 70B SFT: 32× H100, FP8 current-scaling, TP=4 PP=4 VP=5.

llama3_70b_lora_8gpu_h100_bf16_config

Llama3 70B LoRA: 8× H100, BF16, PP=4 VP=20, recompute.

llama3_70b_lora_8gpu_h100_fp8cs_config

Llama3 70B LoRA: 8× H100, FP8 current-scaling, TP=2 PP=4 VP=20.

llama3_8b_pretrain_64gpu_h100_bf16_config

Llama3 8B pretrain: 64× H100, BF16, legacy-scaled GBS.

llama3_8b_pretrain_64gpu_h100_fp8cs_config

Llama3 8B pretrain: 64× H100, FP8 current-scaling, legacy-scaled GBS.
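The function names listed above appear to follow a single pattern: `llama3_{size}_{task}_{gpus}gpu_h100_{precision}_config`. As a minimal sketch, the helper below builds a name from its parts and checks it against the twelve functions documented on this page. The helper `recipe_name` and the constant `KNOWN_RECIPES` are hypothetical names introduced for illustration and are not part of the module.

```python
from itertools import product

# (model size, task, GPU count) combinations copied from the function list above.
_BASES = [
    ("8b", "pretrain", 8),
    ("70b", "pretrain", 64),
    ("8b", "sft", 8),
    ("70b", "sft", 32),
    ("70b", "lora", 8),
    ("8b", "pretrain", 64),
]
# Precision suffixes used on this page: BF16 and FP8 current-scaling.
_PRECISIONS = ["bf16", "fp8cs"]

# Hypothetical lookup set: every recipe function name documented on this page.
KNOWN_RECIPES = {
    f"llama3_{size}_{task}_{gpus}gpu_h100_{prec}_config"
    for (size, task, gpus), prec in product(_BASES, _PRECISIONS)
}


def recipe_name(size: str, task: str, gpus: int, precision: str) -> str:
    """Build a recipe function name and verify that this page documents it."""
    name = f"llama3_{size}_{task}_{gpus}gpu_h100_{precision}_config"
    if name not in KNOWN_RECIPES:
        raise ValueError(f"no documented recipe named {name!r}")
    return name


if __name__ == "__main__":
    print(recipe_name("70b", "lora", 8, "fp8cs"))
```

Validating against a fixed set keeps a mistyped combination, such as a 70B SFT recipe on 8 GPUs, from turning into a failed attribute lookup later.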

API

bridge.perf_recipes.llama.h100.llama3.llama3_8b_pretrain_8gpu_h100_bf16_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer

Llama3 8B pretrain: 8× H100, BF16, CP=2.

bridge.perf_recipes.llama.h100.llama3.llama3_8b_pretrain_8gpu_h100_fp8cs_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer

Llama3 8B pretrain: 8× H100, FP8 current-scaling, recompute.

bridge.perf_recipes.llama.h100.llama3.llama3_70b_pretrain_64gpu_h100_bf16_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer

Llama3 70B pretrain: 64× H100, BF16, TP=4 PP=4 CP=2, GBS=256.

bridge.perf_recipes.llama.h100.llama3.llama3_70b_pretrain_64gpu_h100_fp8cs_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer

Llama3 70B pretrain: 64× H100, FP8 current-scaling, TP=4 PP=8, GBS=256.
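The parallelism settings in the two 70B pretrain recipes above can be sanity-checked with a little arithmetic. The sketch below assumes the usual Megatron-style relation, world size = TP × PP × CP × DP, and that any dimension not named in a recipe summary defaults to 1. Neither assumption is stated on this page, and `data_parallel_size` is a hypothetical helper, not a library function.

```python
def data_parallel_size(world_size: int, tp: int = 1, pp: int = 1, cp: int = 1) -> int:
    """Return the data-parallel size implied by a GPU count and TP/PP/CP sizes.

    Assumes world_size = tp * pp * cp * dp (no other parallel dimensions).
    """
    model_parallel = tp * pp * cp
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by tp*pp*cp={model_parallel}"
        )
    return world_size // model_parallel


if __name__ == "__main__":
    # 70B pretrain, BF16: 64 GPUs, TP=4 PP=4 CP=2.
    print(data_parallel_size(64, tp=4, pp=4, cp=2))  # 2
    # 70B pretrain, FP8: 64 GPUs, TP=4 PP=8; CP=1 is an assumption here.
    print(data_parallel_size(64, tp=4, pp=8))  # 2
```

Under these assumptions both recipes use two data-parallel replicas, so with GBS=256 each replica would process 128 samples per optimizer step.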

bridge.perf_recipes.llama.h100.llama3.llama3_8b_sft_8gpu_h100_bf16_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer

Llama3 8B SFT: 8× H100, BF16, seq_length=4096.

bridge.perf_recipes.llama.h100.llama3.llama3_8b_sft_8gpu_h100_fp8cs_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer

Llama3 8B SFT: 8× H100, FP8 current-scaling, seq_length=4096.

bridge.perf_recipes.llama.h100.llama3.llama3_70b_sft_32gpu_h100_bf16_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer

Llama3 70B SFT: 32× H100, BF16, TP=4 PP=4 VP=5.

bridge.perf_recipes.llama.h100.llama3.llama3_70b_sft_32gpu_h100_fp8cs_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer

Llama3 70B SFT: 32× H100, FP8 current-scaling, TP=4 PP=4 VP=5.

bridge.perf_recipes.llama.h100.llama3.llama3_70b_lora_8gpu_h100_bf16_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer

Llama3 70B LoRA: 8× H100, BF16, PP=4 VP=20, recompute.

bridge.perf_recipes.llama.h100.llama3.llama3_70b_lora_8gpu_h100_fp8cs_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer

Llama3 70B LoRA: 8× H100, FP8 current-scaling, TP=2 PP=4 VP=20.
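The PP and VP values in the 70B SFT and LoRA recipes above determine how many transformer layers land in each virtual pipeline stage. The sketch below rests on two assumptions not stated on this page: that VP is the number of virtual stages (model chunks) per pipeline rank, as in Megatron-style interleaved pipeline schedules, and that Llama 3 70B has 80 transformer layers, which is the published figure for that model. `layers_per_virtual_stage` is a hypothetical helper.

```python
# Assumption: published Llama 3 70B layer count; not stated on this page.
NUM_LAYERS_70B = 80


def layers_per_virtual_stage(num_layers: int, pp: int, vp: int = 1) -> int:
    """Return layers per model chunk when each of pp ranks holds vp chunks.

    Assumes layers are split evenly across pp * vp chunks.
    """
    chunks = pp * vp
    if num_layers % chunks != 0:
        raise ValueError(
            f"num_layers={num_layers} is not divisible by pp*vp={chunks}"
        )
    return num_layers // chunks


if __name__ == "__main__":
    # 70B SFT recipes: PP=4, VP=5.
    print(layers_per_virtual_stage(NUM_LAYERS_70B, pp=4, vp=5))  # 4
    # 70B LoRA recipes: PP=4, VP=20.
    print(layers_per_virtual_stage(NUM_LAYERS_70B, pp=4, vp=20))  # 1
```

If these assumptions hold, the LoRA recipes interleave at the finest possible granularity, one layer per chunk, while the SFT recipes group four layers per chunk.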

bridge.perf_recipes.llama.h100.llama3.llama3_8b_pretrain_64gpu_h100_bf16_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer

Llama3 8B pretrain: 64× H100, BF16, legacy-scaled GBS.

bridge.perf_recipes.llama.h100.llama3.llama3_8b_pretrain_64gpu_h100_fp8cs_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer

Llama3 8B pretrain: 64× H100, FP8 current-scaling, legacy-scaled GBS.