bridge.perf_recipes.llama.b300.llama3
B300 performance recipes for Llama 3.
Module Contents
Functions
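llama3_8b_pretrain_8gpu_b300_bf16_config | Llama3 8B pretrain: 8× B300, BF16, CUDA graph local.
llama3_8b_pretrain_8gpu_b300_fp8cs_config | Llama3 8B pretrain: 8× B300, FP8 current-scaling, CUDA graph local.
llama3_8b_pretrain_8gpu_b300_fp8mx_config | Llama3 8B pretrain: 8× B300, MXFP8, CUDA graph local.
llama3_8b_pretrain_8gpu_b300_nvfp4_config | Llama3 8B pretrain: 8× B300, NVFP4.
llama3_70b_pretrain_64gpu_b300_bf16_config | Llama3 70B pretrain: 64× B300, BF16, FSDP, GBS=256.
llama3_70b_pretrain_64gpu_b300_fp8cs_config | Llama3 70B pretrain: 64× B300, FP8 current-scaling, FSDP, GBS=256.
llama3_70b_pretrain_64gpu_b300_fp8mx_config | Llama3 70B pretrain: 64× B300, MXFP8, PP=4, GBS=256.
llama3_70b_pretrain_64gpu_b300_nvfp4_config | Llama3 70B pretrain: 64× B300, NVFP4, PP=4, GBS=256.
llama3_70b_lora_8gpu_b300_bf16_config | Llama3 70B LoRA: 8× B300, BF16.
llama3_70b_lora_8gpu_b300_fp8cs_config | Llama3 70B LoRA: 8× B300, FP8 current-scaling, PP=2.
llama3_70b_lora_8gpu_b300_fp8mx_config | Llama3 70B LoRA: 8× B300, MXFP8, PP=2.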
API
- bridge.perf_recipes.llama.b300.llama3.llama3_8b_pretrain_8gpu_b300_bf16_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 8B pretrain: 8× B300, BF16, CUDA graph local.
- bridge.perf_recipes.llama.b300.llama3.llama3_8b_pretrain_8gpu_b300_fp8cs_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 8B pretrain: 8× B300, FP8 current-scaling, CUDA graph local.
- bridge.perf_recipes.llama.b300.llama3.llama3_8b_pretrain_8gpu_b300_fp8mx_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 8B pretrain: 8× B300, MXFP8, CUDA graph local.
- bridge.perf_recipes.llama.b300.llama3.llama3_8b_pretrain_8gpu_b300_nvfp4_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 8B pretrain: 8× B300, NVFP4.
- bridge.perf_recipes.llama.b300.llama3.llama3_70b_pretrain_64gpu_b300_bf16_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 70B pretrain: 64× B300, BF16, FSDP, GBS=256.
- bridge.perf_recipes.llama.b300.llama3.llama3_70b_pretrain_64gpu_b300_fp8cs_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 70B pretrain: 64× B300, FP8 current-scaling, FSDP, GBS=256.
- bridge.perf_recipes.llama.b300.llama3.llama3_70b_pretrain_64gpu_b300_fp8mx_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 70B pretrain: 64× B300, MXFP8, PP=4, GBS=256.
- bridge.perf_recipes.llama.b300.llama3.llama3_70b_pretrain_64gpu_b300_nvfp4_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 70B pretrain: 64× B300, NVFP4, PP=4, GBS=256.
- bridge.perf_recipes.llama.b300.llama3.llama3_70b_lora_8gpu_b300_bf16_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 70B LoRA: 8× B300, BF16.
- bridge.perf_recipes.llama.b300.llama3.llama3_70b_lora_8gpu_b300_fp8cs_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 70B LoRA: 8× B300, FP8 current-scaling, PP=2.
- bridge.perf_recipes.llama.b300.llama3.llama3_70b_lora_8gpu_b300_fp8mx_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer
Llama3 70B LoRA: 8× B300, MXFP8, PP=2.
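The function names listed above appear to follow a pattern of the form llama3_<size>_<task>_<N>gpu_b300_<precision>_config. The sketch below is not part of the module; it is a minimal, standalone illustration of how that pattern could be parsed to pick a recipe name by model size, task, and precision. The names `RecipeKey`, `parse_recipe_name`, `find_recipes`, and `PRECISION_LABELS` are hypothetical helpers introduced here, and the regular expression is an assumption inferred only from the eleven names on this page.

```python
import re
from typing import List, NamedTuple, Optional

# Function names copied verbatim from the API listing on this page.
RECIPE_NAMES = [
    "llama3_8b_pretrain_8gpu_b300_bf16_config",
    "llama3_8b_pretrain_8gpu_b300_fp8cs_config",
    "llama3_8b_pretrain_8gpu_b300_fp8mx_config",
    "llama3_8b_pretrain_8gpu_b300_nvfp4_config",
    "llama3_70b_pretrain_64gpu_b300_bf16_config",
    "llama3_70b_pretrain_64gpu_b300_fp8cs_config",
    "llama3_70b_pretrain_64gpu_b300_fp8mx_config",
    "llama3_70b_pretrain_64gpu_b300_nvfp4_config",
    "llama3_70b_lora_8gpu_b300_bf16_config",
    "llama3_70b_lora_8gpu_b300_fp8cs_config",
    "llama3_70b_lora_8gpu_b300_fp8mx_config",
]

# Precision suffixes paired with the wording used in the one-line descriptions.
PRECISION_LABELS = {
    "bf16": "BF16",
    "fp8cs": "FP8 current-scaling",
    "fp8mx": "MXFP8",
    "nvfp4": "NVFP4",
}

# Assumed naming pattern, inferred from the names above (not a documented rule).
_PATTERN = re.compile(
    r"^llama3_(?P<size>\d+b)_(?P<task>pretrain|lora)_"
    r"(?P<gpus>\d+)gpu_b300_(?P<precision>[a-z0-9]+)_config$"
)


class RecipeKey(NamedTuple):
    """Hypothetical structured view of one recipe function name."""
    size: str
    task: str
    gpus: int
    precision: str


def parse_recipe_name(name: str) -> RecipeKey:
    """Split a recipe function name into size, task, GPU count, and precision."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"name does not follow the assumed pattern: {name!r}")
    return RecipeKey(
        size=match["size"],
        task=match["task"],
        gpus=int(match["gpus"]),
        precision=match["precision"],
    )


def find_recipes(size: str, task: str, precision: Optional[str] = None) -> List[str]:
    """Return the listed names matching size and task, optionally one precision."""
    selected = []
    for name in RECIPE_NAMES:
        key = parse_recipe_name(name)
        if key.size != size or key.task != task:
            continue
        if precision is not None and key.precision != precision:
            continue
        selected.append(name)
    return selected
```

For example, `find_recipes("70b", "lora")` returns the three 8-GPU LoRA recipe names, and there is no NVFP4 entry among them because this page lists none. According to the signatures above, each recipe function takes no arguments and returns a `ConfigContainer`, so a name selected this way could then be looked up on the module and called.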