bridge.perf_recipes.llama.b200.llama31#
B200 performance recipes for Llama 3.1.
Module Contents#
Functions#
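llama31_405b_pretrain_128gpu_b200_bf16_config | Llama3.1 405B pretrain: 128× B200, BF16, TP=4 PP=8 CP=2.
llama31_405b_pretrain_128gpu_b200_fp8cs_config | Llama3.1 405B pretrain: 128× B200, FP8 current-scaling, TP=4 PP=8 CP=2.
llama31_405b_pretrain_128gpu_b200_fp8mx_config | Llama3.1 405B pretrain: 128× B200, MXFP8, TP=4 PP=8 CP=2.
llama31_405b_pretrain_128gpu_b200_nvfp4_config | Llama3.1 405B pretrain: 128× B200, NVFP4, TP=4 PP=16.
llama31_405b_pretrain_256gpu_b200_nvfp4_config | Llama3.1 405B pretrain: 256× B200, NVFP4.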
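The summaries above give tensor (TP), pipeline (PP), and context (CP) parallel degrees but not the data-parallel degree. As a rough sketch, assuming the usual decomposition in which the GPU count equals TP × PP × CP × DP, the implied DP size can be worked out as follows. The helper `data_parallel_size` is hypothetical and not part of this module, and CP=1 for the NVFP4 recipe is an assumption, since its summary does not state a CP value.

```python
def data_parallel_size(world_size: int, tp: int, pp: int, cp: int = 1) -> int:
    """Return the data-parallel degree implied by a GPU count and TP/PP/CP.

    Hypothetical helper: assumes world_size == tp * pp * cp * dp.
    """
    model_parallel = tp * pp * cp
    if world_size % model_parallel != 0:
        raise ValueError(
            f"{world_size} GPUs cannot be split evenly into groups of {model_parallel}"
        )
    return world_size // model_parallel


# Layouts taken from the 128-GPU summaries above.
layouts = {
    "bf16 / fp8cs / fp8mx": dict(world_size=128, tp=4, pp=8, cp=2),
    # CP is not stated for NVFP4; CP=1 is an assumption.
    "nvfp4": dict(world_size=128, tp=4, pp=16, cp=1),
}

for name, layout in layouts.items():
    print(f"{name}: DP = {data_parallel_size(**layout)}")
```

Under these assumptions both 128-GPU layouts leave two data-parallel replicas: the NVFP4 recipe doubles the pipeline depth where the others use CP=2.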
Data#
API#
- bridge.perf_recipes.llama.b200.llama31.llama31_405b_pretrain_128gpu_b200_bf16_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer#
Llama3.1 405B pretrain: 128× B200, BF16, TP=4 PP=8 CP=2.
- bridge.perf_recipes.llama.b200.llama31.llama31_405b_pretrain_128gpu_b200_fp8cs_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer#
Llama3.1 405B pretrain: 128× B200, FP8 current-scaling, TP=4 PP=8 CP=2.
- bridge.perf_recipes.llama.b200.llama31.llama31_405b_pretrain_128gpu_b200_fp8mx_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer#
Llama3.1 405B pretrain: 128× B200, MXFP8, TP=4 PP=8 CP=2.
- bridge.perf_recipes.llama.b200.llama31.llama31_405b_pretrain_128gpu_b200_nvfp4_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer#
Llama3.1 405B pretrain: 128× B200, NVFP4, TP=4 PP=16.
- bridge.perf_recipes.llama.b200.llama31.llama31_405b_pretrain_256gpu_b200_bf16_config#
None
- bridge.perf_recipes.llama.b200.llama31.llama31_405b_pretrain_256gpu_b200_fp8cs_config#
None
- bridge.perf_recipes.llama.b200.llama31.llama31_405b_pretrain_256gpu_b200_fp8mx_config#
None
- bridge.perf_recipes.llama.b200.llama31.llama31_405b_pretrain_256gpu_b200_nvfp4_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer#
Llama3.1 405B pretrain: 256× B200, NVFP4.
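The entries above share one naming pattern, `llama31_405b_pretrain_{N}gpu_b200_{precision}_config`, and three of the 256-GPU names are documented as `None` rather than as functions. A minimal sketch of looking a recipe up by GPU count and precision might look like the following. The helpers `recipe_name` and `resolve_recipe` are hypothetical, a `SimpleNamespace` stands in for the real module so the snippet runs without the library, and reading `None` as "no recipe defined for this combination" is an assumption.

```python
from types import SimpleNamespace
from typing import Any, Callable


def recipe_name(num_gpus: int, precision: str) -> str:
    """Build an attribute name following the pattern used on this page."""
    return f"llama31_405b_pretrain_{num_gpus}gpu_b200_{precision}_config"


def resolve_recipe(module: Any, num_gpus: int, precision: str) -> Callable[[], Any]:
    """Return the recipe function, or raise if it is missing or set to None."""
    name = recipe_name(num_gpus, precision)
    recipe = getattr(module, name, None)
    if recipe is None:
        raise LookupError(f"no recipe available under the name {name!r}")
    return recipe


# Stand-in for the real module: one callable entry and one None placeholder.
fake_module = SimpleNamespace(
    llama31_405b_pretrain_128gpu_b200_bf16_config=lambda: "bf16 on 128 GPUs",
    llama31_405b_pretrain_256gpu_b200_bf16_config=None,
)

config_fn = resolve_recipe(fake_module, 128, "bf16")
print(config_fn())

try:
    resolve_recipe(fake_module, 256, "bf16")
except LookupError as err:
    print(err)
```

With the library installed, the imported module object would be passed in place of `fake_module`; checking for `None` before calling avoids a `TypeError` on the placeholder names.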