bridge.perf_recipes.llama.gb300.llama31

GB300 performance recipes for Llama 3.1.

Module Contents

Functions

llama31_405b_pretrain_128gpu_gb300_bf16_config

Llama3.1 405B pretrain: 128× GB300, BF16, FSDP.

llama31_405b_pretrain_128gpu_gb300_fp8cs_config

Llama3.1 405B pretrain: 128× GB300, FP8 current-scaling, FSDP.

llama31_405b_pretrain_128gpu_gb300_fp8mx_config

Llama3.1 405B pretrain: 128× GB300, MXFP8, TP=4 PP=8 CP=2.

llama31_405b_pretrain_128gpu_gb300_nvfp4_config

Llama3.1 405B pretrain: 128× GB300, NVFP4, TP=4 PP=8.

llama31_405b_pretrain_256gpu_gb300_bf16_config

Llama3.1 405B pretrain: 256× GB300, BF16, FSDP.

llama31_405b_pretrain_256gpu_gb300_fp8cs_config

Llama3.1 405B pretrain: 256× GB300, FP8 current-scaling, TP=4 PP=8.

llama31_405b_pretrain_256gpu_gb300_fp8mx_config

Llama3.1 405B pretrain: 256× GB300, MXFP8, TP=2 PP=8 CP=2.

llama31_405b_pretrain_256gpu_gb300_nvfp4_config

Llama3.1 405B pretrain: 256× GB300, NVFP4, TP=4 PP=8.
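The function names listed above appear to follow one fixed pattern: model, parameter count, task, GPU count, hardware, precision, then a `_config` suffix. As a minimal sketch, assuming that pattern holds for every recipe in this module, a name can be split into its parts with a regular expression. `RecipeName` and `parse_recipe_name` are hypothetical helpers written for illustration; they are not part of the library.

```python
import re
from typing import NamedTuple


class RecipeName(NamedTuple):
    """Parts of a recipe function name (hypothetical helper type)."""

    model: str
    size: str
    task: str
    num_gpus: int
    hardware: str
    precision: str


# Assumed naming pattern, inferred only from the names on this page:
# <model>_<size>_<task>_<N>gpu_<hardware>_<precision>_config
_RECIPE_PATTERN = re.compile(
    r"^(?P<model>[a-z0-9]+)_(?P<size>\d+b)_(?P<task>[a-z]+)_"
    r"(?P<num_gpus>\d+)gpu_(?P<hardware>[a-z0-9]+)_"
    r"(?P<precision>[a-z0-9]+)_config$"
)


def parse_recipe_name(name: str) -> RecipeName:
    """Split a recipe function name into its components."""
    match = _RECIPE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"name does not follow the assumed pattern: {name!r}")
    return RecipeName(
        model=match["model"],
        size=match["size"],
        task=match["task"],
        num_gpus=int(match["num_gpus"]),
        hardware=match["hardware"],
        precision=match["precision"],
    )


if __name__ == "__main__":
    print(parse_recipe_name("llama31_405b_pretrain_128gpu_gb300_fp8mx_config"))
```

A helper like this can be handy for selecting a recipe by GPU count and precision in a launch script, but it relies entirely on the naming convention staying stable.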

API

bridge.perf_recipes.llama.gb300.llama31.llama31_405b_pretrain_128gpu_gb300_bf16_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer

Llama3.1 405B pretrain: 128× GB300, BF16, FSDP.

bridge.perf_recipes.llama.gb300.llama31.llama31_405b_pretrain_128gpu_gb300_fp8cs_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer

Llama3.1 405B pretrain: 128× GB300, FP8 current-scaling, FSDP.

bridge.perf_recipes.llama.gb300.llama31.llama31_405b_pretrain_128gpu_gb300_fp8mx_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer

Llama3.1 405B pretrain: 128× GB300, MXFP8, TP=4 PP=8 CP=2.

bridge.perf_recipes.llama.gb300.llama31.llama31_405b_pretrain_128gpu_gb300_nvfp4_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer

Llama3.1 405B pretrain: 128× GB300, NVFP4, TP=4 PP=8.

bridge.perf_recipes.llama.gb300.llama31.llama31_405b_pretrain_256gpu_gb300_bf16_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer

Llama3.1 405B pretrain: 256× GB300, BF16, FSDP.

bridge.perf_recipes.llama.gb300.llama31.llama31_405b_pretrain_256gpu_gb300_fp8cs_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer

Llama3.1 405B pretrain: 256× GB300, FP8 current-scaling, TP=4 PP=8.

bridge.perf_recipes.llama.gb300.llama31.llama31_405b_pretrain_256gpu_gb300_fp8mx_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer

Llama3.1 405B pretrain: 256× GB300, MXFP8, TP=2 PP=8 CP=2.

bridge.perf_recipes.llama.gb300.llama31.llama31_405b_pretrain_256gpu_gb300_nvfp4_config() → megatron.bridge.perf_recipes.llama.common.ConfigContainer

Llama3.1 405B pretrain: 256× GB300, NVFP4, TP=4 PP=8.
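For the recipes whose docstrings list TP, PP, and CP, the implied data-parallel size can be worked out from the usual Megatron-style decomposition, world size = TP × PP × CP × DP. The sketch below runs that arithmetic for the layouts named above. It assumes CP=1 wherever a docstring does not mention CP, and `data_parallel_size` is a hypothetical helper, not a library function. The FSDP recipes are left out because their docstrings give no TP/PP/CP values.

```python
def data_parallel_size(num_gpus: int, tp: int, pp: int, cp: int = 1) -> int:
    """Return the data-parallel size implied by world size and TP/PP/CP."""
    model_parallel = tp * pp * cp
    if num_gpus % model_parallel != 0:
        raise ValueError(
            f"{num_gpus} GPUs is not divisible by TP*PP*CP = {model_parallel}"
        )
    return num_gpus // model_parallel


# (num_gpus, TP, PP, CP) taken from the docstrings on this page.
# CP=1 is an assumption where the docstring does not state CP.
LAYOUTS = {
    "128gpu_fp8mx": (128, 4, 8, 2),
    "128gpu_nvfp4": (128, 4, 8, 1),
    "256gpu_fp8cs": (256, 4, 8, 1),
    "256gpu_fp8mx": (256, 2, 8, 2),
    "256gpu_nvfp4": (256, 4, 8, 1),
}

DP_SIZES = {name: data_parallel_size(*layout) for name, layout in LAYOUTS.items()}

if __name__ == "__main__":
    for name, dp in DP_SIZES.items():
        print(f"{name}: DP={dp}")
```

Under these assumptions, the 128-GPU MXFP8 layout leaves a data-parallel size of 2, while every listed 256-GPU layout leaves 8. The actual values used by a recipe should be read from the returned `ConfigContainer` rather than inferred from the docstring.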