Llama 3#
Meta’s Llama builds on the general transformer decoder framework with some key additions such as pre-normalization, SwiGLU activations, and Rotary Positional Embeddings (RoPE). More information is available in the companion paper “Llama: Open and Efficient Foundation Language Models”. With a wide range of model sizes, Llama has options for every inference budget.
Llama family models are supported via the Bridge system with auto-detected configuration and weight mapping.
Available Models#
Megatron Bridge supports the following Llama model variants:
Llama 3.2: 1B, 3B
Llama 3: 8B, 70B (with 8K, 16K, 64K, 128K context variants)
Llama 3.1: 8B, 70B, 405B (with 128K context length)
All models support both pretraining and finetuning with full parameter updates or PEFT methods (LoRA, DoRA).
Model Architecture Features#
Pre-normalization: RMSNorm before each transformer sub-layer for training stability
SwiGLU Activation: Gated linear units in the feedforward network
Rotary Positional Embeddings (RoPE): Relative position encoding via rotation matrices
Grouped Query Attention (GQA): Memory-efficient attention in which groups of query heads share key/value heads (used by all Llama 3 model sizes)
Extended Context: Native support for long sequences up to 128K tokens (Llama 3.1)
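To make the RoPE entry above concrete, here is a minimal pure-Python sketch of the rotation. It is illustrative only and is not the Megatron Bridge implementation: the `rope` and `dot` helpers are hypothetical names, the interleaved pairing is chosen for readability, and `base=10000.0` is the original RoFormer default rather than a value read from a Llama checkpoint. The property it demonstrates is that the attention score depends only on the relative offset between the two positions.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles."""
    dim = len(vec)
    assert dim % 2 == 0, "RoPE needs an even head dimension"
    out = []
    for i in range(0, dim, 2):
        # The pair starting at index i rotates at frequency base^(-i/dim).
        angle = position * base ** (-i / dim)
        cos, sin = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out.extend([x * cos - y * sin, x * sin + y * cos])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Arbitrary toy query/key vectors (head dimension 4).
q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]

# Same relative offset (3 positions) at two different absolute positions.
score_near = dot(rope(q, 5), rope(k, 2))
score_far = dot(rope(q, 105), rope(k, 102))
print(abs(score_near - score_far) < 1e-9)
```

Because each pair is only rotated, the vector norm is unchanged; position information enters solely through the angle difference between query and key.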
Conversion with 🤗 Hugging Face#
Load HF → Megatron#
from megatron.bridge import AutoBridge
# Example: Llama 3.1 8B
bridge = AutoBridge.from_hf_pretrained("meta-llama/Meta-Llama-3.1-8B")
provider = bridge.to_megatron_provider()
# Optionally configure parallelism before instantiating the model
provider.tensor_model_parallel_size = 2
provider.pipeline_model_parallel_size = 1
model = provider.provide_distributed_model(wrap_with_ddp=False)
Import Checkpoint from HF#
python examples/conversion/convert_checkpoints.py import \
    --hf-model meta-llama/Meta-Llama-3.1-8B \
    --megatron-path /checkpoints/llama31_8b_megatron
Export Megatron → HF#
from megatron.bridge import AutoBridge
# Load the bridge from HF model ID
bridge = AutoBridge.from_hf_pretrained("meta-llama/Meta-Llama-3.1-8B")
# Export a trained/finetuned Megatron checkpoint to HF format
bridge.export_ckpt(
    megatron_path="/results/llama31_8b/checkpoints/iter_0000500",
    hf_path="/exports/llama31_8b_hf",
)
Run Inference on Converted Checkpoint#
python examples/conversion/hf_to_megatron_generate_text.py \
    --hf_model_path meta-llama/Meta-Llama-3.1-8B \
    --megatron_model_path /checkpoints/llama31_8b_megatron \
    --prompt "What is artificial intelligence?" \
    --max_new_tokens 100 \
    --tp 2
For more details, see examples/conversion/hf_to_megatron_generate_text.py
Recipes#
See: bridge.recipes.llama.llama3
Available Recipes#
Pretrain recipes:
llama32_1b_pretrain_config, llama32_3b_pretrain_config: Llama 3.2 (1B, 3B)
llama3_8b_pretrain_config: Llama 3 8B with 8K context
llama3_8b_16k_pretrain_config, llama3_8b_64k_pretrain_config, llama3_8b_128k_pretrain_config: Llama 3 8B with extended context (16K/64K/128K)
llama3_8b_low_precision_pretrain_config: Llama 3 8B with low precision (FP8/MXFP8/NVFP4)
llama3_70b_pretrain_config, llama3_70b_16k_pretrain_config, llama3_70b_64k_pretrain_config: Llama 3 70B (8K/16K/64K context)
llama31_8b_pretrain_config, llama31_70b_pretrain_config, llama31_405b_pretrain_config: Llama 3.1 (8B/70B/405B, 128K context)
Finetune recipes:
llama32_1b_finetune_config, llama32_3b_finetune_config: Llama 3.2 with PEFT support
llama3_8b_finetune_config, llama31_8b_finetune_config: Llama 3/3.1 8B with PEFT support
llama3_70b_finetune_config, llama31_70b_finetune_config: Llama 3/3.1 70B with PEFT support
llama31_405b_finetune_config: Llama 3.1 405B with PEFT support
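The recipe names listed above follow a `<family>_<size>[_<context>]_<mode>_config` pattern. The sketch below composes a name from its parts and checks it against the names listed on this page. The `recipe_name` helper and the `LISTED_RECIPES` set are hypothetical conveniences, not part of Megatron Bridge, and the set simply mirrors this page rather than the installed package.

```python
# Recipe names copied from the lists on this page (not queried from the package).
LISTED_RECIPES = {
    "llama32_1b_pretrain_config", "llama32_3b_pretrain_config",
    "llama3_8b_pretrain_config", "llama3_8b_16k_pretrain_config",
    "llama3_8b_64k_pretrain_config", "llama3_8b_128k_pretrain_config",
    "llama3_8b_low_precision_pretrain_config",
    "llama3_70b_pretrain_config", "llama3_70b_16k_pretrain_config",
    "llama3_70b_64k_pretrain_config",
    "llama31_8b_pretrain_config", "llama31_70b_pretrain_config",
    "llama31_405b_pretrain_config",
    "llama32_1b_finetune_config", "llama32_3b_finetune_config",
    "llama3_8b_finetune_config", "llama31_8b_finetune_config",
    "llama3_70b_finetune_config", "llama31_70b_finetune_config",
    "llama31_405b_finetune_config",
}


def recipe_name(family, size, mode, context=None):
    """Compose a recipe function name and verify that this page lists it."""
    parts = [family, size.lower()]
    if context:
        parts.append(context.lower())
    parts += [mode, "config"]
    name = "_".join(parts)
    if name not in LISTED_RECIPES:
        raise ValueError(f"{name} is not among the recipes listed on this page")
    return name


print(recipe_name("llama3", "8B", "pretrain", context="16K"))
print(recipe_name("llama31", "405B", "finetune"))
```

With Megatron Bridge installed, the resulting string could then be resolved with `getattr` on the `megatron.bridge.recipes.llama` module used in the examples on this page.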
Parallelism Configurations#
Llama 3.2 (1B, 3B)#
| Model | Mode | TP | PP | Total GPUs | Use Case |
|---|---|---|---|---|---|
| 1B / 3B | Pretrain | 1 | 1 | 8 | Pre-training (single node) |
| 1B / 3B | Full SFT | 1 | 1 | 8 | Full supervised finetuning |
| 1B / 3B | LoRA/DoRA | 1 | 1 | 8 | PEFT finetuning |
Llama 3 / 3.1 (8B)#
| Model | Mode | TP | PP | CP | Total GPUs | Use Case |
|---|---|---|---|---|---|---|
| 8B | Pretrain | 1 | 1 | 2 | 16 | Pre-training |
| 8B | Full SFT | 2 | 1 | 1 | 16 | Full supervised finetuning |
| 8B | LoRA/DoRA | 1 | 1 | 1 | 8 | PEFT finetuning (single node) |
Llama 3 / 3.1 (70B)#
| Model | Mode | TP | PP | VP | CP | Total GPUs | Use Case |
|---|---|---|---|---|---|---|---|
| 70B | Pretrain | 4 | 4 | 5 | 2 | 64 | Pre-training |
| 70B | Full SFT | 8 | 4 | - | 1 | 256 | Full supervised finetuning (32 nodes) |
| 70B | LoRA/DoRA | 8 | 1 | - | 1 | 8 | PEFT finetuning (single node!) |
Llama 3.1 (405B)#
| Model | Mode | TP | PP | VP | CP | Total GPUs | Use Case |
|---|---|---|---|---|---|---|---|
| 405B | Pretrain | 8 | 8 | 2 | 4 | 512 | Pre-training (64 nodes) |
| 405B | Full SFT | 8 | 16 | - | 1 | 2048 | Full supervised finetuning (256 nodes) |
| 405B | LoRA/DoRA | 4 | 8 | 8 | 1 | 256 | PEFT finetuning (32 nodes) |
Key Features:
Context Parallelism: Enabled for long context training (16K/64K/128K variants)
Sequence Parallel: Enabled by default for larger models (70B+) for memory efficiency
Low Precision Training: FP8, MXFP8, NVFP4 options available for 8B model
Virtual Pipeline: VP parallelism for 70B and 405B models
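The GPU counts in the tables relate to the parallel sizes through the usual dense-model identity: total GPUs = TP × PP × CP × DP, where DP is the data-parallel size. VP interleaves pipeline stages on the same ranks and does not consume extra GPUs. The sketch below recovers DP for a few rows; `data_parallel_size` is a hypothetical helper written for this page, not a Megatron Bridge function.

```python
def data_parallel_size(total_gpus, tp=1, pp=1, cp=1):
    """Return the data-parallel size implied by a GPU count and a TP/PP/CP layout."""
    model_parallel = tp * pp * cp
    if total_gpus % model_parallel != 0:
        raise ValueError(
            f"{total_gpus} GPUs cannot be split evenly into groups of {model_parallel}"
        )
    return total_gpus // model_parallel


# Rows copied from the tables above.
layouts = {
    "8B pretrain": dict(total_gpus=16, tp=1, pp=1, cp=2),
    "70B pretrain": dict(total_gpus=64, tp=4, pp=4, cp=2),
    "70B LoRA": dict(total_gpus=8, tp=8, pp=1, cp=1),
    "405B pretrain": dict(total_gpus=512, tp=8, pp=8, cp=4),
}

for name, layout in layouts.items():
    print(f"{name}: DP = {data_parallel_size(**layout)}")
```

A DP of 1, as in the 70B LoRA row, means the whole node holds a single model replica.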
Pre-training Example#
from megatron.bridge.recipes.llama import llama3_8b_pretrain_config
config = llama3_8b_pretrain_config(
    name="llama3_8b_pretrain",
    data_paths=["/path/to/dataset.nvjsonl"],
    dir="/results/llama3_8b",
    train_iters=500_000,
    global_batch_size=512,
    seq_length=8192,
    # Uses TP=1, PP=1, CP=2 (16 GPUs) automatically
)
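As a quick sanity check on the values in this example, the token budget follows directly from the batch size, sequence length, and iteration count. The arithmetic below assumes every sequence is filled to the full `seq_length`, which is a simplification.

```python
# Values copied from the pre-training example above.
global_batch_size = 512
seq_length = 8192
train_iters = 500_000

tokens_per_step = global_batch_size * seq_length
total_tokens = tokens_per_step * train_iters

print(f"{tokens_per_step:,} tokens per step")        # 4,194,304 tokens per step
print(f"{total_tokens / 1e12:.2f}T tokens in total")  # 2.10T tokens in total
```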
Finetuning Examples#
Before finetuning, ensure these environment variables are set:
SAVE_DIR: checkpoint and log saving directory
HF_TOKEN: to download models from HF Hub (if required)
HF_HOME: (optional) to avoid re-downloading models and datasets
WANDB_API_KEY: (optional) to enable WandB logging
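A small pre-flight check can catch a missing variable before a job is launched. This is a sketch under stated assumptions: `check_env` is a hypothetical helper, and it treats only `SAVE_DIR` as mandatory, since the other variables are described above as optional or needed only in some cases.

```python
import os

REQUIRED = ("SAVE_DIR",)
OPTIONAL = ("HF_TOKEN", "HF_HOME", "WANDB_API_KEY")


def check_env(env=os.environ):
    """Raise if a required variable is unset; return the unset optional ones."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {missing}")
    return [name for name in OPTIONAL if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
unset = check_env({"SAVE_DIR": "/results/llama3_8b", "HF_TOKEN": "example-token"})
print("Unset optional variables:", unset)
```

Calling `check_env()` with no argument inspects the real process environment.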
Full Finetuning (Llama 3 8B)#
from megatron.bridge.recipes.llama import llama3_8b_finetune_config
cfg = llama3_8b_finetune_config(
    name="llama3_8b_full_sft",
    pretrained_checkpoint="/results/llama3_8b/checkpoints/iter_0500000",
    peft=None,  # Full supervised finetuning
    train_iters=1000,
    global_batch_size=64,
    finetune_lr=5e-6,
    # Uses TP=2, PP=1 (16 GPUs) automatically
)
LoRA Finetuning (Llama 3 8B)#
from megatron.bridge.recipes.llama import llama3_8b_finetune_config
cfg = llama3_8b_finetune_config(
    name="llama3_8b_lora",
    pretrained_checkpoint="/results/llama3_8b/checkpoints/iter_0500000",
    peft="lora",  # or "dora" for DoRA
    train_iters=1000,
    global_batch_size=128,
    finetune_lr=1e-4,
    # Uses TP=1, PP=1 (8 GPUs) automatically
)
LoRA Finetuning (Llama 3 70B)#
from megatron.bridge.recipes.llama import llama3_70b_finetune_config
cfg = llama3_70b_finetune_config(
    name="llama3_70b_lora",
    pretrained_checkpoint="/results/llama3_70b/checkpoints/iter_0500000",
    peft="lora",
    train_iters=1000,
    global_batch_size=128,
    finetune_lr=1e-4,
    # Uses TP=8, PP=1 (8 GPUs) automatically
)
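For intuition about how few parameters LoRA trains, the count for one adapted linear layer is `rank * (d_in + d_out)`, from the two low-rank factors. The sketch below applies this to a single square projection at the Llama 3 70B hidden size of 8192. The rank of 16 is an assumed value for illustration and is not taken from the recipe, and `lora_trainable_params` is a hypothetical helper.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Parameters in the low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


hidden = 8192  # Llama 3 70B hidden size
rank = 16      # assumed LoRA rank, for illustration only

frozen = hidden * hidden  # one square projection matrix, kept frozen
adapter = lora_trainable_params(hidden, hidden, rank)

# Prints: 262,144 adapter params vs 67,108,864 frozen (0.39%)
print(f"{adapter:,} adapter params vs {frozen:,} frozen ({adapter / frozen:.2%})")
```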
Hugging Face Model Cards & References#
Hugging Face Model Cards#
Llama 3.2 1B: https://huggingface.co/meta-llama/Llama-3.2-1B
Llama 3.2 3B: https://huggingface.co/meta-llama/Llama-3.2-3B
Llama 3 8B: https://huggingface.co/meta-llama/Meta-Llama-3-8B
Llama 3 70B: https://huggingface.co/meta-llama/Meta-Llama-3-70B
Llama 3.1 8B: https://huggingface.co/meta-llama/Meta-Llama-3.1-8B
Llama 3.1 70B: https://huggingface.co/meta-llama/Meta-Llama-3.1-70B
Llama 3.1 405B: https://huggingface.co/meta-llama/Meta-Llama-3.1-405B
Technical Papers#
Llama: Open and Efficient Foundation Language Models: arXiv:2302.13971
The Llama 3 Herd of Models: arXiv:2407.21783