Llama 3#

Meta’s Llama builds on the standard transformer decoder architecture with several key modifications: pre-normalization with RMSNorm, SwiGLU activations, and Rotary Positional Embeddings (RoPE). More detail is available in the paper “LLaMA: Open and Efficient Foundation Language Models”. With a wide variety of model sizes, Llama offers an option for every inference budget.
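
The sketch below is a minimal, illustrative rendering of two of these components in plain PyTorch; it is not Meta’s or Megatron’s implementation, the names SwiGLU and apply_rope are ours, and shapes are simplified.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """Llama-style feed-forward block: W_down(SiLU(W_gate x) * (W_up x))."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_ff, bias=False)
        self.w_up = nn.Linear(d_model, d_ff, bias=False)
        self.w_down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary positional embeddings to x of shape
    (seq_len, n_heads, head_dim), rotating channel pairs by a
    position-dependent angle ("rotate-half" convention)."""
    seq_len, _, head_dim = x.shape
    half = head_dim // 2
    # Per-pair inverse frequencies: base^(-2i / head_dim)
    inv_freq = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.outer(torch.arange(seq_len, dtype=torch.float32), inv_freq)
    cos = angles.cos()[:, None, :]  # broadcast over heads
    sin = angles.sin()[:, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)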

Llama family models are supported via the Bridge system with auto-detected configuration and weight mapping.

Conversion with 🤗 Hugging Face#

Load HF → Megatron#

from megatron.bridge import AutoBridge

# Example: Llama 3.1 8B Instruct (architecture auto-detected from the HF config)
bridge = AutoBridge.from_hf_pretrained("meta-llama/Llama-3.1-8B-Instruct")
provider = bridge.to_megatron_provider()

# Configure parallelism before instantiating the model
provider.tensor_model_parallel_size = 8
provider.pipeline_model_parallel_size = 1

# Materialize the model across ranks (no DDP wrapper needed for conversion)
model = provider.provide_distributed_model(wrap_with_ddp=False)
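
Note that with tensor_model_parallel_size = 8, this snippet must run under a distributed launcher with eight processes (one per GPU), for example torchrun --nproc_per_node=8 convert.py, where convert.py is a placeholder name for a script containing the code above.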

Export Megatron → HF#

# Convert from a Megatron checkpoint directory to HF format
# (reuses the bridge created above)
bridge.export_ckpt(
    megatron_path="/results/llama3_8b/checkpoints/iter_00002000",
    hf_path="./llama-hf-export",
)
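
Once exported, the directory is a regular Hugging Face checkpoint. As a quick sanity check (standard transformers API, not part of the Bridge), it should load like any other HF model:

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("./llama-hf-export")
tokenizer = AutoTokenizer.from_pretrained("./llama-hf-export")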

Examples#

Pretrain recipes#