Llama
Meta’s Llama is a family of open-weight autoregressive language models built on the transformer decoder architecture. Key design choices include pre-normalization with RMSNorm, SwiGLU activations, and Rotary Positional Embeddings (RoPE). Grouped Query Attention (GQA), first used in Llama 2 70B and adopted across all Llama 3 and later sizes, shares each key/value head among several query heads, which shrinks the key/value cache and makes inference more memory-efficient at larger scales.
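To make the memory benefit of GQA concrete, the following back-of-the-envelope sketch compares the key/value cache size per token with and without shared KV heads. The helper name `kv_cache_bytes_per_token` is hypothetical, the shape values are assumed to match the published Llama 3 8B configuration (32 layers, 32 query heads, 8 KV heads, head dimension 128), and a 2-byte (bf16/fp16) cache is assumed.

```python
def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Bytes of key/value cache stored for one token (hypothetical helper)."""
    # Factor 2: one key vector and one value vector per KV head, per layer.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


# Assumed shape, taken to match the Llama 3 8B configuration.
NUM_LAYERS = 32
NUM_QUERY_HEADS = 32
NUM_KV_HEADS = 8
HEAD_DIM = 128

# With GQA, only NUM_KV_HEADS key/value heads are cached.
gqa = kv_cache_bytes_per_token(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM)
# Without GQA (standard multi-head attention), every query head has its own KV head.
mha = kv_cache_bytes_per_token(NUM_LAYERS, NUM_QUERY_HEADS, HEAD_DIM)

print(f"GQA: {gqa // 1024} KiB per token")
print(f"MHA: {mha // 1024} KiB per token")
print(f"Reduction: {mha // gqa}x")
```

Under these assumptions the cache shrinks by the ratio of query heads to KV heads, and the saving grows linearly with context length and batch size.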
Available Models
- Llama 3.2: 1B, 3B
- Llama 3.1: 8B, 70B, 405B (128K context)
- Llama 3: 8B, 70B
- Llama 2: 7B, 13B, 70B
- LLaMA (v1): 7B, 13B, 30B, 65B
- Yi (01-ai): 6B, 34B (uses LlamaForCausalLM)
Architecture
LlamaForCausalLM
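Hugging Face checkpoints normally record their model class in the `architectures` field of `config.json`, which is how a non-Meta model such as Yi can be identified as using this architecture. The sketch below checks that field with the standard library only. The function name `declares_architecture` is hypothetical, and the sample config is an illustrative fragment, not a copy of any real checkpoint's file.

```python
import json


def declares_architecture(config_text, name="LlamaForCausalLM"):
    """Return True if a config.json string lists `name` under "architectures"."""
    config = json.loads(config_text)
    # The field may be absent or null in some configs, so fall back to an empty list.
    return name in (config.get("architectures") or [])


# Illustrative fragment only; a real config.json carries many more fields.
sample_config = json.dumps(
    {"architectures": ["LlamaForCausalLM"], "model_type": "llama"}
)

print(declares_architecture(sample_config))
```

In practice the string would come from reading a checkpoint's `config.json` from disk rather than from an inline sample.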
Example HF Models
Example Recipes
Try with NeMo AutoModel
1. Install (full instructions):
2. Clone the repo to get the example recipes:
3. Run the recipe from inside the repo:
Run with Docker
1. Pull the container and mount a checkpoint directory:
2. Navigate to the AutoModel directory (where the recipes are):
3. Run the recipe:
See the Installation Guide and LLM Fine-Tuning Guide.
Fine-Tuning
See the LLM Fine-Tuning Guide for full SFT and LoRA instructions.