Llama and CodeLlama

Released in February 2023, Meta’s Llama builds on the standard transformer decoder architecture with several key changes: pre-normalization, SwiGLU activations, and Rotary Positional Embeddings (RoPE). More information is available in the companion paper “LLaMA: Open and Efficient Foundation Language Models”. Llama is offered in 7B, 13B, 33B, and 65B parameter sizes to suit a range of inference budgets.
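
As an illustration only, the three architectural changes named above can be sketched in plain Python. This is a minimal sketch, not NeMo's or Meta's implementation: it assumes RMSNorm as the normalization function (as described in the LLaMA paper), the interleaved-pair layout for RoPE with a base of 10000, and a bias-free feed-forward layer. All function and parameter names are hypothetical.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide x by its root-mean-square, then apply a learned scale."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]

def matvec(matrix, x):
    """Multiply a matrix (stored as a list of rows) by the vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]

def silu(v):
    """SiLU (also called Swish): v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: w_down @ (silu(w_gate @ x) * (w_up @ x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)

def rope(x, position, base=10000.0):
    """Rotate each pair (x[i], x[i+1]), i even, by position * base**(-i/d)."""
    d = len(x)  # assumed to be even
    out = list(x)
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out

def pre_norm_ffn_block(x, norm_weight, w_gate, w_up, w_down):
    """Pre-normalization: normalize the sub-layer input, then add the residual."""
    normed = rms_norm(x, norm_weight)
    ffn_out = swiglu_ffn(normed, w_gate, w_up, w_down)
    return [a + b for a, b in zip(x, ffn_out)]
```

Because RoPE rotates query and key vectors by an angle proportional to their position, the dot product between a rotated query and a rotated key depends only on the distance between the two positions. Production implementations may lay out the rotated pairs differently (for example, pairing element i with element i + d/2), so weights are not interchangeable between layouts without reordering.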

Feature    Status
Data parallelism
Tensor parallelism
Pipeline parallelism
Interleaved Pipeline Parallelism Schedule N/A
Sequence parallelism
Selective activation checkpointing
Gradient checkpointing
Partial gradient checkpointing
FP32/TF32
AMP/FP16
BF16
TransformerEngine/FP8
Multi-GPU
Multi-Node
Inference N/A
Slurm
Base Command Manager
Base Command Platform
Distributed data preprocessing
NVfuser
P-Tuning and Prompt Tuning
IA3 and Adapter learning
Distributed Optimizer
Distributed Checkpoint
Fully Sharded Data Parallel
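
Several of the features listed above (data, tensor, and pipeline parallelism across multiple GPUs and nodes) are combined in one job. As a rough sketch, assuming the usual Megatron-style convention that the total GPU count equals the product of the data, tensor, and pipeline parallel sizes, the data-parallel size can be derived as follows. The function and parameter names are hypothetical and are not NeMo configuration keys.

```python
def data_parallel_size(num_nodes, gpus_per_node, tensor_parallel, pipeline_parallel):
    """Return how many model replicas fit on the available GPUs."""
    world_size = num_nodes * gpus_per_node
    # Each model replica occupies tensor_parallel * pipeline_parallel GPUs.
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world size {world_size} is not divisible by "
            f"tensor_parallel * pipeline_parallel = {model_parallel}"
        )
    return world_size // model_parallel
```

For example, two nodes of eight GPUs each with a tensor-parallel size of 4 and a pipeline-parallel size of 2 leave room for two data-parallel replicas.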
© Copyright 2023-2024, NVIDIA. Last updated on May 17, 2024.