Llama Models

This page provides detailed technical specifications for the Llama model family supported by NeMo Customizer. For information about supported features and capabilities, refer to Tested Models.

Llama-3.2-3B Instruct

Property	Value
Creator	Meta
Architecture	transformer
Description	Llama-3.2-3B is a compact instruction-tuned language model for dialogue applications.
Max I/O Tokens	8192
Parameters	3 billion
Training Data	Up to 9 trillion tokens (data cutoff December 2023)
Default Name	meta-llama/Llama-3.2-3B-Instruct
Hugging Face	meta-llama/Llama-3.2-3B-Instruct
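The Max I/O Tokens value limits how much text a single request can carry. The following is a minimal sketch that assumes the 8192 limit is a combined budget for prompt and generated tokens. This page does not say how the budget is split. The helper names `fits_token_budget` and `max_new_tokens_allowed` are hypothetical.

```python
MAX_IO_TOKENS = 8192  # value from the table above

def fits_token_budget(prompt_tokens: int, max_new_tokens: int,
                      limit: int = MAX_IO_TOKENS) -> bool:
    """Return True if prompt plus requested output stays within the limit.

    Assumption: the limit is a combined input + output budget.
    """
    if prompt_tokens < 0 or max_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_new_tokens <= limit

def max_new_tokens_allowed(prompt_tokens: int, limit: int = MAX_IO_TOKENS) -> int:
    """Largest output length that still fits, or 0 if the prompt alone is too long."""
    return max(0, limit - prompt_tokens)

if __name__ == "__main__":
    print(fits_token_budget(6000, 2000))   # 8000 <= 8192 -> True
    print(fits_token_budget(6000, 2500))   # 8500 > 8192 -> False
    print(max_new_tokens_allowed(7000))    # 8192 - 7000 -> 1192
```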

Training Options

LoRA: 1x 80GB GPU, tensor parallel size 1
Full SFT: 4x 80GB GPUs, tensor parallel size 2
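When the GPU count is fixed, the GPUs left over after tensor parallelism go to data-parallel replicas. The sketch below uses the standard relation world size = TP × PP × DP and assumes a pipeline parallel size of 1. This page lists only the tensor parallel size, so the pipeline setting is an assumption. The function name is hypothetical.

```python
def data_parallel_size(num_gpus: int, tensor_parallel: int,
                       pipeline_parallel: int = 1) -> int:
    """Infer the data-parallel size from GPU count and model-parallel sizes.

    Assumes world_size = TP * PP * DP, with pipeline parallelism set to 1
    unless stated otherwise.
    """
    model_parallel = tensor_parallel * pipeline_parallel
    if model_parallel <= 0 or num_gpus % model_parallel != 0:
        raise ValueError(
            f"{num_gpus} GPUs cannot be split evenly into groups of {model_parallel}"
        )
    return num_gpus // model_parallel

if __name__ == "__main__":
    # Llama-3.2-3B full SFT: 4 GPUs, TP 2 -> 2 data-parallel replicas
    print(data_parallel_size(4, 2))
    # Llama-3.1-8B full SFT: 8 GPUs, TP 4 -> 2 data-parallel replicas
    print(data_parallel_size(8, 4))
```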

Deployment Configuration

LoRA:
  NIM Image: nvcr.io/nim/nvidia/llm-nim:1.15.5
  GPU Count: 1x 80GB
Full SFT:
  NIM Image: nvcr.io/nim/nvidia/llm-nim:1.15.5
  GPU Count: 1x 80GB
  Additional Environment Variables:
    NIM_MODEL_PROFILE: vllm
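The deployment settings correspond to a container launch. The sketch below only builds a `docker run` argument list from the values on this page and does not run it. The `--gpus` and `-e` flags are standard Docker options. Other settings the NIM container may need, such as credentials, model source, or port mappings, are not documented here and are left out. The helper `nim_run_args` is hypothetical.

```python
import shlex

def nim_run_args(image: str, gpu_count: int, env: dict[str, str]) -> list[str]:
    """Build a `docker run` argument list (hypothetical helper; not executed here)."""
    args = ["docker", "run", "--rm", "--gpus", str(gpu_count)]
    for key, value in sorted(env.items()):
        args += ["-e", f"{key}={value}"]  # pass each environment variable to the container
    args.append(image)
    return args

if __name__ == "__main__":
    # Full SFT deployment of Llama-3.2-3B, using the values listed above
    args = nim_run_args(
        image="nvcr.io/nim/nvidia/llm-nim:1.15.5",
        gpu_count=1,
        env={"NIM_MODEL_PROFILE": "vllm"},
    )
    print(shlex.join(args))
```

Returning a list instead of a single string keeps each argument intact if you later pass it to `subprocess.run`. `shlex.join` is used only to print a shell-safe preview.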

Llama-3.2-1B Instruct

Property	Value
Creator	Meta
Architecture	transformer
Description	Llama-3.2-1B is the smallest model on this page, a lightweight language model suited to deployments with tight compute or memory budgets.
Max I/O Tokens	8192
Parameters	1 billion
Training Data	Up to 9 trillion tokens (data cutoff December 2023)
Default Name	meta-llama/Llama-3.2-1B-Instruct
Hugging Face	meta-llama/Llama-3.2-1B-Instruct

Training Options

LoRA: 1x 80GB GPU, tensor parallel size 1
Full SFT: 1x 80GB GPU, tensor parallel size 1
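LoRA fits on a single GPU because it trains only small low-rank factors. For a weight matrix of shape d_out × d_in, a rank-r adapter adds r × (d_in + d_out) trainable parameters. The sketch below uses a hypothetical 2048 × 2048 projection with a hypothetical rank of 8. This page does not specify the rank or target modules that NeMo Customizer uses.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of a LoRA adapter: A is rank x d_in, B is d_out x rank."""
    return rank * d_in + d_out * rank

def full_params(d_in: int, d_out: int) -> int:
    """Parameters of the frozen weight matrix itself (bias ignored)."""
    return d_in * d_out

if __name__ == "__main__":
    d_in = d_out = 2048   # hypothetical projection size
    rank = 8              # hypothetical LoRA rank
    lora = lora_params(d_in, d_out, rank)
    full = full_params(d_in, d_out)
    print(lora, full, f"{100 * lora / full:.2f}%")  # 32768 4194304 0.78%
```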

Deployment Configuration

LoRA:
  NIM Image: nvcr.io/nim/nvidia/llm-nim:1.15.5
  GPU Count: 1x 80GB
Full SFT:
  NIM Image: nvcr.io/nim/nvidia/llm-nim:1.15.5
  GPU Count: 1x 80GB
  Additional Environment Variables:
    NIM_MODEL_PROFILE: vllm

Llama-3.1-8B Instruct

Property	Value
Creator	Meta
Architecture	transformer
Description	Llama-3.1-8B is a large language model optimized for multilingual dialogue use cases.
Max I/O Tokens	8192
Parameters	8 billion
Training Data	15 trillion tokens (up to December 2023)
Default Name	meta-llama/Llama-3.1-8B-Instruct
Hugging Face	meta-llama/Llama-3.1-8B-Instruct

Training Options

LoRA: 1x 80GB GPU, tensor parallel size 1
Full SFT: 8x 80GB GPUs, tensor parallel size 4
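A common rule of thumb shows why full SFT of the 8B model needs several GPUs. Mixed-precision training with the Adam optimizer keeps about 16 bytes of model state per parameter: 2 for BF16 weights, 2 for gradients, and 12 for FP32 master weights plus the two Adam moments. The sketch below applies that rule and divides by the tensor parallel size. It ignores activations, buffers, and any optimizer-state sharding across data-parallel ranks. Treat the result as a rough estimate, not as NeMo Customizer's sizing logic.

```python
BYTES_PER_PARAM_MIXED_ADAM = 2 + 2 + 12  # bf16 weights + bf16 grads + fp32 master/m/v

def model_state_gb(num_params: float, tensor_parallel: int = 1) -> float:
    """Approximate per-GPU model-state memory in GB (1 GB = 1e9 bytes).

    Assumes model states split evenly across tensor-parallel ranks and
    ignores activations, communication buffers, and fragmentation.
    """
    if tensor_parallel < 1:
        raise ValueError("tensor_parallel must be >= 1")
    return num_params * BYTES_PER_PARAM_MIXED_ADAM / tensor_parallel / 1e9

if __name__ == "__main__":
    print(model_state_gb(8e9))                     # 128.0 GB: exceeds one 80GB GPU
    print(model_state_gb(8e9, tensor_parallel=4))  # 32.0 GB per GPU before activations
```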

Deployment Configuration

LoRA:
  NIM Image: nvcr.io/nim/nvidia/llm-nim:1.15.5
  GPU Count: 1x 80GB
Full SFT:
  NIM Image: nvcr.io/nim/nvidia/llm-nim:1.15.5
  GPU Count: 8x 80GB
  Additional Environment Variables:
    NIM_MODEL_PROFILE: vllm