# Llama Models
This page provides detailed technical specifications for the Llama model family supported by NeMo Customizer. For information about supported features and capabilities, refer to Tested Models.
## Llama-3.2-3B Instruct

| Property | Value |
|---|---|
| Creator | Meta |
| Architecture | transformer |
| Description | Llama-3.2-3B is a compact yet powerful language model suitable for various dialogue applications. |
| Max I/O Tokens | 8192 |
| Parameters | 3 billion |
| Training Data | 15+ trillion tokens (up to 2024) |
| Default Name | meta-llama/Llama-3.2-3B-Instruct |
| HuggingFace | |
### Training Options

- LoRA: 1x 80GB GPU, tensor parallel size 1
- Full SFT: 4x 80GB GPUs, tensor parallel size 2
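The GPU count and tensor parallel size together determine how many data-parallel replicas a training job runs. A minimal sketch of that arithmetic, assuming no pipeline parallelism (the function name and structure are illustrative, not a NeMo Customizer API):

```python
def data_parallel_size(num_gpus: int, tensor_parallel_size: int) -> int:
    """Data-parallel replicas = total GPUs / tensor-parallel group size,
    assuming no pipeline parallelism."""
    if num_gpus % tensor_parallel_size != 0:
        raise ValueError("GPU count must be divisible by tensor parallel size")
    return num_gpus // tensor_parallel_size

# Values from the training options above for Llama-3.2-3B:
print(data_parallel_size(1, 1))  # LoRA: 1 replica
print(data_parallel_size(4, 2))  # Full SFT: 2 replicas
```

This is why the GPU counts in these tables are always whole multiples of the tensor parallel size.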
### Deployment Configuration

LoRA:

- NIM Image: `nvcr.io/nim/nvidia/llm-nim:1.15.5`
- GPU Count: 1x 80GB

Full SFT:

- NIM Image: `nvcr.io/nim/nvidia/llm-nim:1.15.5`
- GPU Count: 1x 80GB
- Additional Environment Variables: `NIM_MODEL_PROFILE: vllm`
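In a Kubernetes deployment, the image, GPU count, and additional environment variable above would typically be wired into the container spec. A minimal sketch using generic Kubernetes fields (the container name is hypothetical; only the image tag, env var, and GPU count come from this page):

```yaml
# Sketch only: standard Kubernetes container spec fields,
# populated with the Full SFT deployment values documented above.
containers:
  - name: llm-nim                # hypothetical container name
    image: nvcr.io/nim/nvidia/llm-nim:1.15.5
    env:
      - name: NIM_MODEL_PROFILE
        value: "vllm"
    resources:
      limits:
        nvidia.com/gpu: 1        # 1x 80GB GPU
```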
## Llama-3.2-1B Instruct

| Property | Value |
|---|---|
| Creator | Meta |
| Architecture | transformer |
| Description | Llama-3.2-1B is a lightweight language model designed for efficient deployment while maintaining strong capabilities. |
| Max I/O Tokens | 8192 |
| Parameters | 1 billion |
| Training Data | 15+ trillion tokens (up to 2024) |
| Default Name | meta-llama/Llama-3.2-1B-Instruct |
| HuggingFace | |
### Training Options

- LoRA: 1x 80GB GPU, tensor parallel size 1
- Full SFT: 1x 80GB GPU, tensor parallel size 1
### Deployment Configuration

LoRA:

- NIM Image: `nvcr.io/nim/nvidia/llm-nim:1.15.5`
- GPU Count: 1x 80GB

Full SFT:

- NIM Image: `nvcr.io/nim/nvidia/llm-nim:1.15.5`
- GPU Count: 1x 80GB
- Additional Environment Variables: `NIM_MODEL_PROFILE: vllm`
## Llama-3.1-8B Instruct

| Property | Value |
|---|---|
| Creator | Meta |
| Architecture | transformer |
| Description | Llama-3.1-8B is a large language model optimized for multilingual dialogue use cases. |
| Max I/O Tokens | 8192 |
| Parameters | 8 billion |
| Training Data | 15 trillion tokens (up to December 2023) |
| Default Name | meta-llama/Llama-3.1-8B-Instruct |
| HuggingFace | |
### Training Options

- LoRA: 1x 80GB GPU, tensor parallel size 1
- Full SFT: 8x 80GB GPUs, tensor parallel size 4
### Deployment Configuration

LoRA:

- NIM Image: `nvcr.io/nim/nvidia/llm-nim:1.15.5`
- GPU Count: 1x 80GB

Full SFT:

- NIM Image: `nvcr.io/nim/nvidia/llm-nim:1.15.5`
- GPU Count: 8x 80GB
- Additional Environment Variables: `NIM_MODEL_PROFILE: vllm`
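When sizing a cluster, the per-model training requirements above can be consolidated into a single lookup. A minimal sketch (the dictionary structure and function name are ours, not a NeMo Customizer API; the numbers come from the tables on this page):

```python
# Training resource requirements documented on this page,
# keyed by the model's default name.
TRAINING_REQUIREMENTS = {
    "meta-llama/Llama-3.2-3B-Instruct": {
        "lora":     {"gpus": 1, "tensor_parallel_size": 1},
        "full_sft": {"gpus": 4, "tensor_parallel_size": 2},
    },
    "meta-llama/Llama-3.2-1B-Instruct": {
        "lora":     {"gpus": 1, "tensor_parallel_size": 1},
        "full_sft": {"gpus": 1, "tensor_parallel_size": 1},
    },
    "meta-llama/Llama-3.1-8B-Instruct": {
        "lora":     {"gpus": 1, "tensor_parallel_size": 1},
        "full_sft": {"gpus": 8, "tensor_parallel_size": 4},
    },
}

def gpus_needed(model: str, method: str) -> int:
    """Number of 80GB GPUs required to fine-tune `model` with `method`
    ("lora" or "full_sft")."""
    return TRAINING_REQUIREMENTS[model][method]["gpus"]

print(gpus_needed("meta-llama/Llama-3.1-8B-Instruct", "full_sft"))  # 8
```

Note that LoRA fits on a single 80GB GPU for every model listed here; only full SFT scales GPU count with model size.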