# Llama Models

<a id="model-catalog-llama" />

This page provides detailed technical specifications for the Llama model family supported by NeMo Customizer. For information about supported features and capabilities, refer to [Tested Models](/documentation/customizer-reference/models/model-catalog).

## Llama-3.2-3B Instruct

| Property       | Value                                                                                             |
| -------------- | ------------------------------------------------------------------------------------------------- |
| Creator        | Meta                                                                                              |
| Architecture   | transformer                                                                                       |
| Description    | Llama-3.2-3B is a compact yet powerful language model suitable for various dialogue applications. |
| Max I/O Tokens | 8192                                                                                              |
| Parameters     | 3 billion                                                                                         |
| Training Data  | 15+ trillion tokens (up to 2024)                                                                  |
| Default Name   | meta-llama/Llama-3.2-3B-Instruct                                                                  |
| HuggingFace    | [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)       |

### Training Options

* **LoRA**: 1x 80GB GPU, tensor parallel size 1
* **Full SFT**: 4x 80GB GPU, tensor parallel size 2
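The GPU count and tensor parallel size together determine how many copies of the model train side by side. As a rough sketch, assuming tensor parallelism is the only form of model parallelism in use (this page does not say so either way), the number of data-parallel replicas is the GPU count divided by the tensor parallel size. The function name `data_parallel_replicas` is hypothetical and is not part of any NeMo API.

```python
def data_parallel_replicas(num_gpus: int, tensor_parallel_size: int) -> int:
    """Return how many data-parallel replicas fit on num_gpus.

    Assumption: no pipeline or context parallelism, so each replica
    occupies exactly tensor_parallel_size GPUs.
    """
    if num_gpus <= 0 or tensor_parallel_size <= 0:
        raise ValueError("num_gpus and tensor_parallel_size must be positive")
    if num_gpus % tensor_parallel_size != 0:
        raise ValueError("num_gpus must be divisible by tensor_parallel_size")
    return num_gpus // tensor_parallel_size


# Values taken from the Training Options list for Llama-3.2-3B Instruct.
lora_replicas = data_parallel_replicas(num_gpus=1, tensor_parallel_size=1)
full_sft_replicas = data_parallel_replicas(num_gpus=4, tensor_parallel_size=2)
print(lora_replicas, full_sft_replicas)
```

Under that assumption, the Full SFT configuration splits each model copy across two GPUs and trains two copies in parallel, while LoRA trains a single copy on one GPU.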

### Deployment Configuration

* **LoRA**:
  * NIM Image: `nvcr.io/nim/nvidia/llm-nim:1.15.5`
  * GPU Count: 1x 80GB
* **Full SFT**:
  * NIM Image: `nvcr.io/nim/nvidia/llm-nim:1.15.5`
  * GPU Count: 1x 80GB
  * Additional Environment Variables:
    * `NIM_MODEL_PROFILE`: `vllm`

## Llama-3.2-1B Instruct

| Property       | Value                                                                                                                 |
| -------------- | --------------------------------------------------------------------------------------------------------------------- |
| Creator        | Meta                                                                                                                  |
| Architecture   | transformer                                                                                                           |
| Description    | Llama-3.2-1B is a lightweight language model designed for efficient deployment while maintaining strong capabilities. |
| Max I/O Tokens | 8192                                                                                                                  |
| Parameters     | 1 billion                                                                                                             |
| Training Data  | 15+ trillion tokens (up to 2024)                                                                                      |
| Default Name   | meta-llama/Llama-3.2-1B-Instruct                                                                                      |
| HuggingFace    | [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)                           |

### Training Options

* **LoRA**: 1x 80GB GPU, tensor parallel size 1
* **Full SFT**: 1x 80GB GPU, tensor parallel size 1

### Deployment Configuration

* **LoRA**:
  * NIM Image: `nvcr.io/nim/nvidia/llm-nim:1.15.5`
  * GPU Count: 1x 80GB
* **Full SFT**:
  * NIM Image: `nvcr.io/nim/nvidia/llm-nim:1.15.5`
  * GPU Count: 1x 80GB
  * Additional Environment Variables:
    * `NIM_MODEL_PROFILE`: `vllm`

## Llama-3.1-8B Instruct

| Property       | Value                                                                                       |
| -------------- | ------------------------------------------------------------------------------------------- |
| Creator        | Meta                                                                                        |
| Architecture   | transformer                                                                                 |
| Description    | Llama-3.1-8B is a large language model optimized for multilingual dialogue use cases.       |
| Max I/O Tokens | 8192                                                                                        |
| Parameters     | 8 billion                                                                                   |
| Training Data  | 15 trillion tokens (up to December 2023)                                                    |
| Default Name   | meta-llama/Llama-3.1-8B-Instruct                                                            |
| HuggingFace    | [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) |

### Training Options

* **LoRA**: 1x 80GB GPU, tensor parallel size 1
* **Full SFT**: 8x 80GB GPU, tensor parallel size 4

### Deployment Configuration

* **LoRA**:
  * NIM Image: `nvcr.io/nim/nvidia/llm-nim:1.15.5`
  * GPU Count: 1x 80GB
* **Full SFT**:
  * NIM Image: `nvcr.io/nim/nvidia/llm-nim:1.15.5`
  * GPU Count: 8x 80GB
  * Additional Environment Variables:
    * `NIM_MODEL_PROFILE`: `vllm`
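
The deployment settings listed on this page can be collected into a small lookup table, which may be handy when scripting deployments for several models. The sketch below is illustrative only: `DEPLOYMENT_GPUS` and `deployment_settings` are hypothetical names, not part of NeMo Customizer or NIM, and it assumes that the `NIM_MODEL_PROFILE` variable applies to Full SFT deployments only, which is one reading of the lists above.

```python
# NIM image shared by every deployment listed on this page.
NIM_IMAGE = "nvcr.io/nim/nvidia/llm-nim:1.15.5"

# Hypothetical lookup: (model name, customization type) -> number of 80GB GPUs.
DEPLOYMENT_GPUS = {
    ("meta-llama/Llama-3.2-3B-Instruct", "lora"): 1,
    ("meta-llama/Llama-3.2-3B-Instruct", "full_sft"): 1,
    ("meta-llama/Llama-3.2-1B-Instruct", "lora"): 1,
    ("meta-llama/Llama-3.2-1B-Instruct", "full_sft"): 1,
    ("meta-llama/Llama-3.1-8B-Instruct", "lora"): 1,
    ("meta-llama/Llama-3.1-8B-Instruct", "full_sft"): 8,
}


def deployment_settings(model: str, customization: str) -> dict:
    """Return the image, GPU count, and extra environment variables.

    Assumption: only Full SFT deployments set NIM_MODEL_PROFILE.
    Raises KeyError for a model/customization pair not listed on this page.
    """
    gpu_count = DEPLOYMENT_GPUS[(model, customization)]
    env = {"NIM_MODEL_PROFILE": "vllm"} if customization == "full_sft" else {}
    return {"image": NIM_IMAGE, "gpu_count": gpu_count, "env": env}


settings = deployment_settings("meta-llama/Llama-3.1-8B-Instruct", "full_sft")
print(settings["gpu_count"], settings["env"])
```

Keeping the table keyed by the default model name makes it straightforward to extend when further models are added to the catalog.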