
# Llama Nemotron Models

<a id="model-catalog-llama-nemotron" />

This page provides detailed technical specifications for the Nemotron model family supported by NeMo Customizer. For information about supported features and capabilities, refer to [Tested Models](/documentation/customizer-reference/models/model-catalog).

## Llama 3.1 Nemotron Nano 8B v1

| Property       | Value                                                                                                                                               |
| -------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
| Creator        | NVIDIA                                                                                                                                              |
| Architecture   | transformer                                                                                                                                         |
| Description    | Llama 3.1 Nemotron Nano 8B v1 is a compact, instruction-tuned model for efficient customization and deployment.                                     |
| Max I/O Tokens | 4096                                                                                                                                                |
| Parameters     | 8 billion                                                                                                                                           |
| Training Data  | Not specified                                                                                                                                       |
| Default Name   | nvidia/Llama-3.1-Nemotron-Nano-8B-v1                                                                                                                |
| HuggingFace    | [nvidia/Llama-3.1-Nemotron-Nano-8B-v1](https://huggingface.co/nvidia/Llama-3.1-Nemotron-Nano-8B-v1)                                                 |
| NIM            | [nvidia/llama-3.1-nemotron-nano-8b-v1](https://catalog.ngc.nvidia.com/orgs/nim/teams/nvidia/containers/llama-3.1-nemotron-nano-8b-v1?version=1.8.4) |

### Training Options

* **LoRA**: 1x 80GB GPU, tensor parallel size 1, pipeline parallel size 1
* **Full SFT**: 4x 80GB GPU, tensor parallel size 2, pipeline parallel size 1
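The options above give the GPU count, tensor parallel size, and pipeline parallel size. They do not give the resulting data parallel size.

The sketch below derives it for the two 8B options. It assumes the Megatron-style convention that the GPU count equals the product of the tensor, pipeline, and data parallel sizes. This page does not state that NeMo Customizer follows that convention, so treat it as an assumption. `data_parallel_size` is a hypothetical helper.

```python
def data_parallel_size(num_gpus: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """Return the data parallel size implied by a GPU count.

    Assumes (hypothetically) num_gpus = TP * PP * DP, as in Megatron-style training.
    """
    model_parallel = tensor_parallel * pipeline_parallel
    if model_parallel < 1 or num_gpus % model_parallel != 0:
        raise ValueError(
            f"{num_gpus} GPUs cannot be split into groups of TP*PP={model_parallel}"
        )
    return num_gpus // model_parallel


# Training options documented for Llama 3.1 Nemotron Nano 8B v1: (GPUs, TP, PP).
configs = {
    "LoRA": (1, 1, 1),
    "Full SFT": (4, 2, 1),
}

for name, (gpus, tp, pp) in configs.items():
    print(f"{name}: data parallel size {data_parallel_size(gpus, tp, pp)}")
```

Under this assumption, LoRA runs a single replica. Full SFT runs two replicas, each sharded across two GPUs by tensor parallelism.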

### Deployment Configuration

* **LoRA**:
  * NIM Image: `nvcr.io/nim/nvidia/llm-nim:1.15.5`
  * GPU Count: 1x 80GB
* **Full SFT**:
  * NIM Image: `nvcr.io/nim/nvidia/llm-nim:1.15.5`
  * GPU Count: 1x 80GB
  * Additional Environment Variables:
    * `NIM_MODEL_PROFILE`: `vllm`
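The sketch below shows how the deployment fields map onto a container launch. It turns the image, GPU count, and extra environment variables into a partial `docker run` argument list.

It is hypothetical: `docker_run_args` is not part of any NVIDIA tooling. It also deliberately omits settings a real NIM launch usually needs, such as the model source, cache volumes, ports, and API credentials, because this page does not list them.

```python
import shlex


def docker_run_args(image: str, gpu_count: int, env: dict[str, str]) -> list[str]:
    """Build a partial `docker run` argument list from a deployment configuration.

    Covers only the image, GPU count, and extra environment variables;
    model source, volumes, ports, and credentials are intentionally omitted.
    """
    args = ["docker", "run", "--rm", "--gpus", str(gpu_count)]
    for key, value in env.items():
        args += ["-e", f"{key}={value}"]
    args.append(image)
    return args


# Full SFT deployment for Llama 3.1 Nemotron Nano 8B v1.
sft_args = docker_run_args(
    image="nvcr.io/nim/nvidia/llm-nim:1.15.5",
    gpu_count=1,
    env={"NIM_MODEL_PROFILE": "vllm"},
)
print(shlex.join(sft_args))
```

The LoRA configuration uses the same image and GPU count with an empty `env` mapping.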

## NVIDIA Nemotron Nano 9B v2

| Property       | Value                                                                                                                                   |
| -------------- | --------------------------------------------------------------------------------------------------------------------------------------- |
| Creator        | NVIDIA                                                                                                                                  |
| Architecture   | transformer                                                                                                                             |
| Description    | NVIDIA Nemotron Nano 9B v2 is a compact, instruction-tuned model optimized for efficient customization and deployment.                  |
| Max I/O Tokens | 4096                                                                                                                                    |
| Parameters     | 9 billion                                                                                                                               |
| Default Name   | nvidia/NVIDIA-Nemotron-Nano-9B-v2                                                                                                       |
| HuggingFace    | [nvidia/NVIDIA-Nemotron-Nano-9B-v2](https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-9B-v2)                                           |
| NIM            | [NVIDIA-Nemotron-Nano-9B-v2](https://catalog.ngc.nvidia.com/orgs/nim/teams/nvidia/containers/nvidia-nemotron-nano-9b-v2?version=latest) |

### Training Options

* **LoRA**: 4x 80GB GPU, tensor parallel size 1, pipeline parallel size 1
* **Full SFT**: 4x 80GB GPU, tensor parallel size 2, pipeline parallel size 1

### Deployment Configuration

* **LoRA**:
  * NIM Image: `nvcr.io/nim/nvidia/llm-nim:1.15.5`
  * GPU Count: 1x 80GB
* **Full SFT**:
  * NIM Image: `nvcr.io/nim/nvidia/llm-nim:1.15.5`
  * GPU Count: 1x 80GB
  * Additional Environment Variables:
    * `NIM_MODEL_PROFILE`: `vllm`

## NVIDIA Nemotron 3 Nano 30B A3B

| Property            | Value                                                                                                                                                                                                                   |
| ------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Creator             | NVIDIA                                                                                                                                                                                                                  |
| Architecture        | Hybrid Mixture of Experts (MoE) - Mamba-2 + Transformer                                                                                                                                                                 |
| Description         | Nemotron-3-Nano-30B-A3B-BF16 is a large language model (LLM) trained from scratch by NVIDIA and designed as a unified model for both reasoning and non-reasoning tasks. Reasoning is configurable through the chat template. |
| Max I/O Tokens      | 2048                                                                                                                                                                                                                    |
| Parameters          | 30B total (3.5B active)                                                                                                                                                                                                 |
| MoE Configuration   | 128 experts + 1 shared expert, 6 experts activated per token                                                                                                                                                            |
| Supported Languages | English, German, Spanish, French, Italian, Japanese                                                                                                                                                                     |
| Default Name        | nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16                                                                                                                                                                              |
| HuggingFace         | [nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16)                                                                                                         |
| NIM                 | [Nemotron-3-Nano-30B-A3B](https://catalog.ngc.nvidia.com/orgs/nim/teams/nvidia/containers/nemotron-3-nano?version=2.0.1)                                                                                                |

### Training Options

* **LoRA**: 2x 80GB GPU, tensor parallel size 1, expert parallel size 2, pipeline parallel size 1
* **Full SFT**: 8x 80GB GPU, tensor parallel size 1, expert parallel size 8, pipeline parallel size 1
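The training options can be related to the MoE Configuration row by dividing the 128 routed experts of each MoE layer across the expert parallel ranks.

The sketch below does that arithmetic. It assumes an even split and counts only routed experts, because this page does not document how the shared expert is placed. `experts_per_rank` is a hypothetical helper.

```python
NUM_ROUTED_EXPERTS = 128  # per MoE layer, from the MoE Configuration row; shared expert excluded


def experts_per_rank(num_experts: int, expert_parallel: int) -> int:
    """Routed experts held by each expert parallel rank, assuming an even split."""
    if expert_parallel < 1 or num_experts % expert_parallel != 0:
        raise ValueError(
            f"{num_experts} experts do not split evenly over EP={expert_parallel}"
        )
    return num_experts // expert_parallel


for name, ep in {"LoRA": 2, "Full SFT": 8}.items():
    print(f"{name}: EP {ep} -> {experts_per_rank(NUM_ROUTED_EXPERTS, ep)} routed experts per rank")
```

Under these assumptions, each GPU holds 64 routed experts per layer for LoRA and 16 for Full SFT.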

**MoE Parallelism Constraints**

MoE models support only expert parallelism for distributing experts across GPUs. When `expert_parallel_size > 1`, `tensor_parallel_size` must be set to 1. Additionally, `expert_parallel_size` must evenly divide the number of GPUs. These constraints apply only to training parallelism; NIM deployment may use different GPU counts optimized for inference.
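The sketch below restates the MoE parallelism constraints as a pre-flight check. `check_moe_parallelism` is hypothetical: its parameters mirror the `tensor_parallel_size` and `expert_parallel_size` settings named in the text, but it is not NeMo Customizer's own validation code.

```python
def check_moe_parallelism(num_gpus: int, tensor_parallel: int, expert_parallel: int) -> None:
    """Raise ValueError if a training layout breaks the documented MoE constraints."""
    if expert_parallel < 1:
        raise ValueError("expert_parallel_size must be at least 1")
    if expert_parallel > 1 and tensor_parallel != 1:
        raise ValueError("tensor_parallel_size must be 1 when expert_parallel_size > 1")
    if num_gpus % expert_parallel != 0:
        raise ValueError(
            f"expert_parallel_size={expert_parallel} does not evenly divide {num_gpus} GPUs"
        )


# The documented Nemotron 3 Nano layouts pass.
check_moe_parallelism(num_gpus=2, tensor_parallel=1, expert_parallel=2)  # LoRA
check_moe_parallelism(num_gpus=8, tensor_parallel=1, expert_parallel=8)  # Full SFT

# A layout that mixes tensor and expert parallelism is rejected.
try:
    check_moe_parallelism(num_gpus=8, tensor_parallel=2, expert_parallel=4)
except ValueError as err:
    print(f"rejected: {err}")
```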

### Deployment Configuration

* **Full SFT**:
  * NIM Image: `nvcr.io/nim/nvidia/nemotron-3-nano:1.7.0-variant`
  * GPU Count: 2x 80GB

NIM deployment of LoRA adapters is not supported for this model.

## NVIDIA Nemotron 3 Super 120B A12B

| Property       | Value                                                                                                                                           |
| -------------- | ----------------------------------------------------------------------------------------------------------------------------------------------- |
| Creator        | NVIDIA                                                                                                                                          |
| Architecture   | Mixture of Experts (MoE)                                                                                                                        |
| Description    | Nemotron-3-Super-120B-A12B-BF16 is a large MoE language model from NVIDIA designed for high-capacity reasoning and instruction-following tasks. |
| Max I/O Tokens | 4096                                                                                                                                            |
| Parameters     | 120B total (12B active)                                                                                                                         |
| Default Name   | nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16                                                                                                   |
| HuggingFace    | [nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16)                           |

### Training Options

* **LoRA**: 8x 80GB GPU, tensor parallel size 1, expert parallel size 8, pipeline parallel size 1

**MoE Parallelism Constraints**

MoE models support only expert parallelism for distributing experts across GPUs. When `expert_parallel_size > 1`, `tensor_parallel_size` must be set to 1. Additionally, `expert_parallel_size` must evenly divide the number of GPUs. These constraints apply only to training parallelism; NIM deployment may use different GPU counts optimized for inference.

### Deployment Configuration

* **LoRA**:
  * NIM Image: `nvcr.io/nim/nvidia/nemotron-3-super-120b-a12b:1.8.1-variant`
  * GPU Count: 8x 80GB
  * Additional Environment Variables:
    * `NIM_WORKSPACE`: `/model-store`
    * `NIM_PIPELINE_PARALLEL_SIZE`: `8`
    * `NIM_MAX_MODEL_LEN`: `4096`
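When the Super LoRA deployment is run on Kubernetes, the environment variables above go into a container's `env` field. That field is a list of `name`/`value` entries whose values are strings.

The sketch below converts them and adds two sanity checks that tie the values back to this page:

* The pipeline parallel size matches the 8-GPU count.
* `NIM_MAX_MODEL_LEN` matches the Max I/O Tokens row.

The helper `to_k8s_env` is hypothetical. The checks reflect only how the two values line up in this particular configuration, not a general NIM rule.

```python
# Documented LoRA deployment for Nemotron 3 Super 120B A12B.
GPU_COUNT = 8
MAX_IO_TOKENS = 4096  # from the Max I/O Tokens row of the property table
NIM_ENV = {
    "NIM_WORKSPACE": "/model-store",
    "NIM_PIPELINE_PARALLEL_SIZE": "8",
    "NIM_MAX_MODEL_LEN": "4096",
}


def to_k8s_env(env: dict[str, str]) -> list[dict[str, str]]:
    """Convert a mapping into the name/value list used by a Kubernetes container `env` field."""
    return [{"name": key, "value": value} for key, value in env.items()]


# Sanity checks tying the environment back to the documented values.
assert int(NIM_ENV["NIM_PIPELINE_PARALLEL_SIZE"]) == GPU_COUNT
assert int(NIM_ENV["NIM_MAX_MODEL_LEN"]) == MAX_IO_TOKENS

for entry in to_k8s_env(NIM_ENV):
    print(entry)
```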