Llama Nemotron Models | NVIDIA NeMo Platform

This page provides detailed technical specifications for the Nemotron model family supported by NeMo Customizer. For information about supported features and capabilities, refer to Tested Models.

Llama 3.1 Nemotron Nano 8B v1

Property	Value
Creator	NVIDIA
Architecture	transformer
Description	Llama 3.1 Nemotron Nano 8B v1 is a compact, instruction-tuned model for efficient customization and deployment.
Max I/O Tokens	4096
Parameters	8 billion
Training Data	Not specified
Default Name	nvidia/Llama-3.1-Nemotron-Nano-8B-v1
Hugging Face	nvidia/Llama-3.1-Nemotron-Nano-8B-v1
NIM	nvidia/llama-3.1-nemotron-nano-8b-v1

Training Options

LoRA: 1x 80GB GPU, tensor parallel size 1, pipeline parallel size 1
Full SFT: 4x 80GB GPU, tensor parallel size 2, pipeline parallel size 1
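
The GPU counts and parallel sizes above can be related with a small calculation. This page does not say how GPUs beyond the tensor and pipeline parallel groups are used, so the following is only a sketch. It assumes the common Megatron-style convention that the remaining GPUs form data-parallel replicas. The function name `implied_data_parallel_size` is hypothetical and is not part of NeMo Customizer.

```python
def implied_data_parallel_size(num_gpus: int,
                               tensor_parallel_size: int,
                               pipeline_parallel_size: int) -> int:
    """Return the data-parallel size implied by a GPU count.

    Assumption (not stated on this page): num_gpus equals
    tensor_parallel_size * pipeline_parallel_size * data_parallel_size.
    """
    if min(num_gpus, tensor_parallel_size, pipeline_parallel_size) < 1:
        raise ValueError("all sizes must be positive integers")
    model_parallel = tensor_parallel_size * pipeline_parallel_size
    if num_gpus % model_parallel != 0:
        raise ValueError(
            f"{num_gpus} GPUs cannot be split into groups of {model_parallel}"
        )
    return num_gpus // model_parallel


# Values taken from the Training Options listed above.
print(implied_data_parallel_size(1, 1, 1))  # LoRA: prints 1
print(implied_data_parallel_size(4, 2, 1))  # Full SFT: prints 2
```

Under this assumption, the Full SFT configuration would run two model replicas, each sharded across two GPUs.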

Deployment Configuration

LoRA:
  NIM Image: nvcr.io/nim/nvidia/llm-nim:1.15.5
  GPU Count: 1x 80GB
Full SFT:
  NIM Image: nvcr.io/nim/nvidia/llm-nim:1.15.5
  GPU Count: 1x 80GB
  Additional Environment Variables:
    NIM_MODEL_PROFILE: vllm

NVIDIA Nemotron Nano 9B v2

Property	Value
Creator	NVIDIA
Architecture	transformer
Description	NVIDIA Nemotron Nano 9B v2 is a compact, instruction-tuned model optimized for efficient customization and deployment.
Max I/O Tokens	4096
Parameters	9 billion
Default Name	nvidia/NVIDIA-Nemotron-Nano-9B-v2
Hugging Face	nvidia/NVIDIA-Nemotron-Nano-9B-v2
NIM	NVIDIA-Nemotron-Nano-9B-v2

Training Options

LoRA: 4x 80GB GPU, tensor parallel size 1, pipeline parallel size 1
Full SFT: 4x 80GB GPU, tensor parallel size 2, pipeline parallel size 1

Deployment Configuration

LoRA:
  NIM Image: nvcr.io/nim/nvidia/llm-nim:1.15.5
  GPU Count: 1x 80GB
Full SFT:
  NIM Image: nvcr.io/nim/nvidia/llm-nim:1.15.5
  GPU Count: 1x 80GB
  Additional Environment Variables:
    NIM_MODEL_PROFILE: vllm

NVIDIA Nemotron 3 Nano 30B A3B

Property	Value
Creator	NVIDIA
Architecture	Hybrid Mixture of Experts (MoE) - Mamba-2 + Transformer
Description	Nemotron-3-Nano-30B-A3B-BF16 is a large language model (LLM) trained from scratch by NVIDIA and designed as a unified model for both reasoning and non-reasoning tasks. Reasoning is configurable through the chat template.
Max I/O Tokens	2048
Parameters	30B total (3.5B active)
MoE Configuration	128 experts + 1 shared expert, 6 experts activated per token
Supported Languages	English, German, Spanish, French, Italian, Japanese
Default Name	nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
Hugging Face	nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
NIM	Nemotron-3-Nano-30B-A3B

Training Options

LoRA: 2x 80GB GPU, tensor parallel size 1, expert parallel size 2, pipeline parallel size 1
Full SFT: 8x 80GB GPU, tensor parallel size 1, expert parallel size 8, pipeline parallel size 1

MoE Parallelism Constraints

MoE models support only expert parallelism for distributing experts across GPUs. When expert_parallel_size > 1, tensor_parallel_size must be set to 1. Additionally, expert_parallel_size must evenly divide the number of GPUs. These constraints apply to training parallelism only; NIM deployment may use a different GPU count that is optimized for inference.
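
The two constraints above can be checked before a training job is submitted. The following is a minimal sketch, not NeMo Customizer code: the function `validate_moe_parallelism` and its messages are hypothetical, and only the parameter names `tensor_parallel_size` and `expert_parallel_size` come from this page.

```python
from typing import List


def validate_moe_parallelism(num_gpus: int,
                             tensor_parallel_size: int,
                             expert_parallel_size: int) -> List[str]:
    """Return a list of constraint violations (empty if the config is valid)."""
    if min(num_gpus, tensor_parallel_size, expert_parallel_size) < 1:
        raise ValueError("all sizes must be positive integers")

    errors: List[str] = []
    # Constraint 1: expert parallelism cannot be combined with tensor parallelism.
    if expert_parallel_size > 1 and tensor_parallel_size != 1:
        errors.append(
            "tensor_parallel_size must be 1 when expert_parallel_size > 1"
        )
    # Constraint 2: expert_parallel_size must evenly divide the GPU count.
    if num_gpus % expert_parallel_size != 0:
        errors.append(
            f"expert_parallel_size ({expert_parallel_size}) must evenly "
            f"divide the number of GPUs ({num_gpus})"
        )
    return errors


# The LoRA and Full SFT settings listed above both pass.
print(validate_moe_parallelism(2, 1, 2))  # prints []
print(validate_moe_parallelism(8, 1, 8))  # prints []
# A configuration that mixes tensor and expert parallelism is rejected.
print(validate_moe_parallelism(8, 2, 8))
```

Returning a list of messages instead of raising on the first failure lets a caller report every violated constraint at once.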

Deployment Configuration

Full SFT:
  NIM Image: nvcr.io/nim/nvidia/nemotron-3-nano:1.7.0-variant
  GPU Count: 2x 80GB

Deployment for LoRA using NIM is not supported for this model.

NVIDIA Nemotron 3 Super 120B A12B

Property	Value
Creator	NVIDIA
Architecture	Mixture of Experts (MoE)
Description	Nemotron-3-Super-120B-A12B-BF16 is a large MoE language model from NVIDIA designed for high-capacity reasoning and instruction-following tasks.
Max I/O Tokens	4096
Parameters	120B total (12B active)
Default Name	nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16
Hugging Face	nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16

Training Options

LoRA: 8x 80GB GPU, tensor parallel size 1, expert parallel size 8, pipeline parallel size 1

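MoE Parallelism Constraints

The same training constraints described for NVIDIA Nemotron 3 Nano 30B A3B apply to this model: when expert_parallel_size > 1, tensor_parallel_size must be set to 1, and expert_parallel_size must evenly divide the number of GPUs.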

Deployment Configuration

LoRA:
  NIM Image: nvcr.io/nim/nvidia/nemotron-3-super-120b-a12b:1.8.1-variant
  GPU Count: 8x 80GB
  Additional Environment Variables:
    NIM_WORKSPACE: /model-store
    NIM_PIPELINE_PARALLEL_SIZE: 8
    NIM_MAX_MODEL_LEN: 4096
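
To illustrate how the image, GPU count, and environment variables in this deployment configuration fit together, the sketch below assembles a `docker run` argument list from them. This is an assumption-laden illustration only: this page does not describe launching the NIM with Docker, the helper `build_docker_run_args` is hypothetical, and a real deployment would also need credentials, port mappings, and a model volume, all of which are omitted here.

```python
import shlex
from typing import Dict, List


def build_docker_run_args(image: str,
                          gpu_count: int,
                          env: Dict[str, str]) -> List[str]:
    """Assemble a `docker run` argument list (hypothetical helper).

    Only standard Docker flags are used: --rm, --gpus, and -e.
    """
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    args = ["docker", "run", "--rm", "--gpus", str(gpu_count)]
    for name, value in env.items():
        # Each environment variable is passed as -e NAME=value.
        args += ["-e", f"{name}={value}"]
    args.append(image)
    return args


# Values copied from the LoRA deployment configuration above.
args = build_docker_run_args(
    image="nvcr.io/nim/nvidia/nemotron-3-super-120b-a12b:1.8.1-variant",
    gpu_count=8,
    env={
        "NIM_WORKSPACE": "/model-store",
        "NIM_PIPELINE_PARALLEL_SIZE": "8",
        "NIM_MAX_MODEL_LEN": "4096",
    },
)
print(shlex.join(args))  # shell-safe rendering of the command
```

Building the command as a list keeps each argument separate, so it can be passed to `subprocess.run` without shell quoting problems.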