# Llama Nemotron Models
This page provides detailed technical specifications for the Nemotron model family supported by NeMo Customizer. For information about supported features and capabilities, refer to Tested Models.
## Llama 3.1 Nemotron Nano 8B v1
| Property | Value |
|---|---|
| Creator | NVIDIA |
| Architecture | Transformer |
| Description | Llama 3.1 Nemotron Nano 8B v1 is a compact, instruction-tuned model for efficient customization and deployment. |
| Max I/O Tokens | 4096 |
| Parameters | 8 billion |
| Training Data | Not specified |
| Default Name | `nvidia/Llama-3.1-Nemotron-Nano-8B-v1` |
| HuggingFace | |
| NIM | |
### Training Options

- LoRA: 1x 80GB GPU, tensor parallel size 1, pipeline parallel size 1
- Full SFT: 4x 80GB GPU, tensor parallel size 2, pipeline parallel size 1
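For setups like the one above, the GPU count typically factors into tensor-, pipeline-, and data-parallel dimensions, with data parallelism taking whatever remains after the model-parallel split. A minimal sketch of that relationship (illustrative only; `data_parallel_size` is not a NeMo Customizer API):

```python
def data_parallel_size(world_size: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """Return the implied data-parallel size for a given GPU count.

    world_size must be divisible by tensor_parallel * pipeline_parallel.
    """
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError(
            f"{world_size} GPUs cannot be split into tensor_parallel="
            f"{tensor_parallel} x pipeline_parallel={pipeline_parallel}"
        )
    return world_size // model_parallel

# Full SFT above: 4 GPUs, tensor parallel 2, pipeline parallel 1 -> 2 data-parallel replicas
print(data_parallel_size(4, 2, 1))  # -> 2
# LoRA above: 1 GPU, tensor parallel 1, pipeline parallel 1 -> 1 replica
print(data_parallel_size(1, 1, 1))  # -> 1
```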
### Deployment Configuration

- LoRA:
  - NIM Image: `nvcr.io/nim/nvidia/llm-nim:1.15.5`
  - GPU Count: 1x 80GB
- Full SFT:
  - NIM Image: `nvcr.io/nim/nvidia/llm-nim:1.15.5`
  - GPU Count: 1x 80GB
  - Additional Environment Variables:
    - `NIM_MODEL_PROFILE: vllm`
## NVIDIA Nemotron Nano 9B v2
| Property | Value |
|---|---|
| Creator | NVIDIA |
| Architecture | Transformer |
| Description | NVIDIA Nemotron Nano 9B v2 is a compact, instruction-tuned model optimized for efficient customization and deployment. |
| Max I/O Tokens | 4096 |
| Parameters | 9 billion |
| Default Name | `nvidia/NVIDIA-Nemotron-Nano-9B-v2` |
| HuggingFace | |
| NIM | |
### Training Options

- LoRA: 4x 80GB GPU, tensor parallel size 1, pipeline parallel size 1
- Full SFT: 4x 80GB GPU, tensor parallel size 2, pipeline parallel size 1
### Deployment Configuration

- LoRA:
  - NIM Image: `nvcr.io/nim/nvidia/llm-nim:1.15.5`
  - GPU Count: 1x 80GB
- Full SFT:
  - NIM Image: `nvcr.io/nim/nvidia/llm-nim:1.15.5`
  - GPU Count: 1x 80GB
  - Additional Environment Variables:
    - `NIM_MODEL_PROFILE: vllm`
## NVIDIA Nemotron 3 Nano 30B A3B
| Property | Value |
|---|---|
| Creator | NVIDIA |
| Architecture | Hybrid Mixture of Experts (MoE): Mamba-2 + Transformer |
| Description | Nemotron-3-Nano-30B-A3B-BF16 is a large language model (LLM) trained from scratch by NVIDIA and designed as a unified model for both reasoning and non-reasoning tasks. Reasoning behavior is configurable via the chat template. |
| Max I/O Tokens | 2048 |
| Parameters | 30B total (3.5B active) |
| MoE Configuration | 128 experts + 1 shared expert, 6 experts activated per token |
| Supported Languages | English, German, Spanish, French, Italian, Japanese |
| Default Name | `nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16` |
| HuggingFace | |
| NIM | |
### Training Options

- LoRA: 2x 80GB GPU, tensor parallel size 1, expert parallel size 2, pipeline parallel size 1
- Full SFT: 8x 80GB GPU, tensor parallel size 1, expert parallel size 8, pipeline parallel size 1
> **Note: MoE Parallelism Constraints**
>
> MoE models support only expert parallelism for distributing experts across GPUs. When `expert_parallel_size > 1`, `tensor_parallel_size` must be set to 1. Additionally, `expert_parallel_size` must evenly divide the number of GPUs. These constraints apply to training parallelism only; NIM deployment may use different GPU counts optimized for inference.
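These constraints can be checked up front before submitting a training job. A minimal sketch (the function name is illustrative, not part of any NeMo Customizer API):

```python
def validate_moe_parallelism(num_gpus: int, expert_parallel_size: int,
                             tensor_parallel_size: int) -> None:
    """Raise ValueError if an MoE training-parallelism config violates the constraints above."""
    if expert_parallel_size > 1 and tensor_parallel_size != 1:
        raise ValueError("tensor_parallel_size must be 1 when expert_parallel_size > 1")
    if num_gpus % expert_parallel_size != 0:
        raise ValueError("expert_parallel_size must evenly divide the number of GPUs")

# Both training configurations listed above pass:
validate_moe_parallelism(num_gpus=2, expert_parallel_size=2, tensor_parallel_size=1)  # LoRA
validate_moe_parallelism(num_gpus=8, expert_parallel_size=8, tensor_parallel_size=1)  # Full SFT
```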
### Deployment Configuration

- Full SFT:
  - NIM Image: `nvcr.io/nim/nvidia/nemotron-3-nano:1.7.0-variant`
  - GPU Count: 2x 80GB
> **Note:** Deployment for LoRA using NIM is not supported for this model.
## NVIDIA Nemotron 3 Super 120B A12B
| Property | Value |
|---|---|
| Creator | NVIDIA |
| Architecture | Mixture of Experts (MoE) |
| Description | Nemotron-3-Super-120B-A12B-BF16 is a large MoE language model from NVIDIA designed for high-capacity reasoning and instruction-following tasks. |
| Max I/O Tokens | 4096 |
| Parameters | 120B total (12B active) |
| Default Name | `nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16` |
| HuggingFace | |
### Training Options

- LoRA: 8x 80GB GPU, tensor parallel size 1, expert parallel size 8, pipeline parallel size 1
> **Note: MoE Parallelism Constraints**
>
> MoE models support only expert parallelism for distributing experts across GPUs. When `expert_parallel_size > 1`, `tensor_parallel_size` must be set to 1. Additionally, `expert_parallel_size` must evenly divide the number of GPUs. These constraints apply to training parallelism only; NIM deployment may use different GPU counts optimized for inference.
### Deployment Configuration

- LoRA:
  - NIM Image: `nvcr.io/nim/nvidia/nemotron-3-super-120b-a12b:1.8.1-variant`
  - GPU Count: 8x 80GB
  - Additional Environment Variables:
    - `NIM_WORKSPACE: /model-store`
    - `NIM_PIPELINE_PARALLEL_SIZE: 8`
    - `NIM_MAX_MODEL_LEN: 4096`
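One way to pass these variables to the NIM container is to assemble the `docker run` invocation programmatically. A minimal sketch, assuming a plain Docker deployment (the `--gpus` flag and the image tag, including its `-variant` placeholder, are copied from the listing above; adapt them to your environment):

```python
# Environment variables required for the LoRA deployment listed above.
env = {
    "NIM_WORKSPACE": "/model-store",
    "NIM_PIPELINE_PARALLEL_SIZE": "8",
    "NIM_MAX_MODEL_LEN": "4096",
}

image = "nvcr.io/nim/nvidia/nemotron-3-super-120b-a12b:1.8.1-variant"

# Build an argument list rather than a shell string to avoid quoting issues.
cmd = ["docker", "run", "--gpus", "8"]
for key, value in env.items():
    cmd += ["-e", f"{key}={value}"]
cmd.append(image)

print(" ".join(cmd))
```

The list form can be handed directly to `subprocess.run(cmd)` without shell interpolation.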