Llama Nemotron Models
This page provides detailed technical specifications for the Nemotron model family supported by NeMo Customizer. For information about supported features and capabilities, refer to Tested Models.
Llama 3.1 Nemotron Nano 8B v1
Training Options
- LoRA: 1x 80GB GPU, tensor parallel size 1, pipeline parallel size 1
- Full SFT: 4x 80GB GPU, tensor parallel size 2, pipeline parallel size 1
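The GPU counts above can be related to the parallel sizes with the convention commonly used in Megatron-style training: GPUs not consumed by tensor and pipeline parallelism form data-parallel replicas. The sketch below assumes that convention and assumes no other parallel dimensions are in use. This page does not state how NeMo Customizer allocates the remaining GPUs, and the helper name `data_parallel_size` is hypothetical.

```python
def data_parallel_size(num_gpus: int, tensor_parallel_size: int,
                       pipeline_parallel_size: int) -> int:
    """Return the implied number of data-parallel replicas.

    Assumption (not stated on this page): GPUs are split as
    num_gpus = tensor_parallel * pipeline_parallel * data_parallel.
    """
    model_parallel = tensor_parallel_size * pipeline_parallel_size
    if model_parallel < 1 or num_gpus < 1:
        raise ValueError("GPU count and parallel sizes must be positive")
    if num_gpus % model_parallel != 0:
        raise ValueError(
            "num_gpus must be divisible by tensor_parallel_size * pipeline_parallel_size"
        )
    return num_gpus // model_parallel


# LoRA row above: 1 GPU, TP=1, PP=1
lora_replicas = data_parallel_size(1, 1, 1)
# Full SFT row above: 4 GPUs, TP=2, PP=1
sft_replicas = data_parallel_size(4, 2, 1)
print(lora_replicas, sft_replicas)
```

Under this assumption, the Full SFT row would correspond to two data-parallel replicas, each spanning two GPUs.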
Deployment Configuration
- LoRA:
  - NIM Image: nvcr.io/nim/nvidia/llm-nim:1.15.5
  - GPU Count: 1x 80GB
- Full SFT:
  - NIM Image: nvcr.io/nim/nvidia/llm-nim:1.15.5
  - GPU Count: 1x 80GB
  - Additional Environment Variables:
    - NIM_MODEL_PROFILE: vllm
NVIDIA Nemotron Nano 9B v2
Training Options
- LoRA: 4x 80GB GPU, tensor parallel size 1, pipeline parallel size 1
- Full SFT: 4x 80GB GPU, tensor parallel size 2, pipeline parallel size 1
Deployment Configuration
- LoRA:
  - NIM Image: nvcr.io/nim/nvidia/llm-nim:1.15.5
  - GPU Count: 1x 80GB
- Full SFT:
  - NIM Image: nvcr.io/nim/nvidia/llm-nim:1.15.5
  - GPU Count: 1x 80GB
  - Additional Environment Variables:
    - NIM_MODEL_PROFILE: vllm
NVIDIA Nemotron 3 Nano 30B A3B
Training Options
- LoRA: 2x 80GB GPU, tensor parallel size 1, expert parallel size 2, pipeline parallel size 1
- Full SFT: 8x 80GB GPU, tensor parallel size 1, expert parallel size 8, pipeline parallel size 1
MoE Parallelism Constraints
MoE models support only expert parallelism for distributing experts across GPUs. When expert_parallel_size > 1, tensor_parallel_size must be set to 1. Additionally, expert_parallel_size must evenly divide the number of GPUs. These constraints apply to training parallelism only; NIM deployment may use different GPU counts optimized for inference.
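The constraints above can be checked before submitting a training job. The following is a minimal sketch of such a check; the function name `validate_moe_parallelism` and its argument names are hypothetical and are not part of the NeMo Customizer API.

```python
def validate_moe_parallelism(num_gpus: int, tensor_parallel_size: int,
                             expert_parallel_size: int) -> list[str]:
    """Return a list of constraint violations; an empty list means the config passes.

    Hypothetical helper that encodes only the two rules stated above.
    """
    errors = []
    if num_gpus < 1 or tensor_parallel_size < 1 or expert_parallel_size < 1:
        errors.append("GPU count and parallel sizes must be positive integers")
        return errors
    # Rule 1: expert parallelism cannot be combined with tensor parallelism.
    if expert_parallel_size > 1 and tensor_parallel_size != 1:
        errors.append("tensor_parallel_size must be 1 when expert_parallel_size > 1")
    # Rule 2: experts must spread evenly over the available GPUs.
    if num_gpus % expert_parallel_size != 0:
        errors.append("expert_parallel_size must evenly divide the number of GPUs")
    return errors


# The two training configurations listed for this model pass both rules.
assert validate_moe_parallelism(2, 1, 2) == []   # LoRA
assert validate_moe_parallelism(8, 1, 8) == []   # Full SFT

# A config that mixes tensor and expert parallelism is rejected.
print(validate_moe_parallelism(8, 2, 8))
```

Returning a list of messages rather than raising on the first failure lets a caller report every violated rule at once.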
Deployment Configuration
- Full SFT:
  - NIM Image: nvcr.io/nim/nvidia/nemotron-3-nano:1.7.0-variant
  - GPU Count: 2x 80GB
Deployment for LoRA using NIM is not supported for this model.
NVIDIA Nemotron 3 Super 120B A12B
Training Options
- LoRA: 8x 80GB GPU, tensor parallel size 1, expert parallel size 8, pipeline parallel size 1
MoE Parallelism Constraints
MoE models support only expert parallelism for distributing experts across GPUs. When expert_parallel_size > 1, tensor_parallel_size must be set to 1. Additionally, expert_parallel_size must evenly divide the number of GPUs. These constraints apply to training parallelism only; NIM deployment may use different GPU counts optimized for inference.
Deployment Configuration
- LoRA:
  - NIM Image: nvcr.io/nim/nvidia/nemotron-3-super-120b-a12b:1.8.1-variant
  - GPU Count: 8x 80GB
  - Additional Environment Variables:
    - NIM_WORKSPACE: /model-store
    - NIM_PIPELINE_PARALLEL_SIZE: 8
    - NIM_MAX_MODEL_LEN: 4096