# Llama Nemotron Models
This page provides detailed technical specifications for the Nemotron model family supported by NeMo Customizer. For information about supported features and capabilities, refer to Tested Models.
## Llama 3.1 Nemotron Nano 8B v1
| Property | Value |
|---|---|
| Creator | NVIDIA |
| Architecture | transformer |
| Description | Llama 3.1 Nemotron Nano 8B v1 is a compact, instruction-tuned model for efficient customization and deployment. |
| Max I/O Tokens | 4096 |
| Parameters | 8 billion |
| Training Data | Not specified |
| Recommended GPUs for Customization | 1 (LoRA), 8 (All Weights) |
| Default Name | `nvidia/nemotron-nano-llama-3.1-8b@1.0` |
| Version | 1.0 |
| NIM | |
### Training Options
- LoRA: 1 GPU, tensor parallel size 1
- All Weights: 8 GPUs, tensor parallel size 4
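
These training options correspond to the hyperparameters of a customization job. The sketch below creates a LoRA job for this model through the NeMo Customizer REST API; the base URL, dataset reference, and hyperparameter values are illustrative placeholders (the config identifier is the Default Name from the table above), so check the API reference for your deployment's exact schema.

```python
# Minimal sketch: create a LoRA customization job for Llama 3.1 Nemotron Nano 8B.
# CUSTOMIZER_URL, the dataset reference, and the hyperparameter values are
# placeholders; adjust them for your environment.
import requests

CUSTOMIZER_URL = "http://nemo.example.com"  # placeholder base URL

job = {
    "config": "nvidia/nemotron-nano-llama-3.1-8b@1.0",
    "dataset": {"namespace": "default", "name": "my-sft-dataset"},  # placeholder
    "hyperparameters": {
        "training_type": "sft",
        "finetuning_type": "lora",  # runs on 1 GPU with tensor parallel size 1
        "epochs": 2,
        "batch_size": 8,
        "learning_rate": 1e-4,
    },
}

resp = requests.post(f"{CUSTOMIZER_URL}/v1/customization/jobs", json=job, timeout=60)
resp.raise_for_status()
print(resp.json())  # the response includes the job ID to poll for status
```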
## Llama 3.3 Nemotron Super 49B v1
| Property | Value |
|---|---|
| Creator | NVIDIA |
| Architecture | transformer |
| Description | Llama 3.3 Nemotron Super 49B v1 is a large, instruction-tuned model for advanced dialogue and reasoning tasks. |
| Max I/O Tokens | 4096 |
| Parameters | 49 billion |
| Training Data | Not specified |
| Recommended GPUs for Customization | 4 (LoRA) |
| Default Name | `nvidia/nemotron-super-llama-3.3-49b@1.0` |
| Version | 1.0 |
| NIM | |
### Training Options
- LoRA: 4 GPUs, tensor parallel size 4
## Llama 3.3 Nemotron Super 49B v1.5
| Property | Value |
|---|---|
| Creator | NVIDIA |
| Architecture | transformer |
| Description | Llama 3.3 Nemotron Super 49B v1.5 is a large, instruction-tuned model for advanced dialogue and reasoning tasks. |
| Max I/O Tokens | 4096 |
| Parameters | 49 billion |
| Training Data | Not specified |
| Recommended GPUs for Customization | 4 (LoRA) |
| Default Name | `nvidia/nemotron-super-llama-3.3-49b@1.5` |
| Version | 1.5 |
| NIM | |
### Training Options
- LoRA: 4 GPUs, tensor parallel size 4
## NVIDIA Nemotron 3 Nano 30B A3B
| Property | Value |
|---|---|
| Creator | NVIDIA |
| Architecture | Hybrid Mixture of Experts (MoE): Mamba-2 + Transformer |
| Description | Nemotron-3-Nano-30B-A3B-BF16 is a large language model (LLM) trained from scratch by NVIDIA and designed as a unified model for both reasoning and non-reasoning tasks. It uses configurable reasoning via the chat template. |
| Max I/O Tokens | 2048 |
| Parameters | 30B total (3.5B active) |
| MoE Configuration | 128 experts + 1 shared expert, 6 experts activated per token |
| Supported Languages | English, German, Spanish, French, Italian, Japanese |
| Recommended GPUs for Customization | 1 (LoRA), 8 (All Weights) |
| Default Name | `nvidia/nemotron3-nano-30b@v3` |
| HF Model URI | `hf://nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16` |
| NIM | Only supported with All Weights |
### Training Options
- LoRA: 1 GPU, tensor parallel size 1, expert model parallel size 1, pipeline parallel size 1
- All Weights: 8 GPUs, tensor parallel size 1, pipeline parallel size 1, expert model parallel size 1
### Known Limitations
> **Warning: Full SFT (All Weights) Training Limitation**
>
> For All Weights fine-tuning, the batch size is limited to:
>
> `batch_size = data_parallel_size × micro_batch_size`
>
> If `data_parallel_size` is not explicitly set, it is automatically calculated as:
>
> `data_parallel_size = total_gpus / (tensor_parallel_size × pipeline_parallel_size)`
>
> For example, with the default configuration (8 GPUs, `tensor_parallel_size=1`, `pipeline_parallel_size=1`, `micro_batch_size=1`), the calculated `data_parallel_size` is 8, so set the batch size to 8.
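
To make the constraint concrete, the following sketch reproduces the arithmetic above for the default 8-GPU All Weights configuration:

```python
# Reproduces the batch-size constraint from the warning above for the
# default All Weights configuration (8 GPUs, TP=1, PP=1, micro batch size 1).
total_gpus = 8
tensor_parallel_size = 1
pipeline_parallel_size = 1
micro_batch_size = 1

# Derived automatically when data_parallel_size is not set explicitly.
data_parallel_size = total_gpus // (tensor_parallel_size * pipeline_parallel_size)

# The batch size the job must use under this constraint.
batch_size = data_parallel_size * micro_batch_size

assert (data_parallel_size, batch_size) == (8, 8)
```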
> **Warning: NIM Deployment on A100 GPUs**
>
> When deploying the fine-tuned model using the Deployment Management Service, the NIM by default pulls a profile compatible with FP8 precision (`precision=fp8`). FP8 requires GPU compute capability 89 or higher (such as H100).
>
> A100 GPUs have compute capability 80 and do not support FP8, resulting in the following error:
>
> ```
> Value error, The quantization method modelopt is not supported for the current GPU.
> Minimum capability: 89. Current capability: 80.
> ```
>
> When deploying to A100 GPUs, add the following environment variable to select the BF16 precision profile:
>
> ```json
> "additional_envs": {
>   "NIM_TAGS_SELECTOR": "precision=bf16"
> }
> ```
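
For context, a deployment request that carries this override might look like the sketch below. The endpoint path, deployment name, and surrounding fields are assumptions for illustration; only the `additional_envs` block with `NIM_TAGS_SELECTOR` comes from the warning above. Consult the Deployment Management Service API reference for the actual schema.

```python
# Hypothetical sketch: create a model deployment with the BF16 profile override.
# The endpoint path and every field except "additional_envs" are illustrative
# assumptions, not the documented API.
import requests

DEPLOYMENT_URL = "http://nemo.example.com"  # placeholder base URL

deployment = {
    "name": "nemotron3-nano-30b-bf16",  # hypothetical deployment name
    "namespace": "default",
    "config": {
        "model": "nvidia/nemotron3-nano-30b",  # hypothetical model reference
        "additional_envs": {
            # Select the BF16 profile so the NIM runs on A100 (compute capability 80)
            "NIM_TAGS_SELECTOR": "precision=bf16"
        },
    },
}

resp = requests.post(
    f"{DEPLOYMENT_URL}/v1/deployment/model-deployments",  # illustrative path
    json=deployment,
    timeout=60,
)
resp.raise_for_status()
```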
### Helm Configuration
To enable this model for fine-tuning, add the following to your Helm overrides file:
```yaml
customizer:
  customizationTargets:
    hfTargetDownload:
      # -- set this to true to allow model downloads from Hugging Face
      enabled: true
      # -- List of allowed organizations for model downloads from Hugging Face
      allowedHfOrgs:
        - "nvidia"
      trustedModelURIs:
        - hf://nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
    overrideExistingTargets: true
    targets:
      nvidia/nemotron3-nano-30b@v3:
        name: nemotron3-nano-30b@v3
        namespace: nvidia
        enabled: true
        model_uri: hf://nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
        hf_endpoint: https://huggingface.co
        model_path: nemotron_nano_30b
        base_model: nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
        num_parameters: 30000000000
        precision: bf16-mixed
  customizationConfigTemplates:
    overrideExistingTemplates: true
    templates:
      nvidia/nemotron3-nano-30b@v3.0.0+80GB:
        name: nemotron3-nano-30b@v3.0.0+80GB
        namespace: nvidia
        target: nvidia/nemotron3-nano-30b@v3
        training_options:
          - training_type: sft
            finetuning_type: lora
            num_gpus: 1
            num_nodes: 1
            tensor_parallel_size: 1
            micro_batch_size: 1
            expert_model_parallel_size: 1
            pipeline_parallel_size: 1
          - training_type: sft
            finetuning_type: all_weights
            num_gpus: 8
            num_nodes: 1
            tensor_parallel_size: 1
            micro_batch_size: 1
            expert_model_parallel_size: 1
            pipeline_parallel_size: 1
        max_seq_length: 2048
        prompt_template: "{prompt} {completion}"
```