Nemotron-3-Super-120B-A12B

Nemotron-3-Super-120B-A12B is the default LLM for the NVIDIA RAG Blueprint. It is trained by NVIDIA and designed to deliver strong agentic, reasoning, and conversational capabilities. It is optimized for collaborative agents and high-volume workloads such as IT ticket automation.

We recommend running the model in low-effort reasoning mode with a reasoning budget of 256, which balances accuracy and performance. Switch to non-reasoning mode for maximum performance, or to full reasoning mode for the best accuracy.

Hardware requirements

For Docker and Kubernetes deployment, see the following:

For self-hosted local NIM deployment with nemotron-3-super-120b-a12b, you need one of the following:

  • 3 x H100

  • 3 x B200

  • 3 x RTX PRO 6000

For Helm deployment, you need one of the following:

  • 9 x H100-80GB

  • 9 x B200

  • 9 x RTX PRO 6000
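
Before deploying, it can help to confirm that the host exposes enough GPUs. The following is a minimal sketch, not part of the blueprint: `check_gpu_count` is a hypothetical helper that counts the devices reported by `nvidia-smi` and compares the result with the required number from the lists above. It checks only the count, not the GPU model or memory size.

```shell
# Hypothetical helper: compare the number of visible GPUs with a required count.
# Usage: check_gpu_count REQUIRED [ACTUAL]
# If ACTUAL is omitted, the GPUs reported by nvidia-smi are counted.
check_gpu_count() {
  cgc_required="$1"
  cgc_actual="${2:-}"
  if [ -z "$cgc_actual" ]; then
    # nvidia-smi prints one line per GPU; wc -l turns that into a device count.
    cgc_actual="$(nvidia-smi --query-gpu=name --format=csv,noheader | wc -l | tr -d ' ')"
  fi
  if [ "$cgc_actual" -ge "$cgc_required" ]; then
    echo "OK: found $cgc_actual GPU(s), need $cgc_required"
  else
    echo "Insufficient: found $cgc_actual GPU(s), need $cgc_required"
    return 1
  fi
}
```

For example, `check_gpu_count 3` matches the self-hosted local NIM requirement and `check_gpu_count 9` matches the Helm requirement. Passing a second argument skips the `nvidia-smi` call, which is useful when checking a node's known GPU count from another machine.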


RTX PRO 6000 Setup

Note: These steps are required only for RTX PRO 6000 Blackwell Server Edition with the TP2 profile. Skip them if you are using a TP4 or TP8 profile.

  1. Edit /etc/default/grub and set:

    GRUB_CMDLINE_LINUX_DEFAULT="quiet splash iommu=pt"
    
  2. Regenerate the GRUB configuration and reboot:

    sudo update-grub2
    sudo reboot
    

No additional configuration changes are needed in nims.yaml or values.yaml beyond the defaults.
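
After the reboot, you may want to confirm that the kernel actually booted with `iommu=pt`. The sketch below assumes a Linux host where `/proc/cmdline` holds the boot parameters of the running kernel; `has_kernel_param` is a hypothetical helper name, not something the blueprint ships.

```shell
# Hypothetical helper: succeed if a kernel parameter is on the boot command line.
# Usage: has_kernel_param PARAM [CMDLINE_FILE]
# CMDLINE_FILE defaults to /proc/cmdline (parameters of the running kernel).
has_kernel_param() {
  hkp_param="$1"
  hkp_file="${2:-/proc/cmdline}"
  # Put each space-separated token on its own line, then require an exact,
  # whole-line match so that "iommu" alone does not match "iommu=pt".
  tr ' ' '\n' < "$hkp_file" | grep -qxF -- "$hkp_param"
}

if has_kernel_param "iommu=pt"; then
  echo "iommu=pt is active"
else
  echo "iommu=pt is not active; recheck /etc/default/grub and rerun update-grub2"
fi
```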


Reasoning and non-reasoning mode

To disable reasoning mode:

    export LLM_ENABLE_THINKING=false
    export LLM_REASONING_BUDGET=0

For other options, such as a full reasoning budget, see Enable reasoning for Nemotron 3 models.
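
The two variables used to disable reasoning can be kept in sync with a small wrapper. This is a sketch under a stated assumption: this page shows only the values that disable reasoning, so treating `LLM_ENABLE_THINKING=true` with a positive `LLM_REASONING_BUDGET` as the way to enable it is a guess that mirrors those commands. `set_reasoning_budget` is a hypothetical helper name.

```shell
# Hypothetical helper: export both reasoning variables together.
# Usage: set_reasoning_budget BUDGET
#   BUDGET = 0  disables reasoning, matching the commands shown on this page.
#   BUDGET > 0  ASSUMES that LLM_ENABLE_THINKING=true enables reasoning;
#               this page does not confirm that value.
set_reasoning_budget() {
  case "$1" in
    ''|*[!0-9]*)
      echo "budget must be a non-negative integer" >&2
      return 1
      ;;
  esac
  if [ "$1" -eq 0 ]; then
    export LLM_ENABLE_THINKING=false
  else
    export LLM_ENABLE_THINKING=true
  fi
  export LLM_REASONING_BUDGET="$1"
}
```

Calling `set_reasoning_budget 0` reproduces the disable commands. Calling `set_reasoning_budget 256` would request the recommended budget under the assumption above; low-effort mode may need further settings, which the linked page covers.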