Deployment Guides#

Deployment guides, fine-tuning recipes, and agentic usage examples for Nemotron models. Each card links to its directory in the Nemotron GitHub repository.

Nemotron 3 Ultra

Cookbooks for the 550B/55B-active hybrid Mamba-Transformer MoE model, including vLLM, SGLang, TensorRT-LLM, LoRA Text2SQL, RL, and agent harness setup.

Notebook Local GPU Fine-tuning Jun 4, 2026

https://github.com/NVIDIA-NeMo/nemotron/tree/main/usage-cookbook/Nemotron-3-Ultra

Nemotron 3 Ultra on DGX Spark

Deploy across a 4x DGX Spark cluster with vLLM, including tensor parallelism, RoCE networking, MTP speculative decoding, and NVIDIA AIPerf benchmarking.

Local GPU Jun 22, 2026

https://github.com/NVIDIA-NeMo/nemotron/tree/main/usage-cookbook/Nemotron-3-Ultra/SparkDeploymentGuide

Nemotron 3 Ultra on DGX Station

Deploy on a single GB300-based DGX Station with vLLM, using coherent CPU memory for selective MoE expert offloading and a FlashInfer TensorRT-LLM NVFP4 backend.

Local GPU Jul 6, 2026

https://github.com/NVIDIA-NeMo/nemotron/tree/main/usage-cookbook/Nemotron-3-Ultra/StationDeploymentGuide

Nemotron 3 Ultra on Agentic Coding

Use Nemotron 3 Ultra with OpenCode, OpenClaw, Kilo Code CLI, OpenHands, Hermes Agent, and Pi via config-based setup.

Beginner Jun 4, 2026

https://github.com/NVIDIA-NeMo/nemotron/tree/main/usage-cookbook/Nemotron-3-Ultra/OpenScaffoldingResources

Nemotron 3 Super

Notebooks for deploying the 120B/12B-active hybrid Mamba-Transformer MoE model with vLLM, SGLang, and TensorRT-LLM.

Notebook Local GPU Apr 28, 2026

https://github.com/NVIDIA-NeMo/nemotron/tree/main/usage-cookbook/Nemotron-3-Super

Nemotron 3 Super — LoRA Text2SQL

Supervised fine-tuning with LoRA for Text2SQL using the BIRD SQL benchmark. Includes recipes for both NeMo AutoModel and Megatron Bridge.

Local GPU Fine-tuning Apr 28, 2026

https://github.com/NVIDIA-NeMo/nemotron/tree/main/usage-cookbook/Nemotron-3-Super/lora-text2sql

Nemotron 3 Super on DGX Spark

Deploy on a single DGX Spark with 128 GB unified memory using vLLM (nightly) and TensorRT-LLM, including NVFP4 quantization and MTP speculative decoding.

Local GPU Apr 10, 2026

https://github.com/NVIDIA-NeMo/nemotron/tree/main/usage-cookbook/Nemotron-3-Super/SparkDeploymentGuide

Nemotron 3 Ultra Base

550B total / 55B active parameter base model checkpoint announced at GTC 2026. A starting point for custom fine-tuning and RL post-training pipelines — not yet instruction-tuned.

Local GPU Fine-tuning Mar 23, 2026

https://github.com/NVIDIA-NeMo/nemotron/tree/main/usage-cookbook/Nemotron-3-Ultra-Base

Nemotron 3 Super on GRPO/DAPO RL Training

Full-weight RL training from a base model using the GRPO/DAPO algorithm to reproduce emergent math reasoning. Requires 5× GB200 or 3× B200 nodes.

Local GPU Fine-tuning Mar 11, 2026

https://github.com/NVIDIA-NeMo/nemotron/tree/main/usage-cookbook/Nemotron-3-Super/grpo-dapo

Nemotron 3 Super on Agentic Coding

Use Nemotron 3 Super with OpenCode, OpenClaw, Kilo Code CLI, and OpenHands via OpenRouter and build.nvidia.com.

Beginner Mar 11, 2026

https://github.com/NVIDIA-NeMo/nemotron/tree/main/usage-cookbook/Nemotron-3-Super/OpenScaffoldingResources

Nemotron Nano 2 VL

Notebooks for the 12B multimodal model that unifies visual and textual understanding. Covers NIM inference via build.nvidia.com and local Hugging Face deployment.

Notebook Local GPU Oct 28, 2025

https://github.com/NVIDIA-NeMo/nemotron/tree/main/usage-cookbook/Nemotron-Nano2-VL

Nemotron Parse v1.1

Notebook for the document-parsing VLM that converts PDFs and unstructured documents into structured JSON, LaTeX, and Markdown. Available via NIM at build.nvidia.com.

Beginner Notebook Oct 28, 2025

https://github.com/NVIDIA-NeMo/nemotron/tree/main/usage-cookbook/Nemotron-Parse-v1.1