Deployment Guides
Deployment guides, fine-tuning recipes, and agentic usage examples for Nemotron models. Each card links to its directory in the Nemotron GitHub repository.
Notebooks for deploying the hybrid Mamba-Transformer MoE model (120B total / 12B active parameters) with vLLM, SGLang, and TensorRT-LLM.
Supervised fine-tuning with LoRA for Text2SQL using the BIRD SQL benchmark. Includes recipes for both NeMo AutoModel and Megatron Bridge.
Deploy on a single DGX Spark with 128 GB unified memory using vLLM (nightly) and TensorRT-LLM, including NVFP4 quantization and MTP speculative decoding.
Base model checkpoint with 550B total / 55B active parameters, announced at GTC 2026. A starting point for custom fine-tuning and RL post-training pipelines; it is not yet instruction-tuned.
Full-weight RL training from a base model using the GRPO/DAPO algorithm to reproduce emergent math reasoning. Requires 5× GB200 or 3× B200 nodes.
Use Nemotron 3 Super with OpenCode, OpenClaw, Kilo Code CLI, and OpenHands via OpenRouter and build.nvidia.com.
Notebooks for the 12B multimodal model that unifies visual and textual understanding. Covers NIM inference via build.nvidia.com and local Hugging Face deployment.
Notebook for the document-parsing VLM that converts PDFs and unstructured documents into structured JSON, LaTeX, and Markdown. Available via NIM at build.nvidia.com.