NVIDIA NIM for Large Language Models
 - Introduction
 - Release Notes
 - Getting Started
 - Deployment Guide
 - Air Gap Deployment
 - Multi-Node Deployment
 - Deploying with Helm
 - Tutorials
 - Configuring a NIM
 - Model Profiles
 - Overview
 - Benchmarking
 - Models
 - Supported Models
 - GPUs
 - Optimized Models
 - Bielik 11B v2.3 Instruct
 - Code Llama 13B Instruct
 - Code Llama 34B Instruct
 - Code Llama 70B Instruct
 - DeepSeek R1
 - DeepSeek R1 Distill Llama 8B
 - DeepSeek R1 Distill Llama 70B
 - DeepSeek R1 Distill Llama 8B RTX
 - DeepSeek R1 Distill Qwen 32B
 - DeepSeek R1 Distill Qwen 7B
 - DeepSeek R1 Distill Qwen 14B
 - EuroLLM 9B Instruct
 - Qwen2.5 Coder 32B Instruct
 - Qwen2.5 72B Instruct
 - Qwen2.5 7B Instruct
 - Gemma 2 2B
 - Gemma 2 9B
 - Gemma2 9B CPT Sahabat-AI v1 Instruct
 - Granite 3.3 8B Instruct
 - (Meta) Llama 2 7B Chat
 - (Meta) Llama 2 13B Chat
 - (Meta) Llama 2 70B Chat
 - Llama 3 SQLCoder 8B
 - Llama 3 Swallow 70B Instruct V0.1
 - Llama 3 Taiwan 70B Instruct
 - Llama 3.1 8B Base
 - Llama 3.1 8B Instruct
 - Llama 3.1 8B Instruct RTX
 - Llama 3.1 Nemotron Nano 4B V1.1
 - Llama 3.1 Nemotron Nano 8B V1
 - Llama 3.1 Nemotron Ultra 253B V1
 - Llama 3.2 1B Instruct
 - Llama 3.2 3B Instruct
 - Llama 3.1 70B Instruct
 - Llama 3.1 405B Instruct
 - Llama 3.1 Nemotron 70B Instruct
 - Llama 3.1 Swallow 8B Instruct v0.1
 - Llama 3.1 Swallow 70B Instruct v0.1
 - Llama 3.1 Typhoon 2 8B Instruct
 - Llama 3.3 70B Instruct
 - Meta Llama 3 8B Instruct
 - Llama 3.3 Nemotron Super 49B V1
 - Meta Llama 3 70B Instruct
 - Mistral 7B Instruct V0.3
 - Mistral NeMo Minitron 8B 8K Instruct
 - Mistral NeMo 12B Instruct RTX
 - Mistral NeMo 12B Instruct
 - Mistral Small 24B Instruct 2501
 - Mixtral 8x7B Instruct V0.1
 - Mixtral 8x22B Instruct V0.1
 - Nemotron 4 340B Instruct
 - Nemotron 4 340B Reward
 - Phi 3 Mini 4K Instruct
 - Phind Codellama 34B V2 Instruct
 - Riva Translate 4B Instruct
 - Sarvam-M
 - Supported TRT-LLM Buildable Profiles
 - SILMA 9B Instruct v1.0
 - StarCoder2 7B
 - StarCoderBase 15.5B
 - Examples with system role
 - API Reference
 - Function Calling
 - Using Reward Models
 - Using Reasoning Models
 - Llama Stack API (Experimental)
 - Utilities
 - Fine-tuned model support
 - Observability
 - Structured Generation
 - Custom Guided Decoding Backend (Experimental)
 - Parameter-Efficient Fine-Tuning
 - KV Cache Reuse (a.k.a. prefix caching)
 - Acknowledgements
 - EULA