NVIDIA NIM for Large Language Models
- Introduction
- Release Notes
- Getting Started
- Deployment Guide
- Air Gap Deployment
- Multi-Node Deployment
- Deploying with Helm
- Tutorials
- Configuring a NIM
- Model Profiles
  - Overview
  - Benchmarking
- Models
  - Supported Models
    - GPUs
  - Optimized Models
    - Code Llama 13B Instruct
    - Code Llama 34B Instruct
    - Code Llama 70B Instruct
    - DeepSeek R1
    - Gemma 2 2B
    - Gemma 2 9B
    - (Meta) Llama 2 7B Chat
    - (Meta) Llama 2 13B Chat
    - (Meta) Llama 2 70B Chat
    - Llama 3 SQLCoder 8B
    - Llama 3 Swallow 70B Instruct V0.1
    - Llama 3 Taiwan 70B Instruct
    - Llama 3.1 8B Base
    - Llama 3.1 8B Instruct
    - Llama 3.1 70B Instruct
    - Llama 3.1 405B Instruct
    - Llama 3.1 Nemotron 70B Instruct
    - Llama 3.1 Swallow 8B Instruct v0.1
    - Llama 3.1 Swallow 70B Instruct v0.1
    - Llama 3.3 70B Instruct
    - Meta Llama 3 8B Instruct
    - Meta Llama 3 70B Instruct
    - Mistral 7B Instruct V0.3
    - Mistral NeMo 12B Instruct RTX
    - Mistral NeMo Minitron 8B 8K Instruct
    - Mistral NeMo 12B Instruct
    - Mixtral 8x7B Instruct V0.1
    - Mixtral 8x22B Instruct V0.1
    - Nemotron 4 340B Instruct
    - Nemotron 4 340B Instruct 128K
    - Nemotron 4 340B Reward
    - Phi 3 Mini 4K Instruct
    - Phind Codellama 34B V2 Instruct
    - StarCoderBase 15.5B
- Examples with system role
- API Reference
- Function Calling
- Using Reward Models
- Llama Stack API (Experimental)
- Utilities
- Fine-tuned model support
- Observability
- Structured Generation
- Custom Guided Decoding Backend (Experimental)
- Parameter-Efficient Fine-Tuning
- KV Cache Reuse (a.k.a. prefix caching)
- Acknowledgements
- EULA