NVIDIA NIM for Large Language Models
- Introduction
- Release Notes
- Getting Started
- Deployment Guide
  - Air Gap Deployment
  - Multi-Node Deployment
  - Deploying with Helm
- Tutorials
- Configuring a NIM
- Model Profiles
- Overview
- Benchmarking
- Models
  - Supported Models
    - GPUs
  - Optimized Models
    - Code Llama 13B Instruct
    - Code Llama 34B Instruct
    - Code Llama 70B Instruct
    - DeepSeek R1
    - DeepSeek R1 Distill Llama 8B
    - DeepSeek R1 Distill Llama 70B
    - DeepSeek R1 Distill Llama 8B RTX
    - DeepSeek R1 Distill Qwen 7B
    - DeepSeek R1 Distill Qwen 14B
    - DeepSeek R1 Distill Qwen 32B
    - Qwen2.5 72B Instruct
    - Qwen2.5 7B Instruct
    - Gemma 2 2B
    - Gemma 2 9B
    - (Meta) Llama 2 7B Chat
    - (Meta) Llama 2 13B Chat
    - (Meta) Llama 2 70B Chat
    - Llama 3 SQLCoder 8B
    - Llama 3 Swallow 70B Instruct V0.1
    - Llama 3 Taiwan 70B Instruct
    - Llama 3.1 8B Base
    - Llama 3.1 8B Instruct
    - Llama 3.1 8B Instruct RTX
    - Llama 3.1 Nemotron Nano 8B V1
    - Llama 3.2 1B Instruct
    - Llama 3.2 3B Instruct
    - Llama 3.1 70B Instruct
    - Llama 3.1 405B Instruct
    - Llama 3.1 Nemotron 70B Instruct
    - Llama 3.1 Swallow 8B Instruct V0.1
    - Llama 3.1 Swallow 70B Instruct V0.1
    - Llama 3.3 70B Instruct
    - Meta Llama 3 8B Instruct
    - Llama 3.3 Nemotron Super 49B V1
    - Meta Llama 3 70B Instruct
    - Mistral 7B Instruct V0.3
    - Mistral NeMo Minitron 8B 8K Instruct
    - Mistral NeMo 12B Instruct RTX
    - Mistral NeMo 12B Instruct
    - Mixtral 8x7B Instruct V0.1
    - Mixtral 8x22B Instruct V0.1
    - StarCoder2 7B
    - Nemotron 4 340B Instruct
    - Nemotron 4 340B Instruct 128K
    - Nemotron 4 340B Reward
    - Phi 3 Mini 4K Instruct
    - Phind CodeLlama 34B V2 Instruct
    - StarCoderBase 15.5B
  - Examples with system role
- API Reference
- Function Calling
- Using Reward Models
- Using Reasoning Models
- Llama Stack API (Experimental)
- Utilities
- Fine-tuned model support
- Observability
- Structured Generation
- Custom Guided Decoding Backend (Experimental)
- Parameter-Efficient Fine-Tuning
- KV Cache Reuse (a.k.a. prefix caching)
- Acknowledgements
- EULA