NVIDIA NIM for Large Language Models


Legal

This page contains the primary legal references for NVIDIA NIM for Large Language Models.

NVIDIA AI Product Agreement

By using this NIM, you acknowledge that you have read and agreed to the NVIDIA AI Product Agreement.

Open Source Software License Acknowledgements

NVIDIA NIM for Large Language Models (NIM LLM) is built on the work of open source communities whose ongoing innovation drives the LLM inference ecosystem forward. NVIDIA is grateful for the collaboration, contributions, and shared commitment to advancing AI infrastructure that these projects represent.

vLLM

vLLM is the inference engine at the core of NIM LLM. The project provides a high-throughput, memory-efficient serving engine for large language models, and NIM LLM packages vLLM directly as its inference backend.

NVIDIA recognizes the vLLM community for:

  • Foundational inference technology. vLLM introduced PagedAttention, a novel memory management algorithm for the KV cache, and has since implemented cutting-edge techniques such as continuous batching, automatic prefix caching, and chunked prefill in a production-grade, open source engine. These capabilities form the backbone of every NIM LLM deployment.

  • Sustained collaboration. NVIDIA engineers contribute upstream to vLLM, and NVIDIA is committed to continuing this investment. Improvements flow in both directions: NVIDIA contributes optimizations to the vLLM project, and community-driven innovations become available to NIM LLM users.

  • An open development model. The vLLM project maintains a transparent development process that welcomes contributors across organizations. This openness accelerates the pace of innovation for the entire ecosystem.

For vLLM documentation, refer to the vLLM project documentation. For details on how NIM LLM integrates with vLLM, refer to Architecture.
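Because NIM LLM packages vLLM as its backend and exposes an OpenAI-compatible HTTP API (see the API Reference), client code talks to the NIM endpoint rather than to vLLM directly. As a minimal sketch, assuming a hypothetical NIM container running locally on port 8000 and serving a model named "meta/llama-3.1-8b-instruct" (both values are illustrative, not taken from this page), a chat completion request can be prepared with only the Python standard library:

```python
import json
import urllib.request

# Hypothetical values for illustration: a NIM container listening locally
# on port 8000, serving one model. Adjust both for a real deployment.
NIM_BASE_URL = "http://localhost:8000/v1"
MODEL = "meta/llama-3.1-8b-instruct"


def build_chat_request(prompt: str, max_tokens: int = 64) -> urllib.request.Request:
    """Prepare an OpenAI-compatible chat completion request for a NIM endpoint."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{NIM_BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_chat_request("What is PagedAttention?")
    # The request is only constructed here; sending it requires a running NIM:
    #   response = urllib.request.urlopen(req)
    print(req.full_url)
```

The request body follows the OpenAI chat completions schema, which is what the NIM API Reference documents in full; sending it requires a deployed container from the Get Started section.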

Open Source Projects

NVIDIA NIM for Large Language Models includes open source software components. For the acknowledgements that apply to a specific container, refer to the NVIDIA OSS archive.


Copyright © 2024-2026, NVIDIA Corporation.

Last updated on Mar 25, 2026.