NVIDIA NIM for Large Language Models 2.0

  • Documentation Home

Related Software

NVIDIA NIM for LLMs fits into a broader inference and platform ecosystem. The following software products are highly relevant when you are deploying, operating, or extending LLM workloads.

Overview

  • vLLM: NVIDIA NIM for LLMs packages vLLM as its inference backend, so many request semantics and tuning concepts come directly from vLLM.

  • NIM Operator: the operator manages NIM deployments through Kubernetes custom resources and is especially useful for repeatable, production-scale rollouts.

  • NVIDIA NIM for Vision Language Models: follows the same NIM operational model as NIM for LLMs, but targets vision-language workloads rather than text-only LLM inference.
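
To make the NIM Operator's custom-resource model concrete, the fragment below sketches what a managed NIM service might look like. This is an illustrative sketch only: the `apiVersion`, field names, image repository, and tag are assumptions, not a verified schema; consult the NIM Operator documentation for the authoritative resource definitions.

```yaml
# Hypothetical sketch of a NIM service custom resource managed by the
# NIM Operator. Field names and versions are assumptions, not a verified schema.
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-nim
spec:
  image:
    repository: nvcr.io/nim/meta/llama-3.1-8b-instruct  # assumed image path
    tag: "2.0"
  replicas: 1
  resources:
    limits:
      nvidia.com/gpu: 1
```

Applying a resource like this lets the operator handle rollout, scaling, and lifecycle instead of hand-managing Deployments and Services.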

Usage Guidance

  • Use NVIDIA NIM for LLMs when your workload is text-only and you want a curated, enterprise-ready container for production inference.

  • Use vLLM documentation alongside NIM documentation when you need deeper backend-specific context for passthrough arguments or upstream model-serving behavior.

  • Use the NIM Operator when your primary deployment target is Kubernetes and you want lifecycle automation around NIM services.

  • Use NVIDIA NIM for Vision Language Models when your application must process images and text together instead of text alone.
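
Whichever deployment path you choose, NIM for LLMs serves an OpenAI-compatible API, so clients look the same across Kubernetes, CSP, and local deployments. The sketch below builds and sends a chat completion request; the base URL, port, and model name are assumptions for a hypothetical local deployment, not fixed values.

```python
# Minimal sketch of calling a NIM for LLMs endpoint through its
# OpenAI-compatible API. The base URL and model name below are assumptions
# for an example local deployment; substitute your own values.
import json
import urllib.request

def build_chat_request(model: str, prompt: str, max_tokens: int = 64) -> dict:
    """Build an OpenAI-style chat completion payload accepted by NIM."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def send_chat_request(base_url: str, payload: dict) -> dict:
    """POST the payload to the standard /v1/chat/completions route."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_chat_request("meta/llama-3.1-8b-instruct", "Hello!")
print(json.dumps(payload, indent=2))
# To actually call a running NIM:
# response = send_chat_request("http://localhost:8000", payload)
```

Because the route and payload shape follow the OpenAI API convention, existing OpenAI client libraries can usually be pointed at a NIM endpoint by changing only the base URL.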

Copyright © 2024-2026, NVIDIA Corporation.

Last updated on Mar 12, 2026.