NVIDIA NIM for Large Language Models

Related Software#

NVIDIA NIM for Large Language Models fits into a broader inference and platform ecosystem. The following products are commonly used alongside NIM LLM when you deploy, operate, or extend LLM workloads:

  • vLLM: NVIDIA NIM for Large Language Models packages vLLM as its inference backend, so many request semantics and tuning concepts come directly from vLLM.

  • NIM Operator: The operator manages NIM deployments by using Kubernetes custom resources and is especially useful for repeatable, production-scale rollouts.

  • NVIDIA NIM for Vision Language Models: NIM for VLMs follows the same NIM operational model but targets vision-language workloads rather than text-only LLM inference.
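Because NIM LLM builds on vLLM and exposes an OpenAI-compatible API, a standard OpenAI client can issue requests against a running NIM endpoint. The following is a minimal sketch assuming a local deployment; the base URL, port, and model name are illustrative, so substitute the values for your deployment.

```python
# Minimal sketch: querying a locally deployed NIM LLM endpoint through its
# OpenAI-compatible API. Base URL, port, and model name are assumptions;
# adjust them to match your deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # NIM containers commonly serve on port 8000
    api_key="not-used",                   # local deployments typically ignore the key
)

# List the models the endpoint actually serves before picking one.
for model in client.models.list().data:
    print(model.id)

response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",   # illustrative model name
    messages=[{"role": "user", "content": "Summarize NVIDIA NIM in one sentence."}],
    max_tokens=128,
    temperature=0.2,
)
print(response.choices[0].message.content)
```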

Usage Guidance#

To choose the right tool for your use case, consider the following recommendations:

  • Use NIM LLM when your workload is text-only and you want a curated, enterprise-ready container for production inference.

  • Use vLLM documentation alongside NIM documentation when you need deeper, backend-specific context for passthrough arguments or upstream model-serving behavior.

  • Use the NIM Operator when your primary deployment target is Kubernetes and you want lifecycle automation around NIM services.

  • Use NVIDIA NIM for Vision Language Models when your application must process images and text together instead of text alone (see the multimodal sketch after this list).
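Because NIM for VLMs keeps the same OpenAI-compatible request style, the main difference from a text-only call is that message content can mix text and image parts. The following is a minimal sketch; the endpoint and model identifier are assumptions for illustration.

```python
# Minimal sketch: a multimodal request as a NIM for VLMs deployment would
# accept it. Endpoint and model name are hypothetical placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

response = client.chat.completions.create(
    model="nvidia/example-vlm",  # hypothetical model identifier
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
    max_tokens=128,
)
print(response.choices[0].message.content)
```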
