NIM Operator Quick Start

Follow the examples in the tiles to deploy NIM. NVIDIA strongly recommends caching models to reduce inference latency and speed up auto-scaling.
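As a rough sketch of the caching recommendation, a `NIMCache` resource tells the Operator to pre-download a model profile into a PVC before any service starts. The manifest below is a minimal illustration, assuming the `apps.nvidia.com/v1alpha1` API group; the model image, secret names, storage class, and size are placeholders for your environment — see the NIMCache reference for the fields your Operator version supports.

```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: meta-llama3-8b-instruct
spec:
  source:
    ngc:
      # Placeholder NIM container that pulls the model profile from NGC.
      modelPuller: nvcr.io/nim/meta/llama3-8b-instruct:1.0.3
      pullSecret: ngc-secret        # image pull secret (placeholder name)
      authSecret: ngc-api-secret    # NGC API key secret (placeholder name)
      model:
        engine: tensorrt_llm
        tensorParallelism: "1"
  storage:
    pvc:
      create: true
      storageClass: ""              # cluster default; adjust as needed
      size: "50Gi"
      volumeAccessMode: ReadWriteMany
```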

Non-LLM NIM

Domain-specific microservices, such as RAG, BioNeMo, and Riva.

LLM-specific NIM

NVIDIA-optimized NIM microservices for large language models.

Multi-LLM compatible NIM deployment

Deploy a broad range of LLMs, including models from Hugging Face.

Multi-node NIM deployment

Deploy large LLMs across several interconnected nodes.

NIM on KServe serverless deployment

Serverless deployment with scale-to-zero and canary rollouts.

NIM with DRA

Advanced GPU allocation and scheduling with Kubernetes Dynamic Resource Allocation (DRA).

NIM with LoRA

Enhance LLMs with domain-specific low-rank adaptation (LoRA) adapters.
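Whichever tile you follow, most deployments center on a `NIMService` resource that references a cache. The manifest below is a minimal sketch pairing with the `NIMCache` example above, again assuming the `apps.nvidia.com/v1alpha1` API group; the repository, tag, secret names, and port are placeholders — consult the NIMService reference for the exact schema of your Operator version.

```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: meta-llama3-8b-instruct
spec:
  image:
    repository: nvcr.io/nim/meta/llama3-8b-instruct  # placeholder NIM image
    tag: "1.0.3"
    pullPolicy: IfNotPresent
    pullSecrets:
      - ngc-secret                 # image pull secret (placeholder name)
  authSecret: ngc-api-secret       # NGC API key secret (placeholder name)
  storage:
    nimCache:
      name: meta-llama3-8b-instruct  # the NIMCache created earlier
      profile: ""                    # empty selects the cached profile
  replicas: 1
  resources:
    limits:
      nvidia.com/gpu: 1
  expose:
    service:
      type: ClusterIP
      port: 8000
```

Because the service mounts the pre-populated cache instead of downloading the model at startup, new replicas become ready faster, which is what makes cached deployments auto-scale more responsively.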