NIM Operator Quick Start
Follow the examples in the tiles to deploy NIM. NVIDIA highly recommends that you cache models for low inference latency and faster auto-scaling.
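For example, a NIMCache custom resource similar to the following pre-fetches a model into a persistent volume before any NIM pods start. This is a minimal sketch: the model puller image, secret names, engine settings, and storage size are illustrative placeholders and must match your NGC entitlement and cluster storage.

```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: meta-llama3-8b-instruct
spec:
  source:
    ngc:
      # Illustrative model and secrets; substitute your own.
      modelPuller: nvcr.io/nim/meta/llama3-8b-instruct:1.0.3
      pullSecret: ngc-secret
      authSecret: ngc-api-secret
      model:
        engine: tensorrt_llm
        tensorParallelism: "1"
  storage:
    pvc:
      create: true
      storageClass: ""        # default storage class; set explicitly if needed
      size: "50Gi"            # size the volume for the chosen model profile
      volumeAccessMode: ReadWriteMany
```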
Non-LLM NIM
Domain-specific NIM microservices, including RAG, BioNeMo, and Riva.
LLM-specific NIM
NVIDIA NIM microservices optimized for specific LLMs.
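A minimal NIMService sketch that serves a cached model might look like the following; the image tag, secret names, cache reference, and service settings are illustrative and should be adapted to your environment.

```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: meta-llama3-8b-instruct
spec:
  image:
    repository: nvcr.io/nim/meta/llama3-8b-instruct
    tag: "1.0.3"              # illustrative tag
    pullSecrets:
      - ngc-secret
  authSecret: ngc-api-secret
  storage:
    nimCache:
      name: meta-llama3-8b-instruct   # reuse the NIMCache created earlier
      profile: ""
  replicas: 1
  resources:
    limits:
      nvidia.com/gpu: 1
  expose:
    service:
      type: ClusterIP
      port: 8000
```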
Multi-LLM compatible NIM deployment
A broad range of LLMs, including models from Hugging Face.
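As a rough sketch, a multi-LLM deployment points the NIM container at a model by name. The container repository, tag, and environment variables below are assumptions to verify against the multi-LLM NIM documentation, and hf-token-secret is a hypothetical Secret holding a Hugging Face token.

```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-from-hf
spec:
  image:
    repository: nvcr.io/nim/nvidia/llm-nim   # assumed multi-LLM NIM container
    tag: "1.6.0"                             # illustrative tag
    pullSecrets:
      - ngc-secret
  authSecret: ngc-api-secret
  env:
    - name: NIM_MODEL_NAME                   # assumed: model for the multi-LLM NIM to serve
      value: hf://meta-llama/Llama-3.1-8B-Instruct
    - name: HF_TOKEN
      valueFrom:
        secretKeyRef:
          name: hf-token-secret              # hypothetical Secret with a Hugging Face token
          key: HF_TOKEN
  replicas: 1
  resources:
    limits:
      nvidia.com/gpu: 1
  expose:
    service:
      port: 8000
```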
Multi-node NIM deployment
Deploy large LLMs across several interconnected nodes.
NIM on KServe serverless deployment
Serverless deployment, scale-to-zero, and canary rollouts.
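Independent of the operator's own KServe integration, the serverless knobs this tile refers to map to standard KServe InferenceService fields, as in the illustrative sketch below; running the NIM container directly in the predictor is a simplification.

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llama3-8b-nim
spec:
  predictor:
    minReplicas: 0               # serverless mode: scale to zero when idle
    maxReplicas: 3
    canaryTrafficPercent: 10     # on an update, route 10% of traffic to the new revision
    containers:
      - name: kserve-container
        image: nvcr.io/nim/meta/llama3-8b-instruct:1.0.3   # illustrative image
        resources:
          limits:
            nvidia.com/gpu: 1
```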
NIM with DRA
Fine-grained GPU allocation and scheduling with Kubernetes Dynamic Resource Allocation (DRA).
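For orientation, a DRA-based GPU request is expressed through a ResourceClaimTemplate that a pod then claims. The DRA APIs are still evolving, so the API version and the gpu.nvidia.com device class below are assumptions that depend on your Kubernetes release and the NVIDIA DRA driver.

```yaml
apiVersion: resource.k8s.io/v1beta1     # assumed: DRA API version varies by Kubernetes release
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
spec:
  spec:
    devices:
      requests:
        - name: gpu
          deviceClassName: gpu.nvidia.com   # assumed: class registered by the NVIDIA DRA driver
---
# Pod-side wiring: declare the claim, then reference it from the container.
apiVersion: v1
kind: Pod
metadata:
  name: nim-dra-example
spec:
  restartPolicy: Never
  resourceClaims:
    - name: gpu
      resourceClaimTemplateName: single-gpu
  containers:
    - name: nim
      image: nvcr.io/nim/meta/llama3-8b-instruct:1.0.3   # illustrative image
      resources:
        claims:
          - name: gpu    # consume the GPU allocated to the claim
```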
NIM with LoRA
Enhance LLMs with domain-specific adapters.
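A sketch of a LoRA-enabled NIMService follows; the NIM_PEFT_SOURCE and NIM_PEFT_REFRESH_INTERVAL environment variables and the adapter path are assumptions to check against the NIM LoRA documentation.

```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama3-8b-lora
spec:
  image:
    repository: nvcr.io/nim/meta/llama3-8b-instruct
    tag: "1.0.3"                       # illustrative tag
    pullSecrets:
      - ngc-secret
  authSecret: ngc-api-secret
  env:
    - name: NIM_PEFT_SOURCE            # assumed: directory scanned for LoRA adapters
      value: /model-store/loras
    - name: NIM_PEFT_REFRESH_INTERVAL  # assumed: adapter re-scan interval, in seconds
      value: "3600"
  resources:
    limits:
      nvidia.com/gpu: 1
  expose:
    service:
      port: 8000
```

Clients then select an adapter by passing its name in the model field of an OpenAI-compatible request.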