About Deploying and Proxying NIM for LLMs

The NeMo microservices platform simplifies the deployment and management of NIM for LLMs and proxies them through a single NeMo platform host endpoint. In this section, you learn how to deploy NIM for LLMs to your Kubernetes cluster and how to proxy them.
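As a rough illustration of the deployment flow, the following is a minimal sketch that creates a model deployment through the platform's deployment management API. The host name, endpoint path, and payload fields here are assumptions for illustration only; refer to the deployment task guide linked below for the actual request schema.

```python
"""Minimal sketch: create a NIM for LLMs deployment via the NeMo platform.

The endpoint path and payload fields are assumptions for illustration;
see the deployment task guide for the actual schema.
"""
import requests

# Hypothetical NeMo platform host; replace with your cluster's endpoint.
NEMO_HOST = "http://nemo.test"

# Assumed deployment management path and request body.
resp = requests.post(
    f"{NEMO_HOST}/v1/deployment/model-deployments",
    json={
        "name": "llama-3.1-8b-instruct",  # deployment name (example)
        "namespace": "default",           # target namespace (example)
        "config": {
            # NIM model to deploy (example identifier)
            "model": "meta/llama-3.1-8b-instruct",
        },
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # deployment details returned by the API
```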


Task Guides

The following guides provide detailed information on how to deploy NIM for LLMs, proxy them through a single API, and run inference.

Deploy NIM for LLMs

Deploy NVIDIA NIM for large language models (LLMs) to your Kubernetes cluster.

Proxy Deployed NIM for LLMs and Run Inference

Proxy deployed NIM for LLMs and run inference on them; see the sketch after this list.
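Once a NIM is deployed and proxied, inference requests go through the single platform host endpoint. The sketch below sends a chat completion request to the proxy's OpenAI-compatible API; the host and model name are assumptions for illustration.

```python
"""Minimal sketch: run inference on a proxied NIM through the single
NeMo platform host endpoint. The host and model name are assumptions."""
import requests

NIM_PROXY_HOST = "http://nim.test"  # hypothetical proxy endpoint

# NIM for LLMs exposes an OpenAI-compatible chat completions API.
resp = requests.post(
    f"{NIM_PROXY_HOST}/v1/chat/completions",
    json={
        "model": "meta/llama-3.1-8b-instruct",  # deployed model (example)
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```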