About Deploying and Running Inference on NIM#

The NeMo microservices platform simplifies the deployment and management of NIM microservices and proxies them through a single NeMo platform host endpoint. In this section, you learn how to deploy NIM microservices to your Kubernetes cluster and proxy them through that endpoint.

Tutorials#

The following guides provide detailed information on how to deploy NIM microservices, proxy them through a single API, and run inference.

Deploy NIM

Deploy NIM microservices to your Kubernetes cluster.

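For orientation, the following is a minimal sketch of creating a NIM deployment through the platform's deployment management API. The base URL, endpoint path, and payload fields shown here are illustrative assumptions; confirm the exact request schema in the Deploy NIM tutorial.

```python
import os

import requests

# Placeholder platform host; replace with your NeMo platform base URL.
NEMO_BASE_URL = os.environ.get("NEMO_BASE_URL", "http://nemo.test")

# Assumed endpoint path and payload shape -- verify against the
# Deploy NIM tutorial before using in your environment.
response = requests.post(
    f"{NEMO_BASE_URL}/v1/deployment/model-deployments",
    json={
        "name": "llama-3.1-8b-instruct",  # example deployment name
        "namespace": "default",           # example deployment namespace
        "config": {
            "model": "meta/llama-3.1-8b-instruct",
            "nim_deployment": {
                "image_name": "nvcr.io/nim/meta/llama-3.1-8b-instruct",
                "image_tag": "1.8",   # example tag
                "gpu": 1,             # GPUs per replica
                "pvc_size": "25Gi",   # model cache volume size
            },
        },
    },
)
response.raise_for_status()
print(response.json())  # deployment metadata, including its current status
```

Deployment is asynchronous: the request returns while the platform pulls the NIM container image and starts it on the cluster, so check the deployment's status before sending inference traffic.
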
Proxy Deployed NIM and Run Inference

Proxy deployed NIM microservices and run inference on them.


Task Guides#

Perform common tasks for deploying NIM microservices and running inference on them.

Manage NIM Deployments

Manage NIM deployments and their configurations.

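As a sketch of what these management tasks look like against the same assumed API used in the deployment example above, the snippet below lists existing NIM deployments and deletes one. The endpoint paths and response shape are assumptions; verify them against the task guide.

```python
import os

import requests

NEMO_BASE_URL = os.environ.get("NEMO_BASE_URL", "http://nemo.test")

# List current NIM deployments (assumed endpoint path and
# assumed "data" list in the response body).
deployments = requests.get(
    f"{NEMO_BASE_URL}/v1/deployment/model-deployments"
).json()
for d in deployments.get("data", []):
    print(d["namespace"], d["name"])

# Delete a deployment by namespace and name (assumed path shape).
requests.delete(
    f"{NEMO_BASE_URL}/v1/deployment/model-deployments/default/llama-3.1-8b-instruct"
).raise_for_status()
```
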
Run Inference on NIM

Discover models deployed as NIM microservices and run inference on them through the single API endpoint of the NIM Proxy microservice.

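To make the proxy workflow concrete, here is a minimal sketch that discovers models through the NIM Proxy microservice and sends a chat completion request. It assumes the proxy exposes OpenAI-compatible `/v1/models` and `/v1/chat/completions` routes; the base URL and model name are placeholders.

```python
import os

import requests

# Placeholder base URL of the NIM Proxy microservice.
NIM_PROXY_BASE_URL = os.environ.get("NIM_PROXY_BASE_URL", "http://nim.test")

# Discover models served behind the proxy
# (assumed OpenAI-compatible route and response shape).
models = requests.get(f"{NIM_PROXY_BASE_URL}/v1/models").json()
print([m["id"] for m in models["data"]])

# Run a chat completion against one of the discovered models.
completion = requests.post(
    f"{NIM_PROXY_BASE_URL}/v1/chat/completions",
    json={
        "model": "meta/llama-3.1-8b-instruct",  # example model ID
        "messages": [{"role": "user", "content": "Write a haiku about GPUs."}],
        "max_tokens": 64,
    },
)
completion.raise_for_status()
print(completion.json()["choices"][0]["message"]["content"])
```

Because the proxy presents an OpenAI-compatible interface, existing OpenAI SDK clients can typically point their base URL at the proxy instead of constructing raw HTTP requests.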