Run Inference on NIM#

Use the NIM Proxy service to manage and route inference requests to your deployed NIM instances.

The NIM Proxy service provides APIs to configure routing rules, manage endpoints, and monitor the health of your NIM deployments. This allows you to efficiently distribute inference requests across multiple NIM instances.
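The task guides below exercise this HTTP API. The sketches in this section use Python with the requests library and assume a single, hypothetical base URL for the proxy; substitute the address of your own deployment:

```python
import os

# Hypothetical address used throughout the examples below; set NIM_PROXY_URL
# (or edit the default) to point at your NIM Proxy deployment.
NIM_PROXY_URL = os.environ.get("NIM_PROXY_URL", "http://nim-proxy.example.com:8000")
```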


Task Guides#

Perform common NIM Proxy tasks.

Health Check#

Check the health status of the NIM Proxy service.
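A minimal readiness probe, as a sketch. The `/v1/health/ready` path is an assumption here; confirm the exact health route in your deployment's API reference:

```python
import requests

# Hypothetical address; substitute your NIM Proxy deployment's endpoint.
NIM_PROXY_URL = "http://nim-proxy.example.com:8000"

# Assumed readiness path; health endpoints can differ between releases,
# so check the API reference for your version.
resp = requests.get(f"{NIM_PROXY_URL}/v1/health/ready", timeout=10)
print(resp.status_code, resp.text)
```

A 200 response indicates the proxy is ready to accept inference traffic.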
List Models#

View all available models that can be used for inference.
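Because the API is OpenAI-compatible, the model list is served from `GET /v1/models`. A sketch, assuming the hypothetical base URL from above:

```python
import requests

# Hypothetical address; substitute your NIM Proxy deployment's endpoint.
NIM_PROXY_URL = "http://nim-proxy.example.com:8000"

resp = requests.get(f"{NIM_PROXY_URL}/v1/models", timeout=10)
resp.raise_for_status()

# OpenAI-style responses wrap the model entries in a "data" array.
for model in resp.json().get("data", []):
    print(model["id"])
```

The printed IDs are the values to pass as the `model` field in the completion requests below.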
Chat Completions#

Generate chat completions using the OpenAI-compatible API.
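A sketch of a chat completion request against the OpenAI-compatible `POST /v1/chat/completions` route. The base URL and model ID are assumptions; use a model ID returned by List Models:

```python
import requests

# Hypothetical address; substitute your NIM Proxy deployment's endpoint.
NIM_PROXY_URL = "http://nim-proxy.example.com:8000"

payload = {
    # Example model ID; use one returned by the List Models endpoint.
    "model": "meta/llama-3.1-8b-instruct",
    "messages": [
        {"role": "user", "content": "Summarize what a NIM is in one sentence."},
    ],
    "max_tokens": 128,
}
resp = requests.post(f"{NIM_PROXY_URL}/v1/chat/completions", json=payload, timeout=60)
resp.raise_for_status()

# The assistant's reply lives in the first choice, per the OpenAI schema.
print(resp.json()["choices"][0]["message"]["content"])
```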
Completions#

Generate text completions using the OpenAI-compatible API.
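A sketch of a raw text completion against the OpenAI-compatible `POST /v1/completions` route, under the same assumptions as above (hypothetical base URL, example model ID):

```python
import requests

# Hypothetical address; substitute your NIM Proxy deployment's endpoint.
NIM_PROXY_URL = "http://nim-proxy.example.com:8000"

payload = {
    # Example model ID; use one returned by the List Models endpoint.
    "model": "meta/llama-3.1-8b-instruct",
    "prompt": "The NIM Proxy service routes inference requests by",
    "max_tokens": 64,
}
resp = requests.post(f"{NIM_PROXY_URL}/v1/completions", json=payload, timeout=60)
resp.raise_for_status()

# Raw text completions appear under choices[0].text in the OpenAI schema.
print(resp.json()["choices"][0]["text"])
```

Unlike the chat route, this endpoint takes a single `prompt` string and returns continuation text, which suits base models without a chat template.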

References#