Run Inference on NIM#
Use the NIM Proxy service to manage and route inference requests to your deployed NIM instances.
The NIM Proxy service provides APIs to configure routing rules, manage endpoints, and monitor the health of your NIM deployments, so you can distribute inference requests efficiently across multiple NIM instances.
Task Guides#
Perform common NIM Proxy tasks.
Health Check#
Check the health status of the NIM Proxy service.
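The following Python sketch shows one way to poll the service with the requests library. The base URL and the /v1/health/ready path are assumptions, not confirmed by this page; substitute the address and health route exposed by your deployment.

```python
import requests

# Assumed base URL for the NIM Proxy service; replace with your deployment's address.
NIM_PROXY_BASE_URL = "http://nim-proxy.example.com"

# Assumed readiness route; adjust if your deployment exposes a different health path.
response = requests.get(f"{NIM_PROXY_BASE_URL}/v1/health/ready", timeout=10)
response.raise_for_status()
print("NIM Proxy is healthy:", response.status_code)
```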
List Models#
View all available models that can be used for inference.
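As a minimal sketch, assuming the proxy follows the OpenAI-style GET /v1/models route and the same placeholder base URL as above, you can list the models like this:

```python
import requests

# Assumed base URL for the NIM Proxy service; replace with your deployment's address.
NIM_PROXY_BASE_URL = "http://nim-proxy.example.com"

# OpenAI-style model listing; the response wraps models in a "data" array.
response = requests.get(f"{NIM_PROXY_BASE_URL}/v1/models", timeout=10)
response.raise_for_status()

for model in response.json().get("data", []):
    print(model["id"])
```

Use one of the returned model IDs in the chat completions and completions requests below.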
Chat Completions#
Generate chat completions using the OpenAI-compatible API.
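The sketch below posts to the OpenAI-compatible /v1/chat/completions route. The base URL and model name are placeholders; use a model ID returned by the list-models call.

```python
import requests

# Placeholder values; substitute your proxy address and a model ID from /v1/models.
NIM_PROXY_BASE_URL = "http://nim-proxy.example.com"
MODEL_ID = "meta/llama-3.1-8b-instruct"

payload = {
    "model": MODEL_ID,
    "messages": [
        {"role": "user", "content": "Explain what the NIM Proxy service does in one sentence."}
    ],
    "max_tokens": 128,
}

response = requests.post(
    f"{NIM_PROXY_BASE_URL}/v1/chat/completions", json=payload, timeout=60
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```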
Completions#
Generate text completions using the OpenAI-compatible API.
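The text completions request follows the same pattern against the OpenAI-compatible /v1/completions route, taking a prompt string instead of a message list. The base URL and model name remain placeholders.

```python
import requests

# Placeholder values; substitute your proxy address and a model ID from /v1/models.
NIM_PROXY_BASE_URL = "http://nim-proxy.example.com"
MODEL_ID = "meta/llama-3.1-8b-instruct"

payload = {
    "model": MODEL_ID,
    "prompt": "Inference requests sent to the NIM Proxy are",
    "max_tokens": 64,
}

response = requests.post(
    f"{NIM_PROXY_BASE_URL}/v1/completions", json=payload, timeout=60
)
response.raise_for_status()
print(response.json()["choices"][0]["text"])
```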