Run Inference on NIM#
Use the NIM Proxy service to manage and route inference requests to your deployed NIM instances.
The NIM Proxy service provides APIs to configure routing rules, manage endpoints, and monitor the health of your NIM deployments. This allows you to efficiently distribute inference requests across multiple NIM instances.
Task Guides#
Perform common NIM Proxy tasks.
Tip
The task guides reference a NIM_PROXY_BASE_URL whose value depends on the ingress configuration of your cluster. If you are using the minikube demo installation, the value is http://nim.test. Otherwise, consult your cluster administrator for the correct ingress value.
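The Python examples below read this value from the environment. A minimal sketch, assuming you have exported NIM_PROXY_BASE_URL in your shell (the http://nim.test fallback applies only to the minikube demo installation):

```python
import os

# Read the ingress base URL for the NIM Proxy service from the environment.
# The http://nim.test fallback matches the minikube demo installation only;
# replace it with the ingress value from your cluster administrator.
NIM_PROXY_BASE_URL = os.environ.get("NIM_PROXY_BASE_URL", "http://nim.test")
```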
Check the health status of the NIM Proxy service.
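For example, a health probe might look like the following sketch. The /v1/health path is an assumption here; confirm the exact health endpoint in the API reference for your NIM Proxy version.

```python
import os

import requests

base_url = os.environ.get("NIM_PROXY_BASE_URL", "http://nim.test")

# NOTE: /v1/health is an assumed path; consult your NIM Proxy version's
# API reference for the exact health endpoint.
response = requests.get(f"{base_url}/v1/health", timeout=10)
response.raise_for_status()
print(response.json())
```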
View all available models that can be used for inference.
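A minimal sketch using the OpenAI-compatible /v1/models endpoint, which returns the model IDs you can pass to the inference endpoints below:

```python
import os

import requests

base_url = os.environ.get("NIM_PROXY_BASE_URL", "http://nim.test")

# List the models registered with the proxy (OpenAI-compatible endpoint).
response = requests.get(f"{base_url}/v1/models", timeout=10)
response.raise_for_status()

for model in response.json().get("data", []):
    print(model["id"])
```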
Generate chat completions using the OpenAI-compatible API.
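A sketch of a chat completion request against the OpenAI-compatible /v1/chat/completions endpoint. The model name below is a placeholder; substitute an ID returned by /v1/models for your deployment.

```python
import os

import requests

base_url = os.environ.get("NIM_PROXY_BASE_URL", "http://nim.test")

payload = {
    # Placeholder model name; use an ID returned by the /v1/models endpoint.
    "model": "meta/llama-3.1-8b-instruct",
    "messages": [
        {"role": "user", "content": "Write a haiku about GPUs."},
    ],
    "max_tokens": 128,
}

response = requests.post(
    f"{base_url}/v1/chat/completions", json=payload, timeout=60
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```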
Generate text completions using the OpenAI-compatible API.
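Text completions use the OpenAI-compatible /v1/completions endpoint and take a raw prompt instead of a message list. Again, the model name is a placeholder:

```python
import os

import requests

base_url = os.environ.get("NIM_PROXY_BASE_URL", "http://nim.test")

payload = {
    # Placeholder model name; use an ID returned by the /v1/models endpoint.
    "model": "meta/llama-3.1-8b-instruct",
    "prompt": "The three laws of thermodynamics are",
    "max_tokens": 128,
}

response = requests.post(f"{base_url}/v1/completions", json=payload, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["text"])
```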
Generate embeddings for text using the OpenAI-compatible API.
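A sketch against the OpenAI-compatible /v1/embeddings endpoint. The model name is a placeholder, and the input_type field is an assumption: some NVIDIA embedding NIMs require it while others reject it, so adjust it for the model you deploy.

```python
import os

import requests

base_url = os.environ.get("NIM_PROXY_BASE_URL", "http://nim.test")

payload = {
    # Placeholder embedding model name; use an ID returned by /v1/models.
    "model": "nvidia/nv-embedqa-e5-v5",
    "input": ["What is the capital of France?"],
    # Some NVIDIA embedding NIMs expect an input_type of "query" or
    # "passage"; remove this field if your model does not accept it.
    "input_type": "query",
}

response = requests.post(f"{base_url}/v1/embeddings", json=payload, timeout=60)
response.raise_for_status()

embedding = response.json()["data"][0]["embedding"]
print(f"Embedding dimension: {len(embedding)}")
```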