Run Inference on NIM#

Use the NIM Proxy service to manage and route inference requests to your deployed NIM instances.

The NIM Proxy service provides APIs to configure routing rules, manage endpoints, and monitor the health of your NIM deployments. This allows you to efficiently distribute inference requests across multiple NIM instances.


Task Guides#

Perform common NIM Proxy tasks.

Tip

The tutorials reference a NIM_PROXY_BASE_URL value that depends on the ingress configuration of your cluster. If you are using the minikube demo installation, it is http://nim.test. Otherwise, consult your cluster administrator for the ingress values.
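
The task guides below each include a short Python sketch built on the requests library. Each sketch reads the base URL from the NIM_PROXY_BASE_URL environment variable, falling back to the minikube demo value noted above:

```python
import os

# Base URL for the NIM Proxy ingress; defaults to the minikube demo value.
NIM_PROXY_BASE_URL = os.environ.get("NIM_PROXY_BASE_URL", "http://nim.test")
print(NIM_PROXY_BASE_URL)
```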

Health Check

Check the health status of the NIM Proxy service.

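A minimal sketch of a health probe in Python, assuming the proxy exposes a readiness endpoint at /v1/health/ready (this path is an assumption; verify it against your deployment):

```python
import os
import requests

base_url = os.environ.get("NIM_PROXY_BASE_URL", "http://nim.test")

# Probe the readiness endpoint; /v1/health/ready is an assumed path.
resp = requests.get(f"{base_url}/v1/health/ready", timeout=10)
print(resp.status_code, resp.text)
```

A 200 status code indicates the service is ready to accept inference requests.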
List Models

View all available models that can be used for inference.

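A minimal sketch querying the OpenAI-compatible /v1/models endpoint and printing the model IDs it returns:

```python
import os
import requests

base_url = os.environ.get("NIM_PROXY_BASE_URL", "http://nim.test")

# The OpenAI-compatible /v1/models endpoint lists the models
# available for inference through the proxy.
resp = requests.get(f"{base_url}/v1/models", timeout=10)
resp.raise_for_status()
for model in resp.json().get("data", []):
    print(model["id"])
```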
Chat Completions

Generate chat completions using the OpenAI-compatible API.

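A minimal sketch posting to the OpenAI-compatible /v1/chat/completions endpoint; the model name below is a placeholder, so substitute one returned by List Models:

```python
import os
import requests

base_url = os.environ.get("NIM_PROXY_BASE_URL", "http://nim.test")

# OpenAI-compatible chat completions request.
payload = {
    "model": "meta/llama-3.1-8b-instruct",  # placeholder; use a model from /v1/models
    "messages": [{"role": "user", "content": "Write a limerick about GPUs."}],
    "max_tokens": 128,
}
resp = requests.post(f"{base_url}/v1/chat/completions", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```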
Completions

Generate text completions using the OpenAI-compatible API.

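A minimal sketch posting to the OpenAI-compatible /v1/completions endpoint; as above, the model name is a placeholder:

```python
import os
import requests

base_url = os.environ.get("NIM_PROXY_BASE_URL", "http://nim.test")

# OpenAI-compatible text completions request.
payload = {
    "model": "meta/llama-3.1-8b-instruct",  # placeholder; use a model from /v1/models
    "prompt": "Once upon a time",
    "max_tokens": 64,
}
resp = requests.post(f"{base_url}/v1/completions", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```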
Embeddings

Generate embeddings for text using the OpenAI-compatible API.

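A minimal sketch posting to the OpenAI-compatible /v1/embeddings endpoint; the embedding model name is a placeholder, and some embedding models may require additional deployment-specific parameters:

```python
import os
import requests

base_url = os.environ.get("NIM_PROXY_BASE_URL", "http://nim.test")

# OpenAI-compatible embeddings request.
payload = {
    "model": "nvidia/nv-embedqa-e5-v5",  # placeholder; use an embedding model from /v1/models
    "input": ["Hello, world!"],
}
resp = requests.post(f"{base_url}/v1/embeddings", json=payload, timeout=60)
resp.raise_for_status()
embedding = resp.json()["data"][0]["embedding"]
print(len(embedding))  # dimensionality of the returned vector
```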
