Chat Completions#
Generate chat completions using a model through the NIM Proxy microservice through a POST API call.
Prerequisites#
Before you can generate chat completions, make sure that you have:
- Access to the NIM Proxy microservice through the base URL where the service is deployed. Store the base URL in an environment variable `NIM_PROXY_BASE_URL`.
- A valid model name. To retrieve the list of models deployed as NIM microservices in your environment, use the `${NIM_PROXY_BASE_URL}/v1/models` API. For more information, see List Models.
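The models endpoint returns a list of model entries; assuming it follows the OpenAI-compatible list format (an `object: "list"` wrapper with a `data` array, which is an assumption here, not confirmed by this page), a minimal sketch of extracting the model names from such a response body:

```python
import json

# Hypothetical /v1/models response body in the OpenAI-compatible list
# format; the actual response from your deployment lists your own NIMs.
body = '''
{
  "object": "list",
  "data": [
    {"id": "meta/llama-3.1-8b-instruct", "object": "model"}
  ]
}
'''

# Each entry's "id" is the model name to pass to the chat completions API.
models = [m["id"] for m in json.loads(body)["data"]]
print(models)
```

Any of the returned names can be used as the `model` value in the requests below.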
To Generate Chat Completions#
Choose one of the following options for generating chat completions.
Create a `NeMoMicroservices` client instance using the base URL of the NIM Proxy microservice and perform the task as follows.
```python
import os

from nemo_microservices import NeMoMicroservices

client = NeMoMicroservices(
    base_url=os.environ["NEMO_BASE_URL"],
    inference_base_url=os.environ["NIM_PROXY_BASE_URL"]
)
response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",
    messages=[
        {"role": "user", "content": "what can you do?"}
    ],
    temperature=0.7,
    max_tokens=100,
    stream=True
)
for chunk in response:
    print(chunk)
```
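With `stream=True`, each chunk carries an incremental piece of the reply. Assuming the chunks follow the OpenAI-style streaming delta format (a plausible assumption for this OpenAI-compatible API, not confirmed by this page), a minimal sketch of accumulating the streamed text, using stand-in chunk objects in place of a live stream:

```python
from types import SimpleNamespace

# Stand-in chunks mimicking the OpenAI-style streaming delta format;
# in practice these would come from iterating over the live response.
chunks = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="Hello"))]),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=", world"))]),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=None))]),
]

text = ""
for chunk in chunks:
    delta = chunk.choices[0].delta
    if delta.content:  # the final chunk may carry no content
        text += delta.content

print(text)
```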
Make a POST request to the `/v1/chat/completions` endpoint.
For more details on the request body, refer to the NIM for LLMs API reference and find the `v1/chat/completions` API. The NIM Proxy API endpoint routes your requests to the NIM for LLMs microservice.
```shell
curl -X POST \
  "${NIM_PROXY_BASE_URL}/v1/chat/completions" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "meta/llama-3.1-8b-instruct",
    "messages": [
      {
        "role": "user",
        "content": "Hello! How are you?"
      }
    ],
    "max_tokens": 32
  }' | jq
```
Example Response
```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "meta/llama-3.1-8b-instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I'm doing well, thank you for asking! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 12,
    "total_tokens": 27
  }
}
```
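The assistant's reply lives under `choices[0].message.content`, and token accounting under `usage`. A minimal sketch of pulling both out of a non-streaming response body like the example above:

```python
import json

# The example response body from above, inlined as a string for illustration.
body = '''
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "model": "meta/llama-3.1-8b-instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I'm doing well, thank you for asking! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 15, "completion_tokens": 12, "total_tokens": 27}
}
'''

data = json.loads(body)
reply = data["choices"][0]["message"]["content"]
total = data["usage"]["total_tokens"]
print(reply)
print(total)
```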