Chat Completions#

Generate chat completions using a model through the NIM Proxy microservice with a POST API call.

Prerequisites#

Before you can generate chat completions, make sure that you have:

  • Access to the NIM Proxy microservice through the base URL where the service is deployed. Store the base URL in the NIM_PROXY_BASE_URL environment variable.

  • A valid model name. To retrieve the list of models deployed as NIM microservices in your environment, use the ${NIM_PROXY_BASE_URL}/v1/models API, as shown in the example after this list. For more information, see List Models.
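
For example, you can store the base URL and list the available models with the following commands. The base URL value shown is a placeholder; replace it with the URL where NIM Proxy is deployed in your environment.

  export NIM_PROXY_BASE_URL="https://nim-proxy.example.com"   # placeholder; use your deployment URL

  curl -X GET \
    "${NIM_PROXY_BASE_URL}/v1/models" \
    -H 'accept: application/json' | jq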


Options#

You can generate chat completions in the following way.

API#

  1. Perform a POST request to the /v1/chat/completions endpoint.

    Use the following cURL command. For details on the request body, refer to the NIM for LLMs API reference and look up the v1/chat/completions API. The NIM Proxy endpoint routes your request to the NIM for LLMs microservice.

    curl -X POST \
      "${NIM_PROXY_BASE_URL}/v1/chat/completions" \
      -H 'accept: application/json' \
      -H 'Content-Type: application/json' \
      -d '{
        "model": "meta/llama-3.1-8b-instruct",
        "messages": [
            {
                "role":"user",
                "content":"Hello! How are you?"
            }
        ],
        "max_tokens": 32
      }' | jq
    
  2. Review the response.

    Example Response
    {
      "id": "chatcmpl-123",
      "object": "chat.completion",
      "created": 1677652288,
      "model": "meta/llama-3.1-8b-instruct",
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "I'm doing well, thank you for asking! How can I help you today?"
          },
          "finish_reason": "stop"
        }
      ],
      "usage": {
        "prompt_tokens": 15,
        "completion_tokens": 12,
        "total_tokens": 27
      }
    }
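
  3. Optional: Extract the generated text.

    If you only need the assistant's reply, pipe the response through a jq filter instead of printing the whole JSON object. This variation of the request in step 1 prints only the message content:

    curl -X POST \
      "${NIM_PROXY_BASE_URL}/v1/chat/completions" \
      -H 'accept: application/json' \
      -H 'Content-Type: application/json' \
      -d '{
        "model": "meta/llama-3.1-8b-instruct",
        "messages": [
            {
                "role":"user",
                "content":"Hello! How are you?"
            }
        ],
        "max_tokens": 32
      }' | jq -r '.choices[0].message.content'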
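
The endpoint follows the OpenAI-compatible chat completions schema, so deployments that support streaming typically accept a "stream": true field in the request body and return the completion incrementally as server-sent events. The following command is a sketch under the assumption that your deployment supports this field; the -N flag disables curl's output buffering so tokens print as they arrive.

  curl -N -X POST \
    "${NIM_PROXY_BASE_URL}/v1/chat/completions" \
    -H 'accept: text/event-stream' \
    -H 'Content-Type: application/json' \
    -d '{
      "model": "meta/llama-3.1-8b-instruct",
      "messages": [{"role": "user", "content": "Hello! How are you?"}],
      "max_tokens": 32,
      "stream": true
    }'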