Chat Completions#
Generate chat completions using a model through the NIM Proxy microservice through a POST API call.
Prerequisites#
Before you can generate chat completions, make sure that you have:
- Access to the NIM Proxy microservice through the base URL where the service is deployed. Store the base URL in an environment variable `NIM_PROXY_BASE_URL`.
- A valid model name. To retrieve the list of models deployed as NIM microservices in your environment, use the `${NIM_PROXY_BASE_URL}/v1/models` API. For more information, see List Models.
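The models endpoint returns a list of model entries; assuming it follows the OpenAI-compatible list format (an `object: "list"` wrapper with a `data` array, which is an assumption here, not confirmed by this page), a minimal sketch of extracting the model names from such a response body:

```python
import json

# Hypothetical /v1/models response body in the OpenAI-compatible list
# format; the actual response from your deployment lists your own NIMs.
body = '''
{
  "object": "list",
  "data": [
    {"id": "meta/llama-3.1-8b-instruct", "object": "model"}
  ]
}
'''

# Each entry's "id" is the model name to pass to the chat completions API.
models = [m["id"] for m in json.loads(body)["data"]]
print(models)
```

Any of the returned names can be used as the `model` value in the requests below.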
To Generate Chat Completions#
Choose one of the following options for generating chat completions.
Create a `NeMoMicroservices` client instance using the base URL of the NIM Proxy microservice and perform the task as follows.
```python
import os

from nemo_microservices import NeMoMicroservices

client = NeMoMicroservices(
    base_url=os.environ["NEMO_BASE_URL"],
    inference_base_url=os.environ["NIM_PROXY_BASE_URL"]
)
response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",
    messages=[
        {"role": "user", "content": "what can you do?"}
    ],
    temperature=0.7,
    max_tokens=100,
    stream=True
)
for chunk in response:
    print(chunk)
```
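With `stream=True`, each chunk carries an incremental piece of the reply. Assuming the chunks follow the OpenAI-style streaming delta format (a plausible assumption for this OpenAI-compatible API, not confirmed by this page), a minimal sketch of accumulating the streamed text, using stand-in chunk objects in place of a live stream:

```python
from types import SimpleNamespace

# Stand-in chunks mimicking the OpenAI-style streaming delta format;
# in practice these would come from iterating over the live response.
chunks = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="Hello"))]),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=", world"))]),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=None))]),
]

text = ""
for chunk in chunks:
    delta = chunk.choices[0].delta
    if delta.content:  # the final chunk may carry no content
        text += delta.content

print(text)
```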
Make a POST request to the `/v1/chat/completions` endpoint.
For more details on the request body, refer to the NIM for LLMs API reference and find the `v1/chat/completions` API. The NIM Proxy API endpoint routes your requests to the NIM for LLMs microservice.
```shell
curl -X POST \
  "${NIM_PROXY_BASE_URL}/v1/chat/completions" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "meta/llama-3.1-8b-instruct",
    "messages": [
      {
        "role": "user",
        "content": "Hello! How are you?"
      }
    ],
    "max_tokens": 32
  }' | jq
```
Example Response
```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "meta/llama-3.1-8b-instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I'm doing well, thank you for asking! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 12,
    "total_tokens": 27
  }
}
```
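The assistant's reply lives under `choices[0].message.content`, and token accounting under `usage`. A minimal sketch of pulling both out of a non-streaming response body like the example above:

```python
import json

# The example response body from above, inlined as a string for illustration.
body = '''
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "model": "meta/llama-3.1-8b-instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I'm doing well, thank you for asking! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 15, "completion_tokens": 12, "total_tokens": 27}
}
'''

data = json.loads(body)
reply = data["choices"][0]["message"]["content"]
total = data["usage"]["total_tokens"]
print(reply)
print(total)
```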