Examples with system role

Note

Requires NVIDIA NIM for LLMs version 1.0.2 or later.

Message roles

The message object includes a role (system, user or assistant) and the content.

System role: This is optional and helps define the assistant’s behavior. It can be used to provide instructions or set the context for the assistant. You can include multiple system messages in a conversation, and the model will process them in the order they are received.
User role: These messages contain requests or comments from the user that the assistant should respond to.
Assistant role: These hold previous responses from the assistant.

By default, there are no system messages. Use system messages to provide context or instructions to the model beyond the user’s input.
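The role rules above can be sketched with a small helper that assembles a messages list. This is only an illustration: the name build_messages, the second system prompt, and the choice to place system messages before the other turns are assumptions made for the example, not part of the NIM API.

```python
from typing import Dict, List, Optional

VALID_ROLES = {"system", "user", "assistant"}


def build_messages(
    user_input: str,
    system_prompts: Optional[List[str]] = None,
    history: Optional[List[Dict[str, str]]] = None,
) -> List[Dict[str, str]]:
    """Assemble system messages, then prior turns, then the new user message."""
    # Zero or more system messages, kept in the order they were given.
    messages = [{"role": "system", "content": p} for p in (system_prompts or [])]
    # Earlier user/assistant turns, validated against the three known roles.
    for turn in history or []:
        if turn["role"] not in VALID_ROLES:
            raise ValueError(f"unknown role: {turn['role']!r}")
        messages.append({"role": turn["role"], "content": turn["content"]})
    # The request the assistant should respond to goes last.
    messages.append({"role": "user", "content": user_input})
    return messages


# No system message (the default) versus two system messages kept in order.
plain = build_messages("Who won the world series in 2020?")
guided = build_messages(
    "Who won the world series in 2020?",
    system_prompts=["You are a helpful assistant.", "Answer in one sentence."],
)
print([m["role"] for m in plain])
print([m["role"] for m in guided])
```

The resulting list can be passed as the messages field of a Chat Completions request.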

OpenAI Chat Completion Request with Single User Question

The Chat Completions endpoint is typically used with chat-tuned or instruction-tuned models that are designed for conversational use. Prompts are sent as a list of messages, each with a role and content, which gives a natural way to keep track of a multi-turn conversation. To stream the result, set "stream": true.

Here is an example of a Chat Completions request with a single user question, preceded by a system message that sets the assistant's behavior. This form is ideal for isolated queries where no earlier conversation context is needed.

Important

Update the model name according to your requirements. For example, for a llama-3.1-8b-instruct model, you might use the following command:

curl -X 'POST' \
'http://0.0.0.0:8000/v1/chat/completions' \
   -H 'accept: application/json' \
   -H 'Content-Type: application/json' \
   -d '{
      "model": "meta/llama-3.1-8b-instruct",
      "messages": [
          {
              "role": "system",
              "content": "You are a helpful assistant."
          },
          {
              "role": "user",
              "content": "Who won the world series in 2020?"
          }
      ],
      "top_p": 1,
      "n": 1,
      "max_tokens": 50,
      "stream": false,
      "frequency_penalty": 1.0,
      "stop": ["hello"]
    }'

Which prints:

{"id":"chat-a140c650f12348ad910cd3d1a4b2f551","object":"chat.completion","created":1726092664,"model":"meta/llama-3.1-8b-instruct","choices":[{"index":0,"message":{"role":"assistant","content":"The Los Angeles Dodgers won the 2020 World Series, defeating the Tampa Bay Rays in six games (4-2). It was the Dodgers' first World Series title since 1988."},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":33,"total_tokens":72,"completion_tokens":39}}
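The response body is a single JSON object, so the fields of interest can be pulled out with a standard JSON parser. The following is a minimal sketch that uses only the Python standard library and the response text shown above; the variable names are illustrative.

```python
import json

# Response body copied from the curl example above (whitespace added for readability).
raw = '''
{"id": "chat-a140c650f12348ad910cd3d1a4b2f551", "object": "chat.completion",
 "created": 1726092664, "model": "meta/llama-3.1-8b-instruct",
 "choices": [{"index": 0,
              "message": {"role": "assistant",
                          "content": "The Los Angeles Dodgers won the 2020 World Series, defeating the Tampa Bay Rays in six games (4-2). It was the Dodgers' first World Series title since 1988."},
              "logprobs": null, "finish_reason": "stop", "stop_reason": null}],
 "usage": {"prompt_tokens": 33, "total_tokens": 72, "completion_tokens": 39}}
'''

response = json.loads(raw)
choice = response["choices"][0]        # "n": 1 was requested, so there is one choice
answer = choice["message"]["content"]  # the assistant's reply text
usage = response["usage"]              # token accounting for the request

print(answer)
print("finish_reason:", choice["finish_reason"])
# In this response, total_tokens equals prompt_tokens plus completion_tokens.
print(usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"])
```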

You can also use the OpenAI Python API library.

from openai import OpenAI

# Point the client at the local NIM endpoint; the API key is a placeholder.
client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")

messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Who won the world series in 2020?"}
]

# Send the request and wait for the complete (non-streamed) response.
chat_response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",
    messages=messages,
    max_tokens=50,
    stream=False
)

# The first choice holds the assistant's reply.
assistant_message = chat_response.choices[0].message
print(assistant_message)

Which prints:

ChatCompletionMessage(content='The Los Angeles Dodgers won the 2020 World Series, defeating the Tampa Bay Rays in six games (4-2). It was their first World Series title since 1988.', refusal=None, role='assistant', function_call=None, tool_calls=None)
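When streaming is enabled, the OpenAI Python library returns an iterator of chunks rather than a single response, and each chunk carries a piece of the reply in choices[0].delta.content. The sketch below shows one way to reassemble the text. The helper names collect_stream, stream_answer, and fake_chunk are hypothetical, and the demonstration at the bottom uses stand-in objects so that it runs without a server.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Concatenate the text deltas from a streamed chat completion."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:      # skip chunks that carry no choices
            continue
        delta = chunk.choices[0].delta.content
        if delta:                  # content can be None or empty on some chunks
            parts.append(delta)
    return "".join(parts)


def stream_answer(question: str, model: str = "meta/llama-3.1-8b-instruct") -> str:
    """Send one user question with stream=True and return the assembled reply."""
    # Imported here so collect_stream can be tried without the openai package.
    from openai import OpenAI

    client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
        max_tokens=50,
        stream=True,
    )
    return collect_stream(stream)


def fake_chunk(text):
    """Stand-in object with the same attribute path as a streamed chunk."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


# Offline demonstration; call stream_answer(...) instead when a NIM endpoint is running.
demo = [fake_chunk("The Los Angeles "), fake_chunk(None), fake_chunk("Dodgers")]
print(collect_stream(demo))
```

For interactive use, printing each delta as it arrives instead of joining them at the end lets the reply appear incrementally.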

OpenAI Chat Completion Request with Additional Context and Response

Here is an example of a Chat Completions request with a series of messages in different roles for a continued interaction. Including the earlier user and assistant messages gives the model the context it needs, which improves the relevance and coherence of the assistant’s responses.

To stream the result, set "stream": true.

Important

Update the model name according to your requirements.

curl -X 'POST' \
'http://0.0.0.0:8000/v1/chat/completions' \
   -H 'accept: application/json' \
   -H 'Content-Type: application/json' \
   -d '{
      "model": "meta/llama-3.1-8b-instruct",
      "messages": [
          {
              "role": "system",
              "content": "You are a helpful assistant."
          },
          {
              "role": "user",
              "content": "Who won the world series in 2020?"
          },
          {
              "role": "assistant",
              "content": "The Los Angeles Dodgers won the World Series in 2020."
          },
          {
              "role": "user",
              "content": "Where was it played?"
          }
      ],
      "top_p": 1,
      "n": 1,
      "max_tokens": 32,
      "stream": false,
      "frequency_penalty": 1.0,
      "stop": ["hello"]
    }'

Which prints:

{"id":"chat-50e2cd2741134b4a95c07a58af321793","object":"chat.completion","created":1726093184,"model":"meta/llama-3.1-8b-instruct","choices":[{"index":0,"message":{"role":"assistant","content":"The 2020 World Series was played at Globe Life Field, which is the home stadium of the Texas Rangers, however, the Los Angeles Dodgers won the series 4-2 against the Tampa Bay Rays."},"logprobs":null,"finish_reason":"length","stop_reason":null}],"usage":{"prompt_tokens":61,"total_tokens":93,"completion_tokens":32}}
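The response above reports a finish_reason of "length" with completion_tokens equal to the requested max_tokens of 32, whereas the single-question response reported "stop". In the OpenAI-compatible response schema, "length" generally means that generation ended because the token limit was reached. A client can check for this before using the reply, as in the following sketch; the function name is hypothetical, and the dictionaries contain only the fields needed here.

```python
def stopped_at_token_limit(response: dict) -> bool:
    """Return True if the first choice ended because the token limit was reached."""
    return response["choices"][0]["finish_reason"] == "length"


# Only the relevant fields, taken from the two responses shown in this section.
single_question = {
    "choices": [{"finish_reason": "stop"}],
    "usage": {"prompt_tokens": 33, "total_tokens": 72, "completion_tokens": 39},
}
follow_up = {
    "choices": [{"finish_reason": "length"}],
    "usage": {"prompt_tokens": 61, "total_tokens": 93, "completion_tokens": 32},
}

for name, resp in [("single_question", single_question), ("follow_up", follow_up)]:
    print(name, stopped_at_token_limit(resp), resp["usage"]["completion_tokens"])
```

If the check returns True, one option is to resend the request with a larger max_tokens value.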

You can also use the OpenAI Python API library.

from openai import OpenAI
client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
    {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
    {"role": "user", "content": "Where was it played?"}
]
chat_response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",
    messages=messages,
    max_tokens=32,
    stream=False
)
assistant_message = chat_response.choices[0].message
print(assistant_message)

Which prints:

ChatCompletionMessage(content='The 2020 World Series was played at Globe Life Field, which is the home stadium of the Texas Rangers, however, the Los Angeles Dodgers won the series 4-2 against the Tampa Bay Rays.', refusal=None, role='assistant', function_call=None, tool_calls=None)
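To continue a conversation, each new request carries the earlier turns along with the next user message. The following sketch shows one way to grow the messages list between requests; extend_conversation is a hypothetical helper, and the assistant reply is supplied by hand here rather than taken from a live response.

```python
from typing import Dict, List


def extend_conversation(
    messages: List[Dict[str, str]],
    assistant_reply: str,
    next_user_input: str,
) -> List[Dict[str, str]]:
    """Return a new list with the assistant reply and the next user turn appended."""
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_user_input},
    ]


# First turn: system message plus the opening question.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
]

# Second turn: record the previous answer, then ask the follow-up question.
messages = extend_conversation(
    messages,
    "The Los Angeles Dodgers won the World Series in 2020.",
    "Where was it played?",
)

for m in messages:
    print(f'{m["role"]}: {m["content"]}')
```

With a running endpoint, assistant_reply would typically be chat_response.choices[0].message.content from the previous call.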