Structured Generation
NIM LLM supports structured outputs: you can constrain generation with a JSON schema, a regular expression, a context-free grammar, or a fixed list of choices. This is useful when NIM is part of a larger pipeline and the LLM outputs must conform to a specific format. The examples below show how to constrain the output in each of these ways.
You can constrain the output to follow a particular JSON schema by using the guided_json parameter in the nvext extension to the OpenAI schema.
We recommend specifying a JSON schema with the guided_json parameter, as in the example below, instead of setting response_format={"type": "json_object"}, as the latter can produce empty outputs.
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")

json_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "rating": {"type": "number"}
    },
    "required": ["title", "rating"]
}

prompt = (f"Return the title and the rating based on the following movie review according to this JSON schema: {json_schema}.\n"
          "Review: Inception is a really well made film. I rate it four stars out of five.")
messages = [
    {"role": "user", "content": prompt},
]

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=messages,
    extra_body={"nvext": {"guided_json": json_schema}},
    stream=False
)
assistant_message = response.choices[0].message.content
print(assistant_message)
# Prints:
# {"title":"Inception", "rating":4.0}
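Because guided_json constrains decoding to the schema, the returned string can be parsed as JSON directly. A minimal client-side sketch using only the standard library (the hard-coded string below stands in for the assistant_message returned by the server above):

```python
import json

# Stand-in for the assistant_message returned by the server.
assistant_message = '{"title":"Inception", "rating":4.0}'

# guided_json guarantees the output parses as JSON conforming to the schema,
# so json.loads should not raise here.
movie = json.loads(assistant_message)

# Spot-check the "required" keys and types from the schema.
assert isinstance(movie["title"], str)
assert isinstance(movie["rating"], (int, float))
print(movie["title"], movie["rating"])  # Inception 4.0
```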
You can specify a regular expression for the output format using the guided_regex parameter in the nvext extension to the OpenAI schema.
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")

regex = "[1-5]"
prompt = ("Return just the rating based on the following movie review\n"
          "Review: This movie exceeds expectations. I rate it four stars out of five.")
messages = [
    {"role": "user", "content": prompt},
]

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=messages,
    extra_body={"nvext": {"guided_regex": regex}},
    stream=False
)
assistant_message = response.choices[0].message.content
print(assistant_message)
# Prints:
# 4
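Since guided_regex constrains decoding so the entire output matches the pattern, the response can be converted to a program value without defensive parsing. A minimal sketch with the standard library re module (the hard-coded string stands in for the assistant_message returned by the server above):

```python
import re

regex = "[1-5]"
# Stand-in for the assistant_message returned by the server.
assistant_message = "4"

# The full output is guaranteed to match the requested pattern.
assert re.fullmatch(regex, assistant_message) is not None

rating = int(assistant_message)
print(rating)  # 4
```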
You can specify a list of choices for the output using the guided_choice parameter in the nvext extension to the OpenAI schema.
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")

choices = ["Good", "Bad", "Neutral"]
prompt = (f"Return the sentiment based on the following movie review. It should be one of {choices}\n"
          "Review: This movie exceeds expectations. I rate it four stars out of five.")
messages = [
    {"role": "user", "content": prompt},
]

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=messages,
    extra_body={"nvext": {"guided_choice": choices}},
    stream=False
)
assistant_message = response.choices[0].message.content
print(assistant_message)
# Prints:
# Good
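Because guided_choice restricts decoding to exactly the listed strings, the response maps straight onto program values. A minimal sketch (the hard-coded string and the score mapping are illustrative, standing in for the assistant_message returned by the server above):

```python
choices = ["Good", "Bad", "Neutral"]
# Stand-in for the assistant_message returned by the server.
assistant_message = "Good"

# The output is guaranteed to be one of the listed choices,
# so a direct dictionary lookup is safe -- no fallback branch needed.
assert assistant_message in choices
sentiment_score = {"Good": 1, "Neutral": 0, "Bad": -1}[assistant_message]
print(sentiment_score)  # 1
```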
You can specify a context-free grammar in the EBNF format using the guided_grammar parameter in the nvext extension to the OpenAI schema.
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")

grammar = """
?start: "The movie name is rated " rating " stars."
?rating: /[1-5]/
"""
prompt = ("Summarize the following movie review:\n"
          "Review: This movie exceeds expectations. I rate it four stars out of five.")
messages = [
    {"role": "user", "content": prompt},
]

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=messages,
    extra_body={"nvext": {"guided_grammar": grammar}},
    stream=False
)
completion = response.choices[0].message.content
print(completion)
# Prints:
# The movie name is rated 4 stars.
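The grammar above admits exactly the sentences "The movie name is rated N stars." with N in 1-5, so an equivalent client-side check can be written as a regular expression. A minimal sketch using only the standard library (the hard-coded string stands in for the completion returned by the server above):

```python
import re

# Stand-in for the completion returned by the server.
completion = "The movie name is rated 4 stars."

# Regular expression equivalent to the guided_grammar rules:
#   ?start: "The movie name is rated " rating " stars."
#   ?rating: /[1-5]/
pattern = r"The movie name is rated [1-5] stars\."
assert re.fullmatch(pattern, completion) is not None

# Recover the rating from the grammar-constrained text.
rating = int(re.search(r"[1-5]", completion).group())
print(rating)  # 4
```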