Structured Generation
NIM LLM supports structured outputs: you can constrain generation by specifying a JSON schema, a regular expression, or a context-free grammar, or by restricting the output to a fixed set of choices. This is useful when NIM is part of a larger pipeline and the LLM outputs are expected to be in a certain format. Below are examples of how the output can be constrained in each of these ways.
You can constrain the output to follow a particular JSON schema by using the guided_json parameter in the nvext extension to the OpenAI schema. This approach is particularly useful in several scenarios:
Ensuring consistent output format for downstream processing
Validating complex data structures
Automating data extraction from unstructured text
Improving reliability in multi-step pipelines
NVIDIA recommends that you specify a JSON schema using the guided_json parameter instead of setting response_format={"type": "json_object"}. Using the response_format parameter with type "json_object" enables the model to generate any valid JSON, including an empty object.
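To see the difference, compare the two calls in the following sketch (assuming a NIM endpoint at the same address as the examples that follow); only the second call guarantees the presence and types of specific fields.

from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")
messages = [{"role": "user", "content": "Describe the film Inception as JSON with a 'title' key."}]

# response_format only guarantees syntactically valid JSON; an empty {} is acceptable.
loose = client.chat.completions.create(
    model="meta/llama-3.1-70b-instruct",
    messages=messages,
    response_format={"type": "json_object"},
)

# guided_json additionally enforces the schema: a string "title" field must be present.
strict = client.chat.completions.create(
    model="meta/llama-3.1-70b-instruct",
    messages=messages,
    extra_body={"nvext": {"guided_json": {
        "type": "object",
        "properties": {"title": {"type": "string"}},
        "required": ["title"],
    }}},
)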
Basic Example: Movie Review
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")

json_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "rating": {"type": "number"}
    },
    "required": ["title", "rating"]
}

prompt = (
    f"Return the title and the rating based on the following movie review according to this JSON schema: {json_schema}.\n"
    "Review: Inception is a really well made film. I rate it four stars out of five."
)
messages = [
    {"role": "user", "content": prompt},
]

response = client.chat.completions.create(
    model="meta/llama-3.1-70b-instruct",
    messages=messages,
    extra_body={"nvext": {"guided_json": json_schema}},
    stream=False
)
assistant_message = response.choices[0].message.content
print(assistant_message)
# Prints:
# {"title":"Inception", "rating":4.0}
Advanced Example: Product Information
This example demonstrates a more complex schema for extracting detailed product information:
json_schema = {
    "type": "object",
    "properties": {
        "product_name": {"type": "string"},
        "price": {"type": "number"},
        "features": {
            "type": "array",
            "items": {"type": "string"}
        },
        "availability": {
            "type": "object",
            "properties": {
                "in_stock": {"type": "boolean"},
                "shipping_time": {"type": "string"}
            },
            "required": ["in_stock", "shipping_time"]
        }
    },
    "required": ["product_name", "price", "features", "availability"]
}

prompt = (
    f"Extract product information from the following description according to this JSON schema: {json_schema}.\n"
    "Description: The XYZ Smartwatch is our latest offering, priced at $299.99. It features a heart rate monitor, "
    "GPS tracking, and water resistance up to 50 meters. The product is currently in stock and ships within 2-3 business days."
)
messages = [
    {"role": "user", "content": prompt},
]

response = client.chat.completions.create(
    model="meta/llama-3.1-70b-instruct",
    messages=messages,
    extra_body={"nvext": {"guided_json": json_schema}},
    stream=False
)
assistant_message = response.choices[0].message.content
print(assistant_message)
# Prints:
# {
#   "product_name": "XYZ Smartwatch",
#   "price": 299.99,
#   "features": [
#     "heart rate monitor",
#     "GPS tracking",
#     "water resistance up to 50 meters"
#   ],
#   "availability": {
#     "in_stock": true,
#     "shipping_time": "2-3 business days"
#   }
# }
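Hand-writing schemas of this size quickly becomes tedious. One option, shown as a sketch below and not specific to NIM, is to derive the schema from a Pydantic model; this assumes Pydantic v2, whose model_json_schema() method emits standard JSON Schema (nested models appear under $defs with $ref references):

from pydantic import BaseModel

class Availability(BaseModel):
    in_stock: bool
    shipping_time: str

class Product(BaseModel):
    product_name: str
    price: float
    features: list[str]
    availability: Availability

# Roughly equivalent to the hand-written schema above; all fields are required by default.
json_schema = Product.model_json_schema()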
Example: Nested Structures for Event Planning
This example showcases how JSON schemas can handle nested structures, which is useful for complex data representations:
json_schema = {
    "type": "object",
    "properties": {
        "event_name": {"type": "string"},
        "date": {"type": "string", "format": "date"},
        "attendees": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "role": {"type": "string"},
                    "confirmed": {"type": "boolean"}
                },
                "required": ["name", "role", "confirmed"]
            }
        },
        "venue": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "address": {"type": "string"},
                "capacity": {"type": "integer"}
            },
            "required": ["name", "address", "capacity"]
        }
    },
    "required": ["event_name", "date", "attendees", "venue"]
}

prompt = (
    f"Create an event plan based on the following information using this JSON schema: {json_schema}.\n"
    "Information: We're planning the Annual Tech Conference on 2024-09-15. John Doe (Speaker, confirmed) and Jane Smith (Organizer, confirmed) will attend. "
    "Alice Johnson (Volunteer, not confirmed yet) might join. The event will be held at Tech Center, 123 Innovation St., with a capacity of 500 people."
)
messages = [
    {"role": "user", "content": prompt},
]

response = client.chat.completions.create(
    model="meta/llama-3.1-70b-instruct",
    messages=messages,
    extra_body={"nvext": {"guided_json": json_schema}},
    stream=False
)
assistant_message = response.choices[0].message.content
print(assistant_message)
# Prints:
# {
#   "event_name": "Annual Tech Conference",
#   "date": "2024-09-15",
#   "attendees": [
#     {"name": "John Doe", "role": "Speaker", "confirmed": true},
#     {"name": "Jane Smith", "role": "Organizer", "confirmed": true},
#     {"name": "Alice Johnson", "role": "Volunteer", "confirmed": false}
#   ],
#   "venue": {
#     "name": "Tech Center",
#     "address": "123 Innovation St.",
#     "capacity": 500
#   }
# }
By using JSON schemas, you can ensure that the LLM’s output adheres to a specific structure, making it easier to process and validate the generated data in your application’s workflow.
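As a safety net for the rest of your pipeline, you can also re-validate the parsed output against the same schema in application code. The sketch below uses the third-party jsonschema package (an assumption; any JSON Schema validator works):

import json

from jsonschema import validate

# Re-check the model output against the schema that constrained it.
data = json.loads(assistant_message)
validate(instance=data, schema=json_schema)  # raises jsonschema.ValidationError on mismatch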
You can specify a regular expression for the output format using the guided_regex parameter in the nvext extension to the OpenAI schema.
client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")

regex = "[1-5]"
prompt = (
    "Return just the rating based on the following movie review\n"
    "Review: This movie exceeds expectations. I rate it four stars out of five."
)
messages = [
    {"role": "user", "content": prompt},
]

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=messages,
    extra_body={"nvext": {"guided_regex": regex}},
    stream=False
)
assistant_message = response.choices[0].message.content
print(assistant_message)
# Prints:
# 4
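The same mechanism handles longer patterns. For example, here is a sketch that constrains the answer to an ISO-style date, assuming the guided-decoding backend supports \d and bounded repetition, as standard regex engines do:

# Constrain the output to exactly the pattern YYYY-MM-DD.
regex = r"\d{4}-\d{2}-\d{2}"
prompt = "On what date did Apollo 11 land on the moon? Answer with the date only."
response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=[{"role": "user", "content": prompt}],
    extra_body={"nvext": {"guided_regex": regex}},
    stream=False
)
print(response.choices[0].message.content)
# Expected form: 1969-07-20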
You can specify a list of choices for the output using the guided_choice parameter in the nvext extension to the OpenAI schema.
client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")

choices = ["Good", "Bad", "Neutral"]
prompt = (
    f"Return the sentiment based on the following movie review. It should be one of {choices}\n"
    "Review: This movie exceeds expectations. I rate it four stars out of five."
)
messages = [
    {"role": "user", "content": prompt},
]

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=messages,
    extra_body={"nvext": {"guided_choice": choices}},
    stream=False
)
assistant_message = response.choices[0].message.content
print(assistant_message)
# Prints:
# Good
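Because the response is guaranteed to be exactly one of the listed strings, it can be used directly as a dictionary key, for example to tally sentiment across several reviews (a sketch reusing the client and choices above):

reviews = [
    "This movie exceeds expectations. I rate it four stars out of five.",
    "A dull, forgettable film.",
]
counts = {choice: 0 for choice in choices}
for review in reviews:
    response = client.chat.completions.create(
        model="meta/llama3-8b-instruct",
        messages=[{"role": "user", "content": f"Return the sentiment based on the following movie review. It should be one of {choices}\nReview: {review}"}],
        extra_body={"nvext": {"guided_choice": choices}},
        stream=False
    )
    # No KeyError is possible: the output is always one of the choices.
    counts[response.choices[0].message.content] += 1
print(counts)  # e.g. {"Good": 1, "Bad": 1, "Neutral": 0}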
You can specify a context-free grammar in EBNF format using the guided_grammar parameter in the nvext extension to the OpenAI schema.
client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")

grammar = """
?start: "The movie name is rated " rating " stars."
?rating: /[1-5]/
"""
prompt = (
    "Summarize the following movie review:\n"
    "Review: This movie exceeds expectations. I rate it four stars out of five."
)
messages = [
    {"role": "user", "content": prompt},
]

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=messages,
    extra_body={"nvext": {"guided_grammar": grammar}},
    stream=False
)
completion = response.choices[0].message.content
print(completion)
# Prints:
# The movie name is rated 4 stars.
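Because the grammar fixes the shape of the sentence, the variable part is trivial to recover afterwards, for example:

import re

# The grammar guarantees the "rated <digit> stars" phrase appears exactly once.
match = re.search(r"rated ([1-5]) stars", completion)
if match:
    rating = int(match.group(1))
    print(rating)
    # Prints:
    # 4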