Structured Generation with NVIDIA NIM for LLMs#
Date: 2026-03-03
NIM for LLMs supports structured outputs: you can specify a JSON schema, a regular expression, or a context-free grammar, or constrain the output to a fixed set of choices. This is useful when NIM is part of a larger pipeline and the LLM outputs are expected to be in a specific format. The examples below show each way of constraining the output.
We recommend that you use guided_json for optimal performance and reliability. guided_json uses the xgrammar backend, which is the fastest option available. However, xgrammar does not support all configurations of guided decoding. If the system falls back to outlines, performance can suffer, particularly during the first inference.
Additionally, xgrammar supports a wider range of regular expressions than outlines. With the outlines backend, some regular expressions might fail to compile.
Note
When using SGLang profiles, guided decoding support is limited:
Supported: xgrammar and outlines
Not supported: lm-format-enforcer
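Each constraint type described above corresponds to one request parameter used later in this guide (guided_json, guided_regex, guided_choice, guided_grammar), passed through the OpenAI client's extra_body argument. The following is a minimal sketch of a helper that builds that dictionary. The name build_extra_body is hypothetical and not part of NIM or the OpenAI client, and sending only one constraint per request is an assumption of this sketch, not a documented rule.

```python
from typing import Any

# The four request parameters named in this guide.
GUIDED_PARAMETERS = ("guided_json", "guided_regex", "guided_choice", "guided_grammar")


def build_extra_body(parameter: str, constraint: Any) -> dict[str, Any]:
    """Return an extra_body dict carrying exactly one guided decoding constraint.

    Hypothetical helper for illustration only.
    """
    if parameter not in GUIDED_PARAMETERS:
        raise ValueError(f"unknown guided decoding parameter: {parameter!r}")
    return {parameter: constraint}


extra_body = build_extra_body("guided_choice", ["Good", "Bad", "Neutral"])
print(extra_body)
# Prints:
# {'guided_choice': ['Good', 'Bad', 'Neutral']}
```

The result can be passed as extra_body=extra_body to client.chat.completions.create, as in the examples that follow.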
JSON Schema#
You can constrain the output to follow a particular JSON schema by using the guided_json parameter.
This approach is particularly useful in the following scenarios:
Ensuring consistent output format for downstream processing
Validating complex data structures
Automating data extraction from unstructured text
Improving reliability in multi-step pipelines
Important
We recommend that you use the guided_json parameter to specify a JSON schema, instead of using response_format={"type": "json_object"}. The response_format option with type "json_object" permits the model to produce any valid JSON, including empty objects.
Basic Example: Movie Review#
from openai import OpenAI

# Point the client at the local NIM endpoint; the API key value is a placeholder.
client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")
json_schema = {
    "type": "object",
    "properties": {
        "title": {
            "type": "string"
        },
        "rating": {
            "type": "number"
        }
    },
    "required": [
        "title",
        "rating"
    ]
}
prompt = (f"Return the title and the rating based on the following movie review according to this JSON schema: {str(json_schema)}.\n"
          "Review: Inception is a really well made film. I rate it four stars out of five.")
messages = [
    {"role": "user", "content": prompt},
]
response = client.chat.completions.create(
    model="meta/llama-3.1-70b-instruct",
    messages=messages,
    extra_body={"guided_json": json_schema},  # constrain the output to the schema
    stream=False
)
assistant_message = response.choices[0].message.content
print(assistant_message)
# Prints:
# {"title":"Inception", "rating":4.0}
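To illustrate why a schema is stricter than plain JSON mode, the sketch below checks two candidate replies against the required keys of the movie review schema. It uses only the standard library. The helper missing_required is a hypothetical name introduced here, and it checks only top-level required keys, not types or nested objects.

```python
import json

schema = {
    "type": "object",
    "properties": {"title": {"type": "string"}, "rating": {"type": "number"}},
    "required": ["title", "rating"],
}


def missing_required(raw: str, schema: dict) -> list[str]:
    """Parse raw as JSON and list the schema's required keys that are absent."""
    data = json.loads(raw)
    return [key for key in schema.get("required", []) if key not in data]


# An empty object is valid JSON, so JSON mode alone would accept it.
print(missing_required("{}", schema))
# Prints:
# ['title', 'rating']

# A reply shaped like the guided_json output above has nothing missing.
print(missing_required('{"title": "Inception", "rating": 4.0}', schema))
# Prints:
# []
```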
Advanced Example: Product Information#
This example demonstrates a more complex schema for extracting detailed product information:
json_schema = {
    "type": "object",
    "properties": {
        "product_name": {"type": "string"},
        "price": {"type": "number"},
        "features": {
            "type": "array",
            "items": {"type": "string"}
        },
        "availability": {
            "type": "object",
            "properties": {
                "in_stock": {"type": "boolean"},
                "shipping_time": {"type": "string"}
            },
            "required": ["in_stock", "shipping_time"]
        }
    },
    "required": ["product_name", "price", "features", "availability"]
}
prompt = (f"Extract product information from the following description according to this JSON schema: {str(json_schema)}.\n"
          "Description: The XYZ Smartwatch is our latest offering, priced at $299.99. It features a heart rate monitor, "
          "GPS tracking, and water resistance up to 50 meters. The product is currently in stock and ships within 2-3 business days.")
messages = [
    {"role": "user", "content": prompt},
]
# Reuses the client created in the basic example.
response = client.chat.completions.create(
    model="meta/llama-3.1-70b-instruct",
    messages=messages,
    extra_body={"guided_json": json_schema},
    stream=False
)
assistant_message = response.choices[0].message.content
print(assistant_message)
# Prints:
# {
#   "product_name": "XYZ Smartwatch",
#   "price": 299.99,
#   "features": [
#     "heart rate monitor",
#     "GPS tracking",
#     "water resistance up to 50 meters"
#   ],
#   "availability": {
#     "in_stock": true,
#     "shipping_time": "2-3 business days"
#   }
# }
Example: Nested Structures for Event Planning#
This example showcases how JSON schemas can handle nested structures, which is useful for complex data representations:
json_schema = {
    "type": "object",
    "properties": {
        "event_name": {"type": "string"},
        "date": {"type": "string", "format": "date"},
        "attendees": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "role": {"type": "string"},
                    "confirmed": {"type": "boolean"}
                },
                "required": ["name", "role", "confirmed"]
            }
        },
        "venue": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "address": {"type": "string"},
                "capacity": {"type": "integer"}
            },
            "required": ["name", "address", "capacity"]
        }
    },
    "required": ["event_name", "date", "attendees", "venue"]
}
prompt = (f"Create an event plan based on the following information using this JSON schema: {str(json_schema)}.\n"
          "Information: We're planning the Annual Tech Conference on 2024-09-15. John Doe (Speaker, confirmed) and Jane Smith (Organizer, confirmed) will attend. "
          "Alice Johnson (Volunteer, not confirmed yet) might join. The event will be held at Tech Center, 123 Innovation St., with a capacity of 500 people.")
messages = [
    {"role": "user", "content": prompt},
]
# Reuses the client created in the basic example.
response = client.chat.completions.create(
    model="meta/llama-3.1-70b-instruct",
    messages=messages,
    extra_body={"guided_json": json_schema},
    stream=False
)
assistant_message = response.choices[0].message.content
print(assistant_message)
# Prints:
# {
#   "event_name": "Annual Tech Conference",
#   "date": "2024-09-15",
#   "attendees": [
#     {"name": "John Doe", "role": "Speaker", "confirmed": true},
#     {"name": "Jane Smith", "role": "Organizer", "confirmed": true},
#     {"name": "Alice Johnson", "role": "Volunteer", "confirmed": false}
#   ],
#   "venue": {
#     "name": "Tech Center",
#     "address": "123 Innovation St.",
#     "capacity": 500
#   }
# }
By using JSON schemas, you can ensure that the LLM’s output adheres to a specific structure, making it easier to process and validate the generated data in your application’s workflow.
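As one possible way to process such output downstream, the sketch below loads an event plan shaped like the example output above into Python dataclasses. The class names and the parse_event_plan helper are hypothetical, introduced only for illustration. The sketch also assumes the reply contains no keys beyond those in the schema, which the schema as written does not strictly forbid.

```python
import json
from dataclasses import dataclass


@dataclass
class Attendee:
    name: str
    role: str
    confirmed: bool


@dataclass
class Venue:
    name: str
    address: str
    capacity: int


@dataclass
class EventPlan:
    event_name: str
    date: str
    attendees: list[Attendee]
    venue: Venue


def parse_event_plan(raw: str) -> EventPlan:
    """Convert the JSON text returned by the model into typed objects."""
    data = json.loads(raw)
    return EventPlan(
        event_name=data["event_name"],
        date=data["date"],
        # Unpacking raises TypeError if an object carries unexpected keys.
        attendees=[Attendee(**item) for item in data["attendees"]],
        venue=Venue(**data["venue"]),
    )


# Sample reply copied from the example output above.
sample = """
{
  "event_name": "Annual Tech Conference",
  "date": "2024-09-15",
  "attendees": [
    {"name": "John Doe", "role": "Speaker", "confirmed": true},
    {"name": "Jane Smith", "role": "Organizer", "confirmed": true},
    {"name": "Alice Johnson", "role": "Volunteer", "confirmed": false}
  ],
  "venue": {"name": "Tech Center", "address": "123 Innovation St.", "capacity": 500}
}
"""

plan = parse_event_plan(sample)
confirmed = [attendee.name for attendee in plan.attendees if attendee.confirmed]
print(confirmed)
# Prints:
# ['John Doe', 'Jane Smith']
```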
Regular Expressions#
You can specify a regular expression for the output format by using the guided_regex parameter.
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")
regex = "[1-5]"  # a single digit from 1 to 5
prompt = ("Return just the rating based on the following movie review\n"
          "Review: This movie exceeds expectations. I rate it four stars out of five.")
messages = [
    {"role": "user", "content": prompt},
]
response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=messages,
    extra_body={"guided_regex": regex},
    stream=False
)
assistant_message = response.choices[0].message.content
print(assistant_message)
# Prints:
# 4
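A pipeline that consumes the rating may still want to check the reply before converting it. The sketch below does this with Python's re module. The helper check_rating is a hypothetical name, and Python's regular expression dialect is not guaranteed to match the dialect accepted by the guided decoding backend, so treat this as a client-side sanity check only.

```python
import re


def check_rating(text: str, pattern: str = "[1-5]") -> int:
    """Return the rating as an int if text fully matches pattern, else raise."""
    if re.fullmatch(pattern, text) is None:
        raise ValueError(f"response {text!r} does not match {pattern!r}")
    return int(text)


print(check_rating("4"))
# Prints:
# 4
```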
Choices#
You can specify a list of choices for the output by using the guided_choice parameter.
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")
choices = ["Good", "Bad", "Neutral"]
prompt = (f"Return the sentiment based on the following movie review. It should be one of {choices}\n"
          "Review: This movie exceeds expectations. I rate it four stars out of five.")
messages = [
    {"role": "user", "content": prompt},
]
response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=messages,
    extra_body={"guided_choice": choices},
    stream=False
)
assistant_message = response.choices[0].message.content
print(assistant_message)
# Prints:
# Good
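Because the reply is limited to a known set of strings, it maps naturally onto an enumeration. The sketch below shows one way to keep the choice list and the parsing logic in sync. The Sentiment class and parse_sentiment helper are hypothetical names introduced for illustration.

```python
from enum import Enum


class Sentiment(Enum):
    GOOD = "Good"
    BAD = "Bad"
    NEUTRAL = "Neutral"


# Derive the guided_choice list from the enum so the two cannot drift apart.
choices = [member.value for member in Sentiment]


def parse_sentiment(text: str) -> Sentiment:
    """Map the model's reply to an enum member; raises ValueError if unknown."""
    return Sentiment(text.strip())


print(choices)
# Prints:
# ['Good', 'Bad', 'Neutral']
print(parse_sentiment("Good"))
# Prints:
# Sentiment.GOOD
```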
Context-free Grammar#
You can specify a context-free grammar in the EBNF format by using the guided_grammar parameter.
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")
# EBNF grammar: a fixed sentence with a single digit from 1 to 5 as the rating.
grammar = """
root ::= "The movie name is rated " rating " stars."
rating ::= [1-5]
"""
prompt = ("Summarize the following movie review:\n"
          "Review: This movie exceeds expectations. I rate it four stars out of five.")
messages = [
    {"role": "user", "content": prompt},
]
response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=messages,
    extra_body={"guided_grammar": grammar},
    stream=False
)
completion = response.choices[0].message.content
print(completion)
# Prints:
# The movie name is rated 4 stars.
The default guided decoding backend is xgrammar. If you use outlines instead, you must use a different syntax for the grammar definition. The following is an example grammar for the outlines backend:
grammar = """
?start: "The movie name is rated " rating " stars."
?rating: /[1-5]/
"""
We recommend the xgrammar backend for reliability and performance. For large grammars, the outlines backend can take a long time to compile on the first inference request. If xgrammar does not recognize the syntax, guided decoding falls back to outlines.
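If a deployment might run with either backend, one option is to keep both grammar variants side by side and select the one that matches. The sketch below does only that. The GRAMMARS table and grammar_for helper are hypothetical, and how the active backend is determined is outside the scope of this sketch.

```python
# The same grammar in the two syntaxes shown above, keyed by backend name.
GRAMMARS = {
    "xgrammar": """
root ::= "The movie name is rated " rating " stars."
rating ::= [1-5]
""",
    "outlines": """
?start: "The movie name is rated " rating " stars."
?rating: /[1-5]/
""",
}


def grammar_for(backend: str) -> str:
    """Return the grammar text written in the syntax the given backend expects."""
    try:
        return GRAMMARS[backend]
    except KeyError:
        raise ValueError(f"no grammar defined for backend {backend!r}") from None


print(grammar_for("xgrammar"))
```

The returned string can be passed as extra_body={"guided_grammar": grammar_for("xgrammar")}.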