Date: 2026-03-03

Structured Generation with NVIDIA NIM for LLMs#

NIM for LLMs supports structured output: you can constrain generation with a JSON schema, a regular expression, a context-free grammar, or a fixed list of choices. This is useful when NIM is part of a larger pipeline that expects the LLM output in a specific format. The examples below show how to constrain the output in different ways.

We recommend that you use guided_json for optimal performance and reliability. guided_json uses the xgrammar backend, which is the fastest option available. However, xgrammar does not support every guided decoding configuration. If the system falls back to outlines, performance can degrade, particularly during the first inference.

Additionally, xgrammar supports a wider range of regular expressions than outlines. With the outlines backend, some regular expressions might fail to compile.

Note: When using SGLang profiles, guided decoding support is limited:

Supported: xgrammar and outlines

Not supported: lm-format-enforcer

JSON Schema#

You can constrain the output to follow a particular JSON schema by using the guided_json parameter. This approach is particularly useful in the following scenarios:

Ensuring consistent output format for downstream processing

Validating complex data structures

Automating data extraction from unstructured text

Improving reliability in multi-step pipelines

Important: We recommend that you use the guided_json parameter to specify a JSON schema instead of using response_format={"type": "json_object"}. The response_format option with type "json_object" permits the model to produce any valid JSON, including empty objects.

Basic Example: Movie Review#

    from openai import OpenAI

    client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")

    json_schema = {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "rating": {"type": "number"}
        },
        "required": ["title", "rating"]
    }

    prompt = (
        f"Return the title and the rating based on the following movie review according to this JSON schema: {str(json_schema)}.\n"
        f"Review: Inception is a really well made film. I rate it four stars out of five."
    )
    messages = [
        {"role": "user", "content": prompt},
    ]
    response = client.chat.completions.create(
        model="meta/llama-3.1-70b-instruct",
        messages=messages,
        extra_body={"guided_json": json_schema},
        stream=False
    )
    assistant_message = response.choices[0].message.content
    print(assistant_message)
    # Prints:
    # {"title":"Inception", "rating":4.0}

Advanced Example: Product Information#

This example demonstrates a more complex schema for extracting detailed product information:

    # Reuses the client created in the basic example.
    json_schema = {
        "type": "object",
        "properties": {
            "product_name": {"type": "string"},
            "price": {"type": "number"},
            "features": {
                "type": "array",
                "items": {"type": "string"}
            },
            "availability": {
                "type": "object",
                "properties": {
                    "in_stock": {"type": "boolean"},
                    "shipping_time": {"type": "string"}
                },
                "required": ["in_stock", "shipping_time"]
            }
        },
        "required": ["product_name", "price", "features", "availability"]
    }

    prompt = (
        f"Extract product information from the following description according to this JSON schema: {str(json_schema)}.\n"
        f"Description: The XYZ Smartwatch is our latest offering, priced at $299.99. It features a heart rate monitor, "
        f"GPS tracking, and water resistance up to 50 meters. The product is currently in stock and ships within 2-3 business days."
    )
    messages = [
        {"role": "user", "content": prompt},
    ]
    response = client.chat.completions.create(
        model="meta/llama-3.1-70b-instruct",
        messages=messages,
        extra_body={"guided_json": json_schema},
        stream=False
    )
    assistant_message = response.choices[0].message.content
    print(assistant_message)
    # Prints:
    # {
    #   "product_name": "XYZ Smartwatch",
    #   "price": 299.99,
    #   "features": [
    #     "heart rate monitor",
    #     "GPS tracking",
    #     "water resistance up to 50 meters"
    #   ],
    #   "availability": {
    #     "in_stock": true,
    #     "shipping_time": "2-3 business days"
    #   }
    # }

Example: Nested Structures for Event Planning#

This example shows how a JSON schema can describe nested structures, such as an array of objects alongside a nested object:

    # Reuses the client created in the basic example.
    json_schema = {
        "type": "object",
        "properties": {
            "event_name": {"type": "string"},
            "date": {"type": "string", "format": "date"},
            "attendees": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "role": {"type": "string"},
                        "confirmed": {"type": "boolean"}
                    },
                    "required": ["name", "role", "confirmed"]
                }
            },
            "venue": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "address": {"type": "string"},
                    "capacity": {"type": "integer"}
                },
                "required": ["name", "address", "capacity"]
            }
        },
        "required": ["event_name", "date", "attendees", "venue"]
    }

    prompt = (
        f"Create an event plan based on the following information using this JSON schema: {str(json_schema)}.\n"
        f"Information: We're planning the Annual Tech Conference on 2024-09-15. John Doe (Speaker, confirmed) and Jane Smith (Organizer, confirmed) will attend. "
        f"Alice Johnson (Volunteer, not confirmed yet) might join. The event will be held at Tech Center, 123 Innovation St., with a capacity of 500 people."
    )
    messages = [
        {"role": "user", "content": prompt},
    ]
    response = client.chat.completions.create(
        model="meta/llama-3.1-70b-instruct",
        messages=messages,
        extra_body={"guided_json": json_schema},
        stream=False
    )
    assistant_message = response.choices[0].message.content
    print(assistant_message)
    # Prints:
    # {
    #   "event_name": "Annual Tech Conference",
    #   "date": "2024-09-15",
    #   "attendees": [
    #     {"name": "John Doe", "role": "Speaker", "confirmed": true},
    #     {"name": "Jane Smith", "role": "Organizer", "confirmed": true},
    #     {"name": "Alice Johnson", "role": "Volunteer", "confirmed": false}
    #   ],
    #   "venue": {
    #     "name": "Tech Center",
    #     "address": "123 Innovation St.",
    #     "capacity": 500
    #   }
    # }

By using JSON schemas, you can ensure that the LLM's output adheres to a specific structure, which makes the generated data easier to process and validate in your application's workflow.

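The downstream processing and validation mentioned above could look something like the following. This is a minimal sketch, not part of NIM or the OpenAI client: `parse_review` is a hypothetical helper, and the sample string is copied from the printed output of the movie review example so that the snippet runs without a server. It also illustrates why an empty object, which `response_format={"type": "json_object"}` would permit, is not acceptable for this schema.

```python
import json

# Sample output copied from the movie review example. In a real pipeline this
# would be response.choices[0].message.content.
assistant_message = '{"title":"Inception", "rating":4.0}'


def parse_review(raw: str) -> dict:
    """Hypothetical helper: parse the JSON and check the top-level required keys."""
    data = json.loads(raw)
    expected = {"title": str, "rating": (int, float)}
    for key, expected_type in expected.items():
        if key not in data:
            raise ValueError(f"missing required key: {key}")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(data[key], bool) or not isinstance(data[key], expected_type):
            raise ValueError(f"unexpected type for key: {key}")
    return data


review = parse_review(assistant_message)
print(review["title"], review["rating"])

# An empty object is valid JSON but does not satisfy the schema.
try:
    parse_review("{}")
except ValueError as err:
    print("rejected:", err)
```

For schemas with nesting, a full JSON Schema validator is likely a better fit than hand-written checks like these.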
Regular Expressions#

You can specify a regular expression for the output format by using the guided_regex parameter.

    from openai import OpenAI

    client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")

    regex = "[1-5]"

    prompt = (
        "Return just the rating based on the following movie review\n"
        "Review: This movie exceeds expectations. I rate it four stars out of five."
    )
    messages = [
        {"role": "user", "content": prompt},
    ]
    response = client.chat.completions.create(
        model="meta/llama3-8b-instruct",
        messages=messages,
        extra_body={"guided_regex": regex},
        stream=False
    )
    assistant_message = response.choices[0].message.content
    print(assistant_message)
    # Prints:
    # 4

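A caller may still want to check the constrained rating locally before converting it to a number. The sketch below assumes the output is the single character shown in the example above and uses Python's standard `re` module. Note that Python's regular expression dialect is not guaranteed to match the dialect of the guided decoding backend, so this is a sanity check on the result rather than a test of the backend.

```python
import re

regex = "[1-5]"          # same pattern that was passed to guided_regex
assistant_message = "4"  # sample output copied from the example above

# fullmatch requires the entire string to match the pattern, which mirrors
# the expectation that the output consists only of the rating.
match = re.fullmatch(regex, assistant_message)
if match is None:
    raise ValueError(f"output does not match {regex!r}: {assistant_message!r}")

rating = int(match.group(0))
print(rating)
```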
Choices#

You can restrict the output to one of a fixed list of choices by using the guided_choice parameter.

    from openai import OpenAI

    client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")

    choices = ["Good", "Bad", "Neutral"]

    prompt = (
        f"Return the sentiment based on the following movie review. It should be one of {choices}\n"
        f"Review: This movie exceeds expectations. I rate it four stars out of five."
    )
    messages = [
        {"role": "user", "content": prompt},
    ]
    response = client.chat.completions.create(
        model="meta/llama3-8b-instruct",
        messages=messages,
        extra_body={"guided_choice": choices},
        stream=False
    )
    assistant_message = response.choices[0].message.content
    print(assistant_message)
    # Prints:
    # Good