Large Language Models (1.1.0)

Structured Generation

NIM LLM supports generating structured outputs by specifying a JSON schema, a regular expression, a context-free grammar, or a fixed set of choices. This is useful when NIM is part of a larger pipeline and the LLM outputs are expected to follow a particular format. The following examples show how outputs can be constrained in each of these ways.

You can constrain the output to follow a particular JSON schema by using the guided_json parameter in the nvext extension to the OpenAI schema. This approach is particularly useful in several scenarios:

  • Ensuring consistent output format for downstream processing

  • Validating complex data structures

  • Automating data extraction from unstructured text

  • Improving reliability in multi-step pipelines

Important

NVIDIA recommends that you specify a JSON schema using the guided_json parameter instead of setting response_format={"type": "json_object"}. Using the response_format parameter with type "json_object" enables the model to generate any valid JSON, including an empty JSON object.
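For contrast, the following minimal sketch shows the less strict response_format approach that this note advises against (a hypothetical variation on the examples below, assuming the same local NIM endpoint and model); because no schema constrains the generation, the model may return any valid JSON shape:

from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")

messages = [
    {"role": "user", "content": "Return the title and rating from this review as JSON.\n"
                                "Review: Inception is a really well made film. I rate it four stars out of five."},
]

# Not recommended: any valid JSON, including an empty object, satisfies this constraint.
response = client.chat.completions.create(
    model="meta/llama-3.1-70b-instruct",
    messages=messages,
    response_format={"type": "json_object"},
    stream=False
)
print(response.choices[0].message.content)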

Basic Example: Movie Review


from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")

json_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "rating": {"type": "number"}
    },
    "required": ["title", "rating"]
}

prompt = (f"Return the title and the rating based on the following movie review according to this JSON schema:{str(json_schema)}.\n"
          f"Review: Inception is a really well made film. I rate it four stars out of five.")

messages = [
    {"role": "user", "content": prompt},
]

response = client.chat.completions.create(
    model="meta/llama-3.1-70b-instruct",
    messages=messages,
    extra_body={"nvext": {"guided_json": json_schema}},
    stream=False
)

assistant_message = response.choices[0].message.content
print(assistant_message)
# Prints:
# {"title":"Inception", "rating":4.0}

Advanced Example: Product Information

This example demonstrates a more complex schema for extracting detailed product information:


json_schema = {
    "type": "object",
    "properties": {
        "product_name": {"type": "string"},
        "price": {"type": "number"},
        "features": {
            "type": "array",
            "items": {"type": "string"}
        },
        "availability": {
            "type": "object",
            "properties": {
                "in_stock": {"type": "boolean"},
                "shipping_time": {"type": "string"}
            },
            "required": ["in_stock", "shipping_time"]
        }
    },
    "required": ["product_name", "price", "features", "availability"]
}

prompt = (f"Extract product information from the following description according to this JSON schema:{str(json_schema)}.\n"
          f"Description: The XYZ Smartwatch is our latest offering, priced at $299.99. It features a heart rate monitor, "
          f"GPS tracking, and water resistance up to 50 meters. The product is currently in stock and ships within 2-3 business days.")

messages = [
    {"role": "user", "content": prompt},
]

response = client.chat.completions.create(
    model="meta/llama-3.1-70b-instruct",
    messages=messages,
    extra_body={"nvext": {"guided_json": json_schema}},
    stream=False
)

assistant_message = response.choices[0].message.content
print(assistant_message)
# Prints:
# {
#   "product_name": "XYZ Smartwatch",
#   "price": 299.99,
#   "features": [
#     "heart rate monitor",
#     "GPS tracking",
#     "water resistance up to 50 meters"
#   ],
#   "availability": {
#     "in_stock": true,
#     "shipping_time": "2-3 business days"
#   }
# }

Example: Nested Structures for Event Planning

This example showcases how JSON schemas can handle nested structures, which is useful for complex data representations:


json_schema = {
    "type": "object",
    "properties": {
        "event_name": {"type": "string"},
        "date": {"type": "string", "format": "date"},
        "attendees": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "role": {"type": "string"},
                    "confirmed": {"type": "boolean"}
                },
                "required": ["name", "role", "confirmed"]
            }
        },
        "venue": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "address": {"type": "string"},
                "capacity": {"type": "integer"}
            },
            "required": ["name", "address", "capacity"]
        }
    },
    "required": ["event_name", "date", "attendees", "venue"]
}

prompt = (f"Create an event plan based on the following information using this JSON schema:{str(json_schema)}.\n"
          f"Information: We're planning the Annual Tech Conference on 2024-09-15. John Doe (Speaker, confirmed) and Jane Smith (Organizer, confirmed) will attend. "
          f"Alice Johnson (Volunteer, not confirmed yet) might join. The event will be held at Tech Center, 123 Innovation St., with a capacity of 500 people.")

messages = [
    {"role": "user", "content": prompt},
]

response = client.chat.completions.create(
    model="meta/llama-3.1-70b-instruct",
    messages=messages,
    extra_body={"nvext": {"guided_json": json_schema}},
    stream=False
)

assistant_message = response.choices[0].message.content
print(assistant_message)
# Prints:
# {
#   "event_name": "Annual Tech Conference",
#   "date": "2024-09-15",
#   "attendees": [
#     {"name": "John Doe", "role": "Speaker", "confirmed": true},
#     {"name": "Jane Smith", "role": "Organizer", "confirmed": true},
#     {"name": "Alice Johnson", "role": "Volunteer", "confirmed": false}
#   ],
#   "venue": {
#     "name": "Tech Center",
#     "address": "123 Innovation St.",
#     "capacity": 500
#   }
# }

By using JSON schemas, you can ensure that the LLM’s output adheres to a specific structure, making it easier to process and validate the generated data in your application’s workflow.
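As a minimal sketch of that downstream step (not part of the examples above), the snippet below parses the reply from the Basic Example with the standard json module and, assuming the third-party jsonschema package is installed, validates it against the same schema:

import json

from jsonschema import validate  # assumption: the third-party jsonschema package is available

# assistant_message and json_schema come from the Basic Example above
data = json.loads(assistant_message)          # raises json.JSONDecodeError if the reply is not valid JSON
validate(instance=data, schema=json_schema)   # raises jsonschema.ValidationError if a required field is missing
print(data["title"], data["rating"])
# Prints:
# Inception 4.0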

You can specify a regular expression for the output format using the guided_regex parameter in the nvext extension to the OpenAI schema.


from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")

regex = "[1-5]"

prompt = (f"Return just the rating based on the following movie review\n"
          f"Review: This movie exceeds expectations. I rate it four stars out of five.")

messages = [
    {"role": "user", "content": prompt},
]

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=messages,
    extra_body={"nvext": {"guided_regex": regex}},
    stream=False
)

assistant_message = response.choices[0].message.content
print(assistant_message)
# Prints:
# 4

You can specify a list of choices for the output using the guided_choice parameter in the nvext extension to the OpenAI schema.


from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")

choices = ["Good", "Bad", "Neutral"]

prompt = (f"Return the sentiment based on the following movie review. It should be one of {choices}\n"
          f"Review: This movie exceeds expectations. I rate it four stars out of five.")

messages = [
    {"role": "user", "content": prompt},
]

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=messages,
    extra_body={"nvext": {"guided_choice": choices}},
    stream=False
)

assistant_message = response.choices[0].message.content
print(assistant_message)
# Prints:
# Good

You can specify a context-free grammar in the EBNF format using the guided_grammar parameter in the nvext extension to the OpenAI schema.


from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")

grammar = """
?start: "The movie name is rated " rating " stars."
?rating: /[1-5]/
"""

prompt = (f"Summarize the following movie review:\n"
          f"Review: This movie exceeds expectations. I rate it four stars out of five.")

messages = [
    {"role": "user", "content": prompt},
]

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=messages,
    extra_body={"nvext": {"guided_grammar": grammar}},
    stream=False
)

completion = response.choices[0].message.content
print(completion)
# Prints:
# The movie name is rated 4 stars.
