Structured Generation#

NIM LLM supports structured output: you can constrain generation by specifying a JSON schema, a regular expression, a context-free grammar, or a fixed list of choices. This is useful when NIM is part of a larger pipeline in which the LLM outputs must follow a particular format. The examples below show how to constrain the output in each of these ways.

JSON Schema#

You can constrain the output to follow a particular JSON schema by using the guided_json parameter in the nvext extension to the OpenAI schema. This approach is particularly useful in several scenarios:

  • Ensuring consistent output format for downstream processing

  • Validating complex data structures

  • Automating data extraction from unstructured text

  • Improving reliability in multi-step pipelines

Important

NVIDIA recommends that you specify a JSON schema using the guided_json parameter instead of setting response_format={"type": "json_object"}. Using the response_format parameter with type "json_object" enables the model to generate any valid JSON, including an empty JSON object.
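For illustration, here is how the two calls differ; this sketch assumes the client and messages defined in the basic example below:

# Less strict: the model may return any valid JSON, including an empty object.
response = client.chat.completions.create(
    model="meta/llama-3.1-70b-instruct",
    messages=messages,
    response_format={"type": "json_object"},
)

# Recommended: the output must conform to json_schema.
response = client.chat.completions.create(
    model="meta/llama-3.1-70b-instruct",
    messages=messages,
    extra_body={"nvext": {"guided_json": json_schema}},
)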

Basic Example: Movie Review#

from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")
json_schema = {
    "type": "object",
    "properties": {
        "title": {
            "type": "string"
        },
        "rating": {
            "type": "number"
        }
    },
    "required": [
        "title",
        "rating"
    ]
}
prompt = (f"Return the title and the rating based on the following movie review according to this JSON schema: {str(json_schema)}.\n"
          f"Review: Inception is a really well made film. I rate it four stars out of five.")
messages = [
    {"role": "user", "content": prompt},
]
response = client.chat.completions.create(
    model="meta/llama-3.1-70b-instruct",
    messages=messages,
    extra_body={"nvext": {"guided_json": json_schema}},
    stream=False
)
assistant_message = response.choices[0].message.content
print(assistant_message)
# Prints:
# {"title":"Inception", "rating":4.0}

Advanced Example: Product Information#

This example demonstrates a more complex schema for extracting detailed product information:

json_schema = {
    "type": "object",
    "properties": {
        "product_name": {"type": "string"},
        "price": {"type": "number"},
        "features": {
            "type": "array",
            "items": {"type": "string"}
        },
        "availability": {
            "type": "object",
            "properties": {
                "in_stock": {"type": "boolean"},
                "shipping_time": {"type": "string"}
            },
            "required": ["in_stock", "shipping_time"]
        }
    },
    "required": ["product_name", "price", "features", "availability"]
}

prompt = (f"Extract product information from the following description according to this JSON schema: {str(json_schema)}.\n"
          f"Description: The XYZ Smartwatch is our latest offering, priced at $299.99. It features a heart rate monitor, "
          f"GPS tracking, and water resistance up to 50 meters. The product is currently in stock and ships within 2-3 business days.")

messages = [
    {"role": "user", "content": prompt},
]

response = client.chat.completions.create(
    model="meta/llama-3.1-70b-instruct",
    messages=messages,
    extra_body={"nvext": {"guided_json": json_schema}},
    stream=False
)

assistant_message = response.choices[0].message.content
print(assistant_message)
# Prints:
# {
#     "product_name": "XYZ Smartwatch",
#     "price": 299.99,
#     "features": [
#         "heart rate monitor",
#         "GPS tracking",
#         "water resistance up to 50 meters"
#     ],
#     "availability": {
#         "in_stock": true,
#         "shipping_time": "2-3 business days"
#     }
# }
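If your pipeline needs an extra safety check, you can validate the parsed result against the same schema. This sketch assumes the third-party jsonschema package is installed (pip install jsonschema):

import json
from jsonschema import validate

product = json.loads(assistant_message)
validate(instance=product, schema=json_schema)  # raises ValidationError on mismatch
print(product["availability"]["shipping_time"])
# e.g. 2-3 business days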

Example: Nested Structures for Event Planning#

This example shows how JSON schemas can describe nested structures, which is useful for representing complex data:

json_schema = {
    "type": "object",
    "properties": {
        "event_name": {"type": "string"},
        "date": {"type": "string", "format": "date"},
        "attendees": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "role": {"type": "string"},
                    "confirmed": {"type": "boolean"}
                },
                "required": ["name", "role", "confirmed"]
            }
        },
        "venue": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "address": {"type": "string"},
                "capacity": {"type": "integer"}
            },
            "required": ["name", "address", "capacity"]
        }
    },
    "required": ["event_name", "date", "attendees", "venue"]
}

prompt = (f"Create an event plan based on the following information using this JSON schema: {str(json_schema)}.\n"
          f"Information: We're planning the Annual Tech Conference on 2024-09-15. John Doe (Speaker, confirmed) and Jane Smith (Organizer, confirmed) will attend. "
          f"Alice Johnson (Volunteer, not confirmed yet) might join. The event will be held at Tech Center, 123 Innovation St., with a capacity of 500 people.")

messages = [
    {"role": "user", "content": prompt},
]

response = client.chat.completions.create(
    model="meta/llama-3.1-70b-instruct",
    messages=messages,
    extra_body={"nvext": {"guided_json": json_schema}},
    stream=False
)

assistant_message = response.choices[0].message.content
print(assistant_message)
# Prints:
# {
#     "event_name": "Annual Tech Conference",
#     "date": "2024-09-15",
#     "attendees": [
#         {"name": "John Doe", "role": "Speaker", "confirmed": true},
#         {"name": "Jane Smith", "role": "Organizer", "confirmed": true},
#         {"name": "Alice Johnson", "role": "Volunteer", "confirmed": false}
#     ],
#     "venue": {
#         "name": "Tech Center",
#         "address": "123 Innovation St.",
#         "capacity": 500
#     }
# }
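Nested structures parse into ordinary Python dictionaries and lists, so downstream code can traverse them directly; for example:

import json

plan = json.loads(assistant_message)
unconfirmed = [a["name"] for a in plan["attendees"] if not a["confirmed"]]
print(unconfirmed)
# e.g. ['Alice Johnson']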

By using JSON schemas, you can ensure that the LLM’s output adheres to a specific structure, making it easier to process and validate the generated data in your application’s workflow.

Regular Expressions#

You can specify a regular expression for the output format using the guided_regex parameter in the nvext extension to the OpenAI schema.

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")
regex = "[1-5]"
prompt = (f"Return just the rating based on the following movie review\n"
          f"Review: This movie exceeds expectations. I rate it four stars out of five.")
messages = [
    {"role": "user", "content": prompt},
]
response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=messages,
    extra_body={"nvext": {"guided_regex": regex}},
    stream=False
)
assistant_message = response.choices[0].message.content
print(assistant_message)
# Prints:
# 4
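Regular expressions can also encode richer formats. As a sketch, this hypothetical pattern (not taken from the examples above) constrains the output to a rating with one decimal place:

regex = r"[1-5]\.[05]"  # hypothetical pattern: matches e.g. "3.5" or "4.0"
response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=messages,
    extra_body={"nvext": {"guided_regex": regex}},
    stream=False
)
print(response.choices[0].message.content)
# e.g. 4.0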

Choices#

You can specify a list of choices for the output using the guided_choice parameter in the nvext extension to the OpenAI schema.

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")
choices = ["Good", "Bad", "Neutral"]
prompt = (f"Return the sentiment based on the following movie review. It should be one of {choices}\n"
          f"Review: This movie exceeds expectations. I rate it four stars out of five.")
messages = [
    {"role": "user", "content": prompt},
]
response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=messages,
    extra_body={"nvext": {"guided_choice": choices}},
    stream=False
)
assistant_message = response.choices[0].message.content
print(assistant_message)
# Prints:
# Good
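Because the output is guaranteed to be one of the listed strings, downstream code can branch on it without defensive parsing; a minimal sketch:

sentiment = assistant_message  # one of "Good", "Bad", or "Neutral"
score = {"Good": 1, "Neutral": 0, "Bad": -1}[sentiment]  # no KeyError possible
print(score)
# e.g. 1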

Context-free Grammar#

You can specify a context-free grammar in the EBNF format using the guided_grammar parameter in the nvext extension to the OpenAI schema.

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")
grammar = """
    ?start: "The movie name is rated " rating " stars."

    ?rating: /[1-5]/
"""

prompt = (f"Summarize the following movie review:\n"
          f"Review: This movie exceeds expectations. I rate it four stars out of five.")
messages = [
    {"role": "user", "content": prompt},
]
response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=messages,
    extra_body={"nvext": {"guided_grammar": grammar}},
    stream=False
)
completion = response.choices[0].message.content
print(completion)
# Prints:
# The movie name is rated 4 stars.
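The same mechanism supports larger grammars in the same EBNF dialect. As a sketch (these productions are illustrative, not taken from the NIM documentation), this grammar forces a fixed sentence shape with a constrained sentiment word and rating:

grammar = """
    ?start: "Sentiment: " sentiment ". Rating: " rating "/5."

    ?sentiment: "positive" | "negative" | "mixed"

    ?rating: /[1-5]/
"""
response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=messages,
    extra_body={"nvext": {"guided_grammar": grammar}},
    stream=False
)
print(response.choices[0].message.content)
# e.g. Sentiment: positive. Rating: 4/5.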