Large Language Models (Latest)

Structured Generation

NIM LLM supports structured output: you can constrain generation by specifying a JSON schema, a regular expression, a context-free grammar, or a fixed set of choices. This is useful when NIM is part of a larger pipeline and the LLM output must follow a particular format. The examples below show each way the output can be constrained.

You can constrain the output to follow a particular JSON schema by using the guided_json parameter in the nvext extension to the OpenAI schema.

Important

We recommend specifying a JSON schema with the guided_json parameter, as shown below, rather than setting response_format={"type": "json_object"}, because the latter can produce empty outputs.

```python
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")

json_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "rating": {"type": "number"}
    },
    "required": ["title", "rating"]
}

prompt = (f"Return the title and the rating based on the following movie review according to this JSON schema: {json_schema}.\n"
          f"Review: Inception is a really well made film. I rate it four stars out of five.")
messages = [
    {"role": "user", "content": prompt},
]

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=messages,
    extra_body={"nvext": {"guided_json": json_schema}},
    stream=False
)

assistant_message = response.choices[0].message.content
print(assistant_message)
# Prints:
# {"title":"Inception", "rating":4.0}
```
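Because guided_json constrains decoding to the schema, the returned string can be parsed directly with the standard json module. A minimal sketch of the consuming side, using the sample output string from the example above:

```python
import json

# Sample assistant output, as produced by the guided_json example above.
assistant_message = '{"title":"Inception", "rating":4.0}'

# The constraint guarantees schema-conformant JSON, so json.loads should not
# fail; a defensive key check is still reasonable at a pipeline boundary.
record = json.loads(assistant_message)
assert set(record) == {"title", "rating"}
print(record["title"], record["rating"])
# Prints:
# Inception 4.0
```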

You can specify a regular expression for the output format using the guided_regex parameter in the nvext extension to the OpenAI schema.

```python
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")

regex = "[1-5]"

prompt = (f"Return just the rating based on the following movie review\n"
          f"Review: This movie exceeds expectations. I rate it four stars out of five.")
messages = [
    {"role": "user", "content": prompt},
]

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=messages,
    extra_body={"nvext": {"guided_regex": regex}},
    stream=False
)

assistant_message = response.choices[0].message.content
print(assistant_message)
# Prints:
# 4
```
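Since the server constrains decoding to the pattern, the response can be converted to a number without further prompting tricks. A small sketch (assuming the sample output "4" from the example above) that mirrors the server-side constraint with a client-side check:

```python
import re

regex = "[1-5]"          # same pattern passed as guided_regex above
assistant_message = "4"  # sample output from the example above

# fullmatch ensures the entire response matches the pattern, not just a prefix.
assert re.fullmatch(regex, assistant_message) is not None

rating = int(assistant_message)
print(rating)
# Prints:
# 4
```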

You can specify a list of choices for the output using the guided_choice parameter in the nvext extension to the OpenAI schema.

```python
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")

choices = ["Good", "Bad", "Neutral"]

prompt = (f"Return the sentiment based on the following movie review. It should be one of {choices}\n"
          f"Review: This movie exceeds expectations. I rate it four stars out of five.")
messages = [
    {"role": "user", "content": prompt},
]

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=messages,
    extra_body={"nvext": {"guided_choice": choices}},
    stream=False
)

assistant_message = response.choices[0].message.content
print(assistant_message)
# Prints:
# Good
```

You can specify a context-free grammar in the EBNF format using the guided_grammar parameter in the nvext extension to the OpenAI schema.

```python
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")

grammar = """
?start: "The movie name is rated " rating " stars."
?rating: /[1-5]/
"""

prompt = (f"Summarize the following movie review:\n"
          f"Review: This movie exceeds expectations. I rate it four stars out of five.")
messages = [
    {"role": "user", "content": prompt},
]

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=messages,
    extra_body={"nvext": {"guided_grammar": grammar}},
    stream=False
)

assistant_message = response.choices[0].message.content
print(assistant_message)
# Prints:
# The movie name is rated 4 stars.
```

© Copyright © 2024, NVIDIA Corporation. Last updated on Jul 26, 2024.