Structured Outputs
Data Designer can generate structured data that conforms to user-defined schemas. This guide explains how to use structured outputs in your data generation workflows.
What Are Structured Outputs?
Structured outputs allow you to generate data with specific formats, schemas, and nested relationships. Instead of generating free-form text, you can generate dataset objects that conform to a specific schema.
Use cases include:
- Complex nested records (e.g., orders with line items)
- Nested arrays and objects (e.g., lists of products)
- Structured conversation data (e.g., chat logs)
Defining Data Models with Pydantic
The most common way to define structured outputs is using Pydantic models. For example, you can define an order with a list of products as follows:
from pydantic import BaseModel, Field


# Define a simple product model
class Product(BaseModel):
    name: str = Field(..., description="Name of the product")
    price: float = Field(..., description="Price in USD")
    category: str = Field(..., description="Product category")
    in_stock: bool = Field(..., description="Whether the product is in stock")


# Define an order with a list of products
class Order(BaseModel):
    order_id: str = Field(..., description="Unique order identifier")
    customer_name: str = Field(..., description="Name of the customer")
    order_date: str = Field(..., description="Date the order was placed")
    total_amount: float = Field(..., description="Total order amount")
    products: list[Product] = Field(..., description="List of products in the order")
    shipping_address: dict = Field(..., description="Shipping address")
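To see what conforming to such a schema means in practice, the sketch below validates a hand-written record against trimmed-down versions of the models. The sample record and its values are hypothetical and only illustrate the shape of a nested order. This is plain Pydantic behavior, not a description of how Data Designer validates its output internally.

```python
from pydantic import BaseModel, Field, ValidationError


# Trimmed-down versions of the models, kept short for illustration
class Product(BaseModel):
    name: str = Field(..., description="Name of the product")
    price: float = Field(..., description="Price in USD")


class Order(BaseModel):
    order_id: str = Field(..., description="Unique order identifier")
    products: list[Product] = Field(..., description="List of products in the order")


# Hypothetical record, shaped like a nested order with line items
sample = {
    "order_id": "ORD-0001",
    "products": [
        {"name": "Desk lamp", "price": 24.99},
        {"name": "Notebook", "price": 3.5},
    ],
}

# Nested dictionaries are parsed into Product instances
order = Order(**sample)
order_total = sum(product.price for product in order.products)

# A record that omits a required field fails validation
try:
    Order(**{"order_id": "ORD-0002"})
    missing_field_rejected = False
except ValidationError:
    missing_field_rejected = True
```

Because each field is required (`...`), a record that omits `products` raises a `ValidationError` instead of producing a partially filled object.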
Using Structured Outputs in Data Designer
Before getting started, ensure you have the Data Designer client and configuration builder set up:
import os

from nemo_microservices import NeMoMicroservices
from nemo_microservices.beta.data_designer import DataDesignerClient, DataDesignerConfigBuilder
from nemo_microservices.beta.data_designer.config import columns as C
from nemo_microservices.beta.data_designer.config import params as P

data_designer_client = DataDesignerClient(
    client=NeMoMicroservices(base_url=os.environ["NEMO_MICROSERVICES_BASE_URL"])
)

config_builder = DataDesignerConfigBuilder(model_configs="path/to/your/model_configs.yaml")
Adding a Structured Column
Use the LLMStructuredColumn class to add an LLM-generated structured column.
# Add a sampler column for customer information
config_builder.add_column(
    C.SamplerColumn(
        name="customer_city",
        type=P.SamplerType.CATEGORY,
        params=P.CategorySamplerParams(
            values=["New York", "Los Angeles", "Chicago", "Houston"]
        )
    )
)

# Add structured order data column
config_builder.add_column(
    C.LLMStructuredColumn(
        name="order_data",
        # Adjacent string literals (no commas) are joined into a single prompt string
        prompt=(
            "Generate a realistic order for a customer from {{customer_city}}. "
            "Include between 1 and 5 products in the order."
        ),
        output_format=Order,
        model_alias="structured"
    )
)
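The prompt refers to the sampled column through the `{{customer_city}}` placeholder, so each generated row gets a prompt tailored to its city. The sketch below shows the idea with plain string substitution. `render_prompt` is a hypothetical helper written for this illustration, and the source does not describe Data Designer's actual templating engine, so treat it as a mental model and not as the library's implementation.

```python
# Adjacent string literals inside parentheses form one string (no commas between them)
prompt_template = (
    "Generate a realistic order for a customer from {{customer_city}}. "
    "Include between 1 and 5 products in the order."
)


def render_prompt(template: str, row: dict) -> str:
    """Hypothetical helper: replace each {{column}} placeholder with the row's value."""
    rendered = template
    for column, value in row.items():
        rendered = rendered.replace("{{" + column + "}}", str(value))
    return rendered


# Two example rows, as a category sampler might produce them
rows = [{"customer_city": "Chicago"}, {"customer_city": "Houston"}]
prompts = [render_prompt(prompt_template, row) for row in rows]
```

Note that separating the two string literals with commas would create a tuple of strings instead of a single prompt string, which is an easy mistake to make when splitting a long prompt across lines.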
For end-to-end examples, we recommend following along with the Tutorial Notebooks.