Structured Outputs

Data Designer can generate structured data that conforms to schemas you define. This guide explains how to define a schema and use it in your data generation workflows.

What Are Structured Outputs?

Structured outputs let you generate data with a specific format, schema, and nested relationships. Instead of producing free-form text, each generated value is an object that conforms to a schema you define.

Use cases include:

  • Complex nested records (e.g., orders with line items)

  • Nested arrays and objects (e.g., lists of products)

  • Structured conversation data (e.g., chat logs)
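The minimal Python sketch below uses only the standard library to make the contrast with free-form text concrete. The order text, field names, and values are made up for illustration. Free-form output has to be parsed heuristically. A structured output, such as an order with line items, can be loaded and traversed directly.

```python
import json

# Free-form output: a sentence that downstream code would have to parse heuristically
free_form = "Ada ordered a desk lamp for $29.99 and a notebook for $4.50."

# Structured output: a JSON object with a fixed shape (hypothetical field names)
structured_json = """
{
  "customer_name": "Ada",
  "products": [
    {"name": "Desk Lamp", "price": 29.99},
    {"name": "Notebook", "price": 4.50}
  ]
}
"""

order = json.loads(structured_json)

# Nested line items are read directly; no text parsing is needed
product_names = [p["name"] for p in order["products"]]
total = round(sum(p["price"] for p in order["products"]), 2)
print(product_names, total)
```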

Defining Data Models with Pydantic

The most common way to define a structured output is with a Pydantic model. For example, the following models describe an order that contains a list of products:

from pydantic import BaseModel, Field

# Define a simple product model
class Product(BaseModel):
    name: str = Field(..., description="Name of the product")
    price: float = Field(..., description="Price in USD")
    category: str = Field(..., description="Product category")
    in_stock: bool = Field(..., description="Whether the product is in stock")

# Define an order with a list of products
class Order(BaseModel):
    order_id: str = Field(..., description="Unique order identifier")
    customer_name: str = Field(..., description="Name of the customer")
    order_date: str = Field(..., description="Date the order was placed")
    total_amount: float = Field(..., description="Total order amount")
    products: list[Product] = Field(..., description="List of products in the order")
    shipping_address: dict = Field(..., description="Shipping address")
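These are ordinary Pydantic models, so you can check them locally before using them in Data Designer. The sketch below assumes Pydantic v2, which provides model_json_schema and model_validate. It repeats the two models so that it runs on its own. It then prints the JSON Schema generated for the products field and validates a made-up sample order. A record that is missing a required field raises ValidationError.

```python
import json

from pydantic import BaseModel, Field, ValidationError


# Same models as above, repeated so this snippet runs on its own
class Product(BaseModel):
    name: str = Field(..., description="Name of the product")
    price: float = Field(..., description="Price in USD")
    category: str = Field(..., description="Product category")
    in_stock: bool = Field(..., description="Whether the product is in stock")


class Order(BaseModel):
    order_id: str = Field(..., description="Unique order identifier")
    customer_name: str = Field(..., description="Name of the customer")
    order_date: str = Field(..., description="Date the order was placed")
    total_amount: float = Field(..., description="Total order amount")
    products: list[Product] = Field(..., description="List of products in the order")
    shipping_address: dict = Field(..., description="Shipping address")


# Inspect the JSON Schema that Pydantic derives for the nested products field
schema = Order.model_json_schema()
print(json.dumps(schema["properties"]["products"], indent=2))

# A made-up record that conforms to the schema
sample = {
    "order_id": "ORD-0001",
    "customer_name": "Ada Lopez",
    "order_date": "2024-05-01",
    "total_amount": 34.49,
    "products": [
        {"name": "Desk Lamp", "price": 29.99, "category": "Home", "in_stock": True},
        {"name": "Notebook", "price": 4.50, "category": "Office", "in_stock": True},
    ],
    "shipping_address": {"street": "1 Main St", "city": "Chicago"},
}
order = Order.model_validate(sample)
print(order.products[0].name, order.total_amount)

# A record missing a required field is rejected with a ValidationError
missing_loc = None
try:
    Order.model_validate({k: v for k, v in sample.items() if k != "products"})
except ValidationError as err:
    missing_loc = err.errors()[0]["loc"]
print("Missing field:", missing_loc)
```

shipping_address is typed as a plain dict, so the generated schema does not constrain its keys. A nested model, like Product for the line items, would pin down its shape.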

Using Structured Outputs in Data Designer

Before getting started, ensure you have the Data Designer client and configuration builder set up:

import os
from nemo_microservices import NeMoMicroservices
from nemo_microservices.beta.data_designer import DataDesignerClient, DataDesignerConfigBuilder
from nemo_microservices.beta.data_designer.config import columns as C
from nemo_microservices.beta.data_designer.config import params as P

# Connect to your NeMo Microservices deployment (NEMO_MICROSERVICES_BASE_URL must be set)
data_designer_client = DataDesignerClient(
    client=NeMoMicroservices(base_url=os.environ["NEMO_MICROSERVICES_BASE_URL"])
)

# Replace the placeholder path with the path to your model configuration file
config_builder = DataDesignerConfigBuilder(model_configs="path/to/your/model_configs.yaml")

Adding a Structured Column

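Use the LLMStructuredColumn class to add a column whose values an LLM generates in the format given by output_format. The example below first adds a sampler column, customer_city, which the structured column's prompt then references.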

# Add a sampler column for customer information
config_builder.add_column(
    C.SamplerColumn(
        name="customer_city",
        type=P.SamplerType.CATEGORY,
        params=P.CategorySamplerParams(
            values=["New York", "Los Angeles", "Chicago", "Houston"]
        )
    )
)

# Add structured order data column
config_builder.add_column(
    C.LLMStructuredColumn(
        name="order_data",
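        # Adjacent string literals are joined into a single prompt string;
        # {{customer_city}} is filled in from the sampler column above
        prompt=(
            "Generate a realistic order for a customer from {{customer_city}}. "
            "Include between 1 and 5 products in the order."
        ),
        output_format=Order,  # The Pydantic model defined earlier
        model_alias="structured"  # Must match a model alias in your model configs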
    )
)
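The {{customer_city}} placeholder in the prompt refers to the customer_city sampler column. You can preview the prompts a run might send with the local sketch below, which uses only the standard library. It makes two assumptions. First, substitution is Jinja-style, meaning each {{ column_name }} placeholder is replaced by that row's value for the column. Second, the category sampler picks values uniformly when no weights are given. render_prompt and preview_prompts are hypothetical helpers and are not part of the Data Designer API.

```python
import random
import re

# Values copied from the CategorySamplerParams in the example above
CITY_VALUES = ["New York", "Los Angeles", "Chicago", "Houston"]

# The prompt from the example above, as a single string
PROMPT_TEMPLATE = (
    "Generate a realistic order for a customer from {{customer_city}}. "
    "Include between 1 and 5 products in the order."
)


def render_prompt(template: str, row: dict[str, str]) -> str:
    """Replace each {{ column_name }} placeholder with that column's value in the row."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: row[m.group(1)], template)


def preview_prompts(n: int, seed: int = 0) -> list[str]:
    """Sample customer_city uniformly (an assumption) and render n prompts."""
    rng = random.Random(seed)
    return [
        render_prompt(PROMPT_TEMPLATE, {"customer_city": rng.choice(CITY_VALUES)})
        for _ in range(n)
    ]


for prompt in preview_prompts(3):
    print(prompt)
```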

For end-to-end examples, we recommend following along with the Tutorial Notebooks.