🎨 Data Designer Tutorial: Structured Outputs, Jinja Expressions, and Conditional Generation
📚 What you'll learn
In this notebook, we will continue our exploration of Data Designer, demonstrating more advanced data generation using structured outputs, Jinja expressions, and conditional generation with skip.when.
If this is your first time using Data Designer, we recommend starting with the first notebook in this tutorial series.
📦 Import Data Designer
- `data_designer.config` provides access to the configuration API.
- `DataDesigner` is the main interface for data generation.
⚙️ Initialize the Data Designer interface
- `DataDesigner` is the main object that is used to interface with the library.
- When initialized without arguments, the default model providers are used.
🎛️ Define model configurations
- Each `ModelConfig` defines a model that can be used during the generation process.
- The "model alias" is used to reference the model in the Data Designer config (as we will see below).
- The "model provider" is the external service that hosts the model (see the model config docs for more details).
- By default, we use build.nvidia.com as the model provider.
🏗️ Initialize the Data Designer Config Builder
- The Data Designer config defines the dataset schema and generation process.
- The config builder provides an intuitive interface for building this configuration.
- The list of model configs is provided to the builder at initialization.
🧑🎨 Designing our data
- We will again create a product review dataset, but this time we will use structured outputs and Jinja expressions.
- Structured outputs let you specify the exact schema of the data you want to generate.
- Data Designer supports schemas specified using either JSON Schema or Pydantic data models (recommended).

We'll define our structured outputs using Pydantic data models.
💡 Why Pydantic?
Pydantic models provide better IDE support and type validation.
They are more Pythonic than raw JSON schemas.
They integrate seamlessly with Data Designer's structured output system.
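As a rough illustration of what such models can look like, here is a minimal sketch assuming Pydantic v2. The field names mirror the `product` and `customer_review` values that appear in the preview output later in this notebook; the class names, descriptions, and numeric constraints are assumptions made for this example, not the notebook's original definitions.

```python
from pydantic import BaseModel, Field, ValidationError


class Product(BaseModel):
    # Field names follow the generated `product` column; constraints are assumed.
    name: str = Field(description="Name of the product")
    description: str = Field(description="Short description of the product")
    price: float = Field(gt=0, description="Price of the product")


class ProductReview(BaseModel):
    # Field names follow the generated `customer_review` column; constraints are assumed.
    rating: int = Field(ge=1, le=5, description="Star rating from 1 to 5")
    customer_mood: str = Field(description="Mood of the customer, e.g. 'happy'")
    review: str = Field(description="Text of the review")


# The JSON schema derived from the model is what describes the output format.
review_schema = ProductReview.model_json_schema()

# A well-formed record validates cleanly.
valid = ProductReview(rating=5, customer_mood="happy", review="Comfortable chair.")

# A record that violates the schema is rejected at validation time.
try:
    ProductReview(rating=6, customer_mood="happy", review="Too many stars.")
    out_of_range_rejected = False
except ValidationError:
    out_of_range_rejected = True
```

Declaring constraints such as the 1 to 5 rating range on the model means that a malformed record fails validation instead of silently entering the dataset.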
Next, let's design our product review dataset using a few more tricks compared to the previous notebook.
[13:23:15] [INFO] ✅ Validation passed
Next, we will use more advanced Jinja expressions to create new columns.
Jinja expressions let you:
- Access nested attributes: `{{ customer.first_name }}`
- Combine values: `{{ customer.first_name }} {{ customer.last_name }}`
- Use conditional logic: `{% if condition %}...{% endif %}`
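To see how these three constructs behave, the sketch below renders them with the standalone `jinja2` package. Data Designer renders templates with its own engine, so this only illustrates the template syntax. The record is a hand-written stand-in, and the `age` key and the age threshold are hypothetical.

```python
from jinja2 import Environment

env = Environment()

# Hand-written stand-in for one sampled record; `age` is a hypothetical key.
record = {
    "customer": {"first_name": "Matthew", "last_name": "Hansen", "age": 61},
}

# Nested attribute access: dotted lookups also work on dictionaries.
nested = env.from_string("{{ customer.first_name }}").render(**record)

# Combining values into a single string.
full_name = env.from_string(
    "{{ customer.first_name }} {{ customer.last_name }}"
).render(**record)

# Conditional logic inside a template.
age_group = env.from_string(
    "{% if customer.age >= 65 %}senior{% else %}adult{% endif %}"
).render(**record)

print(nested, full_name, age_group, sep=" | ")
```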
[13:23:15] [INFO] ✅ Validation passed
🚦 Conditional generation with skip.when
So far, every column is generated for every row. But sometimes an expensive LLM column only makes sense for a subset of rows — for example, a detailed complaint analysis is only useful when the review is negative.
Data Designer lets you skip column generation on a per-row basis using SkipConfig.
Skipped rows receive None by default, but you can provide a sentinel value with
skip=dd.SkipConfig(when="...", value="N/A") to write a specific value instead.
There are three patterns to know:
| Pattern | How | Effect |
|---|---|---|
| Expression gate | `skip=dd.SkipConfig(when="...")` | Skip this column when the Jinja2 expression is truthy |
| Skip propagation (default) | Downstream column depends on a skipped column | Automatically skipped too (`propagate_skip=True` by default) |
| Propagation opt-out | `propagate_skip=False` on the downstream column | Always generates, even if an upstream was skipped |
Pattern 1 — Expression gate. Only generate a detailed complaint analysis when the customer gave a low rating (1 or 2 stars).
Rows where the rating is 3 or higher will get None for this column.
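The gate itself is just an expression evaluated against the row. The sketch below uses the standalone `jinja2` package to evaluate a condition of that shape. The expression string and the `customer_review.rating` lookup are assumptions based on the column names in this notebook, and this is not necessarily how Data Designer evaluates `when` internally.

```python
from jinja2 import Environment

env = Environment()

# Assumed gate: skip the column when the rating is 3 or higher.
should_skip = env.compile_expression("customer_review.rating >= 3")

# Evaluate the gate against three hand-written rows.
skip_five_star = should_skip(customer_review={"rating": 5})
skip_three_star = should_skip(customer_review={"rating": 3})
skip_one_star = should_skip(customer_review={"rating": 1})

print(skip_five_star, skip_three_star, skip_one_star)
```

Only the one-star row passes the gate, so only that row would pay for the extra LLM call.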
DataDesignerConfigBuilder( sampler_columns: [ "customer", "product_category", "product_subcategory", "target_age_range", "review_style" ] llm_text_columns: ['complaint_analysis'] llm_structured_columns: ['product', 'customer_review'] expression_columns: ['customer_name', 'customer_age'] )
Pattern 2 — Skip propagation. action_items depends on complaint_analysis.
When complaint_analysis is skipped, action_items auto-skips too because
propagate_skip defaults to True.
DataDesignerConfigBuilder( sampler_columns: [ "customer", "product_category", "product_subcategory", "target_age_range", "review_style" ] llm_text_columns: ['complaint_analysis', 'action_items'] llm_structured_columns: ['product', 'customer_review'] expression_columns: ['customer_name', 'customer_age'] )
Pattern 3 — Propagation opt-out. review_summary also depends on complaint_analysis,
but sets propagate_skip=False so it always generates. The prompt uses a Jinja conditional
to handle the case where complaint_analysis is None.
[13:23:15] [INFO] ✅ Validation passed
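The three patterns can be summarized with a small pure-Python model. This is a toy sketch of the semantics described above, not Data Designer's implementation: `ToyColumn` and `generate_row` are hypothetical names, the gate is a plain Python callable instead of a Jinja expression, and columns are assumed to be listed in dependency order.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ToyColumn:
    """Hypothetical stand-in for an LLM column; not a Data Designer class."""
    name: str
    depends_on: list = field(default_factory=list)
    skip_when: Optional[Callable[[dict], bool]] = None  # pattern 1: expression gate
    skip_value: object = None                           # value written when skipped
    propagate_skip: bool = True                         # patterns 2 and 3


def generate_row(row: dict, columns: list) -> tuple:
    """Fill `row` column by column; return it with the set of skipped column names."""
    skipped = set()
    for col in columns:
        gated = col.skip_when is not None and col.skip_when(row)
        inherited = col.propagate_skip and any(d in skipped for d in col.depends_on)
        if gated or inherited:
            row[col.name] = col.skip_value
            skipped.add(col.name)
        else:
            row[col.name] = f"<generated {col.name}>"  # placeholder for an LLM call
    return row, skipped


columns = [
    ToyColumn(
        "complaint_analysis",
        depends_on=["customer_review"],
        skip_when=lambda r: r["customer_review"]["rating"] >= 3,
    ),
    ToyColumn("action_items", depends_on=["complaint_analysis"]),
    ToyColumn("review_summary", depends_on=["complaint_analysis"], propagate_skip=False),
]

happy, happy_skipped = generate_row({"customer_review": {"rating": 5}}, columns)
unhappy, unhappy_skipped = generate_row({"customer_review": {"rating": 1}}, columns)
print(sorted(happy_skipped), sorted(unhappy_skipped))
```

For a five-star row, `complaint_analysis` and `action_items` are skipped while `review_summary` is still generated. For a one-star row, nothing is skipped.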
🔁 Iteration is key – preview the dataset!
- Use the `preview` method to generate a sample of records quickly.
- Inspect the results for quality and format issues.
- Adjust column configurations, prompts, or parameters as needed.
- Re-run the preview until satisfied.
[13:23:15] [INFO] 🔭 Preview generation in progress
[13:23:15] [INFO] |-- 🔒 Jinja rendering engine: secure
[13:23:15] [INFO] ✅ Validation passed
[13:23:15] [INFO] ⛓️ Sorting column configs into a Directed Acyclic Graph
[13:23:15] [INFO] 🩺 Running health checks for models...
[13:23:15] [INFO] |-- 👀 Checking 'nvidia/nemotron-3-nano-30b-a3b' in provider named 'nvidia' for model alias 'nemotron-nano-v3'...
[13:23:16] [INFO] |-- ✅ Passed!
[13:23:16] [INFO] ⚡ DATA_DESIGNER_ASYNC_ENGINE is enabled - using async task-queue preview
[13:23:16] [INFO] 🗂️ llm-structured model config for column 'product'
[13:23:16] [INFO] |-- model: 'nvidia/nemotron-3-nano-30b-a3b'
[13:23:16] [INFO] |-- model alias: 'nemotron-nano-v3'
[13:23:16] [INFO] |-- model provider: 'nvidia'
[13:23:16] [INFO] |-- inference parameters:
[13:23:16] [INFO] | |-- generation_type=chat-completion
[13:23:16] [INFO] | |-- max_parallel_requests=4
[13:23:16] [INFO] | |-- extra_body={'chat_template_kwargs': {'enable_thinking': False}}
[13:23:16] [INFO] | |-- temperature=1.00
[13:23:16] [INFO] | |-- top_p=1.00
[13:23:16] [INFO] | |-- max_tokens=2048
[13:23:16] [INFO] 🗂️ llm-structured model config for column 'customer_review'
[13:23:16] [INFO] |-- model: 'nvidia/nemotron-3-nano-30b-a3b'
[13:23:16] [INFO] |-- model alias: 'nemotron-nano-v3'
[13:23:16] [INFO] |-- model provider: 'nvidia'
[13:23:16] [INFO] |-- inference parameters:
[13:23:16] [INFO] | |-- generation_type=chat-completion
[13:23:16] [INFO] | |-- max_parallel_requests=4
[13:23:16] [INFO] | |-- extra_body={'chat_template_kwargs': {'enable_thinking': False}}
[13:23:16] [INFO] | |-- temperature=1.00
[13:23:16] [INFO] | |-- top_p=1.00
[13:23:16] [INFO] | |-- max_tokens=2048
[13:23:16] [INFO] 📝 llm-text model config for column 'complaint_analysis'
[13:23:16] [INFO] |-- model: 'nvidia/nemotron-3-nano-30b-a3b'
[13:23:16] [INFO] |-- model alias: 'nemotron-nano-v3'
[13:23:16] [INFO] |-- model provider: 'nvidia'
[13:23:16] [INFO] |-- inference parameters:
[13:23:16] [INFO] | |-- generation_type=chat-completion
[13:23:16] [INFO] | |-- max_parallel_requests=4
[13:23:16] [INFO] | |-- extra_body={'chat_template_kwargs': {'enable_thinking': False}}
[13:23:16] [INFO] | |-- temperature=1.00
[13:23:16] [INFO] | |-- top_p=1.00
[13:23:16] [INFO] | |-- max_tokens=2048
[13:23:16] [INFO] 📝 llm-text model config for column 'action_items'
[13:23:16] [INFO] |-- model: 'nvidia/nemotron-3-nano-30b-a3b'
[13:23:16] [INFO] |-- model alias: 'nemotron-nano-v3'
[13:23:16] [INFO] |-- model provider: 'nvidia'
[13:23:16] [INFO] |-- inference parameters:
[13:23:16] [INFO] | |-- generation_type=chat-completion
[13:23:16] [INFO] | |-- max_parallel_requests=4
[13:23:16] [INFO] | |-- extra_body={'chat_template_kwargs': {'enable_thinking': False}}
[13:23:16] [INFO] | |-- temperature=1.00
[13:23:16] [INFO] | |-- top_p=1.00
[13:23:16] [INFO] | |-- max_tokens=2048
[13:23:16] [INFO] 📝 llm-text model config for column 'review_summary'
[13:23:16] [INFO] |-- model: 'nvidia/nemotron-3-nano-30b-a3b'
[13:23:16] [INFO] |-- model alias: 'nemotron-nano-v3'
[13:23:16] [INFO] |-- model provider: 'nvidia'
[13:23:16] [INFO] |-- inference parameters:
[13:23:16] [INFO] | |-- generation_type=chat-completion
[13:23:16] [INFO] | |-- max_parallel_requests=4
[13:23:16] [INFO] | |-- extra_body={'chat_template_kwargs': {'enable_thinking': False}}
[13:23:16] [INFO] | |-- temperature=1.00
[13:23:16] [INFO] | |-- top_p=1.00
[13:23:16] [INFO] | |-- max_tokens=2048
[13:23:16] [INFO] ⚡️ Async generation: 5 column(s) (product, customer_review, complaint_analysis, action_items, review_summary), 10 tasks across 1 row group(s)
[13:23:16] [INFO] 🚀 (1/1) Dispatching with 2 records
[13:23:16] [INFO] 🎲 (1/1) Preparing samplers to generate 2 records across 5 columns
[13:23:16] [INFO] 🧩 (1/1) Generating column `customer_age` from expression
[13:23:16] [INFO] 🧩 (1/1) Generating column `customer_name` from expression
[13:23:20] [INFO] 📊 Progress [4.2s]:
[13:23:20] [INFO] |-- 🌕 product: 2/2 (100%) 0.5 rec/s
[13:23:20] [INFO] |-- 🤩 customer_review: 2/2 (100%) 0.5 rec/s
[13:23:20] [INFO] |-- 🦁 complaint_analysis: 2/2 (100%) 0.5 rec/s, 2 skipped
[13:23:20] [INFO] |-- 🤩 action_items: 2/2 (100%) 0.5 rec/s, 2 skipped
[13:23:20] [INFO] |-- 🐔 review_summary: 2/2 (100%) 0.5 rec/s
[13:23:20] [INFO] ✅ Async generation complete [4.2s]: 6 ok, 0 failed, 4 skipped across 5 column(s)
[13:23:20] [INFO] 📊 Model usage summary:
[13:23:20] [INFO] |-- model: nvidia/nemotron-3-nano-30b-a3b
[13:23:20] [INFO] |-- tokens: input=1991, output=973, total=2964, tps=702
[13:23:20] [INFO] |-- requests: success=6, failed=0, total=6, rpm=85
[13:23:20] [INFO] 🙈 Dropping columns: ['customer']
[13:23:20] [INFO] 📐 Measuring dataset column statistics:
[13:23:20] [INFO] |-- 🎲 column: 'product_category'
[13:23:20] [INFO] |-- 🎲 column: 'product_subcategory'
[13:23:20] [INFO] |-- 🎲 column: 'target_age_range'
[13:23:20] [INFO] |-- 🎲 column: 'review_style'
[13:23:20] [INFO] |-- 🧩 column: 'customer_name'
[13:23:20] [INFO] |-- 🧩 column: 'customer_age'
[13:23:20] [INFO] |-- 🗂️ column: 'product'
[13:23:20] [INFO] |-- 🗂️ column: 'customer_review'
[13:23:20] [INFO] |-- 📝 column: 'complaint_analysis'
[13:23:20] [INFO] |-- 📝 column: 'action_items'
[13:23:20] [INFO] |-- 📝 column: 'review_summary'
[13:23:20] [INFO] ✅ Preview complete!
Generated Columns

| Name | Value |
|---|---|
| product_category | Home Office |
| product_subcategory | Chairs |
| target_age_range | 18-25 |
| review_style | rambling |
| complaint_analysis | None |
| action_items | None |
| review_summary | The ErgoFlex Mesh Gaming Chair delivers a breathable, supportive mesh seat and responsive lumbar and armrests that make long gaming or streaming sessions comfortable, offering excellent ergonomic support at a budget‑friendly price point. |
| product | {'name': 'ErgoFlex Mesh Gaming Chair', 'description': 'An ergonomic mesh gaming chair featuring lumbar support and adjustable armrests, designed for comfortable long‑hour work sessions.', 'price': 129.99} |
| customer_review | {'rating': 5, 'customer_mood': 'happy', 'review': "Alright folks, let's talk about this ErgoFlex Mesh Gaming Chair — yeah, the thing that's basically a throne for folks like us who spend more time typing than standing. I'm Matthew from Josephburgh, Texas, and I’ve been kicking around this chair for a solid week now. Picture this: you sit down, the mesh feels like a gentle breeze, and suddenly you realize you haven’t shifted your back in like ten minutes — because it’s actually supporting you. The lumbar support? It’s like a personal assistant whispering ‘hey, keep that posture intact,’ and you barely notice it, which means it’s doing its job right. The armrests slide around like my favorite fishing reel, letting me tweak them whenever I need to line up a shot or just rest my arms while I’m strategizing my next move on the battlefield of spreadsheets. \n\n Honestly, I was a bit skeptical about another mesh chair making the same old claims, but the comfort level is off the charts. You can sit through a marathon of Netflix binge‑watching or a marathon of gaming without feeling like your hips are staging a protest. The adjustability feels like you’re customizing a pair of boots — snug, but never tight. And at $129.99, it’s not a bank‑breaker, it’s more like a modest investment in a smoother, less achy day. So yeah, I’m happy. If you’re looking for a chair that doesn’t make you feel like you’re sitting on a slab of concrete, give this one a whirl. You’ll thank yourself later when you’re not nursing sore shoulders or a sore back. \n\n Final thoughts: the chair is comfy, the mesh breathes, the lumbar support is a silent hero, and the armrests are basically your new best friend. It’s the kind of seating that makes you wonder why you ever settled for less. Give it a go, and let me know how your own marathon sessions feel afterward!"} |
| customer_name | Matthew Hansen |
| customer_age | 61 |
| | product_category | product_subcategory | target_age_range | review_style | customer_name | customer_age | product | customer_review | complaint_analysis | action_items | review_summary |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Home Office | Chairs | 18-25 | rambling | Matthew Hansen | 61 | {'name': 'ErgoFlex Mesh Gaming Chair', 'descri... | {'rating': 5, 'customer_mood': 'happy', 'revie... | None | None | The ErgoFlex Mesh Gaming Chair delivers a brea... |
| 1 | Home & Kitchen | Decor | 35-50 | rambling | Chad Rodriguez | 58 | {'name': 'City Skyline Graphic Mug', 'descript... | {'rating': 5, 'customer_mood': 'happy', 'revie... | None | None | This 5-star review praises a City Skyline Grap... |
📊 Analyze the generated data
- Data Designer automatically generates a basic statistical analysis of the generated data.
- This analysis is available via the `analysis` property of generation result objects.
🎨 Data Designer Dataset Profile

Dataset Overview

| number of records | number of columns | percent complete records |
|---|---|---|
| 2 | 11 | 100.0% |

🎲 Sampler Columns

| column name | data type | number unique values | sampler type |
|---|---|---|---|
| product_category | string | 2 (100.0%) | category |
| product_subcategory | string | 2 (100.0%) | subcategory |
| target_age_range | string | 2 (100.0%) | category |
| review_style | string | 1 (50.0%) | category |

📝 LLM-Text Columns

| column name | data type | number unique values | prompt tokens per record | completion tokens per record |
|---|---|---|---|---|
| complaint_analysis | None | 0 (0.0%) | 364.0 +/- 98.0 | 1.0 +/- 0.0 |
| action_items | None | 0 (0.0%) | 22.0 +/- 0.0 | 1.0 +/- 0.0 |
| review_summary | string | 2 (100.0%) | 336.0 +/- 98.0 | 56.5 +/- 19.1 |

🗂️ LLM-Structured Columns

| column name | data type | number unique values | prompt tokens per record | completion tokens per record |
|---|---|---|---|---|
| product | dict | 2 (100.0%) | 265.0 +/- 0.0 | 61.0 +/- 18.4 |
| customer_review | dict | 2 (100.0%) | 315.5 +/- 13.5 | 329.5 +/- 142.1 |

🧩 Expression Columns

| column name | data type | number unique values |
|---|---|---|
| customer_name | string | 2 (100.0%) |
| customer_age | string | 2 (100.0%) |

Table Notes

1. All token statistics are based on a sample of max(1000, len(dataset)) records.
2. Tokens are calculated using tiktoken's cl100k_base tokenizer.
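The "number unique values" entries in the profile can be read as a count of distinct values followed by that count as a share of all records. The sketch below reproduces that reading for two sampler columns from the two-record preview. `unique_summary` is a hypothetical helper written for this illustration, not part of Data Designer, and the profile's actual computation may differ.

```python
def unique_summary(values):
    """Format the distinct-value count and its share of all records."""
    n_unique = len(set(values))
    pct = 100.0 * n_unique / len(values)
    return f"{n_unique} ({pct:.1f}%)"


# Sampler values copied from the two preview records shown earlier.
preview_columns = {
    "product_category": ["Home Office", "Home & Kitchen"],
    "review_style": ["rambling", "rambling"],
}

summaries = {name: unique_summary(vals) for name, vals in preview_columns.items()}
print(summaries)
```

A low share on a sampler column in a small preview, such as `review_style` here, mostly reflects the sample size and is worth re-checking on a larger run.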
🆙 Scale up!
- Happy with your preview data?
- Use the `create` method to submit larger Data Designer generation jobs.
[13:23:20] [INFO] 🎨 Creating Data Designer dataset
[13:23:20] [INFO] |-- 🔒 Jinja rendering engine: secure
[13:23:20] [INFO] ✅ Validation passed
[13:23:20] [INFO] ⛓️ Sorting column configs into a Directed Acyclic Graph
[13:23:20] [INFO] 🩺 Running health checks for models...
[13:23:20] [INFO] |-- 👀 Checking 'nvidia/nemotron-3-nano-30b-a3b' in provider named 'nvidia' for model alias 'nemotron-nano-v3'...
[13:23:21] [INFO] |-- ✅ Passed!
[13:23:21] [INFO] ⚡ DATA_DESIGNER_ASYNC_ENGINE is enabled - using async task-queue builder
[13:23:21] [INFO] 🗂️ llm-structured model config for column 'product'
[13:23:21] [INFO] |-- model: 'nvidia/nemotron-3-nano-30b-a3b'
[13:23:21] [INFO] |-- model alias: 'nemotron-nano-v3'
[13:23:21] [INFO] |-- model provider: 'nvidia'
[13:23:21] [INFO] |-- inference parameters:
[13:23:21] [INFO] | |-- generation_type=chat-completion
[13:23:21] [INFO] | |-- max_parallel_requests=4
[13:23:21] [INFO] | |-- extra_body={'chat_template_kwargs': {'enable_thinking': False}}
[13:23:21] [INFO] | |-- temperature=1.00
[13:23:21] [INFO] | |-- top_p=1.00
[13:23:21] [INFO] | |-- max_tokens=2048
[13:23:21] [INFO] 🗂️ llm-structured model config for column 'customer_review'
[13:23:21] [INFO] |-- model: 'nvidia/nemotron-3-nano-30b-a3b'
[13:23:21] [INFO] |-- model alias: 'nemotron-nano-v3'
[13:23:21] [INFO] |-- model provider: 'nvidia'
[13:23:21] [INFO] |-- inference parameters:
[13:23:21] [INFO] | |-- generation_type=chat-completion
[13:23:21] [INFO] | |-- max_parallel_requests=4
[13:23:21] [INFO] | |-- extra_body={'chat_template_kwargs': {'enable_thinking': False}}
[13:23:21] [INFO] | |-- temperature=1.00
[13:23:21] [INFO] | |-- top_p=1.00
[13:23:21] [INFO] | |-- max_tokens=2048
[13:23:21] [INFO] 📝 llm-text model config for column 'complaint_analysis'
[13:23:21] [INFO] |-- model: 'nvidia/nemotron-3-nano-30b-a3b'
[13:23:21] [INFO] |-- model alias: 'nemotron-nano-v3'
[13:23:21] [INFO] |-- model provider: 'nvidia'
[13:23:21] [INFO] |-- inference parameters:
[13:23:21] [INFO] | |-- generation_type=chat-completion
[13:23:21] [INFO] | |-- max_parallel_requests=4
[13:23:21] [INFO] | |-- extra_body={'chat_template_kwargs': {'enable_thinking': False}}
[13:23:21] [INFO] | |-- temperature=1.00
[13:23:21] [INFO] | |-- top_p=1.00
[13:23:21] [INFO] | |-- max_tokens=2048
[13:23:21] [INFO] 📝 llm-text model config for column 'action_items'
[13:23:21] [INFO] |-- model: 'nvidia/nemotron-3-nano-30b-a3b'
[13:23:21] [INFO] |-- model alias: 'nemotron-nano-v3'
[13:23:21] [INFO] |-- model provider: 'nvidia'
[13:23:21] [INFO] |-- inference parameters:
[13:23:21] [INFO] | |-- generation_type=chat-completion
[13:23:21] [INFO] | |-- max_parallel_requests=4
[13:23:21] [INFO] | |-- extra_body={'chat_template_kwargs': {'enable_thinking': False}}
[13:23:21] [INFO] | |-- temperature=1.00
[13:23:21] [INFO] | |-- top_p=1.00
[13:23:21] [INFO] | |-- max_tokens=2048
[13:23:21] [INFO] 📝 llm-text model config for column 'review_summary'
[13:23:21] [INFO] |-- model: 'nvidia/nemotron-3-nano-30b-a3b'
[13:23:21] [INFO] |-- model alias: 'nemotron-nano-v3'
[13:23:21] [INFO] |-- model provider: 'nvidia'
[13:23:21] [INFO] |-- inference parameters:
[13:23:21] [INFO] | |-- generation_type=chat-completion
[13:23:21] [INFO] | |-- max_parallel_requests=4
[13:23:21] [INFO] | |-- extra_body={'chat_template_kwargs': {'enable_thinking': False}}
[13:23:21] [INFO] | |-- temperature=1.00
[13:23:21] [INFO] | |-- top_p=1.00
[13:23:21] [INFO] | |-- max_tokens=2048
[13:23:21] [INFO] ⚡️ Async generation: 5 column(s) (product, customer_review, complaint_analysis, action_items, review_summary), 50 tasks across 1 row group(s)
[13:23:21] [INFO] 🚀 (1/1) Dispatching with 10 records
[13:23:21] [INFO] 🎲 (1/1) Preparing samplers to generate 10 records across 5 columns
[13:23:21] [INFO] 🧩 (1/1) Generating column `customer_age` from expression
[13:23:21] [INFO] 🧩 (1/1) Generating column `customer_name` from expression
[13:23:26] [INFO] 📊 Progress [5.2s]:
[13:23:26] [INFO] |-- 🌗 product: 7/10 (70%) 1.4 rec/s
[13:23:26] [INFO] |-- 🌦️ customer_review: 3/10 (30%) 0.6 rec/s
[13:23:26] [INFO] |-- 🌦️ complaint_analysis: 3/10 (30%) 0.6 rec/s, 3 skipped
[13:23:26] [INFO] |-- 🐴 action_items: 3/10 (30%) 0.6 rec/s, 3 skipped
[13:23:26] [INFO] |-- 🌦️ review_summary: 3/10 (30%) 0.6 rec/s
[13:23:31] [INFO] 📊 Progress [10.7s]:
[13:23:31] [INFO] |-- 🌕 product: 10/10 (100%) 0.9 rec/s
[13:23:31] [INFO] |-- ☀️ customer_review: 10/10 (100%) 0.9 rec/s
[13:23:31] [INFO] |-- ☀️ complaint_analysis: 10/10 (100%) 0.9 rec/s, 10 skipped
[13:23:31] [INFO] |-- 🚀 action_items: 10/10 (100%) 0.9 rec/s, 10 skipped
[13:23:31] [INFO] |-- ☀️ review_summary: 10/10 (100%) 0.9 rec/s
[13:23:31] [INFO] 🙈 Dropping columns: ['customer']
[13:23:31] [INFO] ✅ Async generation complete [10.7s]: 30 ok, 0 failed, 20 skipped across 5 column(s)
[13:23:31] [INFO] 📊 Model usage summary:
[13:23:31] [INFO] |-- model: nvidia/nemotron-3-nano-30b-a3b
[13:23:31] [INFO] |-- tokens: input=8856, output=3656, total=12512, tps=1144
[13:23:31] [INFO] |-- requests: success=31, failed=0, total=31, rpm=170
[13:23:31] [INFO] 📐 Measuring dataset column statistics:
[13:23:31] [INFO] |-- 🎲 column: 'product_category'
[13:23:31] [INFO] |-- 🎲 column: 'product_subcategory'
[13:23:31] [INFO] |-- 🎲 column: 'target_age_range'
[13:23:31] [INFO] |-- 🎲 column: 'review_style'
[13:23:31] [INFO] |-- 🧩 column: 'customer_name'
[13:23:31] [INFO] |-- 🧩 column: 'customer_age'
[13:23:31] [INFO] |-- 🗂️ column: 'product'
[13:23:31] [INFO] |-- 🗂️ column: 'customer_review'
[13:23:32] [INFO] |-- 📝 column: 'complaint_analysis'
[13:23:32] [INFO] |-- 📝 column: 'action_items'
[13:23:32] [INFO] |-- 📝 column: 'review_summary'
| | product_category | product_subcategory | target_age_range | review_style | customer_age | customer_name | product | customer_review | complaint_analysis | action_items | review_summary |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Home & Kitchen | Organization | 65+ | detailed | 23 | Amy Hogan | {'description': "Une plateforme aimantée avec ... | {'customer_mood': 'happy', 'rating': 4, 'revie... | None | None | A magnetic, stainless‑steel anti‑degradation t... |
| 1 | Electronics | Headphones | 65+ | brief | 80 | Marcus Burton | {'description': "Ergonomic over-ear headphones... | {'customer_mood': 'happy', 'rating': 5, 'revie... | None | None | The SoundSerenity Over‑Ear Headphones earn a 5... |
| 2 | Electronics | Headphones | 50-65 | detailed | 60 | Gloria Case | {'description': 'High-fidelity over-ear headph... | {'customer_mood': 'happy', 'rating': 5, 'revie... | None | None | A senior‑friendly, noise‑cancelling headphone ... |
| 3 | Books | Self-Help | 25-35 | structured with bullet points | 52 | Tracie Lewis | {'description': 'A 120-page guided journal des... | {'customer_mood': 'happy', 'rating': 5, 'revie... | None | None | Mindful Momentum earns a 5-star rating for its... |
| 4 | Home & Kitchen | Decor | 18-25 | rambling | 95 | Brittany Oneal | {'description': 'A vibrant collection of neon-... | {'customer_mood': 'happy', 'rating': 5, 'revie... | None | None | A vibrant, matte‑finished Neon Geometric Wall ... |
🎨 Data Designer Dataset Profile

Dataset Overview

| number of records | number of columns | percent complete records |
|---|---|---|
| 10 | 11 | 100.0% |

🎲 Sampler Columns

| column name | data type | number unique values | sampler type |
|---|---|---|---|
| product_category | string | 4 (40.0%) | category |
| product_subcategory | string | 9 (90.0%) | subcategory |
| target_age_range | string | 5 (50.0%) | category |
| review_style | string | 4 (40.0%) | category |

📝 LLM-Text Columns

| column name | data type | number unique values | prompt tokens per record | completion tokens per record |
|---|---|---|---|---|
| complaint_analysis | None | 0 (0.0%) | 187.0 +/- 66.2 | 1.0 +/- 0.0 |
| action_items | None | 0 (0.0%) | 22.0 +/- 0.0 | 1.0 +/- 0.0 |
| review_summary | string | 10 (100.0%) | 158.5 +/- 66.1 | 52.5 +/- 8.3 |

🗂️ LLM-Structured Columns

| column name | data type | number unique values | prompt tokens per record | completion tokens per record |
|---|---|---|---|---|
| product | dict | 10 (100.0%) | 265.0 +/- 0.6 | 89.5 +/- 18.8 |
| customer_review | dict | 10 (100.0%) | 344.0 +/- 17.8 | 142.5 +/- 68.3 |

🧩 Expression Columns

| column name | data type | number unique values |
|---|---|---|
| customer_name | string | 10 (100.0%) |
| customer_age | string | 10 (100.0%) |

Table Notes

1. All token statistics are based on a sample of max(1000, len(dataset)) records.
2. Tokens are calculated using tiktoken's cl100k_base tokenizer.
⏭️ Next Steps
Check out the following notebook to learn more about: