In this notebook, we will continue our exploration of Data Designer, demonstrating more advanced data generation using structured outputs, Jinja expressions, and conditional generation with skip.when.
If this is your first time using Data Designer, we recommend starting with the first notebook in this tutorial series.
data_designer.config provides access to the configuration API.
DataDesigner is the main interface for data generation.
DataDesigner is the main object that is used to interface with the library.
When initialized without arguments, the default model providers are used.
Each ModelConfig defines a model that can be used during the generation process.
The "model alias" is used to reference the model in the Data Designer config (as we will see below).
The "model provider" is the external service that hosts the model (see the model config docs for more details).
By default, we use build.nvidia.com as the model provider.
The Data Designer config defines the dataset schema and generation process.
The config builder provides an intuitive interface for building this configuration.
The list of model configs is provided to the builder at initialization.
We will again create a product review dataset, but this time we will use structured outputs and Jinja expressions.
Structured outputs let you specify the exact schema of the data you want to generate.
Data Designer supports schemas specified using either JSON Schema or Pydantic data models (recommended).
We'll define our structured outputs using Pydantic data models.
💡 Why Pydantic?
Pydantic models provide better IDE support and type validation.
They are more Pythonic than raw JSON schemas.
They integrate seamlessly with Data Designer's structured output system.
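For comparison, here is a rough sketch of the raw JSON Schema route, using only the standard library. The three field names are taken from the `product` values shown in the preview output of this notebook; the field types, the `PRODUCT_SCHEMA` name, and the `conforms` helper are assumptions made for illustration and are not part of Data Designer. The check covers required keys and primitive types only, so it is not a full JSON Schema validator.

```python
import json

# Hypothetical JSON Schema for a product record. Field names come from the
# preview output in this notebook; the types are assumptions.
PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "description": {"type": "string"},
        "price": {"type": "number"},
    },
    "required": ["name", "description", "price"],
}

# Maps the JSON Schema primitive types used above to Python types.
_PYTHON_TYPES = {"string": str, "number": (int, float)}


def conforms(record, schema):
    """Check required keys and primitive types only (not a full validator)."""
    if not isinstance(record, dict):
        return False
    for key in schema["required"]:
        if key not in record:
            return False
    for key, spec in schema["properties"].items():
        if key not in record:
            continue
        value = record[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool):
            return False
        if not isinstance(value, _PYTHON_TYPES[spec["type"]]):
            return False
    return True


good = json.loads('{"name": "Desk Cooler", "description": "Compact.", "price": 89.99}')
bad = json.loads('{"name": "Desk Cooler", "price": "cheap"}')

print(conforms(good, PRODUCT_SCHEMA))
print(conforms(bad, PRODUCT_SCHEMA))
```

A Pydantic model expresses the same constraints as typed class attributes, which is where the IDE support and validation benefits listed above come from.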
Next, let's design our product review dataset using a few more tricks compared to the previous notebook.
[21:15:48] [INFO] ✅ Validation passed
Next, we will use more advanced Jinja expressions to create new columns.
Jinja expressions let you:
Access nested attributes: {{ customer.first_name }}
Combine values: {{ customer.first_name }} {{ customer.last_name }}
Use conditional logic: {% if condition %}...{% endif %}
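To make the semantics of these expressions concrete, the sketch below evaluates plain-Python equivalents against a hypothetical record. The record contents and the age-based conditional are made up for illustration; Data Designer itself renders the templates with Jinja, and this snippet does not call the library.

```python
# Hypothetical record; the nested keys mirror the templates above.
record = {"customer": {"first_name": "Angela", "last_name": "Rice", "age": 55}}
customer = record["customer"]

# {{ customer.first_name }}
# Jinja dot access falls back to item lookup for dictionaries.
first_name = str(customer["first_name"])

# {{ customer.first_name }} {{ customer.last_name }}
full_name = f"{customer['first_name']} {customer['last_name']}"

# {% if customer.age >= 65 %}senior{% else %}adult{% endif %}
# (hypothetical condition, chosen only to show the if/else shape)
age_group = "senior" if customer["age"] >= 65 else "adult"

# {{ customer.age }}
# A rendered template is text, so the number comes out as a string.
rendered_age = str(customer["age"])

print(first_name, "|", full_name, "|", age_group, "|", repr(rendered_age))
```

The last line is worth noting: because a template renders to text, a numeric field passed through an expression ends up as a string, which is consistent with `customer_age` being reported with data type `string` in the dataset profile in this notebook.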
[21:15:48] [INFO] ✅ Validation passed
skip.when
So far, every column is generated for every row. But sometimes an expensive LLM column only makes sense for a subset of rows. For example, a detailed complaint analysis is only useful when the review is negative.
Data Designer lets you skip column generation on a per-row basis using SkipConfig.
Skipped rows receive None by default, but you can provide a sentinel value with
skip=dd.SkipConfig(when="...", value="N/A") to write a specific value instead.
There are three patterns to know:
| Pattern | How | Effect |
|---|---|---|
| Expression gate | skip=dd.SkipConfig(when="...") | Skip this column when the Jinja2 expression is truthy |
| Skip propagation (default) | Downstream column depends on a skipped column | Automatically skipped too (propagate_skip=True by default) |
| Propagation opt-out | propagate_skip=False on the downstream column | Always generates, even if an upstream was skipped |
Pattern 1 — Expression gate. Only generate a detailed complaint analysis when the customer gave a low rating (1 or 2 stars).
Rows where the rating is 3 or higher will get None for this column.
DataDesignerConfigBuilder( sampler_columns: [ "customer", "product_category", "product_subcategory", "target_age_range", "review_style" ] llm_text_columns: ['complaint_analysis'] llm_structured_columns: ['product', 'customer_review'] expression_columns: ['customer_name', 'customer_age'] )
Pattern 2 — Skip propagation. action_items depends on complaint_analysis.
When complaint_analysis is skipped, action_items auto-skips too because
propagate_skip defaults to True.
DataDesignerConfigBuilder( sampler_columns: [ "customer", "product_category", "product_subcategory", "target_age_range", "review_style" ] llm_text_columns: ['complaint_analysis', 'action_items'] llm_structured_columns: ['product', 'customer_review'] expression_columns: ['customer_name', 'customer_age'] )
Pattern 3 — Propagation opt-out. review_summary also depends on complaint_analysis,
but sets propagate_skip=False so it always generates. The prompt uses a Jinja conditional
to handle the case where complaint_analysis is None.
[21:15:48] [INFO] ✅ Validation passed
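The three patterns can be modelled in a few lines of plain Python. This is a simplified sketch of the semantics described above, not Data Designer's implementation: the `run_column` and `build_row` helpers are hypothetical, Python callables stand in for the Jinja `when` expression, and string-building lambdas stand in for LLM calls.

```python
from typing import Any, Callable, Optional, Sequence


def run_column(
    row: dict,
    skipped: set,
    name: str,
    generate: Callable[[dict], Any],
    depends_on: Sequence[str] = (),
    skip_when: Optional[Callable[[dict], bool]] = None,
    skip_value: Any = None,
    propagate_skip: bool = True,
) -> None:
    """Fill row[name], or write skip_value and record that the column was skipped."""
    gated = skip_when is not None and skip_when(row)
    inherited = propagate_skip and any(dep in skipped for dep in depends_on)
    if gated or inherited:
        row[name] = skip_value
        skipped.add(name)
    else:
        row[name] = generate(row)


def build_row(rating: int) -> dict:
    row = {"rating": rating}
    skipped: set = set()

    # Pattern 1: expression gate. Skip unless the rating is 1 or 2.
    run_column(
        row, skipped, "complaint_analysis",
        generate=lambda r: f"analysis of a {r['rating']}-star review",
        skip_when=lambda r: r["rating"] >= 3,
    )

    # Pattern 2: depends on a possibly skipped column, propagation left on.
    run_column(
        row, skipped, "action_items",
        generate=lambda r: "actions for: " + r["complaint_analysis"],
        depends_on=["complaint_analysis"],
    )

    # Pattern 3: same dependency, but propagation is switched off, so the
    # generator has to cope with complaint_analysis being None.
    run_column(
        row, skipped, "review_summary",
        generate=lambda r: "summary"
        + (" with complaint details" if r["complaint_analysis"] is not None else ""),
        depends_on=["complaint_analysis"],
        propagate_skip=False,
    )
    return row


print(build_row(5))
print(build_row(1))
```

For a 5-star row, `complaint_analysis` and `action_items` come back as `None` while `review_summary` is still produced, which matches the preview records in this notebook. Passing a different `skip_value` in this sketch plays the role of the `value="N/A"` sentinel.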
Use the preview method to generate a sample of records quickly.
Inspect the results for quality and format issues.
Adjust column configurations, prompts, or parameters as needed.
Re-run the preview until satisfied.
[21:15:48] [INFO] 👁️ Preview generation in progress
[21:15:48] [INFO] |-- 🔒 Jinja rendering engine: secure
[21:15:48] [INFO] ✅ Validation passed
[21:15:48] [INFO] ⛓️ Sorting column configs into a Directed Acyclic Graph
[21:15:48] [INFO] 🩺 Running health checks for models...
[21:15:48] [INFO] |-- 👀 Checking 'nvidia/nemotron-3-nano-30b-a3b' in provider named 'nvidia' for model alias 'nemotron-nano-v3'...
[21:15:49] [INFO] |-- ✅ Passed!
[21:15:49] [INFO] ⚡ DATA_DESIGNER_ASYNC_ENGINE is enabled - using async task-queue preview
[21:15:49] [INFO] 🗂️ llm-structured model config for column 'product'
[21:15:49] [INFO] |-- model: 'nvidia/nemotron-3-nano-30b-a3b'
[21:15:49] [INFO] |-- model alias: 'nemotron-nano-v3'
[21:15:49] [INFO] |-- model provider: 'nvidia'
[21:15:49] [INFO] |-- inference parameters:
[21:15:49] [INFO] | |-- generation_type=chat-completion
[21:15:49] [INFO] | |-- max_parallel_requests=4
[21:15:49] [INFO] | |-- extra_body={'chat_template_kwargs': {'enable_thinking': False}}
[21:15:49] [INFO] | |-- temperature=1.00
[21:15:49] [INFO] | |-- top_p=1.00
[21:15:49] [INFO] | |-- max_tokens=2048
[21:15:49] [INFO] 🗂️ llm-structured model config for column 'customer_review'
[21:15:49] [INFO] |-- model: 'nvidia/nemotron-3-nano-30b-a3b'
[21:15:49] [INFO] |-- model alias: 'nemotron-nano-v3'
[21:15:49] [INFO] |-- model provider: 'nvidia'
[21:15:49] [INFO] |-- inference parameters:
[21:15:49] [INFO] | |-- generation_type=chat-completion
[21:15:49] [INFO] | |-- max_parallel_requests=4
[21:15:49] [INFO] | |-- extra_body={'chat_template_kwargs': {'enable_thinking': False}}
[21:15:49] [INFO] | |-- temperature=1.00
[21:15:49] [INFO] | |-- top_p=1.00
[21:15:49] [INFO] | |-- max_tokens=2048
[21:15:49] [INFO] 📝 llm-text model config for column 'complaint_analysis'
[21:15:49] [INFO] |-- model: 'nvidia/nemotron-3-nano-30b-a3b'
[21:15:49] [INFO] |-- model alias: 'nemotron-nano-v3'
[21:15:49] [INFO] |-- model provider: 'nvidia'
[21:15:49] [INFO] |-- inference parameters:
[21:15:49] [INFO] | |-- generation_type=chat-completion
[21:15:49] [INFO] | |-- max_parallel_requests=4
[21:15:49] [INFO] | |-- extra_body={'chat_template_kwargs': {'enable_thinking': False}}
[21:15:49] [INFO] | |-- temperature=1.00
[21:15:49] [INFO] | |-- top_p=1.00
[21:15:49] [INFO] | |-- max_tokens=2048
[21:15:49] [INFO] 📝 llm-text model config for column 'action_items'
[21:15:49] [INFO] |-- model: 'nvidia/nemotron-3-nano-30b-a3b'
[21:15:49] [INFO] |-- model alias: 'nemotron-nano-v3'
[21:15:49] [INFO] |-- model provider: 'nvidia'
[21:15:49] [INFO] |-- inference parameters:
[21:15:49] [INFO] | |-- generation_type=chat-completion
[21:15:49] [INFO] | |-- max_parallel_requests=4
[21:15:49] [INFO] | |-- extra_body={'chat_template_kwargs': {'enable_thinking': False}}
[21:15:49] [INFO] | |-- temperature=1.00
[21:15:49] [INFO] | |-- top_p=1.00
[21:15:49] [INFO] | |-- max_tokens=2048
[21:15:49] [INFO] 📝 llm-text model config for column 'review_summary'
[21:15:49] [INFO] |-- model: 'nvidia/nemotron-3-nano-30b-a3b'
[21:15:49] [INFO] |-- model alias: 'nemotron-nano-v3'
[21:15:49] [INFO] |-- model provider: 'nvidia'
[21:15:49] [INFO] |-- inference parameters:
[21:15:49] [INFO] | |-- generation_type=chat-completion
[21:15:49] [INFO] | |-- max_parallel_requests=4
[21:15:49] [INFO] | |-- extra_body={'chat_template_kwargs': {'enable_thinking': False}}
[21:15:49] [INFO] | |-- temperature=1.00
[21:15:49] [INFO] | |-- top_p=1.00
[21:15:49] [INFO] | |-- max_tokens=2048
[21:15:49] [INFO] ⚡️ Async generation: 5 column(s) (product, customer_review, complaint_analysis, action_items, review_summary), 10 tasks across 1 row group(s)
[21:15:49] [INFO] 🚀 (1/1) Dispatching with 2 records
[21:15:49] [INFO] 🎲 (1/1) Preparing samplers to generate 2 records across 5 columns
[21:15:49] [INFO] 🧩 (1/1) Generating column `customer_name` from expression
[21:15:49] [INFO] 🧩 (1/1) Generating column `customer_age` from expression
[21:15:51] [INFO] 📊 Progress [2.4s]:
[21:15:51] [INFO] |-- ☀️ product: 2/2 (100%) 0.8 rec/s
[21:15:51] [INFO] |-- 🚀 customer_review: 2/2 (100%) 0.8 rec/s
[21:15:51] [INFO] |-- ☀️ complaint_analysis: 2/2 (100%) 0.8 rec/s, 2 skipped
[21:15:51] [INFO] |-- ☀️ action_items: 2/2 (100%) 0.8 rec/s, 2 skipped
[21:15:51] [INFO] |-- 🦁 review_summary: 2/2 (100%) 0.8 rec/s
[21:15:51] [INFO] ✅ Async generation complete [2.4s]: 6 ok, 0 failed, 4 skipped across 5 column(s)
[21:15:51] [INFO] 📊 Model usage summary:
[21:15:51] [INFO] |-- model: nvidia/nemotron-3-nano-30b-a3b
[21:15:51] [INFO] |-- tokens: input=1618, output=612, total=2230, tps=902
[21:15:51] [INFO] |-- requests: success=6, failed=0, total=6, rpm=145
[21:15:51] [INFO] 🙈 Dropping columns: ['customer']
[21:15:51] [INFO] 📐 Measuring dataset column statistics:
[21:15:51] [INFO] |-- 🎲 column: 'product_category'
[21:15:51] [INFO] |-- 🎲 column: 'product_subcategory'
[21:15:51] [INFO] |-- 🎲 column: 'target_age_range'
[21:15:51] [INFO] |-- 🎲 column: 'review_style'
[21:15:51] [INFO] |-- 🧩 column: 'customer_name'
[21:15:51] [INFO] |-- 🧩 column: 'customer_age'
[21:15:51] [INFO] |-- 🗂️ column: 'product'
[21:15:51] [INFO] |-- 🗂️ column: 'customer_review'
[21:15:51] [INFO] |-- 📝 column: 'complaint_analysis'
[21:15:51] [INFO] |-- 📝 column: 'action_items'
[21:15:51] [INFO] |-- 📝 column: 'review_summary'
[21:15:51] [INFO] 🎊 Preview complete!
Generated Columns

| Name | Value |
|---|---|
| product_category | Home Office |
| product_subcategory | Office Supplies |
| target_age_range | 25-35 |
| review_style | structured with bullet points |
| complaint_analysis | None |
| action_items | None |
| review_summary | The Dual‑Temp Smart Desk Cooler earns a 5‑star rating for its compact, whisper‑quiet cooling, 8‑hour rechargeable battery, built‑in USB charging, sleek brushed‑aluminum design, energy‑efficient performance, easy portability, and strong value at $89.99. |
| product | {'name': 'Dual-Temp Smart Desk Cooler', 'description': 'A compact, whisper-quiet mini air conditioner designed for home office use. Features a rechargeable battery lasting up to 8 hours, a built-in USB charging port for devices, and a sleek brushed aluminum finish. Perfect for keeping your workspace cool and focused during hot workdays.', 'price': 89.99} |
| customer_review | {'rating': 5, 'customer_mood': 'happy', 'review': '- Dual-Temp Smart Desk Cooler delivers effective cooling in a compact form factor.\n- Whisper-quiet operation ensures a distraction‑free workspace.\n- Rechargeable battery provides up to eight hours of cooling without needing outlet access.\n- Integrated USB port allows simultaneous charging of smartphones, tablets, or other devices.\n- Brushed aluminum finish adds a professional aesthetic that complements home‑office décor.\n- Energy‑efficient design minimizes power consumption while maintaining consistent temperature.\n- Easy‑to‑assemble and portable, making it convenient to move between workstations.\n- Affordable pricing at $89.99 offers strong value for the features provided.'} |
| customer_name | Angela Rice |
| customer_age | 55 |
| | product_category | product_subcategory | target_age_range | review_style | customer_age | customer_name | product | customer_review | complaint_analysis | action_items | review_summary |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Home Office | Office Supplies | 25-35 | structured with bullet points | 55 | Angela Rice | {'name': 'Dual-Temp Smart Desk Cooler', 'descr... | {'rating': 5, 'customer_mood': 'happy', 'revie... | None | None | The Dual‑Temp Smart Desk Cooler earns a 5‑star... |
| 1 | Clothing | Winter Coats | 50-65 | brief | 55 | Monica Herrera | {'name': 'Cozy Hearth Wool Blend Winter Coat',... | {'rating': 5, 'customer_mood': 'neutral', 'rev... | None | None | The Cozy Hearth Wool Blend Winter Coat impress... |
Data Designer automatically generates a basic statistical analysis of the generated data.
This analysis is available via the analysis property of generation result objects.
🎨 Data Designer Dataset Profile

Dataset Overview

| number of records | number of columns | percent complete records |
|---|---|---|
| 2 | 11 | 100.0% |

🎲 Sampler Columns

| column name | data type | number unique values | sampler type |
|---|---|---|---|
| product_category | string | 2 (100.0%) | category |
| product_subcategory | string | 2 (100.0%) | subcategory |
| target_age_range | string | 2 (100.0%) | category |
| review_style | string | 2 (100.0%) | category |

📝 LLM-Text Columns

| column name | data type | number unique values | prompt tokens per record | completion tokens per record |
|---|---|---|---|---|
| complaint_analysis | None | 0 (0.0%) | 164.0 +/- 19.0 | 1.0 +/- 0.0 |
| action_items | None | 0 (0.0%) | 22.0 +/- 0.0 | 1.0 +/- 0.0 |
| review_summary | string | 2 (100.0%) | 135.5 +/- 19.5 | 57.5 +/- 6.4 |

🗂️ LLM-Structured Columns

| column name | data type | number unique values | prompt tokens per record | completion tokens per record |
|---|---|---|---|---|
| product | dict | 2 (100.0%) | 265.5 +/- 0.5 | 71.5 +/- 10.6 |
| customer_review | dict | 2 (100.0%) | 326.0 +/- 9.0 | 128.0 +/- 32.5 |

🧩 Expression Columns

| column name | data type | number unique values |
|---|---|---|
| customer_name | string | 2 (100.0%) |
| customer_age | string | 1 (50.0%) |

Table Notes
1. All token statistics are based on a sample of max(1000, len(dataset)) records.
2. Tokens are calculated using tiktoken's cl100k_base tokenizer.
Happy with your preview data?
Use the create method to submit larger Data Designer generation jobs.
[21:15:51] [INFO] 🎨 Creating Data Designer dataset
[21:15:51] [INFO] |-- 🔒 Jinja rendering engine: secure
[21:15:51] [INFO] ✅ Validation passed
[21:15:51] [INFO] ⛓️ Sorting column configs into a Directed Acyclic Graph
[21:15:51] [INFO] 🩺 Running health checks for models...
[21:15:51] [INFO] |-- 👀 Checking 'nvidia/nemotron-3-nano-30b-a3b' in provider named 'nvidia' for model alias 'nemotron-nano-v3'...
[21:15:52] [INFO] |-- ✅ Passed!
[21:15:52] [INFO] ⚡ DATA_DESIGNER_ASYNC_ENGINE is enabled - using async task-queue builder
[21:15:52] [INFO] 🗂️ llm-structured model config for column 'product'
[21:15:52] [INFO] |-- model: 'nvidia/nemotron-3-nano-30b-a3b'
[21:15:52] [INFO] |-- model alias: 'nemotron-nano-v3'
[21:15:52] [INFO] |-- model provider: 'nvidia'
[21:15:52] [INFO] |-- inference parameters:
[21:15:52] [INFO] | |-- generation_type=chat-completion
[21:15:52] [INFO] | |-- max_parallel_requests=4
[21:15:52] [INFO] | |-- extra_body={'chat_template_kwargs': {'enable_thinking': False}}
[21:15:52] [INFO] | |-- temperature=1.00
[21:15:52] [INFO] | |-- top_p=1.00
[21:15:52] [INFO] | |-- max_tokens=2048
[21:15:52] [INFO] 🗂️ llm-structured model config for column 'customer_review'
[21:15:52] [INFO] |-- model: 'nvidia/nemotron-3-nano-30b-a3b'
[21:15:52] [INFO] |-- model alias: 'nemotron-nano-v3'
[21:15:52] [INFO] |-- model provider: 'nvidia'
[21:15:52] [INFO] |-- inference parameters:
[21:15:52] [INFO] | |-- generation_type=chat-completion
[21:15:52] [INFO] | |-- max_parallel_requests=4
[21:15:52] [INFO] | |-- extra_body={'chat_template_kwargs': {'enable_thinking': False}}
[21:15:52] [INFO] | |-- temperature=1.00
[21:15:52] [INFO] | |-- top_p=1.00
[21:15:52] [INFO] | |-- max_tokens=2048
[21:15:52] [INFO] 📝 llm-text model config for column 'complaint_analysis'
[21:15:52] [INFO] |-- model: 'nvidia/nemotron-3-nano-30b-a3b'
[21:15:52] [INFO] |-- model alias: 'nemotron-nano-v3'
[21:15:52] [INFO] |-- model provider: 'nvidia'
[21:15:52] [INFO] |-- inference parameters:
[21:15:52] [INFO] | |-- generation_type=chat-completion
[21:15:52] [INFO] | |-- max_parallel_requests=4
[21:15:52] [INFO] | |-- extra_body={'chat_template_kwargs': {'enable_thinking': False}}
[21:15:52] [INFO] | |-- temperature=1.00
[21:15:52] [INFO] | |-- top_p=1.00
[21:15:52] [INFO] | |-- max_tokens=2048
[21:15:52] [INFO] 📝 llm-text model config for column 'action_items'
[21:15:52] [INFO] |-- model: 'nvidia/nemotron-3-nano-30b-a3b'
[21:15:52] [INFO] |-- model alias: 'nemotron-nano-v3'
[21:15:52] [INFO] |-- model provider: 'nvidia'
[21:15:52] [INFO] |-- inference parameters:
[21:15:52] [INFO] | |-- generation_type=chat-completion
[21:15:52] [INFO] | |-- max_parallel_requests=4
[21:15:52] [INFO] | |-- extra_body={'chat_template_kwargs': {'enable_thinking': False}}
[21:15:52] [INFO] | |-- temperature=1.00
[21:15:52] [INFO] | |-- top_p=1.00
[21:15:52] [INFO] | |-- max_tokens=2048
[21:15:52] [INFO] 📝 llm-text model config for column 'review_summary'
[21:15:52] [INFO] |-- model: 'nvidia/nemotron-3-nano-30b-a3b'
[21:15:52] [INFO] |-- model alias: 'nemotron-nano-v3'
[21:15:52] [INFO] |-- model provider: 'nvidia'
[21:15:52] [INFO] |-- inference parameters:
[21:15:52] [INFO] | |-- generation_type=chat-completion
[21:15:52] [INFO] | |-- max_parallel_requests=4
[21:15:52] [INFO] | |-- extra_body={'chat_template_kwargs': {'enable_thinking': False}}
[21:15:52] [INFO] | |-- temperature=1.00
[21:15:52] [INFO] | |-- top_p=1.00
[21:15:52] [INFO] | |-- max_tokens=2048
[21:15:52] [INFO] ⚡️ Async generation: 5 column(s) (product, customer_review, complaint_analysis, action_items, review_summary), 50 tasks across 1 row group(s)
[21:15:52] [INFO] 🚀 (1/1) Dispatching with 10 records
[21:15:52] [INFO] 🎲 (1/1) Preparing samplers to generate 10 records across 5 columns
[21:15:52] [INFO] 🧩 (1/1) Generating column `customer_name` from expression
[21:15:52] [INFO] 🧩 (1/1) Generating column `customer_age` from expression
[21:15:57] [INFO] 📊 Progress [5.1s]:
[21:15:57] [INFO] |-- 🌖 product: 8/10 (80%) 1.6 rec/s
[21:15:57] [INFO] |-- ⛅ customer_review: 7/10 (70%) 1.4 rec/s
[21:15:57] [INFO] |-- 😐 complaint_analysis: 7/10 (70%) 1.4 rec/s, 7 skipped
[21:15:57] [INFO] |-- 😸 action_items: 7/10 (70%) 1.4 rec/s, 7 skipped
[21:15:57] [INFO] |-- 🐥 review_summary: 6/10 (60%) 1.2 rec/s
[21:15:59] [INFO] 🙈 Dropping columns: ['customer']
[21:15:59] [INFO] 📊 Progress [7.2s]:
[21:15:59] [INFO] |-- 🌕 product: 10/10 (100%) 1.4 rec/s
[21:15:59] [INFO] |-- ☀️ customer_review: 10/10 (100%) 1.4 rec/s
[21:15:59] [INFO] |-- 🤩 complaint_analysis: 10/10 (100%) 1.4 rec/s, 10 skipped
[21:15:59] [INFO] |-- 🦁 action_items: 10/10 (100%) 1.4 rec/s, 10 skipped
[21:15:59] [INFO] |-- 🐔 review_summary: 10/10 (100%) 1.4 rec/s
[21:15:59] [INFO] ✅ Async generation complete [7.2s]: 30 ok, 0 failed, 20 skipped across 5 column(s)
[21:15:59] [INFO] 📊 Model usage summary:
[21:15:59] [INFO] |-- model: nvidia/nemotron-3-nano-30b-a3b
[21:15:59] [INFO] |-- tokens: input=9404, output=4346, total=13750, tps=1857
[21:15:59] [INFO] |-- requests: success=30, failed=0, total=30, rpm=243
[21:15:59] [INFO] 📐 Measuring dataset column statistics:
[21:15:59] [INFO] |-- 🎲 column: 'product_category'
[21:15:59] [INFO] |-- 🎲 column: 'product_subcategory'
[21:15:59] [INFO] |-- 🎲 column: 'target_age_range'
[21:15:59] [INFO] |-- 🎲 column: 'review_style'
[21:15:59] [INFO] |-- 🧩 column: 'customer_name'
[21:15:59] [INFO] |-- 🧩 column: 'customer_age'
[21:15:59] [INFO] |-- 🗂️ column: 'product'
[21:15:59] [INFO] |-- 🗂️ column: 'customer_review'
[21:15:59] [INFO] |-- 📝 column: 'complaint_analysis'
[21:15:59] [INFO] |-- 📝 column: 'action_items'
[21:15:59] [INFO] |-- 📝 column: 'review_summary'
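The counts in the two generation logs above can be cross-checked with a little arithmetic. The sketch below copies the numbers from the log lines and checks three relationships that hold in these two runs: tasks equal LLM columns times records, every task ends up ok, failed, or skipped, and the number of model requests equals the number of ok tasks. That last point suggests skipped cells cost no model calls here, though that is an inference from these logs rather than a documented guarantee.

```python
# Numbers copied from the preview and create logs above.
runs = {
    "preview": {"llm_columns": 5, "records": 2, "tasks": 10,
                "ok": 6, "failed": 0, "skipped": 4, "requests": 6},
    "create": {"llm_columns": 5, "records": 10, "tasks": 50,
               "ok": 30, "failed": 0, "skipped": 20, "requests": 30},
}

skip_share = {}
for name, r in runs.items():
    # One task per (LLM column, record) pair.
    assert r["llm_columns"] * r["records"] == r["tasks"]
    # Every task is accounted for.
    assert r["ok"] + r["failed"] + r["skipped"] == r["tasks"]
    # In these runs, only non-skipped tasks reached the model.
    assert r["ok"] == r["requests"]
    skip_share[name] = r["skipped"] / r["tasks"]
    print(f"{name}: {skip_share[name]:.0%} of tasks skipped")
```

In both runs every sampled review was rated 3 stars or higher, so `complaint_analysis` and `action_items` were skipped for all rows, which accounts for two of the five LLM columns.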
| | product_category | product_subcategory | target_age_range | review_style | customer_name | customer_age | product | customer_review | complaint_analysis | action_items | review_summary |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Home Office | Desks | 65+ | detailed | Stephanie Nelson | 35 | {'description': 'A compact, height‑adjustable ... | {'customer_mood': 'happy', 'rating': 4, 'revie... | None | None | The Ergonomic Adjustable Bedside Desk earns 4 ... |
| 1 | Books | Fiction | 25-35 | rambling | Dana Vasquez | 86 | {'description': 'A beautifully illustrated, th... | {'customer_mood': 'happy', 'rating': 5, 'revie... | None | None | A thoughtful, meditative novel that invites re... |
| 2 | Home Office | Chairs | 18-25 | rambling | Brittany Wilson | 18 | {'description': 'A lightweight, breathable swi... | {'customer_mood': 'happy', 'rating': 5, 'revie... | None | None | A 5‑star review describing a breathable mesh s... |
| 3 | Clothing | Men's Clothing | 25-35 | structured with bullet points | Jacqueline Boyd | 37 | {'description': 'A lightweight, moisture-wicki... | {'customer_mood': 'happy', 'rating': 5, 'revie... | None | None | A top-rated, sustainable t‑shirt that blends l... |
| 4 | Books | Classics | 65+ | detailed | Kristine Wells | 95 | {'description': 'A curated anthology of classi... | {'customer_mood': 'happy', 'rating': 5, 'revie... | None | None | A large‑type, cloth‑bound anthology of timeles... |
🎨 Data Designer Dataset Profile

Dataset Overview

| number of records | number of columns | percent complete records |
|---|---|---|
| 10 | 11 | 100.0% |

🎲 Sampler Columns

| column name | data type | number unique values | sampler type |
|---|---|---|---|
| product_category | string | 5 (50.0%) | category |
| product_subcategory | string | 10 (100.0%) | subcategory |
| target_age_range | string | 4 (40.0%) | category |
| review_style | string | 4 (40.0%) | category |

📝 LLM-Text Columns

| column name | data type | number unique values | prompt tokens per record | completion tokens per record |
|---|---|---|---|---|
| complaint_analysis | None | 0 (0.0%) | 270.5 +/- 118.6 | 1.0 +/- 0.0 |
| action_items | None | 0 (0.0%) | 22.0 +/- 0.0 | 1.0 +/- 0.0 |
| review_summary | string | 10 (100.0%) | 242.0 +/- 119.1 | 55.5 +/- 16.8 |

🗂️ LLM-Structured Columns

| column name | data type | number unique values | prompt tokens per record | completion tokens per record |
|---|---|---|---|---|
| product | dict | 10 (100.0%) | 265.0 +/- 1.0 | 84.5 +/- 8.2 |
| customer_review | dict | 10 (100.0%) | 341.0 +/- 8.4 | 229.0 +/- 124.7 |

🧩 Expression Columns

| column name | data type | number unique values |
|---|---|---|
| customer_name | string | 10 (100.0%) |
| customer_age | string | 10 (100.0%) |

Table Notes
1. All token statistics are based on a sample of max(1000, len(dataset)) records.
2. Tokens are calculated using tiktoken's cl100k_base tokenizer.
Check out the following notebook to learn more about: