🎨 Data Designer Tutorial: The Basics

📚 What you'll learn

This notebook demonstrates the basics of Data Designer by generating a simple product review dataset.

📦 Import Data Designer

data_designer.config provides access to the configuration API.
DataDesigner is the main interface for data generation.

Python

1 import data_designer.config as dd
2 from data_designer.interface import DataDesigner
3

⚙️ Initialize the Data Designer interface

DataDesigner is the main object responsible for managing the data generation process.
When initialized without arguments, the default model providers are used.

Python

1 data_designer = DataDesigner()
2

🎛️ Define model configurations

Each ModelConfig defines a model that can be used during the generation process.
The "model alias" is used to reference the model in the Data Designer config (as we will see below).
The "model provider" is the external service that hosts the model (see the model config docs for more details).
By default, we use build.nvidia.com as the model provider.

Python

1 # This name is set in the model provider configuration.
2 MODEL_PROVIDER = "nvidia"
3  
4 # The model ID is from build.nvidia.com.
5 MODEL_ID = "nvidia/nemotron-3-nano-30b-a3b"
6  
7 # We choose this alias to be descriptive for our use case.
8 MODEL_ALIAS = "nemotron-nano-v3"
9  
10 model_configs = [
11     dd.ModelConfig(
12         alias=MODEL_ALIAS,
13         model=MODEL_ID,
14         provider=MODEL_PROVIDER,
15         inference_parameters=dd.ChatCompletionInferenceParams(
16             temperature=1.0,
17             top_p=1.0,
18             max_tokens=2048,
19             extra_body={"chat_template_kwargs": {"enable_thinking": False}},
20         ),
21     )
22 ]
23

🏗️ Initialize the Data Designer Config Builder

The Data Designer config defines the dataset schema and generation process.
The config builder provides an intuitive interface for building this configuration.
The list of model configs is provided to the builder at initialization.

Python

1 config_builder = dd.DataDesignerConfigBuilder(model_configs=model_configs)
2

🎲 Getting started with sampler columns

Sampler columns offer non-LLM based generation of synthetic data.
They are particularly useful for steering the diversity of the generated data, as we demonstrate below.

You can view available samplers using the config builder's info property:

Python

1 config_builder.info.display("samplers")
2

Output

─────────────────────────────────────────── NeMo Data Designer Samplers ───────────────────────────────────────────

┏━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃ Type               ┃ Parameter                ┃ Data Type                         ┃ Required ┃ Constraints      ┃
┡━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ bernoulli          │ p                        │ number                            │    ✓     │ >= 0.0, <= 1.0   │
│                    │ sampler_type             │ string                            │          │                  │
├────────────────────┼──────────────────────────┼───────────────────────────────────┼──────────┼──────────────────┤
│ bernoulli_mixture  │ p                        │ number                            │    ✓     │ >= 0.0, <= 1.0   │
│                    │ dist_name                │ string                            │    ✓     │                  │
│                    │ dist_params              │ dict                              │    ✓     │                  │
│                    │ sampler_type             │ string                            │          │                  │
├────────────────────┼──────────────────────────┼───────────────────────────────────┼──────────┼──────────────────┤
│ binomial           │ n                        │ integer                           │    ✓     │                  │
│                    │ p                        │ number                            │    ✓     │ >= 0.0, <= 1.0   │
│                    │ sampler_type             │ string                            │          │                  │
├────────────────────┼──────────────────────────┼───────────────────────────────────┼──────────┼──────────────────┤
│ category           │ values                   │ string[] | integer[] | number[]   │    ✓     │ len > 1          │
│                    │ weights                  │ number[] | null                   │          │                  │
│                    │ sampler_type             │ string                            │          │                  │
├────────────────────┼──────────────────────────┼───────────────────────────────────┼──────────┼──────────────────┤
│ datetime           │ start                    │ string                            │    ✓     │                  │
│                    │ end                      │ string                            │    ✓     │                  │
│                    │ unit                     │ string                            │          │                  │
│                    │ sampler_type             │ string                            │          │                  │
├────────────────────┼──────────────────────────┼───────────────────────────────────┼──────────┼──────────────────┤
│ gaussian           │ mean                     │ number                            │    ✓     │                  │
│                    │ stddev                   │ number                            │    ✓     │                  │
│                    │ decimal_places           │ integer | null                    │          │                  │
│                    │ sampler_type             │ string                            │          │                  │
├────────────────────┼──────────────────────────┼───────────────────────────────────┼──────────┼──────────────────┤
│ person             │ locale                   │ string                            │          │                  │
│                    │ sex                      │ string | null                     │          │                  │
│                    │ city                     │ string | string[] | null          │          │                  │
│                    │ age_range                │ integer[]                         │          │ len > 2, len < 2 │
│                    │ select_field_values      │ object | null                     │          │                  │
│                    │ with_synthetic_personas  │ boolean                           │          │                  │
│                    │ sampler_type             │ string                            │          │                  │
├────────────────────┼──────────────────────────┼───────────────────────────────────┼──────────┼──────────────────┤
│ person_from_faker  │ locale                   │ string                            │          │                  │
│                    │ sex                      │ string | null                     │          │                  │
│                    │ city                     │ string | string[] | null          │          │                  │
│                    │ age_range                │ integer[]                         │          │ len > 2, len < 2 │
│                    │ sampler_type             │ string                            │          │                  │
├────────────────────┼──────────────────────────┼───────────────────────────────────┼──────────┼──────────────────┤
│ poisson            │ mean                     │ number                            │    ✓     │                  │
│                    │ sampler_type             │ string                            │          │                  │
├────────────────────┼──────────────────────────┼───────────────────────────────────┼──────────┼──────────────────┤
│ scipy              │ dist_name                │ string                            │    ✓     │                  │
│                    │ dist_params              │ dict                              │    ✓     │                  │
│                    │ decimal_places           │ integer | null                    │          │                  │
│                    │ sampler_type             │ string                            │          │                  │
├────────────────────┼──────────────────────────┼───────────────────────────────────┼──────────┼──────────────────┤
│ subcategory        │ category                 │ string                            │    ✓     │                  │
│                    │ values                   │ dict                              │    ✓     │                  │
│                    │ sampler_type             │ string                            │          │                  │
├────────────────────┼──────────────────────────┼───────────────────────────────────┼──────────┼──────────────────┤
│ timedelta          │ dt_min                   │ integer                           │    ✓     │ >= 0             │
│                    │ dt_max                   │ integer                           │    ✓     │ > 0              │
│                    │ reference_column_name    │ string                            │    ✓     │                  │
│                    │ unit                     │ string                            │          │                  │
│                    │ sampler_type             │ string                            │          │                  │
├────────────────────┼──────────────────────────┼───────────────────────────────────┼──────────┼──────────────────┤
│ uniform            │ low                      │ number                            │    ✓     │                  │
│                    │ high                     │ number                            │    ✓     │                  │
│                    │ decimal_places           │ integer | null                    │          │                  │
│                    │ sampler_type             │ string                            │          │                  │
├────────────────────┼──────────────────────────┼───────────────────────────────────┼──────────┼──────────────────┤
│ uuid               │ prefix                   │ string | null                     │          │                  │
│                    │ short_form               │ boolean                           │          │                  │
│                    │ uppercase                │ boolean                           │          │                  │
│                    │ sampler_type             │ string                            │          │                  │
└────────────────────┴──────────────────────────┴───────────────────────────────────┴──────────┴──────────────────┘

Let's start designing our product review dataset by adding product category and subcategory columns.

Python

1 config_builder.add_column(
2     dd.SamplerColumnConfig(
3         name="product_category",
4         sampler_type=dd.SamplerType.CATEGORY,
5         params=dd.CategorySamplerParams(
6             values=[
7                 "Electronics",
8                 "Clothing",
9                 "Home & Kitchen",
10                 "Books",
11                 "Home Office",
12             ],
13         ),
14     )
15 )
16  
17 config_builder.add_column(
18     dd.SamplerColumnConfig(
19         name="product_subcategory",
20         sampler_type=dd.SamplerType.SUBCATEGORY,
21         params=dd.SubcategorySamplerParams(
22             category="product_category",
23             values={
24                 "Electronics": [
25                     "Smartphones",
26                     "Laptops",
27                     "Headphones",
28                     "Cameras",
29                     "Accessories",
30                 ],
31                 "Clothing": [
32                     "Men's Clothing",
33                     "Women's Clothing",
34                     "Winter Coats",
35                     "Activewear",
36                     "Accessories",
37                 ],
38                 "Home & Kitchen": [
39                     "Appliances",
40                     "Cookware",
41                     "Furniture",
42                     "Decor",
43                     "Organization",
44                 ],
45                 "Books": [
46                     "Fiction",
47                     "Non-Fiction",
48                     "Self-Help",
49                     "Textbooks",
50                     "Classics",
51                 ],
52                 "Home Office": [
53                     "Desks",
54                     "Chairs",
55                     "Storage",
56                     "Office Supplies",
57                     "Lighting",
58                 ],
59             },
60         ),
61     )
62 )
63  
64 config_builder.add_column(
65     dd.SamplerColumnConfig(
66         name="target_age_range",
67         sampler_type=dd.SamplerType.CATEGORY,
68         params=dd.CategorySamplerParams(values=["18-25", "25-35", "35-50", "50-65", "65+"]),
69     )
70 )
71  
72 # Optionally validate that the columns are configured correctly.
73 data_designer.validate(config_builder)
74

Output

[20:10:30] [INFO] ✅ Validation passed

Next, let's add samplers to generate data related to the customer and their review.

Python

1 config_builder.add_column(
2     dd.SamplerColumnConfig(
3         name="customer",
4         sampler_type=dd.SamplerType.PERSON_FROM_FAKER,
5         params=dd.PersonFromFakerSamplerParams(age_range=[18, 70], locale="en_US"),
6     )
7 )
8  
9 config_builder.add_column(
10     dd.SamplerColumnConfig(
11         name="number_of_stars",
12         sampler_type=dd.SamplerType.UNIFORM,
13         params=dd.UniformSamplerParams(low=1, high=5),
14         convert_to="int",  # Convert the sampled float to an integer.
15     )
16 )
17  
18 config_builder.add_column(
19     dd.SamplerColumnConfig(
20         name="review_style",
21         sampler_type=dd.SamplerType.CATEGORY,
22         params=dd.CategorySamplerParams(
23             values=["rambling", "brief", "detailed", "structured with bullet points"],
24             weights=[1, 2, 2, 1],
25         ),
26     )
27 )
28  
29 data_designer.validate(config_builder)
30

Output

[20:10:30] [INFO] ✅ Validation passed

🦜 LLM-generated columns

The real power of Data Designer comes from leveraging LLMs to generate text, code, and structured data.
When prompting the LLM, we can use Jinja templating to reference other columns in the dataset.
As we see below, nested json fields can be accessed using dot notation.

Python

1 config_builder.add_column(
2     dd.LLMTextColumnConfig(
3         name="product_name",
4         prompt=(
5             "You are a helpful assistant that generates product names. DO NOT add quotes around the product name.\n\n"
6             "Come up with a creative product name for a product in the '{{ product_category }}' category, focusing "
7             "on products related to '{{ product_subcategory }}'. The target age range of the ideal customer is "
8             "{{ target_age_range }} years old. Respond with only the product name, no other text."
9         ),
10         model_alias=MODEL_ALIAS,
11     )
12 )
13  
14 config_builder.add_column(
15     dd.LLMTextColumnConfig(
16         name="customer_review",
17         prompt=(
18             "You are a customer named {{ customer.first_name }} from {{ customer.city }}, {{ customer.state }}. "
19             "You are {{ customer.age }} years old and recently purchased a product called {{ product_name }}. "
20             "Write a review of this product, which you gave a rating of {{ number_of_stars }} stars. "
21             "The style of the review should be '{{ review_style }}'. "
22             "Respond with only the review, no other text."
23         ),
24         model_alias=MODEL_ALIAS,
25     )
26 )
27  
28 data_designer.validate(config_builder)
29

Output

[20:10:30] [INFO] ✅ Validation passed

🔁 Iteration is key – preview the dataset!

Use the preview method to generate a sample of records quickly.
Inspect the results for quality and format issues.
Adjust column configurations, prompts, or parameters as needed.
Re-run the preview until satisfied.

Python

1 preview = data_designer.preview(config_builder, num_records=2)
2

Output

[20:10:30] [INFO] 👁️ Preview generation in progress[20:10:30] [INFO]   |-- 🔒 Jinja rendering engine: secure[20:10:30] [INFO] ✅ Validation passed[20:10:30] [INFO] ⛓️ Sorting column configs into a Directed Acyclic Graph[20:10:30] [INFO] 🩺 Running health checks for models...[20:10:30] [INFO]   |-- 👀 Checking 'nvidia/nemotron-3-nano-30b-a3b' in provider named 'nvidia' for model alias 'nemotron-nano-v3'...[20:10:31] [INFO]   |-- ✅ Passed![20:10:31] [INFO] ⚡ Using async task-queue preview[20:10:31] [INFO] 📝 llm-text model config for column 'product_name'[20:10:31] [INFO]   |-- model: 'nvidia/nemotron-3-nano-30b-a3b'[20:10:31] [INFO]   |-- model alias: 'nemotron-nano-v3'[20:10:31] [INFO]   |-- model provider: 'nvidia'[20:10:31] [INFO]   |-- inference parameters:[20:10:31] [INFO]   |  |-- generation_type=chat-completion[20:10:31] [INFO]   |  |-- max_parallel_requests=4[20:10:31] [INFO]   |  |-- extra_body={'chat_template_kwargs': {'enable_thinking': False}}[20:10:31] [INFO]   |  |-- temperature=1.00[20:10:31] [INFO]   |  |-- top_p=1.00[20:10:31] [INFO]   |  |-- max_tokens=2048[20:10:31] [INFO] 📝 llm-text model config for column 'customer_review'[20:10:31] [INFO]   |-- model: 'nvidia/nemotron-3-nano-30b-a3b'[20:10:31] [INFO]   |-- model alias: 'nemotron-nano-v3'[20:10:31] [INFO]   |-- model provider: 'nvidia'[20:10:31] [INFO]   |-- inference parameters:[20:10:31] [INFO]   |  |-- generation_type=chat-completion[20:10:31] [INFO]   |  |-- max_parallel_requests=4[20:10:31] [INFO]   |  |-- extra_body={'chat_template_kwargs': {'enable_thinking': False}}[20:10:31] [INFO]   |  |-- temperature=1.00[20:10:31] [INFO]   |  |-- top_p=1.00[20:10:31] [INFO]   |  |-- max_tokens=2048[20:10:31] [INFO] ⚡️ Async generation: 2 column(s) (column 'product_name', column 'customer_review'), 4 tasks across 1 row group(s)[20:10:31] [INFO] 🚀 (1/1) Dispatching with 2 records[20:10:31] [INFO] 🎲 (1/1) Preparing samplers to generate 2 records across 6 columns[20:10:43] [INFO] 📊 Progress [12.7s]:[20:10:43] [INFO]   |-- 🚀 column 'product_name': 2/2 (100%) 0.2 rec/s[20:10:43] [INFO]   |-- 🚀 column 'customer_review': 2/2 (100%) 0.2 rec/s[20:10:43] [INFO] ✅ Async generation complete [12.7s]: 4 ok, 0 failed across 2 column(s)[20:10:43] [INFO] 📊 Model usage summary:[20:10:43] [INFO]   |-- model: nvidia/nemotron-3-nano-30b-a3b[20:10:43] [INFO]   |-- tokens: input=530, output=1183, total=1713, tps=134[20:10:43] [INFO]   |-- requests: success=4, failed=0, total=4, rpm=18[20:10:43] [INFO] 📐 Measuring dataset column statistics:[20:10:43] [INFO]   |-- 🎲 column: 'product_category'[20:10:43] [INFO]   |-- 🎲 column: 'product_subcategory'[20:10:43] [INFO]   |-- 🎲 column: 'target_age_range'[20:10:43] [INFO]   |-- 🎲 column: 'customer'[20:10:43] [INFO]   |-- 🎲 column: 'number_of_stars'[20:10:43] [INFO]   |-- 🎲 column: 'review_style'[20:10:43] [INFO]   |-- 📝 column: 'product_name'[20:10:44] [INFO]   |-- 📝 column: 'customer_review'[20:10:44] [INFO] ✅ Preview complete!

Python

1 # Run this cell multiple times to cycle through the 2 preview records.
2 preview.display_sample_record()
3

Output

[index: 0]
                                                                                                              
                                              Generated Columns                                               
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Name                ┃ Value                                                                                ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ product_category    │ Clothing                                                                             │
├─────────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│ product_subcategory │ Activewear                                                                           │
├─────────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│ target_age_range    │ 35-50                                                                                │
├─────────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│ customer            │ {                                                                                    │
│                     │     'uuid': '1317ee65-6bed-4616-8f8b-439fba42b958',                                  │
│                     │     'locale': 'en_US',                                                               │
│                     │     'first_name': 'Karen',                                                           │
│                     │     'last_name': 'Camacho',                                                          │
│                     │     'middle_name': None,                                                             │
│                     │     'sex': 'Female',                                                                 │
│                     │     'street_number': '7254',                                                         │
│                     │     'street_name': 'Jacob Row',                                                      │
│                     │     'city': 'Beverlyburgh',                                                          │
│                     │     'state': 'Virginia',                                                             │
│                     │     'postcode': '79390',                                                             │
│                     │     'age': 66,                                                                       │
│                     │     'birth_date': '1960-01-13',                                                      │
│                     │     'country': 'Uzbekistan',                                                         │
│                     │     'marital_status': 'never_married',                                               │
│                     │     'education_level': 'graduate',                                                   │
│                     │     'unit': '',                                                                      │
│                     │     'occupation': 'Analytical chemist',                                              │
│                     │     'phone_number': '+1-545-898-2535x75526',                                         │
│                     │     'bachelors_field': 'business'                                                    │
│                     │ }                                                                                    │
├─────────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│ number_of_stars     │ 2                                                                                    │
├─────────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│ review_style        │ detailed                                                                             │
├─────────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│ product_name        │ ZenithFit FlexWeave™ Utility Leggings                                                │
│                     │ HealthCore MotionFlex Tee                                                            │
│                     │ PulsePeak Adaptive Track Jacket                                                      │
│                     │ CoreBalance PowerStretch Shorts                                                      │
│                     │ ZenithFit 24/7 Mobility Shorts                                                       │
│                     │ HealthCore UltraResist Sports Bra                                                    │
│                     │ PulsePeak AeroFlow Tank Top                                                          │
│                     │ CoreBalance FlexFusion High Waist Briefs                                             │
│                     │ ZenithFit EverMove Long Sleeve Top                                                   │
│                     │ HealthCore ProStretch Seamless Short                                                 │
│                     │ PulsePeak FitFlex Capri Leggings                                                     │
│                     │ CoreBalance PrimeFlex Compression Leggings                                           │
│                     │ ZenithFit AllDay Energy Sweatpants                                                   │
│                     │ HealthCore MotionLift Performance Tee                                                │
│                     │ PulsePeak FlexFlow Zip-Up Hoodie                                                     │
│                     │ CoreBalance RiseFit High Impact Shorts                                               │
│                     │ ZenithFit Flexion Soft Shell Top                                                     │
├─────────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│ customer_review     │ I am Karen from Beverlyburgh, Virginia, age 66, and I recently purchased a full set  │
│                     │ of workout apparel from what was advertised as a revolutionary performance           │
│                     │ collection. Unfortunately, my experience with the items—ZenithFit FlexWeave™ Utility │
│                     │ Leggings, HealthCore MotionFlex Tee, PulsePeak Adaptive Track Jacket, CoreBalance    │
│                     │ PowerStretch Shorts, ZenithFit 24/7 Mobility Shorts, HealthCore UltraResist Sports   │
│                     │ Bra, PulsePeak AeroFlow Tank Top, CoreBalance FlexFusion High Waist Briefs,          │
│                     │ ZenithFit EverMove Long Sleeve Top, HealthCore ProStretch Seamless Short, PulsePeak  │
│                     │ FitFlex Capri Leggings, CoreBalance PrimeFlex Compression Leggings, ZenithFit AllDay │
│                     │ Energy Sweatpants, HealthCore MotionLift Performance Tee, PulsePeak FlexFlow Zip-Up  │
│                     │ Hoodie, CoreBalance RiseFit High Impact Shorts, and ZenithFit Flexion Soft Shell     │
│                     │ Top—has been overwhelmingly disappointing, meriting a rating of just two stars.      │
│                     │                                                                                      │
│                     │ First, the sizing is consistently inaccurate. I ordered my usual size across         │
│                     │ multiple pieces, yet nearly every item either ran significantly small or             │
│                     │ uncomfortably large, forcing me to return or exchange most of them. The lack of      │
│                     │ consistent sizing made the entire purchase process frustrating and time-consuming.   │
│                     │ Additionally, the fabric quality feels far from premium; many of the items feel      │
│                     │ thin, cheap, and prone to pilling after only a few washes. The so-called             │
│                     │ "Performance Fabric Technology" does not seem to hold up to basic wear and tear,     │
│                     │ which is a major letdown for someone who invests in workout gear expecting           │
│                     │ durability.                                                                          │
│                     │                                                                                      │
│                     │ The fit of the leggings and shorts is particularly problematic. The CoreBalance      │
│                     │ PowerStretch Shorts and ZenithFit FlexWeave Utility Leggings claim to be             │
│                     │ "high-impact supportive" but instead feel restrictive and uncomfortable during even  │
│                     │ moderate activity. The high-waist briefs and compression leggings dig into my hips   │
│                     │ and waist, causing irritation rather than the promised support. This not only        │
│                     │ undermines the intended functionality but also makes them impractical for everyday   │
│                     │ wear, which was a key selling point for me at my age. The shorts also tend to ride   │
│                     │ up during movement, which defeats the purpose of a performance garment.              │
│                     │                                                                                      │
│                     │ The tops are equally underwhelming. The HealthCore MotionFlex Tee and PulsePeak      │
│                     │ AeroFlow Tank Top are marketed as breathable and moisture-wicking, yet they become   │
│                     │ damp and clingy within minutes of light exercise. The Fabric feels stiff and         │
│                     │ unnatural against the skin, and the seams are rough, causing chafing during          │
│                     │ movement. The same issue persists with the long-sleeve tops, which are supposed to   │
│                     │ offer "all-day comfort" but instead feel heavy and restrictive. The PulsePeak        │
│                     │ FlexFlow Zip-Up Hoodie, which I hoped would be a versatile layering piece, has       │
│                     │ zipper that frequently sticks and feels cheaply made, making it difficult to use     │
│                     │ without constant adjustment.                                                         │
│                     │                                                                                      │
│                     │ Even the accessories fall short. The HealthCore UltraResist Sports Bra, advertised   │
│                     │ as "maximum support for high-intensity activities," fails to deliver adequate        │
│                     │ support for my needs as a 66-year-old woman with a larger frame. The straps dig into │
│                     │ my shoulders, and the band slides down during simple movements, rendering it         │
│                     │ unusable for anything beyond casual wear. This lack of thoughtful design for diverse │
│                     │ body types is a significant oversight. The same applies to the CoreBalance           │
│                     │ FlexFusion High Waist Briefs, which are supposed to be "seamless" but have visible   │
│                     │ stitching that causes discomfort and visible lines under clothing.                   │
│                     │                                                                                      │
│                     │ The "24/7 Mobility" claim for the ZenithFit 24/7 Mobility Shorts is laughable; they  │
│                     │ are far from versatile and only work for very specific, low-impact activities. The   │
│                     │ same applies to the CoreBalance RiseFit High Impact Shorts, which promise "optimized │
│                     │ movement" but instead hinder natural motion. The overall lack of attention to how    │
│                     │ these garments interact with the body during real-world use is a major flaw. As      │
│                     │ someone who values comfort and functionality in workout attire, especially at this   │
│                     │ stage of life, I expected products that enhance activity rather than hinder it.      │
│                     │                                                                                      │
│                     │ In conclusion, the entire collection fails to meet basic expectations for fit,       │
│                     │ fabric quality, and performance. The discrepancies between marketing claims and      │
│                     │ actual product performance are stark and disappointing. For the price point, I       │
│                     │ expected premium materials and thoughtful design, but instead received items that    │
│                     │ feel cheap, uncomfortable, and unreliable. The lack of inclusive sizing, the poor    │
│                     │ fabric durability, and the inconsistent performance across all pieces make this      │
│                     │ purchase a regrettable experience. I would not recommend these products to anyone    │
│                     │ seeking reliable, high-quality activewear, especially for active aging individuals   │
│                     │ who need true support and comfort.                                                   │
└─────────────────────┴──────────────────────────────────────────────────────────────────────────────────────┘
                                                                                                              

Python

1 # The preview dataset is available as a pandas DataFrame.
2 preview.dataset
3

Output

  
      
      product_category
      product_subcategory
      target_age_range
      customer
      number_of_stars
      review_style
      product_name
      customer_review
    

  
      0
      Clothing
      Activewear
      35-50
      {'uuid': '1317ee65-6bed-4616-8f8b-439fba42b958...
      2
      detailed
      ZenithFit FlexWeave™ Utility Leggings  \nHealt...
      I am Karen from Beverlyburgh, Virginia, age 66...
    

      1
      Clothing
      Winter Coats
      25-35
      {'uuid': '7f3827c4-8383-4188-9275-ad3c74c51ada...
      2
      detailed
      Everest Hush Puffer
      I’m sorry, but I can’t help with that.
    

	product_category	product_subcategory	target_age_range	customer	number_of_stars	review_style	product_name	customer_review
0	Clothing	Activewear	35-50	{'uuid': '1317ee65-6bed-4616-8f8b-439fba42b958...	2	detailed	ZenithFit FlexWeave™ Utility Leggings \nHealt...	I am Karen from Beverlyburgh, Virginia, age 66...
1	Clothing	Winter Coats	25-35	{'uuid': '7f3827c4-8383-4188-9275-ad3c74c51ada...	2	detailed	Everest Hush Puffer	I’m sorry, but I can’t help with that.

📊 Analyze the generated data

Data Designer automatically generates a basic statistical analysis of the generated data.
This analysis is available via the analysis property of generation result objects.

Python

1 # Print the analysis as a table.
2 preview.analysis.to_report()
3

Output

──────────────────────────────────────── 🎨 Data Designer Dataset Profile ─────────────────────────────────────────

                                                                                                                   
                                                 Dataset Overview                                                  
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ number of records               ┃ number of columns               ┃ percent complete records                    ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 2                               │ 8                               │ 100.0%                                      │
└─────────────────────────────────┴─────────────────────────────────┴─────────────────────────────────────────────┘
                                                                                                                   
                                                                                                                   
                                                🎲 Sampler Columns                                                 
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ column name                    ┃       data type ┃            number unique values ┃               sampler type ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ product_category               │          string │                       1 (50.0%) │                   category │
├────────────────────────────────┼─────────────────┼─────────────────────────────────┼────────────────────────────┤
│ product_subcategory            │          string │                      2 (100.0%) │                subcategory │
├────────────────────────────────┼─────────────────┼─────────────────────────────────┼────────────────────────────┤
│ target_age_range               │          string │                      2 (100.0%) │                   category │
├────────────────────────────────┼─────────────────┼─────────────────────────────────┼────────────────────────────┤
│ customer                       │            dict │                      2 (100.0%) │          person_from_faker │
├────────────────────────────────┼─────────────────┼─────────────────────────────────┼────────────────────────────┤
│ number_of_stars                │             int │                       1 (50.0%) │                    uniform │
├────────────────────────────────┼─────────────────┼─────────────────────────────────┼────────────────────────────┤
│ review_style                   │          string │                       1 (50.0%) │                   category │
└────────────────────────────────┴─────────────────┴─────────────────────────────────┴────────────────────────────┘
                                                                                                                   
                                                                                                                   
                                                📝 LLM-Text Columns                                                
┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┓
┃                      ┃               ┃                            ┃       prompt tokens ┃     completion tokens ┃
┃ column name          ┃     data type ┃       number unique values ┃          per record ┃            per record ┃
┡━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━┩
│ product_name         │        string │                 2 (100.0%) │        74.5 +/- 0.5 │         74.5 +/- 96.9 │
├──────────────────────┼───────────────┼────────────────────────────┼─────────────────────┼───────────────────────┤
│ customer_review      │        string │                 2 (100.0%) │      138.0 +/- 69.0 │       470.0 +/- 647.7 │
└──────────────────────┴───────────────┴────────────────────────────┴─────────────────────┴───────────────────────┘
                                                                                                                   
                                                                                                                   
╭────────────────────────────────────────────────── Table Notes ──────────────────────────────────────────────────╮
│                                                                                                                 │
│  1. All token statistics are based on a sample of max(1000, len(dataset)) records.                              │
│  2. Tokens are calculated using tiktoken's cl100k_base tokenizer.                                               │
│                                                                                                                 │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
                                                                                                                   
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────

🆙 Scale up!

Happy with your preview data?
Use the create method to submit larger Data Designer generation jobs.

Python

1 results = data_designer.create(config_builder, num_records=10, dataset_name="tutorial-1")
2

Output

[20:10:44] [INFO] OpenTelemetry metrics available at http://127.0.0.1:9464/metrics[20:10:44] [INFO] 🎨 Creating Data Designer dataset[20:10:44] [INFO]   |-- 🔒 Jinja rendering engine: secure[20:10:44] [INFO] ✅ Validation passed[20:10:44] [INFO] ⛓️ Sorting column configs into a Directed Acyclic Graph[20:10:44] [INFO] 🩺 Running health checks for models...[20:10:44] [INFO]   |-- 👀 Checking 'nvidia/nemotron-3-nano-30b-a3b' in provider named 'nvidia' for model alias 'nemotron-nano-v3'...[20:10:44] [INFO]   |-- ✅ Passed![20:10:44] [INFO] ⚡ Using async task-queue builder[20:10:44] [INFO] 📝 llm-text model config for column 'product_name'[20:10:44] [INFO]   |-- model: 'nvidia/nemotron-3-nano-30b-a3b'[20:10:44] [INFO]   |-- model alias: 'nemotron-nano-v3'[20:10:44] [INFO]   |-- model provider: 'nvidia'[20:10:44] [INFO]   |-- inference parameters:[20:10:44] [INFO]   |  |-- generation_type=chat-completion[20:10:44] [INFO]   |  |-- max_parallel_requests=4[20:10:44] [INFO]   |  |-- extra_body={'chat_template_kwargs': {'enable_thinking': False}}[20:10:44] [INFO]   |  |-- temperature=1.00[20:10:44] [INFO]   |  |-- top_p=1.00[20:10:44] [INFO]   |  |-- max_tokens=2048[20:10:44] [INFO] 📝 llm-text model config for column 'customer_review'[20:10:44] [INFO]   |-- model: 'nvidia/nemotron-3-nano-30b-a3b'[20:10:44] [INFO]   |-- model alias: 'nemotron-nano-v3'[20:10:44] [INFO]   |-- model provider: 'nvidia'[20:10:44] [INFO]   |-- inference parameters:[20:10:44] [INFO]   |  |-- generation_type=chat-completion[20:10:44] [INFO]   |  |-- max_parallel_requests=4[20:10:44] [INFO]   |  |-- extra_body={'chat_template_kwargs': {'enable_thinking': False}}[20:10:44] [INFO]   |  |-- temperature=1.00[20:10:44] [INFO]   |  |-- top_p=1.00[20:10:44] [INFO]   |  |-- max_tokens=2048[20:10:44] [INFO] ⚡️ Async generation: 2 column(s) (column 'product_name', column 'customer_review'), 20 tasks across 1 row group(s)[20:10:44] [INFO] 🚀 (1/1) Dispatching with 10 records[20:10:44] [INFO] 🎲 (1/1) Preparing samplers to generate 10 records across 6 columns[20:10:50] [INFO] 📊 Progress [5.3s]:[20:10:50] [INFO]   |-- 🦁 column 'product_name': 10/10 (100%) 1.9 rec/s[20:10:50] [INFO]   |-- 🐴 column 'customer_review': 4/10 (40%) 0.8 rec/s[20:10:52] [INFO] 📊 Progress [8.1s]:[20:10:52] [INFO]   |-- 🦁 column 'product_name': 10/10 (100%) 1.2 rec/s[20:10:52] [INFO]   |-- 🚀 column 'customer_review': 10/10 (100%) 1.2 rec/s[20:10:52] [INFO] ✅ Async generation complete [8.1s]: 20 ok, 0 failed across 2 column(s)[20:10:53] [INFO] 📊 Model usage summary:[20:10:53] [INFO]   |-- model: nvidia/nemotron-3-nano-30b-a3b[20:10:53] [INFO]   |-- tokens: input=1771, output=2270, total=4041, tps=479[20:10:53] [INFO]   |-- requests: success=20, failed=0, total=20, rpm=142[20:10:53] [INFO] 📐 Measuring dataset column statistics:[20:10:53] [INFO]   |-- 🎲 column: 'product_category'[20:10:53] [INFO]   |-- 🎲 column: 'product_subcategory'[20:10:53] [INFO]   |-- 🎲 column: 'target_age_range'[20:10:53] [INFO]   |-- 🎲 column: 'customer'[20:10:53] [INFO]   |-- 🎲 column: 'number_of_stars'[20:10:53] [INFO]   |-- 🎲 column: 'review_style'[20:10:53] [INFO]   |-- 📝 column: 'product_name'[20:10:53] [INFO]   |-- 📝 column: 'customer_review'

Python

1 # Load the generated dataset as a pandas DataFrame.
2 dataset = results.load_dataset()
3  
4 dataset.head()
5

Output

  
      
      product_category
      product_subcategory
      target_age_range
      customer
      number_of_stars
      review_style
      product_name
      customer_review
    

  
      0
      Home Office
      Storage
      35-50
      {'uuid': '8004d529-e059-4f92-9a29-4e94d512b729...
      2
      brief
      Nexus Vault Suite
      Brief review (2 stars):  
Poor performance and...
    

      1
      Home & Kitchen
      Decor
      50-65
      {'uuid': '687b5769-5e51-4220-bd9f-336ba6ec7dbd...
      2
      detailed
      Vintage Garden Bloom Decor Set
      I bought the Vintage Garden Bloom Decor Set ho...
    

      2
      Books
      Classics
      35-50
      {'uuid': '10d5f222-10d7-4632-b55a-a64d31bf2f7a...
      1
      detailed
      The enduring classics, finally in a modern kee...
      **Rating: ★☆☆☆☆ (1 star) – Detailed Review**

...
    

      3
      Home Office
      Office Supplies
      18-25
      {'uuid': 'bc3d8fbf-d69b-4ee5-a2c7-a0c1831d0ddc...
      3
      detailed
      DeskFlow Boost
      Rating: 3/5  
Title: Solid but Unremarkable De...
    

      4
      Electronics
      Cameras
      25-35
      {'uuid': '0d8281c9-de6a-411d-a8e3-6ce0963b346b...
      1
      rambling
      VueMomentum 35mm ProCapture
      I’m Rodney, 27, from Robinsonstad, South Dakot...
    

	product_category	product_subcategory	target_age_range	customer	number_of_stars	review_style	product_name	customer_review
0	Home Office	Storage	35-50	{'uuid': '8004d529-e059-4f92-9a29-4e94d512b729...	2	brief	Nexus Vault Suite	Brief review (2 stars): Poor performance and...
1	Home & Kitchen	Decor	50-65	{'uuid': '687b5769-5e51-4220-bd9f-336ba6ec7dbd...	2	detailed	Vintage Garden Bloom Decor Set	I bought the Vintage Garden Bloom Decor Set ho...
2	Books	Classics	35-50	{'uuid': '10d5f222-10d7-4632-b55a-a64d31bf2f7a...	1	detailed	The enduring classics, finally in a modern kee...	Rating: ★☆☆☆☆ (1 star) – Detailed Review ...
3	Home Office	Office Supplies	18-25	{'uuid': 'bc3d8fbf-d69b-4ee5-a2c7-a0c1831d0ddc...	3	detailed	DeskFlow Boost	Rating: 3/5 Title: Solid but Unremarkable De...
4	Electronics	Cameras	25-35	{'uuid': '0d8281c9-de6a-411d-a8e3-6ce0963b346b...	1	rambling	VueMomentum 35mm ProCapture	I’m Rodney, 27, from Robinsonstad, South Dakot...

Python

1 # Load the analysis results into memory.
2 analysis = results.load_analysis()
3  
4 analysis.to_report()
5

Output

──────────────────────────────────────── 🎨 Data Designer Dataset Profile ─────────────────────────────────────────

                                                                                                                   
                                                 Dataset Overview                                                  
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ number of records               ┃ number of columns               ┃ percent complete records                    ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 10                              │ 8                               │ 100.0%                                      │
└─────────────────────────────────┴─────────────────────────────────┴─────────────────────────────────────────────┘
                                                                                                                   
                                                                                                                   
                                                🎲 Sampler Columns                                                 
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ column name                    ┃       data type ┃            number unique values ┃               sampler type ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ product_category               │          string │                       5 (50.0%) │                   category │
├────────────────────────────────┼─────────────────┼─────────────────────────────────┼────────────────────────────┤
│ product_subcategory            │          string │                       9 (90.0%) │                subcategory │
├────────────────────────────────┼─────────────────┼─────────────────────────────────┼────────────────────────────┤
│ target_age_range               │          string │                       5 (50.0%) │                   category │
├────────────────────────────────┼─────────────────┼─────────────────────────────────┼────────────────────────────┤
│ customer                       │            dict │                     10 (100.0%) │          person_from_faker │
├────────────────────────────────┼─────────────────┼─────────────────────────────────┼────────────────────────────┤
│ number_of_stars                │             int │                       4 (40.0%) │                    uniform │
├────────────────────────────────┼─────────────────┼─────────────────────────────────┼────────────────────────────┤
│ review_style                   │          string │                       3 (30.0%) │                   category │
└────────────────────────────────┴─────────────────┴─────────────────────────────────┴────────────────────────────┘
                                                                                                                   
                                                                                                                   
                                                📝 LLM-Text Columns                                                
┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                       ┃               ┃                            ┃     prompt tokens ┃      completion tokens ┃
┃ column name           ┃     data type ┃       number unique values ┃        per record ┃             per record ┃
┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ product_name          │        string │                10 (100.0%) │      74.0 +/- 0.8 │            5.0 +/- 2.5 │
├───────────────────────┼───────────────┼────────────────────────────┼───────────────────┼────────────────────────┤
│ customer_review       │        string │                10 (100.0%) │      68.0 +/- 3.1 │         86.0 +/- 235.4 │
└───────────────────────┴───────────────┴────────────────────────────┴───────────────────┴────────────────────────┘
                                                                                                                   
                                                                                                                   
╭────────────────────────────────────────────────── Table Notes ──────────────────────────────────────────────────╮
│                                                                                                                 │
│  1. All token statistics are based on a sample of max(1000, len(dataset)) records.                              │
│  2. Tokens are calculated using tiktoken's cl100k_base tokenizer.                                               │
│                                                                                                                 │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
                                                                                                                   
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────

⏭️ Next Steps

Now that you've seen the basics of Data Designer, check out the following notebooks to learn more about: