Copyright © 2026, NVIDIA Corporation.

🎨 Data Designer Tutorial: The Basics

📚 What you'll learn

This notebook demonstrates the basics of Data Designer by generating a simple product review dataset.

📦 Import Data Designer

  • data_designer.config provides access to the configuration API.

  • DataDesigner is the main interface for data generation.

Python
import data_designer.config as dd
from data_designer.interface import DataDesigner

⚙️ Initialize the Data Designer interface

  • DataDesigner is the main object responsible for managing the data generation process.

  • When initialized without arguments, the default model providers are used.

Python
data_designer = DataDesigner()

🎛️ Define model configurations

  • Each ModelConfig defines a model that can be used during the generation process.

  • The "model alias" is used to reference the model in the Data Designer config (as we will see below).

  • The "model provider" is the external service that hosts the model (see the model config docs for more details).

  • By default, we use build.nvidia.com as the model provider.

Python
# This name is set in the model provider configuration.
MODEL_PROVIDER = "nvidia"

# The model ID is from build.nvidia.com.
MODEL_ID = "nvidia/nemotron-3-nano-30b-a3b"

# We choose this alias to be descriptive for our use case.
MODEL_ALIAS = "nemotron-nano-v3"

model_configs = [
    dd.ModelConfig(
        alias=MODEL_ALIAS,
        model=MODEL_ID,
        provider=MODEL_PROVIDER,
        inference_parameters=dd.ChatCompletionInferenceParams(
            temperature=1.0,
            top_p=1.0,
            max_tokens=2048,
            extra_body={"chat_template_kwargs": {"enable_thinking": False}},
        ),
    )
]

🏗️ Initialize the Data Designer Config Builder

  • The Data Designer config defines the dataset schema and generation process.

  • The config builder provides an intuitive interface for building this configuration.

  • The list of model configs is provided to the builder at initialization.

Python
config_builder = dd.DataDesignerConfigBuilder(model_configs=model_configs)

🎲 Getting started with sampler columns

  • Sampler columns generate synthetic data without calling an LLM.

  • They are particularly useful for steering the diversity of the generated data, as we demonstrate below.


You can view available samplers using the config builder's info property:

Python
config_builder.info.display("samplers")
Output
─────────────────────────────────────────── NeMo Data Designer Samplers ───────────────────────────────────────────

┏━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃ Type               ┃ Parameter                ┃ Data Type                         ┃ Required ┃ Constraints      ┃
┡━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ bernoulli          │ p                        │ number                            │    ✓     │ >= 0.0, <= 1.0   │
│                    │ sampler_type             │ string                            │          │                  │
├────────────────────┼──────────────────────────┼───────────────────────────────────┼──────────┼──────────────────┤
│ bernoulli_mixture  │ p                        │ number                            │    ✓     │ >= 0.0, <= 1.0   │
│                    │ dist_name                │ string                            │    ✓     │                  │
│                    │ dist_params              │ dict                              │    ✓     │                  │
│                    │ sampler_type             │ string                            │          │                  │
├────────────────────┼──────────────────────────┼───────────────────────────────────┼──────────┼──────────────────┤
│ binomial           │ n                        │ integer                           │    ✓     │                  │
│                    │ p                        │ number                            │    ✓     │ >= 0.0, <= 1.0   │
│                    │ sampler_type             │ string                            │          │                  │
├────────────────────┼──────────────────────────┼───────────────────────────────────┼──────────┼──────────────────┤
│ category           │ values                   │ string[] | integer[] | number[]   │    ✓     │ len > 1          │
│                    │ weights                  │ number[] | null                   │          │                  │
│                    │ sampler_type             │ string                            │          │                  │
├────────────────────┼──────────────────────────┼───────────────────────────────────┼──────────┼──────────────────┤
│ datetime           │ start                    │ string                            │    ✓     │                  │
│                    │ end                      │ string                            │    ✓     │                  │
│                    │ unit                     │ string                            │          │                  │
│                    │ sampler_type             │ string                            │          │                  │
├────────────────────┼──────────────────────────┼───────────────────────────────────┼──────────┼──────────────────┤
│ gaussian           │ mean                     │ number                            │    ✓     │                  │
│                    │ stddev                   │ number                            │    ✓     │                  │
│                    │ decimal_places           │ integer | null                    │          │                  │
│                    │ sampler_type             │ string                            │          │                  │
├────────────────────┼──────────────────────────┼───────────────────────────────────┼──────────┼──────────────────┤
│ person             │ locale                   │ string                            │          │                  │
│                    │ sex                      │ string | null                     │          │                  │
│                    │ city                     │ string | string[] | null          │          │                  │
│                    │ age_range                │ integer[]                         │          │ len > 2, len < 2 │
│                    │ select_field_values      │ object | null                     │          │                  │
│                    │ with_synthetic_personas  │ boolean                           │          │                  │
│                    │ sampler_type             │ string                            │          │                  │
├────────────────────┼──────────────────────────┼───────────────────────────────────┼──────────┼──────────────────┤
│ person_from_faker  │ locale                   │ string                            │          │                  │
│                    │ sex                      │ string | null                     │          │                  │
│                    │ city                     │ string | string[] | null          │          │                  │
│                    │ age_range                │ integer[]                         │          │ len > 2, len < 2 │
│                    │ sampler_type             │ string                            │          │                  │
├────────────────────┼──────────────────────────┼───────────────────────────────────┼──────────┼──────────────────┤
│ poisson            │ mean                     │ number                            │    ✓     │                  │
│                    │ sampler_type             │ string                            │          │                  │
├────────────────────┼──────────────────────────┼───────────────────────────────────┼──────────┼──────────────────┤
│ scipy              │ dist_name                │ string                            │    ✓     │                  │
│                    │ dist_params              │ dict                              │    ✓     │                  │
│                    │ decimal_places           │ integer | null                    │          │                  │
│                    │ sampler_type             │ string                            │          │                  │
├────────────────────┼──────────────────────────┼───────────────────────────────────┼──────────┼──────────────────┤
│ subcategory        │ category                 │ string                            │    ✓     │                  │
│                    │ values                   │ dict                              │    ✓     │                  │
│                    │ sampler_type             │ string                            │          │                  │
├────────────────────┼──────────────────────────┼───────────────────────────────────┼──────────┼──────────────────┤
│ timedelta          │ dt_min                   │ integer                           │    ✓     │ >= 0             │
│                    │ dt_max                   │ integer                           │    ✓     │ > 0              │
│                    │ reference_column_name    │ string                            │    ✓     │                  │
│                    │ unit                     │ string                            │          │                  │
│                    │ sampler_type             │ string                            │          │                  │
├────────────────────┼──────────────────────────┼───────────────────────────────────┼──────────┼──────────────────┤
│ uniform            │ low                      │ number                            │    ✓     │                  │
│                    │ high                     │ number                            │    ✓     │                  │
│                    │ decimal_places           │ integer | null                    │          │                  │
│                    │ sampler_type             │ string                            │          │                  │
├────────────────────┼──────────────────────────┼───────────────────────────────────┼──────────┼──────────────────┤
│ uuid               │ prefix                   │ string | null                     │          │                  │
│                    │ short_form               │ boolean                           │          │                  │
│                    │ uppercase                │ boolean                           │          │                  │
│                    │ sampler_type             │ string                            │          │                  │
└────────────────────┴──────────────────────────┴───────────────────────────────────┴──────────┴──────────────────┘

Let's start designing our product review dataset by adding product category and subcategory columns.

Python
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="product_category",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(
            values=[
                "Electronics",
                "Clothing",
                "Home & Kitchen",
                "Books",
                "Home Office",
            ],
        ),
    )
)

config_builder.add_column(
    dd.SamplerColumnConfig(
        name="product_subcategory",
        sampler_type=dd.SamplerType.SUBCATEGORY,
        params=dd.SubcategorySamplerParams(
            category="product_category",
            values={
                "Electronics": [
                    "Smartphones",
                    "Laptops",
                    "Headphones",
                    "Cameras",
                    "Accessories",
                ],
                "Clothing": [
                    "Men's Clothing",
                    "Women's Clothing",
                    "Winter Coats",
                    "Activewear",
                    "Accessories",
                ],
                "Home & Kitchen": [
                    "Appliances",
                    "Cookware",
                    "Furniture",
                    "Decor",
                    "Organization",
                ],
                "Books": [
                    "Fiction",
                    "Non-Fiction",
                    "Self-Help",
                    "Textbooks",
                    "Classics",
                ],
                "Home Office": [
                    "Desks",
                    "Chairs",
                    "Storage",
                    "Office Supplies",
                    "Lighting",
                ],
            },
        ),
    )
)

config_builder.add_column(
    dd.SamplerColumnConfig(
        name="target_age_range",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(values=["18-25", "25-35", "35-50", "50-65", "65+"]),
    )
)

# Optionally validate that the columns are configured correctly.
data_designer.validate(config_builder)
Output
[21:15:31] [INFO] ✅ Validation passed
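
To make the relationship between the category and subcategory columns concrete, here is a minimal standard-library sketch of conditional sampling: draw a category first, then draw a subcategory from the list keyed by that category. It is an illustration only, not Data Designer's implementation; the helper name `sample_row` and the fixed seed are assumptions made for the example, and only three of the five categories are copied over to keep it short.

```python
import random

# Mapping copied (in part) from the subcategory configuration above.
SUBCATEGORIES = {
    "Electronics": ["Smartphones", "Laptops", "Headphones", "Cameras", "Accessories"],
    "Books": ["Fiction", "Non-Fiction", "Self-Help", "Textbooks", "Classics"],
    "Home Office": ["Desks", "Chairs", "Storage", "Office Supplies", "Lighting"],
}


def sample_row(rng: random.Random) -> dict:
    """Hypothetical helper: draw a category, then a subcategory conditioned on it."""
    category = rng.choice(list(SUBCATEGORIES))
    subcategory = rng.choice(SUBCATEGORIES[category])
    return {"product_category": category, "product_subcategory": subcategory}


rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows = [sample_row(rng) for _ in range(5)]
for row in rows:
    print(row)
```

Because the subcategory is always drawn from the list belonging to the sampled category, a row can never pair, say, "Books" with "Laptops".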

Next, let's add samplers to generate data related to the customer and their review.

Python
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="customer",
        sampler_type=dd.SamplerType.PERSON_FROM_FAKER,
        params=dd.PersonFromFakerSamplerParams(age_range=[18, 70], locale="en_US"),
    )
)

config_builder.add_column(
    dd.SamplerColumnConfig(
        name="number_of_stars",
        sampler_type=dd.SamplerType.UNIFORM,
        params=dd.UniformSamplerParams(low=1, high=5),
        convert_to="int",  # Convert the sampled float to an integer.
    )
)

config_builder.add_column(
    dd.SamplerColumnConfig(
        name="review_style",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(
            values=["rambling", "brief", "detailed", "structured with bullet points"],
            weights=[1, 2, 2, 1],
        ),
    )
)

data_designer.validate(config_builder)
Output
[21:15:31] [INFO] ✅ Validation passed
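
The `weights=[1, 2, 2, 1]` passed to the `review_style` sampler do not sum to 1, which suggests they are relative weights. The sketch below works through that reading with the standard library only. It assumes each probability is the weight divided by the sum of all weights; that is an assumption about how Data Designer interprets the list, not something confirmed here.

```python
import random
from collections import Counter

values = ["rambling", "brief", "detailed", "structured with bullet points"]
weights = [1, 2, 2, 1]

# Assumption: weights are relative, so each probability is weight / sum(weights).
total = sum(weights)
probabilities = {value: weight / total for value, weight in zip(values, weights)}
print(probabilities)

# Empirical check with the standard library's weighted sampler.
rng = random.Random(42)  # fixed seed so the sketch is repeatable
draws = rng.choices(values, weights=weights, k=6000)
counts = Counter(draws)
print(counts.most_common())
```

Under this reading, "brief" and "detailed" each appear in about one third of the rows, and the other two styles in about one sixth each.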

🦜 LLM-generated columns

  • The real power of Data Designer comes from leveraging LLMs to generate text, code, and structured data.

  • When prompting the LLM, we can use Jinja templating to reference other columns in the dataset.

  • As we see below, nested JSON fields can be accessed using dot notation.

Python
config_builder.add_column(
    dd.LLMTextColumnConfig(
        name="product_name",
        prompt=(
            "You are a helpful assistant that generates product names. DO NOT add quotes around the product name.\n\n"
            "Come up with a creative product name for a product in the '{{ product_category }}' category, focusing "
            "on products related to '{{ product_subcategory }}'. The target age range of the ideal customer is "
            "{{ target_age_range }} years old. Respond with only the product name, no other text."
        ),
        model_alias=MODEL_ALIAS,
    )
)

config_builder.add_column(
    dd.LLMTextColumnConfig(
        name="customer_review",
        prompt=(
            "You are a customer named {{ customer.first_name }} from {{ customer.city }}, {{ customer.state }}. "
            "You are {{ customer.age }} years old and recently purchased a product called {{ product_name }}. "
            "Write a review of this product, which you gave a rating of {{ number_of_stars }} stars. "
            "The style of the review should be '{{ review_style }}'. "
            "Respond with only the review, no other text."
        ),
        model_alias=MODEL_ALIAS,
    )
)

data_designer.validate(config_builder)
Output
[21:15:32] [INFO] ✅ Validation passed
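
To show what a dotted reference such as `{{ customer.first_name }}` resolves to, the sketch below substitutes placeholders by walking a nested dictionary one key per dotted segment. It is a simplified stand-in written with the standard library, not the Jinja engine that Data Designer uses, and the `render` helper is hypothetical. The record values are taken from the sample preview record on this page.

```python
import re

# Matches placeholders such as {{ customer.first_name }}.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(template: str, record: dict) -> str:
    """Hypothetical stand-in for Jinja rendering: resolves dotted paths only."""

    def resolve(match):
        value = record
        for key in match.group(1).split("."):
            value = value[key]  # walk one level deeper per dotted segment
        return str(value)

    return PLACEHOLDER.sub(resolve, template)


record = {
    "customer": {"first_name": "Brittany", "city": "East Timothy", "state": "Virginia", "age": 60},
    "product_name": "ErgoPulse ChairMate Mini",
}
prompt = render(
    "You are a customer named {{ customer.first_name }} from {{ customer.city }}, "
    "{{ customer.state }}. You recently purchased {{ product_name }}.",
    record,
)
print(prompt)
```

Real Jinja supports far more (filters, conditionals, loops); this sketch covers only the nested lookup.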

🔁 Iteration is key – preview the dataset!

  1. Use the preview method to generate a sample of records quickly.

  2. Inspect the results for quality and format issues.

  3. Adjust column configurations, prompts, or parameters as needed.

  4. Re-run the preview until satisfied.

Python
preview = data_designer.preview(config_builder, num_records=2)
Output
[21:15:32] [INFO] 🧐 Preview generation in progress
[21:15:32] [INFO]   |-- 🔒 Jinja rendering engine: secure
[21:15:32] [INFO] ✅ Validation passed
[21:15:32] [INFO] ⛓️ Sorting column configs into a Directed Acyclic Graph
[21:15:32] [INFO] 🩺 Running health checks for models...
[21:15:32] [INFO]   |-- 👀 Checking 'nvidia/nemotron-3-nano-30b-a3b' in provider named 'nvidia' for model alias 'nemotron-nano-v3'...
[21:15:33] [INFO]   |-- ✅ Passed!
[21:15:33] [INFO] ⚡ DATA_DESIGNER_ASYNC_ENGINE is enabled - using async task-queue preview
[21:15:33] [INFO] 📝 llm-text model config for column 'product_name'
[21:15:33] [INFO]   |-- model: 'nvidia/nemotron-3-nano-30b-a3b'
[21:15:33] [INFO]   |-- model alias: 'nemotron-nano-v3'
[21:15:33] [INFO]   |-- model provider: 'nvidia'
[21:15:33] [INFO]   |-- inference parameters:
[21:15:33] [INFO]   |  |-- generation_type=chat-completion
[21:15:33] [INFO]   |  |-- max_parallel_requests=4
[21:15:33] [INFO]   |  |-- extra_body={'chat_template_kwargs': {'enable_thinking': False}}
[21:15:33] [INFO]   |  |-- temperature=1.00
[21:15:33] [INFO]   |  |-- top_p=1.00
[21:15:33] [INFO]   |  |-- max_tokens=2048
[21:15:33] [INFO] 📝 llm-text model config for column 'customer_review'
[21:15:33] [INFO]   |-- model: 'nvidia/nemotron-3-nano-30b-a3b'
[21:15:33] [INFO]   |-- model alias: 'nemotron-nano-v3'
[21:15:33] [INFO]   |-- model provider: 'nvidia'
[21:15:33] [INFO]   |-- inference parameters:
[21:15:33] [INFO]   |  |-- generation_type=chat-completion
[21:15:33] [INFO]   |  |-- max_parallel_requests=4
[21:15:33] [INFO]   |  |-- extra_body={'chat_template_kwargs': {'enable_thinking': False}}
[21:15:33] [INFO]   |  |-- temperature=1.00
[21:15:33] [INFO]   |  |-- top_p=1.00
[21:15:33] [INFO]   |  |-- max_tokens=2048
[21:15:33] [INFO] ⚡️ Async generation: 2 column(s) (product_name, customer_review), 4 tasks across 1 row group(s)
[21:15:33] [INFO] 🚀 (1/1) Dispatching with 2 records
[21:15:33] [INFO] 🎲 (1/1) Preparing samplers to generate 2 records across 6 columns
[21:15:37] [INFO] 📊 Progress [3.9s]:
[21:15:37] [INFO]   |-- 🌕 product_name: 2/2 (100%) 0.5 rec/s
[21:15:37] [INFO]   |-- 🦁 customer_review: 2/2 (100%) 0.5 rec/s
[21:15:37] [INFO] ✅ Async generation complete [3.9s]: 4 ok, 0 failed across 2 column(s)
[21:15:37] [INFO] 📊 Model usage summary:
[21:15:37] [INFO]   |-- model: nvidia/nemotron-3-nano-30b-a3b
[21:15:37] [INFO]   |-- tokens: input=360, output=448, total=808, tps=204
[21:15:37] [INFO]   |-- requests: success=4, failed=0, total=4, rpm=60
[21:15:37] [INFO] 📐 Measuring dataset column statistics:
[21:15:37] [INFO]   |-- 🎲 column: 'product_category'
[21:15:37] [INFO]   |-- 🎲 column: 'product_subcategory'
[21:15:37] [INFO]   |-- 🎲 column: 'target_age_range'
[21:15:37] [INFO]   |-- 🎲 column: 'customer'
[21:15:37] [INFO]   |-- 🎲 column: 'number_of_stars'
[21:15:37] [INFO]   |-- 🎲 column: 'review_style'
[21:15:37] [INFO]   |-- 📝 column: 'product_name'
[21:15:37] [INFO]   |-- 📝 column: 'customer_review'
[21:15:37] [INFO] 🙌 Preview complete!
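
The log line about sorting column configs into a Directed Acyclic Graph reflects the fact that columns reference one another, so each column can only be generated after the columns it reads from. A minimal sketch of that ordering using Python's `graphlib` (3.9+) follows. The dependency map is written out by hand from the sampler parameters and Jinja references in this tutorial; how Data Designer builds and sorts its own graph is not shown here.

```python
from graphlib import TopologicalSorter

# Each column maps to the columns it reads from, as implied by the tutorial's
# sampler parameters and Jinja references.
dependencies = {
    "product_category": set(),
    "product_subcategory": {"product_category"},
    "target_age_range": set(),
    "customer": set(),
    "number_of_stars": set(),
    "review_style": set(),
    "product_name": {"product_category", "product_subcategory", "target_age_range"},
    "customer_review": {"customer", "product_name", "number_of_stars", "review_style"},
}

# static_order() yields every column after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Any valid order places `product_name` before `customer_review`, because the review prompt references the generated product name.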
Python
# Run this cell multiple times to cycle through the 2 preview records.
preview.display_sample_record()
Output
[index: 0]
                                                                                                              
                                              Generated Columns                                               
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Name                ┃ Value                                                                                ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ product_category    │ Home Office                                                                          │
├─────────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│ product_subcategory │ Chairs                                                                               │
├─────────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│ target_age_range    │ 18-25                                                                                │
├─────────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│ customer            │ {                                                                                    │
│                     │     'uuid': '22a58e4d-938c-4c7a-8905-00b6b659c589',                                  │
│                     │     'locale': 'en_US',                                                               │
│                     │     'first_name': 'Brittany',                                                        │
│                     │     'last_name': 'Tran',                                                             │
│                     │     'middle_name': None,                                                             │
│                     │     'sex': 'Female',                                                                 │
│                     │     'street_number': '43048',                                                        │
│                     │     'street_name': 'Deborah Stream',                                                 │
│                     │     'city': 'East Timothy',                                                          │
│                     │     'state': 'Virginia',                                                             │
│                     │     'postcode': '99870',                                                             │
│                     │     'age': 60,                                                                       │
│                     │     'birth_date': '1966-05-04',                                                      │
│                     │     'country': 'Eritrea',                                                            │
│                     │     'marital_status': 'married_present',                                             │
│                     │     'education_level': 'some_college',                                               │
│                     │     'unit': '',                                                                      │
│                     │     'occupation': 'Teacher, music',                                                  │
│                     │     'phone_number': '001-286-878-1827',                                              │
│                     │     'bachelors_field': 'no_degree'                                                   │
│                     │ }                                                                                    │
├─────────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│ number_of_stars     │ 3                                                                                    │
├─────────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│ review_style        │ brief                                                                                │
├─────────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│ product_name        │ ErgoPulse ChairMate Mini                                                             │
├─────────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│ customer_review     │ I’m 60, live in East Timothy, VA, and just bought the ErgoPulse ChairMate Mini. It’s │
│                     │ compact and the lumbar support feels decent, but the cushion is a bit firm for my    │
│                     │ taste and the armrests don’t adjust much. Worth a try if you need a small,           │
│                     │ supportive chair, but I expected a little more comfort for the price. Rating: 3      │
│                     │ stars.                                                                               │
└─────────────────────┴──────────────────────────────────────────────────────────────────────────────────────┘
                                                                                                              
Python
# The preview dataset is available as a pandas DataFrame.
preview.dataset
Output
|   | product_category | product_subcategory | target_age_range | customer | number_of_stars | review_style | product_name | customer_review |
|---|---|---|---|---|---|---|---|---|
| 0 | Home Office | Chairs | 18-25 | {'uuid': '22a58e4d-938c-4c7a-8905-00b6b659c589... | 3 | brief | ErgoPulse ChairMate Mini | I’m 60, live in East Timothy, VA, and just bou... |
| 1 | Home Office | Chairs | 50-65 | {'uuid': 'd3f3069f-ff15-43ba-bc35-87a7b625c4fc... | 2 | detailed | ErgoLux Adjustable Lumbar Home Office Chair | I bought the ErgoLux Adjustable Lumbar Home Of... |
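
Step 2 of the iteration loop, inspecting results for quality and format issues, can be partly automated. The sketch below runs a few simple checks over records represented as plain dictionaries. The `check_record` helper and its thresholds are hypothetical choices for illustration. With a real preview, records in this shape could likely be obtained from the DataFrame via pandas' `to_dict("records")`.

```python
def check_record(record: dict) -> list[str]:
    """Hypothetical checks; return a list of human-readable issues (empty if clean)."""
    issues = []
    name = record["product_name"].strip()
    if not name:
        issues.append("product_name is empty")
    if name[:1] in {'"', "'"} or name[-1:] in {'"', "'"}:
        issues.append("product_name is wrapped in quotes")
    if not 1 <= record["number_of_stars"] <= 5:
        issues.append("number_of_stars is outside 1-5")
    if len(record["customer_review"].split()) < 5:
        issues.append("customer_review is suspiciously short")
    return issues


# The first record uses (shortened) values from the preview above; the second
# is a made-up failure case for illustration.
records = [
    {
        "product_name": "ErgoPulse ChairMate Mini",
        "number_of_stars": 3,
        "customer_review": "It's compact and the lumbar support feels decent.",
    },
    {"product_name": '"Chair"', "number_of_stars": 3, "customer_review": "Fine."},
]

for index, record in enumerate(records):
    print(index, check_record(record) or "ok")
```

Checks like these do not replace reading the samples, but they flag obvious prompt problems, such as quoted product names, before scaling up.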

📊 Analyze the generated data

  • Data Designer automatically generates a basic statistical analysis of the generated data.

  • This analysis is available via the analysis property of generation result objects.

Python
# Print the analysis as a table.
preview.analysis.to_report()
Output
──────────────────────────────────────── 🎨 Data Designer Dataset Profile ─────────────────────────────────────────

                                                                                                                   
                                                 Dataset Overview                                                  
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ number of records               ┃ number of columns               ┃ percent complete records                    ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 2                               │ 8                               │ 100.0%                                      │
└─────────────────────────────────┴─────────────────────────────────┴─────────────────────────────────────────────┘
                                                                                                                   
                                                                                                                   
                                                🎲 Sampler Columns                                                 
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ column name                    ┃       data type ┃            number unique values ┃               sampler type ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ product_category               │          string │                       1 (50.0%) │                   category │
├────────────────────────────────┼─────────────────┼─────────────────────────────────┼────────────────────────────┤
│ product_subcategory            │          string │                       1 (50.0%) │                subcategory │
├────────────────────────────────┼─────────────────┼─────────────────────────────────┼────────────────────────────┤
│ target_age_range               │          string │                      2 (100.0%) │                   category │
├────────────────────────────────┼─────────────────┼─────────────────────────────────┼────────────────────────────┤
│ customer                       │            dict │                      2 (100.0%) │          person_from_faker │
├────────────────────────────────┼─────────────────┼─────────────────────────────────┼────────────────────────────┤
│ number_of_stars                │             int │                      2 (100.0%) │                    uniform │
├────────────────────────────────┼─────────────────┼─────────────────────────────────┼────────────────────────────┤
│ review_style                   │          string │                      2 (100.0%) │                   category │
└────────────────────────────────┴─────────────────┴─────────────────────────────────┴────────────────────────────┘
                                                                                                                   
                                                                                                                   
                                                📝 LLM-Text Columns                                                
┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                       ┃               ┃                            ┃     prompt tokens ┃      completion tokens ┃
┃ column name           ┃     data type ┃       number unique values ┃        per record ┃             per record ┃
┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ product_name          │        string │                 2 (100.0%) │      74.0 +/- 0.0 │            9.0 +/- 1.4 │
├───────────────────────┼───────────────┼────────────────────────────┼───────────────────┼────────────────────────┤
│ customer_review       │        string │                 2 (100.0%) │      72.0 +/- 2.0 │        204.5 +/- 170.4 │
└───────────────────────┴───────────────┴────────────────────────────┴───────────────────┴────────────────────────┘
                                                                                                                   
                                                                                                                   
╭────────────────────────────────────────────────── Table Notes ──────────────────────────────────────────────────╮
│                                                                                                                 │
│  1. All token statistics are based on a sample of max(1000, len(dataset)) records.                              │
│  2. Tokens are calculated using tiktoken's cl100k_base tokenizer.                                               │
│                                                                                                                 │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
                                                                                                                   
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
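
The "number unique values" figures in the sampler table can be reproduced by hand from the two preview records. The sketch below assumes the percentage is the unique count divided by the number of records, which matches the figures in the report; the `unique_summary` helper is hypothetical and is not part of the Data Designer API.

```python
def unique_summary(values: list) -> str:
    """Format a unique-value count in the same style as the report's sampler table."""
    unique = len(set(values))
    percent = 100.0 * unique / len(values)
    return f"{unique} ({percent:.1f}%)"


# Column values taken from the two preview records shown earlier.
preview_columns = {
    "product_category": ["Home Office", "Home Office"],
    "target_age_range": ["18-25", "50-65"],
    "number_of_stars": [3, 2],
}

for name, values in preview_columns.items():
    print(f"{name}: {unique_summary(values)}")
```

Both preview records landed in the "Home Office" category, which is why `product_category` reports a single unique value across two records.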

🆙 Scale up!

  • Happy with your preview data?

  • Use the create method to submit larger Data Designer generation jobs.

Python
results = data_designer.create(config_builder, num_records=10, dataset_name="tutorial-1")
Output
[21:15:37] [INFO] 🎨 Creating Data Designer dataset
[21:15:37] [INFO]   |-- 🔒 Jinja rendering engine: secure
[21:15:37] [INFO] ✅ Validation passed
[21:15:37] [INFO] ⛓️ Sorting column configs into a Directed Acyclic Graph
[21:15:37] [INFO] 🩺 Running health checks for models...
[21:15:37] [INFO]   |-- 👀 Checking 'nvidia/nemotron-3-nano-30b-a3b' in provider named 'nvidia' for model alias 'nemotron-nano-v3'...
[21:15:38] [INFO]   |-- ✅ Passed!
[21:15:38] [INFO] ⚡ DATA_DESIGNER_ASYNC_ENGINE is enabled - using async task-queue builder
[21:15:38] [INFO] 📝 llm-text model config for column 'product_name'
[21:15:38] [INFO]   |-- model: 'nvidia/nemotron-3-nano-30b-a3b'
[21:15:38] [INFO]   |-- model alias: 'nemotron-nano-v3'
[21:15:38] [INFO]   |-- model provider: 'nvidia'
[21:15:38] [INFO]   |-- inference parameters:
[21:15:38] [INFO]   |  |-- generation_type=chat-completion
[21:15:38] [INFO]   |  |-- max_parallel_requests=4
[21:15:38] [INFO]   |  |-- extra_body={'chat_template_kwargs': {'enable_thinking': False}}
[21:15:38] [INFO]   |  |-- temperature=1.00
[21:15:38] [INFO]   |  |-- top_p=1.00
[21:15:38] [INFO]   |  |-- max_tokens=2048
[21:15:38] [INFO] 📝 llm-text model config for column 'customer_review'
[21:15:38] [INFO]   |-- model: 'nvidia/nemotron-3-nano-30b-a3b'
[21:15:38] [INFO]   |-- model alias: 'nemotron-nano-v3'
[21:15:38] [INFO]   |-- model provider: 'nvidia'
[21:15:38] [INFO]   |-- inference parameters:
[21:15:38] [INFO]   |  |-- generation_type=chat-completion
[21:15:38] [INFO]   |  |-- max_parallel_requests=4
[21:15:38] [INFO]   |  |-- extra_body={'chat_template_kwargs': {'enable_thinking': False}}
[21:15:38] [INFO]   |  |-- temperature=1.00
[21:15:38] [INFO]   |  |-- top_p=1.00
[21:15:38] [INFO]   |  |-- max_tokens=2048
[21:15:38] [INFO] ⚡️ Async generation: 2 column(s) (product_name, customer_review), 20 tasks across 1 row group(s)
[21:15:38] [INFO] 🚀 (1/1) Dispatching with 10 records
[21:15:38] [INFO] 🎲 (1/1) Preparing samplers to generate 10 records across 6 columns
[21:15:43] [INFO] 📊 Progress [4.8s]:
[21:15:43] [INFO]   |-- 🐔 product_name: 10/10 (100%) 2.1 rec/s
[21:15:43] [INFO]   |-- ☀️ customer_review: 10/10 (100%) 2.1 rec/s
[21:15:43] [INFO] ✅ Async generation complete [4.8s]: 20 ok, 0 failed across 2 column(s)
[21:15:43] [INFO] 📊 Model usage summary:
[21:15:43] [INFO]   |-- model: nvidia/nemotron-3-nano-30b-a3b
[21:15:43] [INFO]   |-- tokens: input=1777, output=3136, total=4913, tps=984
[21:15:43] [INFO]   |-- requests: success=20, failed=0, total=20, rpm=240
[21:15:43] [INFO] 📐 Measuring dataset column statistics:
[21:15:43] [INFO]   |-- 🎲 column: 'product_category'
[21:15:43] [INFO]   |-- 🎲 column: 'product_subcategory'
[21:15:43] [INFO]   |-- 🎲 column: 'target_age_range'
[21:15:43] [INFO]   |-- 🎲 column: 'customer'
[21:15:43] [INFO]   |-- 🎲 column: 'number_of_stars'
[21:15:43] [INFO]   |-- 🎲 column: 'review_style'
[21:15:43] [INFO]   |-- 📝 column: 'product_name'
[21:15:43] [INFO]   |-- 📝 column: 'customer_review'
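The figures in the log above can be cross-checked with simple arithmetic. The sketch below assumes that `tps` means total tokens divided by elapsed wall-clock seconds and that `rpm` means requests per minute over the same window; the log does not state these definitions, so treat both as assumptions.

```python
# Values copied from the log output above.
num_records = 10
llm_columns = ["product_name", "customer_review"]
input_tokens, output_tokens = 1777, 3136
total_requests = 20
reported_tps = 984
reported_rpm = 240

# One generation task per (LLM column, record) pair.
tasks = len(llm_columns) * num_records

total_tokens = input_tokens + output_tokens

# Assumption: tps = total_tokens / elapsed_seconds,
#             rpm = total_requests / elapsed_minutes.
elapsed_from_tps = total_tokens / reported_tps
elapsed_from_rpm = total_requests / reported_rpm * 60

print(f"tasks={tasks}, total_tokens={total_tokens}")
print(f"elapsed from tps: {elapsed_from_tps:.2f}s, from rpm: {elapsed_from_rpm:.2f}s")
```

Under these assumptions both estimates land near five seconds, which is roughly consistent with the generation time reported in the log.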
Python
# Load the generated dataset as a pandas DataFrame.
dataset = results.load_dataset()

dataset.head()
Output
|   | product_category | product_subcategory | target_age_range | customer | number_of_stars | review_style | product_name | customer_review |
|---|---|---|---|---|---|---|---|---|
| 0 | Clothing | Women's Clothing | 18-25 | {'age': 42, 'bachelors_field': 'education', 'b... | 2 | structured with bullet points | Luna Threads™ Cardigan Set | **Luna Threads™ Cardigan Set – 2‑Star Review**... |
| 1 | Home & Kitchen | Furniture | 35-50 | {'age': 59, 'bachelors_field': 'education', 'b... | 3 | brief | Aurora Modular Sofa | I love the Aurora Modular Sofa's modern look a... |
| 2 | Books | Fiction | 65+ | {'age': 26, 'bachelors_field': 'stem', 'birth_... | 4 | detailed | Timeless Tales Emporium | I purchased the Timeless Tales Emporium six we... |
| 3 | Home & Kitchen | Decor | 65+ | {'age': 27, 'bachelors_field': 'no_degree', 'b... | 2 | structured with bullet points | Sunlit Heritage Wall Clock | - **Purchased:** Sunlit Heritage Wall Clock – ... |
| 4 | Books | Classics | 50-65 | {'age': 32, 'bachelors_field': 'no_degree', 'b... | 1 | detailed | The Golden Quill Classic Collection | I’m Nicholas from South Katherine, Ohio, and I... |
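The `customer` column holds one dictionary per record, so it can be convenient to expand it into separate columns before analysis. The sketch below works on a small stand-in DataFrame built from the first three rows shown above, using only the `age` and `bachelors_field` keys that are visible in the truncated output. `flatten_dict_column` is a hypothetical helper, not part of Data Designer.

```python
import pandas as pd


def flatten_dict_column(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Expand a column of dicts into scalar columns prefixed with the column name."""
    expanded = pd.json_normalize(df[column].tolist())
    expanded = expanded.add_prefix(f"{column}_")
    expanded.index = df.index  # keep rows aligned with the original frame
    return pd.concat([df.drop(columns=[column]), expanded], axis=1)


# Stand-in for `results.load_dataset()`; only keys visible in the output are used.
sample = pd.DataFrame(
    {
        "product_category": ["Clothing", "Home & Kitchen", "Books"],
        "number_of_stars": [2, 3, 4],
        "customer": [
            {"age": 42, "bachelors_field": "education"},
            {"age": 59, "bachelors_field": "education"},
            {"age": 26, "bachelors_field": "stem"},
        ],
    }
)

flat = flatten_dict_column(sample, "customer")
print(flat.columns.tolist())
```

With the real dataset, the same call would presumably be `flatten_dict_column(dataset, "customer")`, producing one `customer_*` column per key in the person dictionary.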
Python
# Load the analysis results into memory.
analysis = results.load_analysis()

analysis.to_report()
Output
──────────────────────────────────────── 🎨 Data Designer Dataset Profile ─────────────────────────────────────────

                                                                                                                   
                                                 Dataset Overview                                                  
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ number of records               ┃ number of columns               ┃ percent complete records                    ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 10                              │ 8                               │ 100.0%                                      │
└─────────────────────────────────┴─────────────────────────────────┴─────────────────────────────────────────────┘
                                                                                                                   
                                                                                                                   
                                                🎲 Sampler Columns                                                 
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ column name                    ┃       data type ┃            number unique values ┃               sampler type ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ product_category               │          string │                       4 (40.0%) │                   category │
├────────────────────────────────┼─────────────────┼─────────────────────────────────┼────────────────────────────┤
│ product_subcategory            │          string │                       7 (70.0%) │                subcategory │
├────────────────────────────────┼─────────────────┼─────────────────────────────────┼────────────────────────────┤
│ target_age_range               │          string │                       5 (50.0%) │                   category │
├────────────────────────────────┼─────────────────┼─────────────────────────────────┼────────────────────────────┤
│ customer                       │            dict │                     10 (100.0%) │          person_from_faker │
├────────────────────────────────┼─────────────────┼─────────────────────────────────┼────────────────────────────┤
│ number_of_stars                │             int │                       4 (40.0%) │                    uniform │
├────────────────────────────────┼─────────────────┼─────────────────────────────────┼────────────────────────────┤
│ review_style                   │          string │                       4 (40.0%) │                   category │
└────────────────────────────────┴─────────────────┴─────────────────────────────────┴────────────────────────────┘
                                                                                                                   
                                                                                                                   
                                                📝 LLM-Text Columns                                                
┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                       ┃               ┃                            ┃     prompt tokens ┃      completion tokens ┃
┃ column name           ┃     data type ┃       number unique values ┃        per record ┃             per record ┃
┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ product_name          │        string │                10 (100.0%) │      74.0 +/- 1.0 │            5.5 +/- 1.2 │
├───────────────────────┼───────────────┼────────────────────────────┼───────────────────┼────────────────────────┤
│ customer_review       │        string │                10 (100.0%) │      70.0 +/- 1.9 │        275.0 +/- 180.5 │
└───────────────────────┴───────────────┴────────────────────────────┴───────────────────┴────────────────────────┘
                                                                                                                   
                                                                                                                   
╭────────────────────────────────────────────────── Table Notes ──────────────────────────────────────────────────╮
│                                                                                                                 │
│  1. All token statistics are based on a sample of max(1000, len(dataset)) records.                              │
│  2. Tokens are calculated using tiktoken's cl100k_base tokenizer.                                               │
│                                                                                                                 │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
                                                                                                                   
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
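The "number unique values" column in the report pairs a distinct-value count with its share of the record count; for example, 4 distinct values across 10 records is shown as 4 (40.0%). A rough way to reproduce that figure with pandas is sketched below. `unique_summary` is a hypothetical helper, and the ratings are illustrative stand-in data rather than the generated dataset.

```python
import pandas as pd


def unique_summary(series: pd.Series) -> str:
    """Format the distinct-value count and its percentage of all records."""
    n_unique = series.nunique()
    percent = 100.0 * n_unique / len(series)
    return f"{n_unique} ({percent:.1f}%)"


# Illustrative ratings only: the first five match dataset.head() above,
# the remaining five are invented for the example.
stars = pd.Series([2, 3, 4, 2, 1, 3, 4, 1, 2, 3], name="number_of_stars")
print(unique_summary(stars))
```

On the real data, something like `unique_summary(dataset["number_of_stars"])` should give a value comparable to the report, assuming the report counts distinct values the same way.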

⏭️ Next Steps

Now that you've seen the basics of Data Designer, check out the following notebooks to learn more about:

  • Structured outputs, Jinja expressions, and conditional generation

  • Seeding synthetic data generation with an external dataset

  • Providing images as context

  • Generating images