About Designing Synthetic Data From Scratch or Seeds
Important
NVIDIA NeMo Data Designer is available as an early access release. It is subject to limited support, and its API may change in future releases.
NeMo Data Designer is purpose-built to help AI developers design high-quality, domain-specific synthetic data at scale, unlike one-size-fits-all LLMs that struggle to deliver consistent, reliable results. You can start from scratch or from your own seed datasets to accelerate AI development with greater accuracy and performance.
Getting started with Data Designer requires the following:
Data Designer deployed on your laptop or compute instance.
The NeMo Microservices SDK installed with the data-designer extra option (pip install nemo-microservices[data-designer]).
Connectivity to models that are available via API or deployed in the same environment as Data Designer.
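To check the SDK requirement from Python, the sketch below uses only the standard library's importlib.metadata to report whether the nemo-microservices distribution is installed. It does not confirm that the dependencies pulled in by the data-designer extra are present, and the helper name sdk_version is hypothetical.

```python
from importlib import metadata
from typing import Optional


def sdk_version(dist_name: str = "nemo-microservices") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = sdk_version()
    if version is None:
        print("nemo-microservices is not installed; run: pip install nemo-microservices[data-designer]")
    else:
        print(f"nemo-microservices {version} is installed")
```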
Note
If you already have a dataset and want to remove PII from it or use differential privacy to create a synthetic version of your dataset, refer to About Generating Private Synthetic Data. The private synthetic data service provides enhanced security features for sensitive datasets.

Synthetic Data Generation Workflow
Once you have access to a deployment of the NeMo Data Designer microservice, the synthetic data generation workflow consists of the following steps:
Configure the models you want to use for synthetic data generation (SDG).
Configure the seed datasets and columns you want to use to diversify your dataset.
Configure your LLM-generated columns with prompts and structured outputs.
Preview your dataset and iterate on your configuration.
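To make the four steps concrete, here is an offline sketch in plain Python. Every class and function in it (ModelConfig, LLMColumn, preview) is hypothetical and is not part of the NeMo Microservices SDK. Model calls are replaced with placeholder strings, and Python's string.Template stands in for whatever prompt template syntax the service actually uses.

```python
import random
from dataclasses import dataclass, field
from string import Template
from typing import Dict, List


# Hypothetical structures for illustration only; NOT the SDK's classes.
@dataclass
class ModelConfig:
    alias: str   # short name that columns refer to
    model: str   # placeholder model identifier
    # Default inference parameters (stored only; this offline preview ignores them).
    inference_params: Dict[str, float] = field(default_factory=dict)


@dataclass
class LLMColumn:
    name: str
    model_alias: str
    prompt: str  # $-placeholders reference seed columns or earlier LLM columns


def preview(models: Dict[str, ModelConfig], seed_rows: List[Dict[str, str]],
            columns: List[LLMColumn], num_records: int,
            rng: random.Random) -> List[Dict[str, str]]:
    """Render the prompts a generation run would send, without calling any model."""
    records = []
    for _ in range(num_records):
        # Step 2: sample a seed row to steer content and diversity.
        record = dict(rng.choice(seed_rows))
        for col in columns:
            # Step 1: resolve the model alias configured for this column.
            model = models[col.model_alias]
            # Step 3: fill the prompt; raises KeyError if a placeholder is undefined.
            prompt = Template(col.prompt).substitute(record)
            record[col.name] = f"[{model.model} would answer: {prompt}]"
        records.append(record)
    return records


if __name__ == "__main__":
    models = {"text": ModelConfig("text", "example/model-name", {"temperature": 0.7})}
    seeds = [{"product": "laptop", "tone": "formal"},
             {"product": "headphones", "tone": "casual"}]
    columns = [LLMColumn("review", "text", "Write a $tone review of a $product.")]
    # Step 4: inspect a small preview, then adjust the configuration and repeat.
    for row in preview(models, seeds, columns, num_records=2, rng=random.Random(0)):
        print(row)
```

Keeping the preview deterministic through a seeded random.Random makes it easier to compare successive iterations of a configuration.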
Installation Options
Try out this beta microservice by using Docker Compose or by deploying the NeMo Microservices Helm chart.
Deploy the NeMo Data Designer microservice using Docker Compose. This is the easiest option for local testing.
Deploy the NeMo Microservices Helm Chart, which includes NeMo Data Designer.
Task Guides
Follow the synthetic data generation workflow from model setup to data production.
Tip
The tutorials reference a NEMO_MICROSERVICES_BASE_URL whose value depends on the ingress in your particular cluster. If you are using the minikube demo installation, it is http://nemo.test. Otherwise, consult your cluster administrator for the ingress values.
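One way to keep scripts portable across clusters is to read the base URL from the environment (for example, after running export NEMO_MICROSERVICES_BASE_URL=... in your shell) and fall back to the minikube demo value. The helper name base_url is hypothetical; only the variable name and the demo URL come from the tip above.

```python
import os

# Ingress URL of the minikube demo installation; other clusters differ.
MINIKUBE_DEMO_URL = "http://nemo.test"


def base_url(default: str = MINIKUBE_DEMO_URL) -> str:
    """Return NEMO_MICROSERVICES_BASE_URL from the environment, or the demo URL."""
    url = os.environ.get("NEMO_MICROSERVICES_BASE_URL", default)
    # Drop a trailing slash so request paths can be appended without doubling it.
    return url.rstrip("/")


if __name__ == "__main__":
    print(base_url())
```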
Set up AI models for synthetic data generation. Connect to NVIDIA-hosted models, manage model aliases, and tune the default inference parameters.
Create column definitions with various data types, constraints, and LLM-generated content using prompt templates and structured outputs.
Seed the SDG process with existing datasets to steer the content and diversity of the generated data.
Create synthetic person entities with demographics, personality traits, and synthetic personas for comprehensive character modeling.
Preview generations, create synthetic datasets at scale using jobs, and manage the data production process.
Validate and evaluate your synthetic data quality using automated checks and assessment metrics.
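As an illustration of the kind of automated checks mentioned above, the sketch below computes two simple metrics over generated records: the fraction that match an expected field-to-type mapping (a stand-in for a structured-output schema) and the fraction that duplicate an earlier record. The function quality_report and its metric names are hypothetical and are not Data Designer's evaluation API.

```python
from collections import Counter
from typing import Any, Dict, List, Mapping


def quality_report(records: List[Mapping[str, Any]],
                   schema: Mapping[str, type]) -> Dict[str, float]:
    """Compute simple automated checks over generated records.

    schema maps each expected field name to its Python type.
    """
    n = len(records)
    if n == 0:
        return {"records": 0.0, "schema_pass_rate": 0.0, "duplicate_rate": 0.0}
    # A record passes when every schema field is present with the expected type.
    passed = sum(
        all(isinstance(r.get(name), ftype) for name, ftype in schema.items())
        for r in records
    )
    # Two records are duplicates when all schema fields match; repr() makes
    # unhashable values (lists, dicts) usable as Counter keys.
    keys = Counter(tuple(repr(r.get(name)) for name in schema) for r in records)
    duplicates = sum(count - 1 for count in keys.values())
    return {
        "records": float(n),
        "schema_pass_rate": passed / n,
        "duplicate_rate": duplicates / n,
    }


if __name__ == "__main__":
    sample = [{"question": "What is SDG?", "difficulty": 1},
              {"question": "What is SDG?", "difficulty": 1},
              {"question": "Define a seed dataset.", "difficulty": "easy"}]
    print(quality_report(sample, {"question": str, "difficulty": int}))
```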
References
Explore advanced configuration management, examples, and learning resources.
Save, load, and manage Data Designer configurations for reproducible synthetic data workflows.
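Data Designer's own save and load support is covered in that reference. As a general illustration of why a plain, key-sorted JSON file suits reproducible, version-controlled workflows, the sketch below round-trips a configuration dict. The helpers save_config and load_config and the dict layout are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def save_config(config: Dict[str, Any], path: Path) -> None:
    """Write a configuration as indented, key-sorted JSON so diffs stay stable."""
    path.write_text(json.dumps(config, indent=2, sort_keys=True) + "\n", encoding="utf-8")


def load_config(path: Path) -> Dict[str, Any]:
    """Read a configuration previously written by save_config."""
    return json.loads(path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    config = {
        "models": [{"alias": "text", "model": "example/model-name",
                    "inference_params": {"temperature": 0.7}}],
        "columns": [{"name": "review", "model_alias": "text",
                     "prompt": "Write a review of a $product."}],
    }
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "data_designer_config.json"
        save_config(config, path)
        # The round trip should reproduce the original configuration exactly.
        print(load_config(path) == config)
```

Sorting keys means that re-saving an unchanged configuration produces an identical file, so version control shows only real changes.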