About Generating Synthetic Data#
Important
NVIDIA NeMo Data Designer is released with early access availability and is subject to limited support and potential API changes in future releases.
The Data Designer Early Access release is available only via Docker Compose and is not yet part of the NeMo Microservices Platform Helm Chart for Kubernetes deployment.
NeMo Data Designer is purpose-built for AI developers who need to design high-quality, domain-specific synthetic data at scale, a task where one-size-fits-all LLMs struggle to deliver consistent, reliable results. You can start from scratch or from your own seed datasets to accelerate AI development with greater accuracy and performance.
Getting started with Data Designer requires the following:
Deploy Data Designer on your laptop or compute instance.
Install the NeMo Microservices SDK with the data-designer extra option.
Connect to models that are available via API or deployed in the same environment as Data Designer.
Start generating synthetic data.
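
Before connecting to a deployment, it can help to confirm that the SDK is importable in the active Python environment. The check below is a minimal sketch, not part of the product: the import name `nemo_microservices` and the requirement string shown in the comment are assumptions inferred from the package name, so verify both against the installation guide.

```python
import importlib.util

# Assumed import name of the NeMo Microservices SDK (not confirmed by this page).
SDK_MODULE = "nemo_microservices"


def sdk_installed(module_name: str = SDK_MODULE) -> bool:
    """Return True if the named top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if sdk_installed():
        print("SDK found.")
    else:
        # Assumed requirement string for the data-designer extra.
        print('SDK not found. Try: pip install "nemo-microservices[data-designer]"')
```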

Synthetic Data Generation Workflow#
Once you have access to a deployment of the NeMo Data Designer microservice, the synthetic data generation workflow consists of the following steps:
Configure the models you want to use for Synthetic Data Generation (SDG).
Configure the seed datasets and columns you want to use to diversify your dataset.
Configure your LLM-generated columns with prompts and structured outputs.
Preview your dataset and iterate on your configuration.
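
The four steps above can be pictured with a small, self-contained sketch. It does not use the Data Designer SDK: every class and function name here (`ModelConfig`, `SamplerColumn`, `LLMColumn`, `Design`, `preview`) and the model identifier are hypothetical, chosen only to show how model aliases, diversifying columns, and prompt templates fit together. The local `preview` renders each prompt instead of calling a model, so it runs without a deployment.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ModelConfig:
    alias: str                 # short name that LLM columns refer to
    model_id: str              # hypothetical model identifier
    temperature: float = 0.7   # example default inference parameter


@dataclass
class SamplerColumn:
    name: str
    values: list[str]          # categories sampled to diversify rows


@dataclass
class LLMColumn:
    name: str
    model_alias: str
    prompt: str                # template that references earlier columns by name


@dataclass
class Design:
    models: list[ModelConfig] = field(default_factory=list)
    samplers: list[SamplerColumn] = field(default_factory=list)
    llm_columns: list[LLMColumn] = field(default_factory=list)


def preview(design: Design, num_records: int, seed: int = 0) -> list[dict[str, str]]:
    """Build a few rows locally; LLM columns hold the rendered prompt, not model output."""
    rng = random.Random(seed)
    aliases = {m.alias for m in design.models}
    rows = []
    for _ in range(num_records):
        # Step 2: sample the diversifying columns first.
        row = {c.name: rng.choice(c.values) for c in design.samplers}
        # Step 3: render each LLM column's prompt from the values sampled so far.
        for col in design.llm_columns:
            if col.model_alias not in aliases:
                raise ValueError(f"unknown model alias: {col.model_alias}")
            row[col.name] = col.prompt.format(**row)
        rows.append(row)
    return rows


# Step 1: configure a model. Steps 2 and 3: configure the columns.
design = Design(
    models=[ModelConfig(alias="text", model_id="example/model")],
    samplers=[SamplerColumn("topic", ["billing", "shipping", "returns"])],
    llm_columns=[LLMColumn("question", "text", "Write a customer question about {topic}.")],
)

# Step 4: preview a handful of records, then adjust the design and repeat.
for row in preview(design, num_records=3):
    print(row)
```

Keeping the preview small and seeded makes each iteration cheap and repeatable, so the configuration can be adjusted before a larger generation run.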
Installation Options#
Try out this beta microservice using Docker Compose.
Deploy the NeMo Data Designer microservice using Docker. This is the easiest option for local testing.
Task Guides#
Follow the synthetic data generation workflow from model setup to data production.
Set up AI models for synthetic data generation. Connect to NVIDIA-hosted models, manage model aliases, and tune the default inference parameters.
Create column definitions with various data types, constraints, and LLM-generated content using prompt templates and structured outputs.
Seed the SDG process with existing datasets to steer the content and diversity of the generated data.
Create synthetic person entities with demographics, personality traits, and synthetic personas for comprehensive character modeling.
Create synthetic datasets at scale using jobs, preview generations, and manage the data production process.
Validate and evaluate your synthetic data quality using automated checks and assessment metrics.
References#
Explore advanced configuration management, examples, and learning resources.
Save, load, and manage Data Designer configurations for reproducible synthetic data workflows.
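
As a rough illustration of why saved configurations support reproducibility, the sketch below round-trips a configuration through a JSON file using only the Python standard library. The dictionary layout and the helper names `save_config` and `load_config` are hypothetical and are not the Data Designer configuration format or API; see the configuration reference for the actual mechanism.

```python
import json
import tempfile
from pathlib import Path


def save_config(config: dict, path: Path) -> None:
    """Write the configuration as sorted, indented JSON so diffs stay stable."""
    path.write_text(json.dumps(config, indent=2, sort_keys=True), encoding="utf-8")


def load_config(path: Path) -> dict:
    """Read a configuration previously written by save_config."""
    return json.loads(path.read_text(encoding="utf-8"))


# Hypothetical configuration layout, for illustration only.
config = {
    "models": [{"alias": "text", "model_id": "example/model"}],
    "columns": [{"name": "topic", "type": "category", "values": ["billing", "returns"]}],
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "design.json"
    save_config(config, path)
    restored = load_config(path)

# The restored configuration matches the original, so a later run can reuse it.
print(restored == config)
```

Storing the file under version control alongside the generated dataset makes it possible to trace which configuration produced which data.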