🎨 NeMo Data Designer

👋 Welcome! Data Designer is an orchestration framework for generating high-quality synthetic data. You provide LLM endpoints (NVIDIA, OpenAI, vLLM, etc.), and Data Designer handles batching, parallelism, validation, and more.

Configure columns and models → Preview samples and iterate → Create your full dataset at scale.

Unlike raw LLM calls, Data Designer gives you statistical diversity, field correlations, automated validation, and reproducible workflows. For details, see Architecture & Performance.

📝 Want to hear from the team? Check out our Dev Notes for deep dives, best practices, and insights.

Install

$ pip install data-designer

Setup

Get an API key from one of the default providers and set it as an environment variable:

# NVIDIA (build.nvidia.com) - recommended
$ export NVIDIA_API_KEY="your-api-key-here"

# OpenAI (platform.openai.com)
$ export OPENAI_API_KEY="your-openai-api-key-here"

# OpenRouter (openrouter.ai)
$ export OPENROUTER_API_KEY="your-openrouter-api-key-here"
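Before running anything, it can help to confirm from Python which of these keys are actually visible to your process. The snippet below is a minimal sketch using only the standard library; the helper name `configured_providers` is hypothetical and is not part of Data Designer. It only checks that the variables named above are set and non-empty, not that the keys are valid.

```python
import os

# Environment variable names taken from the setup commands above.
PROVIDER_KEYS = {
    "NVIDIA": "NVIDIA_API_KEY",
    "OpenAI": "OPENAI_API_KEY",
    "OpenRouter": "OPENROUTER_API_KEY",
}


def configured_providers(env=None):
    """Return the providers whose API key variable is set and non-empty.

    Hypothetical helper for illustration; it does not contact any provider.
    """
    env = os.environ if env is None else env
    return [
        name
        for name, var in PROVIDER_KEYS.items()
        if env.get(var, "").strip()
    ]


if __name__ == "__main__":
    found = configured_providers()
    if found:
        print("API keys found for:", ", ".join(found))
    else:
        print("No provider API key is set; export one before continuing.")
```

If the list comes back empty in a notebook or IDE, the variable was probably exported in a different shell session than the one that launched Python.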

Verify your configuration is ready:

$ data-designer config list

This displays the pre-configured model providers and models. See CLI Configuration to customize.

Your First Dataset

Let’s generate multilingual greetings to see Data Designer in action:

import data_designer.config as dd
from data_designer.interface import DataDesigner

# Initialize with default model providers
data_designer = DataDesigner()
config_builder = dd.DataDesignerConfigBuilder()

# Add a sampler column to randomly select a language
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="language",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(
            values=["English", "Spanish", "French", "German", "Italian"],
        ),
    )
)

# Add an LLM text generation column
config_builder.add_column(
    dd.LLMTextColumnConfig(
        name="greeting",
        model_alias="nvidia-text",
        prompt="Write a casual and formal greeting in {{ language }}.",
    )
)

# Generate a preview
results = data_designer.preview(config_builder)
results.display_sample_record()

🎉 That’s it! You’ve just designed your first synthetic dataset.
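To build intuition for how the two columns relate, here is a rough plain-Python sketch of the idea: a category is sampled for each record, then substituted into the `{{ language }}` placeholder to form the prompt that would be sent to the model. This is an illustration only and is not Data Designer's implementation. The helpers `render` and `build_prompts` are hypothetical, the placeholder handling is a simplified regex rather than a full template engine, uniform sampling is an assumption, and no LLM is called.

```python
import random
import re

# Values and prompt copied from the example configuration above.
LANGUAGES = ["English", "Spanish", "French", "German", "Italian"]
PROMPT = "Write a casual and formal greeting in {{ language }}."


def render(template, record):
    """Replace each {{ name }} placeholder with the matching record value.

    Simplified stand-in for a template engine (assumption for illustration).
    """
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda match: str(record[match.group(1)]),
        template,
    )


def build_prompts(num_records, seed=0):
    """Sample a language per record and render the prompt for that record."""
    rng = random.Random(seed)  # seeded so repeated runs give the same samples
    prompts = []
    for _ in range(num_records):
        record = {"language": rng.choice(LANGUAGES)}
        prompts.append((record["language"], render(PROMPT, record)))
    return prompts


for language, prompt in build_prompts(3):
    print(language, "->", prompt)
```

Because the sampled column is filled in before the prompt is rendered, each generated greeting depends on that record's language. That dependency is what the `{{ language }}` reference in the prompt expresses.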

🚀 Next Steps

Tutorials – Step-by-step notebooks covering core features
Recipes – Ready-to-use examples for common use cases
Concepts – Deep dive into columns, models, and configuration

Learn More

Deployment Options – Library vs. NeMo Microservice
Model Configuration – Configure LLM providers and models
Architecture & Performance – Optimize for throughput and scale