# 🎨 NeMo Data Designer

👋 Welcome! Data Designer is an orchestration framework for generating high-quality synthetic data. You provide LLM endpoints (NVIDIA, OpenAI, vLLM, etc.), and Data Designer handles batching, parallelism, validation, and more.

**Configure** columns and models → **Preview** samples and iterate → **Create** your full dataset at scale.

Unlike raw LLM calls, Data Designer gives you statistical diversity, field correlations, automated validation, and reproducible workflows. For details, see [Architecture & Performance](/concepts/architecture-and-performance).

📝 Want to hear from the team? Check out our **[Dev Notes](/dev-notes/overview)** for deep dives, best practices, and insights.

## Install

```bash
pip install data-designer
```

## Setup

Get an API key from one of the default providers and set it as an environment variable:

```bash
# NVIDIA (build.nvidia.com) - recommended
export NVIDIA_API_KEY="your-api-key-here"

# OpenAI (platform.openai.com)
export OPENAI_API_KEY="your-openai-api-key-here"

# OpenRouter (openrouter.ai)
export OPENROUTER_API_KEY="your-openrouter-api-key-here"
```

Verify your configuration is ready:

```bash
data-designer config list
```

This displays the pre-configured model providers and models. See [CLI Configuration](/concepts/models/configure-with-the-cli) to customize.
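If you work from a notebook or script, it can also help to confirm that the keys are visible to the Python process. The sketch below only reads the environment variables named above. The helper `find_configured_keys` is a hypothetical name used for illustration and is not part of Data Designer.

```python
import os

# Environment variable names from the setup step above.
PROVIDER_KEYS = ("NVIDIA_API_KEY", "OPENAI_API_KEY", "OPENROUTER_API_KEY")


def find_configured_keys(env=None):
    """Return the provider key names that are set to a non-empty value.

    Hypothetical helper for illustration; it is not a Data Designer API.
    """
    env = os.environ if env is None else env
    return [name for name in PROVIDER_KEYS if env.get(name, "").strip()]


configured = find_configured_keys()
if configured:
    print("API keys found:", ", ".join(configured))
else:
    print("No provider API key is set; export one before continuing.")
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.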

## Your First Dataset

Let's generate multilingual greetings to see Data Designer in action:

```python
import data_designer.config as dd
from data_designer.interface import DataDesigner

# Initialize with default model providers
data_designer = DataDesigner()
config_builder = dd.DataDesignerConfigBuilder()

# Add a sampler column to randomly select a language
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="language",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(
            values=["English", "Spanish", "French", "German", "Italian"],
        ),
    )
)

# Add an LLM text generation column
config_builder.add_column(
    dd.LLMTextColumnConfig(
        name="greeting",
        model_alias="nvidia-text",
        prompt="Write a casual and formal greeting in {{ language }}.",
    )
)

# Generate a preview
results = data_designer.preview(config_builder)
results.display_sample_record()
```

🎉 That's it! You've just designed your first synthetic dataset.
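To make the data flow in this example concrete, here is a conceptual sketch in plain Python of how the two columns relate: a language is sampled for each record, then substituted into the prompt wherever `{{ language }}` appears. This is an illustration under simplifying assumptions, not Data Designer's implementation. The helpers `render_prompt` and `build_prompts` are hypothetical, the regex is only a stand-in for Jinja-style placeholders, and no LLM is called.

```python
import random
import re

LANGUAGES = ["English", "Spanish", "French", "German", "Italian"]
PROMPT_TEMPLATE = "Write a casual and formal greeting in {{ language }}."


def render_prompt(template, row):
    """Replace each {{ name }} placeholder with the matching value from row."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: str(row[m.group(1)]), template)


def build_prompts(num_records, seed=0):
    """Sample a language per record and render the prompt text for that record."""
    rng = random.Random(seed)  # seeded so repeated runs give the same rows
    rows = [{"language": rng.choice(LANGUAGES)} for _ in range(num_records)]
    return [{**row, "prompt": render_prompt(PROMPT_TEMPLATE, row)} for row in rows]


for record in build_prompts(3):
    print(record)
```

Each printed record pairs a sampled `language` with the fully rendered prompt, which mirrors how the `greeting` column depends on the `language` column.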

## 🚀 Next Steps

* Step-by-step notebooks covering core features
* Ready-to-use examples for common use cases
* Deep dive into columns, models, and configuration

## Learn More

* **[Deployment Options: Library vs. Microservice](/concepts/deployment-options)** – Choose between running Data Designer as a library or as a NeMo Microservice
* **[Model Configuration](/concepts/models/default-model-settings)** – Configure LLM providers and models
* **[Architecture & Performance](/concepts/architecture-and-performance)** – Optimize for throughput and scale