🎨 NeMo Data Designer
Welcome! Data Designer is an orchestration framework for generating high-quality synthetic data. You provide LLM endpoints (NVIDIA, OpenAI, vLLM, etc.), and Data Designer handles batching, parallelism, validation, and more.
Configure columns and models → Preview samples and iterate → Create your full dataset at scale.
Unlike raw LLM calls, Data Designer gives you statistical diversity, field correlations, automated validation, and reproducible workflows. For details, see Architecture & Performance.
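To make "field correlations", "automated validation", and "reproducible workflows" concrete, here is a minimal plain-Python sketch of the ideas. It does not use Data Designer's API: the `GREETINGS` table and the helpers `sample_record`, `is_valid`, and `generate` are hypothetical names invented for illustration, and no LLM is called.

```python
import random

# Illustrative data only; not taken from Data Designer.
GREETINGS = {
    "English": ["Hello", "Hi there"],
    "Spanish": ["Hola", "Buenos días"],
    "French": ["Bonjour", "Salut"],
}


def sample_record(rng):
    """Sample a language, then a greeting that depends on it (a correlated field)."""
    language = rng.choice(sorted(GREETINGS))
    greeting = rng.choice(GREETINGS[language])
    return {"language": language, "greeting": greeting}


def is_valid(record):
    """Check that the greeting belongs to the sampled language."""
    return record["greeting"] in GREETINGS.get(record["language"], [])


def generate(n, seed=0):
    """Generate n records and keep the valid ones; a fixed seed makes runs repeatable."""
    rng = random.Random(seed)
    records = [sample_record(rng) for _ in range(n)]
    return [r for r in records if is_valid(r)]


if __name__ == "__main__":
    for row in generate(3, seed=42):
        print(row)
```

In this toy version the second field is drawn conditionally on the first, every record passes through a validator, and the seed pins the output. Data Designer applies the same kinds of guarantees to LLM-generated columns.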
Want to hear from the team? Check out our Dev Notes for deep dives, best practices, and insights.
Install
Setup
Get an API key from one of the default providers and set it as an environment variable:
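As a quick sanity check, the sketch below reports which provider keys are visible to the current process. The variable names `NVIDIA_API_KEY` and `OPENAI_API_KEY` are assumptions based on common provider conventions, so confirm the exact name your provider expects. The helper `find_configured_keys` is hypothetical and not part of Data Designer.

```python
import os

# Assumed variable names; adjust to match the provider you use.
CANDIDATE_KEYS = ("NVIDIA_API_KEY", "OPENAI_API_KEY")


def find_configured_keys(env=None, candidates=CANDIDATE_KEYS):
    """Return the candidate variable names that are set to a non-empty value."""
    env = os.environ if env is None else env
    return [name for name in candidates if env.get(name, "").strip()]


if __name__ == "__main__":
    found = find_configured_keys()
    if found:
        print("Found API key variables:", ", ".join(found))
    else:
        print("No API key variables set; export one before continuing.")
```

The function only reports variable names, never values, so its output is safe to paste into a bug report.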
Verify your configuration is ready:
This displays the pre-configured model providers and models. See CLI Configuration to customize.
Your First Dataset
Let's generate multilingual greetings to see Data Designer in action:
That's it! You've just designed your first synthetic dataset.
Next Steps
Step-by-step notebooks covering core features
Ready-to-use examples for common use cases
Deep dive into columns, models, and configuration
Learn More
- Deployment Options: Library vs. Microservice - Library vs. NeMo Microservice
- Model Configuration - Configure LLM providers and models
- Architecture & Performance - Optimize for throughput and scale