🎨 NeMo Data Designer
Welcome! Data Designer is an orchestration framework for generating high-quality synthetic data. You provide LLM endpoints (NVIDIA, OpenAI, vLLM, etc.), and Data Designer handles batching, parallelism, validation, and more.
Configure columns and models → Preview samples and iterate → Create your full dataset at scale.
Unlike raw LLM calls, Data Designer gives you statistical diversity, field correlations, automated validation, and reproducible workflows. For details, see Architecture & Performance.
Want to hear from the team? Check out our Dev Notes for deep dives, best practices, and insights.
Get an API key from one of the default providers and set it as an environment variable:
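Before going further, it can help to confirm that the key is actually visible to your Python process. The following is a minimal sketch using only the standard library; the variable names `NVIDIA_API_KEY` and `OPENAI_API_KEY` are assumptions chosen for illustration, so substitute whatever name your provider's setup expects.

```python
import os

# Assumed variable names for illustration; use the name your provider expects.
CANDIDATE_KEYS = ("NVIDIA_API_KEY", "OPENAI_API_KEY")


def find_configured_keys(env=None, candidates=CANDIDATE_KEYS):
    """Return the candidate variable names that hold a non-empty value."""
    env = os.environ if env is None else env
    return [name for name in candidates if env.get(name, "").strip()]


if __name__ == "__main__":
    found = find_configured_keys()
    if found:
        print("API key variables set:", ", ".join(found))
    else:
        print("No API key found; set one of:", ", ".join(CANDIDATE_KEYS))
```

The helper only reports which variables are set; it never prints the key values themselves, which keeps secrets out of terminal history and logs.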
Verify your configuration is ready:
This displays the pre-configured model providers and models. See CLI Configuration to customize.
Let's generate multilingual greetings to see Data Designer in action:
That's it! You've just designed your first synthetic dataset.
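As a rough, library-free illustration of the idea (this is not the Data Designer API), the sketch below samples a language for each record and renders a greeting prompt from it, which is the kind of seed-column-plus-prompt pattern a multilingual greetings dataset relies on. The language list, the prompt wording, and the helper name `build_preview_rows` are all hypothetical; in a real run, Data Designer would send each prompt to your configured model and handle batching and validation for you.

```python
import random
from string import Template

# Hypothetical seed values and prompt; not taken from the Data Designer docs.
LANGUAGES = ["English", "Spanish", "French", "German", "Japanese"]
PROMPT = Template("Write a casual and a formal greeting in $language.")


def build_preview_rows(num_records, seed=0):
    """Sample a language per record and render the prompt an LLM would receive."""
    rng = random.Random(seed)  # fixed seed keeps the preview reproducible
    rows = []
    for _ in range(num_records):
        language = rng.choice(LANGUAGES)
        rows.append(
            {"language": language, "prompt": PROMPT.substitute(language=language)}
        )
    return rows


if __name__ == "__main__":
    for row in build_preview_rows(3):
        print(row)
```

Sampling the language separately from the LLM call is what gives control over the distribution of the dataset: the model only fills in the text, while the mix of languages is set by the sampler.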
Step-by-step notebooks covering core features
Ready-to-use examples for common use cases
Deep dive into columns, models, and configuration