Recipes are self-contained code examples that show how to use Data Designer for specific use cases. Each recipe can be run independently.
Recipes provide working code without detailed explanations. If you’re learning Data Designer for the first time, start with our tutorial notebooks, which offer step-by-step guidance and explain core concepts. Once you’re familiar with the basics, return here for practical, ready-to-use implementations.
These recipes use the OpenAI model provider by default. Ensure your OpenAI provider is set up via the Data Designer CLI before running a recipe.
Natural-language instructions paired with Python implementations across complexity levels and industries.
Python code generation · validation · LLM-as-judge
Natural-language instructions paired with SQL implementations across complexity levels and industries.
SQL code generation · validation · LLM-as-judge
Enterprise-grade text-to-SQL training data — dialect-specific SQL, distractor injection, dirty data, 5 LLM judges with 15 scoring dimensions.
Multi-dialect SQL · SubcategorySamplerParams · 5 judges · 15 score columns
Product information paired with question/answer pairs.
Structured outputs · expression columns · LLM-as-judge
Multi-turn chat conversations between a user and an AI assistant.
Structured outputs · expression columns · LLM-as-judge
Minimal example of MCP tool calling — defines a simple MCP server and generates data that requires tool calls to complete.
LocalStdioMCPProvider · simple tool server · tool-augmented text
Grounded Q&A pairs from PDF documents using MCP tool calls and BM25 search.
LocalStdioMCPProvider · BM25 retrieval · per-column trace capture
Multi-turn search agent trajectories — Tavily web search via MCP, Wikidata KG seeding, BrowseComp-style question generation.
Tavily MCP · Wikidata seeding · two-stage question generation · trajectory capture
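As background for the BM25 retrieval mentioned in the PDF Q&A recipe above, here is a minimal sketch of Okapi BM25 scoring in plain Python. It is not the recipe's implementation and does not use any Data Designer API. The function names, the whitespace tokenizer, and the defaults `k1=1.5` and `b=0.75` are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real pipelines often do more.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document for the given query."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative for common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical page snippets standing in for text extracted from a PDF.
pages = [
    "the invoice total is due in march",
    "page layout and fonts",
    "invoice number and invoice date",
]
scores = bm25_scores("invoice date", pages)
best_page = max(range(len(pages)), key=lambda i: scores[i])
```

A retrieval tool would typically return the top-scoring passages to the model as grounding context; here `best_page` is simply the index of the highest-scoring snippet.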
A 9-recipe pipeline for generating visual QA training data from long PDF documents: OCR, page classification, single-page / multi-page / whole-document QA, and frontier-model quality filtering.
Download PDFs, render page images, and prepare seed datasets for the downstream VLM recipes.
Run Nemotron Parse over document pages and save OCR transcripts for text-based QA generation.
Generate text-grounded question-answer pairs from OCR transcripts.
Classify pages by visual reasoning potential before running more expensive QA generation.
Generate visual question-answer pairs from classified page images.
Generate single-page VLM QA examples from page-level image seeds.
Generate cross-page QA examples over fixed-size page windows.
Generate document-level QA examples over grouped page images.
Score and filter generated QA pairs with a stronger independent judge.
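To illustrate the "fixed-size page windows" idea from the cross-page QA recipe above, the following sketch groups an ordered list of pages into consecutive windows. It is a plain-Python illustration, not the recipe's code: the helper name `page_windows`, the non-overlapping layout, and the `drop_partial` flag are all assumptions.

```python
def page_windows(pages, size, drop_partial=False):
    """Split an ordered list of pages into consecutive, non-overlapping windows.

    Assumption: windows do not overlap. The final window may hold fewer than
    `size` pages unless drop_partial is True.
    """
    if size <= 0:
        raise ValueError("size must be a positive integer")
    windows = [pages[i:i + size] for i in range(0, len(pages), size)]
    if drop_partial and windows and len(windows[-1]) < size:
        windows.pop()
    return windows


# Hypothetical 7-page document, identified by page number.
doc_pages = [1, 2, 3, 4, 5, 6, 7]
windows = page_windows(doc_pages, size=3)
full_only = page_windows(doc_pages, size=3, drop_partial=True)
```

Each window could then serve as one seed row for multi-page question generation. Whether to keep or drop a short trailing window depends on how strictly downstream prompts expect a fixed page count.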