Use Case Recipes | NVIDIA NeMo Data Designer

Recipes are code examples that show how to use Data Designer for specific use cases. Each recipe is self-contained and can be run independently.

New to Data Designer?

Recipes provide working code for specific use cases without detailed explanations. If you’re learning Data Designer for the first time, start with our tutorial notebooks, which offer step-by-step guidance and explain core concepts. Once you’re familiar with the basics, return here for practical, ready-to-use implementations.

Prerequisites

Most recipes use the OpenAI model provider by default. Ensure your OpenAI provider is set up via the Data Designer CLI before running model-backed recipes. Image-generation recipes use OpenRouter and Gemini image models by default, so set OPENROUTER_API_KEY before running them unless you override the model provider and model ID. Headless recipes, such as Document Review Gate, do not call a model provider.
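As a quick pre-flight check for the image-generation recipes, the sketch below reports whether OPENROUTER_API_KEY is set. It uses only the Python standard library and is not part of Data Designer; the helper name `missing_env_vars` is hypothetical.

```python
import os


def missing_env_vars(required, environ=None):
    """Return the names in `required` that are unset or empty.

    `environ` defaults to the real process environment; passing a dict
    makes the helper easy to exercise without touching os.environ.
    """
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    # OPENROUTER_API_KEY is the variable named in the prerequisites above.
    missing = missing_env_vars(["OPENROUTER_API_KEY"])
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("Environment looks ready for image-generation recipes.")
```

If you override the model provider and model ID, adjust the list of required variables to match that provider.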

Image Generation

Rich Document Images

Generate synthetic business-document page images with controlled metadata for VQA, OCR, multimodal judging, and document-understanding workflows.

Image generation · visual seed data · VQA-ready parquet export

Product Image Variations

Use image-to-image generation to create inclusive adult apparel catalog variants across age groups, ethnicities, body types, poses, and styling contexts.

Image-to-image · apparel catalog · inclusive representation

Funny Pet Image Edits

Generate synthetic dog and cat photos, then use image-to-image generation to make the same pet scene funnier while preserving identity.

Image-to-image · creative review · identity preservation

Traffic Scenarios

Generate self-driving car ego-camera scenes with controlled road, weather, lighting, traffic, and long-tail hazard variation.

AV ego camera · edge cases · visual review sets

Synthetic Extremity X-rays

Generate research-only extremity X-ray style images with controlled anatomy, acquisition, finding, and quality metadata.

Research only · visual QA · report generation

Airport Baggage Screening

Generate defensive baggage-screening style images with controlled clutter, material mix, scanner style, and review labels.

Defensive evaluation · human review · scanner-like images

Humanoid Robot Scene Understanding

Generate egocentric humanoid robot scenes with controlled environment, viewpoint, task, object, safety, lighting, and human-presence metadata.

Embodied AI · scene understanding · safety review

Crop Disease Detection Images

Generate crop disease detection images with controlled crop, growth stage, viewpoint, condition, severity, and field context.

Crop disease detection · healthy negatives · reviewer calibration

Drone Aerial Inspection

Generate low-altitude drone inspection images for infrastructure, property, construction, disaster-response, and industrial review workflows.

Drone inspection · infrastructure QA · reviewer calibration

Code Generation

Text to Python

Natural-language instructions paired with Python implementations across complexity levels and industries.

Python code generation · validation · LLM-as-judge

Text to SQL

Natural-language instructions paired with SQL implementations across complexity levels and industries.

SQL code generation · validation · LLM-as-judge

Nemotron Super Text to SQL

Enterprise-grade text-to-SQL training data: dialect-specific SQL, distractor injection, dirty data, and 5 LLM judges covering 15 scoring dimensions.

Multi-dialect SQL · SubcategorySamplerParams · 5 judges · 15 score columns

QA and Chat

Product Info QA

Product information paired with matching questions and answers.

Structured outputs · expression columns · LLM-as-judge

Multi-Turn Chat

Multi-turn chat conversations between a user and an AI assistant.

Structured outputs · expression columns · LLM-as-judge

Trace Ingestion

Agent Rollout Trace Distillation

Read agent rollout traces from disk and turn each one into a structured workflow record inside a Data Designer pipeline. See the ingestion guide for the trace format.

AgentRolloutSeedSource · ATIF, Claude Code, Codex, Hermes formats · trace-aware prompts

Workflow Chaining

Document Review Gate

Run a workflow to a named review stage, export that intermediate dataset, and resume downstream from a reviewed artifact.

Workflow chaining · stage export · stage output override

MCP and Tool Use

Basic MCP Tool Use

Minimal example of MCP tool calling — defines a simple MCP server and generates data that requires tool calls to complete.

LocalStdioMCPProvider · simple tool server · tool-augmented text

PDF Document QA

Grounded Q&A pairs from PDF documents using MCP tool calls and BM25 search.

LocalStdioMCPProvider · BM25 retrieval · per-column trace capture
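For readers unfamiliar with BM25, the retrieval method named in the PDF Document QA recipe, here is a minimal standard-library sketch of the scoring function. It is an illustration under stated assumptions, not the recipe's implementation: the whitespace tokenizer, the function names, and the defaults `k1=1.5` and `b=0.75` are all choices made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real pipelines
    # usually apply more careful normalization.
    return text.lower().split()


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    docs = [tokenize(doc) for doc in documents]
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs if n_docs else 0.0

    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))

    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue
            # Smoothed IDF variant that stays positive for common terms.
            idf = math.log(
                1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5)
            )
            # Term frequency saturates via k1; b controls length normalization.
            denom = term_freq[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / denom
        scores.append(score)
    return scores
```

Sorting passages by these scores in descending order gives the candidates a tool call could return for a question.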

Nemotron Super Search Agent

Multi-turn search agent trajectories — Tavily web search via MCP, Wikidata KG seeding, BrowseComp-style question generation.

Tavily MCP · Wikidata seeding · two-stage question generation · trajectory capture

Plugin Development

Markdown Section Seed Reader

Define a custom FileSystemSeedReader inline and turn Markdown files into one seed row per heading section.

Single-file custom reader · hydrate_row() fanout · DirectorySeedSource customization
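The "one seed row per heading section" idea can be sketched in plain Python. This example deliberately does not use FileSystemSeedReader or hydrate_row(), whose signatures are not shown on this page; the function name `split_markdown_sections` and the row keys are hypothetical.

```python
import re

# ATX-style headings only: one to six '#' characters followed by text.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_markdown_sections(text):
    """Split Markdown text into one dict per heading section.

    Simplifications for this sketch: text before the first heading is
    dropped, and '#' lines inside fenced code blocks are treated as
    headings.
    """
    sections = []
    current = None
    for line in text.splitlines():
        match = HEADING_RE.match(line)
        if match:
            if current is not None:
                sections.append(current)
            current = {
                "level": len(match.group(1)),
                "heading": match.group(2),
                "lines": [],
            }
        elif current is not None:
            current["lines"].append(line)
    if current is not None:
        sections.append(current)

    # Join body lines and trim surrounding blank lines.
    return [
        {
            "level": s["level"],
            "heading": s["heading"],
            "body": "\n".join(s["lines"]).strip(),
        }
        for s in sections
    ]
```

Each returned dict corresponds to one candidate seed row. In the recipe itself, the fan-out from one file to many rows happens inside the custom reader.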

VLM Long-Document Understanding

A 9-recipe pipeline for generating visual QA training data from long PDF documents: OCR, page classification, single-page / multi-page / whole-document QA, and frontier-model quality filtering.

Seed Dataset Preparation

Download PDFs, render page images, and prepare seed datasets for the downstream VLM recipes.

Nemotron Parse OCR

Run Nemotron Parse over document pages and save OCR transcripts for text-based QA generation.

Text QA from OCR Transcripts

Generate text-grounded question-answer pairs from OCR transcripts.

Page Classification

Classify pages by visual reasoning potential before running more expensive QA generation.

Visual QA

Generate visual question-answer pairs from classified page images.

Single-Page QA

Generate single-page VLM QA examples from page-level image seeds.

Multi-Page Windowed QA

Generate cross-page QA examples over fixed-size page windows.

Whole-Document QA

Generate document-level QA examples over grouped page images.

Frontier Judge QA Filter

Score and filter generated QA pairs with a stronger independent judge.
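To make the "fixed-size page windows" used by the Multi-Page Windowed QA recipe concrete, the sketch below groups a document's pages into windows. The function name, the `stride` parameter, and the choice to drop a trailing partial window are assumptions for illustration; the recipe may window pages differently.

```python
def page_windows(pages, size, stride=None):
    """Group `pages` into consecutive windows of exactly `size` items.

    `stride` is the step between window starts. It defaults to `size`
    (non-overlapping windows); a smaller stride yields overlapping
    windows. A trailing window shorter than `size` is dropped.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    stride = size if stride is None else stride
    if stride < 1:
        raise ValueError("stride must be at least 1")
    last_start = len(pages) - size
    return [pages[i:i + size] for i in range(0, last_start + 1, stride)]
```

For example, a five-page document with `size=2` yields two windows covering pages 1-2 and 3-4, and the fifth page is left out under the assumptions above.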