Copyright © 2026, NVIDIA Corporation.


Dev Notes

Welcome to NeMo Data Designer Dev Notes — in-depth guides, benchmark write-ups, and insights from the team building NeMo Data Designer.

Apr 16, 2026

Push Datasets to Hugging Face Hub

Call .push_to_hub() and ship a generated dataset straight to a live HF dataset card. Done and dusted.

Nabin Mulepati +1
Apr 14, 2026

Engineering an Enterprise-Grade Text-to-SQL Dataset

A pipeline with conditional sampling, three-stage LLM generation, code validators, and judge scoring — boosting Nemotron Super on BIRD from 26.77 → 41.80.

Dhruv Nathawani +2
Apr 2, 2026

Async All the Way Down

How async dispatch in the engine cuts wall time across deep dependency pipelines — same config, same prompts, 1.3× faster on average.

Andre Manoel +3
Mar 25, 2026

Owning the Model Stack

Adaptive concurrency, throttle keying, retry boundaries — owning the whole model client to discover provider capacity at runtime.

Nabin Mulepati
Mar 24, 2026

Data Designer Got Skills

A CLI and skill workflow that lets agents drive Data Designer end-to-end — leaner context, fewer tool calls, the same output.

Johnny Greco
Mar 12, 2026

Search Agent SFT Data

Multi-turn search agent trajectories for Nemotron Super post-training — Tavily web search, Wikidata KG seeding, BrowseComp-style obfuscation.

Dhruv Nathawani
Feb 18, 2026

Structured Outputs from Nemotron

Schema-constrained outputs across CSV / JSON / TOML / XML / YAML — JSONSchemaBench and StructEval-Text results, plus the recipe.

Dhruv Nathawani
Feb 10, 2026

Deep Research Trajectories

MCP tool-use trajectories for training deep research agents — search, open, find, answer over a static BM25 corpus, no web APIs needed.

Eric Tramel
Feb 10, 2026

Designing Data Designer

Why synthetic data generation (SDG) is a systems problem, and the design principles behind a composable orchestration framework: declarative columns, imperative engine.

Kirit Thadaka
Feb 4, 2026

Graduate-Level Science Reasoning (RQA)

A large collection of graduate-level reasoning samples seeded from Common Crawl, which improves Nemotron 3 Nano on MMLU-Pro, MATH-500, and GSM8K.

Dane Corneil +1