
Copyright © 2026, NVIDIA Corporation.

NeMo Data Designer

data_designer.interface.results


Module Contents

Classes

Name | Description
DatasetCreationResults | Results container for a Data Designer dataset creation run.

Functions

Name | Description
_export_jsonl | Write batch_files to output as JSONL, one record per line.
_export_csv | Write batch_files to output as CSV with a single header row.
_export_parquet | Write batch_files to output as a single Parquet file.

Data

ExportFormat
SUPPORTED_EXPORT_FORMATS

API

ExportFormat

SUPPORTED_EXPORT_FORMATS
tuple[str, ...], defaults to get_args(...)
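
The "defaults to get_args(...)" note suggests that SUPPORTED_EXPORT_FORMATS is derived from the ExportFormat alias. The following is a minimal sketch of that pattern, assuming ExportFormat is a typing.Literal over the three format names documented for export(). The real definition and its member order are not shown on this page.

```python
from typing import Literal, get_args

# Assumption: the alias is a Literal of the three documented format names.
ExportFormat = Literal["jsonl", "csv", "parquet"]

# get_args() returns the Literal's members as a tuple, in declaration order.
SUPPORTED_EXPORT_FORMATS: tuple[str, ...] = get_args(ExportFormat)

print(SUPPORTED_EXPORT_FORMATS)
```

Deriving the tuple from the alias keeps the type hint and the runtime check in sync, since a new format only has to be added in one place.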
class data_designer.interface.results.DatasetCreationResults(
    *,
    artifact_storage: data_designer.engine.storage.artifact_storage.ArtifactStorage,
    analysis: data_designer.config.analysis.dataset_profiler.DatasetProfilerResults,
    config_builder: data_designer.config.config_builder.DataDesignerConfigBuilder,
    dataset_metadata: data_designer.config.dataset_metadata.DatasetMetadata,
    task_traces: list[data_designer.engine.dataset_builders.utils.task_model.TaskTrace] | None = None
)

Bases: data_designer.config.utils.visualization.WithRecordSamplerMixin

Results container for a Data Designer dataset creation run.

This class provides access to the generated dataset, profiling analysis, and visualization utilities. It is returned by the DataDesigner.create() method and implements ResultsProtocol of the DataDesigner interface.

Resume scope: methods that read from the artifact directory (load_dataset, count_records, load_analysis, export, push_to_hub) reflect the full dataset on disk, including rows produced by earlier create() calls that the current invocation resumed. Per-run observability — task_traces and any model-usage / telemetry side effects emitted during the call — is scoped to the current invocation only, because the original run’s in-memory state is not persisted across process boundaries.

Initialization:

Creates a new instance with results based on a dataset creation run.

Parameters:

artifact_storage
data_designer.engine.storage.artifact_storage.ArtifactStorage

Storage manager for accessing generated artifacts.

analysis
data_designer.config.analysis.dataset_profiler.DatasetProfilerResults

Profiling results for the generated dataset.

config_builder
data_designer.config.config_builder.DataDesignerConfigBuilder

Configuration builder used to create the dataset.

dataset_metadata
data_designer.config.dataset_metadata.DatasetMetadata

Metadata about the generated dataset (e.g., seed column names).

task_traces
list[data_designer.engine.dataset_builders.utils.task_model.TaskTrace] | None, defaults to None

Optional list of TaskTrace objects from the async scheduler. Resume note: only contains traces for the current invocation; traces from earlier create() calls that this run resumed are not retained.

load_analysis() -> data_designer.config.analysis.dataset_profiler.DatasetProfilerResults

Load the profiling analysis results for the generated dataset.

Returns:

data_designer.config.analysis.dataset_profiler.DatasetProfilerResults

DatasetProfilerResults containing statistical analysis and quality metrics for configured columns in the generated dataset.

load_dataset() -> pandas.DataFrame

Load the generated dataset as a pandas DataFrame.

Returns:

pandas.DataFrame

A pandas DataFrame containing the full generated dataset.

count_records() -> int

Return the total number of records in the generated dataset.

Counts rows by reading Parquet file metadata only — no data pages are loaded, so memory usage is constant regardless of dataset size.

Returns:

int

Total row count across all batch parquet files.
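
Because count_records() reads only metadata, it can serve as a cheap guard before deciding whether to load the dataset into memory. The sketch below shows one way to do that. The helper name load_or_export, the ResultsLike protocol, and the max_rows threshold are hypothetical and not part of the library; only the three methods called on results come from this page.

```python
from typing import Any, Protocol


class ResultsLike(Protocol):
    """The subset of DatasetCreationResults methods this sketch relies on."""

    def count_records(self) -> int: ...
    def load_dataset(self) -> Any: ...
    def export(self, path: str) -> Any: ...


def load_or_export(results: ResultsLike, path: str, max_rows: int = 1_000_000) -> Any:
    """Load small datasets into memory; stream large ones to a file instead.

    `max_rows` is an arbitrary illustrative threshold, not a library setting.
    """
    # count_records() is documented as metadata-only, so this check is cheap.
    if results.count_records() <= max_rows:
        return results.load_dataset()
    return results.export(path)
```

The return value is either the loaded dataset or the path returned by export(), so callers need to handle both cases.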

load_processor_dataset(processor_name: str) -> pandas.DataFrame

Load the dataset generated by a processor.

This only works for processors that write their artifacts in Parquet format.

Parameters:

processor_name
str

The name of the processor to load the dataset from.

Returns:

pandas.DataFrame

A pandas DataFrame containing the dataset generated by the processor.

get_path_to_processor_artifacts(processor_name: str) -> pathlib.Path

Get the path to the artifacts generated by a processor.

Parameters:

processor_name
str

The name of the processor whose artifacts path is returned.

Returns:

pathlib.Path

The path to the artifacts.

export(
    path: pathlib.Path | str,
    *,
    format: data_designer.interface.results.ExportFormat | None = None
) -> pathlib.Path

Export the generated dataset to a single file by streaming batch files.

The output format is inferred from the file extension when format is omitted. Pass format explicitly to override the extension (e.g. write a .txt file as JSONL).

Unlike load_dataset, this method never materialises the full dataset in memory. It reads batch parquet files one at a time and appends each to the output file, keeping peak memory proportional to a single batch.

Parameters:

path
pathlib.Path | str

Output file path. The exact path is used as-is; the extension is not rewritten.

format
data_designer.interface.results.ExportFormat | None, defaults to None

Output format. One of 'jsonl', 'csv', or 'parquet'. When omitted, the format is inferred from the file extension.

Returns:

pathlib.Path

Path to the written file.

Raises:

InvalidFileFormatError

If the format cannot be determined or is not one of the supported values.

ArtifactStorageError

If no batch parquet files are found.

Example:

>>> results = data_designer.create(config, num_records=1000)
>>> results.export("output.jsonl")
PosixPath('output.jsonl')
>>> results.export("output.csv")
PosixPath('output.csv')
>>> results.export("output.txt", format="jsonl")
PosixPath('output.txt')
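
The format resolution described for export() (explicit format first, file extension otherwise) can be sketched as follows. This illustrates the documented behaviour and is not the library's code: resolve_format is a hypothetical name, ValueError stands in for InvalidFileFormatError, and how the library treats upper-case extensions is not stated on this page.

```python
from pathlib import Path
from typing import Optional

SUPPORTED = ("jsonl", "csv", "parquet")  # the three formats documented above


def resolve_format(path: str, format: Optional[str] = None) -> str:
    """Return the explicit format if given, else infer it from the extension."""
    # An explicit format wins over whatever the extension says.
    chosen = format if format is not None else Path(path).suffix.lstrip(".")
    if chosen not in SUPPORTED:
        # Stand-in for the library's InvalidFileFormatError.
        raise ValueError(f"unsupported or undeterminable format: {chosen!r}")
    return chosen
```

With this rule, "output.txt" resolves only when format is passed explicitly, which matches the third call in the example above.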
push_to_hub(
    repo_id: str,
    description: str,
    *,
    token: str | None = None,
    private: bool = False,
    tags: list[str] | None = None
) -> str

Push the dataset to the Hugging Face Hub.

Uploads all artifacts including:

  • Main parquet batch files (data subset)
  • Processor output batch files ({processor_name} subsets)
  • Configuration (builder_config.json)
  • Metadata (metadata.json)
  • Auto-generated dataset card (README.md)

Parameters:

repo_id
str

Hugging Face repo ID (e.g., "username/my-dataset").

description
str

Custom description text for the dataset card. Appears after the title.

token
str | None, defaults to None

Hugging Face API token. If None, the token is resolved automatically from the HF_TOKEN environment variable or from the cached credentials created by hf auth login.

private
bool, defaults to False

Whether to create the repository as private.

tags
list[str] | None, defaults to None

Additional custom tags for the dataset.

Returns:

str

URL of the uploaded dataset.

Example:

>>> results = data_designer.create(config, num_records=1000)
>>> description = "This dataset contains synthetic conversations for training chatbots."
>>> results.push_to_hub("username/my-synthetic-dataset", description, tags=["chatbot", "conversation"])
'https://huggingface.co/datasets/username/my-synthetic-dataset'
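
The token fallback described for the token parameter can be pictured roughly as below. This is only a sketch of the first two steps of the documented order. resolve_token is a hypothetical helper, and the final fallback to credentials cached by hf auth login is left out rather than modelled.

```python
import os
from typing import Optional


def resolve_token(token: Optional[str] = None) -> Optional[str]:
    """Prefer an explicit token, then fall back to the HF_TOKEN variable."""
    if token is not None:
        return token
    # A None result here would leave the cached `hf auth login` credentials
    # as the remaining source; that lookup is not modelled in this sketch.
    return os.environ.get("HF_TOKEN")
```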
data_designer.interface.results._export_jsonl(
    batch_files: list[pathlib.Path],
    output: pathlib.Path
) -> None

Write batch_files to output as JSONL, one record per line.

Each batch is appended in turn so peak memory stays proportional to one batch.
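
The batch-by-batch idea can be illustrated with the standard library alone. In this sketch each batch is an in-memory list of dicts rather than a Parquet file, which is a simplifying assumption, and append_batches_as_jsonl is a hypothetical name rather than the library function.

```python
import json
from pathlib import Path
from typing import Iterable


def append_batches_as_jsonl(batches: Iterable[list[dict]], output: Path) -> None:
    """Write each batch to `output` in turn, one JSON object per line."""
    with output.open("w", encoding="utf-8") as fh:
        for batch in batches:  # only the current batch is held in memory
            for record in batch:
                fh.write(json.dumps(record) + "\n")
```

Passing a generator as batches keeps memory bounded by the largest single batch, because earlier batches can be released once they are written.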

data_designer.interface.results._export_csv(
    batch_files: list[pathlib.Path],
    output: pathlib.Path
) -> None

Write batch_files to output as CSV with a single header row.
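
The "single header row" requirement means the header is written for the first batch only and skipped for every later one. Below is a minimal standard-library sketch of that behaviour. It again uses in-memory lists of dicts instead of Parquet batches (an assumption), and write_batches_as_csv is a hypothetical name.

```python
import csv
from pathlib import Path
from typing import Iterable


def write_batches_as_csv(batches: Iterable[list[dict]], output: Path) -> None:
    """Stream batches into one CSV file, emitting the header only once."""
    writer = None
    with output.open("w", newline="", encoding="utf-8") as fh:
        for batch in batches:
            for record in batch:
                if writer is None:
                    # Column names come from the first record; written once.
                    writer = csv.DictWriter(fh, fieldnames=list(record))
                    writer.writeheader()
                writer.writerow(record)
```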

data_designer.interface.results._export_parquet(
    batch_files: list[pathlib.Path],
    output: pathlib.Path
) -> None

Write batch_files to output as a single Parquet file.

Schemas are unified across batches before writing so that columns with minor type drift (e.g. int64 vs float64 across batches) are cast to a consistent schema rather than causing a write error.

Raises:

InvalidFileFormatError

If batch schemas have incompatible column names or types that cannot be unified or cast.
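
As a toy model of schema unification, the sketch below merges per-batch column types and promotes int64 to float64 when batches disagree. It is not the library's implementation and does not use Parquet: schemas are plain dicts of type names, the promotion table contains only the pair mentioned above, mismatched column names are treated as an error by assumption, and ValueError stands in for InvalidFileFormatError.

```python
# Toy promotion table: which pairs of differing types may be merged, and into what.
PROMOTIONS = {frozenset({"int64", "float64"}): "float64"}


def unify_schemas(schemas: list[dict[str, str]]) -> dict[str, str]:
    """Merge per-batch {column: type} mappings into one consistent schema."""
    unified = dict(schemas[0])
    for schema in schemas[1:]:
        if schema.keys() != unified.keys():
            # Stand-in for the library's InvalidFileFormatError.
            raise ValueError("batches have different column names")
        for name, dtype in schema.items():
            if dtype == unified[name]:
                continue
            promoted = PROMOTIONS.get(frozenset({unified[name], dtype}))
            if promoted is None:
                raise ValueError(f"cannot unify {unified[name]} and {dtype} for {name!r}")
            unified[name] = promoted
    return unified
```

Once a unified schema exists, every batch would be cast to it before being appended, so the output file carries a single type per column.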