
# data\_designer.interface.results

## Module Contents

### Classes

| Name                                                                             | Description                                                 |
| -------------------------------------------------------------------------------- | ----------------------------------------------------------- |
| [`DatasetCreationResults`](#data_designerinterfaceresultsdatasetcreationresults) | Results container for a Data Designer dataset creation run. |

### Functions

| Name                                                               | Description                                                       |
| ------------------------------------------------------------------ | ----------------------------------------------------------------- |
| [`_export_jsonl`](#data_designerinterfaceresults_export_jsonl)     | Write *batch\_files* to *output* as JSONL, one record per line.   |
| [`_export_csv`](#data_designerinterfaceresults_export_csv)         | Write *batch\_files* to *output* as CSV with a single header row. |
| [`_export_parquet`](#data_designerinterfaceresults_export_parquet) | Write *batch\_files* to *output* as a single Parquet file.        |

### Data

[`ExportFormat`](#data_designerinterfaceresultsexportformat)
[`SUPPORTED_EXPORT_FORMATS`](#data_designerinterfaceresultssupported_export_formats)

### API

```python
ExportFormat
```

```python
class data_designer.interface.results.DatasetCreationResults(
    *,
    artifact_storage: data_designer.engine.storage.artifact_storage.ArtifactStorage,
    analysis: data_designer.config.analysis.dataset_profiler.DatasetProfilerResults,
    config_builder: data_designer.config.config_builder.DataDesignerConfigBuilder,
    dataset_metadata: data_designer.config.dataset_metadata.DatasetMetadata,
    task_traces: list[data_designer.engine.dataset_builders.utils.task_model.TaskTrace] | None = None
)
```

**Bases**: `data_designer.config.utils.visualization.WithRecordSamplerMixin`

Results container for a Data Designer dataset creation run.

This class provides access to the generated dataset, profiling analysis, and
visualization utilities. It is returned by `DataDesigner.create()` and
implements the `ResultsProtocol` of the DataDesigner interface.

Resume scope: methods that read from the artifact directory (`load_dataset`,
`count_records`, `load_analysis`, `export`, `push_to_hub`) reflect the
full dataset on disk, including rows produced by earlier `create()` calls
that the current invocation resumed. Per-run observability — `task_traces`
and any model-usage / telemetry side effects emitted during the call — is
scoped to the current invocation only, because the original run's in-memory
state is not persisted across process boundaries.

**Initialization:**

Creates a new instance with results based on a dataset creation run.

**Parameters:**

* `artifact_storage`: Storage manager for accessing generated artifacts.
* `analysis`: Profiling results for the generated dataset.
* `config_builder`: Configuration builder used to create the dataset.
* `dataset_metadata`: Metadata about the generated dataset (e.g., seed column names).
* `task_traces`: Optional list of `TaskTrace` objects from the async scheduler.
  Resume note: only contains traces for the current invocation; traces
  from earlier `create()` calls that this run resumed are not
  retained.

```python
load_analysis() -> data_designer.config.analysis.dataset_profiler.DatasetProfilerResults
```

Load the profiling analysis results for the generated dataset.

**Returns:**

`data_designer.config.analysis.dataset_profiler.DatasetProfilerResults`

DatasetProfilerResults containing statistical analysis and quality metrics
for configured columns in the generated dataset.

```python
load_dataset() -> pandas.DataFrame
```

Load the generated dataset as a pandas DataFrame.

**Returns:**

`pandas.DataFrame`

A pandas DataFrame containing the full generated dataset.

```python
count_records() -> int
```

Return the total number of records in the generated dataset.

Counts rows by reading Parquet file metadata only — no data pages are
loaded, so memory usage is constant regardless of dataset size.

**Returns:**

`int`

Total row count across all batch parquet files.

```python
load_processor_dataset(processor_name: str) -> pandas.DataFrame
```

Load the dataset generated by a processor.

This only works for processors that write their artifacts in Parquet format.

**Parameters:**

* `processor_name`: The name of the processor to load the dataset from.

**Returns:**

`pandas.DataFrame`

A pandas DataFrame containing the dataset generated by the processor.

```python
get_path_to_processor_artifacts(processor_name: str) -> pathlib.Path
```

Get the path to the artifacts generated by a processor.

**Parameters:**

* `processor_name`: The name of the processor whose artifacts to locate.

**Returns:**

`pathlib.Path`

The path to the artifacts.

```python
export(
    path: pathlib.Path | str,
    *,
    format: data_designer.interface.results.ExportFormat | None = None
) -> pathlib.Path
```

Export the generated dataset to a single file by streaming batch files.

The output format is inferred from the file extension when *format* is
omitted.  Pass *format* explicitly to override the extension (e.g. write a
`.txt` file as JSONL).

Unlike `load_dataset`, this method never materialises the full dataset
in memory. It reads batch Parquet files one at a time and appends each to
the output file, keeping peak memory proportional to a single batch.

**Parameters:**

* `path`: Output file path. The exact path is used as-is; the extension is
  not rewritten.
* `format`: Output format. One of `'jsonl'`, `'csv'`, or `'parquet'`.
  When omitted, the format is inferred from the file extension.

**Returns:**

`pathlib.Path`

Path to the written file.

**Raises:**

If the format cannot be determined or is not
one of the supported values.

If no batch parquet files are found.

**Example:**

```python
>>> results = data_designer.create(config, num_records=1000)
>>> results.export("output.jsonl")
PosixPath('output.jsonl')
>>> results.export("output.csv")
PosixPath('output.csv')
>>> results.export("output.txt", format="jsonl")
PosixPath('output.txt')
```
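
The format resolution described for `export` can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper `resolve_export_format`, the `SUPPORTED` tuple, the case-insensitive extension match, and the use of `ValueError` are all assumptions made for the example.

```python
from pathlib import Path
from typing import Optional, Union

# Assumed to mirror SUPPORTED_EXPORT_FORMATS; the real constant's contents may differ.
SUPPORTED = ("jsonl", "csv", "parquet")


def resolve_export_format(path: Union[Path, str], format: Optional[str] = None) -> str:
    """Prefer an explicit format; otherwise fall back to the file extension."""
    if format is not None:
        resolved = format
    else:
        resolved = Path(path).suffix.lstrip(".").lower()
    if resolved not in SUPPORTED:
        # ValueError is an assumption; the documented exception type may differ.
        raise ValueError(f"Cannot determine a supported export format from {path!r}")
    return resolved


print(resolve_export_format("output.jsonl"))                # jsonl
print(resolve_export_format("output.txt", format="jsonl"))  # jsonl
```

Under these assumptions, a path such as `output.txt` with no explicit format raises an error, which matches the documented behaviour when the format cannot be determined.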

```python
push_to_hub(
    repo_id: str,
    description: str,
    *,
    token: str | None = None,
    private: bool = False,
    tags: list[str] | None = None
) -> str
```

Push the dataset to the Hugging Face Hub.

Uploads all artifacts including:

* Main Parquet batch files (`data` subset)
* Processor output batch files (`{processor_name}` subsets)
* Configuration (`builder_config.json`)
* Metadata (`metadata.json`)
* Auto-generated dataset card (`README.md`)

**Parameters:**

* `repo_id`: Hugging Face repo ID (e.g., `"username/my-dataset"`).
* `description`: Custom description text for the dataset card.
  Appears after the title.
* `token`: Hugging Face API token. If `None`, the token is automatically
  resolved from the `HF_TOKEN` environment variable or cached credentials
  from `hf auth login`.
* `private`: Whether to create the repository as private.
* `tags`: Additional custom tags for the dataset.

**Returns:**

`str`

URL to the uploaded dataset

**Example:**

```python
>>> results = data_designer.create(config, num_records=1000)
>>> description = "This dataset contains synthetic conversations for training chatbots."
>>> results.push_to_hub("username/my-synthetic-dataset", description, tags=["chatbot", "conversation"])
'https://huggingface.co/datasets/username/my-synthetic-dataset'
```
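
The token fallback described for the `token` parameter can be pictured with a small sketch. The helper `resolve_hub_token` is hypothetical and covers only the first two steps (explicit argument, then `HF_TOKEN`); returning `None` stands in for deferring to the cached credentials, which the Hugging Face client is assumed to handle itself.

```python
import os
from typing import Mapping, Optional


def resolve_hub_token(
    token: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the explicit token if given, else HF_TOKEN, else None."""
    if token is not None:
        return token
    # None means "let the client fall back to cached login credentials".
    return env.get("HF_TOKEN")


print(resolve_hub_token("explicit", env={"HF_TOKEN": "from-env"}))  # explicit
print(resolve_hub_token(env={"HF_TOKEN": "from-env"}))              # from-env
```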

```python
data_designer.interface.results._export_jsonl(
    batch_files: list[pathlib.Path],
    output: pathlib.Path
) -> None
```

Write *batch\_files* to *output* as JSONL, one record per line.

Each batch is appended in turn so peak memory stays proportional to one batch.
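
A minimal, standard-library sketch of this append-per-batch pattern is shown below. It is not the library's code: batches are represented as lists of dictionaries rather than Parquet files, and the names `append_batches_as_jsonl` and `demo_batches` are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterable, List


def append_batches_as_jsonl(batches: Iterable[List[Dict[str, Any]]], output: Path) -> int:
    """Write each batch to *output* in turn, one JSON object per line."""
    written = 0
    with output.open("w", encoding="utf-8") as handle:
        for batch in batches:  # only the current batch is held in memory
            for record in batch:
                handle.write(json.dumps(record, ensure_ascii=False) + "\n")
                written += 1
    return written


def demo_batches():
    # A generator, so batches are produced lazily, like files read one at a time.
    yield [{"id": 1, "text": "a"}, {"id": 2, "text": "b"}]
    yield [{"id": 3, "text": "c"}]


with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "output.jsonl"
    record_count = append_batches_as_jsonl(demo_batches(), out_path)
    jsonl_lines = out_path.read_text(encoding="utf-8").splitlines()

print(record_count)  # 3
```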

```python
data_designer.interface.results._export_csv(
    batch_files: list[pathlib.Path],
    output: pathlib.Path
) -> None
```

Write *batch\_files* to *output* as CSV with a single header row.
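
The single-header behaviour can be sketched with the standard `csv` module. As an illustration only, batches are again lists of dictionaries rather than Parquet files, the helper name `append_batches_as_csv` is hypothetical, and taking the column order from the first record is an assumption.

```python
import csv
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterable, List


def append_batches_as_csv(batches: Iterable[List[Dict[str, Any]]], output: Path) -> None:
    """Write all batches to one CSV file, emitting the header only once."""
    writer = None
    with output.open("w", newline="", encoding="utf-8") as handle:
        for batch in batches:
            for record in batch:
                if writer is None:
                    # Column order is taken from the first record (an assumption).
                    writer = csv.DictWriter(handle, fieldnames=list(record))
                    writer.writeheader()
                writer.writerow(record)


with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "output.csv"
    append_batches_as_csv([[{"id": 1, "label": "x"}], [{"id": 2, "label": "y"}]], csv_path)
    csv_lines = csv_path.read_text(encoding="utf-8").splitlines()

print(csv_lines)  # ['id,label', '1,x', '2,y']
```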

```python
data_designer.interface.results._export_parquet(
    batch_files: list[pathlib.Path],
    output: pathlib.Path
) -> None
```

Write *batch\_files* to *output* as a single Parquet file.

Schemas are unified across batches before writing so that columns with minor
type drift (e.g. `int64` vs `float64` across batches) are cast to a
consistent schema rather than causing a write error.

**Raises:**

If batch schemas have incompatible column names or
types that cannot be unified or cast.