
# data\_designer.interface.data\_designer

## Module Contents

### Classes

| Name                                                               | Description                                              |
| ------------------------------------------------------------------ | -------------------------------------------------------- |
| [`DataDesigner`](#data_designerinterfacedata_designerdatadesigner) | Main interface for creating datasets with Data Designer. |

### Functions

| Name                                                                                                 | Description                                                    |
| ---------------------------------------------------------------------------------------------------- | -------------------------------------------------------------- |
| [`_initialize_interface_runtime`](#data_designerinterfacedata_designer_initialize_interface_runtime) | Run one-time runtime initialization for the interface package. |

### Data

* [`logger`](#data_designerinterfacedata_designerlogger)
* [`_interface_runtime_initialized`](#data_designerinterfacedata_designer_interface_runtime_initialized)
* [`DEFAULT_SECRET_RESOLVER`](#data_designerinterfacedata_designerdefault_secret_resolver)
* [`DEFAULT_SEED_READERS`](#data_designerinterfacedata_designerdefault_seed_readers)

### API

```python
logger = getLogger(...)
```

```python
_interface_runtime_initialized = False
```

```python
data_designer.interface.data_designer._initialize_interface_runtime() -> None
```

Run one-time runtime initialization for the interface package.

```python
DEFAULT_SECRET_RESOLVER = CompositeResolver(...)
```

```python
DEFAULT_SEED_READERS
```

```python
class data_designer.interface.data_designer.DataDesigner(
    artifact_path: pathlib.Path | str | None = None,
    *,
    model_providers: list[data_designer.config.models.ModelProvider] | None = None,
    secret_resolver: data_designer.engine.secret_resolver.SecretResolver | None = None,
    seed_readers: list[data_designer.engine.resources.seed_reader.SeedReader] | None = None,
    managed_assets_path: pathlib.Path | str | None = None,
    person_reader: data_designer.engine.resources.person_reader.PersonReader | None = None,
    mcp_providers: list[data_designer.config.mcp.MCPProviderT] | None = None
)
```

**Bases**: `data_designer.config.interface.DataDesignerInterface[data_designer.interface.results.DatasetCreationResults]`

Main interface for creating datasets with Data Designer.

This class provides the primary interface for building synthetic datasets using
Data Designer configurations. It manages model providers, artifact storage, and
orchestrates the dataset creation and profiling processes.

**Parameters:**

`artifact_path`: Path where generated artifacts are stored. If not
provided, artifacts are stored in an `artifacts` directory under the
current working directory.

`model_providers`: Optional list of model providers for LLM generation. If `None`,
the default providers are used.

`secret_resolver`: Resolver for secrets and credentials. If `None`,
the default composite resolver is used, which checks environment variables
and plaintext values.

`seed_readers`: Optional list of seed readers. If `None`, the default readers are used.

`managed_assets_path`: Path to the managed assets directory, which holds
the managed datasets and other assets used during dataset generation.
If not provided, the `DATA_DESIGNER_MANAGED_ASSETS_PATH` environment variable is checked.
If that variable is not set, the default managed assets directory defined in
`data_designer.config.utils.constants` is used.

`person_reader`: Optional custom reader for person datasets.
If provided, this reader is used instead of the default local reader.
This lets clients customize how managed datasets are accessed (e.g.,
using custom fsspec clients for S3 or other remote storage).

`mcp_providers`: Optional list of MCP provider configurations that enable tool calling for
LLM generation columns. Supports both `MCPProvider` (remote SSE or Streamable HTTP) and
`LocalStdioMCPProvider` (local subprocess).
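The lookup order described for `managed_assets_path` (explicit argument, then environment variable, then default) can be sketched as follows. This is an illustration, not the library's code: `resolve_managed_assets_path` and `ASSUMED_DEFAULT_MANAGED_ASSETS_PATH` are hypothetical names, and the placeholder default path is invented because the real value lives in `data_designer.config.utils.constants` and is not shown here.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from pathlib import Path

# Hypothetical placeholder. The real default is defined in
# data_designer.config.utils.constants and is not reproduced in this reference.
ASSUMED_DEFAULT_MANAGED_ASSETS_PATH = Path.home() / "managed-assets-placeholder"


def resolve_managed_assets_path(
    managed_assets_path: Path | str | None = None,
    env: Mapping[str, str] | None = None,
) -> Path:
    """Return the explicit path, else the environment variable, else the default."""
    env = os.environ if env is None else env
    # 1. An explicit argument wins.
    if managed_assets_path is not None:
        return Path(managed_assets_path)
    # 2. Otherwise consult the environment variable named in the docs.
    #    Treating an empty value as "not set" is an assumption of this sketch.
    from_env = env.get("DATA_DESIGNER_MANAGED_ASSETS_PATH")
    if from_env:
        return Path(from_env)
    # 3. Otherwise fall back to the default directory.
    return ASSUMED_DEFAULT_MANAGED_ASSETS_PATH
```

Passing `env` explicitly keeps the function easy to test without mutating the process environment.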

```python
info: data_designer.config.utils.info.InterfaceInfo
```

Get information about the Data Designer interface.

**Returns:**

`data_designer.config.utils.info.InterfaceInfo`

InterfaceInfo object with information about the Data Designer interface.

```python
list_mcp_tool_names(
    mcp_provider_name: str,
    *,
    timeout_sec: float = 10.0
) -> list[str]
```

Connect to a configured MCP provider and return the names of its available tools.

**Parameters:**

`mcp_provider_name`: The `name` field of an MCP provider passed to the constructor.

`timeout_sec`: Timeout in seconds for the MCP handshake. Defaults to 10.0.

**Returns:**

`list[str]`

A list of tool name strings exposed by the MCP server.

**Raises:**

If no provider with the given name was configured.

```python
create(
    config_builder: data_designer.config.config_builder.DataDesignerConfigBuilder,
    *,
    num_records: int = DEFAULT_NUM_RECORDS,
    dataset_name: str = 'dataset',
    resume: data_designer.engine.storage.artifact_storage.ResumeMode = ResumeMode.NEVER
) -> data_designer.interface.results.DatasetCreationResults
```

Create a dataset and save the results to the local artifact storage.

This method orchestrates the full dataset creation pipeline including building
the dataset according to the configuration, profiling the generated data, and
storing artifacts.

**Parameters:**

`config_builder`: The `DataDesignerConfigBuilder` containing the dataset
configuration (columns, constraints, seed data, etc.).

`num_records`: Number of records to generate.

`dataset_name`: Name of the dataset. This name is used as the dataset
folder name in the artifact path directory. If a non-empty directory with the
same name already exists, the dataset is saved to a new directory whose name
carries a datetime stamp. For example, if the dataset name is "awesome\_dataset" and a
directory with that name already exists, the dataset is saved to a new directory
named "awesome\_dataset\_2025-01-01\_12-00-00".

`resume`: Controls how interrupted runs are handled.

* `ResumeMode.NEVER` (default): always start a fresh generation run.
* `ResumeMode.ALWAYS`: resume from the last completed batch (sync) or row group
  (async). `buffer_size` must match the original run. `num_records` may be
  equal to or greater than the number of records already generated (so you can extend
  the dataset); a `num_records` smaller than that number raises `DatasetGenerationError`.
  If no checkpoint exists yet (the run was interrupted before the first batch finished),
  generation silently restarts from the beginning. Raises if the stored config is incompatible.
* `ResumeMode.IF_POSSIBLE`: behaves like `ALWAYS` when the current config fingerprint
  matches the stored config; otherwise starts a fresh run without raising an error.

In all resume modes, in-flight partial results from the interrupted run are
discarded before generation continues.
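The resume rules above can be read as a small decision function. The sketch below is one interpretation under stated assumptions: `ResumeMode` and `DatasetGenerationError` are local stand-ins rather than the library's classes, `plan_run` is a hypothetical helper, the order in which the checks run is assumed, the exception type used for an incompatible config is assumed (the text does not name it), and the `buffer_size` check is left out.

```python
from enum import Enum


class ResumeMode(Enum):
    """Local stand-in; member names follow the documentation above."""
    NEVER = "never"
    ALWAYS = "always"
    IF_POSSIBLE = "if_possible"


class DatasetGenerationError(Exception):
    """Local stand-in for the error named in the documentation above."""


def plan_run(mode, *, has_checkpoint, config_matches, records_done, num_records):
    """Return "fresh" or "resume", or raise where the documentation says it raises."""
    if mode is ResumeMode.NEVER:
        return "fresh"
    if not config_matches:
        # IF_POSSIBLE falls back to a fresh run; ALWAYS raises.
        if mode is ResumeMode.IF_POSSIBLE:
            return "fresh"
        # Assumption: the docs say this raises but do not name the exception type.
        raise DatasetGenerationError("stored config is incompatible")
    if not has_checkpoint:
        # Interrupted before the first batch finished: restart from the beginning.
        return "fresh"
    if num_records < records_done:
        raise DatasetGenerationError(
            f"num_records={num_records} is less than the {records_done} records already generated"
        )
    # num_records >= records_done: continue, possibly extending the dataset.
    return "resume"
```

For example, `plan_run(ResumeMode.ALWAYS, has_checkpoint=True, config_matches=True, records_done=50, num_records=100)` continues the earlier run and extends it to 100 records.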

**Returns:**

`data_designer.interface.results.DatasetCreationResults`

DatasetCreationResults object with methods for loading the generated dataset,
analysis results, and displaying sample records for inspection.

**Raises:**

If an error occurs during dataset generation.

If an error occurs during dataset profiling.
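The `dataset_name` collision rule described for `create` can be illustrated with a short sketch. `resolve_dataset_dir` is a hypothetical helper, not part of the library; the timestamp format is inferred from the "awesome\_dataset\_2025-01-01\_12-00-00" example, and the use of local time is an assumption.

```python
from __future__ import annotations

import tempfile
from datetime import datetime
from pathlib import Path


def resolve_dataset_dir(
    artifact_path: Path, dataset_name: str, now: datetime | None = None
) -> Path:
    """Pick the folder a dataset would be written to under artifact_path."""
    candidate = artifact_path / dataset_name
    # A missing or empty directory can be used as-is.
    if not candidate.is_dir() or not any(candidate.iterdir()):
        return candidate
    # A non-empty directory already exists: append a datetime stamp instead.
    stamp = (now or datetime.now()).strftime("%Y-%m-%d_%H-%M-%S")
    return artifact_path / f"{dataset_name}_{stamp}"


# Usage: simulate an earlier run that already wrote into "awesome_dataset".
root = Path(tempfile.mkdtemp())
(root / "awesome_dataset").mkdir()
(root / "awesome_dataset" / "placeholder.txt").write_text("earlier run")
second_run_dir = resolve_dataset_dir(
    root, "awesome_dataset", now=datetime(2025, 1, 1, 12, 0, 0)
)
```

Here `second_run_dir.name` is `awesome_dataset_2025-01-01_12-00-00`, matching the example in the parameter description.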

```python
preview(
    config_builder: data_designer.config.config_builder.DataDesignerConfigBuilder,
    *,
    num_records: int = DEFAULT_NUM_RECORDS
) -> data_designer.config.preview_results.PreviewResults
```

Generate a preview dataset for fast iteration on your Data Designer configuration.

All preview results are stored in memory. Once you are satisfied with the preview,
use the `create` method to generate data at a larger scale and save results to disk.

**Parameters:**

`config_builder`: The `DataDesignerConfigBuilder` containing the dataset
configuration (columns, constraints, seed data, etc.).

`num_records`: Number of records to generate.

**Returns:**

`data_designer.config.preview_results.PreviewResults`

PreviewResults object with methods for inspecting the results.

**Raises:**

If an error occurs during preview dataset generation.

If the preview is terminated by the early-shutdown gate
with zero records produced. The raised error is a subclass of `DataDesignerGenerationError`.

If an error occurs during preview dataset profiling.
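A typical iteration loop combines the methods documented on this page: `validate`, then `preview`, then `create`. The sketch below uses only those documented method names and keyword arguments. `validate_preview_create` is a hypothetical helper, the record counts are arbitrary, and `RecordingDesigner` is a stand-in that records calls so the example runs without the library or model credentials; in real use you would pass a `DataDesigner` instance and a `DataDesignerConfigBuilder`.

```python
def validate_preview_create(designer, config_builder, *, preview_records=5,
                            num_records=100, dataset_name="dataset"):
    """Validate the config, generate an in-memory preview, then run the full job."""
    # Raises if the configuration is invalid, before any generation starts.
    designer.validate(config_builder)
    # Small in-memory sample; a real workflow would inspect it before continuing.
    preview = designer.preview(config_builder, num_records=preview_records)
    # Full run, written to artifact storage under dataset_name.
    results = designer.create(
        config_builder, num_records=num_records, dataset_name=dataset_name
    )
    return preview, results


class RecordingDesigner:
    """Stand-in for DataDesigner that only records which methods were called."""

    def __init__(self):
        self.calls = []

    def validate(self, config_builder):
        self.calls.append("validate")

    def preview(self, config_builder, *, num_records):
        self.calls.append(("preview", num_records))
        return "preview-results"

    def create(self, config_builder, *, num_records, dataset_name):
        self.calls.append(("create", num_records, dataset_name))
        return "creation-results"


designer = RecordingDesigner()
preview, results = validate_preview_create(designer, config_builder=object())
```

Validating first surfaces configuration errors early, and the small preview keeps iteration in memory before committing to a run that writes to disk.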

```python
_log_jinja_rendering_engine_mode() -> None
```

```python
validate(config_builder: data_designer.config.config_builder.DataDesignerConfigBuilder) -> None
```

Validate the Data Designer configuration defined by the `DataDesignerConfigBuilder`
against the configured engine components (SecretResolver, SeedReaders, etc.).

**Parameters:**

`config_builder`: The `DataDesignerConfigBuilder` containing the dataset
configuration (columns, constraints, seed data, etc.).

**Returns:**

`None`

None if the configuration is valid.

**Raises:**

If the configuration is invalid.

```python
get_default_model_configs() -> list[data_designer.config.models.ModelConfig]
```

Get the default model configurations.

**Returns:**

`list[data_designer.config.models.ModelConfig]`

List of default model configurations.

```python
get_default_model_providers() -> list[data_designer.config.models.ModelProvider]
```

Get the default model providers.

**Returns:**

`list[data_designer.config.models.ModelProvider]`

List of default model providers.

```python
secret_resolver: data_designer.engine.secret_resolver.SecretResolver
```

Get the secret resolver used by this DataDesigner instance.

**Returns:**

`data_designer.engine.secret_resolver.SecretResolver`

The SecretResolver instance handling credentials and secrets.

```python
model_provider_registry: data_designer.engine.model_provider.ModelProviderRegistry
```

Get the resolved model provider registry.

**Returns:**

`data_designer.engine.model_provider.ModelProviderRegistry`

The ModelProviderRegistry containing the providers and default
resolved at construction time. The default is taken from the
first user-supplied provider when `model_providers` was passed
to the constructor; otherwise from the YAML's `default:` key
when set, falling back to the first provider in the YAML list.
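The precedence for choosing the default provider can be sketched as below. This is a simplified illustration, not the registry's implementation: `resolve_default_provider` is a hypothetical helper, providers are represented by their names only, and treating an empty user-supplied list the same as `None` is an assumption.

```python
from __future__ import annotations


def resolve_default_provider(
    user_providers: list[str] | None,
    yaml_providers: list[str],
    yaml_default: str | None = None,
) -> str:
    """Return the name of the provider that would become the default."""
    # 1. User-supplied providers take precedence; the first one is the default.
    if user_providers:
        return user_providers[0]
    # 2. Otherwise use the YAML `default:` key when it is set.
    if yaml_default is not None:
        return yaml_default
    # 3. Otherwise fall back to the first provider in the YAML list.
    return yaml_providers[0]
```

For example, `resolve_default_provider(None, ["a", "b"], yaml_default="b")` returns `"b"`, while omitting `yaml_default` returns `"a"`.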

```python
run_config: data_designer.config.run_config.RunConfig
```

Get the runtime configuration applied to dataset generation.

**Returns:**

`data_designer.config.run_config.RunConfig`

The active RunConfig instance. Note that `RunConfig` normalizes
some fields on construction (e.g., `shutdown_error_rate` becomes
`1.0` when `disable_early_shutdown=True`), so the returned
object may not exactly equal the one originally passed to
`set_run_config`.
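The normalization mentioned above can be mimicked with a small stand-in. `SketchRunConfig` is hypothetical and is not the library's `RunConfig`: only the two field names come from the text, the `0.5` default is invented for illustration, and using a dataclass is an assumption about structure.

```python
from dataclasses import dataclass


@dataclass
class SketchRunConfig:
    """Hypothetical stand-in showing normalization on construction."""

    disable_early_shutdown: bool = False
    shutdown_error_rate: float = 0.5  # assumed default, not taken from the library

    def __post_init__(self) -> None:
        # With early shutdown disabled, the error-rate threshold is forced to 1.0.
        if self.disable_early_shutdown:
            self.shutdown_error_rate = 1.0


# The value passed in is not the value that comes back out.
requested = SketchRunConfig(disable_early_shutdown=True, shutdown_error_rate=0.2)
```

After construction, `requested.shutdown_error_rate` is `1.0` rather than the `0.2` that was passed, which is why a config read back from the interface may differ from the one supplied.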

```python
set_run_config(run_config: data_designer.config.run_config.RunConfig) -> None
```

Set the runtime configuration for dataset generation.

**Parameters:**

`run_config`: A `RunConfig` instance containing runtime settings such as
early shutdown behavior, batch sizing via `buffer_size`, and non-inference worker
concurrency via `non_inference_max_parallel_workers`.

**Notes:**

When `disable_early_shutdown=True`, DataDesigner will never terminate generation early
due to error-rate thresholds. Errors are still tracked for reporting.

```python
get_models(model_aliases: list[str]) -> dict[str, data_designer.engine.models.facade.ModelFacade]
```

Get a dict of ModelFacade instances for custom column development.

Use this to experiment with custom column generator functions outside of
the full pipeline. The returned dict matches the `models` argument passed
to 3-arg custom column functions.

**Parameters:**

`model_aliases`: List of model aliases to include in the dict.

**Returns:**

`dict[str, data_designer.engine.models.facade.ModelFacade]`

Dict mapping alias to ModelFacade instance.

```python
_resolve_model_providers(model_providers: list[data_designer.config.models.ModelProvider] | None) -> list[data_designer.config.models.ModelProvider]
```

```python
_create_dataset_builder(
    data_designer_config: data_designer.config.data_designer_config.DataDesignerConfig,
    resource_provider: data_designer.engine.resources.resource_provider.ResourceProvider
) -> data_designer.engine.dataset_builders.dataset_builder.DatasetBuilder
```

```python
_create_dataset_profiler(
    config_builder: data_designer.config.config_builder.DataDesignerConfigBuilder,
    resource_provider: data_designer.engine.resources.resource_provider.ResourceProvider
) -> data_designer.engine.analysis.dataset_profiler.DataDesignerDatasetProfiler
```

```python
_create_resource_provider(
    dataset_name: str,
    config_builder: data_designer.config.config_builder.DataDesignerConfigBuilder,
    *,
    resume: data_designer.engine.storage.artifact_storage.ResumeMode = ResumeMode.NEVER
) -> data_designer.engine.resources.resource_provider.ResourceProvider
```

```python
_resolve_client_concurrency_mode(config_builder: data_designer.config.config_builder.DataDesignerConfigBuilder) -> data_designer.engine.models.clients.adapters.http_model_client.ClientConcurrencyMode
```

Pick the model-client mode that matches the engine the run will use.

The async engine is the default, but `allow_resize=True` columns force
a sync-engine fallback (see `DatasetBuilder._resolve_async_compatibility`).
Without aligning the client mode here, those runs would create async-only
clients and then call sync methods on them, raising `SyncClientUnavailableError`
from inside the sync engine. Matching the client mode to the actual engine
choice keeps the fallback path functional.
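The selection rule can be pictured with a minimal sketch. Everything here is a stand-in: the member names of the local `ClientConcurrencyMode` enum are assumed (the real enum's members are not shown in this reference), `pick_client_mode` is a hypothetical helper, and columns are modeled as plain dicts rather than the library's column config objects.

```python
from enum import Enum


class ClientConcurrencyMode(Enum):
    """Local stand-in; the real enum's member names are not shown here."""
    SYNC = "sync"
    ASYNC = "async"


def pick_client_mode(columns: list[dict]) -> ClientConcurrencyMode:
    """Choose the client mode that matches the engine the run would use."""
    # Any column with allow_resize=True forces the sync-engine fallback,
    # so the model clients must support sync calls as well.
    if any(col.get("allow_resize", False) for col in columns):
        return ClientConcurrencyMode.SYNC
    # Otherwise the default async engine is used.
    return ClientConcurrencyMode.ASYNC
```

For example, `pick_client_mode([{"name": "a"}, {"name": "b", "allow_resize": True}])` selects the sync mode because one column requests resizing.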

```python
_get_interface_info(model_providers: list[data_designer.config.models.ModelProvider]) -> data_designer.config.utils.info.InterfaceInfo
```