
Copyright © 2026, NVIDIA Corporation.

NeMo Data Designer

data_designer.interface.data_designer

Module Contents

Classes

Name | Description
DataDesigner | Main interface for creating datasets with Data Designer.

Functions

Name | Description
_initialize_interface_runtime | Run one-time runtime initialization for the interface package.

Data

logger, _interface_runtime_initialized, DEFAULT_SECRET_RESOLVER, DEFAULT_SEED_READERS

API

logger = getLogger(...)

_interface_runtime_initialized = False

data_designer.interface.data_designer._initialize_interface_runtime() -> None

Run one-time runtime initialization for the interface package.

DEFAULT_SECRET_RESOLVER = CompositeResolver(...)

DEFAULT_SEED_READERS

class data_designer.interface.data_designer.DataDesigner(
    artifact_path: pathlib.Path | str | None = None,
    *,
    model_providers: list[data_designer.config.models.ModelProvider] | None = None,
    secret_resolver: data_designer.engine.secret_resolver.SecretResolver | None = None,
    seed_readers: list[data_designer.engine.resources.seed_reader.SeedReader] | None = None,
    managed_assets_path: pathlib.Path | str | None = None,
    person_reader: data_designer.engine.resources.person_reader.PersonReader | None = None,
    mcp_providers: list[data_designer.config.mcp.MCPProviderT] | None = None
)

Bases: data_designer.config.interface.DataDesignerInterface[data_designer.interface.results.DatasetCreationResults]

Main interface for creating datasets with Data Designer.

This class provides the primary interface for building synthetic datasets using Data Designer configurations. It manages model providers, artifact storage, and orchestrates the dataset creation and profiling processes.

Parameters:

artifact_path
pathlib.Path | str | None, defaults to None

Path where generated artifacts will be stored. If not provided, artifacts are stored in an artifacts directory under the current working directory.

model_providers
list[data_designer.config.models.ModelProvider] | None, defaults to None

Optional list of model providers for LLM generation. If None, uses default providers.

secret_resolver
data_designer.engine.secret_resolver.SecretResolver | None, defaults to None

Resolver for handling secrets and credentials. If None, uses the default composite resolver, which checks environment variables and plaintext values.
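
The fallback behaviour described above can be pictured with a small stand-alone sketch. It is not the library's CompositeResolver: the function name resolve_secret and the lookup order (environment variable first, plaintext second) are assumptions made for illustration only.

```python
import os
from typing import Mapping, Optional


def resolve_secret(secret: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: treat `secret` as an env var name, else as plaintext."""
    env = os.environ if env is None else env
    # Assumption: the environment lookup is attempted first.
    if secret in env:
        return env[secret]
    # Assumption: anything that is not a known variable name is used verbatim.
    return secret
```

With this sketch, resolve_secret("MY_API_KEY", {"MY_API_KEY": "abc"}) yields the variable's value, while a string that names no variable is returned unchanged.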

seed_readers
list[data_designer.engine.resources.seed_reader.SeedReader] | None, defaults to None

Optional list of seed readers. If None, uses default readers.

managed_assets_path
pathlib.Path | str | None, defaults to None

Path to the managed assets directory, which holds managed datasets and other assets used during dataset generation. If not provided, Data Designer checks the DATA_DESIGNER_MANAGED_ASSETS_PATH environment variable. If that variable is not set, it uses the default managed assets directory, which is defined in data_designer.config.utils.constants.
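
The lookup order above (explicit argument, then environment variable, then built-in default) can be sketched as follows. This is an illustrative model rather than the library's code: resolve_managed_assets_path and HYPOTHETICAL_DEFAULT_ASSETS_DIR are made-up names, and the placeholder default path is not the real constant.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Union

# Hypothetical placeholder; the real default is defined in
# data_designer.config.utils.constants and is not reproduced here.
HYPOTHETICAL_DEFAULT_ASSETS_DIR = Path("managed-assets-default")


def resolve_managed_assets_path(
    explicit: Optional[Union[Path, str]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Return the first available path: argument, env var, then default."""
    env = os.environ if env is None else env
    if explicit is not None:
        return Path(explicit)
    from_env = env.get("DATA_DESIGNER_MANAGED_ASSETS_PATH")
    # Assumption: an empty variable is treated the same as an unset one.
    if from_env:
        return Path(from_env)
    return HYPOTHETICAL_DEFAULT_ASSETS_DIR
```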

person_reader
data_designer.engine.resources.person_reader.PersonReader | None, defaults to None

Optional custom reader for person datasets. If provided, this reader will be used instead of the default local reader. This allows clients to customize how managed datasets are accessed (e.g., using custom fsspec clients for S3 or other remote storage).

mcp_providers
list[data_designer.config.mcp.MCPProviderT] | None, defaults to None

Optional list of MCP provider configurations to enable tool-calling for LLM generation columns. Supports both MCPProvider (remote SSE or Streamable HTTP) and LocalStdioMCPProvider (local subprocess).

info: data_designer.config.utils.info.InterfaceInfo

Get information about the Data Designer interface.

Returns:

data_designer.config.utils.info.InterfaceInfo

InterfaceInfo object with information about the Data Designer interface.

list_mcp_tool_names(
    mcp_provider_name: str,
    *,
    timeout_sec: float = 10.0
) -> list[str]

Connect to a configured MCP provider and return the names of its available tools.

Parameters:

mcp_provider_name
str

The name field of an MCP provider passed to the constructor.

timeout_sec
float, defaults to 10.0

Timeout in seconds for the MCP handshake. Defaults to 10.

Returns:

list[str]

A list of tool name strings exposed by the MCP server.

Raises:

ValueError

If no provider with the given name was configured.

create(
    config_builder: data_designer.config.config_builder.DataDesignerConfigBuilder,
    *,
    num_records: int = DEFAULT_NUM_RECORDS,
    dataset_name: str = 'dataset',
    resume: data_designer.engine.storage.artifact_storage.ResumeMode = ResumeMode.NEVER
) -> data_designer.interface.results.DatasetCreationResults

Create dataset and save results to the local artifact storage.

This method orchestrates the full dataset creation pipeline including building the dataset according to the configuration, profiling the generated data, and storing artifacts.

Parameters:

config_builder
data_designer.config.config_builder.DataDesignerConfigBuilder

The DataDesignerConfigBuilder containing the dataset configuration (columns, constraints, seed data, etc.).

num_records
int, defaults to DEFAULT_NUM_RECORDS

Number of records to generate.

dataset_name
str, defaults to 'dataset'

Name of the dataset, used as the dataset folder name inside the artifact path directory. If a non-empty directory with the same name already exists, the dataset is saved to a new directory with a datetime stamp appended. For example, if the dataset name is “awesome_dataset” and a directory with that name already exists, the dataset is saved to a new directory named “awesome_dataset_2025-01-01_12-00-00”.
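
The collision rule above can be modelled in a few lines. The sketch below is an approximation, not the library's storage code: resolve_dataset_dir is a hypothetical helper, and the timestamp format is inferred from the example name in the description.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def resolve_dataset_dir(artifact_path: Path, dataset_name: str, now: datetime) -> Path:
    """Pick the folder for a run, adding a timestamp if the name is taken."""
    candidate = artifact_path / dataset_name
    # Only a non-empty existing directory counts as a collision.
    if candidate.is_dir() and any(candidate.iterdir()):
        stamp = now.strftime("%Y-%m-%d_%H-%M-%S")
        return artifact_path / f"{dataset_name}_{stamp}"
    return candidate


# Small demonstration inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    fixed_now = datetime(2025, 1, 1, 12, 0, 0)
    first_name = resolve_dataset_dir(root, "awesome_dataset", fixed_now).name
    (root / "awesome_dataset").mkdir()
    (root / "awesome_dataset" / "data.txt").write_text("rows")
    second_name = resolve_dataset_dir(root, "awesome_dataset", fixed_now).name
```

Passing the clock in as an argument keeps the helper deterministic, which makes the naming rule easy to check.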

resume
data_designer.engine.storage.artifact_storage.ResumeMode, defaults to ResumeMode.NEVER

Controls how interrupted runs are handled.

  • ResumeMode.NEVER (default): always start a fresh generation run.
  • ResumeMode.ALWAYS: resume from the last completed batch (sync) or row group (async). buffer_size must match the original run. num_records may equal or exceed the number of records already generated, so you can extend the dataset; a num_records smaller than the number of records generated so far raises DatasetGenerationError. If no checkpoint exists yet (the run was interrupted before the first batch finished), generation silently restarts from the beginning. Raises an error if the stored config is incompatible.
  • ResumeMode.IF_POSSIBLE: like ALWAYS when the current config fingerprint matches the stored config; otherwise starts a fresh run without raising an error.

In all resume modes, in-flight partial results from the interrupted run are discarded before generation continues.
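
The three modes can be summarised as a small decision function. The sketch below only models the rules stated above; it is not the engine's logic. All names ending in Sketch, the plan_run helper, the order of the checks, and the exception used for an incompatible config are assumptions, and the buffer_size check is left out.

```python
from enum import Enum


class ResumeModeSketch(Enum):
    """Local stand-in for ResumeMode, defined so the snippet runs on its own."""
    NEVER = "never"
    ALWAYS = "always"
    IF_POSSIBLE = "if_possible"


class DatasetGenerationErrorSketch(Exception):
    """Stand-in for the error raised when num_records is too small."""


class IncompatibleConfigSketch(Exception):
    """Hypothetical error type for an incompatible stored config."""


def plan_run(mode, *, checkpoint_exists, config_matches, records_done, num_records):
    """Return "fresh" or "resume" according to the documented rules."""
    if mode is ResumeModeSketch.NEVER:
        return "fresh"
    if not config_matches:
        # IF_POSSIBLE falls back quietly; ALWAYS treats a mismatch as an error.
        if mode is ResumeModeSketch.IF_POSSIBLE:
            return "fresh"
        raise IncompatibleConfigSketch("stored config is incompatible")
    if not checkpoint_exists:
        # Interrupted before the first batch finished: start over silently.
        return "fresh"
    if num_records < records_done:
        raise DatasetGenerationErrorSketch("num_records is below records already generated")
    return "resume"
```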

Returns:

data_designer.interface.results.DatasetCreationResults

DatasetCreationResults object with methods for loading the generated dataset, analysis results, and displaying sample records for inspection.

Raises:

DataDesignerGenerationError

If an error occurs during dataset generation.

DataDesignerProfilingError

If an error occurs during dataset profiling.

preview(
    config_builder: data_designer.config.config_builder.DataDesignerConfigBuilder,
    *,
    num_records: int = DEFAULT_NUM_RECORDS
) -> data_designer.config.preview_results.PreviewResults

Generate preview dataset for fast iteration on your Data Designer configuration.

All preview results are stored in memory. Once you are satisfied with the preview, use the create method to generate data at a larger scale and save results to disk.

Parameters:

config_builder
data_designer.config.config_builder.DataDesignerConfigBuilder

The DataDesignerConfigBuilder containing the dataset configuration (columns, constraints, seed data, etc.).

num_records
int, defaults to DEFAULT_NUM_RECORDS

Number of records to generate.

Returns:

data_designer.config.preview_results.PreviewResults

PreviewResults object with methods for inspecting the results.

Raises:

DataDesignerGenerationError

If an error occurs during preview dataset generation.

DataDesignerEarlyShutdownError

If the preview was terminated by the early-shutdown gate with zero records produced. Subclass of DataDesignerGenerationError.

DataDesignerProfilingError

If an error occurs during preview dataset profiling.
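
Because DataDesignerEarlyShutdownError is a subclass of DataDesignerGenerationError, the order of except clauses matters when the two should be handled differently. The sketch below uses local stand-in classes with the same names so that it runs without the library installed; the base class chosen for DataDesignerProfilingError and the classify helper are assumptions.

```python
# Local stand-ins that mirror only the documented subclass relationship.
class DataDesignerGenerationError(Exception):
    pass


class DataDesignerEarlyShutdownError(DataDesignerGenerationError):
    pass


class DataDesignerProfilingError(Exception):  # base class is an assumption
    pass


def classify(exc: Exception) -> str:
    """Hypothetical helper: report which handler would catch `exc`."""
    try:
        raise exc
    except DataDesignerEarlyShutdownError:
        # Must come before the parent class, or it would never be reached.
        return "early-shutdown"
    except DataDesignerGenerationError:
        return "generation"
    except DataDesignerProfilingError:
        return "profiling"
```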

_log_jinja_rendering_engine_mode() -> None

validate(config_builder: data_designer.config.config_builder.DataDesignerConfigBuilder) -> None

Validate the Data Designer configuration as defined by the DataDesignerConfigBuilder with the configured engine components (SecretResolver, SeedReaders, etc.).

Parameters:

config_builder
data_designer.config.config_builder.DataDesignerConfigBuilder

The DataDesignerConfigBuilder containing the dataset configuration (columns, constraints, seed data, etc.).

Returns:

None

None if the configuration is valid.

Raises:

InvalidConfigError

If the configuration is invalid.

get_default_model_configs() -> list[data_designer.config.models.ModelConfig]

Get the default model configurations.

Returns:

list[data_designer.config.models.ModelConfig]

List of default model configurations.

get_default_model_providers() -> list[data_designer.config.models.ModelProvider]

Get the default model providers.

Returns:

list[data_designer.config.models.ModelProvider]

List of default model providers.

secret_resolver: data_designer.engine.secret_resolver.SecretResolver

Get the secret resolver used by this DataDesigner instance.

Returns:

data_designer.engine.secret_resolver.SecretResolver

The SecretResolver instance handling credentials and secrets.

model_provider_registry: data_designer.engine.model_provider.ModelProviderRegistry

Get the resolved model provider registry.

Returns:

data_designer.engine.model_provider.ModelProviderRegistry

The ModelProviderRegistry containing the providers and default resolved at construction time. The default is taken from the first user-supplied provider when model_providers was passed to the constructor; otherwise from the YAML’s default: key when set, falling back to the first provider in the YAML list.
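
The precedence for the default provider can be written out as a short function. This is a sketch of the rule as described, not the registry's implementation: providers are reduced to plain name strings, and pick_default_provider and its parameters are hypothetical.

```python
from typing import List, Optional


def pick_default_provider(
    user_providers: Optional[List[str]],
    yaml_providers: List[str],
    yaml_default: Optional[str] = None,
) -> str:
    """Return the default provider name following the documented precedence."""
    # Assumption: an empty user list is treated like "not passed".
    if user_providers:
        return user_providers[0]
    if yaml_default is not None:
        return yaml_default
    # Assumption: the YAML list contains at least one provider.
    return yaml_providers[0]
```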

run_config: data_designer.config.run_config.RunConfig

Get the runtime configuration applied to dataset generation.

Returns:

data_designer.config.run_config.RunConfig

The active RunConfig instance. Note that RunConfig normalizes some fields on construction (e.g., shutdown_error_rate becomes 1.0 when disable_early_shutdown=True), so the returned object may not exactly equal the one originally passed to set_run_config.
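
The normalization mentioned above can be illustrated with a toy dataclass. RunConfigSketch is a stand-in written for this example, not the real RunConfig; it has only the two fields named in the text, and its default error rate of 0.5 is an invented value.

```python
from dataclasses import dataclass


@dataclass
class RunConfigSketch:
    """Stand-in with only the two fields mentioned in the description."""
    disable_early_shutdown: bool = False
    shutdown_error_rate: float = 0.5  # hypothetical default

    def __post_init__(self) -> None:
        # Mirrors the documented rule: disabling early shutdown pins the rate to 1.0.
        if self.disable_early_shutdown:
            self.shutdown_error_rate = 1.0


requested = {"disable_early_shutdown": True, "shutdown_error_rate": 0.2}
cfg = RunConfigSketch(**requested)
# cfg.shutdown_error_rate is now 1.0, not the 0.2 that was requested.
```

This is why comparing the object read back from run_config against the values originally supplied can show a difference even though nothing went wrong.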

set_run_config(run_config: data_designer.config.run_config.RunConfig) -> None

Set the runtime configuration for dataset generation.

Parameters:

run_config
data_designer.config.run_config.RunConfig

A RunConfig instance containing runtime settings such as early shutdown behavior, batch sizing via buffer_size, and non-inference worker concurrency via non_inference_max_parallel_workers.

Notes:

When disable_early_shutdown=True, DataDesigner will never terminate generation early due to error-rate thresholds. Errors are still tracked for reporting.

get_models(model_aliases: list[str]) -> dict[str, data_designer.engine.models.facade.ModelFacade]

Get a dict of ModelFacade instances for custom column development.

Use this to experiment with custom column generator functions outside of the full pipeline. The returned dict matches the models argument passed to 3-arg custom column functions.

Parameters:

model_aliases
list[str]

List of model aliases to include in the dict.

Returns:

dict[str, data_designer.engine.models.facade.ModelFacade]

Dict mapping alias to ModelFacade instance.

_resolve_model_providers(model_providers: list[data_designer.config.models.ModelProvider] | None) -> list[data_designer.config.models.ModelProvider]

_create_dataset_builder(
    data_designer_config: data_designer.config.data_designer_config.DataDesignerConfig,
    resource_provider: data_designer.engine.resources.resource_provider.ResourceProvider
) -> data_designer.engine.dataset_builders.dataset_builder.DatasetBuilder

_create_dataset_profiler(
    config_builder: data_designer.config.config_builder.DataDesignerConfigBuilder,
    resource_provider: data_designer.engine.resources.resource_provider.ResourceProvider
) -> data_designer.engine.analysis.dataset_profiler.DataDesignerDatasetProfiler

_create_resource_provider(
    dataset_name: str,
    config_builder: data_designer.config.config_builder.DataDesignerConfigBuilder,
    *,
    resume: data_designer.engine.storage.artifact_storage.ResumeMode = ResumeMode.NEVER
) -> data_designer.engine.resources.resource_provider.ResourceProvider

_resolve_client_concurrency_mode(config_builder: data_designer.config.config_builder.DataDesignerConfigBuilder) -> data_designer.engine.models.clients.adapters.http_model_client.ClientConcurrencyMode

Pick the model-client mode that matches the engine the run will use.

The async engine is the default, but allow_resize=True columns force a sync-engine fallback (see DatasetBuilder._resolve_async_compatibility). Without aligning the client mode here, those runs would create async-only clients and then call sync methods on them, raising SyncClientUnavailableError from inside the sync engine. Matching the client mode to the actual engine choice keeps the fallback path functional.
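
The selection rule can be pictured as follows. The sketch is a simplification under stated assumptions: ClientModeSketch and its member names are invented stand-ins for ClientConcurrencyMode, columns are modelled as plain dicts, and allow_resize is the only condition considered.

```python
from enum import Enum
from typing import Any, Dict, List


class ClientModeSketch(Enum):
    """Invented stand-in; the real enum's member names are not shown here."""
    SYNC = "sync"
    ASYNC = "async"


def resolve_client_mode(columns: List[Dict[str, Any]]) -> ClientModeSketch:
    """Choose sync clients whenever any column forces the sync engine."""
    if any(col.get("allow_resize", False) for col in columns):
        return ClientModeSketch.SYNC
    # Async is the default engine, so async clients are the default too.
    return ClientModeSketch.ASYNC
```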

_get_interface_info(model_providers: list[data_designer.config.models.ModelProvider]) -> data_designer.config.utils.info.InterfaceInfo