Copyright © 2026, NVIDIA Corporation.


data_designer.config.base


Module Contents

Classes

Name | Description
--- | ---
ConfigBase | Base class for all configuration models; extends pydantic.BaseModel.
SkipConfig | Expression gate for conditional column generation.
SingleColumnConfig | Abstract base class for all single-column configuration types.
ProcessorConfig | Abstract base class for all processor configuration types.

Data

_VALIDATION_ENV

API

_VALIDATION_ENV = ImmutableSandboxedEnvironment(...)

class data_designer.config.base.ConfigBase(
    /,
    **data: typing.Any
)

Bases: pydantic.BaseModel

model_config = ConfigDict(...)

class data_designer.config.base.SkipConfig(
    /,
    **data: typing.Any
)

Bases: data_designer.config.base.ConfigBase

Expression gate for conditional column generation.

Attach to a SingleColumnConfig via skip=SkipConfig(...) to gate generation on a Jinja2 expression. Controls when to skip; propagation of upstream skips is controlled separately by propagate_skip on SingleColumnConfig.

Parameters:

when

Jinja2 expression (including {{ }} delimiters); when truthy, skip generation for this row.

value

Value to write for skipped cells. Defaults to None (becomes NaN/pd.NA in the DataFrame).

Attributes:

when

Jinja2 expression (including {{ }} delimiters); when truthy, skip generation for this row.

value

Value to write for skipped cells. Defaults to None (becomes NaN/pd.NA in the DataFrame).

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

when: str = Field(...)

value: bool | int | float | str | None = Field(...)

_validate_when_syntax(v: str) -> str

columns() -> list[str]

Column names referenced in the when expression.

Parsed once from the Jinja2 AST and cached. Used by the DAG builder to add dependency edges and by the execution graph to store metadata.
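The parse-once-and-cache behavior described above can be approximated with Jinja2's public `meta` helpers. This is a minimal sketch, not Data Designer's implementation: the `SkipGate` class is a hypothetical stand-in for `SkipConfig`, it exposes `columns` as a cached property rather than a method, and sorting the names is an assumption made only to keep the output stable. `ImmutableSandboxedEnvironment` and `jinja2.meta.find_undeclared_variables` are real Jinja2 APIs.

```python
from functools import cached_property

from jinja2 import meta
from jinja2.sandbox import ImmutableSandboxedEnvironment

# Assumption: one shared sandboxed environment, in the spirit of _VALIDATION_ENV.
_ENV = ImmutableSandboxedEnvironment()


class SkipGate:
    """Hypothetical stand-in for SkipConfig; not the library class."""

    def __init__(self, when: str) -> None:
        _ENV.parse(when)  # raises jinja2.TemplateSyntaxError on invalid syntax
        self.when = when

    @cached_property
    def columns(self) -> list[str]:
        # Parse the expression into an AST and collect the free variable names.
        # cached_property stores the result, so the parse happens only once.
        ast = _ENV.parse(self.when)
        return sorted(meta.find_undeclared_variables(ast))


gate = SkipGate("{{ age < 18 or country == 'US' }}")
print(gate.columns)  # ['age', 'country']
```

A dependency graph builder could then add an edge from each returned name to the gated column.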

class data_designer.config.base.SingleColumnConfig(
    /,
    **data: typing.Any
)

Bases: data_designer.config.base.ConfigBase, abc.ABC

Abstract base class for all single-column configuration types.

This class serves as the foundation for all column configurations in DataDesigner, defining the fields and properties shared across all column types.

Parameters:

name

Unique name of the column to be generated.

drop

If True, the column will be generated but removed from the final dataset. Useful for intermediate columns that are dependencies for other columns.

allow_resize

If True, the generator may emit a different number of rows than it received (1:N or N:1). Explicit skip gates are invalid on resize columns, and upstream skip propagation is not applied to them.

column_type

Discriminator field that identifies the specific column type. Subclasses must override this field to specify the column type with a Literal value.

skip

Optional expression gate for conditional generation.

propagate_skip

If True (default), this column auto-skips when any of its required_columns was skipped. Independent of skip.

Attributes:

name

Unique name of the column to be generated.

drop

If True, the column will be generated but removed from the final dataset. Useful for intermediate columns that are dependencies for other columns.

allow_resize

If True, the generator may emit a different number of rows than it received (1:N or N:1). Explicit skip gates are invalid on resize columns, and upstream skip propagation is not applied to them.

column_type

Discriminator field that identifies the specific column type. Subclasses must override this field to specify the column type with a Literal value.

skip

Optional expression gate for conditional generation.

propagate_skip

If True (default), this column auto-skips when any of its required_columns was skipped. Independent of skip.
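How `skip`, `propagate_skip`, and `allow_resize` interact can be sketched in plain Python. Everything here is hypothetical: `ColumnSketch`, `has_skip_gate`, and `should_skip` are illustrative names rather than library API, and the gate result is passed in as a boolean instead of being evaluated from a Jinja2 expression.

```python
from dataclasses import dataclass


@dataclass
class ColumnSketch:
    """Hypothetical, simplified stand-in for a SingleColumnConfig subclass."""

    name: str
    required_columns: tuple[str, ...] = ()
    allow_resize: bool = False
    has_skip_gate: bool = False
    propagate_skip: bool = True

    def __post_init__(self) -> None:
        # Explicit skip gates are invalid on resize columns.
        if self.allow_resize and self.has_skip_gate:
            raise ValueError(f"{self.name}: skip cannot be combined with allow_resize")


def should_skip(col: ColumnSketch, gate_is_truthy: bool, skipped_upstream: set[str]) -> bool:
    """Decide whether one row's cell is skipped for this column."""
    if col.allow_resize:
        return False  # upstream skip propagation is not applied to resize columns
    if col.has_skip_gate and gate_is_truthy:
        return True  # the column's own `when` expression fired
    # Independent of the gate: inherit a skip from any required column.
    return col.propagate_skip and any(c in skipped_upstream for c in col.required_columns)


summary = ColumnSketch("summary", required_columns=("review",))
print(should_skip(summary, False, {"review"}))  # True: upstream skip propagates

opt_out = ColumnSketch("summary", required_columns=("review",), propagate_skip=False)
print(should_skip(opt_out, False, {"review"}))  # False: propagation disabled
```

Keeping the two checks separate mirrors the documented design: the gate decides when this column skips itself, while `propagate_skip` decides whether it inherits skips from its dependencies.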

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

name: str

drop: bool = False

allow_resize: bool = False

column_type: str

skip: data_designer.config.base.SkipConfig | None

propagate_skip: bool = Field(...)

_validate_skip_scope() -> typing_extensions.Self

get_column_emoji() -> str

required_columns: list[str]

Decorators: @abstractmethod

Returns a list of column names that must exist before this column can be generated.

Returns:

list[str]

List of column names that this column depends on. Empty list indicates no dependencies. Override in subclasses to specify dependencies.
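To see why `required_columns` matters, the sketch below orders a few columns so that each one is generated only after its dependencies exist. It uses the standard library's `graphlib.TopologicalSorter` as a stand-in; how Data Designer builds its own DAG is not shown in this reference, and the column names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical columns mapped to the names their required_columns would return.
required = {
    "topic": [],
    "question": ["topic"],
    "answer": ["question", "topic"],
}

# TopologicalSorter expects {node: predecessors}, which matches this mapping directly.
order = list(TopologicalSorter(required).static_order())
print(order)  # ['topic', 'question', 'answer']
```

A cycle in the mapping would make `static_order()` raise `graphlib.CycleError`, which is one way circular column dependencies could be detected early.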

side_effect_columns: list[str]

Decorators: @abstractmethod

Returns a list of additional columns that this column will create as a side effect.

Some column types generate additional metadata or auxiliary columns alongside the primary column (e.g., reasoning traces for LLM columns).

Returns:

list[str]

List of column names that this column will create as a side effect. Empty list indicates no side effect columns. Override in subclasses to specify side effects.

get_model_aliases() -> list[str]

Return every model alias this column depends on.

The startup model health check uses this to decide which model endpoints to ping. The default implementation returns the column’s primary model_alias (if the attribute is present), which covers the built-in LLM, embedding, and image columns.

Override this method on configs that depend on more than one model — for example, a plugin config with both a model_alias and a judge_model_alias should return both so a typo or unreachable endpoint on the secondary alias surfaces at startup rather than at first generation.

An empty-string model_alias is forwarded to the health check so that the registry’s “no model config with alias '' found” error is raised eagerly at startup instead of at first generation; only a truly missing attribute is treated as “no model endpoints”.

Returns:

list[str]

List of model aliases this column depends on. Empty list indicates the column does not call any model endpoints.
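The default behavior and the recommended override can be illustrated with plain classes. This is a sketch under stated assumptions, not the library source: `ColumnSketch`, `SamplerSketch`, `LLMSketch`, and `JudgedSketch` are hypothetical names, and the sentinel-based lookup is one plausible way to tell a missing attribute apart from an empty string.

```python
_MISSING = object()  # sentinel: distinguishes "no attribute" from an empty string


class ColumnSketch:
    """Hypothetical stand-in for SingleColumnConfig; not the library class."""

    def get_model_aliases(self) -> list[str]:
        alias = getattr(self, "model_alias", _MISSING)
        # Only a truly missing attribute means "no model endpoints".
        # An empty string is forwarded so a health check can reject it at startup.
        return [] if alias is _MISSING else [alias]


class SamplerSketch(ColumnSketch):
    """Column type with no model_alias attribute at all."""


class LLMSketch(ColumnSketch):
    def __init__(self, model_alias: str) -> None:
        self.model_alias = model_alias


class JudgedSketch(LLMSketch):
    """Plugin-style config that depends on two models."""

    def __init__(self, model_alias: str, judge_model_alias: str) -> None:
        super().__init__(model_alias)
        self.judge_model_alias = judge_model_alias

    def get_model_aliases(self) -> list[str]:
        # Return both aliases so the secondary endpoint is also checked at startup.
        return [self.model_alias, self.judge_model_alias]


print(SamplerSketch().get_model_aliases())            # []
print(LLMSketch("").get_model_aliases())              # ['']
print(JudgedSketch("gen", "judge").get_model_aliases())  # ['gen', 'judge']
```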

class data_designer.config.base.ProcessorConfig(
    /,
    **data: typing.Any
)

Bases: data_designer.config.base.ConfigBase, abc.ABC

Abstract base class for all processor configuration types.

Processors are transformations that run at different stages of the generation pipeline. They can modify, reshape, or augment the dataset.

Parameters:

name

Unique name of the processor, used to identify the processor in results and to name output artifacts on disk.

processor_type

Discriminator field that identifies the specific processor type. Subclasses must override this field with a Literal value.

Attributes:

name

Unique name of the processor, used to identify the processor in results and to name output artifacts on disk.

processor_type

Discriminator field that identifies the specific processor type. Subclasses must override this field with a Literal value.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

name: str = Field(...)

processor_type: str