data_designer.config.base
Bases: pydantic.BaseModel
Bases: data_designer.config.base.ConfigBase
Expression gate for conditional column generation.
Attach to a SingleColumnConfig via skip=SkipConfig(...) to gate
generation on a Jinja2 expression. Controls when to skip; propagation
of upstream skips is controlled separately by propagate_skip on
SingleColumnConfig.
Parameters:
Jinja2 expression (including {{ }} delimiters); when truthy,
skip generation for this row.
Value to write for skipped cells. Defaults to None
(becomes NaN/pd.NA in the DataFrame).
Attributes:
Jinja2 expression (including {{ }} delimiters); when truthy,
skip generation for this row.
Value to write for skipped cells. Defaults to None
(becomes NaN/pd.NA in the DataFrame).
Initialization:
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be
validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
Column names referenced in the when expression.
Parsed once from the Jinja2 AST and cached. Used by the DAG builder to add dependency edges and by the execution graph to store metadata.
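The gating behaviour described above can be sketched without the library. This is a minimal illustration, not DataDesigner's implementation: `SkipGateSketch`, `generate_cell`, and the field names are hypothetical, and a plain Python callable stands in for the Jinja2 `when` expression so the example stays stdlib-only.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

Row = Dict[str, Any]


@dataclass(frozen=True)
class SkipGateSketch:
    """Hypothetical stand-in for SkipConfig; not the library class."""

    # Assumption: a callable replaces the Jinja2 expression string.
    when: Callable[[Row], Any]
    # Value written to skipped cells; None mirrors the documented default.
    value: Any = None


def generate_cell(
    row: Row,
    gate: Optional[SkipGateSketch],
    generate: Callable[[Row], Any],
) -> Any:
    """Return the gate's fill value when it is truthy, else run the generator."""
    if gate is not None and gate.when(row):
        return gate.value
    return generate(row)


gate = SkipGateSketch(when=lambda row: row["age"] < 18, value="n/a")
rows = [{"age": 12}, {"age": 30}]
cells = [generate_cell(r, gate, lambda r: f"adult aged {r['age']}") for r in rows]
```

In this sketch the first row is skipped and receives the fill value, while the second row is passed to the generator.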
Bases: data_designer.config.base.ConfigBase, abc.ABC
Abstract base class for all single-column configuration types.
This class serves as the foundation for all column configurations in DataDesigner, defining the fields and properties shared by all column types.
Parameters:
Unique name of the column to be generated.
If True, the column will be generated but removed from the final dataset. Useful for intermediate columns that are dependencies for other columns.
If True, the generator may emit a different number of rows than
it received (1:N or N:1). Explicit skip gates are invalid on resize
columns, and upstream skip propagation is not applied to them.
Discriminator field that identifies the specific column type.
Subclasses must override this field to specify the column type with a Literal value.
Optional expression gate for conditional generation.
If True (default), this column auto-skips when any of its
required_columns was skipped. Independent of skip.
Attributes:
Unique name of the column to be generated.
If True, the column will be generated but removed from the final dataset. Useful for intermediate columns that are dependencies for other columns.
If True, the generator may emit a different number of rows than
it received (1:N or N:1). Explicit skip gates are invalid on resize
columns, and upstream skip propagation is not applied to them.
Discriminator field that identifies the specific column type.
Subclasses must override this field to specify the column type with a Literal value.
Optional expression gate for conditional generation.
If True (default), this column auto-skips when any of its
required_columns was skipped. Independent of skip.
Initialization:
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be
validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
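The interaction between an explicit skip gate and upstream skip propagation can be pictured roughly as follows. This is a sketch under stated assumptions: the helper names `should_auto_skip` and `should_skip` are hypothetical, and the real engine may track skipped cells differently.

```python
from typing import Iterable, Set


def should_auto_skip(
    required_columns: Iterable[str],
    skipped_upstream: Set[str],
    propagate_skip: bool = True,
) -> bool:
    """True when propagation is on and any required column was skipped for this row."""
    if not propagate_skip:
        return False
    return any(col in skipped_upstream for col in required_columns)


def should_skip(
    explicit_gate_fired: bool,
    required_columns: Iterable[str],
    skipped_upstream: Set[str],
    propagate_skip: bool = True,
) -> bool:
    """The explicit gate and upstream propagation are independent; either one skips the cell."""
    return explicit_gate_fired or should_auto_skip(
        required_columns, skipped_upstream, propagate_skip
    )


# Row where the upstream "bio" column was skipped.
skipped = {"bio"}
auto = should_auto_skip(["bio", "name"], skipped)                       # propagation on
opted_out = should_auto_skip(["bio", "name"], skipped, propagate_skip=False)
forced = should_skip(True, ["name"], skipped, propagate_skip=False)     # explicit gate only
```

Disabling propagation affects only the upstream check; an explicit gate on the same column still applies.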
Decorators: @abstractmethod
Returns a list of column names that must exist before this column can be generated.
Returns:
Any
List of column names that this column depends on. Empty list indicates no dependencies. Override in subclasses to specify dependencies.
Decorators: @abstractmethod
Returns a list of additional columns that this column will create as a side effect.
Some column types generate additional metadata or auxiliary columns alongside the primary column (e.g., reasoning traces for LLM columns).
Returns:
Any
List of column names that this column will create as a side effect. Empty list indicates no side effect columns. Override in subclasses to specify side effects.
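A minimal sketch of how the two abstract members above might be implemented and consumed. All class names here are hypothetical, the members are modeled as abstract properties (an assumption), the side-effect column naming is invented for illustration, and the standard library's `graphlib` stands in for whatever DAG builder the library actually uses.

```python
from abc import ABC, abstractmethod
from graphlib import TopologicalSorter
from typing import List


class ColumnSketch(ABC):
    """Hypothetical stand-in for a single-column config."""

    def __init__(self, name: str) -> None:
        self.name = name

    @property
    @abstractmethod
    def required_columns(self) -> List[str]:
        """Columns that must exist before this one can be generated."""

    @property
    @abstractmethod
    def side_effect_columns(self) -> List[str]:
        """Extra columns produced alongside the primary column."""


class SeedColumnSketch(ColumnSketch):
    @property
    def required_columns(self) -> List[str]:
        return []  # no dependencies

    @property
    def side_effect_columns(self) -> List[str]:
        return []  # no side effects


class LLMTextColumnSketch(ColumnSketch):
    def __init__(self, name: str, depends_on: List[str]) -> None:
        super().__init__(name)
        self._depends_on = list(depends_on)

    @property
    def required_columns(self) -> List[str]:
        return list(self._depends_on)

    @property
    def side_effect_columns(self) -> List[str]:
        # Hypothetical naming scheme for a reasoning-trace column.
        return [f"{self.name}__reasoning_trace"]


columns = [LLMTextColumnSketch("summary", ["topic"]), SeedColumnSketch("topic")]
# Map each column to its predecessors, then order so dependencies come first.
graph = {col.name: set(col.required_columns) for col in columns}
generation_order = list(TopologicalSorter(graph).static_order())
```

Because both members are abstract, instantiating `ColumnSketch` directly raises `TypeError`, which forces each concrete column type to declare its dependencies and side effects.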
Return every model alias this column depends on.
The startup model health check uses this to decide which model endpoints to ping.
The default implementation returns the column’s primary model_alias (if the
attribute is present), which covers the built-in LLM, embedding, and image columns.
Override this method on configs that depend on more than one model — for example,
a plugin config with both a model_alias and a judge_model_alias should return
both so a typo or unreachable endpoint on the secondary alias surfaces at startup
rather than at first generation.
An empty-string model_alias is forwarded to the health check so that the
registry’s “no model config with alias '' found” error is raised eagerly at startup
instead of at first generation; only a truly missing attribute is treated as “no
model endpoints”.
Returns:
list[str]
List of model aliases this column depends on. Empty list indicates the column does not call any model endpoints.
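The default behaviour and the override described above might look roughly like this. The method name `get_model_aliases` and all class names are assumptions made for illustration; only `model_alias` and `judge_model_alias` come from the text above.

```python
from typing import List


class ColumnSketch:
    """Hypothetical base with the default alias lookup."""

    def get_model_aliases(self) -> List[str]:
        # Only a truly missing attribute means "no model endpoints".
        if not hasattr(self, "model_alias"):
            return []
        # An empty string is forwarded on purpose so the health check
        # can report the bad alias at startup.
        return [self.model_alias]


class SamplerColumnSketch(ColumnSketch):
    """A column that never calls a model; it has no model_alias attribute."""


class LLMColumnSketch(ColumnSketch):
    def __init__(self, model_alias: str) -> None:
        self.model_alias = model_alias


class JudgedColumnSketch(LLMColumnSketch):
    """Plugin-style config that depends on two models."""

    def __init__(self, model_alias: str, judge_model_alias: str) -> None:
        super().__init__(model_alias)
        self.judge_model_alias = judge_model_alias

    def get_model_aliases(self) -> List[str]:
        # Return both aliases so the secondary endpoint is checked too.
        return [self.model_alias, self.judge_model_alias]


no_models = SamplerColumnSketch().get_model_aliases()
empty_alias = LLMColumnSketch("").get_model_aliases()
both = JudgedColumnSketch("writer", "judge").get_model_aliases()
```

Without the override, `JudgedColumnSketch` would report only its primary alias, and a typo in the judge alias would go unnoticed until the first generation call.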
Bases: data_designer.config.base.ConfigBase, abc.ABC
Abstract base class for all processor configuration types.
Processors are transformations that run at different stages of the generation pipeline. They can modify, reshape, or augment the dataset.
Parameters:
Unique name of the processor, used to identify the processor in results and to name output artifacts on disk.
Discriminator field that identifies the specific processor type.
Subclasses must override this field with a Literal value.
Attributes:
Unique name of the processor, used to identify the processor in results and to name output artifacts on disk.
Discriminator field that identifies the specific processor type.
Subclasses must override this field with a Literal value.
Initialization:
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be
validated to form a valid model.
self is explicitly positional-only to allow self as a field name.