data_designer.config.processors

Module Contents

Classes

Name	Description
`ProcessorType`	Enumeration of available processor types.
`DropColumnsProcessorConfig`	Drop columns from the output dataset (prefer `drop=True` in the column config).
`SchemaTransformProcessorConfig`	Configuration for transforming the dataset schema using Jinja2 templates.

Functions

Name	Description
`get_processor_config_from_kwargs`	Create a processor configuration from a processor type and keyword arguments.

API

1 class data_designer.config.processors.ProcessorType

Bases: str, enum.Enum

Enumeration of available processor types.

Attributes:

DROP_COLUMNS

Processor that removes specified columns from the output dataset.

SCHEMA_TRANSFORM

Processor that creates a new dataset with a transformed schema using Jinja2 templates.

Initialization:

Initialize self. See help(type(self)) for accurate signature.

1 DROP_COLUMNS = drop_columns

1 SCHEMA_TRANSFORM = schema_transform

1 data_designer.config.processors.get_processor_config_from_kwargs(
2     processor_type: data_designer.config.processors.ProcessorType,
3     **kwargs: typing.Any
4 ) -> data_designer.config.base.ProcessorConfig

Create a processor configuration from a processor type and keyword arguments.

Parameters:

processor_type

data_designer.config.processors.ProcessorType

The type of processor to create.

**kwargs

Additional keyword arguments passed to the processor constructor.

Returns:

data_designer.config.base.ProcessorConfig

A processor configuration object of the specified type.

1 class data_designer.config.processors.DropColumnsProcessorConfig(
2     /,
3     **data: typing.Any
4 )

Bases: data_designer.config.base.ProcessorConfig

Drop columns from the output dataset (prefer drop=True in the column config).

This processor removes specified columns from the generated dataset. The dropped columns are saved separately in the dropped-columns-parquet-files directory for reference. When this processor is added via the config builder, the corresponding column configs are automatically marked with drop = True.

Parameters:

column_names

List of column names to remove from the output dataset.

Inherited Attributes: name (required): Name of the processor. Attributes:

column_names

`required`

List of column names to remove from the output dataset.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

1 column_names: list[str] = Field(...)

1 processor_type: typing.Literal[data_designer.config.processors.ProcessorType]

1 class data_designer.config.processors.SchemaTransformProcessorConfig(
2     /,
3     **data: typing.Any
4 )

Bases: data_designer.config.base.ProcessorConfig

Configuration for transforming the dataset schema using Jinja2 templates.

This processor creates a new dataset with a transformed schema. Each key in the template becomes a column in the output, and values are Jinja2 templates that can reference any column in the batch. The transformed dataset is written to a processors-files/{processor_name}/ directory alongside the main dataset.

Parameters:

template

Dictionary defining the output schema. Keys are new column names, values are Jinja2 templates (strings, lists, or nested structures). Must be JSON-serializable.

Inherited Attributes: name (required): Name of the processor. Attributes:

template

`required`

Dictionary defining the output schema. Keys are new column names, values are Jinja2 templates (strings, lists, or nested structures). Must be JSON-serializable.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

1 template: dict[str, typing.Any] = Field(...)

1 processor_type: typing.Literal[data_designer.config.processors.ProcessorType]

1 validate_template(v: dict[str, typing.Any]) -> dict[str, typing.Any]

1	data_designer.config.processors.get_processor_config_from_kwargs(
2	processor_type: data_designer.config.processors.ProcessorType,
3	**kwargs: typing.Any
4	) -> data_designer.config.base.ProcessorConfig

1	class data_designer.config.processors.DropColumnsProcessorConfig(
2	/,
3	**data: typing.Any
4	)

1	class data_designer.config.processors.SchemaTransformProcessorConfig(
2	/,
3	**data: typing.Any
4	)