data_designer.config.processors
data_designer.config.processors
data_designer.config.processors
Bases: str, enum.Enum
Enumeration of available processor types.
Attributes:
Processor that removes specified columns from the output dataset.
Processor that creates a new dataset with a transformed schema using Jinja2 templates.
Initialization:
Initialize self. See help(type(self)) for accurate signature.
Create a processor configuration from a processor type and keyword arguments.
Parameters:
The type of processor to create.
Additional keyword arguments passed to the processor constructor.
Returns:
data_designer.config.base.ProcessorConfig
A processor configuration object of the specified type.
Bases: data_designer.config.base.ProcessorConfig
Drop columns from the output dataset (prefer drop=True in the column config).
This processor removes specified columns from the generated dataset. The dropped
columns are saved separately in the dropped-columns-parquet-files directory for reference.
When this processor is added via the config builder, the corresponding column
configs are automatically marked with drop = True.
Parameters:
List of column names to remove from the output dataset.
Inherited Attributes: name (required): Name of the processor. Attributes:
List of column names to remove from the output dataset.
Initialization:
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be
validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
Bases: data_designer.config.base.ProcessorConfig
Configuration for transforming the dataset schema using Jinja2 templates.
This processor creates a new dataset with a transformed schema. Each key in the
template becomes a column in the output, and values are Jinja2 templates that
can reference any column in the batch. The transformed dataset is written to
a processors-files/{processor_name}/ directory alongside the main dataset.
Parameters:
Dictionary defining the output schema. Keys are new column names, values are Jinja2 templates (strings, lists, or nested structures). Must be JSON-serializable.
Inherited Attributes: name (required): Name of the processor. Attributes:
Dictionary defining the output schema. Keys are new column names, values are Jinja2 templates (strings, lists, or nested structures). Must be JSON-serializable.
Initialization:
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be
validated to form a valid model.
self is explicitly positional-only to allow self as a field name.