Example Plugin: Column Generator
Data Designer supports three plugin types: column generators, seed readers, and processors. This page walks through a complete column generator example. For filesystem-backed seed reader plugins, see FileSystemSeedReader Plugins.
A Data Designer plugin is implemented as a Python package with three main components: a configuration class that defines the parameters users can set, an implementation class that contains the generation logic, and a Plugin object that connects the two and makes the plugin discoverable.
We recommend separating these into individual files (config.py, impl.py, plugin.py) within a plugin subdirectory. This keeps the code organized, makes it easy to test each component independently, and guards against circular dependencies: the config module can be imported without pulling in the engine-level implementation classes, and the plugin object can be discovered without importing either.
In this section, we will build a simple column generator plugin that generates values by multiplying the row index by a user-specified multiplier.
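As a plain-Python illustration of the target behavior, independent of any Data Designer API, the values for a batch are the row indices each multiplied by the multiplier. The function name index_times_multiplier is hypothetical, and 0-based row indexing is an assumption.

```python
def index_times_multiplier(num_rows: int, multiplier: int) -> list:
    """Return one value per row: the 0-based row index times `multiplier`."""
    return [index * multiplier for index in range(num_rows)]


values = index_times_multiplier(num_rows=5, multiplier=3)
print(values)  # [0, 3, 6, 9, 12]
```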
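We recommend the following structure for column generator plugins: a pyproject.toml at the project root, with config.py, impl.py, and plugin.py under src/data_designer_index_multiplier/.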
The configuration class defines what parameters users can set when using your plugin. For column generator plugins, it must inherit from SingleColumnConfig and include a discriminator field.
Create src/data_designer_index_multiplier/config.py:
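A minimal sketch of what such a config class might look like. So that it runs without Data Designer installed, SingleColumnConfig is replaced here by a small dataclass stand-in (the real base class is a Pydantic model and is imported from Data Designer). The discriminator string "index-multiplier", the default multiplier of 2, the name field, and the emoji are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal


@dataclass
class SingleColumnConfig:
    """Stand-in for Data Designer's SingleColumnConfig (really a Pydantic model)."""

    name: str  # assumed: every column config carries the column's name


@dataclass
class IndexMultiplierColumnConfig(SingleColumnConfig):
    # Plugin-specific parameter that users can set.
    multiplier: int = 2
    # Discriminator field: a Literal type with a string default.
    column_type: Literal["index-multiplier"] = "index-multiplier"

    @staticmethod
    def get_column_emoji() -> str:
        # Emoji shown in logs for this column type.
        return "🔢"

    @property
    def required_columns(self) -> list[str]:
        return []  # this generator depends on no other columns

    @property
    def side_effect_columns(self) -> list[str]:
        return []  # only the primary column is produced


config = IndexMultiplierColumnConfig(name="scaled_index", multiplier=3)
print(config.column_type, config.multiplier)  # index-multiplier 3
```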
Key points:
The column_type field must be a Literal type with a string default.
Plugin-specific parameters (here, multiplier) are declared as additional fields.
SingleColumnConfig is a Pydantic model, so you can leverage all of Pydantic’s validation features.
get_column_emoji() returns the emoji displayed in logs for this column type.
required_columns lists any columns this generator depends on (empty if none).
side_effect_columns lists any additional columns this generator produces beyond the primary column (empty if none).
If your plugin can expand or reduce the number of rows (1:N or N:1), set allow_resize=True in the config class so the pipeline updates batch bookkeeping correctly. The default is False; only set it to True when your generate method can return more or fewer rows than it receives.
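To make the 1:N case concrete, here is a small plain-Python sketch of a generate step that returns more rows than it receives. Rows are plain dicts rather than a DataFrame, and ExpandingColumnConfig, copies, and generate_expanded are hypothetical names; only the allow_resize flag itself comes from the text above, and how the real config class declares it may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ExpandingColumnConfig:
    """Hypothetical stand-in; a real config would inherit from SingleColumnConfig."""

    name: str
    copies: int = 2
    # Declares that generate may change the row count (the default is False).
    allow_resize: bool = True


def generate_expanded(rows: list[dict], config: ExpandingColumnConfig) -> list[dict]:
    """1:N expansion: emit `copies` output rows for every input row."""
    return [
        {**row, config.name: copy_index}
        for row in rows
        for copy_index in range(config.copies)
    ]


batch = [{"id": "a"}, {"id": "b"}]
expanded = generate_expanded(batch, ExpandingColumnConfig(name="variant"))
print(len(batch), "->", len(expanded))  # 2 -> 4
```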
The implementation class defines the actual business logic of the plugin. For column generator plugins, inherit from ColumnGeneratorFullColumn or ColumnGeneratorCellByCell and implement the generate method.
Create src/data_designer_index_multiplier/impl.py:
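A rough sketch of the shape of such an implementation. It uses stand-ins so it runs with only the standard library: ColumnGeneratorFullColumn and the config are simplified local classes, and a column-oriented dict takes the place of the pd.DataFrame that the real generate method receives and returns. The constructor signature and the name field are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Generic, TypeVar

ConfigT = TypeVar("ConfigT")


class ColumnGeneratorFullColumn(Generic[ConfigT]):
    """Stand-in for the Data Designer base class; the real constructor may differ."""

    def __init__(self, config: ConfigT) -> None:
        self.config = config


@dataclass
class IndexMultiplierColumnConfig:
    """Stand-in config holding only what this sketch needs."""

    name: str
    multiplier: int = 2


class IndexMultiplierColumnGenerator(ColumnGeneratorFullColumn[IndexMultiplierColumnConfig]):
    def generate(self, data: dict[str, list]) -> dict[str, list]:
        """Return the batch with the generated column appended."""
        # Row count is taken from the first existing column (0 if there are none).
        num_rows = len(next(iter(data.values()), []))
        out = dict(data)  # shallow copy so the input table is not mutated
        out[self.config.name] = [i * self.config.multiplier for i in range(num_rows)]
        return out


generator = IndexMultiplierColumnGenerator(
    IndexMultiplierColumnConfig(name="scaled", multiplier=10)
)
result = generator.generate({"id": ["a", "b", "c"]})
print(result)  # {'id': ['a', 'b', 'c'], 'scaled': [0, 10, 20]}
```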
Key points:
ColumnGeneratorFullColumn[IndexMultiplierColumnConfig] connects the implementation to its config.
The configuration parameters are available inside the generator through self.config.
Understanding generation_strategy
The generation_strategy specifies how the column generator will generate data. You choose a strategy by inheriting from the corresponding base class:
ColumnGeneratorFullColumn: Generates the full column (at the batch level) in a single call to generate. generate must take as input a pd.DataFrame with all previous columns and return a pd.DataFrame with the generated column appended.
ColumnGeneratorCellByCell: Generates one cell at a time. generate must take as input a dict with key/value pairs for all previous columns and return a dict with an additional key/value for the generated cell. Concurrency for this strategy is set through the max_parallel_requests parameter on the configuration.
Create a Plugin object that makes the plugin discoverable and connects the implementation and config classes.
Create src/data_designer_index_multiplier/plugin.py:
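One reason to keep plugin.py separate is that the plugin object can be discovered without importing the config or implementation modules. A common way to get that property is to store dotted names as strings and resolve them only when needed. The sketch below illustrates the idea with a stand-in class; LazyPlugin, its fields, and resolve are hypothetical and are not Data Designer's Plugin API.

```python
from __future__ import annotations

from dataclasses import dataclass
from importlib import import_module


def resolve(qualified_name: str) -> type:
    """Import `package.module.ClassName` only when it is actually needed."""
    module_path, _, attr = qualified_name.rpartition(".")
    return getattr(import_module(module_path), attr)


@dataclass(frozen=True)
class LazyPlugin:
    """Hypothetical stand-in that links a config class to an implementation class."""

    config_qualified_name: str
    impl_qualified_name: str

    def load(self) -> tuple[type, type]:
        # Nothing is imported until load() is called.
        return resolve(self.config_qualified_name), resolve(self.impl_qualified_name)


# Standard-library classes stand in for the plugin's config and impl classes.
plugin = LazyPlugin(
    config_qualified_name="collections.OrderedDict",
    impl_qualified_name="json.JSONDecoder",
)
config_cls, impl_cls = plugin.load()
print(config_cls.__name__, impl_cls.__name__)  # OrderedDict JSONDecoder
```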
Create a pyproject.toml file to define your package and register the entry point:
Entry Point Registration
Plugins are discovered automatically through Python entry points, so your plugin must be registered as an entry point under the data_designer.plugins group.
The entry point format is:
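In standard Python packaging, an entry point maps a name to a value of the form module.path:object_name. The sketch below uses the standard library's importlib.metadata.EntryPoint to show how such a value is split and loaded. The entry-point name index-multiplier and the variable name plugin are assumptions for illustration; the second entry point targets a standard-library class so that the load step can actually run here.

```python
from importlib.metadata import EntryPoint

# Assumed names: the entry-point name and the `plugin` variable are illustrative.
entry_point = EntryPoint(
    name="index-multiplier",
    value="data_designer_index_multiplier.plugin:plugin",
    group="data_designer.plugins",
)

# The value splits into the module to import and the attribute to fetch from it.
print(entry_point.module)  # data_designer_index_multiplier.plugin
print(entry_point.attr)    # plugin

# load() imports the module and returns the named object.
demo = EntryPoint(name="demo", value="collections:OrderedDict", group="data_designer.plugins")
loaded = demo.load()
print(loaded.__name__)  # OrderedDict
```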
Install your plugin in editable mode by running uv pip install -e . from the plugin’s project directory. This is all you need to start using it; no PyPI publishing is required.
That’s it. The editable install registers the entry point so Data Designer discovers your plugin automatically. Any changes you make to the plugin source code are picked up immediately without reinstalling.
Once installed, your plugin works just like built-in column types:
Output:
Data Designer provides a testing utility to validate that your plugin is structured correctly. Use assert_valid_plugin to check that your config and implementation classes are properly defined:
This validates that:
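The config class is a subclass of ConfigBase.
The implementation class is a subclass of ConfigurableTask (or of SeedReader, for seed reader plugins).
A single Python package can register multiple plugins. Define multiple Plugin instances and register each one as a separate entry point: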
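As a sketch of what that registration amounts to, the snippet below models two entry points from one package with the standard library's importlib.metadata.EntryPoint. The package name my_plugins, the second plugin index-offset, and both variable names are hypothetical.

```python
from importlib.metadata import EntryPoint

GROUP = "data_designer.plugins"

# Hypothetical: two Plugin instances defined in the same package's plugin.py.
entry_points = [
    EntryPoint(name="index-multiplier", value="my_plugins.plugin:multiplier_plugin", group=GROUP),
    EntryPoint(name="index-offset", value="my_plugins.plugin:offset_plugin", group=GROUP),
]

# Each entry point has its own name and target object, but they share one module.
targets = {ep.name: ep.attr for ep in entry_points}
modules = {ep.module for ep in entry_points}
print(targets)
# {'index-multiplier': 'multiplier_plugin', 'index-offset': 'offset_plugin'}
print(modules)  # {'my_plugins.plugin'}
```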
For an example of this pattern, see the end-to-end test plugins in the tests_e2e/ directory.
That’s it! You now know how to create a Data Designer plugin. A local editable install (uv pip install -e .) is all you need to develop, test, and use your plugin. If you want to make it available for others to install via pip install, publish it to PyPI or your organization’s package index.