Copyright © 2026, NVIDIA Corporation.
Example Plugin: Column Generator

Experimental in this version: Plugins were experimental in v0.5.8 and v0.5.9. For stable plugin docs, see the v0.6.0 plugin docs.

Data Designer supports three plugin types: column generators, seed readers, and processors. This page walks through a complete column generator example. For filesystem-backed seed reader plugins, see FileSystemSeedReader Plugins.

A Data Designer plugin is implemented as a Python package with three main components:

  1. Configuration Class: Defines the parameters users can configure
  2. Implementation Class: Contains the core logic of the plugin
  3. Plugin Object: Connects the config and implementation classes to make the plugin discoverable

We recommend separating these into individual files (config.py, impl.py, plugin.py) within a plugin subdirectory. This keeps the code organized, makes it easy to test each component independently, and guards against circular dependencies: the config module can be imported without pulling in the engine-level implementation classes, and the plugin object can be discovered without importing either.


Column Generator Plugin: Index Multiplier

In this section, we will build a simple column generator plugin that generates values by multiplying the row index by a user-specified multiplier.

Step 1: Create a Python package

We recommend the following structure for column generator plugins:

data-designer-index-multiplier/
├── pyproject.toml
└── src/
    └── data_designer_index_multiplier/
        ├── __init__.py
        ├── config.py
        ├── impl.py
        └── plugin.py

Step 2: Create the config class

The configuration class defines what parameters users can set when using your plugin. For column generator plugins, it must inherit from SingleColumnConfig and include a discriminator field.

Create src/data_designer_index_multiplier/config.py:

from typing import Literal

from data_designer.config.base import SingleColumnConfig


class IndexMultiplierColumnConfig(SingleColumnConfig):
    """Configuration for the index multiplier column generator."""

    # Required: discriminator field with a unique Literal type
    # This value identifies your plugin and becomes its column_type
    column_type: Literal["index-multiplier"] = "index-multiplier"

    # Configurable parameter for this plugin
    multiplier: int = 2

    @staticmethod
    def get_column_emoji() -> str:
        return "✖️"

    @property
    def required_columns(self) -> list[str]:
        """Columns that must exist before this generator runs."""
        return []

    @property
    def side_effect_columns(self) -> list[str]:
        """Additional columns produced beyond the primary column."""
        return []

Key points:

  • The column_type field must be a Literal type with a string default
  • This value uniquely identifies your plugin (use kebab-case)
  • Add any custom parameters your plugin needs (here: multiplier)
  • SingleColumnConfig is a Pydantic model, so you can leverage all of Pydantic’s validation features
  • get_column_emoji() returns the emoji displayed in logs for this column type
  • required_columns lists any columns this generator depends on (empty if none)
  • side_effect_columns lists any additional columns this generator produces beyond the primary column (empty if none)
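
To illustrate why the discriminator value has to be unique, here is a minimal standard-library sketch of discriminator-based dispatch. It is not Data Designer's implementation (which builds on Pydantic); the dataclass IndexMultiplierConfigSketch, the REGISTRY dict, and the build_config helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Literal, get_args, get_type_hints


# Hypothetical stand-in for a plugin config class (a plain dataclass, not SingleColumnConfig).
@dataclass
class IndexMultiplierConfigSketch:
    name: str
    column_type: Literal["index-multiplier"] = "index-multiplier"
    multiplier: int = 2


def discriminator_value(config_cls: type) -> str:
    """Read the single string allowed by the class's Literal column_type annotation."""
    (value,) = get_args(get_type_hints(config_cls)["column_type"])
    return value


# Map each discriminator value to the class that owns it.
REGISTRY = {discriminator_value(IndexMultiplierConfigSketch): IndexMultiplierConfigSketch}


def build_config(raw: dict):
    """Pick the config class from the raw column_type value, then instantiate it."""
    return REGISTRY[raw["column_type"]](**raw)


config = build_config({"name": "scaled_index", "column_type": "index-multiplier", "multiplier": 5})
print(type(config).__name__, config.multiplier)
```

If two classes declared the same column_type string, the second registration would silently overwrite the first in a lookup table like this one, which is why each plugin needs its own value.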

If your plugin can change the number of rows (1:N expansion or N:1 reduction), set allow_resize=True in the config class so that the pipeline updates its batch bookkeeping correctly. For example:

class MyColumnConfig(SingleColumnConfig):
    column_type: Literal["my-plugin"] = "my-plugin"
    allow_resize: bool = True  # required when output row count can differ from input
    # ...

The default is False; only set it to True when your generate method can return more or fewer rows than it receives.
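
As a rough illustration of what "more or fewer rows" means, the sketch below uses plain lists of dicts as a stand-in for the DataFrame that a real generate method receives. The helper names expand_rows and collapse_rows are hypothetical and are not part of Data Designer.

```python
def expand_rows(rows: list[dict], copies: int) -> list[dict]:
    """1:N: emit `copies` output rows for every input row."""
    return [{**row, "variant": i} for row in rows for i in range(copies)]


def collapse_rows(rows: list[dict], key: str) -> list[dict]:
    """N:1: keep only the first row seen for each distinct value of `key`."""
    first_seen: dict = {}
    for row in rows:
        first_seen.setdefault(row[key], row)
    return list(first_seen.values())


batch = [{"category": "A"}, {"category": "B"}, {"category": "A"}]
expanded = expand_rows(batch, copies=2)  # 3 rows in, 6 rows out
collapsed = collapse_rows(batch, key="category")  # 3 rows in, 2 rows out
print(len(batch), len(expanded), len(collapsed))
```

Both helpers return a different number of rows than they receive, which is the situation in which allow_resize would need to be True.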

Step 3: Create the implementation class

The implementation class defines the actual business logic of the plugin. For column generator plugins, inherit from ColumnGeneratorFullColumn or ColumnGeneratorCellByCell and implement the generate method.

Create src/data_designer_index_multiplier/impl.py:

import logging

import pandas as pd
from data_designer.engine.column_generators.generators.base import ColumnGeneratorFullColumn

from data_designer_index_multiplier.config import IndexMultiplierColumnConfig

logger = logging.getLogger(__name__)


class IndexMultiplierColumnGenerator(ColumnGeneratorFullColumn[IndexMultiplierColumnConfig]):

    def generate(self, data: pd.DataFrame) -> pd.DataFrame:
        """Generate the column data.

        Args:
            data: The current DataFrame being built

        Returns:
            The DataFrame with the new column added
        """
        logger.info(
            f"Generating column {self.config.name} "
            f"with multiplier {self.config.multiplier}"
        )

        data[self.config.name] = data.index * self.config.multiplier

        return data

Key points:

  • Generic type ColumnGeneratorFullColumn[IndexMultiplierColumnConfig] connects the implementation to its config
  • You have access to the configuration parameters via self.config

Understanding generation_strategy: The generation_strategy specifies how the column generator will generate data. You choose a strategy by inheriting from the corresponding base class:

  • ColumnGeneratorFullColumn: Generates the full column (at the batch level) in a single call to generate
    • generate must take as input a pd.DataFrame with all previous columns and return a pd.DataFrame with the generated column appended.
  • ColumnGeneratorCellByCell: Generates one cell at a time
    • generate must take as input a dict with key/value pairs for all previous columns and return a dict with an additional key/value for the generated cell
    • Supports concurrent workers via a max_parallel_requests parameter on the configuration
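
The two contracts can be compared in a dependency-free sketch. Plain Python structures stand in for pd.DataFrame here, and the function names are hypothetical; a real plugin would subclass ColumnGeneratorFullColumn or ColumnGeneratorCellByCell instead.

```python
def generate_full_column(rows: list[dict], multiplier: int) -> list[dict]:
    """Full-column style: one call sees the whole batch and adds the new column to every row."""
    return [{**row, "scaled_index": i * multiplier} for i, row in enumerate(rows)]


def generate_cell(row: dict, multiplier: int) -> dict:
    """Cell-by-cell style: one call sees a single row and adds one new key/value pair."""
    return {**row, "scaled_value": row["value"] * multiplier}


batch = [{"value": 1}, {"value": 2}, {"value": 3}]

# Full column: a single call for the whole batch.
full = generate_full_column(batch, multiplier=5)

# Cell by cell: one call per row; the calls are independent of each other.
cells = [generate_cell(row, multiplier=5) for row in batch]

print(full[2]["scaled_index"], cells[2]["scaled_value"])
```

In this sketch the cell-by-cell function has no view of a row's position in the batch, so logic that depends on the row index, like the index multiplier, fits the full-column style more naturally.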

Step 4: Create the plugin object

Create a Plugin object that makes the plugin discoverable and connects the implementation and config classes.

Create src/data_designer_index_multiplier/plugin.py:

from data_designer.plugins import Plugin, PluginType

plugin = Plugin(
    config_qualified_name="data_designer_index_multiplier.config.IndexMultiplierColumnConfig",
    impl_qualified_name="data_designer_index_multiplier.impl.IndexMultiplierColumnGenerator",
    plugin_type=PluginType.COLUMN_GENERATOR,
)
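
Because the Plugin object refers to the config and implementation classes by dotted name rather than by import, those classes can be loaded lazily. The following is a minimal sketch of how such a string could be resolved with the standard library; resolve_qualified_name is a hypothetical helper, not a Data Designer function, and the library's actual loading logic may differ.

```python
import importlib


def resolve_qualified_name(qualified_name: str) -> object:
    """Split 'package.module.ClassName' at the last dot, import the module, return the attribute."""
    module_path, _, attr_name = qualified_name.rpartition(".")
    if not module_path:
        raise ValueError(f"Expected a dotted path, got {qualified_name!r}")
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


# Demonstrated on a standard-library class so the sketch runs without the plugin installed.
ordered_dict_cls = resolve_qualified_name("collections.OrderedDict")
print(ordered_dict_cls.__name__)
```

Nothing in the target module is imported until the string is resolved, which is consistent with the recommendation to keep plugin.py free of direct imports from config.py and impl.py.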

Step 5: Package your plugin

Create a pyproject.toml file to define your package and register the entry point:

[project]
name = "data-designer-index-multiplier"
version = "1.0.0"
description = "Data Designer index multiplier plugin"
requires-python = ">=3.10"
dependencies = [
    "data-designer",
]

# Register this plugin via entry points
[project.entry-points."data_designer.plugins"]
index-multiplier = "data_designer_index_multiplier.plugin:plugin"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build.targets.wheel]
packages = ["src/data_designer_index_multiplier"]

Entry Point Registration: Plugins are discovered automatically using Python entry points. It is important to register your plugin as an entry point under the data_designer.plugins group.

The entry point format is:

[project.entry-points."data_designer.plugins"]
<entry-point-name> = "<module.path>:<plugin-instance-name>"

Step 6: Install and use your plugin locally

Install your plugin in editable mode. This is all you need to start using it; no PyPI publishing is required:

# From the plugin directory
uv pip install -e .

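That’s it. The editable install registers the entry point so Data Designer discovers your plugin automatically. Changes to the plugin source code are picked up the next time the package is imported (for example, after restarting your Python process or notebook kernel), without reinstalling. Changes to the entry points in pyproject.toml do require reinstalling.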

Once installed, your plugin works just like built-in column types:

import data_designer.config as dd
from data_designer.interface import DataDesigner

from data_designer_index_multiplier.config import IndexMultiplierColumnConfig

data_designer = DataDesigner()
builder = dd.DataDesignerConfigBuilder()

# Add a regular column
builder.add_column(
    dd.SamplerColumnConfig(
        name="category",
        sampler_type="category",
        params=dd.CategorySamplerParams(values=["A", "B", "C"]),
    )
)

# Add your custom plugin column
builder.add_column(
    IndexMultiplierColumnConfig(
        name="scaled_index",
        multiplier=5,
    )
)

# Generate data
results = data_designer.create(builder, num_records=10)
print(results.load_dataset())

Output:

  category  scaled_index
0        B             0
1        A             5
2        C            10
3        A            15
4        B            20
...

Validating Your Plugin

Data Designer provides a testing utility to validate that your plugin is structured correctly. Use assert_valid_plugin to check that your config and implementation classes are properly defined:

from data_designer.engine.testing.utils import assert_valid_plugin
from data_designer_index_multiplier.plugin import plugin

# Raises AssertionError with a descriptive message if anything is wrong with the general plugin structure
assert_valid_plugin(plugin)

This validates that:

  • The config class is a subclass of ConfigBase
  • For column generator plugins: the implementation class is a subclass of ConfigurableTask
  • For seed reader plugins: the implementation class is a subclass of SeedReader

Multiple Plugins in One Package

A single Python package can register multiple plugins. Simply define multiple Plugin instances and register each one as a separate entry point:

[project.entry-points."data_designer.plugins"]
my-column-generator = "my_package.plugins.column_generator.plugin:column_generator_plugin"
my-seed-reader = "my_package.plugins.seed_reader.plugin:seed_reader_plugin"

For an example of this pattern, see the end-to-end test plugins in the tests_e2e/ directory.

That’s it! You now know how to create a Data Designer plugin. A local editable install (uv pip install -e .) is all you need to develop, test, and use your plugin. If you want to make it available for others to install via pip install, publish it to PyPI or your organization’s package index.