data_designer.engine.column_generators.generators.base
Run an async coroutine from sync context.
asyncio.run(coro)
Bases: data_designer.engine.configurable_task.ConfigurableTask[data_designer.engine.configurable_task.TaskConfigT], abc.ABC
Whether this generator makes model/API calls during generation.
Whether this generator’s output depends on prior row-group calls.
Example: SeedDatasetColumnGenerator tracks its position in the seed dataset, so row group N must complete before N+1 starts.
Check if a subclass has overridden a base ColumnGenerator method.
Sync generate — overridden by most concrete generators.
Default bridges to agenerate() for async-first subclasses that only
implement agenerate(). Raises NotImplementedError if neither
generate() nor agenerate() is overridden.
Async generate — delegates to sync generate() via thread pool.
Subclasses with native async support (e.g. ColumnGeneratorWithModelChatCompletion) should override this with a direct async implementation.
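The sync/async bridging described above can be sketched as follows. This is a minimal illustration rather than the library's implementation: `SketchGenerator`, `_is_overridden`, and the three subclasses are hypothetical names, and the real `ColumnGenerator` may detect overrides and schedule threads differently.

```python
import asyncio
from abc import ABC


class SketchGenerator(ABC):
    """Hypothetical stand-in for ColumnGenerator (assumption, not library code)."""

    def _is_overridden(self, name: str) -> bool:
        # A method counts as overridden when the subclass attribute is a
        # different function object than the one defined on the base class.
        return getattr(type(self), name) is not getattr(SketchGenerator, name)

    def generate(self, data: dict) -> dict:
        # Default sync path: bridge to agenerate() when the subclass defines it.
        if not self._is_overridden("agenerate"):
            raise NotImplementedError("override generate() or agenerate()")
        return asyncio.run(self.agenerate(data))

    async def agenerate(self, data: dict) -> dict:
        # Default async path: run the sync generate() in a worker thread.
        return await asyncio.to_thread(self.generate, data)


class SyncOnly(SketchGenerator):
    def generate(self, data: dict) -> dict:
        return {**data, "out": "sync"}


class AsyncOnly(SketchGenerator):
    async def agenerate(self, data: dict) -> dict:
        return {**data, "out": "async"}


class Neither(SketchGenerator):
    pass


sync_via_async = asyncio.run(SyncOnly().agenerate({"a": 1}))
async_via_sync = AsyncOnly().generate({"a": 1})
```

Because each default delegates to the other method, overriding either one is enough. The explicit override check is what stops the two defaults from calling each other indefinitely when a subclass overrides neither.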
Shared method that logs information before the generator’s generate method is called.
Dataset builders should call this method once for every generator before calling its
generate method, so that the same information is not logged multiple times when
generators run in parallel.
Bases: data_designer.engine.column_generators.generators.base.ColumnGenerator[data_designer.engine.configurable_task.TaskConfigT], abc.ABC
Async wrapper — wraps sync generate_from_scratch() in a thread.
Bases: data_designer.engine.column_generators.generators.base.ColumnGenerator[data_designer.engine.configurable_task.TaskConfigT], abc.ABC
Bases: data_designer.engine.column_generators.generators.base.ColumnGeneratorWithModelRegistry[data_designer.engine.configurable_task.TaskConfigT], abc.ABC
Build multi-modal context from the config’s multi_modal_context list.
Passes base_path to get_contexts() so that generated image file paths (stored under base_dataset_path in create mode) can be resolved to base64 before being sent to the model endpoint.
Parameters:
The deserialized record containing column values.
Returns:
list[dict[str, typing.Any]] | None
A list of multi-modal context dicts, or None if no context is configured.
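As a rough illustration of resolving a stored image path to base64, the sketch below reads a file relative to a base directory and encodes it. The helper `resolve_image_to_base64` and the file layout are hypothetical; the actual `get_contexts()` logic and the shape of the context dicts are not shown in this reference.

```python
import base64
import tempfile
from pathlib import Path


def resolve_image_to_base64(relative_path: str, base_path: Path) -> str:
    """Hypothetical helper: read an image stored under base_path and base64-encode it."""
    image_bytes = (base_path / relative_path).read_bytes()
    return base64.b64encode(image_bytes).decode("ascii")


# Demo with a temporary directory standing in for base_dataset_path.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "images").mkdir()
    (base / "images" / "row0.png").write_bytes(b"fake-png-bytes")
    encoded = resolve_image_to_base64("images/row0.png", base)
```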
Bases: data_designer.engine.column_generators.generators.base.ColumnGenerator[data_designer.engine.configurable_task.TaskConfigT], abc.ABC
Base class for column generators invoked once per row.
Override generate to return the complete row mapping after adding the
generated value. The engine calls the generator once per row and may run
calls concurrently. Use this base when generation is independent per row
(e.g. an LLM call per row, a per-row transform).
Generate one row’s output from a single row’s upstream values.
Parameters:
Current row mapping containing the upstream values available to this column.
Returns:
dict
Complete row mapping with existing keys preserved and the new column value added. Include declared side-effect columns when the config creates them.
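The per-row contract above (receive one row mapping, return it with the new column added) might look like the following. This is a standalone sketch: `UppercaseNameGenerator` and the column names are hypothetical, and the class deliberately does not inherit from the real base class so that it runs without the library installed.

```python
class UppercaseNameGenerator:
    """Hypothetical per-row generator illustrating the return contract."""

    column_name = "name_upper"  # assumed name of the column being generated

    def generate(self, data: dict) -> dict:
        # Build a new mapping so existing keys are preserved and the
        # caller's row is not mutated, then add the generated value.
        return {**data, self.column_name: str(data["name"]).upper()}


row = {"id": 1, "name": "ada"}
result = UppercaseNameGenerator().generate(row)
```

Returning a new mapping rather than mutating the input keeps concurrent calls from interfering with one another, which matters because the engine may run per-row calls concurrently.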
Bases: data_designer.engine.column_generators.generators.base.ColumnGenerator[data_designer.engine.configurable_task.TaskConfigT], abc.ABC
Base class for column generators that transform a full batch at once.
Override generate to return the complete batch DataFrame after adding
generated values. Use this base when generation is vectorizable or when an
external API accepts batched input more efficiently than per-row calls.
Generate an entire batch of row outputs.
Parameters:
DataFrame containing the upstream columns this generator depends on.
Returns:
pandas.DataFrame
DataFrame containing the input columns plus the new column and any side-effect
columns. When config.allow_resize is False, the row count must match
the input; when it is True, the row count may change.
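A batch generator's row-count rule can be sketched with plain pandas. The function `generate_batch`, the `allow_resize` argument, and the column names are assumptions for illustration; in the library the flag lives on the generator's config rather than being passed as a parameter.

```python
import pandas as pd


def generate_batch(data: pd.DataFrame, allow_resize: bool = False) -> pd.DataFrame:
    """Hypothetical full-batch generator: adds a vectorized 'total' column."""
    out = data.copy()  # keep the input columns and leave the caller's frame untouched
    out["total"] = out["price"] * out["quantity"]
    if allow_resize:
        # With resizing allowed, the generator may drop (or add) rows.
        out = out[out["total"] > 0].reset_index(drop=True)
    return out


batch = pd.DataFrame({"price": [2.0, 0.0, 1.5], "quantity": [3, 5, 2]})
fixed = generate_batch(batch)                        # row count matches the input
resized = generate_batch(batch, allow_resize=True)   # zero-total row is dropped
```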