# data\_designer.engine.column\_generators.generators.base

## Module Contents

### Classes

| Name                                                                                                                      | Description                                                                   |
| ------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------- |
| [`ColumnGenerator`](#data_designerenginecolumn_generatorsgeneratorsbasecolumngenerator)                                   | Abstract base class for all column generators.                                |
| [`FromScratchColumnGenerator`](#data_designerenginecolumn_generatorsgeneratorsbasefromscratchcolumngenerator)             | Base class for generators that can create records from scratch.               |
| [`ColumnGeneratorWithModelRegistry`](#data_designerenginecolumn_generatorsgeneratorsbasecolumngeneratorwithmodelregistry) | Base class for generators with access to the model registry.                  |
| [`ColumnGeneratorWithModel`](#data_designerenginecolumn_generatorsgeneratorsbasecolumngeneratorwithmodel)                 | Base class for generators bound to a single configured model.                 |
| [`ColumnGeneratorCellByCell`](#data_designerenginecolumn_generatorsgeneratorsbasecolumngeneratorcellbycell)               | Base class for column generators invoked once per row.                        |
| [`ColumnGeneratorFullColumn`](#data_designerenginecolumn_generatorsgeneratorsbasecolumngeneratorfullcolumn)               | Base class for column generators that transform a full batch at once.         |

### Functions

| Name                                                                                            | Description                               |
| ----------------------------------------------------------------------------------------------- | ----------------------------------------- |
| [`_run_coroutine_sync`](#data_designerenginecolumn_generatorsgeneratorsbase_run_coroutine_sync) | Run an async coroutine from sync context. |

### Data

[`_T`](#data_designerenginecolumn_generatorsgeneratorsbase_t)
[`SYNC_BRIDGE_TIMEOUT`](#data_designerenginecolumn_generatorsgeneratorsbasesync_bridge_timeout)
[`logger`](#data_designerenginecolumn_generatorsgeneratorsbaselogger)

### API

```python
_T = TypeVar(...)
```

```python
SYNC_BRIDGE_TIMEOUT = 300
```

```python
logger = getLogger(...)
```

```python
data_designer.engine.column_generators.generators.base._run_coroutine_sync(coro: typing.Coroutine[typing.Any, typing.Any, data_designer.engine.column_generators.generators.base._T]) -> data_designer.engine.column_generators.generators.base._T
```

Run an async coroutine from sync context.

* No running event loop → `asyncio.run(coro)`
* Running event loop (e.g. notebook/service) → run in a background thread
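
The two branches above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the name `run_coroutine_sync_sketch`, the one-worker `ThreadPoolExecutor`, and the idea that `SYNC_BRIDGE_TIMEOUT` bounds the wait on the background thread are all hypothetical.

```python
import asyncio
import concurrent.futures
from typing import Any, Coroutine, TypeVar

_T = TypeVar("_T")

# Assumption: the module-level constant is a timeout in seconds for the bridge.
SYNC_BRIDGE_TIMEOUT = 300


def run_coroutine_sync_sketch(coro: Coroutine[Any, Any, _T]) -> _T:
    """Hypothetical stand-in for _run_coroutine_sync."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No running event loop in this thread: asyncio.run can own the loop.
        return asyncio.run(coro)

    # A loop is already running here (e.g. notebook/service), where asyncio.run
    # would raise. Run the coroutine on a fresh loop in a background thread.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(asyncio.run, coro)
        return future.result(timeout=SYNC_BRIDGE_TIMEOUT)


async def _answer() -> int:
    await asyncio.sleep(0)
    return 42


async def _call_from_running_loop() -> int:
    # Simulates sync code that is invoked while an event loop is active.
    return run_coroutine_sync_sketch(_answer())


from_sync = run_coroutine_sync_sketch(_answer())
from_loop = asyncio.run(_call_from_running_loop())
```

In this sketch, the second branch blocks the calling thread until the coroutine finishes, so a loop running in that thread makes no progress in the meantime.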

```python
class data_designer.engine.column_generators.generators.base.ColumnGenerator(
    config: data_designer.engine.configurable_task.TaskConfigT,
    resource_provider: data_designer.engine.resources.resource_provider.ResourceProvider
)
```

**Bases**: `data_designer.engine.configurable_task.ConfigurableTask[data_designer.engine.configurable_task.TaskConfigT]`, `abc.ABC`

```python
can_generate_from_scratch: bool
```

```python
is_llm_bound: bool
```

Whether this generator makes model/API calls during generation.

```python
is_order_dependent: bool
```

Whether this generator's output depends on prior row-group calls.

For example, `SeedDatasetColumnGenerator` tracks its position in the seed
dataset, so row group N must complete before row group N+1 starts.
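
A toy illustration of why such state makes call order matter. `SeedCursor` and `next_group` are hypothetical names invented for this sketch; they are not part of the library and do not reflect how `SeedDatasetColumnGenerator` is actually written.

```python
class SeedCursor:
    """Toy stand-in for a generator that tracks its position in a seed dataset."""

    def __init__(self, seed_rows: list[str]) -> None:
        self._seed_rows = seed_rows
        self._position = 0  # state carried over between row-group calls

    def next_group(self, size: int) -> list[str]:
        # Each call consumes the next slice, so results depend on call order.
        group = self._seed_rows[self._position : self._position + size]
        self._position += size
        return group


cursor = SeedCursor(["a", "b", "c", "d"])
group_0 = cursor.next_group(2)  # must run first to receive the first two rows
group_1 = cursor.next_group(2)  # receives whatever rows remain after group_0
```

If the two calls were issued in the opposite order, each row group would receive the other's seed rows.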

```python
_is_overridden(method_name: str) -> bool
```

Check if a subclass has overridden a base ColumnGenerator method.

```python
get_generation_strategy() -> data_designer.config.column_configs.GenerationStrategy
```

```python
generate(data: data_designer.engine.configurable_task.DataT) -> data_designer.engine.configurable_task.DataT
```

Synchronous generation entry point; most concrete generators override it.

The default implementation bridges to `agenerate()` for async-first subclasses
that implement only `agenerate()`. It raises `NotImplementedError` if neither
`generate()` nor `agenerate()` is overridden.

```python
agenerate(data: data_designer.engine.configurable_task.DataT) -> data_designer.engine.configurable_task.DataT
```

Asynchronous generation entry point. The default implementation delegates to the sync `generate()` via a thread pool.

Subclasses with native async support (e.g. `ColumnGeneratorWithModelChatCompletion`)
should override this with a direct async implementation.
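
The mutual defaults of `generate()` and `agenerate()` can be sketched as below. This is an illustration under assumptions, not the library's code: `SketchGenerator`, `SyncOnly`, and `AsyncOnly` are hypothetical classes, and the sync-to-async bridge uses plain `asyncio.run` where the real base class presumably uses its own helper.

```python
import asyncio


class SketchGenerator:
    """Hypothetical stand-in for the base class; not the library's code."""

    def _is_overridden(self, method_name: str) -> bool:
        # Compare the subclass attribute with the base implementation.
        return getattr(type(self), method_name) is not getattr(SketchGenerator, method_name)

    def generate(self, data: dict) -> dict:
        if not self._is_overridden("agenerate"):
            raise NotImplementedError("Override generate() or agenerate().")
        # Assumption: no event loop is running in the calling thread.
        return asyncio.run(self.agenerate(data))

    async def agenerate(self, data: dict) -> dict:
        # Run the sync implementation on a worker thread.
        return await asyncio.to_thread(self.generate, data)


class SyncOnly(SketchGenerator):
    def generate(self, data: dict) -> dict:
        return {**data, "source": "sync"}


class AsyncOnly(SketchGenerator):
    async def agenerate(self, data: dict) -> dict:
        return {**data, "source": "async"}


sync_via_async = asyncio.run(SyncOnly().agenerate({"id": 1}))
async_via_sync = AsyncOnly().generate({"id": 2})
```

Either override is enough to make both entry points usable; with neither, the sketch's `generate()` raises `NotImplementedError`.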

```python
log_pre_generation() -> None
```

Log information once before the generator's `generate` method is called.

Dataset builders are expected to call this method for every generator before calling its
`generate` method. This avoids logging the same information multiple times when
generators run in parallel.

```python
class data_designer.engine.column_generators.generators.base.FromScratchColumnGenerator(
    config: data_designer.engine.configurable_task.TaskConfigT,
    resource_provider: data_designer.engine.resources.resource_provider.ResourceProvider
)
```

**Bases**: `data_designer.engine.column_generators.generators.base.ColumnGenerator[data_designer.engine.configurable_task.TaskConfigT]`, `abc.ABC`

```python
can_generate_from_scratch: bool
```

```python
generate_from_scratch(num_records: int) -> pandas.DataFrame
```

```python
agenerate_from_scratch(num_records: int) -> pandas.DataFrame
```

Async wrapper that runs the sync `generate_from_scratch()` in a thread.

```python
class data_designer.engine.column_generators.generators.base.ColumnGeneratorWithModelRegistry(
    config: data_designer.engine.configurable_task.TaskConfigT,
    resource_provider: data_designer.engine.resources.resource_provider.ResourceProvider
)
```

**Bases**: `data_designer.engine.column_generators.generators.base.ColumnGenerator[data_designer.engine.configurable_task.TaskConfigT]`, `abc.ABC`

```python
is_llm_bound: bool
```

```python
model_registry: data_designer.engine.models.registry.ModelRegistry
```

```python
get_model(model_alias: str) -> data_designer.engine.models.facade.ModelFacade
```

```python
get_model_config(model_alias: str) -> data_designer.config.models.ModelConfig
```

```python
get_model_provider_name(model_alias: str) -> str
```

```python
class data_designer.engine.column_generators.generators.base.ColumnGeneratorWithModel(
    config: data_designer.engine.configurable_task.TaskConfigT,
    resource_provider: data_designer.engine.resources.resource_provider.ResourceProvider
)
```

**Bases**: `data_designer.engine.column_generators.generators.base.ColumnGeneratorWithModelRegistry[data_designer.engine.configurable_task.TaskConfigT]`, `abc.ABC`

```python
model() -> data_designer.engine.models.facade.ModelFacade
```

```python
model_config() -> data_designer.config.models.ModelConfig
```

```python
inference_parameters() -> data_designer.config.models.BaseInferenceParams
```

```python
_build_multi_modal_context(record: dict) -> list[dict[str, typing.Any]] | None
```

Build multi-modal context from the config's `multi_modal_context` list.

Passes `base_path` to `get_contexts()` so that generated image file paths
(stored under `base_dataset_path` in create mode) can be resolved to base64
before being sent to the model endpoint.

**Parameters:**

`record`: The deserialized record containing column values.

**Returns:**

`list[dict[str, typing.Any]] | None`

A list of multi-modal context dicts, or None if no context is configured.

```python
log_pre_generation() -> None
```

```python
class data_designer.engine.column_generators.generators.base.ColumnGeneratorCellByCell(
    config: data_designer.engine.configurable_task.TaskConfigT,
    resource_provider: data_designer.engine.resources.resource_provider.ResourceProvider
)
```

**Bases**: `data_designer.engine.column_generators.generators.base.ColumnGenerator[data_designer.engine.configurable_task.TaskConfigT]`, `abc.ABC`

Base class for column generators invoked once per row.

Override `generate` to return the complete row mapping after adding the
generated value. The engine calls the generator once per row and may run
calls concurrently. Use this base when generation is independent per row
(e.g. an LLM call per row, a per-row transform).

```python
get_generation_strategy() -> data_designer.config.column_configs.GenerationStrategy
```

```python
generate(data: dict) -> dict
```

Generate one row's output from a single row's upstream values.

**Parameters:**

`data`: Current row mapping containing the upstream values available to this column.

**Returns:**

`dict`

Complete row mapping with existing keys preserved and the new column value added.
Include declared side-effect columns when the config creates them.
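
A minimal sketch of the per-row contract, assuming a hypothetical `FullNameGenerator` with invented column names. To stay self-contained it does not subclass the real `ColumnGeneratorCellByCell`, and the thread pool only imitates an engine that may issue calls concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


class FullNameGenerator:
    """Stand-in that mirrors the generate(data: dict) -> dict contract."""

    def generate(self, data: dict) -> dict:
        full_name = f"{data['first_name']} {data['last_name']}"
        # Return the complete row: existing keys preserved, new column added.
        return {**data, "full_name": full_name}


rows = [
    {"first_name": "Ada", "last_name": "Lovelace"},
    {"first_name": "Alan", "last_name": "Turing"},
]

generator = FullNameGenerator()

# Each row is independent, so calls can safely run concurrently.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(generator.generate, rows))
```

Because `generate` builds a new dict and keeps no shared state, concurrent calls cannot interfere with each other.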

```python
class data_designer.engine.column_generators.generators.base.ColumnGeneratorFullColumn(
    config: data_designer.engine.configurable_task.TaskConfigT,
    resource_provider: data_designer.engine.resources.resource_provider.ResourceProvider
)
```

**Bases**: `data_designer.engine.column_generators.generators.base.ColumnGenerator[data_designer.engine.configurable_task.TaskConfigT]`, `abc.ABC`

Base class for column generators that transform a full batch at once.

Override `generate` to return the complete batch DataFrame after adding
generated values. Use this base when generation is vectorizable or when an
external API accepts batched input more efficiently than per-row calls.

```python
get_generation_strategy() -> data_designer.config.column_configs.GenerationStrategy
```

```python
generate(data: pandas.DataFrame) -> pandas.DataFrame
```

Generate an entire batch of row outputs.

**Parameters:**

`data`: DataFrame containing the upstream columns this generator depends on.

**Returns:**

`pandas.DataFrame`

DataFrame containing the input columns plus the new column and any side-effect
columns. When `config.allow_resize` is `False`, the row count must match
the input; when it is `True`, the row count may change.
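
A minimal sketch of the batch contract and the row-count rule, using plain functions in place of a real `ColumnGeneratorFullColumn` subclass. The names `add_total` and `check_row_count` and the column names are hypothetical, and how the engine itself enforces `allow_resize` is not shown in this reference.

```python
import pandas as pd


def check_row_count(before: pd.DataFrame, after: pd.DataFrame, allow_resize: bool) -> None:
    """Hypothetical helper mirroring the row-count rule described above."""
    if not allow_resize and len(after) != len(before):
        raise ValueError(f"Expected {len(before)} rows, got {len(after)}.")


def add_total(data: pd.DataFrame) -> pd.DataFrame:
    """Vectorized transform: input columns are kept and one column is added."""
    out = data.copy()  # avoid mutating the caller's batch
    out["total"] = out["quantity"] * out["unit_price"]
    return out


batch = pd.DataFrame({"quantity": [2, 5], "unit_price": [3, 4]})
result = add_total(batch)

# Passes silently because the transform preserves the row count.
check_row_count(batch, result, allow_resize=False)
totals = result["total"].tolist()
```

A transform that filters or expands rows would fail this check unless `allow_resize` is `True`.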