
# data\_designer.config.column\_configs

## Module Contents

### Classes

| Name                                                                                       | Description                                                                       |
| ------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------- |
| [`GenerationStrategy`](#data_designerconfigcolumn_configsgenerationstrategy)               | Strategy for custom column generation.                                            |
| [`SamplerColumnConfig`](#data_designerconfigcolumn_configssamplercolumnconfig)             | Configuration for columns generated using built-in samplers.                      |
| [`LLMTextColumnConfig`](#data_designerconfigcolumn_configsllmtextcolumnconfig)             | Configuration for text generation columns using Large Language Models.            |
| [`LLMCodeColumnConfig`](#data_designerconfigcolumn_configsllmcodecolumnconfig)             | Configuration for code generation columns using Large Language Models.            |
| [`LLMStructuredColumnConfig`](#data_designerconfigcolumn_configsllmstructuredcolumnconfig) | Configuration for structured JSON generation columns using Large Language Models. |
| [`Score`](#data_designerconfigcolumn_configsscore)                                         | Configuration for a "score" in an LLM judge evaluation.                           |
| [`LLMJudgeColumnConfig`](#data_designerconfigcolumn_configsllmjudgecolumnconfig)           | Configuration for LLM-as-a-judge quality assessment and scoring columns.          |
| [`ExpressionColumnConfig`](#data_designerconfigcolumn_configsexpressioncolumnconfig)       | Configuration for derived columns using Jinja2 expressions.                       |
| [`ValidationColumnConfig`](#data_designerconfigcolumn_configsvalidationcolumnconfig)       | Configuration for validation columns that validate existing columns.              |
| [`SeedDatasetColumnConfig`](#data_designerconfigcolumn_configsseeddatasetcolumnconfig)     | Configuration for columns sourced from seed datasets.                             |
| [`EmbeddingColumnConfig`](#data_designerconfigcolumn_configsembeddingcolumnconfig)         | Configuration for embedding generation columns.                                   |
| [`ImageColumnConfig`](#data_designerconfigcolumn_configsimagecolumnconfig)                 | Configuration for image generation columns.                                       |
| [`CustomColumnConfig`](#data_designerconfigcolumn_configscustomcolumnconfig)               | Configuration for custom user-defined column generators.                          |

### API

```python
class data_designer.config.column_configs.GenerationStrategy
```

**Bases**: `str`, `enum.Enum`

Strategy for custom column generation.

```python
CELL_BY_CELL = "cell_by_cell"
```

```python
FULL_COLUMN = "full_column"
```
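Because the class derives from both `str` and `enum.Enum`, its members should behave like plain strings in comparisons and value lookups. The sketch below uses a stand-in enum with the same two members (it does not import the library) to illustrate that behavior.

```python
from enum import Enum


class GenerationStrategy(str, Enum):
    """Stand-in mirroring the two documented members (not the library class)."""

    CELL_BY_CELL = "cell_by_cell"
    FULL_COLUMN = "full_column"


# Lookup by value returns the matching member.
strategy = GenerationStrategy("cell_by_cell")

# Members of a str-based enum compare equal to their raw string values.
is_plain_string_equal = GenerationStrategy.FULL_COLUMN == "full_column"
```

In practice this means a configuration written as plain strings (for example in YAML or JSON) can usually be matched against the enum without extra conversion.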

```python
class data_designer.config.column_configs.SamplerColumnConfig(
    /,
    **data: typing.Any
)
```

**Bases**: `data_designer.config.base.SingleColumnConfig`

Configuration for columns generated using built-in samplers.

Sampler columns provide efficient data generation for common data types and
distributions. Supported samplers include UUID generation,
datetime/timedelta sampling, person generation, category / subcategory sampling,
and various statistical distributions (uniform, gaussian, binomial, poisson, scipy).

**Parameters:**

* `sampler_type`: Type of sampler to use. Available types include:
  "uuid", "category", "subcategory", "uniform", "gaussian", "bernoulli",
  "bernoulli\_mixture", "binomial", "poisson", "scipy", "person",
  "person\_from\_faker", "datetime", "timedelta".
* `params`: Parameters specific to the chosen sampler type. The type varies with `sampler_type`
  (e.g., `CategorySamplerParams`, `UniformSamplerParams`, `PersonSamplerParams`).
* `conditional_params`: Optional dictionary of conditional parameters. Each key is a
  condition that must be met (e.g., "age > 21") for the conditional parameters to be used.
  Each value holds the parameters to use when that condition is met.
* `convert_to`: Optional type conversion to apply after sampling. For numerical samplers,
  must be one of "float", "int", or "str". For datetime and timedelta samplers, accepts
  a strftime format string (e.g., `"%Y-%m-%d"`, `"%m/%d/%Y %H:%M"`). When omitted,
  datetime/timedelta columns default to ISO-8601 format (e.g., `2024-01-15T09:30:00`).

Inherited Attributes:

* `name` (required): Unique name of the column to be generated.
* `drop`: If True, generate this column but remove it from the final dataset.

**Tip: Displaying available samplers and their parameters.**
The config builder has an `info` attribute that can be used to display the
available samplers and their parameters:

```python
config_builder.info.display("samplers")
```
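To make the `convert_to` behavior for datetime columns more concrete, the following standard-library sketch formats one fixed timestamp with the two example format strings and with ISO-8601. It only illustrates the resulting string formats; it does not call the sampler itself, and the timestamp is an arbitrary example value.

```python
from datetime import datetime

# A fixed timestamp standing in for one sampled datetime value.
sampled = datetime(2024, 1, 15, 9, 30, 0)

iso_default = sampled.isoformat()               # ISO-8601, the documented default
date_only = sampled.strftime("%Y-%m-%d")        # first example format string
us_style = sampled.strftime("%m/%d/%Y %H:%M")   # second example format string

print(iso_default)  # 2024-01-15T09:30:00
print(date_only)    # 2024-01-15
print(us_style)     # 01/15/2024 09:30
```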

**Initialization:**

Create a new model by parsing and validating input data from keyword arguments.

Raises `pydantic_core.ValidationError` if the input data cannot be
validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

```python
sampler_type: data_designer.config.sampler_params.SamplerType = Field(...)
```

```python
params: typing.Annotated[data_designer.config.sampler_params.SamplerParamsT, Discriminator('sampler_type')] = Field(...)
```

```python
conditional_params: dict[str, typing.Annotated[data_designer.config.sampler_params.SamplerParamsT, Discriminator('sampler_type')]] = Field(...)
```

```python
convert_to: str | None = Field(...)
```

```python
column_type: typing.Literal["sampler"] = "sampler"
```

```python
get_column_emoji() -> str
```

```python
required_columns: list[str]
```

```python
side_effect_columns: list[str]
```

```python
inject_sampler_type_into_params(data: dict) -> dict
```

Inject sampler\_type into params dict to enable discriminated union resolution.

This allows users to pass params as a simple dict without the sampler\_type field,
which will be automatically added based on the outer sampler\_type field.
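The injection step described above can be sketched as a small pure function. This is an illustration of the idea under stated assumptions, not the library's implementation: the `low` and `high` parameter names are hypothetical, and the real validator may also handle `conditional_params` or already-instantiated params objects.

```python
def inject_sampler_type_into_params(data: dict) -> dict:
    """Copy the outer sampler_type into params when params is a plain dict."""
    sampler_type = data.get("sampler_type")
    params = data.get("params")
    if sampler_type is not None and isinstance(params, dict):
        # An explicit inner sampler_type, if present, overrides the injected one.
        merged = {"sampler_type": sampler_type, **params}
        return {**data, "params": merged}
    return data


# Hypothetical input: params given without the discriminator field.
raw = {"name": "age", "sampler_type": "uniform", "params": {"low": 18, "high": 65}}
resolved = inject_sampler_type_into_params(raw)

print(resolved["params"])
# {'sampler_type': 'uniform', 'low': 18, 'high': 65}
```

Returning a new dict rather than mutating the input keeps the caller's data unchanged, which is a design choice of this sketch.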

```python
class data_designer.config.column_configs.LLMTextColumnConfig(
    /,
    **data: typing.Any
)
```

**Bases**: `data_designer.config.base.SingleColumnConfig`

Configuration for text generation columns using Large Language Models.

LLM text columns generate free-form text content using language models.
Prompts support Jinja2 templating to reference values from other columns, enabling
context-aware generation. The generated text can optionally include message traces
capturing the full conversation history.

**Parameters:**

* `prompt`: Prompt template for text generation. Supports Jinja2 syntax to
  reference other columns (e.g., `"Write a story about {{ character_name }}"`).
  Must be a valid Jinja2 template.
* `model_alias`: Alias of the model configuration to use for generation.
  Must match a model alias defined when initializing the `DataDesignerConfigBuilder`.
* `system_prompt`: Optional system prompt to set model behavior and constraints.
  Also supports Jinja2 templating; if provided, it must be a valid Jinja2 template.
  Do not put output-parsing instructions in the system prompt. Instead,
  use the column type that matches the output you want, e.g.,
  `LLMStructuredColumnConfig` for structured output or `LLMCodeColumnConfig` for code.
* `multi_modal_context`: Optional list of image contexts for multi-modal generation.
  Enables vision-capable models to generate text based on image inputs.
* `tool_alias`: Optional alias of the tool configuration to use for MCP tool calls.
  Must match a tool alias defined when initializing the `DataDesignerConfigBuilder`.
  When provided, the model may call permitted tools during generation.
* `with_trace`: Specifies what trace information to capture in a `{column_name}__trace`
  column. Options are:
  * `TraceType.NONE` (default): No trace is captured.
  * `TraceType.LAST_MESSAGE`: Only the final assistant message is captured.
  * `TraceType.ALL_MESSAGES`: Full conversation history (system/user/assistant/tool).
* `extract_reasoning_content`: If True, creates a `{column_name}__reasoning_content` column
  containing only the `reasoning_content` from the final assistant response. This is
  useful for models that expose chain-of-thought reasoning separately from the main
  response. Defaults to False.

Inherited Attributes:

* `name` (required): Unique name of the column to be generated.
* `drop`: If True, generate this column but remove it from the final dataset.


**Initialization:**

Create a new model by parsing and validating input data from keyword arguments.

Raises `pydantic_core.ValidationError` if the input data cannot be
validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

```python
prompt: str = Field(...)
```

```python
model_alias: str = Field(...)
```

```python
system_prompt: str | None = Field(...)
```

```python
multi_modal_context: list[data_designer.config.models.ImageContext] | None = Field(...)
```

```python
tool_alias: str | None = Field(...)
```

```python
with_trace: data_designer.config.utils.trace_type.TraceType = Field(...)
```

```python
extract_reasoning_content: bool = Field(...)
```

```python
column_type: typing.Literal["llm-text"] = "llm-text"
```

```python
get_column_emoji() -> str
```

```python
required_columns: list[str]
```

Get columns referenced in prompt templates and multi-modal context.

**Returns:**

`list[str]`

List of unique column names referenced in Jinja2 templates
and multi-modal context configurations.
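As a rough illustration of what "columns referenced in Jinja2 templates" means, the sketch below collects names from simple `{{ name }}` references with a regular expression. This is an approximation for illustration only: the library presumably relies on Jinja2's own parser, and the helper name `referenced_columns` is hypothetical. The regex ignores names used only inside `{% ... %}` blocks or nested expressions.

```python
import re

# Matches the first identifier inside a {{ ... }} expression.
_REFERENCE = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)")


def referenced_columns(*templates: str) -> list[str]:
    """Return unique referenced names in first-seen order."""
    seen: dict[str, None] = {}
    for template in templates:
        for name in _REFERENCE.findall(template):
            seen.setdefault(name, None)
    return list(seen)


columns = referenced_columns(
    "Write a story about {{ character_name }} set in {{ city }}.",
    "You are a narrator from {{ city }}.",
)
print(columns)  # ['character_name', 'city']
```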

```python
side_effect_columns: list[str]
```

Returns side-effect columns that may be generated alongside the main column.

Side-effect columns include:

* `{name}__trace`: Generated when `with_trace` is not `TraceType.NONE` on the column
  config.
* `{name}__reasoning_content`: Generated when `extract_reasoning_content=True`.

**Returns:**

`list[str]`

List of side-effect column names.
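The naming rule above can be sketched as follows. The `TraceType` class here is a stand-in with assumed member values, and the function is a simplified illustration rather than the library's property.

```python
from enum import Enum


class TraceType(str, Enum):
    """Stand-in for the library's TraceType; member values are assumed."""

    NONE = "none"
    LAST_MESSAGE = "last_message"
    ALL_MESSAGES = "all_messages"


def side_effect_columns(
    name: str,
    with_trace: TraceType = TraceType.NONE,
    extract_reasoning_content: bool = False,
) -> list[str]:
    """Derive side-effect column names from the column's settings."""
    columns = []
    if with_trace is not TraceType.NONE:
        columns.append(f"{name}__trace")
    if extract_reasoning_content:
        columns.append(f"{name}__reasoning_content")
    return columns


print(side_effect_columns("story"))  # []
print(side_effect_columns("story", TraceType.ALL_MESSAGES, True))
# ['story__trace', 'story__reasoning_content']
```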

```python
assert_prompt_valid_jinja() -> typing_extensions.Self
```

Validate that prompt and system\_prompt are valid Jinja2 templates.

**Returns:**

`typing_extensions.Self`

The validated instance.

**Raises:**

If prompt or system\_prompt contains invalid Jinja2 syntax.

```python
class data_designer.config.column_configs.LLMCodeColumnConfig(
    /,
    **data: typing.Any
)
```

**Bases**: `data_designer.config.column_configs.LLMTextColumnConfig`

Configuration for code generation columns using Large Language Models.

Extends LLMTextColumnConfig to generate code snippets in specific programming languages
or SQL dialects. The generated code is automatically extracted from markdown code blocks
for the specified language. Inherits all prompt templating capabilities from LLMTextColumnConfig.
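To give a sense of what extracting code from a markdown code block involves, here is a minimal sketch using a regular expression. It is an assumption-laden illustration, not the library's extraction logic: the helper name `extract_code` is hypothetical, and the fallback of returning the whole response when no fenced block is found is a choice made for this example.

```python
import re

# Build the fence marker programmatically so this example nests cleanly in Markdown.
FENCE = "`" * 3


def extract_code(response: str, lang: str) -> str:
    """Return the body of the first fenced block tagged with lang."""
    pattern = re.compile(
        re.escape(FENCE) + re.escape(lang) + r"[^\n]*\n(.*?)" + re.escape(FENCE),
        re.DOTALL,
    )
    match = pattern.search(response)
    # Fall back to the raw response when no matching fenced block exists.
    return match.group(1).strip() if match else response.strip()


response = "Here you go:\n" + FENCE + "python\nprint('hi')\n" + FENCE + "\nDone."
print(extract_code(response, "python"))  # print('hi')
```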

**Parameters:**

* `code_lang`: Programming language or SQL dialect for code generation. Supported
  values include: "python", "javascript", "typescript", "java", "kotlin", "go",
  "rust", "ruby", "scala", "swift", "sql:sqlite", "sql:postgres", "sql:mysql",
  "sql:tsql", "sql:bigquery", "sql:ansi". See the `CodeLang` enum for the complete list.

Inherited Attributes:

* `name` (required): Unique name of the column to be generated.
* `prompt` (required): Prompt template for code generation (supports Jinja2).
* `model_alias` (required): Alias of the model configuration to use.
* `system_prompt`: Optional system prompt (supports Jinja2).
* `multi_modal_context`: Optional image contexts for multi-modal generation.
* `tool_alias`: Optional tool configuration alias for MCP tool calls.
* `with_trace`: Specifies what trace information to capture in a `{column_name}__trace`
  column. Options are `TraceType.NONE` (default), `TraceType.LAST_MESSAGE`, or
  `TraceType.ALL_MESSAGES`.
* `extract_reasoning_content`: If True, creates a `{column_name}__reasoning_content`
  column containing the reasoning content from the final assistant response.
* `drop`: If True, generate this column but remove it from the final dataset.


**Initialization:**

Create a new model by parsing and validating input data from keyword arguments.

Raises `pydantic_core.ValidationError` if the input data cannot be
validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

```python
code_lang: data_designer.config.utils.code_lang.CodeLang = Field(...)
```

```python
column_type: typing.Literal["llm-code"] = "llm-code"
```

```python
get_column_emoji() -> str
```

```python
class data_designer.config.column_configs.LLMStructuredColumnConfig(
    /,
    **data: typing.Any
)
```

**Bases**: `data_designer.config.column_configs.LLMTextColumnConfig`

Configuration for structured JSON generation columns using Large Language Models.

Extends LLMTextColumnConfig to generate structured data conforming to a specified schema.
Uses JSON schema or Pydantic models to define the expected output structure, enabling
type-safe and validated structured output generation. Inherits prompt templating capabilities
from LLMTextColumnConfig.

**Parameters:**

* `output_format`: The schema defining the expected output structure. Can be either:
  * A Pydantic `BaseModel` class (recommended)
  * A JSON schema dictionary

Inherited Attributes:

* `name` (required): Unique name of the column to be generated.
* `prompt` (required): Prompt template for structured generation (supports Jinja2).
* `model_alias` (required): Alias of the model configuration to use.
* `system_prompt`: Optional system prompt (supports Jinja2).
* `multi_modal_context`: Optional image contexts for multi-modal generation.
* `tool_alias`: Optional tool configuration alias for MCP tool calls.
* `with_trace`: Specifies what trace information to capture in a `{column_name}__trace`
  column. Options are `TraceType.NONE` (default), `TraceType.LAST_MESSAGE`, or
  `TraceType.ALL_MESSAGES`.
* `extract_reasoning_content`: If True, creates a `{column_name}__reasoning_content`
  column containing the reasoning content from the final assistant response.
* `drop`: If True, generate this column but remove it from the final dataset.


**Initialization:**

Create a new model by parsing and validating input data from keyword arguments.

Raises `pydantic_core.ValidationError` if the input data cannot be
validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

```python
output_format: dict | type[pydantic.BaseModel] = Field(...)
```

```python
column_type: typing.Literal["llm-structured"] = "llm-structured"
```

```python
get_column_emoji() -> str
```

```python
validate_output_format() -> typing_extensions.Self
```

Convert Pydantic model to JSON schema if needed.

**Returns:**

`typing_extensions.Self`

The validated instance with output\_format as a JSON schema dict.

```python
class data_designer.config.column_configs.Score(
    /,
    **data: typing.Any
)
```

**Bases**: `data_designer.config.base.ConfigBase`

Configuration for a "score" in an LLM judge evaluation.

Defines a single scoring criterion with its possible values and descriptions. Multiple
Score objects can be combined in an LLMJudgeColumnConfig to create multi-dimensional
quality assessments.

**Parameters:**

* `name`: A clear, concise name for this scoring dimension (e.g., "Relevance", "Fluency").
* `description`: An informative and detailed assessment guide explaining how to evaluate
  this dimension. Should provide clear criteria for scoring.
* `options`: Dictionary mapping score values to their descriptions. Keys can be integers
  (e.g., a 1-5 scale) or strings (e.g., "Poor", "Good", "Excellent"). Values are
  descriptions explaining what each score level means.

**Initialization:**

Create a new model by parsing and validating input data from keyword arguments.

Raises `pydantic_core.ValidationError` if the input data cannot be
validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

```python
name: str = Field(...)
```

```python
description: str = Field(...)
```

```python
options: dict[int | str, str] = Field(...)
```
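The shape of the three fields can be shown with plain dictionaries of keyword arguments. The scoring text below is purely illustrative; given the `**data` constructor signature, each dictionary could presumably be passed as `Score(**relevance)` when the library is installed, but this sketch deliberately avoids importing it.

```python
# Keyword arguments for one scoring dimension with integer keys.
relevance = {
    "name": "Relevance",
    "description": "How directly the response addresses the question asked.",
    "options": {
        1: "Off-topic or unrelated to the question.",
        2: "Partially addresses the question.",
        3: "Fully and directly addresses the question.",
    },
}

# String keys are also permitted by the dict[int | str, str] annotation.
fluency = {
    "name": "Fluency",
    "description": "How natural and grammatical the text reads.",
    "options": {
        "Poor": "Frequent errors that impede understanding.",
        "Good": "Minor errors that do not impede understanding.",
        "Excellent": "Reads naturally with no noticeable errors.",
    },
}

score_levels = sorted(relevance["options"])
print(score_levels)  # [1, 2, 3]
```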

```python
class data_designer.config.column_configs.LLMJudgeColumnConfig(
    /,
    **data: typing.Any
)
```

**Bases**: `data_designer.config.column_configs.LLMTextColumnConfig`

Configuration for LLM-as-a-judge quality assessment and scoring columns.

Extends LLMTextColumnConfig to create judge columns that evaluate and score other
generated content based on the defined criteria. Useful for quality assessment, preference
ranking, and multi-dimensional evaluation of generated data. Inherits prompt templating
capabilities from LLMTextColumnConfig.

**Parameters:**

* `scores`: List of `Score` objects defining the evaluation dimensions. Each score
  represents a different aspect to evaluate (e.g., accuracy, relevance, fluency).
  Must contain at least one score.

Inherited Attributes:

* `name` (required): Unique name of the column to be generated.
* `prompt` (required): Prompt template for the judge evaluation (supports Jinja2).
* `model_alias` (required): Alias of the model configuration to use.
* `system_prompt`: Optional system prompt (supports Jinja2).
* `multi_modal_context`: Optional image contexts for multi-modal generation.
* `tool_alias`: Optional tool configuration alias for MCP tool calls.
* `with_trace`: Specifies what trace information to capture in a `{column_name}__trace`
  column. Options are `TraceType.NONE` (default), `TraceType.LAST_MESSAGE`, or
  `TraceType.ALL_MESSAGES`.
* `extract_reasoning_content`: If True, creates a `{column_name}__reasoning_content`
  column containing the reasoning content from the final assistant response.
* `drop`: If True, generate this column but remove it from the final dataset.


**Initialization:**

Create a new model by parsing and validating input data from keyword arguments.

Raises `pydantic_core.ValidationError` if the input data cannot be
validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

```python
scores: list[data_designer.config.column_configs.Score] = Field(...)
```

```python
column_type: typing.Literal["llm-judge"] = "llm-judge"
```

```python
get_column_emoji() -> str
```

```python
class data_designer.config.column_configs.ExpressionColumnConfig(
    /,
    **data: typing.Any
)
```

**Bases**: `data_designer.config.base.SingleColumnConfig`

Configuration for derived columns using Jinja2 expressions.

Expression columns compute values by evaluating Jinja2 templates that reference other
columns. Useful for transformations, concatenations, conditional logic, and derived
features without requiring LLM generation. The expression is evaluated row-by-row.

**Parameters:**

* `expr`: Jinja2 expression to evaluate. Can reference other column values using
  `{{ column_name }}` syntax. Supports filters, conditionals, and arithmetic.
  Must be a valid, non-empty Jinja2 template.
* `dtype`: Data type to cast the result to. Must be one of "int", "float", "str", or "bool".
  Defaults to "str". Type conversion is applied after expression evaluation.

Inherited Attributes:

* `name` (required): Unique name of the column to be generated.
* `drop`: If True, generate this column but remove it from the final dataset.


**Initialization:**

Create a new model by parsing and validating input data from keyword arguments.

Raises `pydantic_core.ValidationError` if the input data cannot be
validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

```python
expr: str = Field(...)
```

```python
dtype: typing.Literal["int", "float", "str", "bool"] = Field(...)
```

```python
column_type: typing.Literal["expression"] = "expression"
```

```python
get_column_emoji() -> str
```

```python
required_columns: list[str]
```

Returns the columns referenced in the expression template.

```python
side_effect_columns: list[str]
```

```python
_DTYPE_COERCERS: dict[str, type]
```
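The `dict[str, type]` annotation suggests a mapping from `dtype` names to Python types that is applied to the rendered expression. The sketch below shows that idea under assumptions: the mapping contents and the helper name `coerce` are hypothetical, and "bool" is omitted because `bool("False")` is `True` in Python, so boolean parsing needs dedicated handling that is not documented here.

```python
# Assumed shape of a name-to-type mapping (illustrative, not the library's table).
DTYPE_COERCERS: dict[str, type] = {"int": int, "float": float, "str": str}


def coerce(rendered: str, dtype: str = "str"):
    """Cast a rendered template result to the requested dtype."""
    return DTYPE_COERCERS[dtype](rendered)


as_int = coerce("42", "int")
as_float = coerce("3.5", "float")
as_str = coerce("42")  # default dtype keeps the rendered string

print(as_int, as_float, repr(as_str))  # 42 3.5 '42'
```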

```python
_assert_expression_valid_jinja() -> typing_extensions.Self
```

Validate that the expression is a valid, non-empty Jinja2 template.

**Returns:**

`typing_extensions.Self`

The validated instance.

**Raises:**

If expression is empty or contains invalid Jinja2 syntax.

```python
_coerce_skip_value_to_dtype() -> typing_extensions.Self
```

Coerce `skip.value` to match `dtype` so skipped and computed rows share a type.

```python
class data_designer.config.column_configs.ValidationColumnConfig(
    /,
    **data: typing.Any
)
```

**Bases**: `data_designer.config.base.SingleColumnConfig`

Configuration for validation columns that validate existing columns.

Validation columns execute validation logic against specified target columns and return
structured results indicating pass/fail status with validation details. Supports multiple
validation strategies: code execution (Python/SQL), local callable functions (library only),
and remote HTTP endpoints.

**Parameters:**

* `target_columns`: List of column names to validate. These columns are passed to the
  validator. All target columns must exist in the dataset before validation runs.
* `validator_type`: The type of validator to use. Options:
  * "code": Execute code (Python or SQL) for validation. The code receives a
    DataFrame with target columns and must return a DataFrame with validation results.
  * "local\_callable": Call a local Python function with the data. Only supported
    when running DataDesigner locally.
  * "remote": Send data to a remote HTTP endpoint for validation.
* `validator_params`: Parameters specific to the validator type. The type varies by validator:
  * `CodeValidatorParams`: Specifies the code language (Python or a SQL dialect such as
    "sql:postgres" or "sql:mysql").
  * `LocalCallableValidatorParams`: Provides the validation function
    (`Callable[[pd.DataFrame], pd.DataFrame]`) and an optional output schema for validation results.
  * `RemoteValidatorParams`: Configures endpoint URL, HTTP timeout, retry behavior
    (`max_retries`, `retry_backoff`), and parallel request limits (`max_parallel_requests`).
* `batch_size`: Number of records to process in each validation batch. Defaults to 10.
  Larger batches are more efficient but use more memory. Adjust based on validator
  complexity and available resources.

Inherited Attributes:

* `name` (required): Unique name of the column to be generated.
* `drop`: If True, generate this column but remove it from the final dataset.


**Initialization:**

Create a new model by parsing and validating input data from keyword arguments.

Raises `pydantic_core.ValidationError` if the input data cannot be
validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

```python
target_columns: list[str] = Field(...)
```

```python
validator_type: data_designer.config.validator_params.ValidatorType = Field(...)
```

```python
validator_params: typing.Annotated[data_designer.config.validator_params.ValidatorParamsT, Discriminator('validator_type')] = Field(...)
```

```python
batch_size: int = Field(...)
```
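To illustrate what processing records in batches of `batch_size` looks like, here is a minimal generic sketch. The helper name `iter_batches` is hypothetical and the code is not taken from the library; it only shows how 25 records would be split with the documented default of 10.

```python
from typing import Iterator


def iter_batches(records: list[dict], batch_size: int = 10) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(records), batch_size):
        yield records[start : start + batch_size]


records = [{"row": i} for i in range(25)]
batch_lengths = [len(batch) for batch in iter_batches(records)]
print(batch_lengths)  # [10, 10, 5]
```

A larger `batch_size` means fewer validator calls at the cost of holding more records in memory per call, which is the trade-off the parameter description refers to.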

```python
column_type: typing.Literal["validation"] = "validation"
```

```python
get_column_emoji() -> str
```

```python
required_columns: list[str]
```

Returns the columns that need to be validated.

```python
side_effect_columns: list[str]
```

```python
inject_validator_type_into_params(data: dict) -> dict
```

Inject validator\_type into validator\_params for discriminated union resolution.

```python
class data_designer.config.column_configs.SeedDatasetColumnConfig(
    /,
    **data: typing.Any
)
```

**Bases**: `data_designer.config.base.SingleColumnConfig`

Configuration for columns sourced from seed datasets.

This config marks columns that come from seed data. It is typically created
automatically when calling `with_seed_dataset()` on the builder, rather than
being instantiated directly by users.

Inherited Attributes:

* `name` (required): Unique name of the column to be generated.
* `drop`: If True, generate this column but remove it from the final dataset.

**Initialization:**

Create a new model by parsing and validating input data from keyword arguments.

Raises `pydantic_core.ValidationError` if the input data cannot be
validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

```python
column_type: typing.Literal["seed-dataset"] = "seed-dataset"
```

```python
get_column_emoji() -> str
```

```python
required_columns: list[str]
```

```python
side_effect_columns: list[str]
```

```python
class data_designer.config.column_configs.EmbeddingColumnConfig(
    /,
    **data: typing.Any
)
```

**Bases**: `data_designer.config.base.SingleColumnConfig`

Configuration for embedding generation columns.

Embedding columns generate embeddings for text input using a specified model.

**Parameters:**

* `target_column`: The column to generate embeddings for. Each value may be a single
  text string or a list of text strings in stringified JSON format. For a stringified
  JSON list, an embedding is generated for each text string.
* `model_alias`: The model to use for embedding generation.

Inherited Attributes:

* `name` (required): Unique name of the column to be generated.
* `drop`: If True, generate this column but remove it from the final dataset.


**Initialization:**

Create a new model by parsing and validating input data from keyword arguments.

Raises `pydantic_core.ValidationError` if the input data cannot be
validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

```python
target_column: str = Field(...)
```
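The two accepted input shapes for the target column (a single string, or a list of strings as stringified JSON) can be told apart roughly as follows. This is a sketch of the distinction only; the helper name `texts_to_embed` is hypothetical and the library's own parsing rules may differ.

```python
import json


def texts_to_embed(cell: str) -> list[str]:
    """Return the strings to embed for one cell of the target column."""
    try:
        parsed = json.loads(cell)
    except json.JSONDecodeError:
        return [cell]  # not JSON: treat the cell as one text
    if isinstance(parsed, list) and all(isinstance(item, str) for item in parsed):
        return parsed  # stringified JSON list: one embedding per entry
    return [cell]


single = texts_to_embed("A single sentence.")
several = texts_to_embed('["first text", "second text"]')

print(single)   # ['A single sentence.']
print(several)  # ['first text', 'second text']
```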

```python
model_alias: str = Field(...)
```

```python
column_type: typing.Literal["embedding"] = "embedding"
```

```python
get_column_emoji() -> str
```

```python
required_columns: list[str]
```

```python
side_effect_columns: list[str]
```

```python
class data_designer.config.column_configs.ImageColumnConfig(
    /,
    **data: typing.Any
)
```

**Bases**: `data_designer.config.base.SingleColumnConfig`

Configuration for image generation columns.

Image columns generate images using either autoregressive or diffusion models.
The API used is automatically determined based on the model name.

**Parameters:**

- `prompt`: Prompt template for image generation. Supports Jinja2 templating to reference other columns (e.g., `"Generate an image of a {{ character_name }}"`). Must be a valid Jinja2 template.
- `model_alias`: The model to use for image generation.
- `multi_modal_context`: Optional list of image contexts for multi-modal generation. Enables autoregressive multi-modal models to generate images based on image inputs. Only works with autoregressive models that support image-to-image generation.

**Inherited Attributes:**

- `name` (required): Unique name of the column to be generated.
- `drop`: If True, generate this column but remove it from the final dataset.

**Attributes:**

- `prompt`, `model_alias`, `multi_modal_context`: Same meaning as the parameters of the same name.

**Initialization:**

Create a new model by parsing and validating input data from keyword arguments.

Raises `pydantic_core.ValidationError` if the input data cannot be
validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

```python
prompt: str = Field(...)
```

```python
model_alias: str = Field(...)
```

```python
multi_modal_context: list[data_designer.config.models.ImageContext] | None = Field(...)
```

```python
column_type: typing.Literal["image"] = "image"
```

```python
get_column_emoji() -> str
```

```python
required_columns: list[str]
```

Get columns referenced in the prompt template and multi-modal context.

**Returns:**

`list[str]`

List of unique column names referenced in Jinja2 templates
and multi-modal context configurations.
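As a rough illustration of what "columns referenced in the prompt template" means, the sketch below pulls variable names out of `{{ ... }}` expressions with a regular expression. This is a simplified approximation and an assumption on our part: the library presumably relies on Jinja2's own parser, and the helper `referenced_columns` is hypothetical. It also ignores multi-modal context entirely.

```python
import re

# Matches the first identifier inside a {{ ... }} expression, e.g. {{ character_name }}.
_VARIABLE = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)")


def referenced_columns(prompt: str) -> list[str]:
    """Return unique column names referenced in the prompt, in order of first use."""
    seen: dict[str, None] = {}
    for name in _VARIABLE.findall(prompt):
        # A dict preserves insertion order, so duplicates collapse without reordering.
        seen.setdefault(name, None)
    return list(seen)


prompt = (
    "Generate an image of a {{ character_name }} in {{ setting }}, "
    "with {{ character_name }} smiling."
)
print(referenced_columns(prompt))  # ['character_name', 'setting']
```

A regex like this misses names used only inside `{% ... %}` blocks, which is one reason a real implementation would parse the template instead.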

```python
assert_prompt_valid_jinja() -> typing_extensions.Self
```

Validate that prompt is a valid Jinja2 template.

**Returns:**

`typing_extensions.Self`

The validated instance.

**Raises:**

If prompt contains invalid Jinja2 syntax.

```python
side_effect_columns: list[str]
```

```python
class data_designer.config.column_configs.CustomColumnConfig(
    /,
    **data: typing.Any
)
```

**Bases**: `data_designer.config.base.SingleColumnConfig`

Configuration for custom user-defined column generators.

Custom columns allow users to provide their own generation logic via a callable function
decorated with `@custom_column_generator`. Two strategies are supported: `cell_by_cell`
(default, row-based) and `full_column` (batch-based with DataFrame access).
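The difference between the two strategies can be pictured with a conceptual sketch in plain Python. It does not use `data_designer` or the real decorator, and the function signatures are assumptions chosen for illustration: a list of dicts stands in for the DataFrame, and `shout_cell` and `shout_column` are hypothetical names.

```python
from typing import Any

Row = dict[str, Any]


def shout_cell(row: Row) -> str:
    """Row-based: sees one row and returns the value for that row's new cell."""
    return row["greeting"].upper()


def shout_column(rows: list[Row]) -> list[str]:
    """Batch-based: sees every row at once and returns the whole new column."""
    # Access to the full batch allows values that depend on other rows.
    longest = max(len(row["greeting"]) for row in rows)
    return [row["greeting"].upper().ljust(longest, "!") for row in rows]


rows = [{"greeting": "hi"}, {"greeting": "hello"}]

cell_by_cell = [shout_cell(row) for row in rows]  # one call per row
full_column = shout_column(rows)                  # one call per batch
print(cell_by_cell)  # ['HI', 'HELLO']
print(full_column)   # ['HI!!!', 'HELLO']
```

The row-based form is simpler when each value depends only on its own row; the batch form is the natural fit when a value depends on the rest of the batch, as the padding here does.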

**Parameters:**

- `generator_function`: A callable decorated with `@custom_column_generator`.
- `generation_strategy`: `"cell_by_cell"` (row-based) or `"full_column"` (batch-based).
- `generator_params`: Optional typed configuration object (Pydantic `BaseModel`) passed as the second argument to the generator function.

**Inherited Attributes:**

- `name` (required): Unique name of the column to be generated.
- `drop`: If True, generate this column but remove it from the final dataset.

**Attributes:**

- `generator_function`, `generation_strategy`, `generator_params`: Same meaning as the parameters of the same name.

**Initialization:**

Create a new model by parsing and validating input data from keyword arguments.

Raises `pydantic_core.ValidationError` if the input data cannot be
validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

```python
generator_function: typing.Any = Field(...)
```

```python
generation_strategy: data_designer.config.column_configs.GenerationStrategy = Field(...)
```

```python
generator_params: pydantic.BaseModel | None = Field(...)
```

```python
column_type: typing.Literal["custom"] = "custom"
```

```python
_validate_generator_function(v: typing.Any) -> typing.Any
```

```python
get_column_emoji() -> str
```

```python
required_columns: list[str]
```

Returns the columns required for custom generation (from decorator metadata).

```python
side_effect_columns: list[str]
```

Returns additional columns created by this generator (from decorator metadata).

```python
model_aliases: list[str]
```

Returns model aliases for LLM access and health checks (from decorator metadata).

```python
get_model_aliases() -> list[str]
```

Returns the decorator-declared aliases so the startup health check pings every endpoint.
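The properties above all read "decorator metadata". The sketch below shows the general pattern of a decorator that attaches metadata to a function and a wrapper object that exposes it. Everything here is hypothetical: `sketch_generator`, `SketchColumn`, and the `sketch_metadata` attribute are stand-ins invented for illustration, not the actual `@custom_column_generator` API or its storage format.

```python
from typing import Any, Callable, Sequence


def sketch_generator(
    required_columns: Sequence[str] = (),
    side_effect_columns: Sequence[str] = (),
    model_aliases: Sequence[str] = (),
) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Hypothetical decorator that records metadata on the decorated function."""

    def decorate(func: Callable[..., Any]) -> Callable[..., Any]:
        # Store the declarations on the function object itself.
        func.sketch_metadata = {
            "required_columns": list(required_columns),
            "side_effect_columns": list(side_effect_columns),
            "model_aliases": list(model_aliases),
        }
        return func

    return decorate


class SketchColumn:
    """Hypothetical stand-in for a config object wrapping a decorated function."""

    def __init__(self, generator_function: Callable[..., Any]) -> None:
        self.generator_function = generator_function

    @property
    def required_columns(self) -> list[str]:
        return self.generator_function.sketch_metadata["required_columns"]

    @property
    def side_effect_columns(self) -> list[str]:
        return self.generator_function.sketch_metadata["side_effect_columns"]

    @property
    def model_aliases(self) -> list[str]:
        return self.generator_function.sketch_metadata["model_aliases"]


@sketch_generator(
    required_columns=["first_name"],
    side_effect_columns=["name_length"],
    model_aliases=["my-model"],
)
def greet(row: dict[str, Any]) -> dict[str, Any]:
    name = row["first_name"]
    return {"greeting": f"Hello, {name}!", "name_length": len(name)}


column = SketchColumn(greet)
print(column.required_columns)     # ['first_name']
print(column.side_effect_columns)  # ['name_length']
print(column.model_aliases)        # ['my-model']
```

Keeping the declarations next to the function means the config object does not need them repeated, and a caller can list the required inputs and model endpoints before running any generation.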

```python
serialize_generator_function(v: typing.Any) -> str
```

```python
serialize_generator_params(v: pydantic.BaseModel | None) -> dict[str, typing.Any] | None
```

```python
validate_generator_function() -> typing_extensions.Self
```