data_designer.config.analysis.column_statistics

Module Contents

Classes

Name	Description
`MissingValue`	Sentinel values used in place of a statistic that is unavailable.
`ColumnDistributionType`	Classification of a column's value distribution (categorical, numerical, text, other, or unknown).
`BaseColumnStatistics`	Abstract base class for all column statistics types.
`GeneralColumnStatistics`	Container for general statistics applicable to all column types.
`LLMTextColumnStatistics`	Container for statistics on LLM-generated text columns.
`LLMCodeColumnStatistics`	Container for statistics on LLM-generated code columns.
`LLMStructuredColumnStatistics`	Container for statistics on LLM-generated structured JSON columns.
`LLMJudgedColumnStatistics`	Container for statistics on LLM-as-a-judge quality assessment columns.
`SamplerColumnStatistics`	Container for statistics on sampler-generated columns.
`SeedDatasetColumnStatistics`	Container for statistics on columns sourced from seed datasets.
`ExpressionColumnStatistics`	Container for statistics on expression-based derived columns.
`ValidationColumnStatistics`	Container for statistics on validation result columns.
`CategoricalHistogramData`	Container for categorical distribution histogram data.
`CategoricalDistribution`	Container for computed categorical distribution statistics.
`NumericalDistribution`	Container for computed numerical distribution statistics.

Data

ColumnStatisticsT

DEFAULT_COLUMN_STATISTICS_MAP

API

class data_designer.config.analysis.column_statistics.MissingValue

Bases: str, enum.Enum

Sentinel values used in place of a statistic that is unavailable.

CALCULATION_FAILED = '--'

OUTPUT_FORMAT_ERROR = 'output_format_error'
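`MissingValue` subclasses both `str` and `enum.Enum`, so each member compares equal to its plain string value. This lets a sentinel sit in a numeric field such as `num_null: int | MissingValue` and still be handled as an ordinary string. The sketch below re-declares a local stand-in with the two documented members instead of importing `data_designer`. `is_missing` is a hypothetical helper in the spirit of the private `_is_missing_value` method. That method's actual implementation is not shown in this reference.

```python
from enum import Enum
from typing import Union


class MissingValue(str, Enum):
    """Local stand-in mirroring the documented members; not imported from data_designer."""

    CALCULATION_FAILED = "--"
    OUTPUT_FORMAT_ERROR = "output_format_error"


def is_missing(v: Union[float, int, MissingValue]) -> bool:
    # Hypothetical helper: a statistic is "missing" when it holds a sentinel member.
    return isinstance(v, MissingValue)


num_null = MissingValue.CALCULATION_FAILED
print(num_null == "--")      # True: str-based enum members equal their values
print(is_missing(num_null))  # True
print(is_missing(0))         # False
```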

class data_designer.config.analysis.column_statistics.ColumnDistributionType

Bases: str, enum.Enum

Classification of a column's value distribution.

CATEGORICAL = 'categorical'

NUMERICAL = 'numerical'

TEXT = 'text'

OTHER = 'other'

UNKNOWN = 'unknown'

class data_designer.config.analysis.column_statistics.BaseColumnStatistics(
    /,
    **data: typing.Any
)

Bases: pydantic.BaseModel, abc.ABC

Abstract base class for all column statistics types.

Serves as a container for computed statistics across different column types in Data-Designer-generated datasets. Subclasses hold column-specific statistical results and provide methods for formatting these results for display in reports.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises `pydantic_core.ValidationError` if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

model_config = ConfigDict(...)

create_report_row_data() -> dict[str, str]

Creates a formatted dictionary of statistics for display in reports.

Returns:

dict[str, str]

Dictionary mapping display labels to formatted statistic values.
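Subclasses implement `create_report_row_data()` to turn their fields into display strings. Below is a minimal sketch of that contract. It uses `abc` and a standard-library dataclass instead of Pydantic so that it runs without dependencies. `BaseStats`, `NullCountStats`, and the display labels are hypothetical and do not reproduce the library's actual report rows.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class BaseStats(ABC):
    """Stand-in for BaseColumnStatistics: subclasses must provide report rows."""

    @abstractmethod
    def create_report_row_data(self) -> dict[str, str]:
        """Return display labels mapped to formatted statistic values."""


@dataclass
class NullCountStats(BaseStats):
    # Hypothetical subclass with two fields; the real classes are Pydantic models.
    column_name: str
    num_null: int

    def create_report_row_data(self) -> dict[str, str]:
        # Every value is formatted as a string, matching the dict[str, str] contract.
        return {"column name": self.column_name, "null count": f"{self.num_null:,}"}


row = NullCountStats(column_name="age", num_null=1200).create_report_row_data()
print(row)  # {'column name': 'age', 'null count': '1,200'}
```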

class data_designer.config.analysis.column_statistics.GeneralColumnStatistics(
    /,
    **data: typing.Any
)

Bases: data_designer.config.analysis.column_statistics.BaseColumnStatistics

Container for general statistics applicable to all column types.

Holds core statistical measures that apply universally across all column types, including null counts, unique values, and data type information. Serves as the base for more specialized column statistics classes that store additional column-specific metrics.

Parameters:

column_name

Name of the column being analyzed.

num_records

Total number of records in the column.

num_null

Number of null/missing values in the column.

num_unique

Number of distinct values in the column. If a value is not hashable, it is converted to a string.

pyarrow_dtype

PyArrow data type of the column as a string.

simple_dtype

Simplified human-readable data type label.

column_type

Discriminator field, always “general” for this statistics type.

Attributes:

column_name

Name of the column being analyzed.

num_records

Total number of records in the column.

num_null

Number of null/missing values in the column.

num_unique

Number of distinct values in the column. If a value is not hashable, it is converted to a string.

pyarrow_dtype

PyArrow data type of the column as a string.

simple_dtype

Simplified human-readable data type label.

column_type

Discriminator field, always “general” for this statistics type.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises `pydantic_core.ValidationError` if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

column_name: str

num_records: int | data_designer.config.analysis.column_statistics.MissingValue

num_null: int | data_designer.config.analysis.column_statistics.MissingValue

num_unique: int | data_designer.config.analysis.column_statistics.MissingValue

pyarrow_dtype: str

simple_dtype: str

column_type: typing.Literal['general'] = 'general'

general_statistics_ensure_python_integers(v: int | data_designer.config.analysis.column_statistics.MissingValue) -> int | data_designer.config.analysis.column_statistics.MissingValue

percent_null: float | data_designer.config.analysis.column_statistics.MissingValue

percent_unique: float | data_designer.config.analysis.column_statistics.MissingValue

_general_display_row: dict[str, str]

create_report_row_data() -> dict[str, str]

_is_missing_value(v: float | int | data_designer.config.analysis.column_statistics.MissingValue) -> bool
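The `num_unique` description states that unhashable values are converted to strings before counting. The sketch below applies that rule to a plain Python list. It makes three assumptions that the reference does not confirm:

- `None` marks a null.
- Nulls are excluded from the distinct count.
- `percent_null` is the null share of `num_records`, expressed as a percentage.

`general_stats` and `_as_hashable` are hypothetical helpers.

```python
from typing import Any


def _as_hashable(v: Any) -> Any:
    # Documented rule: a value that cannot be hashed is converted to a string.
    try:
        hash(v)
        return v
    except TypeError:
        return str(v)


def general_stats(values: list[Any]) -> dict[str, Any]:
    """Hypothetical helper computing a few GeneralColumnStatistics-style fields."""
    num_records = len(values)
    non_null = [v for v in values if v is not None]  # assumption: None marks a null
    num_null = num_records - len(non_null)
    return {
        "num_records": num_records,
        "num_null": num_null,
        "num_unique": len({_as_hashable(v) for v in non_null}),  # assumption: nulls excluded
        # Assumed definition; the library may store a MissingValue when num_records is 0.
        "percent_null": 100 * num_null / num_records if num_records else 0.0,
    }


print(general_stats([[1, 2], [1, 2], {"a": 1}, None, "x"]))
# {'num_records': 5, 'num_null': 1, 'num_unique': 3, 'percent_null': 20.0}
```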

class data_designer.config.analysis.column_statistics.LLMTextColumnStatistics(
    /,
    **data: typing.Any
)

Bases: data_designer.config.analysis.column_statistics.GeneralColumnStatistics

Container for statistics on LLM-generated text columns.

Inherits general statistics plus token usage metrics specific to LLM text generation. Stores both prompt and completion token consumption data.

Parameters:

output_tokens_mean

Mean number of output tokens generated per record.

output_tokens_median

Median number of output tokens generated per record.

output_tokens_stddev

Standard deviation of output tokens per record.

input_tokens_mean

Mean number of input tokens used per record.

input_tokens_median

Median number of input tokens used per record.

input_tokens_stddev

Standard deviation of input tokens per record.

column_type

Discriminator field, always “llm-text” for this statistics type.

Attributes:

output_tokens_mean

Mean number of output tokens generated per record.

output_tokens_median

Median number of output tokens generated per record.

output_tokens_stddev

Standard deviation of output tokens per record.

input_tokens_mean

Mean number of input tokens used per record.

input_tokens_median

Median number of input tokens used per record.

input_tokens_stddev

Standard deviation of input tokens per record.

column_type

Discriminator field, always “llm-text” for this statistics type.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises `pydantic_core.ValidationError` if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

output_tokens_mean: float | data_designer.config.analysis.column_statistics.MissingValue

output_tokens_median: float | data_designer.config.analysis.column_statistics.MissingValue

output_tokens_stddev: float | data_designer.config.analysis.column_statistics.MissingValue

input_tokens_mean: float | data_designer.config.analysis.column_statistics.MissingValue

input_tokens_median: float | data_designer.config.analysis.column_statistics.MissingValue

input_tokens_stddev: float | data_designer.config.analysis.column_statistics.MissingValue

column_type: typing.Literal[data_designer.config.column_types.DataDesignerColumnType.LLM_TEXT.value]

llm_column_ensure_python_floats(v: float | int | data_designer.config.analysis.column_statistics.MissingValue) -> float | int | data_designer.config.analysis.column_statistics.MissingValue

create_report_row_data() -> dict[str, typing.Any]
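The token fields are typed `float | MissingValue`, which suggests that a sentinel is stored when a statistic cannot be computed. The sketch below applies that idea to one token direction. It assumes per-record token counts are available as a list. It uses the sample standard deviation (`statistics.stdev`, which matches pandas' default `ddof=1`), although the library's choice is not documented. `token_summary` and the local `MissingValue` stand-in are hypothetical.

```python
import statistics
from enum import Enum
from typing import Callable, Union


class MissingValue(str, Enum):
    # Local stand-in for the documented enum; only the member used here.
    CALCULATION_FAILED = "--"


def token_summary(prefix: str, counts: list[int]) -> dict[str, Union[float, MissingValue]]:
    """Hypothetical helper producing <prefix>_tokens_{mean,median,stddev} fields."""

    def safe(fn: Callable[[list[int]], float]) -> Union[float, MissingValue]:
        try:
            return float(fn(counts))
        except statistics.StatisticsError:
            # Too little data (e.g. stdev of a single value): store the sentinel instead.
            return MissingValue.CALCULATION_FAILED

    return {
        f"{prefix}_tokens_mean": safe(statistics.mean),
        f"{prefix}_tokens_median": safe(statistics.median),
        f"{prefix}_tokens_stddev": safe(statistics.stdev),  # assumed: sample std dev
    }


print(token_summary("output", [120, 80, 100]))
# {'output_tokens_mean': 100.0, 'output_tokens_median': 100.0, 'output_tokens_stddev': 20.0}
print(token_summary("input", [512])["input_tokens_stddev"].value)  # --
```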

class data_designer.config.analysis.column_statistics.LLMCodeColumnStatistics(
    /,
    **data: typing.Any
)

Bases: data_designer.config.analysis.column_statistics.LLMTextColumnStatistics

Container for statistics on LLM-generated code columns.

Inherits all token usage metrics from LLMTextColumnStatistics. Stores statistics from columns that generate code snippets in specific programming languages.

Parameters:

column_type

Discriminator field, always “llm-code” for this statistics type.

Attributes:

column_type

Discriminator field, always “llm-code” for this statistics type.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises `pydantic_core.ValidationError` if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

column_type: typing.Literal[data_designer.config.column_types.DataDesignerColumnType.LLM_CODE.value]

class data_designer.config.analysis.column_statistics.LLMStructuredColumnStatistics(
    /,
    **data: typing.Any
)

Bases: data_designer.config.analysis.column_statistics.LLMTextColumnStatistics

Container for statistics on LLM-generated structured JSON columns.

Inherits all token usage metrics from LLMTextColumnStatistics. Stores statistics from columns that generate structured data conforming to JSON schemas or Pydantic models.

Parameters:

column_type

Discriminator field, always “llm-structured” for this statistics type.

Attributes:

column_type

Discriminator field, always “llm-structured” for this statistics type.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises `pydantic_core.ValidationError` if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

column_type: typing.Literal[data_designer.config.column_types.DataDesignerColumnType.LLM_STRUCTURED.value]

class data_designer.config.analysis.column_statistics.LLMJudgedColumnStatistics(
    /,
    **data: typing.Any
)

Bases: data_designer.config.analysis.column_statistics.LLMTextColumnStatistics

Container for statistics on LLM-as-a-judge quality assessment columns.

Inherits all token usage metrics from LLMTextColumnStatistics. Stores statistics from columns that evaluate and score other generated content based on defined criteria.

Parameters:

column_type

Discriminator field, always “llm-judge” for this statistics type.

Attributes:

column_type

Discriminator field, always “llm-judge” for this statistics type.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises `pydantic_core.ValidationError` if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

column_type: typing.Literal[data_designer.config.column_types.DataDesignerColumnType.LLM_JUDGE.value]
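`LLMCodeColumnStatistics`, `LLMStructuredColumnStatistics`, and `LLMJudgedColumnStatistics` add no fields of their own. Only the `column_type` literal distinguishes them from `LLMTextColumnStatistics`, and that discriminator is what lets serialized statistics be routed back to the right class. The sketch below shows the routing idea with plain dataclasses and a dictionary keyed by `column_type`. The class names, `_BY_TYPE`, and `parse_stats` are hypothetical, and the real models are Pydantic classes.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class LLMTextStats:
    column_name: str
    column_type: str = "llm-text"


@dataclass
class LLMCodeStats(LLMTextStats):
    column_type: str = "llm-code"


@dataclass
class LLMStructuredStats(LLMTextStats):
    column_type: str = "llm-structured"


@dataclass
class LLMJudgeStats(LLMTextStats):
    column_type: str = "llm-judge"


# Hypothetical registry: the dataclass default doubles as a class attribute.
_BY_TYPE = {
    cls.column_type: cls
    for cls in (LLMTextStats, LLMCodeStats, LLMStructuredStats, LLMJudgeStats)
}


def parse_stats(data: dict[str, Any]) -> LLMTextStats:
    """Pick the class whose column_type matches, then build it (KeyError if unknown)."""
    return _BY_TYPE[data["column_type"]](**data)


stats = parse_stats({"column_name": "review_score", "column_type": "llm-judge"})
print(type(stats).__name__)  # LLMJudgeStats
```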

class data_designer.config.analysis.column_statistics.SamplerColumnStatistics(
    /,
    **data: typing.Any
)

Bases: data_designer.config.analysis.column_statistics.GeneralColumnStatistics

Container for statistics on sampler-generated columns.

Inherits general statistics plus sampler-specific information including the sampler type used and the empirical distribution of generated values. Stores both categorical and numerical distribution results.

Parameters:

sampler_type

Type of sampler used to generate this column (e.g., “uniform”, “category”, “gaussian”, “person”).

distribution_type

Classification of the column’s distribution (categorical, numerical, text, other, or unknown).

distribution

Empirical distribution statistics for the generated values. Can be CategoricalDistribution (for discrete values), NumericalDistribution (for continuous values), or MissingValue if the distribution could not be computed.

column_type

Discriminator field, always “sampler” for this statistics type.

Attributes:

sampler_type

Type of sampler used to generate this column (e.g., “uniform”, “category”, “gaussian”, “person”).

distribution_type

Classification of the column’s distribution (categorical, numerical, text, other, or unknown).

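distribution

Empirical distribution statistics for the generated values. Can be CategoricalDistribution (for discrete values), NumericalDistribution (for continuous values), or MissingValue if the distribution could not be computed.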

column_type

Discriminator field, always “sampler” for this statistics type.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises `pydantic_core.ValidationError` if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

sampler_type: data_designer.config.sampler_params.SamplerType

distribution_type: data_designer.config.analysis.column_statistics.ColumnDistributionType

distribution: CategoricalDistribution | NumericalDistribution | data_designer.config.analysis.column_statistics.MissingValue | None

column_type: typing.Literal[data_designer.config.column_types.DataDesignerColumnType.SAMPLER.value]

create_report_row_data() -> dict[str, str]
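`distribution` may hold a `CategoricalDistribution`, a `NumericalDistribution`, a `MissingValue` sentinel, or `None`, so code that reads it has to branch on the runtime type. The sketch below uses simplified local stand-ins that carry only a few of the documented fields, plus a hypothetical `describe_distribution` function. The strings it returns are illustrative and are not what `create_report_row_data()` produces. It also assumes that `None` means no distribution was recorded.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class MissingValue(str, Enum):
    # Local stand-in for the documented enum.
    CALCULATION_FAILED = "--"


@dataclass
class CategoricalDistribution:
    # Simplified stand-in: the documented histogram field is omitted.
    most_common_value: Union[str, int]
    least_common_value: Union[str, int]


@dataclass
class NumericalDistribution:
    # Simplified stand-in: stddev and median are omitted.
    min: float
    max: float
    mean: float


Distribution = Union[CategoricalDistribution, NumericalDistribution, MissingValue, None]


def describe_distribution(d: Distribution) -> str:
    """Hypothetical one-line summary for each variant of the union."""
    if isinstance(d, CategoricalDistribution):
        return f"most common: {d.most_common_value}"
    if isinstance(d, NumericalDistribution):
        return f"range: {d.min}..{d.max}, mean: {d.mean}"
    if isinstance(d, MissingValue):
        return d.value  # the sentinel's own string, e.g. "--"
    return "n/a"  # None: assumed to mean no distribution was recorded


print(describe_distribution(NumericalDistribution(0.0, 10.0, 4.5)))  # range: 0.0..10.0, mean: 4.5
```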

class data_designer.config.analysis.column_statistics.SeedDatasetColumnStatistics(
    /,
    **data: typing.Any
)

Bases: data_designer.config.analysis.column_statistics.GeneralColumnStatistics

Container for statistics on columns sourced from seed datasets.

Inherits general statistics and stores statistics computed from columns that originate from existing data provided via the seed dataset functionality.

Parameters:

column_type

Discriminator field, always “seed-dataset” for this statistics type.

Attributes:

column_type

Discriminator field, always “seed-dataset” for this statistics type.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises `pydantic_core.ValidationError` if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

column_type: typing.Literal[data_designer.config.column_types.DataDesignerColumnType.SEED_DATASET.value]

class data_designer.config.analysis.column_statistics.ExpressionColumnStatistics(
    /,
    **data: typing.Any
)

Bases: data_designer.config.analysis.column_statistics.GeneralColumnStatistics

Container for statistics on expression-based derived columns.

Inherits general statistics and stores statistics computed from columns that are derived from Jinja2 expressions referencing other column values.

Parameters:

column_type

Discriminator field, always “expression” for this statistics type.

Attributes:

column_type

Discriminator field, always “expression” for this statistics type.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises `pydantic_core.ValidationError` if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

column_type: typing.Literal[data_designer.config.column_types.DataDesignerColumnType.EXPRESSION.value]

class data_designer.config.analysis.column_statistics.ValidationColumnStatistics(
    /,
    **data: typing.Any
)

Bases: data_designer.config.analysis.column_statistics.GeneralColumnStatistics

Container for statistics on validation result columns.

Inherits general statistics plus validation-specific metrics including the count and percentage of records that passed validation. Stores results from validation logic (Python, SQL, local callable, or remote) executed against target columns.

Parameters:

num_valid_records

Number of records that passed validation.

column_type

Discriminator field, always “validation” for this statistics type.

Attributes:

num_valid_records

Number of records that passed validation.

column_type

Discriminator field, always “validation” for this statistics type.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises `pydantic_core.ValidationError` if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

num_valid_records: int | data_designer.config.analysis.column_statistics.MissingValue

column_type: typing.Literal[data_designer.config.column_types.DataDesignerColumnType.VALIDATION.value]

code_validation_column_ensure_python_integers(v: int | data_designer.config.analysis.column_statistics.MissingValue) -> int | data_designer.config.analysis.column_statistics.MissingValue

percent_valid: float | data_designer.config.analysis.column_statistics.MissingValue

create_report_row_data() -> dict[str, str]
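`percent_valid` is derived from `num_valid_records` and the inherited `num_records`, but its formula is not given in this reference. The sketch below assumes it is the valid share of all records, expressed as a percentage. Any `MissingValue` input is passed through unchanged. `percent_valid` as a free function and the local `MissingValue` stand-in are hypothetical.

```python
from enum import Enum
from typing import Union


class MissingValue(str, Enum):
    # Local stand-in for the documented enum.
    CALCULATION_FAILED = "--"
    OUTPUT_FORMAT_ERROR = "output_format_error"


def percent_valid(
    num_valid_records: Union[int, MissingValue], num_records: Union[int, MissingValue]
) -> Union[float, MissingValue]:
    """Hypothetical: percentage of records that passed validation."""
    for v in (num_valid_records, num_records):
        if isinstance(v, MissingValue):
            return v  # propagate the first sentinel unchanged
    if num_records == 0:
        return MissingValue.CALCULATION_FAILED  # assumption: no meaningful percentage
    return 100 * num_valid_records / num_records


print(percent_valid(45, 60))  # 75.0
```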

class data_designer.config.analysis.column_statistics.CategoricalHistogramData(
    /,
    **data: typing.Any
)

Bases: pydantic.BaseModel

Container for categorical distribution histogram data.

Stores the computed frequency distribution of categorical values.

Parameters:

categories

List of unique category values that appear in the data.

counts

List of occurrence counts for each category.

Attributes:

categories

List of unique category values that appear in the data.

counts

List of occurrence counts for each category.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises `pydantic_core.ValidationError` if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

categories: list[float | int | str]

counts: list[int]

ensure_python_types() -> typing_extensions.Self

Ensure numerical values are Python objects rather than NumPy types.

from_series(series: pandas.Series) -> typing_extensions.Self
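`from_series` builds the parallel `categories` and `counts` lists from a pandas Series. `ensure_python_types` exists because values read from pandas are often NumPy scalars such as `numpy.int64`. The dependency-free sketch below builds the same two lists from a plain list with `collections.Counter`, which already yields Python ints. `HistogramData.from_values` is hypothetical. Ordering by descending frequency is an assumption borrowed from the default behavior of pandas' `value_counts()`.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Union

Category = Union[float, int, str]


@dataclass
class HistogramData:
    """Stand-in for CategoricalHistogramData: parallel lists of categories and counts."""

    categories: list[Category]
    counts: list[int]

    @classmethod
    def from_values(cls, values: list[Category]) -> "HistogramData":
        # Hypothetical analogue of from_series(); most frequent categories come first,
        # with ties kept in first-seen order (Counter.most_common behavior).
        pairs = Counter(values).most_common()
        return cls(categories=[c for c, _ in pairs], counts=[n for _, n in pairs])


hist = HistogramData.from_values(["cat", "dog", "cat", "bird", "cat", "dog"])
print(hist.categories)  # ['cat', 'dog', 'bird']
print(hist.counts)      # [3, 2, 1]
```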

class data_designer.config.analysis.column_statistics.CategoricalDistribution(
    /,
    **data: typing.Any
)

Bases: pydantic.BaseModel

Container for computed categorical distribution statistics.

Parameters:

most_common_value

The category value that appears most frequently in the data.

least_common_value

The category value that appears least frequently in the data.

histogram

Complete frequency distribution showing all categories and their counts.

Attributes:

most_common_value

The category value that appears most frequently in the data.

least_common_value

The category value that appears least frequently in the data.

histogram

Complete frequency distribution showing all categories and their counts.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises `pydantic_core.ValidationError` if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

most_common_value: str | int

least_common_value: str | int

histogram: data_designer.config.analysis.column_statistics.CategoricalHistogramData

ensure_python_types(v: str | int) -> str | int

from_series(series: pandas.Series) -> typing_extensions.Self
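Given the histogram's parallel lists, `most_common_value` and `least_common_value` are the categories with the highest and lowest counts. The sketch below shows that selection. `summarize` and `CategoricalSummary` are hypothetical. Breaking ties by histogram order is an assumption that the reference does not confirm.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class CategoricalSummary:
    """Stand-in for CategoricalDistribution without the nested histogram object."""

    most_common_value: Union[str, int]
    least_common_value: Union[str, int]


def summarize(categories: list[Union[str, int]], counts: list[int]) -> CategoricalSummary:
    """Hypothetical: pick the categories with the highest and lowest counts."""
    if not categories or len(categories) != len(counts):
        raise ValueError("categories and counts must be non-empty and of equal length")
    # max()/min() return the first index on ties, so histogram order breaks ties.
    i_max = max(range(len(counts)), key=counts.__getitem__)
    i_min = min(range(len(counts)), key=counts.__getitem__)
    return CategoricalSummary(
        most_common_value=categories[i_max], least_common_value=categories[i_min]
    )


print(summarize(["cat", "dog", "bird"], [3, 2, 1]))
# CategoricalSummary(most_common_value='cat', least_common_value='bird')
```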

class data_designer.config.analysis.column_statistics.NumericalDistribution(
    /,
    **data: typing.Any
)

Bases: pydantic.BaseModel

Container for computed numerical distribution statistics.

Parameters:

min

Minimum value in the distribution.

max

Maximum value in the distribution.

mean

Arithmetic mean (average) of all values.

stddev

Standard deviation measuring the spread of values around the mean.

median

Median value of the distribution.

Attributes:

min

Minimum value in the distribution.

max

Maximum value in the distribution.

mean

Arithmetic mean (average) of all values.

stddev

Standard deviation measuring the spread of values around the mean.

median

Median value of the distribution.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises `pydantic_core.ValidationError` if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

min: float | int

max: float | int

mean: float

stddev: float

median: float

ensure_python_types(v: float | int) -> float | int

from_series(series: pandas.Series) -> typing_extensions.Self
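The five `NumericalDistribution` fields are standard summary statistics. The sketch below computes them from a plain list with the `statistics` module, as a stand-in for `from_series`. `NumericalSummary.from_values` is hypothetical. It assumes the sample standard deviation (pandas' default), which requires at least two values.

```python
import statistics
from dataclasses import dataclass
from typing import Union

Number = Union[int, float]


@dataclass
class NumericalSummary:
    """Stand-in for NumericalDistribution with the same five documented fields."""

    min: Number
    max: Number
    mean: float
    stddev: float
    median: float

    @classmethod
    def from_values(cls, values: list[Number]) -> "NumericalSummary":
        # Hypothetical analogue of from_series(); stdev raises StatisticsError for < 2 values.
        return cls(
            min=min(values),
            max=max(values),
            mean=float(statistics.mean(values)),
            stddev=float(statistics.stdev(values)),  # assumption: sample standard deviation
            median=float(statistics.median(values)),
        )


print(NumericalSummary.from_values([4, 2, 6]))
# NumericalSummary(min=2, max=6, mean=4.0, stddev=2.0, median=4.0)
```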

ColumnStatisticsT

typing_extensions.TypeAlias

DEFAULT_COLUMN_STATISTICS_MAP

Module Contents

Classes

Name	Description
`MissingValue`	str(object=”) -> str str(bytes_or_buffer[, encoding[, errors]]) -> str
`ColumnDistributionType`	str(object=”) -> str str(bytes_or_buffer[, encoding[, errors]]) -> str
`BaseColumnStatistics`	Abstract base class for all column statistics types.
`GeneralColumnStatistics`	Container for general statistics applicable to all column types.
`LLMTextColumnStatistics`	Container for statistics on LLM-generated text columns.
`LLMCodeColumnStatistics`	Container for statistics on LLM-generated code columns.
`LLMStructuredColumnStatistics`	Container for statistics on LLM-generated structured JSON columns.
`LLMJudgedColumnStatistics`	Container for statistics on LLM-as-a-judge quality assessment columns.
`SamplerColumnStatistics`	Container for statistics on sampler-generated columns.
`SeedDatasetColumnStatistics`	Container for statistics on columns sourced from seed datasets.
`ExpressionColumnStatistics`	Container for statistics on expression-based derived columns.
`ValidationColumnStatistics`	Container for statistics on validation result columns.
`CategoricalHistogramData`	Container for categorical distribution histogram data.
`CategoricalDistribution`	Container for computed categorical distribution statistics.
`NumericalDistribution`	Container for computed numerical distribution statistics.

Data

ColumnStatisticsT DEFAULT_COLUMN_STATISTICS_MAP

API

1 class data_designer.config.analysis.column_statistics.MissingValue

Bases: str, enum.Enum

1 CALCULATION_FAILED = --

1 OUTPUT_FORMAT_ERROR = output_format_error

1 class data_designer.config.analysis.column_statistics.ColumnDistributionType

Bases: str, enum.Enum

1 CATEGORICAL = categorical

1 NUMERICAL = numerical

1 TEXT = text

1 OTHER = other

1 UNKNOWN = unknown

1 class data_designer.config.analysis.column_statistics.BaseColumnStatistics(
2     /,
3     **data: typing.Any
4 )

Bases: pydantic.BaseModel, abc.ABC

Abstract base class for all column statistics types.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

1 model_config = ConfigDict(...)

1 create_report_row_data() -> dict[str, str]

Creates a formatted dictionary of statistics for display in reports.

Returns:

dict[str, str]

Dictionary mapping display labels to formatted statistic values.

1 class data_designer.config.analysis.column_statistics.GeneralColumnStatistics(
2     /,
3     **data: typing.Any
4 )

Bases: data_designer.config.analysis.column_statistics.BaseColumnStatistics

Container for general statistics applicable to all column types.

Parameters:

column_name

Name of the column being analyzed.

num_records

Total number of records in the column.

num_null

Number of null/missing values in the column.

num_unique

Number of distinct values in the column. If a value is not hashable, it is converted to a string.

pyarrow_dtype

PyArrow data type of the column as a string.

simple_dtype

Simplified human-readable data type label.

column_type

Discriminator field, always “general” for this statistics type.

Attributes:

column_name

Name of the column being analyzed.

num_records

Total number of records in the column.

num_null

Number of null/missing values in the column.

num_unique

Number of distinct values in the column. If a value is not hashable, it is converted to a string.

pyarrow_dtype

PyArrow data type of the column as a string.

simple_dtype

Simplified human-readable data type label.

column_type

Discriminator field, always “general” for this statistics type.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

1 column_name: str

1 num_records: int | data_designer.config.analysis.column_statistics.MissingValue

1 num_null: int | data_designer.config.analysis.column_statistics.MissingValue

1 num_unique: int | data_designer.config.analysis.column_statistics.MissingValue

1 pyarrow_dtype: str

1 simple_dtype: str

1 column_type: typing.Literal[general] = general

1 general_statistics_ensure_python_integers(v: int | data_designer.config.analysis.column_statistics.MissingValue) -> int | data_designer.config.analysis.column_statistics.MissingValuegeneral_statistics_ensure_python_integers(v: int | data_designer.config.analysis.column_statistics.MissingValue) -> int | data_designer.config.analysis.column_statistics.MissingValue

1 percent_null: float | data_designer.config.analysis.column_statistics.MissingValue

1 percent_unique: float | data_designer.config.analysis.column_statistics.MissingValue

1 _general_display_row: dict[str, str]

1 create_report_row_data() -> dict[str, str]

1 _is_missing_value(v: float | int | data_designer.config.analysis.column_statistics.MissingValue) -> bool

1 class data_designer.config.analysis.column_statistics.LLMTextColumnStatistics(
2     /,
3     **data: typing.Any
4 )

Bases: data_designer.config.analysis.column_statistics.GeneralColumnStatistics

Container for statistics on LLM-generated text columns.

Inherits general statistics plus token usage metrics specific to LLM text generation. Stores both prompt and completion token consumption data.

Parameters:

output_tokens_mean

Mean number of output tokens generated per record.

output_tokens_median

Median number of output tokens generated per record.

output_tokens_stddev

Standard deviation of output tokens per record.

input_tokens_mean

Mean number of input tokens used per record.

input_tokens_median

Median number of input tokens used per record.

input_tokens_stddev

Standard deviation of input tokens per record.

column_type

Discriminator field, always “llm-text” for this statistics type.

Attributes:

output_tokens_mean

Mean number of output tokens generated per record.

output_tokens_median

Median number of output tokens generated per record.

output_tokens_stddev

Standard deviation of output tokens per record.

input_tokens_mean

Mean number of input tokens used per record.

input_tokens_median

Median number of input tokens used per record.

input_tokens_stddev

Standard deviation of input tokens per record.

column_type

Discriminator field, always “llm-text” for this statistics type.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

1 output_tokens_mean: float | data_designer.config.analysis.column_statistics.MissingValue

1 output_tokens_median: float | data_designer.config.analysis.column_statistics.MissingValue

1 output_tokens_stddev: float | data_designer.config.analysis.column_statistics.MissingValue

1 input_tokens_mean: float | data_designer.config.analysis.column_statistics.MissingValue

1 input_tokens_median: float | data_designer.config.analysis.column_statistics.MissingValue

1 input_tokens_stddev: float | data_designer.config.analysis.column_statistics.MissingValue

1 column_type: typing.Literal[data_designer.config.column_types.DataDesignerColumnType.LLM_TEXT.value]

1 llm_column_ensure_python_floats(v: float | int | data_designer.config.analysis.column_statistics.MissingValue) -> float | int | data_designer.config.analysis.column_statistics.MissingValuellm_column_ensure_python_floats(v: float | int | data_designer.config.analysis.column_statistics.MissingValue) -> float | int | data_designer.config.analysis.column_statistics.MissingValue

1 create_report_row_data() -> dict[str, typing.Any]

1 class data_designer.config.analysis.column_statistics.LLMCodeColumnStatistics(
2     /,
3     **data: typing.Any
4 )

Bases: data_designer.config.analysis.column_statistics.LLMTextColumnStatistics

Container for statistics on LLM-generated code columns.

Inherits all token usage metrics from LLMTextColumnStatistics. Stores statistics from columns that generate code snippets in specific programming languages.

Parameters:

column_type

Discriminator field, always “llm-code” for this statistics type.

Attributes:

column_type

Discriminator field, always “llm-code” for this statistics type.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

1 column_type: typing.Literal[data_designer.config.column_types.DataDesignerColumnType.LLM_CODE.value]

1 class data_designer.config.analysis.column_statistics.LLMStructuredColumnStatistics(
2     /,
3     **data: typing.Any
4 )

Bases: data_designer.config.analysis.column_statistics.LLMTextColumnStatistics

Container for statistics on LLM-generated structured JSON columns.

Inherits all token usage metrics from LLMTextColumnStatistics. Stores statistics from columns that generate structured data conforming to JSON schemas or Pydantic models.

Parameters:

column_type

Discriminator field, always “llm-structured” for this statistics type.

Attributes:

column_type

Discriminator field, always “llm-structured” for this statistics type.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

1 column_type: typing.Literal[data_designer.config.column_types.DataDesignerColumnType.LLM_STRUCTURED.value]

1 class data_designer.config.analysis.column_statistics.LLMJudgedColumnStatistics(
2     /,
3     **data: typing.Any
4 )

Bases: data_designer.config.analysis.column_statistics.LLMTextColumnStatistics

Container for statistics on LLM-as-a-judge quality assessment columns.

Inherits all token usage metrics from LLMTextColumnStatistics. Stores statistics from columns that evaluate and score other generated content based on defined criteria.

Parameters:

column_type

Discriminator field, always “llm-judge” for this statistics type.

Attributes:

column_type

Discriminator field, always “llm-judge” for this statistics type.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

1 column_type: typing.Literal[data_designer.config.column_types.DataDesignerColumnType.LLM_JUDGE.value]

class data_designer.config.analysis.column_statistics.SamplerColumnStatistics(
    /,
    **data: typing.Any
)

Bases: data_designer.config.analysis.column_statistics.GeneralColumnStatistics

Container for statistics on sampler-generated columns.

Parameters:

sampler_type

Type of sampler used to generate this column (e.g., “uniform”, “category”, “gaussian”, “person”).

distribution_type

Classification of the column’s distribution (categorical, numerical, text, other, or unknown).

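distribution

Computed distribution statistics for the column: a CategoricalDistribution, a NumericalDistribution, a MissingValue sentinel, or None.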

column_type

Discriminator field, always “sampler” for this statistics type.

Attributes:

sampler_type

Type of sampler used to generate this column (e.g., “uniform”, “category”, “gaussian”, “person”).

distribution_type

Classification of the column’s distribution (categorical, numerical, text, other, or unknown).

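distribution

Computed distribution statistics for the column: a CategoricalDistribution, a NumericalDistribution, a MissingValue sentinel, or None.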

column_type

Discriminator field, always “sampler” for this statistics type.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

sampler_type: data_designer.config.sampler_params.SamplerType

distribution_type: data_designer.config.analysis.column_statistics.ColumnDistributionType

distribution: CategoricalDistribution | NumericalDistribution | data_designer.config.analysis.column_statistics.MissingValue | None

column_type: typing.Literal[data_designer.config.column_types.DataDesignerColumnType.SAMPLER.value]

create_report_row_data() -> dict[str, str]
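SamplerColumnStatistics pairs a distribution_type label with an optional distribution payload. The sketch below shows one way such a label could be chosen from raw sampled values. Only the five enum values are taken from ColumnDistributionType; classify_distribution, its max_categories threshold, and the classification rules themselves are hypothetical and are not the library's logic.

```python
from enum import Enum
from numbers import Real


class DistributionType(str, Enum):
    """Mirrors the values of ColumnDistributionType in this module."""

    CATEGORICAL = "categorical"
    NUMERICAL = "numerical"
    TEXT = "text"
    OTHER = "other"
    UNKNOWN = "unknown"


def classify_distribution(values: list, max_categories: int = 20) -> DistributionType:
    """Hypothetical heuristic for labelling a sampled column; not the library's rule."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return DistributionType.UNKNOWN
    # bool is a subclass of int, so exclude it explicitly from "numerical".
    if all(isinstance(v, Real) and not isinstance(v, bool) for v in non_null):
        return DistributionType.NUMERICAL
    if all(isinstance(v, str) for v in non_null):
        # A handful of distinct strings reads as categories; many reads as free text.
        if len(set(non_null)) <= max_categories:
            return DistributionType.CATEGORICAL
        return DistributionType.TEXT
    return DistributionType.OTHER


print(classify_distribution(["a", "b", "a"]).value)  # categorical
```

Subclassing str lets the enum members compare equal to their plain string values, which is also why ColumnDistributionType uses the str, enum.Enum bases.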

class data_designer.config.analysis.column_statistics.SeedDatasetColumnStatistics(
    /,
    **data: typing.Any
)

Bases: data_designer.config.analysis.column_statistics.GeneralColumnStatistics

Container for statistics on columns sourced from seed datasets.

Inherits general statistics and stores statistics computed from columns that originate from existing data provided via the seed dataset functionality.

Parameters:

column_type

Discriminator field, always “seed-dataset” for this statistics type.

Attributes:

column_type

Discriminator field, always “seed-dataset” for this statistics type.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

column_type: typing.Literal[data_designer.config.column_types.DataDesignerColumnType.SEED_DATASET.value]

class data_designer.config.analysis.column_statistics.ExpressionColumnStatistics(
    /,
    **data: typing.Any
)

Bases: data_designer.config.analysis.column_statistics.GeneralColumnStatistics

Container for statistics on expression-based derived columns.

Inherits general statistics and stores statistics computed from columns that are derived from Jinja2 expressions referencing other column values.

Parameters:

column_type

Discriminator field, always “expression” for this statistics type.

Attributes:

column_type

Discriminator field, always “expression” for this statistics type.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

column_type: typing.Literal[data_designer.config.column_types.DataDesignerColumnType.EXPRESSION.value]

class data_designer.config.analysis.column_statistics.ValidationColumnStatistics(
    /,
    **data: typing.Any
)

Bases: data_designer.config.analysis.column_statistics.GeneralColumnStatistics

Container for statistics on validation result columns.

Parameters:

num_valid_records

Number of records that passed validation.

column_type

Discriminator field, always “validation” for this statistics type.

Attributes:

num_valid_records

Number of records that passed validation.

column_type

Discriminator field, always “validation” for this statistics type.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

num_valid_records: int | data_designer.config.analysis.column_statistics.MissingValue

column_type: typing.Literal[data_designer.config.column_types.DataDesignerColumnType.VALIDATION.value]

code_validation_column_ensure_python_integers(v: int | data_designer.config.analysis.column_statistics.MissingValue) -> int | data_designer.config.analysis.column_statistics.MissingValue

percent_valid: float | data_designer.config.analysis.column_statistics.MissingValue

create_report_row_data() -> dict[str, str]
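ValidationColumnStatistics stores num_valid_records as either an integer or a MissingValue, and percent_valid carries the same fallback. A minimal sketch of that relationship follows, assuming the total record count is available separately (passed in here as num_records), a 0-100 percentage scale, and that sentinels propagate unchanged. The Missing enum mirrors the documented MissingValue members; ensure_python_int and percent_valid are hypothetical helpers, not the library's validator or field.

```python
from enum import Enum


class Missing(str, Enum):
    """Mirrors the MissingValue members documented in this module."""

    CALCULATION_FAILED = "--"
    OUTPUT_FORMAT_ERROR = "output_format_error"


def ensure_python_int(v):
    # Coerce integer-like scalars (e.g. NumPy int64) to a built-in int; keep sentinels.
    return v if isinstance(v, Missing) else int(v)


def percent_valid(num_valid_records, num_records: int):
    """Hypothetical: percentage on a 0-100 scale, propagating Missing sentinels."""
    if isinstance(num_valid_records, Missing):
        return num_valid_records
    if num_records <= 0:
        # No records means the ratio is undefined.
        return Missing.CALCULATION_FAILED
    return 100.0 * ensure_python_int(num_valid_records) / num_records


print(percent_valid(45, 50))  # 90.0
```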

class data_designer.config.analysis.column_statistics.CategoricalHistogramData(
    /,
    **data: typing.Any
)

Bases: pydantic.BaseModel

Container for categorical distribution histogram data.

Stores the computed frequency distribution of categorical values.

Parameters:

categories

List of unique category values that appear in the data.

counts

List of occurrence counts for each category.

Attributes:

categories

List of unique category values that appear in the data.

counts

List of occurrence counts for each category.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

categories: list[float | int | str]

counts: list[int]

ensure_python_types() -> typing_extensions.Self

Ensure numerical values are Python objects rather than NumPy types.
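Indexing or reducing a pandas Series often yields NumPy scalars such as numpy.int64, and the standard json module refuses to serialize those. NumPy scalars provide an item() method that returns the equivalent built-in Python object. The sketch below normalizes values by calling item() when it exists; to_python is a hypothetical name, and _FakeInt64 stands in for numpy.int64 so the example runs without NumPy installed.

```python
import json
from typing import Any


def to_python(value: Any) -> Any:
    """Hypothetical normalizer: unwrap NumPy-style scalars, recursing into lists."""
    if isinstance(value, list):
        return [to_python(v) for v in value]
    item = getattr(value, "item", None)
    # NumPy scalars expose .item(); built-in int, float, and str do not.
    return item() if callable(item) else value


class _FakeInt64:
    """Stand-in for numpy.int64 so this sketch runs without NumPy."""

    def __init__(self, value: int) -> None:
        self._value = value

    def item(self) -> int:
        return int(self._value)


counts = to_python([_FakeInt64(3), _FakeInt64(5), 2.5])
print(json.dumps(counts))  # [3, 5, 2.5]
```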

from_series(series: pandas.Series) -> typing_extensions.Self
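CategoricalHistogramData keeps two parallel lists, where categories[i] occurs counts[i] times. from_series takes a pandas.Series; the sketch below produces the same shape from a plain Python list, using collections.Counter as a standard-library stand-in for something like Series.value_counts(). Dropping None mirrors value_counts excluding missing values by default, and the most-frequent-first ordering is an assumption; histogram_from_values is hypothetical.

```python
from collections import Counter


def histogram_from_values(values: list) -> dict[str, list]:
    """Build parallel category/count lists, most frequent first (hypothetical helper)."""
    counter = Counter(v for v in values if v is not None)
    pairs = counter.most_common()  # ties keep first-seen order
    return {
        "categories": [category for category, _ in pairs],
        "counts": [count for _, count in pairs],
    }


hist = histogram_from_values(["red", "blue", "red", None, "green", "red", "blue"])
print(hist)  # {'categories': ['red', 'blue', 'green'], 'counts': [3, 2, 1]}
```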

class data_designer.config.analysis.column_statistics.CategoricalDistribution(
    /,
    **data: typing.Any
)

Bases: pydantic.BaseModel

Container for computed categorical distribution statistics.

Parameters:

most_common_value

The category value that appears most frequently in the data.

least_common_value

The category value that appears least frequently in the data.

histogram

Complete frequency distribution showing all categories and their counts.

Attributes:

most_common_value

The category value that appears most frequently in the data.

least_common_value

The category value that appears least frequently in the data.

histogram

Complete frequency distribution showing all categories and their counts.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

most_common_value: str | int

least_common_value: str | int

histogram: data_designer.config.analysis.column_statistics.CategoricalHistogramData

ensure_python_types(v: str | int) -> str | int

from_series(series: pandas.Series) -> typing_extensions.Self
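Because CategoricalDistribution carries the full histogram, its most_common_value and least_common_value can be read directly off the count list. The sketch assumes ties resolve to the category listed first, which may not match the library's behavior; extremes_from_histogram is a hypothetical helper.

```python
def extremes_from_histogram(categories: list, counts: list) -> tuple:
    """Hypothetical helper: return (most_common_value, least_common_value)."""
    if not categories or len(categories) != len(counts):
        raise ValueError("categories and counts must be non-empty and equal length")
    pairs = list(zip(categories, counts))
    # max/min return the first extremal item, so ties go to the earlier category.
    most_common = max(pairs, key=lambda pair: pair[1])[0]
    least_common = min(pairs, key=lambda pair: pair[1])[0]
    return most_common, least_common


print(extremes_from_histogram(["x", "y", "z"], [2, 5, 2]))  # ('y', 'x')
```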

class data_designer.config.analysis.column_statistics.NumericalDistribution(
    /,
    **data: typing.Any
)

Bases: pydantic.BaseModel

Container for computed numerical distribution statistics.

Parameters:

min

Minimum value in the distribution.

max

Maximum value in the distribution.

mean

Arithmetic mean (average) of all values.

stddev

Standard deviation measuring the spread of values around the mean.

median

Median value of the distribution.

Attributes:

min

Minimum value in the distribution.

max

Maximum value in the distribution.

mean

Arithmetic mean (average) of all values.

stddev

Standard deviation measuring the spread of values around the mean.

median

Median value of the distribution.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

min: float | int

max: float | int

mean: float

stddev: float

median: float

ensure_python_types(v: float | int) -> float | int

from_series(series: pandas.Series) -> typing_extensions.Self
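The five NumericalDistribution fields are standard summary statistics. A minimal sketch using the standard-library statistics module on a plain list, rather than the pandas.Series that from_series accepts: skipping None and NaN mirrors pandas' default skipna=True, and the sample standard deviation (n - 1 denominator) mirrors pandas' default ddof=1, though whether the library computes stddev that way is an assumption. numerical_summary is hypothetical.

```python
from __future__ import annotations

import math
import statistics


def numerical_summary(values: list) -> dict[str, float | int]:
    """Hypothetical helper; drops None/NaN the way pandas reductions skip NaN by default."""
    clean = [v for v in values if v is not None and not math.isnan(v)]
    if len(clean) < 2:
        raise ValueError("need at least two numeric values for a sample stddev")
    return {
        "min": min(clean),
        "max": max(clean),
        "mean": statistics.fmean(clean),
        "stddev": statistics.stdev(clean),  # sample stddev (n - 1), like pandas' ddof=1
        "median": statistics.median(clean),
    }


summary = numerical_summary([2, 4, 4, 4, 5, 5, 7, 9])
print(summary["mean"], summary["median"])  # 5.0 4.5
```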

ColumnStatisticsT: typing_extensions.TypeAlias

DEFAULT_COLUMN_STATISTICS_MAP
