data_designer.config.analysis.dataset_profiler

Module Contents

Classes

Name	Description
`DatasetProfilerResults`	Container for complete dataset profiling and analysis results.

API

class data_designer.config.analysis.dataset_profiler.DatasetProfilerResults(
    /,
    **data: typing.Any
)

Bases: pydantic.BaseModel

Container for complete dataset profiling and analysis results.

Stores profiling results for a generated dataset, including statistics for configured columns, dataset-level metadata, side-effect column names, and optional advanced profiler results. Provides methods for computing derived metrics and generating formatted reports.

Parameters:

num_records

Actual number of records successfully generated in the dataset.

target_num_records

Target number of records that were requested to be generated.

column_statistics

List of statistics objects for configured columns. Each column has statistics appropriate to its type. Must contain at least one column.

side_effect_column_names

Column names that were generated as side effects of other columns.

column_profiles

Column profiler results for specific columns when configured.

Attributes:

num_records

Actual number of records successfully generated in the dataset.

target_num_records

Target number of records that were requested to be generated.

column_statistics

List of statistics objects for configured columns. Each column has statistics appropriate to its type. Must contain at least one column.

side_effect_column_names

Column names that were generated as side effects of other columns.

column_profiles

Column profiler results for specific columns when configured.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises `pydantic_core.ValidationError` if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

num_records: int

target_num_records: int

column_statistics: list[typing.Annotated[data_designer.config.analysis.column_statistics.ColumnStatisticsT, Field(discriminator='column_type')]] = Field(...)

side_effect_column_names: list[str] | None

column_profiles: list[data_designer.config.analysis.column_profilers.ColumnProfilerResultsT] | None

ensure_python_integers(v: int) -> int

percent_complete: float

Returns the completion percentage of the dataset.
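As a rough illustration of what a completion percentage means here, the sketch below expresses generated records as a percentage of requested records. The formula, the standalone function, and the zero-target behavior are assumptions made for illustration; the reference above only states that the property returns the completion percentage.

```python
def percent_complete(num_records: int, target_num_records: int) -> float:
    """Hypothetical stand-in: share of requested records actually generated, in percent."""
    if target_num_records <= 0:
        # Assumption: report 0% for a non-positive target instead of dividing by zero.
        return 0.0
    return 100.0 * num_records / target_num_records


# 950 records generated out of 1000 requested.
print(percent_complete(950, 1000))  # 95.0
```

On an actual `DatasetProfilerResults` instance the value is read as an attribute (`results.percent_complete`), not called as a function.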

column_types() -> list[str]

Returns a sorted list of unique column types present in the dataset.
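The behavior described above (unique values, sorted) can be sketched with plain Python. The `ColumnStats` dataclass and the example type strings are hypothetical stand-ins, not the library's statistics classes, and the real method may be implemented differently.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    """Hypothetical stand-in for a column statistics object."""
    column_name: str
    column_type: str


def column_types(column_statistics: list[ColumnStats]) -> list[str]:
    # Deduplicate with a set comprehension, then sort for a stable order.
    return sorted({stats.column_type for stats in column_statistics})


stats = [
    ColumnStats("topic", "sampler"),
    ColumnStats("question", "llm-text"),
    ColumnStats("answer", "llm-text"),
]
print(column_types(stats))  # ['llm-text', 'sampler']
```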

get_column_statistics_by_type(column_type: data_designer.config.column_types.DataDesignerColumnType) -> list[data_designer.config.analysis.column_statistics.ColumnStatisticsT]

Filters column statistics to return only those of the specified type.
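A minimal sketch of this kind of filter follows. `ColumnType` and `ColumnStats` are hypothetical stand-ins for `DataDesignerColumnType` and the library's statistics objects, and the enum members shown are illustrative only.

```python
from dataclasses import dataclass
from enum import Enum


class ColumnType(str, Enum):
    """Hypothetical stand-in for DataDesignerColumnType."""
    SAMPLER = "sampler"
    LLM_TEXT = "llm-text"


@dataclass
class ColumnStats:
    """Hypothetical stand-in for a column statistics object."""
    column_name: str
    column_type: ColumnType


def get_column_statistics_by_type(
    column_statistics: list[ColumnStats], column_type: ColumnType
) -> list[ColumnStats]:
    # Keep only the entries whose column_type matches the requested type,
    # preserving their original order.
    return [s for s in column_statistics if s.column_type == column_type]


all_stats = [
    ColumnStats("topic", ColumnType.SAMPLER),
    ColumnStats("question", ColumnType.LLM_TEXT),
    ColumnStats("answer", ColumnType.LLM_TEXT),
]
llm_stats = get_column_statistics_by_type(all_stats, ColumnType.LLM_TEXT)
print([s.column_name for s in llm_stats])  # ['question', 'answer']
```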

to_report(
    save_path: str | pathlib.Path | None = None,
    include_sections: list[data_designer.config.analysis.utils.reporting.ReportSection | data_designer.config.column_types.DataDesignerColumnType] | None = None
) -> None

Generate and print an analysis report based on the dataset profiling results.

Parameters:

save_path

str | pathlib.Path | None. Defaults to None.

Optional path to save the report. If provided, the report will be saved as either HTML (.html) or SVG (.svg) format. If None, the report will only be displayed in the console.

include_sections

list[data_designer.config.analysis.utils.reporting.ReportSection | data_designer.config.column_types.DataDesignerColumnType] | None. Defaults to None.

Optional list of sections to include in the report. Choices are any DataDesignerColumnType, “overview” (the dataset overview section), and “column_profilers” (all column profilers in one section). If None, all sections will be included.
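Because `save_path` is described as supporting only `.html` and `.svg`, a caller may want to check the extension before requesting a report. The helper below is a hypothetical pre-check written for illustration. It is not part of the library, and how `to_report` itself reacts to an unsupported extension is not stated here.

```python
from pathlib import Path

# The two output formats named in the save_path description.
SUPPORTED_SUFFIXES = {".html", ".svg"}


def check_report_path(save_path):
    """Hypothetical helper: return a Path with a supported suffix, or None."""
    if save_path is None:
        # None means "console only" per the parameter description.
        return None
    path = Path(save_path)
    if path.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(
            f"Unsupported report format {path.suffix!r}; expected .html or .svg"
        )
    return path


print(check_report_path("reports/profile.html"))
```

The validated path could then be passed along as `results.to_report(save_path=...)`, where `results` is assumed to be an existing `DatasetProfilerResults` instance.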
