
Copyright © 2026, NVIDIA Corporation.


data_designer.config.analysis.column_profilers


Module Contents

Classes

| Name | Description |
| --- | --- |
| ColumnProfilerType | String-valued enum (str, enum.Enum) of column profiler types; the member listed on this page is JUDGE_SCORE. |
| ColumnProfilerResults | Abstract base class for column profiler results. |
| JudgeScoreProfilerConfig | Configuration for the LLM-as-a-judge score profiler. |
| JudgeScoreSample | Container for a single judge score and its associated reasoning. |
| JudgeScoreDistributions | Container for computed distributions across all judge score dimensions. |
| JudgeScoreSummary | Container for an LLM-generated summary of a judge score dimension. |
| JudgeScoreProfilerResults | Container for complete judge score profiler analysis results. |

Data

ColumnProfilerConfigT
ColumnProfilerResultsT

API

class data_designer.config.analysis.column_profilers.ColumnProfilerType

Bases: str, enum.Enum

JUDGE_SCORE = 'judge-score'

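Because the enum derives from both str and enum.Enum, a member compares equal to its raw string value. The sketch below re-declares the enum locally as a stand-in (it is not imported from the library, so it runs without data_designer installed); the member name and value are taken from the listing above.

```python
from enum import Enum


class ColumnProfilerType(str, Enum):
    # Local stand-in mirroring the documented bases and member.
    JUDGE_SCORE = "judge-score"


# Look up a member from its string value, e.g. one read from a config file.
profiler_type = ColumnProfilerType("judge-score")

same_member = profiler_type is ColumnProfilerType.JUDGE_SCORE
equals_raw_string = profiler_type == "judge-score"
raw_value = profiler_type.value
```

The str mixin is what makes the plain-string comparison succeed; a pure enum.Enum member would not compare equal to "judge-score".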
class data_designer.config.analysis.column_profilers.ColumnProfilerResults(
    /,
    **data: typing.Any
)

Bases: pydantic.BaseModel, abc.ABC

Abstract base class for column profiler results.

Stores results from column profiling operations. Subclasses hold profiler-specific analysis results and provide methods for generating formatted report sections for display.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

create_report_section() -> rich.panel.Panel

Creates a Rich Panel containing the formatted profiler results for display.

Returns:

rich.panel.Panel

A Rich Panel containing the formatted profiler results. Default implementation returns a “Not Implemented” message; subclasses should override to provide specific formatting.
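The default-then-override pattern described above can be sketched without the library. Everything in this block is a hypothetical stand-in: the class names are invented, plain strings replace rich.panel.Panel, and no pydantic validation is involved, so that the example stays dependency-free.

```python
from abc import ABC


class ProfilerResultsSketch(ABC):
    """Hypothetical stand-in for ColumnProfilerResults (no pydantic, no rich)."""

    def create_report_section(self) -> str:
        # Default behavior: a placeholder message, as described for the base class.
        return "Not Implemented"


class LengthProfilerResultsSketch(ProfilerResultsSketch):
    """Hypothetical subclass holding one profiler-specific result."""

    def __init__(self, column_name: str, mean_length: float) -> None:
        self.column_name = column_name
        self.mean_length = mean_length

    def create_report_section(self) -> str:
        # Override the default with profiler-specific formatting.
        return f"{self.column_name}: mean length {self.mean_length:.1f}"


default_section = ProfilerResultsSketch().create_report_section()
custom_section = LengthProfilerResultsSketch("answer", 41.5).create_report_section()
```

In the real class the return value is a rich.panel.Panel rather than a string, so a subclass would wrap its formatted content in a Panel before returning it.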

class data_designer.config.analysis.column_profilers.JudgeScoreProfilerConfig(
    /,
    **data: typing.Any
)

Bases: data_designer.config.base.ConfigBase

Configuration for the LLM-as-a-judge score profiler.

Parameters:

model_alias

Alias of the LLM model to use for generating score distribution summaries. Must match a model alias defined in the Data Designer configuration.

summary_score_sample_size

Number of score samples to include when prompting the LLM to generate summaries. Larger sample sizes provide more context but increase token usage. Must be at least 1 when provided. Set to None to skip LLM-generated summaries. Defaults to 20.
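The constraints on summary_score_sample_size (at least 1 when provided, None to skip summaries, default 20) can be illustrated with a small stand-in. This is a sketch under stated assumptions, not the library's implementation: the dataclass, the summaries_enabled helper, and the alias "judge-model" are all hypothetical, and the real class validates through pydantic rather than __post_init__.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JudgeScoreProfilerConfigSketch:
    """Hypothetical stand-in; the real class is a pydantic-based config."""

    model_alias: str
    summary_score_sample_size: Optional[int] = 20  # documented default

    def __post_init__(self) -> None:
        size = self.summary_score_sample_size
        # None is allowed and means "skip LLM-generated summaries".
        if size is not None and size < 1:
            raise ValueError("summary_score_sample_size must be at least 1 when provided")

    @property
    def summaries_enabled(self) -> bool:
        # Hypothetical helper, not part of the documented API.
        return self.summary_score_sample_size is not None


default_config = JudgeScoreProfilerConfigSketch(model_alias="judge-model")
no_summary_config = JudgeScoreProfilerConfigSketch(
    model_alias="judge-model", summary_score_sample_size=None
)

try:
    JudgeScoreProfilerConfigSketch(model_alias="judge-model", summary_score_sample_size=0)
    rejected = False
except ValueError:
    rejected = True
```

With the library installed, the same two keyword arguments would be passed to JudgeScoreProfilerConfig, and an invalid value would surface as a pydantic validation error instead of the ValueError used here.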

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

model_alias: str
summary_score_sample_size: int | None = Field(...)

class data_designer.config.analysis.column_profilers.JudgeScoreSample(
    /,
    **data: typing.Any
)

Bases: pydantic.BaseModel

Container for a single judge score and its associated reasoning.

Stores a paired score-reasoning sample extracted from an LLM-as-a-judge column. Used when generating summaries to provide the LLM with examples of scoring patterns.

Parameters:

score

The score value assigned by the judge. Can be numeric (int) or categorical (str).

reasoning

The reasoning or explanation provided by the judge for this score.
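Since score may be either an int or a str, code that consumes samples typically has to branch on the type. The following is a minimal sketch using a local dataclass as a stand-in for JudgeScoreSample; the is_numeric helper and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class JudgeScoreSampleSketch:
    """Hypothetical stand-in mirroring the two documented fields."""

    score: Union[int, str]
    reasoning: str


samples = [
    JudgeScoreSampleSketch(score=4, reasoning="Answers the question directly."),
    JudgeScoreSampleSketch(score="pass", reasoning="Meets every rubric item."),
]


def is_numeric(sample: JudgeScoreSampleSketch) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(sample.score, int) and not isinstance(sample.score, bool)


numeric_flags = [is_numeric(sample) for sample in samples]
```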

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

score: int | str
reasoning: str

class data_designer.config.analysis.column_profilers.JudgeScoreDistributions(
    /,
    **data: typing.Any
)

Bases: pydantic.BaseModel

Container for computed distributions across all judge score dimensions.

Stores the complete distribution analysis for all score dimensions in an LLM-as-a-judge column. Each score dimension (e.g., “relevance”, “fluency”) has its own distribution computed from the generated data.

Parameters:

scores

Mapping of each score dimension name to its list of score values.

reasoning

Mapping of each score dimension name to its list of reasoning texts.

distribution_types

Mapping of each score dimension name to its classification.

distributions

Mapping of each score dimension name to its computed distribution statistics.

histograms

Mapping of each score dimension name to its histogram data.
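To make the per-dimension mappings concrete, the sketch below builds plain dictionaries with the same keys-by-dimension shape from a few invented judge outputs. The record layout, the "numerical"/"categorical" labels, and the Counter-based histograms are assumptions for illustration; the library itself uses ColumnDistributionType and the distribution and histogram classes from column_statistics.

```python
from collections import Counter

# Hypothetical judge outputs: one dict per generated record, keyed by score dimension.
records = [
    {"relevance": {"score": 4, "reasoning": "On topic."},
     "fluency": {"score": "high", "reasoning": "Reads naturally."}},
    {"relevance": {"score": 5, "reasoning": "Fully answers the prompt."},
     "fluency": {"score": "high", "reasoning": "No grammatical errors."}},
    {"relevance": {"score": 4, "reasoning": "Minor digression."},
     "fluency": {"score": "medium", "reasoning": "One awkward sentence."}},
]

# Group scores and reasoning texts by dimension name.
scores = {}
reasoning = {}
for record in records:
    for dimension, result in record.items():
        scores.setdefault(dimension, []).append(result["score"])
        reasoning.setdefault(dimension, []).append(result["reasoning"])


def classify(values):
    # Stand-in labels only; not the library's ColumnDistributionType members.
    return "numerical" if all(isinstance(v, int) for v in values) else "categorical"


distribution_types = {name: classify(values) for name, values in scores.items()}
histograms = {name: dict(Counter(values)) for name, values in scores.items()}
```

Every mapping shares the same set of keys, one per score dimension, which is the invariant the five documented fields rely on.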

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

scores: dict[str, list[int | str]]
reasoning: dict[str, list[str]]
distribution_types: dict[str, data_designer.config.analysis.column_statistics.ColumnDistributionType]
distributions: dict[str, data_designer.config.analysis.column_statistics.CategoricalDistribution | data_designer.config.analysis.column_statistics.NumericalDistribution | data_designer.config.analysis.column_statistics.MissingValue]
histograms: dict[str, data_designer.config.analysis.column_statistics.CategoricalHistogramData | data_designer.config.analysis.column_statistics.MissingValue]

class data_designer.config.analysis.column_profilers.JudgeScoreSummary(
    /,
    **data: typing.Any
)

Bases: pydantic.BaseModel

Container for an LLM-generated summary of a judge score dimension.

Stores the natural language summary and sample data for a single score dimension generated by the judge score profiler. The summary is created by an LLM analyzing the distribution and patterns in the score-reasoning pairs.

Parameters:

score_name

Name of the score dimension being summarized (e.g., “relevance”, “fluency”).

summary

LLM-generated natural language summary describing the scoring patterns, distribution characteristics, and notable trends for this score dimension.

score_samples

List of score-reasoning pairs, serving as examples of the scoring behavior, that were used to generate the summary.
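One way the score samples for a summary could be assembled is sketched below. This page does not say how samples are selected, so the seeded random draw, the pick_samples helper, and its seed parameter are assumptions; only the cap given by summary_score_sample_size and the None-means-skip behavior come from the profiler configuration described earlier.

```python
import random
from typing import List, Optional, Tuple, Union

Score = Union[int, str]


def pick_samples(
    scores: List[Score],
    reasoning: List[str],
    sample_size: Optional[int],
    seed: int = 0,
) -> Optional[List[Tuple[Score, str]]]:
    """Hypothetical helper: draw up to sample_size score-reasoning pairs."""
    if sample_size is None:
        # No sample size means no LLM-generated summary, so nothing to sample.
        return None
    pairs = list(zip(scores, reasoning))
    rng = random.Random(seed)  # seeded so repeated runs pick the same pairs
    return rng.sample(pairs, min(sample_size, len(pairs)))


relevance_scores = [4, 5, 4, 2]
relevance_reasoning = ["On topic.", "Fully answers.", "Minor digression.", "Off topic."]

picked = pick_samples(relevance_scores, relevance_reasoning, sample_size=2)
skipped = pick_samples(relevance_scores, relevance_reasoning, sample_size=None)
```

Each selected pair would then populate one JudgeScoreSample, and the resulting list would be stored in score_samples alongside the generated summary text.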

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

score_name: str
summary: str
score_samples: list[data_designer.config.analysis.column_profilers.JudgeScoreSample]

class data_designer.config.analysis.column_profilers.JudgeScoreProfilerResults(
    /,
    **data: typing.Any
)

Bases: data_designer.config.analysis.column_profilers.ColumnProfilerResults

Container for complete judge score profiler analysis results.

Parameters:

column_name

Name of the judge column that was profiled.

summaries

Mapping of each score dimension name to its LLM-generated summary.

score_distributions

Complete distribution analysis across all score dimensions.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

column_name: str
summaries: dict[str, data_designer.config.analysis.column_profilers.JudgeScoreSummary]
score_distributions: data_designer.config.analysis.column_profilers.JudgeScoreDistributions | data_designer.config.analysis.column_statistics.MissingValue
create_report_section() -> rich.panel.Panel

ColumnProfilerConfigT: typing_extensions.TypeAlias
ColumnProfilerResultsT: typing_extensions.TypeAlias