Copyright © 2026, NVIDIA Corporation.

NeMo Data Designer

data_designer.config.analysis.dataset_profiler

Module Contents

Classes

| Name | Description |
|---|---|
| DatasetProfilerResults | Container for complete dataset profiling and analysis results. |

API

class data_designer.config.analysis.dataset_profiler.DatasetProfilerResults(
    /,
    **data: typing.Any
)

Bases: pydantic.BaseModel

Container for complete dataset profiling and analysis results.

Stores profiling results for a generated dataset, including statistics for configured columns, dataset-level metadata, side-effect column names, and optional advanced profiler results. Provides methods for computing derived metrics and generating formatted reports.

Parameters:

num_records

Actual number of records successfully generated in the dataset.

target_num_records

Target number of records that were requested to be generated.

column_statistics

List of statistics objects for configured columns. Each column has statistics appropriate to its type. Must contain at least one column.

side_effect_column_names

Column names that were generated as side effects of other columns.

column_profiles

Column profiler results for specific columns when configured.
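The field shapes and the "at least one column" rule can be mirrored with a stdlib dataclass. This is a minimal sketch, not the library's pydantic model: ColumnStats and ProfilerResultsSketch are hypothetical stand-ins, ColumnStats keeps only a column name and the column_type discriminator, the column-type string "sampler" is illustrative, and the check in __post_init__ reproduces only the documented constraint on column_statistics.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ColumnStats:
    # Hypothetical stand-in for ColumnStatisticsT; only the discriminator is modeled.
    column_name: str
    column_type: str


@dataclass
class ProfilerResultsSketch:
    num_records: int
    target_num_records: int
    column_statistics: list[ColumnStats]
    side_effect_column_names: list[str] | None = None
    column_profiles: list[object] | None = None

    def __post_init__(self) -> None:
        # Mirrors the documented rule: column_statistics must not be empty.
        if not self.column_statistics:
            raise ValueError("column_statistics must contain at least one column")


results = ProfilerResultsSketch(
    num_records=98,
    target_num_records=100,
    column_statistics=[ColumnStats("age", "sampler")],
)
```

The real class performs this validation through pydantic, so an invalid payload raises a ValidationError rather than the ValueError used here.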

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only so that self can be used as a field name.
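The positional-only self rule is ordinary Python semantics. The sketch below uses two hypothetical classes (not pydantic itself) to show why the / matters: with it, a keyword argument named self is collected into **data; without it, Python tries to bind that keyword to the self parameter, which already holds the instance, and raises TypeError.

```python
from typing import Any


class WithSlash:
    # 'self' is positional-only, so a keyword named 'self' lands in **data.
    def __init__(self, /, **data: Any) -> None:
        self.data = data


class WithoutSlash:
    # 'self' may also be passed by keyword here, so 'self=...' collides with the instance.
    def __init__(self, **data: Any) -> None:
        self.data = data


ok = WithSlash(self="field value", num_records=10)
print(ok.data)  # {'self': 'field value', 'num_records': 10}

try:
    WithoutSlash(self="field value")
except TypeError as exc:
    print("TypeError:", exc)
```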

num_records: int
target_num_records: int
column_statistics: list[typing.Annotated[data_designer.config.analysis.column_statistics.ColumnStatisticsT, Field(discriminator='column_type')]] = Field(...)
side_effect_column_names: list[str] | None
column_profiles: list[data_designer.config.analysis.column_profilers.ColumnProfilerResultsT] | None
ensure_python_integers(v: int) -> int
percent_complete: float

Returns the completion percentage of the dataset.
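The formula is not spelled out here. A natural reading, assumed rather than confirmed by this page, is num_records divided by target_num_records, scaled to a 0 to 100 percentage. The hypothetical helper below computes that value.

```python
def percent_complete(num_records: int, target_num_records: int) -> float:
    # Assumed formula: share of the requested records that were actually generated.
    return 100 * num_records / target_num_records


print(percent_complete(98, 100))  # 98.0
print(percent_complete(45, 60))   # 75.0
```

This sketch raises ZeroDivisionError for a target of zero; how the real property handles that case is not documented here.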

column_types() -> list[str]

Returns a sorted list of unique column types present in the dataset.

get_column_statistics_by_type(column_type: data_designer.config.column_types.DataDesignerColumnType) -> list[data_designer.config.analysis.column_statistics.ColumnStatisticsT]

Filters column statistics to return only those of the specified type.
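Both column_types and get_column_statistics_by_type work off the column_type discriminator carried by each statistics entry. The sketch below reproduces the documented behavior over a plain list. It is an illustration, not the library's code: ColumnStats is a hypothetical stand-in, and column types are plain strings ("sampler", "llm-text", chosen for illustration) instead of DataDesignerColumnType members.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ColumnStats:
    # Hypothetical stand-in: real statistics objects carry many more fields.
    column_name: str
    column_type: str


def column_types(stats: list[ColumnStats]) -> list[str]:
    # Deduplicate with a set, then sort for a stable order.
    return sorted({s.column_type for s in stats})


def get_column_statistics_by_type(stats: list[ColumnStats], column_type: str) -> list[ColumnStats]:
    # Keep only the entries whose discriminator matches the requested type.
    return [s for s in stats if s.column_type == column_type]


stats = [
    ColumnStats("age", "sampler"),
    ColumnStats("bio", "llm-text"),
    ColumnStats("city", "sampler"),
]
print(column_types(stats))  # ['llm-text', 'sampler']
print([s.column_name for s in get_column_statistics_by_type(stats, "sampler")])  # ['age', 'city']
```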

to_report(
    save_path: str | pathlib.Path | None = None,
    include_sections: list[data_designer.config.analysis.utils.reporting.ReportSection | data_designer.config.column_types.DataDesignerColumnType] | None = None
) -> None

Generate and print an analysis report based on the dataset profiling results.

Parameters:

save_path
str | pathlib.Path | None, defaults to None

Optional path to save the report. If provided, the report is saved in either HTML (.html) or SVG (.svg) format. If None, the report is only displayed in the console.

include_sections
list[data_designer.config.analysis.utils.reporting.ReportSection | data_designer.config.column_types.DataDesignerColumnType] | None, defaults to None

Optional list of sections to include in the report. Choices are any DataDesignerColumnType, “overview” (the dataset overview section), and “column_profilers” (all column profilers in one section). If None, all sections will be included.
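The two parameters of to_report can be read as a small planning step: pick an output format from save_path, then decide which sections to render. The helper plan_report below is a hypothetical sketch of that reading, not the library's implementation. It assumes the format follows the file suffix, rejects any other suffix (an assumption), orders sections as overview, column types, then column profilers (also an assumption), and represents column types as plain strings.

```python
from __future__ import annotations

from pathlib import Path


def plan_report(
    save_path: str | Path | None,
    include_sections: list[str] | None,
    column_types: list[str],
) -> tuple[str | None, list[str]]:
    # No save path: console output only, so no file format is chosen.
    fmt = None
    if save_path is not None:
        suffix = Path(save_path).suffix.lower()
        if suffix not in (".html", ".svg"):
            # Assumption: unsupported suffixes are rejected.
            raise ValueError(f"unsupported report format: {suffix!r}")
        fmt = suffix.lstrip(".")
    # Valid choices: the two fixed sections plus one per column type present.
    allowed = ["overview", *column_types, "column_profilers"]
    if include_sections is None:
        # None means every section is included.
        return fmt, allowed
    unknown = [s for s in include_sections if s not in allowed]
    if unknown:
        raise ValueError(f"unknown sections: {unknown}")
    return fmt, list(include_sections)


print(plan_report(None, None, ["sampler"]))  # (None, ['overview', 'sampler', 'column_profilers'])
print(plan_report("report.html", ["overview"], ["sampler"]))  # ('html', ['overview'])
```

On an actual results object, the documented call takes the same arguments, for example results.to_report(save_path="report.html", include_sections=["overview"]).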