data_designer.config.sampler_params

Module Contents

Classes

Name	Description
`SamplerType`	String-valued enumeration of the available sampler types.
`CategorySamplerParams`	Parameters for categorical sampling with optional probability weighting.
`DatetimeSamplerParams`	Parameters for uniform datetime sampling within a specified range.
`SubcategorySamplerParams`	Parameters for subcategory sampling conditioned on a parent category column.
`TimeDeltaSamplerParams`	Parameters for sampling time deltas relative to a reference datetime column.
`UUIDSamplerParams`	Parameters for generating UUID (Universally Unique Identifier) values.
`ScipySamplerParams`	Parameters for sampling from any scipy.stats continuous or discrete distribution.
`BinomialSamplerParams`	Parameters for sampling from a Binomial distribution.
`BernoulliSamplerParams`	Parameters for sampling from a Bernoulli distribution.
`BernoulliMixtureSamplerParams`	Parameters for sampling from a Bernoulli mixture distribution.
`GaussianSamplerParams`	Parameters for sampling from a Gaussian (Normal) distribution.
`PoissonSamplerParams`	Parameters for sampling from a Poisson distribution.
`UniformSamplerParams`	Parameters for sampling from a continuous Uniform distribution.
`PersonSamplerParams`	Parameters for sampling synthetic person data with demographic attributes.
`PersonFromFakerSamplerParams`	Parameters for sampling synthetic person data with demographic attributes from Faker.

Functions

Name	Description
`is_numerical_sampler_type`	Takes a `SamplerType` and returns a `bool`; no docstring is provided.

Data

`SexT`
`SamplerParamsT`

API

class data_designer.config.sampler_params.SamplerType

Bases: str, enum.Enum

BERNOULLI = 'bernoulli'

BERNOULLI_MIXTURE = 'bernoulli_mixture'

BINOMIAL = 'binomial'

CATEGORY = 'category'

DATETIME = 'datetime'

GAUSSIAN = 'gaussian'

PERSON = 'person'

PERSON_FROM_FAKER = 'person_from_faker'

POISSON = 'poisson'

SCIPY = 'scipy'

SUBCATEGORY = 'subcategory'

TIMEDELTA = 'timedelta'

UNIFORM = 'uniform'

UUID = 'uuid'
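
Because SamplerType derives from both str and enum.Enum, its members should behave like ordinary strings in comparisons and can be looked up by value. The sketch below illustrates that standard-library behavior with a hypothetical two-member stand-in (MiniSamplerType); it does not import or reproduce the library class.

```python
from enum import Enum


class MiniSamplerType(str, Enum):
    """Hypothetical two-member stand-in for SamplerType (illustration only)."""

    GAUSSIAN = "gaussian"
    UUID = "uuid"


# Members of a str-based Enum compare equal to their plain string values.
is_equal = MiniSamplerType.UUID == "uuid"

# Looking a member up by its value returns the singleton member.
looked_up = MiniSamplerType("gaussian")
```

This is why a plain string such as "uuid" can usually stand in for the enum member when a configuration is written as JSON or YAML.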

class data_designer.config.sampler_params.CategorySamplerParams(
    /,
    **data: typing.Any
)

Bases: data_designer.config.base.ConfigBase

Parameters for categorical sampling with optional probability weighting.

Samples values from a discrete set of categories. When weights are provided, values are sampled according to their assigned probabilities. Without weights, uniform sampling is used.

Parameters:

values

List of possible categorical values to sample from. Can contain strings, integers, or floats. Must contain at least one value.

weights

Optional unnormalized probability weights for each value. If provided, must be the same length as values. Weights are automatically normalized to sum to 1.0. Larger weights result in higher sampling probability for the corresponding value.

Attributes:

values

`required`

List of possible categorical values to sample from. Can contain strings, integers, or floats. Must contain at least one value.

weights

Optional unnormalized probability weights for each value. If provided, must be the same length as values. Weights are automatically normalized to sum to 1.0. Larger weights result in higher sampling probability for the corresponding value.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

values: list[str | int | float] = Field(...)

weights: list[float] | None = Field(...)

sampler_type: typing.Literal[data_designer.config.sampler_params.SamplerType]

_normalize_weights_if_needed() -> typing_extensions.Self

_validate_equal_lengths() -> typing_extensions.Self
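
The weight handling described for this class can be sketched with the standard library alone. The helper names normalize_weights and sample_category are hypothetical and chosen for illustration; this is not the library's implementation, only a minimal model of "normalize, then sample".

```python
import random


def normalize_weights(weights: list[float]) -> list[float]:
    """Scale unnormalized weights so that they sum to 1.0."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return [w / total for w in weights]


def sample_category(values, weights=None, *, rng: random.Random):
    """Draw one value; sampling is uniform when weights is None."""
    if weights is not None and len(weights) != len(values):
        raise ValueError("weights must be the same length as values")
    probs = normalize_weights(weights) if weights is not None else None
    # random.choices treats weights=None as equal weighting.
    return rng.choices(values, weights=probs, k=1)[0]


rng = random.Random(0)
probs = normalize_weights([2, 1, 1])
draw = sample_category(["red", "green", "blue"], [2, 1, 1], rng=rng)
```

With weights [2, 1, 1], the first value is drawn with probability 0.5 and each of the others with probability 0.25.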

class data_designer.config.sampler_params.DatetimeSamplerParams(
    /,
    **data: typing.Any
)

Bases: data_designer.config.base.ConfigBase

Parameters for uniform datetime sampling within a specified range.

Samples datetime values uniformly between a start and end date with a specified granularity. The sampling unit determines the smallest possible time interval between consecutive samples.

Parameters:

start

Earliest possible datetime for the sampling range (inclusive). Must be a valid datetime string parseable by pandas.to_datetime().

end

Exclusive upper bound for the sampling range. Must be a valid datetime string parseable by pandas.to_datetime().

unit

Time unit for sampling granularity. Options:

“Y”: Years
“M”: Months
“D”: Days (default)
“h”: Hours
“m”: Minutes
“s”: Seconds

Attributes:

start

`required`

Earliest possible datetime for the sampling range (inclusive). Must be a valid datetime string parseable by pandas.to_datetime().

end

`required`

Exclusive upper bound for the sampling range. Must be a valid datetime string parseable by pandas.to_datetime().

unit

Time unit for sampling granularity. Options:

“Y”: Years
“M”: Months
“D”: Days (default)
“h”: Hours
“m”: Minutes
“s”: Seconds

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

start: str = Field(...)

end: str = Field(...)

unit: typing.Literal["Y", "M", "D", "h", "m", "s"] = Field(...)

sampler_type: typing.Literal[data_designer.config.sampler_params.SamplerType]

_validate_param_is_datetime(value: str) -> str
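
As a rough model of what "uniform sampling at a given granularity" means, the sketch below picks a point on a grid of whole units between an inclusive start and an exclusive end. It is an assumption-laden illustration: sample_datetime is a hypothetical helper, datetime.fromisoformat stands in for pandas.to_datetime(), and the variable-length units "Y" and "M" are left out.

```python
import random
from datetime import datetime, timedelta

# Fixed-length units only; "Y" and "M" are left out of this sketch.
UNIT_TO_DELTA = {
    "D": timedelta(days=1),
    "h": timedelta(hours=1),
    "m": timedelta(minutes=1),
    "s": timedelta(seconds=1),
}


def sample_datetime(start: str, end: str, unit: str, rng: random.Random) -> datetime:
    """Pick a datetime in [start, end) on a grid of whole units from start."""
    start_dt = datetime.fromisoformat(start)
    end_dt = datetime.fromisoformat(end)
    step = UNIT_TO_DELTA[unit]
    # Count whole steps between the bounds; offsets 0..n_steps-1 all fall before end.
    n_steps = (end_dt - start_dt) // step
    if n_steps < 1:
        raise ValueError("end must be at least one unit after start")
    return start_dt + rng.randrange(n_steps) * step


sampled = sample_datetime("2024-01-01", "2024-02-01", "D", random.Random(0))
```

With unit "D", every sampled value falls on midnight of some day in January 2024; switching to "h" would allow any whole hour in the same range.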

class data_designer.config.sampler_params.SubcategorySamplerParams(
    /,
    **data: typing.Any
)

Bases: data_designer.config.base.ConfigBase

Parameters for subcategory sampling conditioned on a parent category column.

Samples subcategory values based on the value of a parent category column. Each parent category value maps to its own list of possible subcategory values, enabling hierarchical or conditional sampling patterns.

Parameters:

category

Name of the parent category column that this subcategory depends on. The parent column must be generated before this subcategory column.

values

Mapping from each parent category value to a list of possible subcategory values. Each key must correspond to a value that appears in the parent category column.

Attributes:

category

`required`

Name of the parent category column that this subcategory depends on. The parent column must be generated before this subcategory column.

values

`required`

Mapping from each parent category value to a list of possible subcategory values. Each key must correspond to a value that appears in the parent category column.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

category: str = Field(...)

values: dict[str, list[str | int | float]] = Field(...)

sampler_type: typing.Literal[data_designer.config.sampler_params.SamplerType]
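
The conditional lookup described for this class amounts to choosing from the list keyed by the parent's value. The sketch below shows that idea with the standard library; the mapping SUBCATEGORIES and the helper sample_subcategory are hypothetical, and uniform choice within each list is an assumption.

```python
import random

# Hypothetical mapping, in the shape described for `values`.
SUBCATEGORIES = {
    "fruit": ["apple", "banana"],
    "vegetable": ["carrot", "leek", "kale"],
}


def sample_subcategory(parent_value: str, values: dict, rng: random.Random):
    """Pick a subcategory from the list mapped to the parent's value."""
    if parent_value not in values:
        raise KeyError(f"no subcategories defined for {parent_value!r}")
    return rng.choice(values[parent_value])


rng = random.Random(0)
parent_column = ["fruit", "vegetable", "fruit"]
child_column = [sample_subcategory(p, SUBCATEGORIES, rng) for p in parent_column]
```

Each entry of child_column is guaranteed to belong to the list registered for the matching entry of parent_column, which is why the parent column has to exist first.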

class data_designer.config.sampler_params.TimeDeltaSamplerParams(
    /,
    **data: typing.Any
)

Bases: data_designer.config.base.ConfigBase

Parameters for sampling time deltas relative to a reference datetime column.

Samples time offsets within a specified range and adds them to values from a reference datetime column. This is useful for generating related datetime columns like order dates and delivery dates, or event start times and end times.

Years and months are not supported as timedelta units because they have variable lengths. See: pandas timedelta documentation

Parameters:

dt_min

Minimum time-delta value (inclusive). Must be non-negative and less than dt_max. Specified in units defined by the unit parameter.

dt_max

Maximum time-delta value (exclusive). Must be positive and greater than dt_min. Specified in units defined by the unit parameter.

reference_column_name

Name of an existing datetime column to add the time-delta to. This column must be generated before the timedelta column.

unit

Time unit for the delta values. Options:

“D”: Days (default)
“h”: Hours
“m”: Minutes
“s”: Seconds

Attributes:

dt_min

`required`

Minimum time-delta value (inclusive). Must be non-negative and less than dt_max. Specified in units defined by the unit parameter.

dt_max

`required`

Maximum time-delta value (exclusive). Must be positive and greater than dt_min. Specified in units defined by the unit parameter.

reference_column_name

`required`

Name of an existing datetime column to add the time-delta to. This column must be generated before the timedelta column.

unit

Time unit for the delta values. Options:

“D”: Days (default)
“h”: Hours
“m”: Minutes
“s”: Seconds

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

dt_min: int = Field(...)

dt_max: int = Field(...)

reference_column_name: str = Field(...)

unit: typing.Literal["D", "h", "m", "s"] = Field(...)

sampler_type: typing.Literal[data_designer.config.sampler_params.SamplerType]

_validate_min_less_than_max() -> typing_extensions.Self
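
The order-date and delivery-date use case can be sketched as "reference value plus a random whole number of units in [dt_min, dt_max)". The helper add_random_delta is hypothetical, and drawing integer offsets uniformly is an assumption made for this illustration rather than a statement about the library.

```python
import random
from datetime import datetime, timedelta

# Maps the unit codes listed above to timedelta keyword names.
UNIT_TO_KWARG = {"D": "days", "h": "hours", "m": "minutes", "s": "seconds"}


def add_random_delta(reference: datetime, dt_min: int, dt_max: int,
                     unit: str, rng: random.Random) -> datetime:
    """Add an offset of dt_min..dt_max-1 units to the reference datetime."""
    if not 0 <= dt_min < dt_max:
        raise ValueError("require 0 <= dt_min < dt_max")
    amount = rng.randrange(dt_min, dt_max)  # dt_max is exclusive
    return reference + timedelta(**{UNIT_TO_KWARG[unit]: amount})


order_date = datetime(2024, 3, 1, 9, 0)
delivery_date = add_random_delta(order_date, 1, 8, "D", random.Random(0))
```

Here delivery_date lands between one and seven days after order_date, keeping the time of day from the reference value.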

class data_designer.config.sampler_params.UUIDSamplerParams(
    /,
    **data: typing.Any
)

Bases: data_designer.config.base.ConfigBase

Parameters for generating UUID (Universally Unique Identifier) values.

Generates UUID4 (random) identifiers with optional formatting options. UUIDs are useful for creating unique identifiers for records, entities, or transactions.

Parameters:

prefix

Optional string to prepend to each UUID. Useful for creating namespaced or typed identifiers (e.g., “user-”, “order-”, “txn-”).

short_form

If True, truncates UUIDs to 8 characters (first segment only). Default is False for full 32-character UUIDs (excluding hyphens).

uppercase

If True, converts all hexadecimal letters to uppercase. Default is False for lowercase UUIDs.

Attributes:

prefix

Optional string to prepend to each UUID. Useful for creating namespaced or typed identifiers (e.g., “user-”, “order-”, “txn-”).

short_form

If True, truncates UUIDs to 8 characters (first segment only). Default is False for full 32-character UUIDs (excluding hyphens).

uppercase

If True, converts all hexadecimal letters to uppercase. Default is False for lowercase UUIDs.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

prefix: str | None = Field(...)

short_form: bool = Field(...)

uppercase: bool = Field(...)

sampler_type: typing.Literal[data_designer.config.sampler_params.SamplerType]

last_index: int
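
The three formatting options can be modeled with Python's uuid module. The helper make_id below is hypothetical; in particular, leaving the prefix untouched when uppercase is set is an assumption of this sketch, not documented behavior.

```python
import uuid


def make_id(prefix=None, short_form=False, uppercase=False) -> str:
    """Build a UUID4-based identifier using the options described above."""
    hex_digits = uuid.uuid4().hex  # 32 hex characters, no hyphens
    if short_form:
        hex_digits = hex_digits[:8]  # first segment only
    if uppercase:
        hex_digits = hex_digits.upper()
    # Assumption: the prefix is prepended as given, without case changes.
    return f"{prefix or ''}{hex_digits}"


full_id = make_id()
short_id = make_id(prefix="user-", short_form=True, uppercase=True)
```

Note that the short form keeps only 32 bits of randomness, so collisions become plausible in large datasets.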

class data_designer.config.sampler_params.ScipySamplerParams(
    /,
    **data: typing.Any
)

Bases: data_designer.config.base.ConfigBase

Parameters for sampling from any scipy.stats continuous or discrete distribution.

Provides a flexible interface to sample from the wide range of probability distributions available in scipy.stats. This enables advanced statistical sampling beyond the built-in distribution types (Gaussian, Uniform, etc.).

See: scipy.stats documentation

Parameters:

dist_name

Name of the scipy.stats distribution to sample from (e.g., “beta”, “gamma”, “lognorm”, “expon”). Must be a valid distribution name from scipy.stats.

dist_params

Dictionary of parameters for the specified distribution. Parameter names and values must match the scipy.stats distribution specification (e.g., {“a”: 2, “b”: 5} for beta distribution, {“scale”: 1.5} for exponential).

decimal_places

Optional number of decimal places to round sampled values to. If None, values are not rounded.

Attributes:

dist_name

`required`

Name of the scipy.stats distribution to sample from (e.g., “beta”, “gamma”, “lognorm”, “expon”). Must be a valid distribution name from scipy.stats.

dist_params

`required`

Dictionary of parameters for the specified distribution. Parameter names and values must match the scipy.stats distribution specification (e.g., {“a”: 2, “b”: 5} for beta distribution, {“scale”: 1.5} for exponential).

decimal_places

Optional number of decimal places to round sampled values to. If None, values are not rounded.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

dist_name: str = Field(...)

dist_params: dict = Field(...)

decimal_places: int | None = Field(...)

sampler_type: typing.Literal[data_designer.config.sampler_params.SamplerType]

class data_designer.config.sampler_params.BinomialSamplerParams(
    /,
    **data: typing.Any
)

Bases: data_designer.config.base.ConfigBase

Parameters for sampling from a Binomial distribution.

Samples integer values representing the number of successes in a fixed number of independent Bernoulli trials, each with the same probability of success. Commonly used to model the number of successful outcomes in repeated experiments.

Parameters:

n

Number of independent trials. Must be a positive integer.

p

Probability of success on each trial. Must be between 0.0 and 1.0 (inclusive).

Attributes:

n

`required`

Number of independent trials. Must be a positive integer.

p

`required`

Probability of success on each trial. Must be between 0.0 and 1.0 (inclusive).

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

n: int = Field(...)

p: float = Field(...)

sampler_type: typing.Literal[data_designer.config.sampler_params.SamplerType]
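
The definition "number of successes in n independent Bernoulli trials" translates directly into code. The sketch below uses only the standard library; sample_binomial is a hypothetical helper and makes no claim about how the library draws its samples.

```python
import random


def sample_binomial(n: int, p: float, rng: random.Random) -> int:
    """Count successes in n independent Bernoulli(p) trials."""
    if n < 1 or not 0.0 <= p <= 1.0:
        raise ValueError("require n >= 1 and 0.0 <= p <= 1.0")
    # rng.random() lies in [0, 1), so p = 1.0 always succeeds and p = 0.0 never does.
    return sum(1 for _ in range(n) if rng.random() < p)


rng = random.Random(0)
draws = [sample_binomial(10, 0.3, rng) for _ in range(2000)]
mean_draw = sum(draws) / len(draws)  # should be close to n * p = 3.0
```

Setting n = 1 reduces this to a single Bernoulli trial that returns 0 or 1.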

class data_designer.config.sampler_params.BernoulliSamplerParams(
    /,
    **data: typing.Any
)

Bases: data_designer.config.base.ConfigBase

Parameters for sampling from a Bernoulli distribution.

Samples binary values (0 or 1) representing the outcome of a single trial with a fixed probability of success. This is the simplest discrete probability distribution, useful for modeling binary outcomes like success/failure, yes/no, or true/false.

Parameters:

p

Probability of success (sampling 1). Must be between 0.0 and 1.0 (inclusive). The probability of failure (sampling 0) is automatically 1 - p.

Attributes:

p

`required`

Probability of success (sampling 1). Must be between 0.0 and 1.0 (inclusive). The probability of failure (sampling 0) is automatically 1 - p.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

p: float = Field(...)

sampler_type: typing.Literal[data_designer.config.sampler_params.SamplerType]

class data_designer.config.sampler_params.BernoulliMixtureSamplerParams(
    /,
    **data: typing.Any
)

Bases: data_designer.config.base.ConfigBase

Parameters for sampling from a Bernoulli mixture distribution.

Combines a Bernoulli distribution with a continuous distribution, creating a mixture where each value is either 0 (with probability 1-p) or sampled from the specified distribution (with probability p). This is useful for modeling data with many zeros mixed with a continuous distribution of non-zero values.

Common use cases include modeling sparse events, zero-inflated data, or situations where an outcome either doesn’t occur (0) or follows a specific distribution when it does occur.

Parameters:

p

Probability of sampling from the mixture distribution (non-zero outcome). Must be between 0.0 and 1.0 (inclusive). With probability 1-p, the sample is 0.

dist_name

Name of the scipy.stats distribution to sample from when outcome is non-zero. Must be a valid scipy.stats distribution name (e.g., “norm”, “gamma”, “expon”).

dist_params

Parameters for the specified scipy.stats distribution.

Attributes:

p

`required`

Probability of sampling from the mixture distribution (non-zero outcome). Must be between 0.0 and 1.0 (inclusive). With probability 1-p, the sample is 0.

dist_name

`required`

Name of the scipy.stats distribution to sample from when outcome is non-zero. Must be a valid scipy.stats distribution name (e.g., “norm”, “gamma”, “expon”).

dist_params

`required`

Parameters for the specified scipy.stats distribution.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

p: float = Field(...)

dist_name: str = Field(...)

dist_params: dict = Field(...)

sampler_type: typing.Literal[data_designer.config.sampler_params.SamplerType]
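
A zero-inflated draw can be sketched as "flip a coin with probability p, then either sample or return 0". The example below is a standard-library illustration only: sample_bernoulli_mixture is a hypothetical helper, and random.expovariate stands in for a scipy.stats "expon" distribution with a scale parameter.

```python
import random


def sample_bernoulli_mixture(p: float, scale: float, rng: random.Random) -> float:
    """Return 0.0 with probability 1 - p, otherwise an exponential draw."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0.0 and 1.0")
    if rng.random() < p:
        # expovariate takes a rate, which is the reciprocal of the scale.
        return rng.expovariate(1.0 / scale)
    return 0.0


rng = random.Random(0)
draws = [sample_bernoulli_mixture(0.2, 1.5, rng) for _ in range(5000)]
zero_fraction = sum(d == 0.0 for d in draws) / len(draws)  # close to 1 - p = 0.8
```

With p = 0.2, roughly four out of five samples are exactly zero, which is the pattern seen in sparse-event data such as per-day purchase amounts.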

class data_designer.config.sampler_params.GaussianSamplerParams(
    /,
    **data: typing.Any
)

Bases: data_designer.config.base.ConfigBase

Parameters for sampling from a Gaussian (Normal) distribution.

Samples continuous values from a normal distribution characterized by its mean and standard deviation. The Gaussian distribution is one of the most commonly used probability distributions, appearing naturally in many real-world phenomena due to the Central Limit Theorem.

Parameters:

mean

Mean (center) of the Gaussian distribution. This is the expected value and the location of the distribution’s peak.

stddev

Standard deviation of the Gaussian distribution. Controls the spread or width of the distribution. Must be positive.

decimal_places

Optional number of decimal places to round sampled values to. If None, values are not rounded.

Attributes:

mean

`required`

Mean (center) of the Gaussian distribution. This is the expected value and the location of the distribution’s peak.

stddev

`required`

Standard deviation of the Gaussian distribution. Controls the spread or width of the distribution. Must be positive.

decimal_places

Optional number of decimal places to round sampled values to. If None, values are not rounded.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

mean: float = Field(...)

stddev: float = Field(...)

decimal_places: int | None = Field(...)

sampler_type: typing.Literal[data_designer.config.sampler_params.SamplerType]
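
The interaction between mean, stddev, and decimal_places can be sketched with random.gauss and round. The helper sample_gaussian is hypothetical and only models the documented parameters; the library's own sampling and rounding code may differ.

```python
import random


def sample_gaussian(mean: float, stddev: float, decimal_places=None,
                    *, rng: random.Random) -> float:
    """Draw from Normal(mean, stddev) and optionally round the result."""
    if stddev <= 0:
        raise ValueError("stddev must be positive")
    value = rng.gauss(mean, stddev)
    # decimal_places=None leaves the value unrounded.
    return value if decimal_places is None else round(value, decimal_places)


rng = random.Random(0)
heights = [sample_gaussian(170.0, 10.0, 1, rng=rng) for _ in range(5000)]
sample_mean = sum(heights) / len(heights)  # should be close to 170.0
```

Rounding is applied after sampling, so it changes the resolution of the values but not the shape of the distribution.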

class data_designer.config.sampler_params.PoissonSamplerParams(
    /,
    **data: typing.Any
)

Bases: data_designer.config.base.ConfigBase

Parameters for sampling from a Poisson distribution.

Samples non-negative integer values representing the number of events occurring in a fixed interval of time or space. The Poisson distribution is commonly used to model count data like the number of arrivals, occurrences, or events per time period.

The distribution is characterized by a single parameter (mean/rate), and both the mean and variance equal this parameter value.

Parameters:

mean

Mean number of events in the fixed interval (also called rate parameter λ). Must be positive. This represents both the expected value and the variance of the distribution.

Attributes:

mean

`required`

Mean number of events in the fixed interval (also called rate parameter λ). Must be positive. This represents both the expected value and the variance of the distribution.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

mean: float = Field(...)

sampler_type: typing.Literal[data_designer.config.sampler_params.SamplerType]
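
To make the "mean equals variance" property concrete, the sketch below draws Poisson counts with Knuth's multiplication method, a textbook algorithm chosen here for brevity. The helper sample_poisson is hypothetical, and nothing here implies that the library uses this algorithm.

```python
import math
import random


def sample_poisson(mean: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for small means."""
    if mean <= 0:
        raise ValueError("mean must be positive")
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    # Multiply uniforms until the running product drops to the threshold.
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


rng = random.Random(0)
counts = [sample_poisson(3.0, rng) for _ in range(5000)]
sample_mean = sum(counts) / len(counts)
sample_var = sum((c - sample_mean) ** 2 for c in counts) / len(counts)
```

For a rate of 3.0, both sample_mean and sample_var should come out near 3.0.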

class data_designer.config.sampler_params.UniformSamplerParams(
    /,
    **data: typing.Any
)

Bases: data_designer.config.base.ConfigBase

Parameters for sampling from a continuous Uniform distribution.

Samples continuous values uniformly from a specified range, where every value in the range has equal probability of being sampled. This is useful when all values within a range are equally likely, such as random percentages, proportions, or unbiased measurements.

Parameters:

low

Lower bound of the uniform distribution (inclusive). Can be any real number.

high

Upper bound of the uniform distribution. Must be greater than low.

decimal_places

Optional number of decimal places to round sampled values to. If None, values are not rounded and may have many decimal places.

Attributes:

low

`required`

Lower bound of the uniform distribution (inclusive). Can be any real number.

high

`required`

Upper bound of the uniform distribution. Must be greater than low.

decimal_places

Optional number of decimal places to round sampled values to. If None, values are not rounded and may have many decimal places.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

low: float = Field(...)

high: float = Field(...)

decimal_places: int | None = Field(...)

sampler_type: typing.Literal[data_designer.config.sampler_params.SamplerType]
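
A uniform draw over a range with optional rounding can be sketched as follows. The helper sample_uniform is hypothetical; it uses random.uniform, whose upper endpoint may or may not be returned depending on floating-point rounding, so it says nothing about whether the library treats high as inclusive.

```python
import random


def sample_uniform(low: float, high: float, decimal_places=None,
                   *, rng: random.Random) -> float:
    """Draw uniformly between low and high, optionally rounding."""
    if high <= low:
        raise ValueError("high must be greater than low")
    value = rng.uniform(low, high)
    return value if decimal_places is None else round(value, decimal_places)


rng = random.Random(0)
percentages = [sample_uniform(0.0, 100.0, 2, rng=rng) for _ in range(1000)]
```

Every entry in percentages lies between 0.0 and 100.0 and carries at most two decimal places.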

SexT

typing_extensions.TypeAlias

class data_designer.config.sampler_params.PersonSamplerParams(
    /,
    **data: typing.Any
)

Bases: data_designer.config.base.ConfigBase

Parameters for sampling synthetic person data with demographic attributes.

Generates realistic synthetic person data including names, addresses, phone numbers, and other demographic information from managed datasets. The sampler supports filtering by locale, sex, age, geographic location, and selected managed-dataset fields, and can optionally include synthetic persona descriptions. For Faker-generated person data, use PersonFromFakerSamplerParams.

Parameters:

locale

Locale string determining the language and geographic region for synthetic people. Must be a locale supported by a managed Nemotron Personas dataset. The dataset must be downloaded and available in the managed assets directory.

sex

If specified, filters to only sample people of the specified sex. Options: “Male” or “Female”. If None, samples both sexes.

city

If specified, filters to only sample people from the specified city or cities. Can be a single city name (string) or a list of city names.

age_range

Two-element list [min_age, max_age] specifying the age range to sample from (inclusive). Defaults to a standard age range. Both values must be between minimum and maximum allowed ages.

with_synthetic_personas

If True, appends additional synthetic persona columns including personality traits, interests, and background descriptions. Only supported for certain locales with managed datasets.

select_field_values

Optional field-value filters for managed datasets. Supported field names are checked against the managed person data fields.

Attributes:

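locale

Locale string determining the language and geographic region for synthetic people. Must be a locale supported by a managed Nemotron Personas dataset. The dataset must be downloaded and available in the managed assets directory.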

sex

If specified, filters to only sample people of the specified sex. Options: “Male” or “Female”. If None, samples both sexes.

city

If specified, filters to only sample people from the specified city or cities. Can be a single city name (string) or a list of city names.

age_range

Two-element list [min_age, max_age] specifying the age range to sample from (inclusive). Defaults to a standard age range. Both values must be between minimum and maximum allowed ages.

with_synthetic_personas

If True, appends additional synthetic persona columns including personality traits, interests, and background descriptions. Only supported for certain locales with managed datasets.

select_field_values

Optional field-value filters for managed datasets. Supported field names are checked against the managed person data fields.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

locale: str = Field(...)

sex: data_designer.config.sampler_params.SexT | None = Field(...)

city: str | list[str] | None = Field(...)

age_range: list[int] = Field(...)

select_field_values: dict[str, list[str]] | None = Field(...)

with_synthetic_personas: bool = Field(...)

sampler_type: typing.Literal[data_designer.config.sampler_params.SamplerType]

generator_kwargs: list[str]

Keyword arguments to pass to the person generator.

people_gen_key: str

_validate_age_range(value: list[int]) -> list[int]

_validate_locale_with_managed_datasets() -> typing_extensions.Self
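
The constraints stated for age_range (two elements, inclusive, within allowed bounds) can be sketched as a plain validation function. Everything named here is hypothetical: validate_age_range is an illustrative stand-in for the private validator, and the bounds 0 and 120 are placeholders because the actual minimum and maximum allowed ages are not given in this reference.

```python
# Hypothetical bounds: the real allowed minimum and maximum ages are not stated here.
HYPOTHETICAL_MIN_AGE = 0
HYPOTHETICAL_MAX_AGE = 120


def validate_age_range(value: list[int]) -> list[int]:
    """Check that value is [min_age, max_age] within the placeholder bounds."""
    if len(value) != 2:
        raise ValueError("age_range must have exactly two elements")
    min_age, max_age = value
    if not HYPOTHETICAL_MIN_AGE <= min_age <= max_age <= HYPOTHETICAL_MAX_AGE:
        raise ValueError("age_range must satisfy min <= min_age <= max_age <= max")
    return value


checked = validate_age_range([18, 65])
```

A reversed range such as [70, 30] or a list with three entries would be rejected by this sketch with a ValueError.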

class data_designer.config.sampler_params.PersonFromFakerSamplerParams(
    /,
    **data: typing.Any
)

Bases: data_designer.config.base.ConfigBase

Parameters for sampling synthetic person data with demographic attributes from Faker.

Uses the Faker library to generate random personal information. The data is basic and not demographically accurate, but is useful for quick testing, prototyping, or when realistic demographic distributions are not relevant for your use case. For demographically accurate person data, use the PersonSamplerParams sampler.

Parameters:

locale

Locale string determining the language and geographic region for synthetic people. Can be any locale supported by Faker.

sex

If specified, filters to only sample people of the specified sex. Options: “Male” or “Female”. If None, samples both sexes.

city

If specified, filters to only sample people from the specified city or cities. Can be a single city name (string) or a list of city names.

age_range

Two-element list [min_age, max_age] specifying the age range to sample from (inclusive). Defaults to a standard age range. Both values must be between the minimum and maximum allowed ages.

sampler_type

Discriminator for the sampler type. Must be SamplerType.PERSON_FROM_FAKER.

Attributes:

locale

Locale string determining the language and geographic region for synthetic people. Can be any locale supported by Faker.

sex

If specified, filters to only sample people of the specified sex. Options: “Male” or “Female”. If None, samples both sexes.

city

If specified, filters to only sample people from the specified city or cities. Can be a single city name (string) or a list of city names.

age_range

Two-element list [min_age, max_age] specifying the age range to sample from (inclusive). Defaults to a standard age range. Both values must be between the minimum and maximum allowed ages.

sampler_type

Discriminator for the sampler type. Must be SamplerType.PERSON_FROM_FAKER.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

locale: str = Field(...)

sex: data_designer.config.sampler_params.SexT | None = Field(...)

city: str | list[str] | None = Field(...)

age_range: list[int] = Field(...)

sampler_type: typing.Literal[data_designer.config.sampler_params.SamplerType]

generator_kwargs: list[str]

Keyword arguments to pass to the person generator.

people_gen_key: str

_validate_age_range(value: list[int]) -> list[int]

_validate_locale(value: str) -> str

SamplerParamsT

typing_extensions.TypeAlias

data_designer.config.sampler_params.is_numerical_sampler_type(sampler_type: data_designer.config.sampler_params.SamplerType) -> bool

Module Contents

Classes

Name	Description
`SamplerType`	str(object=”) -> str str(bytes_or_buffer[, encoding[, errors]]) -> str
`CategorySamplerParams`	Parameters for categorical sampling with optional probability weighting.
`DatetimeSamplerParams`	Parameters for uniform datetime sampling within a specified range.
`SubcategorySamplerParams`	Parameters for subcategory sampling conditioned on a parent category column.
`TimeDeltaSamplerParams`	Parameters for sampling time deltas relative to a reference datetime column.
`UUIDSamplerParams`	Parameters for generating UUID (Universally Unique Identifier) values.
`ScipySamplerParams`	Parameters for sampling from any scipy.stats continuous or discrete distribution.
`BinomialSamplerParams`	Parameters for sampling from a Binomial distribution.
`BernoulliSamplerParams`	Parameters for sampling from a Bernoulli distribution.
`BernoulliMixtureSamplerParams`	Parameters for sampling from a Bernoulli mixture distribution.
`GaussianSamplerParams`	Parameters for sampling from a Gaussian (Normal) distribution.
`PoissonSamplerParams`	Parameters for sampling from a Poisson distribution.
`UniformSamplerParams`	Parameters for sampling from a continuous Uniform distribution.
`PersonSamplerParams`	Parameters for sampling synthetic person data with demographic attributes.
`PersonFromFakerSamplerParams`	Parameters for sampling synthetic person data with demographic attributes from Faker.

Functions

Name	Description
`is_numerical_sampler_type`	None

Data

SexT SamplerParamsT

API

1 class data_designer.config.sampler_params.SamplerType

Bases: str, enum.Enum

1 BERNOULLI = bernoulli

1 BERNOULLI_MIXTURE = bernoulli_mixture

1 BINOMIAL = binomial

1 CATEGORY = category

1 DATETIME = datetime

1 GAUSSIAN = gaussian

1 PERSON = person

1 PERSON_FROM_FAKER = person_from_faker

1 POISSON = poisson

1 SCIPY = scipy

1 SUBCATEGORY = subcategory

1 TIMEDELTA = timedelta

1 UNIFORM = uniform

1 UUID = uuid

1 class data_designer.config.sampler_params.CategorySamplerParams(
2     /,
3     **data: typing.Any
4 )

Bases: data_designer.config.base.ConfigBase

Parameters for categorical sampling with optional probability weighting.

Samples values from a discrete set of categories. When weights are provided, values are sampled according to their assigned probabilities. Without weights, uniform sampling is used.

Parameters:

values

List of possible categorical values to sample from. Can contain strings, integers, or floats. Must contain at least one value.

weights

Attributes:

values

`required`

List of possible categorical values to sample from. Can contain strings, integers, or floats. Must contain at least one value.

weights

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

1 values: list[str | int | float] = Field(...)

1 weights: list[float] | None = Field(...)

1 sampler_type: typing.Literal[data_designer.config.sampler_params.SamplerType]

1 _normalize_weights_if_needed() -> typing_extensions.Self

1 _validate_equal_lengths() -> typing_extensions.Self

1 class data_designer.config.sampler_params.DatetimeSamplerParams(
2     /,
3     **data: typing.Any
4 )

Bases: data_designer.config.base.ConfigBase

Parameters for uniform datetime sampling within a specified range.

Samples datetime values uniformly between a start and end date with a specified granularity. The sampling unit determines the smallest possible time interval between consecutive samples.

Parameters:

start

Earliest possible datetime for the sampling range (inclusive). Must be a valid datetime string parseable by pandas.to_datetime().

end

Exclusive upper bound for the sampling range. Must be a valid datetime string parseable by pandas.to_datetime().

unit

Time unit for sampling granularity. Options:

“Y”: Years
“M”: Months
“D”: Days (default)
“h”: Hours
“m”: Minutes
“s”: Seconds

Attributes:

start

`required`

Earliest possible datetime for the sampling range (inclusive). Must be a valid datetime string parseable by pandas.to_datetime().

end

`required`

Exclusive upper bound for the sampling range. Must be a valid datetime string parseable by pandas.to_datetime().

unit

Time unit for sampling granularity. Options:

“Y”: Years
“M”: Months
“D”: Days (default)
“h”: Hours
“m”: Minutes
“s”: Seconds

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

start: str = Field(...)

end: str = Field(...)

unit: typing.Literal["Y", "M", "D", "h", "m", "s"] = Field(...)

sampler_type: typing.Literal[data_designer.config.sampler_params.SamplerType]

_validate_param_is_datetime(value: str) -> str
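To make the inclusive start, exclusive end, and unit granularity concrete, here is a rough standard-library sketch. It is an assumption-laden illustration rather than the library's code: `sample_datetimes` is a hypothetical helper, `datetime.fromisoformat` stands in for pandas.to_datetime(), and only the fixed-length units ("D", "h", "m", "s") are handled.

```python
import random
from datetime import datetime, timedelta

# Seconds per unit. "Y" and "M" are left out of this sketch because
# calendar years and months do not have a fixed length.
UNIT_SECONDS = {"D": 86400, "h": 3600, "m": 60, "s": 1}


def sample_datetimes(start, end, unit="D", n=1, seed=None):
    """Draw n datetimes on a unit-spaced grid in [start, end)."""
    start_dt = datetime.fromisoformat(start)
    end_dt = datetime.fromisoformat(end)
    step = UNIT_SECONDS[unit]
    n_steps = int((end_dt - start_dt).total_seconds() // step)
    if n_steps < 1:
        raise ValueError("end must be at least one unit after start")
    rng = random.Random(seed)
    # randrange(n_steps) yields 0 .. n_steps - 1, so end is never reached.
    return [
        start_dt + timedelta(seconds=step * rng.randrange(n_steps))
        for _ in range(n)
    ]


if __name__ == "__main__":
    for dt in sample_datetimes("2024-01-01", "2024-01-08", unit="D", n=3, seed=1):
        print(dt.isoformat())
```

With unit="D" every sample falls on a day boundary; with unit="h" samples can differ by as little as one hour.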

class data_designer.config.sampler_params.SubcategorySamplerParams(
    /,
    **data: typing.Any
)

Bases: data_designer.config.base.ConfigBase

Parameters for subcategory sampling conditioned on a parent category column.

Parameters:

category

Name of the parent category column that this subcategory depends on. The parent column must be generated before this subcategory column.

values

Mapping from each parent category value to a list of possible subcategory values. Each key must correspond to a value that appears in the parent category column.

Attributes:

category

`required`

Name of the parent category column that this subcategory depends on. The parent column must be generated before this subcategory column.

values

`required`

Mapping from each parent category value to a list of possible subcategory values. Each key must correspond to a value that appears in the parent category column.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

category: str = Field(...)

values: dict[str, list[str | int | float]] = Field(...)

sampler_type: typing.Literal[data_designer.config.sampler_params.SamplerType]
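The parent-to-subcategory mapping can be illustrated with a small sketch. `sample_subcategories` is a hypothetical helper, and the uniform choice among the listed subcategory values is an assumption, since the reference above does not state how a value is picked.

```python
import random


def sample_subcategories(parent_values, mapping, seed=None):
    """For each parent value, pick one subcategory from mapping[parent]."""
    rng = random.Random(seed)
    result = []
    for parent in parent_values:
        if parent not in mapping:
            raise KeyError(f"no subcategory values for parent {parent!r}")
        # Assumption: choose uniformly among the listed subcategory values.
        result.append(rng.choice(mapping[parent]))
    return result


if __name__ == "__main__":
    mapping = {"fruit": ["apple", "pear"], "vegetable": ["kale", "leek"]}
    parents = ["fruit", "vegetable", "fruit"]
    print(sample_subcategories(parents, mapping, seed=0))
```

The parent values are consumed row by row, which is why the parent column has to exist before the subcategory column is generated.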

class data_designer.config.sampler_params.TimeDeltaSamplerParams(
    /,
    **data: typing.Any
)

Bases: data_designer.config.base.ConfigBase

Parameters for sampling time deltas relative to a reference datetime column.

Years and months are not supported as timedelta units because they have variable lengths. See: pandas timedelta documentation

Parameters:

dt_min

Minimum time-delta value (inclusive). Must be non-negative and less than dt_max. Specified in units defined by the unit parameter.

dt_max

Maximum time-delta value (exclusive). Must be positive and greater than dt_min. Specified in units defined by the unit parameter.

reference_column_name

Name of an existing datetime column to add the time-delta to. This column must be generated before the timedelta column.

unit

Time unit for the delta values. Options:

“D”: Days (default)
“h”: Hours
“m”: Minutes
“s”: Seconds

Attributes:

dt_min

`required`

Minimum time-delta value (inclusive). Must be non-negative and less than dt_max. Specified in units defined by the unit parameter.

dt_max

`required`

Maximum time-delta value (exclusive). Must be positive and greater than dt_min. Specified in units defined by the unit parameter.

reference_column_name

`required`

Name of an existing datetime column to add the time-delta to. This column must be generated before the timedelta column.

unit

Time unit for the delta values. Options:

“D”: Days (default)
“h”: Hours
“m”: Minutes
“s”: Seconds

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

dt_min: int = Field(...)

dt_max: int = Field(...)

reference_column_name: str = Field(...)

unit: typing.Literal["D", "h", "m", "s"] = Field(...)

sampler_type: typing.Literal[data_designer.config.sampler_params.SamplerType]

_validate_min_less_than_max() -> typing_extensions.Self
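The relationship between dt_min, dt_max, unit, and the reference column can be sketched as follows. This is only an illustration under stated assumptions: `add_time_deltas` is a hypothetical helper, the reference column is modelled as a plain list of datetime objects, and deltas are drawn as whole numbers of units because dt_min and dt_max are typed as int.

```python
import random
from datetime import datetime, timedelta

# Map the documented unit codes to datetime.timedelta keyword names.
UNIT_KWARG = {"D": "days", "h": "hours", "m": "minutes", "s": "seconds"}


def add_time_deltas(reference, dt_min, dt_max, unit="D", seed=None):
    """Add a random delta in [dt_min, dt_max) units to each reference datetime."""
    if dt_min < 0 or dt_min >= dt_max:
        raise ValueError("require 0 <= dt_min < dt_max")
    rng = random.Random(seed)
    key = UNIT_KWARG[unit]
    # randrange excludes dt_max, matching the exclusive upper bound.
    return [ref + timedelta(**{key: rng.randrange(dt_min, dt_max)}) for ref in reference]


if __name__ == "__main__":
    orders = [datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 2, 14, 30)]
    for shipped in add_time_deltas(orders, dt_min=1, dt_max=5, unit="D", seed=0):
        print(shipped.isoformat())
```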

class data_designer.config.sampler_params.UUIDSamplerParams(
    /,
    **data: typing.Any
)

Bases: data_designer.config.base.ConfigBase

Parameters for generating UUID (Universally Unique Identifier) values.

Generates UUID4 (random) identifiers with optional formatting options. UUIDs are useful for creating unique identifiers for records, entities, or transactions.

Parameters:

prefix

Optional string to prepend to each UUID. Useful for creating namespaced or typed identifiers (e.g., “user-”, “order-”, “txn-”).

short_form

If True, truncates UUIDs to 8 characters (first segment only). Default is False for full 32-character UUIDs (excluding hyphens).

uppercase

If True, converts all hexadecimal letters to uppercase. Default is False for lowercase UUIDs.

Attributes:

prefix

Optional string to prepend to each UUID. Useful for creating namespaced or typed identifiers (e.g., “user-”, “order-”, “txn-”).

short_form

If True, truncates UUIDs to 8 characters (first segment only). Default is False for full 32-character UUIDs (excluding hyphens).

uppercase

If True, converts all hexadecimal letters to uppercase. Default is False for lowercase UUIDs.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

prefix: str | None = Field(...)

short_form: bool = Field(...)

uppercase: bool = Field(...)

sampler_type: typing.Literal[data_designer.config.sampler_params.SamplerType]

last_index: int
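The three formatting options combine as in the sketch below, which uses Python's `uuid` module. `make_uuid` is a hypothetical helper; applying `uppercase` to the hexadecimal part only, and not to the prefix, is an assumption.

```python
import uuid


def make_uuid(prefix=None, short_form=False, uppercase=False):
    """Build a UUID4-based identifier with optional prefix and formatting."""
    hex_id = uuid.uuid4().hex  # 32 hexadecimal characters, no hyphens
    if short_form:
        hex_id = hex_id[:8]  # keep only the first segment
    if uppercase:
        hex_id = hex_id.upper()  # assumption: the prefix is left untouched
    return f"{prefix or ''}{hex_id}"


if __name__ == "__main__":
    print(make_uuid())
    print(make_uuid(prefix="order-", short_form=True, uppercase=True))
```

Short-form identifiers carry only 32 bits of randomness, so collisions become plausible in large datasets.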

class data_designer.config.sampler_params.ScipySamplerParams(
    /,
    **data: typing.Any
)

Bases: data_designer.config.base.ConfigBase

Parameters for sampling from any scipy.stats continuous or discrete distribution.

See: scipy.stats documentation

Parameters:

dist_name

Name of the scipy.stats distribution to sample from (e.g., “beta”, “gamma”, “lognorm”, “expon”). Must be a valid distribution name from scipy.stats.

dist_params

Dictionary of parameters for the specified scipy.stats distribution.

decimal_places

Optional number of decimal places to round sampled values to. If None, values are not rounded.

Attributes:

dist_name

`required`

Name of the scipy.stats distribution to sample from (e.g., “beta”, “gamma”, “lognorm”, “expon”). Must be a valid distribution name from scipy.stats.

dist_params

`required`

Dictionary of parameters for the specified scipy.stats distribution.

decimal_places

Optional number of decimal places to round sampled values to. If None, values are not rounded.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

dist_name: str = Field(...)

dist_params: dict = Field(...)

decimal_places: int | None = Field(...)

sampler_type: typing.Literal[data_designer.config.sampler_params.SamplerType]

class data_designer.config.sampler_params.BinomialSamplerParams(
    /,
    **data: typing.Any
)

Bases: data_designer.config.base.ConfigBase

Parameters for sampling from a Binomial distribution.

Parameters:

n

Number of independent trials. Must be a positive integer.

p

Probability of success on each trial. Must be between 0.0 and 1.0 (inclusive).

Attributes:

n

`required`

Number of independent trials. Must be a positive integer.

p

`required`

Probability of success on each trial. Must be between 0.0 and 1.0 (inclusive).

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

n: int = Field(...)

p: float = Field(...)

sampler_type: typing.Literal[data_designer.config.sampler_params.SamplerType]
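A Binomial draw with parameters n and p is the number of successes in n independent trials. The sketch below shows that definition directly; `sample_binomial` is a hypothetical helper, and the trial-by-trial loop is chosen for clarity and is not claimed to be how the library samples.

```python
import random


def sample_binomial(n, p, seed=None):
    """Count successes in n independent trials with success probability p."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0.0 and 1.0")
    rng = random.Random(seed)
    # rng.random() is in [0.0, 1.0), so p = 0.0 never succeeds
    # and p = 1.0 always succeeds.
    return sum(1 for _ in range(n) if rng.random() < p)


if __name__ == "__main__":
    print([sample_binomial(10, 0.3, seed=s) for s in range(5)])
```

Setting n = 1 reduces this to a single Bernoulli trial.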

class data_designer.config.sampler_params.BernoulliSamplerParams(
    /,
    **data: typing.Any
)

Bases: data_designer.config.base.ConfigBase

Parameters for sampling from a Bernoulli distribution.

Parameters:

p

Probability of success (sampling 1). Must be between 0.0 and 1.0 (inclusive). The probability of failure (sampling 0) is automatically 1 - p.

Attributes:

p

`required`

Probability of success (sampling 1). Must be between 0.0 and 1.0 (inclusive). The probability of failure (sampling 0) is automatically 1 - p.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

p: float = Field(...)

sampler_type: typing.Literal[data_designer.config.sampler_params.SamplerType]

class data_designer.config.sampler_params.BernoulliMixtureSamplerParams(
    /,
    **data: typing.Any
)

Bases: data_designer.config.base.ConfigBase

Parameters for sampling from a Bernoulli mixture distribution.

Common use cases include modeling sparse events, zero-inflated data, or situations where an outcome either doesn’t occur (0) or follows a specific distribution when it does occur.

Parameters:

p

Probability of sampling from the mixture distribution (non-zero outcome). Must be between 0.0 and 1.0 (inclusive). With probability 1 - p, the sample is 0.

dist_name

Name of the scipy.stats distribution to sample from when outcome is non-zero. Must be a valid scipy.stats distribution name (e.g., “norm”, “gamma”, “expon”).

dist_params

Parameters for the specified scipy.stats distribution.

Attributes:

p

`required`

Probability of sampling from the mixture distribution (non-zero outcome). Must be between 0.0 and 1.0 (inclusive). With probability 1 - p, the sample is 0.

dist_name

`required`

Name of the scipy.stats distribution to sample from when outcome is non-zero. Must be a valid scipy.stats distribution name (e.g., “norm”, “gamma”, “expon”).

dist_params

`required`

Parameters for the specified scipy.stats distribution.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

p: float = Field(...)

dist_name: str = Field(...)

dist_params: dict = Field(...)

sampler_type: typing.Literal[data_designer.config.sampler_params.SamplerType]
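The zero-inflated behaviour can be sketched without scipy. In the snippet below, `sample_bernoulli_mixture` and `draw_exponential` are hypothetical names, and an exponential draw from the standard library is swapped in for the scipy.stats distribution that dist_name and dist_params would select.

```python
import random


def draw_exponential(rng):
    """Stand-in for the configured distribution: exponential with rate 1.0."""
    return rng.expovariate(1.0)


def sample_bernoulli_mixture(p, draw_nonzero, n=1, seed=None):
    """Return n samples: a draw from draw_nonzero with probability p, else 0."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0.0 and 1.0")
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        if rng.random() < p:
            samples.append(draw_nonzero(rng))  # outcome occurs
        else:
            samples.append(0.0)  # outcome does not occur
    return samples


if __name__ == "__main__":
    print(sample_bernoulli_mixture(0.3, draw_exponential, n=8, seed=0))
```

With p = 0.3, roughly 70% of the samples are exactly zero and the rest follow the stand-in distribution.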

class data_designer.config.sampler_params.GaussianSamplerParams(
    /,
    **data: typing.Any
)

Bases: data_designer.config.base.ConfigBase

Parameters for sampling from a Gaussian (Normal) distribution.

Parameters:

mean

Mean (center) of the Gaussian distribution. This is the expected value and the location of the distribution’s peak.

stddev

Standard deviation of the Gaussian distribution. Controls the spread or width of the distribution. Must be positive.

decimal_places

Optional number of decimal places to round sampled values to. If None, values are not rounded.

Attributes:

mean

`required`

Mean (center) of the Gaussian distribution. This is the expected value and the location of the distribution’s peak.

stddev

`required`

Standard deviation of the Gaussian distribution. Controls the spread or width of the distribution. Must be positive.

decimal_places

Optional number of decimal places to round sampled values to. If None, values are not rounded.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

mean: float = Field(...)

stddev: float = Field(...)

decimal_places: int | None = Field(...)

sampler_type: typing.Literal[data_designer.config.sampler_params.SamplerType]
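How mean, stddev, and decimal_places interact can be shown with `random.gauss` from the standard library. `sample_gaussian` is a hypothetical helper, and rounding with Python's built-in `round` is an assumption about how decimal_places would be applied.

```python
import random


def sample_gaussian(mean, stddev, decimal_places=None, n=1, seed=None):
    """Draw n values from a normal distribution, optionally rounded."""
    if stddev <= 0:
        raise ValueError("stddev must be positive")
    rng = random.Random(seed)
    draws = [rng.gauss(mean, stddev) for _ in range(n)]
    if decimal_places is not None:
        draws = [round(x, decimal_places) for x in draws]
    return draws


if __name__ == "__main__":
    print(sample_gaussian(mean=170.0, stddev=8.0, decimal_places=1, n=5, seed=0))
```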

class data_designer.config.sampler_params.PoissonSamplerParams(
    /,
    **data: typing.Any
)

Bases: data_designer.config.base.ConfigBase

Parameters for sampling from a Poisson distribution.

The distribution is characterized by a single parameter (mean/rate), and both the mean and variance equal this parameter value.

Parameters:

mean

Mean number of events in the fixed interval (also called rate parameter λ). Must be positive. This represents both the expected value and the variance of the distribution.

Attributes:

mean

`required`

Mean number of events in the fixed interval (also called rate parameter λ). Must be positive. This represents both the expected value and the variance of the distribution.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

mean: float = Field(...)

sampler_type: typing.Literal[data_designer.config.sampler_params.SamplerType]
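The claim that the mean and the variance both equal the rate parameter can be checked empirically. The sketch below uses Knuth's multiplication method, picked here because it needs only the standard library; it is not claimed to be the algorithm the library uses, and `sample_poisson` is a hypothetical helper.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw one Poisson-distributed integer using Knuth's method."""
    if mean <= 0:
        raise ValueError("mean must be positive")
    threshold = math.exp(-mean)
    count = 0
    product = rng.random()
    # Multiply uniform draws until the product drops to exp(-mean) or below.
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_poisson(4.0, rng) for _ in range(20000)]
    sample_mean = sum(draws) / len(draws)
    sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)
    print(f"mean={sample_mean:.2f} variance={sample_var:.2f}")
```

Knuth's method takes time proportional to the mean, so it suits small rates only.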

class data_designer.config.sampler_params.UniformSamplerParams(
    /,
    **data: typing.Any
)

Bases: data_designer.config.base.ConfigBase

Parameters for sampling from a continuous Uniform distribution.

Parameters:

low

Lower bound of the uniform distribution (inclusive). Can be any real number.

high

Upper bound of the uniform distribution. Must be greater than low.

decimal_places

Optional number of decimal places to round sampled values to. If None, values are not rounded and may have many decimal places.

Attributes:

low

`required`

Lower bound of the uniform distribution (inclusive). Can be any real number.

high

`required`

Upper bound of the uniform distribution. Must be greater than low.

decimal_places

Optional number of decimal places to round sampled values to. If None, values are not rounded and may have many decimal places.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

low: float = Field(...)

high: float = Field(...)

decimal_places: int | None = Field(...)

sampler_type: typing.Literal[data_designer.config.sampler_params.SamplerType]
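The effect of decimal_places on uniform samples is easiest to see side by side. The sketch below relies on `random.uniform` from the standard library; `sample_uniform` is a hypothetical helper, and the use of the built-in `round` is an assumption.

```python
import random


def sample_uniform(low, high, decimal_places=None, n=1, seed=None):
    """Draw n values between low and high, optionally rounded."""
    if high <= low:
        raise ValueError("high must be greater than low")
    rng = random.Random(seed)
    draws = [rng.uniform(low, high) for _ in range(n)]
    if decimal_places is not None:
        draws = [round(x, decimal_places) for x in draws]
    return draws


if __name__ == "__main__":
    print(sample_uniform(0.0, 100.0, n=3, seed=0))                    # unrounded
    print(sample_uniform(0.0, 100.0, decimal_places=2, n=3, seed=0))  # rounded
```

Both calls use the same seed, so the second line shows the first line's values rounded to two decimal places.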

SexT

typing_extensions.TypeAlias

class data_designer.config.sampler_params.PersonSamplerParams(
    /,
    **data: typing.Any
)

Bases: data_designer.config.base.ConfigBase

Parameters for sampling synthetic person data with demographic attributes.

Parameters:

locale

Locale string determining the language and geographic region for synthetic people.

sex

If specified, filters to only sample people of the specified sex. Options: “Male” or “Female”. If None, samples both sexes.

city

If specified, filters to only sample people from the specified city or cities. Can be a single city name (string) or a list of city names.

age_range

Two-element list [min_age, max_age] specifying the age range to sample from (inclusive). Defaults to a standard age range. Both values must be between the minimum and maximum allowed ages.

with_synthetic_personas

If True, appends additional synthetic persona columns including personality traits, interests, and background descriptions. Only supported for certain locales with managed datasets.

select_field_values

Optional field-value filters for managed datasets. Supported field names are checked against the managed person data fields.

Attributes:

locale

Locale string determining the language and geographic region for synthetic people.

sex

If specified, filters to only sample people of the specified sex. Options: “Male” or “Female”. If None, samples both sexes.

city

If specified, filters to only sample people from the specified city or cities. Can be a single city name (string) or a list of city names.

age_range

Two-element list [min_age, max_age] specifying the age range to sample from (inclusive). Defaults to a standard age range. Both values must be between the minimum and maximum allowed ages.

with_synthetic_personas

If True, appends additional synthetic persona columns including personality traits, interests, and background descriptions. Only supported for certain locales with managed datasets.

select_field_values

Optional field-value filters for managed datasets. Supported field names are checked against the managed person data fields.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

locale: str = Field(...)

sex: data_designer.config.sampler_params.SexT | None = Field(...)

city: str | list[str] | None = Field(...)

age_range: list[int] = Field(...)

select_field_values: dict[str, list[str]] | None = Field(...)

with_synthetic_personas: bool = Field(...)

sampler_type: typing.Literal[data_designer.config.sampler_params.SamplerType]

generator_kwargs: list[str]

Keyword arguments to pass to the person generator.

people_gen_key: str

_validate_age_range(value: list[int]) -> list[int]

_validate_locale_with_managed_datasets() -> typing_extensions.Self
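The age_range constraints can be expressed as a small check. Everything specific in this sketch is an assumption: `validate_age_range` is a hypothetical stand-in for `_validate_age_range`, the limits `MIN_AGE` and `MAX_AGE` are placeholder values because the reference above does not state the allowed ages, and rejecting min_age > max_age is a guess at the intended ordering rule.

```python
MIN_AGE = 0    # hypothetical lower limit, not taken from the library
MAX_AGE = 114  # hypothetical upper limit, not taken from the library


def validate_age_range(value):
    """Check that value is [min_age, max_age] within the allowed limits."""
    if len(value) != 2:
        raise ValueError("age_range must contain exactly two elements")
    min_age, max_age = value
    for age in (min_age, max_age):
        if not MIN_AGE <= age <= MAX_AGE:
            raise ValueError(f"age {age} is outside [{MIN_AGE}, {MAX_AGE}]")
    if min_age > max_age:
        raise ValueError("min_age must not exceed max_age")
    return value


if __name__ == "__main__":
    print(validate_age_range([18, 65]))
```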

class data_designer.config.sampler_params.PersonFromFakerSamplerParams(
    /,
    **data: typing.Any
)

Bases: data_designer.config.base.ConfigBase

Parameters for sampling synthetic person data with demographic attributes from Faker.

Parameters:

locale

Locale string determining the language and geographic region for synthetic people. Can be any locale supported by Faker.

sex

If specified, filters to only sample people of the specified sex. Options: “Male” or “Female”. If None, samples both sexes.

city

If specified, filters to only sample people from the specified city or cities. Can be a single city name (string) or a list of city names.

age_range

Two-element list [min_age, max_age] specifying the age range to sample from (inclusive). Defaults to a standard age range. Both values must be between the minimum and maximum allowed ages.

sampler_type

Discriminator for the sampler type. Must be SamplerType.PERSON_FROM_FAKER.

Attributes:

locale

Locale string determining the language and geographic region for synthetic people. Can be any locale supported by Faker.

sex

If specified, filters to only sample people of the specified sex. Options: “Male” or “Female”. If None, samples both sexes.

city

If specified, filters to only sample people from the specified city or cities. Can be a single city name (string) or a list of city names.

age_range

Two-element list [min_age, max_age] specifying the age range to sample from (inclusive). Defaults to a standard age range. Both values must be between the minimum and maximum allowed ages.

sampler_type

Discriminator for the sampler type. Must be SamplerType.PERSON_FROM_FAKER.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

locale: str = Field(...)

sex: data_designer.config.sampler_params.SexT | None = Field(...)

city: str | list[str] | None = Field(...)

age_range: list[int] = Field(...)

sampler_type: typing.Literal[data_designer.config.sampler_params.SamplerType]

generator_kwargs: list[str]

Keyword arguments to pass to the person generator.

people_gen_key: str

_validate_age_range(value: list[int]) -> list[int]

_validate_locale(value: str) -> str

SamplerParamsT

typing_extensions.TypeAlias

data_designer.config.sampler_params.is_numerical_sampler_type(sampler_type: data_designer.config.sampler_params.SamplerType) -> bool
