Add Constraints to Columns
Column Constraints in Data Designer
Data Designer allows you to apply constraints to columns, ensuring that generated values meet specific criteria. This guide explains the types of constraints available and how to use them.
Overview of Column Constraints
Constraints are rules applied to columns that restrict the range or type of values they can contain. They are particularly useful for:
Ensuring numerical values stay within specific ranges
Enforcing relationships between columns
Validating that generated data meets business rules
The primary way to establish constraints in Data Designer is the add_constraint method, which defines explicit constraint rules.
Important
Constraints Work Only with Sampling-Based Columns
Constraints in Data Designer are currently only supported for sampling-based columns. Constraints do not work with expression or LLM-generated columns.
For non-numerical columns or complex logic, use conditional logic in Jinja-templated expressions instead.
For a list of supported sampler types, see the Sampler Reference Table.
Adding Explicit Constraints
To add a constraint to a column, use the add_constraint method:
config_builder.add_constraint(
    target_column="column_name",
    type="scalar_inequality",
    params={"operator": "ge", "rhs": 18}
)
Scalar Inequality Constraints
These constraints enforce that a column’s values meet an inequality comparison with a fixed scalar value:
# Ensure 'age' is at least 18
config_builder.add_constraint(
    target_column="age",
    type="scalar_inequality",
    params={"operator": "ge", "rhs": 18}
)
# Ensure 'price' is less than 1000
config_builder.add_constraint(
    target_column="price",
    type="scalar_inequality",
    params={"operator": "lt", "rhs": 1000}
)
Supported operators
"gt": Greater than
"ge": Greater than or equal to
"lt": Less than
"le": Less than or equal to
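To make the semantics of the four operator strings concrete, here is a minimal sketch that maps each one to the matching function in Python's standard `operator` module. The `OPERATORS` table and the `satisfies` helper are hypothetical names introduced for illustration; they are not part of Data Designer, and this is not a description of how the library evaluates constraints internally.

```python
import operator

# Hypothetical lookup table (not a Data Designer API): maps the operator
# strings listed above to Python's built-in comparison functions.
OPERATORS = {
    "gt": operator.gt,  # value >  rhs
    "ge": operator.ge,  # value >= rhs
    "lt": operator.lt,  # value <  rhs
    "le": operator.le,  # value <= rhs
}


def satisfies(value, op, rhs):
    """Return True if `value <op> rhs` holds for one of the four operator strings."""
    return OPERATORS[op](value, rhs)


print(satisfies(18, "ge", 18))      # True: 18 >= 18
print(satisfies(1000, "lt", 1000))  # False: 1000 is not strictly less than 1000
```

The boundary cases are the ones worth checking: "ge" and "le" accept a value equal to rhs, while "gt" and "lt" reject it.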
Column Inequality Constraints
These constraints enforce relationships between two columns:
# Ensure 'end_date' is after 'start_date'
config_builder.add_constraint(
    target_column="end_date",
    type="column_inequality",
    params={"operator": "gt", "rhs": "start_date"}
)
# Ensure 'discount_price' is less than 'original_price'
config_builder.add_constraint(
    target_column="discount_price",
    type="column_inequality",
    params={"operator": "lt", "rhs": "original_price"}
)
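One way to sanity-check generated output against constraints like these is a small row-level validator. The sketch below assumes the generated records are available as a list of plain dictionaries and that each constraint is described with the same keys passed to add_constraint (target_column, type, params). The `find_violations` function is a hypothetical helper written for this guide, not a Data Designer function.

```python
import operator

OPERATORS = {"gt": operator.gt, "ge": operator.ge, "lt": operator.lt, "le": operator.le}


def find_violations(rows, constraints):
    """Return (row_index, target_column) pairs for rows that break a constraint.

    Hypothetical helper: `rows` is a list of dicts and `constraints` is a list
    of dicts using the keys shown in this guide (target_column, type, params).
    """
    violations = []
    for i, row in enumerate(rows):
        for c in constraints:
            rhs = c["params"]["rhs"]
            if c["type"] == "column_inequality":
                rhs = row[rhs]  # rhs names another column in the same row
            compare = OPERATORS[c["params"]["operator"]]
            if not compare(row[c["target_column"]], rhs):
                violations.append((i, c["target_column"]))
    return violations


rows = [
    {"original_price": 100.0, "discount_price": 80.0},
    {"original_price": 50.0, "discount_price": 60.0},  # discount exceeds original
]
constraints = [
    {
        "target_column": "discount_price",
        "type": "column_inequality",
        "params": {"operator": "lt", "rhs": "original_price"},
    }
]
print(find_violations(rows, constraints))  # [(1, 'discount_price')]
```

The only difference between the two constraint types in this sketch is where the right-hand side comes from: a fixed scalar, or the value of another column in the same row.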
Practical Examples
Age Constraints for Different Customer Segments
# Add customer segment column
config_builder.add_column(
    name="customer_segment",
    type="category",
    params={"values": ["Youth", "Adult", "Senior"]}
)
# Add age column based on gaussian distribution
config_builder.add_column(
    name="age",
    type="gaussian",
    params={"mean": 40, "stddev": 15},
    convert_to="int"
)
# Enforce a minimum age; this applies to every row, regardless of segment
config_builder.add_constraint(
    target_column="age",
    type="scalar_inequality",
    params={"operator": "ge", "rhs": 18}  # Minimum age of 18
)
Financial Data with Multiple Constraints
# Base income column
config_builder.add_column(
    name="annual_income",
    type="gaussian",
    params={"mean": 65000, "stddev": 25000},
    convert_to="int"
)
# Ensure income is realistic
config_builder.add_constraint(
    target_column="annual_income",
    type="scalar_inequality",
    params={"operator": "ge", "rhs": 20000}  # Minimum income
)
config_builder.add_constraint(
    target_column="annual_income",
    type="scalar_inequality",
    params={"operator": "le", "rhs": 500000}  # Maximum income
)
# Credit score sampled uniformly (independent of income)
config_builder.add_column(
    name="credit_score",
    type="uniform",
    params={"low": 300, "high": 850},
    convert_to="int"
)
# Ensure credit score is within valid range
config_builder.add_constraint(
    target_column="credit_score",
    type="scalar_inequality",
    params={"operator": "ge", "rhs": 300}
)
config_builder.add_constraint(
    target_column="credit_score",
    type="scalar_inequality",
    params={"operator": "le", "rhs": 850}
)
Date Range Constraints
# Add order date column
config_builder.add_column(
    name="order_date",
    type="datetime",
    # dates will be sampled from the range specified
    params={"start": "2023-01-01", "end": "2023-12-31"}
)
# Add delivery date column
config_builder.add_column(
    name="delivery_date",
    type="datetime",
    params={"start": "2023-01-01", "end": "2024-12-31"}
)
# Ensure the delivery date is after the order date
config_builder.add_constraint(
    target_column="delivery_date",
    type="column_inequality",
    params={"operator": "gt", "rhs": "order_date"}
)
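To illustrate what a "gt" column inequality means for a pair of date columns, the following standalone sketch samples an order date and then redraws the delivery date until it falls after it. It uses only Python's standard library and the date ranges from the example above. The function names are hypothetical, and the resampling strategy is an assumption made for illustration, not a description of Data Designer's internals.

```python
import random
from datetime import date, timedelta


def sample_date(rng, start, end):
    """Draw a date uniformly from the inclusive range [start, end]."""
    return start + timedelta(days=rng.randint(0, (end - start).days))


def sample_order_and_delivery(rng, max_attempts=1000):
    """Sample an order date, then redraw the delivery date until it is later."""
    order = sample_date(rng, date(2023, 1, 1), date(2023, 12, 31))
    for _ in range(max_attempts):
        delivery = sample_date(rng, date(2023, 1, 1), date(2024, 12, 31))
        if delivery > order:  # the "gt" column inequality
            return order, delivery
    raise RuntimeError("no valid delivery date found within max_attempts")


rng = random.Random(0)  # fixed seed so the sketch is reproducible
pairs = [sample_order_and_delivery(rng) for _ in range(5)]
for order, delivery in pairs:
    print(order, "->", delivery)
```

Because the delivery range extends a full year beyond the order range, at least about half of the candidate delivery dates are valid for any order date, so few redraws are needed here. Narrowing the delivery range toward the order range would raise the rejection rate.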
Inventory Management Example
# Add inventory level column
config_builder.add_column(
    name="inventory_level",
    type="poisson",
    params={"mean": 100}
)
# Add reorder threshold column
config_builder.add_column(
    name="reorder_threshold",
    type="gaussian",
    params={"mean": 25, "stddev": 5},
    convert_to="int"
)
# Add reorder amount column
config_builder.add_column(
    name="reorder_amount",
    type="gaussian",
    params={"mean": 50, "stddev": 10},
    convert_to="int"
)
# Ensure reorder threshold is positive
config_builder.add_constraint(
    target_column="reorder_threshold",
    type="scalar_inequality",
    params={"operator": "gt", "rhs": 0}
)
# Ensure reorder amount is positive
config_builder.add_constraint(
    target_column="reorder_amount",
    type="scalar_inequality",
    params={"operator": "gt", "rhs": 0}
)
Limitations
Sampler-based columns only: Constraints are only supported for sampling-based columns.
Single-column constraints: Each constraint applies to one target column, though column inequality constraints can reference other columns.
Rejection sampling: Restrictive constraints may require many sampling iterations to satisfy, which can slow generation.
No complex logic: For advanced conditional logic, use Jinja templates in LLM prompts instead.
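The cost of rejection sampling can be seen in a generic, standalone sketch. The code below draws Gaussian values with Python's `random` module and keeps only those at or above a lower bound, counting how many draws are needed. It is a textbook rejection sampler written for illustration under that assumption; it does not reproduce Data Designer's actual sampling code, and `rejection_sample` is a hypothetical name.

```python
import random


def rejection_sample(rng, n, mean, stddev, lower, max_attempts=1_000_000):
    """Draw n Gaussian values >= lower; return them with the number of draws used."""
    accepted, attempts = [], 0
    while len(accepted) < n:
        if attempts >= max_attempts:
            raise RuntimeError("constraint too restrictive for this distribution")
        attempts += 1
        value = rng.gauss(mean, stddev)
        if value >= lower:  # keep the draw only if it satisfies the constraint
            accepted.append(value)
    return accepted, attempts


rng = random.Random(42)  # fixed seed so the comparison is reproducible
_, loose = rejection_sample(rng, 1000, mean=40, stddev=15, lower=18)
_, tight = rejection_sample(rng, 1000, mean=40, stddev=15, lower=70)
print(f"lower=18: {loose} draws, lower=70: {tight} draws")
```

With a mean of 40 and a standard deviation of 15, a bound of 18 rejects only a small share of draws, while a bound of 70 sits two standard deviations above the mean and rejects the large majority, so the second call needs many times more draws for the same number of rows.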
Best Practices
Start simple: Begin with basic scalar constraints before adding complex column relationships
Consider performance: Very restrictive constraints may require many rejection sampling iterations
Use realistic ranges: Ensure constraint values align with your data distribution parameters
Handle edge cases: For complex business logic, combine simple constraints with Jinja templates in LLM prompts
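To check whether constraint bounds are realistic for a Gaussian column before generating data, you can estimate the share of samples that would be accepted. The sketch below uses `statistics.NormalDist` from Python's standard library with the parameters from the examples in this guide. It assumes a plain normal distribution and ignores the effect of convert_to="int"; `acceptance_rate` is a hypothetical helper, not a Data Designer function.

```python
from statistics import NormalDist


def acceptance_rate(mean, stddev, lower=None, upper=None):
    """Probability that a Normal(mean, stddev) draw falls within [lower, upper]."""
    dist = NormalDist(mu=mean, sigma=stddev)
    below_lower = dist.cdf(lower) if lower is not None else 0.0
    below_upper = dist.cdf(upper) if upper is not None else 1.0
    return below_upper - below_lower


# Parameters taken from the annual_income and age examples in this guide
income = acceptance_rate(65000, 25000, lower=20000, upper=500000)
age = acceptance_rate(40, 15, lower=18)
print(f"annual_income: {income:.1%}, age: {age:.1%}")
```

Both examples accept well over 90% of draws, so the constraints add little overhead. A rate close to zero signals that the bounds and the distribution parameters disagree, and adjusting the mean or standard deviation is usually a better fix than relying on repeated resampling.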