Add Constraints to Columns

Column Constraints in Data Designer

Data Designer allows you to apply constraints to columns, ensuring that generated values meet specific criteria. This guide explains the types of constraints available and how to use them.

Overview of Column Constraints

Constraints are rules applied to columns that restrict the range or type of values they can contain. They are particularly useful for:

  • Ensuring numerical values stay within specific ranges

  • Enforcing relationships between columns

  • Validating that generated data meets business rules

The primary way to define constraints in Data Designer is the add_constraint method, which attaches an explicit rule to a single target column.

Important

Constraints Work Only with Sampling-Based Columns

Constraints in Data Designer are currently only supported for sampling-based columns. Constraints do not work with expression or LLM-generated columns.

For non-numerical columns or complex logic, you can instead use conditional logic in expressions written as Jinja templates.

For a list of supported sampler types, see the Sampler Reference Table.

Adding Explicit Constraints

To add a constraint to a column, use the add_constraint method:

config_builder.add_constraint(
    target_column="column_name",
    type="scalar_inequality",   
    params={"operator": "ge", "rhs": 18}
)

Adding Constraints

Scalar Inequality Constraints

These constraints enforce that a column’s values meet an inequality comparison with a fixed scalar value:

# Ensure 'age' is at least 18
config_builder.add_constraint(
    target_column="age",
    type="scalar_inequality",
    params={"operator": "ge", "rhs": 18}
)

# Ensure 'price' is less than 1000
config_builder.add_constraint(
    target_column="price",
    type="scalar_inequality",
    params={"operator": "lt", "rhs": 1000}
)

Supported operators

  • "gt": Greater than

  • "ge": Greater than or equal to

  • "lt": Less than

  • "le": Less than or equal to
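The four operator strings share their names with the comparison functions in Python's standard operator module. The sketch below only illustrates how each string reads as a comparison; the OPERATORS table and the satisfies helper are hypothetical names introduced here and are not part of Data Designer.

```python
import operator

# Hypothetical lookup table: constraint operator string -> Python comparison.
# This illustrates the semantics only; it is not Data Designer's implementation.
OPERATORS = {
    "gt": operator.gt,  # value >  rhs
    "ge": operator.ge,  # value >= rhs
    "lt": operator.lt,  # value <  rhs
    "le": operator.le,  # value <= rhs
}


def satisfies(value, op, rhs):
    """Return True if `value <op> rhs` holds for the given operator string."""
    try:
        compare = OPERATORS[op]
    except KeyError:
        raise ValueError(f"unsupported operator: {op!r}") from None
    return compare(value, rhs)


print(satisfies(18, "ge", 18))        # True: the boundary value passes "ge"
print(satisfies(18, "gt", 18))        # False: the boundary value fails "gt"
print(satisfies(999.99, "lt", 1000))  # True
```

The difference between "ge" and "gt" (and between "le" and "lt") only matters at the boundary value itself, which is worth checking when the column is converted to integers.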

Column Inequality Constraints

These constraints enforce an inequality between two columns. The rhs parameter names the column that target_column is compared against:

# Ensure 'end_date' is after 'start_date'
config_builder.add_constraint(
    target_column="end_date",
    type="column_inequality",
    params={"operator": "gt", "rhs": "start_date"} 
)

# Ensure 'discount_price' is less than 'original_price'
config_builder.add_constraint(
    target_column="discount_price",
    type="column_inequality",
    params={"operator": "lt", "rhs": "original_price"} 
)
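To make the row-level meaning concrete, the following sketch checks a "gt" column inequality on a few hand-written records. The rows list and the violations helper are hypothetical and stand in for generated data; nothing here calls Data Designer.

```python
from datetime import date

# Hypothetical records standing in for generated rows.
rows = [
    {"start_date": date(2023, 1, 5), "end_date": date(2023, 1, 9)},
    {"start_date": date(2023, 3, 1), "end_date": date(2023, 3, 1)},
    {"start_date": date(2023, 6, 10), "end_date": date(2023, 6, 2)},
]


def violations(rows, target_column, rhs_column):
    """Return indices of rows where target_column is NOT strictly greater than rhs_column."""
    return [
        index
        for index, row in enumerate(rows)
        if not row[target_column] > row[rhs_column]
    ]


bad = violations(rows, "end_date", "start_date")
print(bad)  # [1, 2]: equal dates fail "gt", and so does an earlier end date
```

Row 1 shows why the choice of operator matters: with "ge" instead of "gt", a same-day start and end would be accepted.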

Practical Examples

Age Constraints for Different Customer Segments

# Add customer segment column
config_builder.add_column(
    name="customer_segment",
    type="category",
    params={"values": ["Youth", "Adult", "Senior"]}
)

# Add age column sampled from a Gaussian distribution
config_builder.add_column(
    name="age",
    type="gaussian",
    params={"mean": 40, "stddev": 15},
    convert_to="int"
)

# Add a minimum-age constraint (applies to every row, regardless of segment)
config_builder.add_constraint(
    target_column="age",
    type="scalar_inequality",
    params={"operator": "ge", "rhs": 18} # set a minimum age of 18
)

Financial Data with Multiple Constraints

# Base income column
config_builder.add_column(
    name="annual_income",
    type="gaussian",
    params={"mean": 65000, "stddev": 25000},
    convert_to="int"
)

# Ensure income is realistic
config_builder.add_constraint(
    target_column="annual_income",
    type="scalar_inequality",
    params={"operator": "ge", "rhs": 20000}  # Minimum income
)

config_builder.add_constraint(
    target_column="annual_income", 
    type="scalar_inequality",
    params={"operator": "le", "rhs": 500000}  # Maximum income
)

# Credit score sampled uniformly over the valid range (independent of income)
config_builder.add_column(
    name="credit_score",
    type="uniform",
    params={"low": 300, "high": 850},
    convert_to="int"
)

# Ensure credit score is within the valid range (same bounds as the sampler above)
config_builder.add_constraint(
    target_column="credit_score",
    type="scalar_inequality", 
    params={"operator": "ge", "rhs": 300}
)

config_builder.add_constraint(
    target_column="credit_score",
    type="scalar_inequality",
    params={"operator": "le", "rhs": 850}
)
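The minimum and maximum pairs above repeat the same call shape. As a convenience sketch, a small helper can build the keyword arguments for both bounds at once. The range_constraints function is a hypothetical name introduced here; it only produces dictionaries in the argument shape shown in this guide and does not call Data Designer itself.

```python
def range_constraints(target_column, low, high):
    """Build add_constraint keyword arguments for an inclusive [low, high] range."""
    if low > high:
        raise ValueError("low must not exceed high")
    return [
        {
            "target_column": target_column,
            "type": "scalar_inequality",
            "params": {"operator": "ge", "rhs": low},
        },
        {
            "target_column": target_column,
            "type": "scalar_inequality",
            "params": {"operator": "le", "rhs": high},
        },
    ]


income_bounds = range_constraints("annual_income", 20000, 500000)
for kwargs in income_bounds:
    print(kwargs)
    # With a real builder, assuming the signature used in this guide:
    # config_builder.add_constraint(**kwargs)
```

Keeping both bounds in one place makes it harder to update one limit and forget the other.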

Date Range Constraints

# Add order date column
config_builder.add_column(
    name="order_date",
    type="datetime",
    # dates will be sampled from the range specified
    params={"start": "2023-01-01", "end": "2023-12-31"}
)

config_builder.add_column(
    name="delivery_date",
    type="datetime",
    params={"start": "2023-01-01", "end": "2024-12-31"}
)

# Ensure the delivery date is after the order date
config_builder.add_constraint(
    target_column="delivery_date",
    type="column_inequality",
    params={"operator": "gt", "rhs": "order_date"}
)

Inventory Management Example

# Add inventory level column
config_builder.add_column(
    name="inventory_level",
    type="poisson",
    params={"mean": 100}
)

# Add reorder threshold column
config_builder.add_column(
    name="reorder_threshold",
    type="gaussian",
    params={"mean": 25, "stddev": 5},
    convert_to="int"
)

# Add reorder amount column
config_builder.add_column(
    name="reorder_amount",
    type="gaussian",
    params={"mean": 50, "stddev": 10},
    convert_to="int"
)

# Ensure reorder threshold is positive
config_builder.add_constraint(
    target_column="reorder_threshold",
    type="scalar_inequality",
    params={"operator": "gt", "rhs": 0}
)

# Ensure reorder amount is positive
config_builder.add_constraint(
    target_column="reorder_amount",
    type="scalar_inequality",
    params={"operator": "gt", "rhs": 0}
)

Limitations

  • Sampler-based columns only: Constraints are supported only for sampling-based columns, not for expression or LLM-generated columns.

  • Single-column constraints: Each constraint applies to one target column, although column inequality constraints can reference a second column.

  • Rejection sampling: Complex or very restrictive constraints may take many sampling iterations to satisfy, which can slow generation.

  • No complex logic: For advanced conditional logic, use Jinja templates in LLM prompts instead.
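The rejection sampling limitation can be illustrated with a generic sketch: draw a value, keep it if it satisfies the constraint, and otherwise draw again. This is a textbook version written for illustration, not Data Designer's internal implementation; the sample_with_constraint function and its max_attempts cap are assumptions made here.

```python
import random


def sample_with_constraint(draw, accept, max_attempts=1000):
    """Draw values until `accept(value)` is True, or give up after max_attempts."""
    for attempt in range(1, max_attempts + 1):
        value = draw()
        if accept(value):
            return value, attempt
    raise RuntimeError(f"no valid sample after {max_attempts} attempts")


rng = random.Random(0)  # seeded so repeated runs draw the same sequence

# Loose constraint: age >= 18 for a Gaussian with mean 40 and stddev 15.
age, tries = sample_with_constraint(
    draw=lambda: int(rng.gauss(40, 15)),
    accept=lambda value: value >= 18,
)
print(f"accepted age {age} after {tries} attempt(s)")
```

A loose constraint such as this one is usually satisfied within a few draws. A constraint that almost no draw can meet will instead burn through the attempt budget, which is the slowdown the limitation describes.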

Best Practices

  1. Start simple: Begin with basic scalar constraints before adding column-to-column relationships.

  2. Consider performance: Very restrictive constraints may require many rejection sampling iterations.

  3. Use realistic ranges: Choose constraint values that fall within the range your sampler parameters actually produce, so that most samples already satisfy them.

  4. Handle edge cases: For complex business logic, combine simple constraints with Jinja templates in LLM prompts.
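One way to judge whether a range is realistic is to estimate, before generating anything, what fraction of draws a constraint would accept. The sketch below does this for Gaussian columns using the normal CDF from Python's math module. It treats values as continuous and ignores integer conversion, and both helper functions are hypothetical names introduced here rather than Data Designer features.

```python
import math


def normal_cdf(x, mean, stddev):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (stddev * math.sqrt(2.0))))


def acceptance_rate(mean, stddev, low=-math.inf, high=math.inf):
    """Probability that a single normal draw lands between low and high."""
    return normal_cdf(high, mean, stddev) - normal_cdf(low, mean, stddev)


# Parameters taken from the examples in this guide.
age_rate = acceptance_rate(40, 15, low=18)
income_rate = acceptance_rate(65000, 25000, low=20000, high=500000)
# A deliberately restrictive bound for comparison: age >= 85.
strict_rate = acceptance_rate(40, 15, low=85)

for label, rate in [
    ("age >= 18", age_rate),
    ("20000 <= income <= 500000", income_rate),
    ("age >= 85", strict_rate),
]:
    # Under simple rejection sampling, the expected draws per accepted row is 1 / rate.
    print(f"{label}: accepts {rate:.1%} of draws, about {1 / rate:.1f} draws per row")
```

The first two constraints accept well over 90% of draws, while the age >= 85 bound sits three standard deviations above the mean and accepts only a small fraction of a percent, so it would need hundreds of draws per row on average.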