Validators are quality assurance mechanisms in Data Designer that check generated content against rules and return structured pass/fail results. They enable automated verification of data for correctness, code quality, and adherence to specifications.
Quality Gates for Generated Data
Validators act as quality gates in your generation pipeline. Use them to filter invalid records, score code quality, verify format compliance, or integrate with external validation services.
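Validation columns execute validation logic against target columns and produce structured results indicating:
- is_valid: Boolean pass/fail status
- Validator-specific metadata, such as linter scores or error messages

Validators currently support three execution strategies:
- Code validation: lints generated Python or SQL code
- Local callable validation: runs custom Python functions
- Remote validation: sends data to HTTP endpoints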
The Python code validator runs generated Python code through Ruff, a fast Python linter that checks for syntax errors, undefined variables, and code quality issues.
Configuration:
Validation Output:
Each validated record returns:
- is_valid: True if no fatal or error-level issues are found
- python_linter_score: Quality score from 0 to 10 (based on the pylint scoring formula)
- python_linter_severity: Highest severity level found ("none", "convention", "refactor", "warning", "error", "fatal")
- python_linter_messages: List of linter messages with line numbers, columns, and descriptions

Severity Levels:
A record is marked valid if it has no messages or only messages at warning/convention/refactor levels.
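The scoring and validity rules above can be sketched in plain Python. This is a minimal sketch, assuming the score follows pylint's default evaluation expression: errors weigh five times as much as warnings, refactors, and conventions, the total is normalized by the number of statements, and any fatal message yields 0. The function names and the statement-count input are hypothetical, and the validator's real implementation may differ in details.

```python
# Minimal sketch of pylint-style scoring and severity-based validity.
# Assumption: python_linter_score follows pylint's default evaluation
# expression; function names here are hypothetical, not Data Designer APIs.

SEVERITY_ORDER = ["none", "convention", "refactor", "warning", "error", "fatal"]
BLOCKING_SEVERITIES = {"error", "fatal"}


def pylint_style_score(counts: dict[str, int], statements: int) -> float:
    """Return a 0-10 score from per-severity message counts."""
    if counts.get("fatal", 0) > 0:
        return 0.0  # any fatal message zeroes the score
    if statements <= 0:
        return 10.0  # hypothetical choice: nothing to penalize
    penalty = (
        5 * counts.get("error", 0)  # errors weigh five times as much
        + counts.get("warning", 0)
        + counts.get("refactor", 0)
        + counts.get("convention", 0)
    )
    return max(0.0, 10.0 - 10.0 * penalty / statements)


def summarize(severities: list[str]) -> tuple[str, bool]:
    """Return (highest severity, is_valid) for one record's messages."""
    highest = max(severities, key=SEVERITY_ORDER.index, default="none")
    return highest, highest not in BLOCKING_SEVERITIES


print(summarize(["convention", "warning"]))  # ('warning', True)
print(summarize(["warning", "error"]))  # ('error', False)
print(pylint_style_score({"warning": 1, "convention": 2}, statements=10))  # 7.0
```

A record with only warning-level messages therefore stays valid even though its score drops below 10.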
Example Validation Result:
The SQL code validator uses SQLFluff, a dialect-aware SQL linter that checks query syntax and structure.
Configuration:
Multiple Dialects
The SQL code validator supports multiple dialects: SQL_POSTGRES, SQL_ANSI, SQL_MYSQL, SQL_SQLITE, SQL_TSQL and SQL_BIGQUERY.
Validation Output:
Each validated record returns:
- is_valid: True if no parsing errors are found
- error_messages: Concatenated error descriptions (empty string if valid)

The validator focuses on parsing errors (PRS codes) that indicate malformed SQL. It also checks for common pitfalls, such as DECIMAL definitions without a scale parameter.
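The filtering described above can be sketched as a pass over linter violations. This is a minimal sketch. It assumes violations arrive as dictionaries with code and description keys, similar to those returned by SQLFluff's simple Python API (sqlfluff.lint). The DECIMAL regex is a hypothetical heuristic, not the validator's actual rule.

```python
import re

# Hypothetical heuristic: DECIMAL not followed by "(precision, scale)".
DECIMAL_WITHOUT_SCALE = re.compile(
    r"\bDECIMAL\b(?!\s*\(\s*\d+\s*,\s*\d+\s*\))", re.IGNORECASE
)


def summarize_sql_violations(sql: str, violations: list[dict]) -> dict:
    """Keep only parsing errors (PRS codes) and flag DECIMAL without scale."""
    messages = [
        f"{v['code']}: {v['description']}"
        for v in violations
        if v["code"].startswith("PRS")  # style rules (e.g. LT01) are ignored
    ]
    if DECIMAL_WITHOUT_SCALE.search(sql):
        messages.append("DECIMAL definition without scale parameter")
    return {"is_valid": not messages, "error_messages": "; ".join(messages)}


print(summarize_sql_violations("CREATE TABLE t (price DECIMAL(10))", []))
```

Layout and style findings do not affect is_valid here, which matches the validator's focus on malformed SQL.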
Example Validation Result:
The local callable validator executes custom Python functions for flexible validation logic.
Configuration:
Function Requirements:
- The returned results must include an is_valid column (boolean or null).

The output_schema parameter is optional but recommended: it validates the function's output against a JSON schema, catching unexpected return formats.
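A validation function only needs to produce one is_valid value per record, optionally with extra fields. This minimal sketch uses plain lists of dictionaries so it runs without extra dependencies. The function name, the age column, and the reason field are hypothetical, and you would adapt the sketch to the input and output types the local callable validator actually expects. Returning None marks a record the check cannot judge, which is one possible use of the null option.

```python
from typing import Any


def validate_ages(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical check: 'age' must be an integer between 0 and 120."""
    results = []
    for row in rows:
        age = row.get("age")
        if age is None:
            # Nothing to check: report null instead of a pass/fail verdict.
            results.append({"is_valid": None, "reason": "age missing"})
        elif isinstance(age, int) and not isinstance(age, bool) and 0 <= age <= 120:
            results.append({"is_valid": True, "reason": ""})
        else:
            results.append({"is_valid": False, "reason": f"invalid age: {age!r}"})
    return results


print(validate_ages([{"age": 34}, {"age": -3}, {}]))
```

Keeping every result to the same set of keys makes it easy to describe the output with an output_schema.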
The remote validator sends data to HTTP endpoints for validation-as-a-service. This is useful when your validation software needs to run on external compute and you can expose it through a service. Some examples are:
Authentication
Currently, the remote validator can only make unauthenticated API calls. When implementing your own service, you can rely on network isolation for security. If you need to reach a service that requires authentication, implement a local proxy that handles authentication for you.
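As an illustration of the proxy approach, the following standard-library sketch accepts unauthenticated POST requests on localhost and forwards them to an upstream validation service with a bearer token attached. The upstream URL, the UPSTREAM_TOKEN environment variable, the port, and the bearer scheme are all hypothetical. Adapt them to whatever your service requires, and keep the proxy bound to a trusted interface.

```python
import os
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical upstream service and credential source.
UPSTREAM_URL = "https://validator.example.com/validate"
TOKEN_ENV_VAR = "UPSTREAM_TOKEN"


def build_upstream_request(body: bytes, url: str, token: str) -> urllib.request.Request:
    """Copy the incoming JSON body into an authenticated POST request."""
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


class AuthProxyHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        request = build_upstream_request(
            body, UPSTREAM_URL, os.environ.get(TOKEN_ENV_VAR, "")
        )
        try:
            with urllib.request.urlopen(request, timeout=30) as response:
                status, payload = response.status, response.read()
        except urllib.error.HTTPError as err:
            # Pass the upstream error status and body through to the caller.
            status, payload = err.code, err.read()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Listen on localhost only; point the remote validator at this port."""
    HTTPServer(("127.0.0.1", port), AuthProxyHandler).serve_forever()
```

Calling serve() blocks. Once it is running, the remote validator would target http://127.0.0.1:8080 instead of the authenticated service. Connection failures to the upstream are not handled in this sketch.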
Configuration:
Request Format:
The validator sends POST requests with this structure:
Expected Response Format:
The endpoint must return:
Retry Behavior:
The validator automatically retries on:
Failed requests use exponential backoff: delay = retry_backoff^attempt.
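The retry loop can be sketched as follows. This is a minimal sketch, not the validator's implementation. It assumes the exponent counts from 0, so the first wait is 1 second for any retry_backoff. RetryableError is a hypothetical stand-in for whichever failures the validator retries, and max_retries is a hypothetical parameter name. The sleep function is injectable so the schedule can be checked without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical stand-in for failures the validator would retry."""


def backoff_delay(retry_backoff: float, attempt: int) -> float:
    """Seconds to wait after a failed attempt: retry_backoff ** attempt."""
    return retry_backoff ** attempt


def call_with_retries(
    send: Callable[[], T],
    max_retries: int,
    retry_backoff: float,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send(), retrying up to max_retries times with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return send()
        except RetryableError:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            sleep(backoff_delay(retry_backoff, attempt))
    raise ValueError("max_retries must be non-negative")


print([backoff_delay(2.0, a) for a in range(4)])  # [1.0, 2.0, 4.0, 8.0]
```

With retry_backoff set to 2, the waits double after each failure, so a few retries can span several seconds.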
Parallelization:
Set max_parallel_requests to control concurrency. Higher values improve throughput but increase server load. The validator batches requests according to the batch_size parameter in the validation column configuration.
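Batching plus bounded concurrency can be sketched with the standard library's ThreadPoolExecutor. This is a minimal sketch. It assumes each batch is sent by a caller-supplied send_batch function (hypothetical) that returns one result per record. Here max_workers plays the role of max_parallel_requests, so at most that many batches are in flight at once. Results come back in input order because Executor.map preserves ordering.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def make_batches(records: list[Any], batch_size: int) -> list[list[Any]]:
    """Split records into consecutive batches of at most batch_size items."""
    return [records[i : i + batch_size] for i in range(0, len(records), batch_size)]


def validate_in_parallel(
    records: list[Any],
    send_batch: Callable[[list[Any]], list[dict]],
    batch_size: int,
    max_parallel_requests: int,
) -> list[dict]:
    """Send batches concurrently and flatten results back into record order."""
    batches = make_batches(records, batch_size)
    with ThreadPoolExecutor(max_workers=max_parallel_requests) as pool:
        per_batch = list(pool.map(send_batch, batches))
    return [result for batch in per_batch for result in batch]


def fake_send(batch: list[int]) -> list[dict]:
    # Stand-in for an HTTP call: even numbers pass.
    return [{"is_valid": n % 2 == 0} for n in batch]


print(make_batches(list(range(7)), 3))  # [[0, 1, 2], [3, 4, 5], [6]]
```

Raising max_parallel_requests lets more batches overlap, which is where both the throughput gain and the extra server load come from.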
Add validation columns to your configuration using the builder’s add_column method:
The target_columns parameter specifies which columns to validate. All target columns are passed to the validator together (except for code validators, which process each column separately).
See the code reference for more about the parameters used to instantiate ValidationColumnConfig.
Larger batch sizes improve efficiency but consume more memory:
Adjust based on:
If the validation logic uses information from other samples (for example, to detect duplicates), it only sees the samples in the current batch.
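The batch boundary matters for any check that compares records. In this minimal sketch, a hypothetical duplicate detector flags repeated values. Because each batch is validated independently, a duplicate that lands in a different batch goes unnoticed.

```python
from typing import Callable


def flag_duplicates(batch: list[str]) -> list[dict]:
    """Mark a value invalid if it already appeared earlier in the same batch."""
    seen: set[str] = set()
    results = []
    for value in batch:
        results.append({"is_valid": value not in seen})
        seen.add(value)
    return results


def run_in_batches(
    records: list[str],
    validator: Callable[[list[str]], list[dict]],
    batch_size: int,
) -> list[dict]:
    """Validate each batch on its own; no state is shared across batches."""
    results: list[dict] = []
    for start in range(0, len(records), batch_size):
        results.extend(validator(records[start : start + batch_size]))
    return results


records = ["a", "b", "a"]
print([r["is_valid"] for r in run_in_batches(records, flag_duplicates, 3)])  # [True, True, False]
print([r["is_valid"] for r in run_in_batches(records, flag_duplicates, 2)])  # [True, True, True]
```

With a batch size of 2, the second "a" falls into its own batch and passes. A check that must see every record needs a batch size that covers all of them, at the cost of higher memory use.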
Validate multiple columns simultaneously:
Note: Code validators always process each target column separately, even when multiple columns are specified. Local callable and remote validators receive all target columns together.