Copyright © 2026, NVIDIA Corporation.

Validators

Validators are quality assurance mechanisms in Data Designer that check generated content against rules and return structured pass/fail results. They enable automated verification of data for correctness, code quality, and adherence to specifications.

Quality Gates for Generated Data: Validators act as quality gates in your generation pipeline. Use them to filter invalid records, score code quality, verify format compliance, or integrate with external validation services.

Overview

Validation columns execute validation logic against target columns and produce structured results indicating:

  • is_valid: Boolean pass/fail status
  • Additional metadata: Error messages, scores, severity levels, and custom fields

Validators currently support three execution strategies:

  1. Code validation: Lint and check Python or SQL code using industry-standard tools
  2. Local callable validation: Execute custom Python functions for flexible validation logic
  3. Remote validation: Send data to HTTP endpoints for external validation services

Validator Types

🐍 Python Code Validator

The Python code validator runs generated Python code through Ruff, a fast Python linter that checks for syntax errors, undefined variables, and code quality issues.

Configuration:

import data_designer.config as dd

validator_params = dd.CodeValidatorParams(code_lang=dd.CodeLang.PYTHON)

Validation Output:

Each validated record returns:

  • is_valid: True if no fatal or error-level issues found
  • python_linter_score: Quality score from 0-10 (based on pylint formula)
  • python_linter_severity: Highest severity level found ("none", "convention", "refactor", "warning", "error", "fatal")
  • python_linter_messages: List of linter messages with line numbers, columns, and descriptions

Severity Levels:

  • Fatal: Syntax errors preventing code execution
  • Error: Undefined names, invalid syntax
  • Warning: Code smells and potential issues
  • Refactor: Simplification opportunities
  • Convention: Style guide violations

A record is marked valid if it has no messages or only messages at warning/convention/refactor levels.
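The pass/fail rule above can be sketched in a few lines. This is an illustration only, not the library's implementation: the helper names are hypothetical, and it assumes the severity list shown earlier is ordered from least to most severe.

```python
# Severity names as listed above; assumed to be in ascending order of severity.
SEVERITY_ORDER = ["none", "convention", "refactor", "warning", "error", "fatal"]


def highest_severity(messages: list[dict]) -> str:
    """Return the most severe "type" among linter messages, or "none" if empty."""
    return max(
        (message["type"] for message in messages),
        key=SEVERITY_ORDER.index,
        default="none",
    )


def record_is_valid(messages: list[dict]) -> bool:
    """A record passes unless it carries an error- or fatal-level message."""
    return highest_severity(messages) not in {"error", "fatal"}


# Warnings alone do not fail a record; a single error does.
print(record_is_valid([{"type": "warning"}, {"type": "convention"}]))
print(record_is_valid([{"type": "warning"}, {"type": "error"}]))
```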

Example Validation Result:

{
    "is_valid": False,
    "python_linter_score": 0,
    "python_linter_severity": "error",
    "python_linter_messages": [
        {
            "type": "error",
            "symbol": "F821",
            "line": 1,
            "column": 7,
            "message": "Undefined name `it`"
        }
    ]
}

🗄️ SQL Code Validator

The SQL code validator uses SQLFluff, a dialect-aware SQL linter that checks query syntax and structure.

Configuration:

import data_designer.config as dd

validator_params = dd.CodeValidatorParams(code_lang=dd.CodeLang.SQL_POSTGRES)

Multiple Dialects: The SQL code validator supports multiple dialects: SQL_POSTGRES, SQL_ANSI, SQL_MYSQL, SQL_SQLITE, SQL_TSQL, and SQL_BIGQUERY.

Validation Output:

Each validated record returns:

  • is_valid: True if no parsing errors found
  • error_messages: Concatenated error descriptions (empty string if valid)

The validator focuses on parsing errors (PRS codes) that indicate malformed SQL. It also checks for common pitfalls like DECIMAL definitions without scale parameters.

Example Validation Result:

# Valid SQL
{
    "is_valid": True,
    "error_messages": ""
}

# Invalid SQL
{
    "is_valid": False,
    "error_messages": "PRS: Line 1, Position 1: Found unparsable section: 'NOT SQL'"
}

🔧 Local Callable Validator

The local callable validator executes custom Python functions for flexible validation logic.

Configuration:

import pandas as pd

import data_designer.config as dd

def my_validation_function(df: pd.DataFrame) -> pd.DataFrame:
    """Validate that values are positive.

    Args:
        df: DataFrame with target columns

    Returns:
        DataFrame with is_valid column and optional metadata
    """
    result = pd.DataFrame()
    result["is_valid"] = df["price"] > 0
    result["error_message"] = result["is_valid"].apply(
        lambda valid: "" if valid else "Price must be positive"
    )
    return result

validator_params = dd.LocalCallableValidatorParams(
    validation_function=my_validation_function,
    output_schema={  # Optional: enforce output schema
        "type": "object",
        "properties": {
            "data": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "is_valid": {"type": ["boolean", "null"]},
                        "error_message": {"type": "string"}
                    },
                    "required": ["is_valid"]
                }
            }
        }
    }
)

Function Requirements:

  • Input: DataFrame with target columns
  • Output: DataFrame with is_valid column (boolean or null)
  • Extra fields: Any additional columns become validation metadata

The output_schema parameter is optional but recommended—it validates the function’s output against a JSON schema, catching unexpected return formats.
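To make concrete what a schema like the one above guards against, here is a hand-rolled sketch of the same checks in plain Python. It is not how Data Designer performs the check (the docs only say the output is validated against a JSON schema); the function name is hypothetical, and it assumes the validator output is represented as a {"data": [...]} object with one entry per record, as the schema implies.

```python
def check_validator_output(payload: dict) -> list[str]:
    """Return a list of problems found in a {"data": [...]} payload."""
    rows = payload.get("data")
    if not isinstance(rows, list):
        return ['"data" must be an array of objects']

    problems = []
    for i, row in enumerate(rows):
        if not isinstance(row, dict):
            problems.append(f"row {i}: expected an object")
            continue
        if "is_valid" not in row:
            problems.append(f"row {i}: missing required field is_valid")
        elif row["is_valid"] is not None and not isinstance(row["is_valid"], bool):
            problems.append(f"row {i}: is_valid must be boolean or null")
        if "error_message" in row and not isinstance(row["error_message"], str):
            problems.append(f"row {i}: error_message must be a string")
    return problems


good = {"data": [{"is_valid": True, "error_message": ""}, {"is_valid": None}]}
bad = {"data": [{"is_valid": "yes"}, {"error_message": "Price must be positive"}]}
print(check_validator_output(good))
print(check_validator_output(bad))
```

A function that returns the string "yes" instead of a boolean, or omits is_valid entirely, is the kind of unexpected return format the schema is meant to catch.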

🌐 Remote Validator

The remote validator sends data to HTTP endpoints for validation-as-a-service. This is useful when your validation software must run on external compute and can be exposed through a service. Examples include:

  • External linting services
  • Security scanners
  • Domain-specific validators
  • Proprietary validation systems

Authentication: Currently, the remote validator can only make unauthenticated API calls. When implementing your own service, you can rely on network isolation for security. If you need to reach a service that requires authentication, implement a local proxy.

Configuration:

import data_designer.config as dd

validator_params = dd.RemoteValidatorParams(
    endpoint_url="https://api.example.com/validate",
    timeout=30.0,  # Request timeout in seconds
    max_retries=3,  # Retry attempts on failure
    retry_backoff=2.0,  # Exponential backoff factor
    max_parallel_requests=4,  # Concurrent request limit
    output_schema={  # Optional: enforce response schema
        "type": "object",
        "properties": {
            "data": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "is_valid": {"type": ["boolean", "null"]},
                        "confidence": {"type": "string"}
                    }
                }
            }
        }
    }
)

Request Format:

The validator sends POST requests with this structure:

{
    "data": [
        {"column1": "value1", "column2": "value2"},
        {"column1": "value3", "column2": "value4"}
    ]
}

Expected Response Format:

The endpoint must return:

{
    "data": [
        {
            "is_valid": true,
            "custom_field": "any additional metadata"
        },
        {
            "is_valid": false,
            "custom_field": "more metadata"
        }
    ]
}
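For experimenting with this request/response contract, a minimal endpoint can be sketched with only the Python standard library. Everything here is an assumption for illustration: the validation rule, the reason field, and the helper names are hypothetical, and a production service would more likely use a proper web framework.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def validate_row(row: dict) -> dict:
    """Hypothetical rule: every value in the row must be a non-empty string."""
    ok = all(isinstance(value, str) and value.strip() != "" for value in row.values())
    return {"is_valid": ok, "reason": "" if ok else "empty or non-string value"}


class ValidateHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body: {"data": [ {column: value, ...}, ... ]}
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))

        # Return one result per input row, in the same order.
        results = [validate_row(row) for row in payload["data"]]
        body = json.dumps({"data": results}).encode("utf-8")

        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # Silence per-request logging for this sketch.


def make_server(host: str = "127.0.0.1", port: int = 8000) -> HTTPServer:
    """Build the server; call .serve_forever() on the result to run it."""
    return HTTPServer((host, port), ValidateHandler)


# To run locally: make_server().serve_forever()
```

With the server running, endpoint_url would point at http://127.0.0.1:8000/ (the sketch answers POST requests on any path).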

Retry Behavior:

The validator automatically retries on:

  • Network errors
  • HTTP status codes: 429 (rate limit), 500, 502, 503, 504

Failed requests use exponential backoff: delay = retry_backoff^attempt.
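As a worked example of the formula, the sketch below computes the delay before each retry for the configuration shown earlier. The function name is hypothetical, and it assumes attempts are numbered starting at 1 and that delays are in seconds; the source does not state either.

```python
def backoff_delays(retry_backoff: float, max_retries: int) -> list[float]:
    """Delay before each retry, computed as retry_backoff ** attempt."""
    # Assumption: the first retry is attempt 1 (not 0).
    return [retry_backoff**attempt for attempt in range(1, max_retries + 1)]


delays = backoff_delays(retry_backoff=2.0, max_retries=3)
print(delays)       # [2.0, 4.0, 8.0]
print(sum(delays))  # 14.0
```

Under these assumptions, a request that exhausts all three retries spends 14 seconds waiting, in addition to up to one timeout period per attempt.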

Parallelization:

Set max_parallel_requests to control concurrency. Higher values improve throughput but increase server load. The validator batches requests according to the batch_size parameter in the validation column configuration.

Using Validators in Columns

Add validation columns to your configuration using the builder’s add_column method:

import data_designer.config as dd

builder = dd.DataDesignerConfigBuilder()

# Generate Python code
builder.add_column(
    dd.LLMCodeColumnConfig(
        name="sorting_algorithm",
        prompt="Write a Python function to sort a list using bubble sort.",
        code_lang="python",
        model_alias="my-model"
    )
)

# Validate the generated code
builder.add_column(
    dd.ValidationColumnConfig(
        name="code_validation",
        target_columns=["sorting_algorithm"],
        validator_type="code",
        validator_params=dd.CodeValidatorParams(code_lang=dd.CodeLang.PYTHON),
        batch_size=10,
        drop=False,
    )
)

The target_columns parameter specifies which columns to validate. All target columns are passed to the validator together (except for code validators, which process each column separately).

Configuration Parameters

See more about parameters used to instantiate ValidationColumnConfig in the code reference.

Batch Size Considerations

Larger batch sizes improve efficiency but consume more memory. Suggested starting ranges by validator type:

  • Code validators: 5-20 records (file I/O overhead)
  • Local callable: 10-50 records (depends on function complexity)
  • Remote validators: 1-10 records (network latency, server capacity)

Adjust based on:

  • Validator computational cost
  • Available memory
  • Network bandwidth (for remote validators)
  • Server rate limits

If the validation logic uses information from other samples, only samples in the batch will be considered.
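The effect of batch boundaries on cross-sample logic can be seen with a small stand-alone sketch. It does not use Data Designer; the duplicate check and helper names are hypothetical and simply mimic a validator that is called once per batch.

```python
from itertools import islice


def batched(records, batch_size):
    """Yield consecutive lists of at most batch_size records."""
    iterator = iter(records)
    while batch := list(islice(iterator, batch_size)):
        yield batch


def flag_duplicates(batch):
    """Hypothetical check: a value is invalid if it repeats earlier in the batch."""
    seen = set()
    results = []
    for value in batch:
        results.append({"is_valid": value not in seen})
        seen.add(value)
    return results


def validate_in_batches(records, batch_size):
    """Run the check batch by batch and flatten the is_valid flags."""
    return [
        result["is_valid"]
        for batch in batched(records, batch_size)
        for result in flag_duplicates(batch)
    ]


records = ["a", "b", "a", "c", "a", "b"]
print(validate_in_batches(records, 3))  # [True, True, False, True, True, True]
print(validate_in_batches(records, 6))  # [True, True, False, True, False, False]
```

With a batch size of 3, the repeats in the second batch go undetected because the first batch is no longer visible to the check.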

Multiple Column Validation

Validate multiple columns simultaneously:

import data_designer.config as dd

# Assumes `builder` is the DataDesignerConfigBuilder created earlier
builder.add_column(
    dd.ValidationColumnConfig(
        name="multi_column_validation",
        target_columns=["column_a", "column_b", "column_c"],
        validator_type="remote",
        validator_params=dd.RemoteValidatorParams(
            endpoint_url="https://api.example.com/validate"
        )
    )
)

Note: Code validators always process each target column separately, even when multiple columns are specified. Local callable and remote validators receive all target columns together.

See Also

  • Validator Parameters Reference: Configuration object schemas