For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
  • Getting Started
    • Welcome
    • Contributing
  • Concepts
    • Columns
    • Seed Datasets
    • Agent Rollout Ingestion
    • Custom Columns
    • Validators
    • Processors
    • Person Sampling
    • Traces
    • Architecture & Performance
    • Deployment Options
    • Security
  • Tutorials
    • Overview
    • The Basics
    • Structured Outputs, Jinja Expressions, and Conditional Generation
    • Seeding with an External Dataset
    • Providing Images as Context
    • Generating Images
    • Image-to-Image Editing
  • Recipes
    • Recipe Cards
  • Plugins
    • Overview
    • Example Plugin
    • FileSystemSeedReader Plugins
    • Discover
  • Code Reference
    • Overview
      • Overview
      • models
      • mcp
      • column_configs
      • config_builder
      • data_designer_config
      • run_config
      • sampler_params
      • validator_params
      • seeds
      • processors
      • analysis
      • Config API
        • Analysis
        • Base
        • Column Configs
        • Column Types
        • Config Builder
        • Custom Column
        • Data Designer Config
        • Dataset Metadata
        • Default Model Settings
        • Errors
        • Exportable Config
        • Fingerprint
        • Interface
        • Mcp
        • Models
        • Preview Results
        • Processor Types
        • Processors
        • Run Config
        • Sampler Constraints
        • Sampler Params
        • Seed
        • Seed Source
        • Seed Source Dataframe
        • Seed Source Types
        • Testing
        • Utils
        • Validator Params
        • Version
  • Dev Notes
    • Overview
    • Prompt Sensitivity
    • Retriever SDG Toolkit
    • Have It Your Way
    • VLM Long Document Understanding
    • Push Datasets to Hugging Face Hub
    • Text-to-SQL for Nemotron Super
    • Async All the Way Down
    • Owning the Model Stack
NVIDIANVIDIA
Developer-friendly docs for your API
Privacy Policy | Manage My Privacy | Do Not Sell or Share My Data | Terms of Service | Accessibility | Corporate Policies | Product Security | Contact

Copyright © 2026, NVIDIA Corporation.

LogoLogoNeMo Data Designer
On this page
  • Module Contents
  • Classes
  • Functions
  • API
Code ReferenceConfigConfig API

data_designer.config.processors

||View as Markdown|
Previous

Processor Types

Next

Run Config

Module Contents

Classes

NameDescription
ProcessorTypeEnumeration of available processor types.
DropColumnsProcessorConfigDrop columns from the output dataset (prefer drop=True in the column config).
SchemaTransformProcessorConfigConfiguration for transforming the dataset schema using Jinja2 templates.

Functions

NameDescription
get_processor_config_from_kwargsCreate a processor configuration from a processor type and keyword arguments.

API

1class data_designer.config.processors.ProcessorType

Bases: str, enum.Enum

Enumeration of available processor types.

Attributes:

DROP_COLUMNS

Processor that removes specified columns from the output dataset.

SCHEMA_TRANSFORM

Processor that creates a new dataset with a transformed schema using Jinja2 templates.

Initialization:

Initialize self. See help(type(self)) for accurate signature.

1DROP_COLUMNS = drop_columns
1SCHEMA_TRANSFORM = schema_transform
1data_designer.config.processors.get_processor_config_from_kwargs(
2 processor_type: data_designer.config.processors.ProcessorType,
3 **kwargs: typing.Any
4) -> data_designer.config.base.ProcessorConfig

Create a processor configuration from a processor type and keyword arguments.

Parameters:

processor_type
data_designer.config.processors.ProcessorType

The type of processor to create.

**kwargs

Additional keyword arguments passed to the processor constructor.

Returns:

data_designer.config.base.ProcessorConfig

A processor configuration object of the specified type.

1class data_designer.config.processors.DropColumnsProcessorConfig(
2 /,
3 **data: typing.Any
4)

Bases: data_designer.config.base.ProcessorConfig

Drop columns from the output dataset (prefer drop=True in the column config).

This processor removes specified columns from the generated dataset. The dropped columns are saved separately in the dropped-columns-parquet-files directory for reference. When this processor is added via the config builder, the corresponding column configs are automatically marked with drop = True.

Parameters:

column_names

List of column names to remove from the output dataset.

Inherited Attributes: name (required): Name of the processor. Attributes:

column_names
`required`

List of column names to remove from the output dataset.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

1column_names: list[str] = Field(...)
1processor_type: typing.Literal[data_designer.config.processors.ProcessorType]
1class data_designer.config.processors.SchemaTransformProcessorConfig(
2 /,
3 **data: typing.Any
4)

Bases: data_designer.config.base.ProcessorConfig

Configuration for transforming the dataset schema using Jinja2 templates.

This processor creates a new dataset with a transformed schema. Each key in the template becomes a column in the output, and values are Jinja2 templates that can reference any column in the batch. The transformed dataset is written to a processors-files/{processor_name}/ directory alongside the main dataset.

Parameters:

template

Dictionary defining the output schema. Keys are new column names, values are Jinja2 templates (strings, lists, or nested structures). Must be JSON-serializable.

Inherited Attributes: name (required): Name of the processor. Attributes:

template
`required`

Dictionary defining the output schema. Keys are new column names, values are Jinja2 templates (strings, lists, or nested structures). Must be JSON-serializable.

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

1template: dict[str, typing.Any] = Field(...)
1processor_type: typing.Literal[data_designer.config.processors.ProcessorType]
1validate_template(v: dict[str, typing.Any]) -> dict[str, typing.Any]