Copyright © 2026, NVIDIA Corporation.


data_designer.engine.processing.processors.base


Module Contents

Classes

Name | Description
Processor | Base class for dataset processors.

API

class data_designer.engine.processing.processors.base.Processor(
    config: data_designer.engine.configurable_task.TaskConfigT,
    resource_provider: data_designer.engine.resources.resource_provider.ResourceProvider
)

Bases: data_designer.engine.configurable_task.ConfigurableTask[data_designer.engine.configurable_task.TaskConfigT], abc.ABC

Base class for dataset processors.

Processors transform data at different stages of the generation pipeline. Override the callback methods for the stages you want to handle.
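The override pattern described above can be sketched without installing the library. The block below is a minimal stand-in, not the real base class: `StandInProcessor` and `StripWhitespace` are hypothetical names, the constructor arguments `config` and `resource_provider` are omitted, rows are modeled as a list of dicts (an assumption about `DataT`), and the pass-through defaults are assumed rather than documented.

```python
from abc import ABC


class StandInProcessor(ABC):
    """Hypothetical stand-in mirroring the three documented callbacks."""

    def process_before_batch(self, data):
        return data  # assumed default: pass data through unchanged

    def process_after_batch(self, data, *, current_batch_number):
        return data  # assumed default

    def process_after_generation(self, data):
        return data  # assumed default


class StripWhitespace(StandInProcessor):
    """Overrides only the post-batch callback; other stages are untouched."""

    def process_after_batch(self, data, *, current_batch_number):
        # Trim string values in every row; leave other value types alone.
        return [
            {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
            for row in data
        ]


processor = StripWhitespace()
batch = [{"name": "  Ada "}, {"name": "Grace"}]
cleaned = processor.process_after_batch(batch, current_batch_number=0)
untouched = processor.process_before_batch(batch)
```

Here only the stage the subclass cares about is overridden; the remaining callbacks fall back to the stand-in's pass-through behavior.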

implements(method_name: str) -> bool

Check whether the subclass overrides the callback method named by method_name.
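One plausible way such an override check could work is to compare the attribute found on the subclass with the one defined on the base class. This is a sketch under that assumption, not the library's implementation; `StandInProcessor` and `Deduplicate` are hypothetical names.

```python
class StandInProcessor:
    """Hypothetical stand-in; the real check may be implemented differently."""

    def implements(self, method_name: str) -> bool:
        # The method counts as overridden when the function found on the
        # instance's class is not the one defined on the base class.
        return getattr(type(self), method_name) is not getattr(
            StandInProcessor, method_name
        )

    def process_before_batch(self, data):
        return data

    def process_after_generation(self, data):
        return data


class Deduplicate(StandInProcessor):
    def process_after_generation(self, data):
        # dict.fromkeys keeps first occurrences in their original order.
        return list(dict.fromkeys(data))


p = Deduplicate()
overrides_final = p.implements("process_after_generation")
overrides_pre = p.implements("process_before_batch")
```

A check like this would let a caller skip stages that a given processor does not handle.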

process_before_batch(data: data_designer.engine.configurable_task.DataT) -> data_designer.engine.configurable_task.DataT

Called at PRE_BATCH stage before each batch is generated.

Override to transform batch data before generation begins.

Parameters:

data (data_designer.engine.configurable_task.DataT): The batch data before generation.

Returns:

data_designer.engine.configurable_task.DataT: Transformed batch data.
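As an illustration of a pre-batch transform, the sketch below normalizes a field before generation would begin. `NormalizeTopic` and the `topic` field are hypothetical, the class does not inherit from the real `Processor`, and rows are modeled as a list of dicts as an assumption about `DataT`.

```python
class NormalizeTopic:
    """Hypothetical processor-like class with a pre-batch callback."""

    def process_before_batch(self, data):
        # Return new rows rather than mutating the input batch in place.
        return [{**row, "topic": row["topic"].strip().lower()} for row in data]


seed_rows = [{"topic": " Physics "}, {"topic": "HISTORY"}]
normalized = NormalizeTopic().process_before_batch(seed_rows)
```

Returning a new list keeps the incoming batch intact, which makes the transform easier to test in isolation.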

process_after_batch(
    data: data_designer.engine.configurable_task.DataT,
    *,
    current_batch_number: int | None
) -> data_designer.engine.configurable_task.DataT

Called at POST_BATCH stage after each batch is generated.

Override to process each batch of generated data.

Parameters:

data (data_designer.engine.configurable_task.DataT): The generated batch data.

current_batch_number (int | None): The current batch number (0-indexed), or None in preview mode.

Returns:

data_designer.engine.configurable_task.DataT: Transformed batch data.
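Because `current_batch_number` is keyword-only and may be None in preview mode, an override generally needs to handle both cases. The following is a minimal sketch assuming list-of-dict rows; `TagBatch` and the `batch` field are hypothetical and do not come from the library.

```python
class TagBatch:
    """Hypothetical processor-like class with a post-batch callback."""

    def process_after_batch(self, data, *, current_batch_number):
        if current_batch_number is None:
            # Preview mode: there is no batch index, so leave rows as they are.
            return data
        # Record the 0-indexed batch number on each generated row.
        return [{**row, "batch": current_batch_number} for row in data]


tagger = TagBatch()
rows = [{"text": "hello"}, {"text": "world"}]
tagged = tagger.process_after_batch(rows, current_batch_number=2)
preview = tagger.process_after_batch(rows, current_batch_number=None)
```

Passing the batch number positionally, as in `tagger.process_after_batch(rows, 2)`, raises a TypeError because of the bare `*` in the signature.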

process_after_generation(data: data_designer.engine.configurable_task.DataT) -> data_designer.engine.configurable_task.DataT

Called at AFTER_GENERATION stage on the final combined dataset.

Override to transform the complete generated dataset.

Parameters:

data (data_designer.engine.configurable_task.DataT): The final combined dataset.

Returns:

data_designer.engine.configurable_task.DataT: Transformed final dataset.
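Operations that need to see every row at once, such as deduplication across batches, are a natural fit for this stage. The sketch below assumes list-of-dict rows with hashable values; `DropDuplicateRows` is a hypothetical name and the class does not inherit from the real `Processor`.

```python
class DropDuplicateRows:
    """Hypothetical processor-like class with a final-stage callback."""

    def process_after_generation(self, data):
        seen = set()
        unique = []
        for row in data:
            # Build an order-independent, hashable key from the row's items.
            key = tuple(sorted(row.items()))
            if key not in seen:
                seen.add(key)
                unique.append(row)
        return unique


# Rows as they might look after two batches were combined.
combined = [{"id": 1, "q": "a"}, {"id": 2, "q": "b"}, {"q": "a", "id": 1}]
deduplicated = DropDuplicateRows().process_after_generation(combined)
```

Running the same logic per batch would miss duplicates that span batches, which is why it is placed on the combined dataset here.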