Copyright © 2026, NVIDIA Corporation.

NeMo Data Designer
data_designer.engine.column_generators.generators.base

Module Contents

Classes

| Name | Description |
| --- | --- |
| ColumnGenerator | Helper class that provides a standard way to create an ABC using inheritance. |
| FromScratchColumnGenerator | Helper class that provides a standard way to create an ABC using inheritance. |
| ColumnGeneratorWithModelRegistry | Helper class that provides a standard way to create an ABC using inheritance. |
| ColumnGeneratorWithModel | Helper class that provides a standard way to create an ABC using inheritance. |
| ColumnGeneratorCellByCell | Base class for column generators invoked once per row. |
| ColumnGeneratorFullColumn | Base class for column generators that transform a full batch at once. |

Functions

| Name | Description |
| --- | --- |
| _run_coroutine_sync | Run an async coroutine from sync context. |

Data

  • _T
  • SYNC_BRIDGE_TIMEOUT
  • logger

API

_T = TypeVar(...)
SYNC_BRIDGE_TIMEOUT = 300
logger = getLogger(...)

data_designer.engine.column_generators.generators.base._run_coroutine_sync(coro: typing.Coroutine[typing.Any, typing.Any, data_designer.engine.column_generators.generators.base._T]) -> data_designer.engine.column_generators.generators.base._T

Run an async coroutine from sync context.

  • No running event loop → asyncio.run(coro)
  • Running event loop (e.g. notebook/service) → run in a background thread
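The two cases above can be sketched as follows. This is a minimal illustration, not the library's implementation: `run_coroutine_sync`, `TIMEOUT_SECONDS`, and `double` are hypothetical names, and applying the 300-second value as a timeout on the worker thread's result is an assumption based only on the `SYNC_BRIDGE_TIMEOUT` constant listed above.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")
# Assumption: mirrors SYNC_BRIDGE_TIMEOUT; how the real helper uses it is not documented here.
TIMEOUT_SECONDS = 300


def run_coroutine_sync(coro: Coroutine[Any, Any, T]) -> T:
    """Illustrative stand-in for _run_coroutine_sync."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No event loop is running in this thread, so asyncio.run is safe.
        return asyncio.run(coro)
    # A loop is already running (notebook, async service). asyncio.run would
    # raise in this thread, so run the coroutine on a fresh loop in a worker thread.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result(timeout=TIMEOUT_SECONDS)


async def double(x: int) -> int:
    # Hypothetical coroutine used only to exercise the bridge.
    await asyncio.sleep(0)
    return x * 2
```

Calling `run_coroutine_sync(double(21))` from plain sync code takes the first branch. Calling it from inside a coroutine takes the second branch and blocks the calling thread until the worker finishes.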
class data_designer.engine.column_generators.generators.base.ColumnGenerator(
    config: data_designer.engine.configurable_task.TaskConfigT,
    resource_provider: data_designer.engine.resources.resource_provider.ResourceProvider
)

Bases: data_designer.engine.configurable_task.ConfigurableTask[data_designer.engine.configurable_task.TaskConfigT], abc.ABC

can_generate_from_scratch: bool

is_llm_bound: bool

Whether this generator makes model/API calls during generation.

is_order_dependent: bool

Whether this generator’s output depends on prior row-group calls.

Example: SeedDatasetColumnGenerator tracks its position in the seed dataset, so row group N must complete before N+1 starts.
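To see why internal position tracking forces sequential row groups, consider a toy generator that keeps a cursor. `CursorSeedGenerator` is a hypothetical stand-in, not the real `SeedDatasetColumnGenerator`; the wrap-around behavior and the list-of-dicts return type are assumptions made to keep the sketch short.

```python
class CursorSeedGenerator:
    """Toy order-dependent generator that serves seed rows in sequence."""

    is_order_dependent = True

    def __init__(self, seed_rows: list) -> None:
        self._seed_rows = seed_rows
        self._position = 0  # advances with every row group served

    def generate_from_scratch(self, num_records: int) -> list:
        start = self._position
        # Assumption: wrap around so the sketch never runs out of seed rows.
        rows = [
            self._seed_rows[(start + i) % len(self._seed_rows)]
            for i in range(num_records)
        ]
        self._position = start + num_records
        return rows


seed = [{"id": 0}, {"id": 1}, {"id": 2}, {"id": 3}]
gen = CursorSeedGenerator(seed)
group_0 = gen.generate_from_scratch(2)  # seed rows 0 and 1
group_1 = gen.generate_from_scratch(2)  # seed rows 2 and 3, only because group 0 ran first
```

If the two calls ran in the opposite order or concurrently, each row group could receive different seed rows, which is the situation the flag is meant to rule out.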

_is_overridden(method_name: str) -> bool

Check if a subclass has overridden a base ColumnGenerator method.

get_generation_strategy() -> data_designer.config.column_configs.GenerationStrategy

generate(data: data_designer.engine.configurable_task.DataT) -> data_designer.engine.configurable_task.DataT

Sync generate — overridden by most concrete generators.

Default bridges to agenerate() for async-first subclasses that only implement agenerate(). Raises NotImplementedError if neither generate() nor agenerate() is overridden.

agenerate(data: data_designer.engine.configurable_task.DataT) -> data_designer.engine.configurable_task.DataT

Async generate — delegates to sync generate() via thread pool.

Subclasses with native async support (e.g. ColumnGeneratorWithModelChatCompletion) should override this with a direct async implementation.
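The relationship between `generate()` and `agenerate()` can be modeled with a toy base class. This is a sketch under stated assumptions, not the library's code: `BaseGenerator`, `SyncOnly`, and `AsyncOnly` are hypothetical, the override check is one plausible way to implement `_is_overridden`, and the sync bridge uses plain `asyncio.run`, which assumes no event loop is already running in the calling thread.

```python
import asyncio


class BaseGenerator:
    """Toy stand-in for ColumnGenerator; only the bridging logic is modeled."""

    def _is_overridden(self, method_name: str) -> bool:
        # True when the subclass defines its own version of the method.
        return getattr(type(self), method_name) is not getattr(BaseGenerator, method_name)

    def generate(self, data: dict) -> dict:
        if not self._is_overridden("agenerate"):
            raise NotImplementedError("override generate() or agenerate()")
        # Async-first subclass: bridge to agenerate() from sync code.
        return asyncio.run(self.agenerate(data))

    async def agenerate(self, data: dict) -> dict:
        # Sync-first subclass: run generate() in a worker thread so the loop stays free.
        return await asyncio.to_thread(self.generate, data)


class SyncOnly(BaseGenerator):
    def generate(self, data: dict) -> dict:
        return {**data, "source": "sync"}


class AsyncOnly(BaseGenerator):
    async def agenerate(self, data: dict) -> dict:
        return {**data, "source": "async"}
```

With this shape, a subclass implements whichever side is natural and gets the other for free. A class that overrides neither method fails with `NotImplementedError` on either entry point, because the default `agenerate()` ends up calling the default `generate()`.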

log_pre_generation() -> None

A shared method to log info before the generator’s generate method is called.

The idea is for dataset builders to call this method for all generators before calling their generate method. This is to avoid logging the same information multiple times when running generators in parallel.

class data_designer.engine.column_generators.generators.base.FromScratchColumnGenerator(
    config: data_designer.engine.configurable_task.TaskConfigT,
    resource_provider: data_designer.engine.resources.resource_provider.ResourceProvider
)

Bases: data_designer.engine.column_generators.generators.base.ColumnGenerator[data_designer.engine.configurable_task.TaskConfigT], abc.ABC

can_generate_from_scratch: bool

generate_from_scratch(num_records: int) -> pandas.DataFrame

agenerate_from_scratch(num_records: int) -> pandas.DataFrame

Async wrapper — wraps sync generate_from_scratch() in a thread.

class data_designer.engine.column_generators.generators.base.ColumnGeneratorWithModelRegistry(
    config: data_designer.engine.configurable_task.TaskConfigT,
    resource_provider: data_designer.engine.resources.resource_provider.ResourceProvider
)

Bases: data_designer.engine.column_generators.generators.base.ColumnGenerator[data_designer.engine.configurable_task.TaskConfigT], abc.ABC

is_llm_bound: bool

model_registry: data_designer.engine.models.registry.ModelRegistry

get_model(model_alias: str) -> data_designer.engine.models.facade.ModelFacade

get_model_config(model_alias: str) -> data_designer.config.models.ModelConfig

get_model_provider_name(model_alias: str) -> str

class data_designer.engine.column_generators.generators.base.ColumnGeneratorWithModel(
    config: data_designer.engine.configurable_task.TaskConfigT,
    resource_provider: data_designer.engine.resources.resource_provider.ResourceProvider
)

Bases: data_designer.engine.column_generators.generators.base.ColumnGeneratorWithModelRegistry[data_designer.engine.configurable_task.TaskConfigT], abc.ABC

model() -> data_designer.engine.models.facade.ModelFacade

model_config() -> data_designer.config.models.ModelConfig

inference_parameters() -> data_designer.config.models.BaseInferenceParams

_build_multi_modal_context(record: dict) -> list[dict[str, typing.Any]] | None

Build multi-modal context from the config’s multi_modal_context list.

Passes base_path to get_contexts() so that generated image file paths (stored under base_dataset_path in create mode) can be resolved to base64 before being sent to the model endpoint.

Parameters:

record
dict

The deserialized record containing column values.

Returns:

list[dict[str, typing.Any]] | None

A list of multi-modal context dicts, or None if no context is configured.

log_pre_generation() -> None

class data_designer.engine.column_generators.generators.base.ColumnGeneratorCellByCell(
    config: data_designer.engine.configurable_task.TaskConfigT,
    resource_provider: data_designer.engine.resources.resource_provider.ResourceProvider
)

Bases: data_designer.engine.column_generators.generators.base.ColumnGenerator[data_designer.engine.configurable_task.TaskConfigT], abc.ABC

Base class for column generators invoked once per row.

Override generate to return the complete row mapping after adding the generated value. The engine calls the generator once per row and may run calls concurrently. Use this base when generation is independent per row (e.g. an LLM call per row, a per-row transform).

get_generation_strategy() -> data_designer.config.column_configs.GenerationStrategy

generate(data: dict) -> dict

Generate one row’s output from a single row’s upstream values.

Parameters:

data
dict

Current row mapping containing the upstream values available to this column.

Returns:

dict

Complete row mapping with existing keys preserved and the new column value added. Include declared side-effect columns when the config creates them.
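The return contract for a cell-by-cell `generate` can be illustrated with a standalone class that has the same `dict -> dict` shape. This is a sketch only: `UppercaseNameGenerator` and the `name_upper` column are hypothetical, and the class deliberately does not inherit from the real base, whose constructor needs a config and resource provider.

```python
class UppercaseNameGenerator:
    """Stand-in with the same generate(dict) -> dict shape as a cell-by-cell generator."""

    output_column = "name_upper"  # hypothetical column name

    def generate(self, data: dict) -> dict:
        # Copy the incoming row so upstream keys are preserved, then add the new value.
        return {**data, self.output_column: str(data["name"]).upper()}


row = {"id": 7, "name": "ada"}
result = UppercaseNameGenerator().generate(row)
```

Returning a new mapping instead of mutating `data` keeps each call independent, which matters when calls for different rows may run concurrently.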

class data_designer.engine.column_generators.generators.base.ColumnGeneratorFullColumn(
    config: data_designer.engine.configurable_task.TaskConfigT,
    resource_provider: data_designer.engine.resources.resource_provider.ResourceProvider
)

Bases: data_designer.engine.column_generators.generators.base.ColumnGenerator[data_designer.engine.configurable_task.TaskConfigT], abc.ABC

Base class for column generators that transform a full batch at once.

Override generate to return the complete batch DataFrame after adding generated values. Use this base when generation is vectorizable or when an external API accepts batched input more efficiently than per-row calls.

get_generation_strategy() -> data_designer.config.column_configs.GenerationStrategy

generate(data: pandas.DataFrame) -> pandas.DataFrame

Generate an entire batch of row outputs.

Parameters:

data
pandas.DataFrame

DataFrame containing the upstream columns this generator depends on.

Returns:

pandas.DataFrame

DataFrame containing the input columns plus the new column and any side-effect columns. When config.allow_resize is False, the row count must match the input; when it is True, the row count may change.
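A full-column `generate` with the same `DataFrame -> DataFrame` shape might look like the sketch below. The class `PriceWithTaxGenerator`, its `tax_rate` setting, and the column names are hypothetical; `allow_resize` is modeled as a plain attribute standing in for `config.allow_resize`, and the class does not inherit from the real base.

```python
import pandas as pd


class PriceWithTaxGenerator:
    """Stand-in with the same generate(DataFrame) -> DataFrame shape as a full-column generator."""

    def __init__(self, tax_rate: float, allow_resize: bool = False) -> None:
        self.tax_rate = tax_rate          # hypothetical setting
        self.allow_resize = allow_resize  # stands in for config.allow_resize

    def generate(self, data: pd.DataFrame) -> pd.DataFrame:
        # Work on a copy so the input batch is left untouched.
        out = data.copy()
        # Vectorized: one expression fills the new column for the whole batch.
        out["price_with_tax"] = out["price"] * (1 + self.tax_rate)
        # Guard illustrating the contract: same row count unless resizing is allowed.
        if not self.allow_resize and len(out) != len(data):
            raise ValueError("row count changed while allow_resize is False")
        return out


batch = pd.DataFrame({"sku": ["a", "b"], "price": [10.0, 20.0]})
result = PriceWithTaxGenerator(tax_rate=0.5).generate(batch)
```

The returned frame keeps the input columns and appends the new one, matching the return description above.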