data_designer.engine.resources.seed_reader
logger
SourceT
FileSystemSourceT
Bases: data_designer.errors.DataDesignerError
Filesystem and root path available to filesystem seed-reader plugins.
Bases: typing.Protocol
Batch object returned by seed readers and convertible to a DataFrame.
Bases: typing.Protocol
Reader that yields seed batches until exhausted.
Seed-reader batch backed by an in-memory pandas DataFrame.
Return the batch as a pandas DataFrame.
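The batch and reader protocols can be sketched with `typing.Protocol`. Everything here is hypothetical: `SeedBatchLike`, `SeedReaderLike`, `InMemoryBatch`, `ListSeedReader`, and the `to_records()` method are illustrative names, not this module's API. A list of dicts stands in for the pandas DataFrame so the sketch runs with the standard library alone.

```python
from collections.abc import Iterator
from typing import Any, Protocol, runtime_checkable

Record = dict[str, Any]


@runtime_checkable
class SeedBatchLike(Protocol):
    """Hypothetical batch protocol: anything that can produce its rows."""

    def to_records(self) -> list[Record]: ...


class SeedReaderLike(Protocol):
    """Hypothetical reader protocol: iterating yields batches until exhausted."""

    def __iter__(self) -> Iterator[SeedBatchLike]: ...


class InMemoryBatch:
    """Batch backed by in-memory rows (a stand-in for a pandas DataFrame)."""

    def __init__(self, rows: list[Record]) -> None:
        self._rows = [dict(row) for row in rows]

    def to_records(self) -> list[Record]:
        return [dict(row) for row in self._rows]  # copies, so callers cannot mutate the batch


class ListSeedReader:
    """Hypothetical reader that slices a list of rows into fixed-size batches."""

    def __init__(self, rows: list[Record], batch_size: int) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be >= 1")
        self._rows = rows
        self._batch_size = batch_size

    def __iter__(self) -> Iterator[InMemoryBatch]:
        for start in range(0, len(self._rows), self._batch_size):
            yield InMemoryBatch(self._rows[start : start + self._batch_size])


def count_rows(reader: SeedReaderLike) -> int:
    # Consumes the reader until exhausted, relying only on the protocol.
    return sum(len(batch.to_records()) for batch in reader)
```

With pandas installed, `pandas.DataFrame(batch.to_records())` turns one of these batches into a DataFrame, which is the kind of conversion the real batch protocol provides.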
Create a DataFrame and verify hydrated records match the declared output schema.
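A minimal sketch of the schema check, assuming "match the declared output schema" means every record has exactly the declared column names. `validate_hydrated_records` is a hypothetical helper, and it checks column names only, not value types.

```python
from collections.abc import Iterable, Sequence
from typing import Any

Record = dict[str, Any]


def validate_hydrated_records(records: Iterable[Record], output_columns: Sequence[str]) -> list[Record]:
    """Hypothetical: reject records whose keys differ from output_columns.

    Returns copies of the records with keys in declared column order.
    """
    expected = list(output_columns)
    if len(set(expected)) != len(expected):
        raise ValueError(f"output_columns contains duplicates: {expected}")
    expected_set = set(expected)
    validated: list[Record] = []
    for index, record in enumerate(records):
        missing = [col for col in expected if col not in record]
        unexpected = sorted(set(record) - expected_set)
        if missing or unexpected:
            raise ValueError(
                f"record {index} does not match output_columns: "
                f"missing={missing}, unexpected={unexpected}"
            )
        validated.append({col: record[col] for col in expected})
    return validated
```

After validation, `pandas.DataFrame(rows, columns=list(output_columns))` would build a frame whose columns follow the declared order.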
Bases: abc.ABC, typing.Generic[data_designer.engine.resources.seed_reader.SourceT]
Base class for reading a seed dataset.
Seeds are read using DuckDB. Reader implementations define the DuckDB connection setup and how to obtain a URI that DuckDB can query (i.e., "… FROM <uri> …").
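DuckDB accepts a quoted file path or URI directly in a `FROM` clause, which is what makes a "FROM <uri>" contract workable. The sketch below only builds such a query string; `build_seed_query`, `quote_literal`, and `quote_identifier` are hypothetical helpers, not this module's API, and quoting follows standard SQL rules (embedded quotes are doubled).

```python
from collections.abc import Sequence
from typing import Optional


def quote_literal(value: str) -> str:
    """SQL string literal: wrap in single quotes, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name: str) -> str:
    """SQL identifier: wrap in double quotes, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_seed_query(
    uri: str,
    columns: Optional[Sequence[str]] = None,
    limit: Optional[int] = None,
) -> str:
    """Hypothetical: build "SELECT ... FROM '<uri>'" for a reader-supplied URI."""
    select_list = "*" if not columns else ", ".join(quote_identifier(c) for c in columns)
    query = f"SELECT {select_list} FROM {quote_literal(uri)}"
    if limit is not None:
        if limit < 0:
            raise ValueError("limit must be non-negative")
        query += f" LIMIT {int(limit)}"
    return query
```

With the `duckdb` package (and pandas) installed, `duckdb.connect().execute(build_seed_query("seed.parquet", limit=5)).df()` would run the query and return a DataFrame.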
The Data Designer engine automatically supplies the appropriate SeedSource
and a SecretResolver to use for any secret fields in the config via
attach(...). Subclasses that need per-attachment setup can override
on_attach(...) without needing to call super().
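The attach/hook contract can be sketched as follows. `attach` and `on_attach` come from the text above. `SketchSeedReader`, `SecretBackedReader`, `resolve_secret`, `get_dataset_uri`, and modeling the secret resolver as a plain callable are all hypothetical. Because the base `on_attach` is a no-op that runs after state is stored, an override can skip `super()` and still see the attached source.

```python
from abc import ABC, abstractmethod
from typing import Callable, Generic, Optional, TypeVar

SourceT = TypeVar("SourceT")
# Hypothetical: a secret resolver modeled as "secret reference -> secret value".
SecretResolver = Callable[[str], str]


class SketchSeedReader(ABC, Generic[SourceT]):
    """Hypothetical base mirroring the attach/on_attach contract."""

    _source: Optional[SourceT] = None
    _secret_resolver: Optional[SecretResolver] = None

    def attach(self, source: SourceT, secret_resolver: SecretResolver) -> None:
        # Called by the engine, so the constructor never needs these objects.
        self._source = source
        self._secret_resolver = secret_resolver
        self.on_attach()  # hook runs after state is stored

    def on_attach(self) -> None:
        """No-op by default, so subclass overrides need not call super()."""

    @property
    def source(self) -> SourceT:
        if self._source is None:
            raise RuntimeError("reader is not attached to a source yet")
        return self._source

    def resolve_secret(self, reference: str) -> str:
        if self._secret_resolver is None:
            raise RuntimeError("reader is not attached to a secret resolver yet")
        return self._secret_resolver(reference)

    @abstractmethod
    def get_dataset_uri(self) -> str:
        """Hypothetical: the URI DuckDB queries as '... FROM <uri> ...'."""


class SecretBackedReader(SketchSeedReader[dict]):
    def on_attach(self) -> None:
        # Per-attachment setup: resolve the secret once; no super() call needed.
        self.token = self.resolve_secret(self.source["token_ref"])

    def get_dataset_uri(self) -> str:
        return self.source["path"]
```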
Attach a source and secret resolver to the instance.
This is called internally by the engine so that these objects do not need to be provided in the reader’s constructor.
Hook for subclasses that need per-attachment setup.
Create a rooted filesystem context for directory-backed seed readers.
Return the seed dataset’s column names.
Return the seed_type of the source class this reader is generic over.
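One way a reader can recover the source class it is generic over is to inspect the parametrized bases recorded in `__orig_bases__` with `typing.get_origin` and `typing.get_args`. These are real Python features, but their use here is an assumption about the approach. `Reader`, the source classes, and their `seed_type` values are hypothetical. Walking the MRO lets a two-level hierarchy (a generic filesystem base, then a concrete subclass) resolve correctly while unbound TypeVars are skipped.

```python
from typing import ClassVar, Generic, TypeVar, get_args, get_origin

SourceT = TypeVar("SourceT")
FileSystemSourceT = TypeVar("FileSystemSourceT")


class LocalFileSource:
    seed_type: ClassVar[str] = "local"  # hypothetical discriminator value


class DirectorySource:
    seed_type: ClassVar[str] = "directory"  # hypothetical discriminator value


class Reader(Generic[SourceT]):
    @classmethod
    def source_class(cls) -> type:
        # Inspect each class's parametrized bases, e.g. Reader[LocalFileSource].
        for klass in cls.__mro__:
            for base in getattr(klass, "__orig_bases__", ()):
                origin = get_origin(base)
                if isinstance(origin, type) and issubclass(origin, Reader):
                    for arg in get_args(base):
                        if isinstance(arg, type):  # skip unbound TypeVars
                            return arg
        raise TypeError(f"{cls.__name__} is not parametrized with a concrete source class")

    @classmethod
    def get_seed_type(cls) -> str:
        return cls.source_class().seed_type


class LocalFileReader(Reader[LocalFileSource]):
    pass


class FileSystemReader(Reader[FileSystemSourceT]):  # still generic
    pass


class DirectoryReader(FileSystemReader[DirectorySource]):
    pass
```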
Bases: data_designer.engine.resources.seed_reader.SeedReader[data_designer.config.seed_source.LocalFileSeedSource]
Bases: data_designer.engine.resources.seed_reader.SeedReader[data_designer.config.seed_source.HuggingFaceSeedSource]
Bases: data_designer.engine.resources.seed_reader.SeedReader[data_designer.config.seed_source_dataframe.DataFrameSeedSource]
Bases: data_designer.engine.resources.seed_reader.SeedReader[data_designer.engine.resources.seed_reader.FileSystemSourceT], abc.ABC
Base class for filesystem-derived seed readers.
Plugin authors implement build_manifest(...) to describe the cheap logical
rows available under the configured filesystem root. Readers that need
expensive enrichment can optionally override hydrate_row(...) to emit one
record dict or an iterable of record dicts per manifest row. When emitted
records change the manifest schema, output_columns must declare the exact
hydrated output schema for each emitted record. The framework owns
attachment-scoped filesystem context reuse, manifest sampling, partitioning,
randomization, batching, and DuckDB registration details.
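The manifest/hydration split can be sketched as follows. The method names `build_manifest`, `hydrate_row`, and the attribute `output_columns` come from the text above. Everything else is hypothetical: `SketchFileSystemReader`, `iter_records`, `iter_batches`, the `LinesReader` example, and the in-memory dict that stands in for a filesystem root. The sketch covers only shuffling, hydration, schema checks, and batching. It omits sampling, partitioning, and DuckDB registration, and it assumes `output_columns=None` means "no schema change".

```python
import random
from collections.abc import Iterable, Iterator, Mapping, Sequence
from itertools import islice
from typing import Any, Optional, Union

Record = dict[str, Any]
# hydrate_row may return one record or an iterable of records.
HydrateResult = Union[Mapping[str, Any], Iterable[Mapping[str, Any]]]


class SketchFileSystemReader:
    """Hypothetical base: implement build_manifest, optionally override hydrate_row."""

    output_columns: Optional[Sequence[str]] = None  # assumed: None means schema unchanged

    def build_manifest(self) -> list[Record]:
        raise NotImplementedError

    def hydrate_row(self, row: Record) -> HydrateResult:
        return row  # default: the cheap manifest row passes through unchanged


def iter_records(reader: SketchFileSystemReader, seed: Optional[int] = None) -> Iterator[Record]:
    """Framework-side loop: optionally shuffle the manifest, then hydrate and validate."""
    manifest = list(reader.build_manifest())
    if seed is not None:
        random.Random(seed).shuffle(manifest)  # reproducible manifest order
    for row in manifest:
        result = reader.hydrate_row(row)
        emitted = [result] if isinstance(result, Mapping) else list(result)
        for record in emitted:
            cols = reader.output_columns
            if cols is not None and set(record) != set(cols):
                raise ValueError(f"record keys {sorted(record)} != output_columns {sorted(cols)}")
            yield dict(record)


def iter_batches(records: Iterable[Record], batch_size: int) -> Iterator[list[Record]]:
    """Group records into lists of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(records)
    while batch := list(islice(it, batch_size)):
        yield batch


class LinesReader(SketchFileSystemReader):
    """Example plugin: one manifest row per file, one hydrated record per line."""

    output_columns = ("path", "line_no", "text")

    def __init__(self, files: Mapping[str, str]) -> None:
        self._files = files  # in-memory stand-in for a filesystem root

    def build_manifest(self) -> list[Record]:
        # Cheap: list paths only; contents are not read yet.
        return [{"path": path} for path in sorted(self._files)]

    def hydrate_row(self, row: Record) -> HydrateResult:
        # Expensive: read the file and fan out into one record per line.
        text = self._files[row["path"]]
        return [
            {"path": row["path"], "line_no": n, "text": line}
            for n, line in enumerate(text.splitlines(), start=1)
        ]
```

Keeping `build_manifest` cheap means sampling and shuffling operate on lightweight rows, and the expensive reads in `hydrate_row` happen only for the rows that are actually consumed.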
Bases: data_designer.engine.resources.seed_reader.FileSystemSeedReader[data_designer.config.seed_source.DirectorySeedSource]
Bases: data_designer.engine.resources.seed_reader.FileSystemSeedReader[data_designer.config.seed_source.FileContentsSeedSource]
Bases: data_designer.engine.resources.seed_reader.FileSystemSeedReader[data_designer.config.seed_source.AgentRolloutSeedSource]