Workflow chaining is currently experimental and under active development. The documentation, examples, workflow API, metadata schema, and artifact layout are subject to significant changes in future releases. If you encounter any issues, have questions, or have ideas for improvement, please consider starting a discussion on GitHub.
Workflow chaining lets you split a dataset build into named stages. Each stage runs a normal DataDesigner.create() call, writes its own artifact directory, and hands a selected parquet output to the next stage as a LocalFileSeedSource.
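The hand-off between stages can be pictured with a small, library-independent sketch. It does not call the Data Designer API: run_stage, run_chain, the stage functions, and the JSON-lines files are all hypothetical stand-ins, used here only because they need nothing beyond the standard library. In the real workflow each stage is a DataDesigner.create() call and the handed-off file is parquet wrapped in a LocalFileSeedSource.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Optional

Row = dict
StageFn = Callable[[list[Row]], list[Row]]


def run_stage(name: str, fn: StageFn, root: Path, seed_path: Optional[Path]) -> Path:
    """Run one hypothetical stage and return the file handed to the next stage."""
    # Load the upstream output, if there is one, as this stage's seed rows.
    seed_rows: list[Row] = []
    if seed_path is not None:
        seed_rows = [json.loads(line) for line in seed_path.read_text().splitlines()]

    # Each stage writes into its own artifact directory.
    stage_dir = root / name
    stage_dir.mkdir(parents=True, exist_ok=True)
    out_path = stage_dir / "output.jsonl"
    out_path.write_text("\n".join(json.dumps(row) for row in fn(seed_rows)))
    return out_path


def generate(_seed: list[Row]) -> list[Row]:
    # Stand-in for a generation stage; ignores its (empty) seed.
    return [{"text": " Hello "}, {"text": "world"}, {"text": ""}]


def clean(seed: list[Row]) -> list[Row]:
    # Stand-in for a cleanup stage that depends on the previous stage's output.
    return [{"text": r["text"].strip()} for r in seed if r["text"].strip()]


def run_chain(root: Path) -> list[Row]:
    seed: Optional[Path] = None
    for name, fn in [("generate", generate), ("clean", clean)]:
        seed = run_stage(name, fn, root, seed)  # output of one stage seeds the next
    return [json.loads(line) for line in seed.read_text().splitlines()]


with tempfile.TemporaryDirectory() as tmp:
    final_rows = run_chain(Path(tmp))
    stage_dirs = sorted(p.name for p in Path(tmp).iterdir())
```

The point of the sketch is the shape of the loop: every stage owns a directory, and the only thing that crosses a stage boundary is a path to one selected output file.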
Use it when one generation step naturally depends on the cleaned or reshaped output of another step, especially when a processor-only stage is clearer than mixing all transformations into one config.
A stage can expose different views of its data.
Processors added with config_builder.add_processor(...) run inside the stage and usually create side artifacts. They do not automatically change what the next stage receives. Use output_processors=[...] when a processor should define the stage boundary output.
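The distinction between a side artifact and the stage boundary output can be modeled in a few lines. This is a conceptual sketch, not the library's implementation: StageResult, run_stage, and the single output_processor argument are hypothetical, and the sketch deliberately does not model what a multi-entry output_processors list does.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Rows = list[dict]
Processor = Callable[[Rows], Rows]


@dataclass
class StageResult:
    dataset: Rows                    # the stage's main dataset
    side_artifacts: dict[str, Rows]  # one entry per processor that ran
    next_stage_input: Rows           # what a downstream stage would receive


def run_stage(
    dataset: Rows,
    processors: dict[str, Processor],
    output_processor: Optional[str] = None,  # hypothetical, single selection only
) -> StageResult:
    # Every processor runs inside the stage and leaves a side artifact behind.
    side_artifacts = {name: proc(dataset) for name, proc in processors.items()}
    # Only an explicitly selected processor changes the boundary output.
    if output_processor is None:
        boundary = dataset
    else:
        boundary = side_artifacts[output_processor]
    return StageResult(dataset, side_artifacts, boundary)


def drop_drafts(rows: Rows) -> Rows:
    return [r for r in rows if not r["draft"]]


data = [{"id": 1, "draft": False}, {"id": 2, "draft": True}]
processors = {"drop_drafts": drop_drafts}

default_run = run_stage(data, processors)
selected_run = run_stage(data, processors, output_processor="drop_drafts")
```

In default_run the processor's result exists only as a side artifact and the next stage would still see both rows; in selected_run the filtered rows become the hand-off.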
Stages can be processor-only when they receive seed data from an upstream stage.
This is useful for final cleanup, schema transforms, and format-specific export preparation.
push_to_hub() does not support selected processor or callback outputs yet. Use export() for the selected workflow output.
on_success callbacks are trusted user code. If a callback returns a path, Data Designer reads that path as the next stage input.
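Because a returned path is read as given, it helps to see what that contract implies for callback authors. The following is a minimal sketch under stated assumptions: resolve_next_input and keep_nonempty_lines are hypothetical names, the callback signature (one path in, an optional path out) is assumed rather than taken from the library, and falling back to the stage's own output when nothing is returned is also an assumption.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional

# Assumed callback shape: receives the stage output path, may return a new path.
OnSuccess = Callable[[Path], Optional[Path]]


def resolve_next_input(stage_output: Path, on_success: Optional[OnSuccess] = None) -> Path:
    """Pick the file the next stage reads (hypothetical helper, not the library's)."""
    returned = on_success(stage_output) if on_success is not None else None
    if returned is None:
        # Assumption: with no returned path, the stage's own output is used.
        return stage_output
    returned = Path(returned)
    # The callback is trusted, so the path is used as given; this sketch only
    # checks that it points at an existing file before handing it on.
    if not returned.is_file():
        raise FileNotFoundError(f"on_success returned a missing file: {returned}")
    return returned


def keep_nonempty_lines(stage_output: Path) -> Path:
    """Example callback: write a filtered copy next to the output and return it."""
    filtered = stage_output.with_name("filtered.txt")
    lines = [ln for ln in stage_output.read_text().splitlines() if ln.strip()]
    filtered.write_text("\n".join(lines))
    return filtered


with tempfile.TemporaryDirectory() as tmp:
    output = Path(tmp) / "output.txt"
    output.write_text("a\n\nb\n")
    default_name = resolve_next_input(output).name
    chosen = resolve_next_input(output, keep_nonempty_lines)
    chosen_name, chosen_text = chosen.name, chosen.read_text()
```

Since nothing downstream second-guesses the callback, any validation of the returned file (existence, schema, location) is the callback author's responsibility.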