Custom columns let you implement your own generation logic using Python functions. Use them for multi-step LLM workflows, external API integration, or any scenario requiring full programmatic control. For reusable, distributable components, see Plugins instead.
Three signatures are supported, and parameter names are validated: fn(row), fn(row, generator_params), and fn(row, generator_params, models).
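The three shapes are sketched below as plain functions, with the registration step omitted. This sketch assumes that row is a dict of column values and that a generator returns the row with its new column added. StubModel and its generate(prompt) method are hypothetical stand-ins, because the real ModelFacade call signature is not shown here. The alias "my-model" is also hypothetical.

```python
from typing import Any


class StubModel:
    """Hypothetical stand-in for a ModelFacade; the real API may differ."""

    def generate(self, prompt: str) -> str:
        return f"<llm output for: {prompt}>"


# 1. Row only: pure Python logic, no parameters, no models.
def word_count(row: dict[str, Any]) -> dict[str, Any]:
    row["word_count"] = len(row["text"].split())
    return row


# 2. Row plus generator_params: behavior configured from outside the function.
def truncate(row: dict[str, Any], generator_params: dict[str, Any]) -> dict[str, Any]:
    row["short_text"] = row["text"][: generator_params["max_chars"]]
    return row


# 3. Row, generator_params, and models: LLM access by alias.
def summarize(
    row: dict[str, Any], generator_params: dict[str, Any], models: dict[str, Any]
) -> dict[str, Any]:
    # "my-model" is a hypothetical alias that would be declared in model_aliases.
    prompt = generator_params["template"].format(**row)
    row["summary"] = models["my-model"].generate(prompt)
    return row
```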
For the full_column strategy, name the first parameter df instead of row; the function receives a DataFrame rather than a single row.
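The following is a minimal sketch of a full_column generator. It assumes that df is a pandas DataFrame and that the column is registered with generation_strategy=dd.GenerationStrategy.FULL_COLUMN (configuration not shown). The price and price_with_tax columns are hypothetical. The whole transformation is a single vectorized expression, with no per-row loop and no LLM call.

```python
import pandas as pd


def add_price_with_tax(df: pd.DataFrame) -> pd.DataFrame:
    # One vectorized operation over the entire column; no per-row Python loop.
    df["price_with_tax"] = (df["price"] * 1.2).round(2)
    return df
```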
For LLM access without custom parameters, use the three-argument form and annotate the unused parameter as generator_params: None.
Model aliases are validated before generation starts. If an alias doesn’t exist in your config, an error is raised during the health check.
Recommendation: Use cell_by_cell for LLM calls. The framework handles parallelization automatically. Use full_column only for vectorized operations that don’t involve LLM calls.
For full_column, set generation_strategy=dd.GenerationStrategy.FULL_COLUMN.
Sync cell_by_cell generators are dispatched concurrently across rows under the async engine. Module-level mutable state (counters, caches, non-thread-safe HTTP clients) needs synchronization or per-row instantiation. For network-bound work, prefer async def fn(row) — the engine runs it directly on its event loop and skips the thread bridge.
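The sketch below illustrates both options with hypothetical generators. The first is a synchronous function that protects a module-level counter with threading.Lock. The second is an async def function that awaits its I/O. In the async version, asyncio.sleep stands in for a real network call. The simulate_thread_dispatch helper is only a rough local stand-in for concurrent row dispatch and is not the engine itself.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

# Module-level mutable state shared by all rows; the lock serializes updates
# because sync generators may run on several threads at once.
_counter = 0
_counter_lock = threading.Lock()


def tag_with_sequence(row: dict[str, Any]) -> dict[str, Any]:
    global _counter
    with _counter_lock:
        _counter += 1
        row["seq"] = _counter
    return row


async def fetch_label(row: dict[str, Any]) -> dict[str, Any]:
    # asyncio.sleep stands in for an awaited network call (e.g. an HTTP request).
    await asyncio.sleep(0.01)
    row["label"] = row["text"].upper()
    return row


def simulate_thread_dispatch(fn: Callable, rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    # Hypothetical helper: runs a sync generator across rows on a thread pool.
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(fn, rows))
```

When a lock would serialize too much work, the other option is per-row instantiation: create the client inside the function instead of sharing it at module level.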
The third argument, models, is a dict of ModelFacade instances keyed by alias. Declare every model your generator uses in model_aliases; this populates the models dict and lets the health check validate those aliases before generation starts.
This gives you direct access to all ModelFacade capabilities: custom parsers, correction loops, structured output, tool use, etc.
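Here is a minimal sketch of a multi-step workflow that drafts a result and then revises it. The generator takes no custom parameters, so generator_params is annotated None. The aliases "writer" and "editor" are hypothetical and would need to be listed in model_aliases. EchoModel is a stand-in with a hypothetical generate(prompt) method; the actual ModelFacade methods and return types are not shown here.

```python
from typing import Any


class EchoModel:
    """Hypothetical stand-in for a ModelFacade; real method signatures may differ."""

    def __init__(self, name: str) -> None:
        self.name = name

    def generate(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


def draft_and_revise(
    row: dict[str, Any], generator_params: None, models: dict[str, Any]
) -> dict[str, Any]:
    # Both aliases must be declared in model_aliases so they appear in models.
    draft = models["writer"].generate(f"Write a tagline for {row['product']}")
    row["tagline"] = models["editor"].generate(f"Tighten: {draft}")
    return row
```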
FULL_COLUMN: Set allow_resize=True and return a DataFrame with more or fewer rows than the input.
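The sketch below shows a resizing full_column generator. It assumes a pandas DataFrame input and assumes allow_resize=True is set on the column (configuration not shown). The "variant" column and the empty-text filter are hypothetical. A cross merge expands each row into two, and the filter then removes some rows, so the output can be larger or smaller than the input. Note that how="cross" requires pandas 1.2 or later.

```python
import pandas as pd


def expand_variants(df: pd.DataFrame) -> pd.DataFrame:
    # Each input row becomes one output row per variant (grows the frame).
    variants = pd.DataFrame({"variant": ["formal", "casual"]})
    out = df.merge(variants, how="cross")
    # Drop rows with empty text (shrinks the frame).
    out = out[out["text"].str.len() > 0]
    return out.reset_index(drop=True)
```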
CELL_BY_CELL: With allow_resize=True, your function may return a single row (dict) or multiple rows (list[dict]). Return [] to drop that input row.
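Here is a minimal cell_by_cell sketch that covers all three return forms, assuming allow_resize=True is set. The function can return one dict, a list of dicts, or an empty list to drop the row. The split-on-"?" logic and the "question" column are hypothetical.

```python
from typing import Any, Union

Row = dict[str, Any]


def split_questions(row: Row) -> Union[Row, list[Row]]:
    # Hypothetical: split the "text" column into one row per question.
    questions = [q.strip() for q in row["text"].split("?") if q.strip()]
    if not questions:
        return []  # nothing usable: drop this input row
    if len(questions) == 1:
        return {**row, "question": questions[0] + "?"}  # exactly one output row
    return [{**row, "question": q + "?"} for q in questions]  # several output rows
```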
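Use cases: expanding one input row into several (list[dict] per row in CELL_BY_CELL, or a larger DataFrame in FULL_COLUMN) and filtering rows out by returning [] per row (CELL_BY_CELL).
Test generators with real LLM calls without running the full pipeline.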
In unit tests that mock model clients, use MagicMock(spec=ModelFacade) so that async methods such as agenerate() are auto-detected and mocked as AsyncMock.
Mocking only generate() will silently no-op under the async engine because the bridge routes through agenerate().
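This sketch replaces the real ModelFacade import with a local FakeFacade class. Only the class's shape matters here: one sync generate() and one async agenerate(). When spec is a class, unittest.mock turns its async methods into AsyncMock, which means awaiting agenerate() works and its return value can be configured. The prompt argument and return value are hypothetical.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


class FakeFacade:
    """Hypothetical stand-in for ModelFacade with one sync and one async method."""

    def generate(self, prompt: str) -> str: ...

    async def agenerate(self, prompt: str) -> str: ...


model = MagicMock(spec=FakeFacade)
# Configure both paths: under the async engine the call goes through agenerate().
model.generate.return_value = "mocked"
model.agenerate.return_value = "mocked"


async def call_like_async_engine(m) -> str:
    return await m.agenerate("hello")


result = asyncio.run(call_like_async_engine(model))
```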