AgentRolloutSeedSource turns existing agent rollouts into a seed dataset for synthetic data workflows. It lets you operate locally on rollout artifacts you already have on disk, then normalizes them into rows you can filter, curate, and distill into training or evaluation data.
Use AgentRolloutSeedSource when you want to work from existing agent traces instead of traces captured during a Data Designer generation run.
Uses ~/.claude/projects and *.jsonl by default.
You can override path and file_pattern for any format when your rollout artifacts live outside the built-in defaults.
All supported rollout formats map into the same seeded row schema. In the table below, None means the source artifact does not expose that field directly, and derived means Data Designer computes it from normalized messages.
trace_id: Claude Code appends agentId when present. Hermes uses either the CLI session ID or the gateway transcript file stem. Pi uses the session header id.is_sidechain: ATIF, Hermes, and Pi currently normalize this to False. Claude Code preserves isSidechain directly.messages: All formats normalize into the same chat-style message schema. See Message Traces for the shared block structure. Pi sessions are tree-structured; only the active conversation path (from the last entry back to root) is included.source_meta: This is where format-specific details live, such as ATIF copied-context metadata, Claude summaries, Codex response-item types, Hermes tool/session metadata, or Pi session version and branch information.Because the seeded fields are normalized, you can also build lightweight summarization workflows directly from imported rollouts. This example samples one random normalized message from each trace and summarizes it in a single sentence.
This stays fully declarative: no custom seed reader or preprocessing step is required. Because sampled_turn is drawn from the normalized messages list, the same config works across all supported rollout formats.
You can also explode imported rollouts into a tool-interaction dataset. This example scans normalized messages, emits one row per tool call and matching tool response, preserves the trace context up to that response, and then uses a structured column to label the interaction as a success, failure, or unclear outcome.
This pattern is useful when you want to curate evaluator or monitoring datasets from real traces. The resize-enabled custom column turns each tool interaction into its own record, and the structured column adds a consistent outcome label plus a grounded summary. Because the logic operates on normalized tool_calls and tool messages, the same pattern transfers across supported rollout formats. If your traces are long, consider adding a second custom or expression column that windows the context before sending it to a model.
messages structure used in imported rollouts, see Message Traces.