For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
  • Getting Started
    • Welcome
    • Contributing
  • Concepts
    • Columns
    • Seed Datasets
    • Agent Rollout Ingestion
    • Custom Columns
    • Validators
    • Processors
    • Person Sampling
    • Traces
    • Architecture & Performance
    • Deployment Options
    • Security
  • Tutorials
    • Overview
    • The Basics
    • Structured Outputs, Jinja Expressions, and Conditional Generation
    • Seeding with an External Dataset
    • Providing Images as Context
    • Generating Images
    • Image-to-Image Editing
  • Recipes
    • Recipe Cards
  • Plugins
    • Overview
    • Example Plugin
    • FileSystemSeedReader Plugins
    • Discover
  • Code Reference
    • Overview
  • Dev Notes
    • Overview
    • Push Datasets to Hugging Face Hub
    • Text-to-SQL for Nemotron Super
    • Async All the Way Down
    • Owning the Model Stack
    • Data Designer Got Skills
NVIDIANVIDIA
Developer-friendly docs for your API
Privacy Policy | Manage My Privacy | Do Not Sell or Share My Data | Terms of Service | Accessibility | Corporate Policies | Product Security | Contact

Copyright © 2026, NVIDIA Corporation.

LogoLogoNeMo Data Designer
On this page
  • Quick Start
  • Normalized Field Compatibility
  • Notes
  • Example: Summarize a Random Turn
  • Example: Turn Tool Interactions into a Review Dataset
  • Related Guides
Concepts

Agent Rollout Ingestion

||View as Markdown|
Previous

Seed Datasets

Next

Default Model Settings

AgentRolloutSeedSource turns existing agent rollouts into a seed dataset for synthetic data workflows. It lets you operate locally on rollout artifacts you already have on disk, then normalizes them into rows you can filter, curate, and distill into training or evaluation data.

Quick Start

Use AgentRolloutSeedSource when you want to work from existing agent traces instead of traces captured during a Data Designer generation run.

Claude Code
Codex
Hermes Agent
Pi Coding Agent
ATIF

Uses ~/.claude/projects and *.jsonl by default.

1import data_designer.config as dd
2
3seed_source = dd.AgentRolloutSeedSource(
4 format=dd.AgentRolloutFormat.CLAUDE_CODE,
5)

You can override path and file_pattern for any format when your rollout artifacts live outside the built-in defaults.

Normalized Field Compatibility

All supported rollout formats map into the same seeded row schema. In the table below, None means the source artifact does not expose that field directly, and derived means Data Designer computes it from normalized messages.

Normalized fieldATIFClaude CodeCodexHermes AgentPi Coding Agent
trace_idsession_idsessionId[:agentId]session_meta.id or file stemCLI session_id or file stem; gateway file stemSession header id
source_kind"atif""claude_code""codex""hermes_agent""pi_coding_agent"
source_pathParsed .json pathParsed .jsonl trace pathParsed rollout-*.jsonl pathParsed CLI .json or gateway .jsonl pathParsed .jsonl session path
root_session_idsession_idsessionId or file stemtrace_idtrace_idSession header id
agent_idNoneagentIdNoneNoneNone
is_sidechainFalseisSidechainFalseFalseFalse
cwdagent.extra.cwdFirst non-null record cwdsession_meta.cwdNoneSession header cwd
project_pathextra.project_path or cwdprojectPath or cwdcwdNoneSession header cwd
git_branchagent.extra.git_branchFirst non-null record gitBranchsession_meta.git_branchNoneNone
started_atEarliest step timestampEarliest row timestampsession_meta.timestamp or earliest record timestampCLI session_start; gateway created_atEarliest entry timestamp
ended_atLatest step timestampLatest row timestampLatest record timestampCLI last_updated; gateway updated_atLatest entry timestamp
messagesNormalized stepsNormalized trace rowsNormalized response itemsNormalized CLI or gateway rowsNormalized active-path messages
source_metaATIF metadataClaude metadataCodex metadataHermes metadataPi session metadata
message_countderivedderivedderivedderivedderived
tool_call_countderivedderivedderivedderivedderived
final_assistant_messagederivedderivedderivedderivedderived

Notes

  • trace_id: Claude Code appends agentId when present. Hermes uses either the CLI session ID or the gateway transcript file stem. Pi uses the session header id.
  • is_sidechain: ATIF, Hermes, and Pi currently normalize this to False. Claude Code preserves isSidechain directly.
  • messages: All formats normalize into the same chat-style message schema. See Message Traces for the shared block structure. Pi sessions are tree-structured; only the active conversation path (from the last entry back to root) is included.
  • source_meta: This is where format-specific details live, such as ATIF copied-context metadata, Claude summaries, Codex response-item types, Hermes tool/session metadata, or Pi session version and branch information.

Example: Summarize a Random Turn

Because the seeded fields are normalized, you can also build lightweight summarization workflows directly from imported rollouts. This example samples one random normalized message from each trace and summarizes it in a single sentence.

1import data_designer.config as dd
2from data_designer.interface import DataDesigner
3
4data_designer = DataDesigner()
5config_builder = dd.DataDesignerConfigBuilder(
6 model_configs=[
7 dd.ModelConfig(
8 alias="trace-writer",
9 model="nvidia/nemotron-3-nano-30b-a3b",
10 provider="nvidia",
11 )
12 ]
13)
14
15config_builder.with_seed_dataset(
16 dd.AgentRolloutSeedSource(
17 format=dd.AgentRolloutFormat.CLAUDE_CODE,
18 )
19)
20
21config_builder.add_column(
22 dd.ExpressionColumnConfig(
23 name="sampled_turn",
24 expr="{{ messages | random }}",
25 )
26)
27
28config_builder.add_column(
29 dd.LLMTextColumnConfig(
30 name="turn_summary",
31 model_alias="trace-writer",
32 prompt="""\
33Summarize this randomly sampled rollout turn in one sentence.
34The turn may come from the user, assistant, or a tool result.
35
36Trace: {{ trace_id }}
37Turn:
38{{ sampled_turn }}
39""",
40 )
41)
42
43preview = data_designer.preview(config_builder, num_records=3)
44preview.display_sample_record()

This stays fully declarative: no custom seed reader or preprocessing step is required. Because sampled_turn is drawn from the normalized messages list, the same config works across all supported rollout formats.

Example: Turn Tool Interactions into a Review Dataset

You can also explode imported rollouts into a tool-interaction dataset. This example scans normalized messages, emits one row per tool call and matching tool response, preserves the trace context up to that response, and then uses a structured column to label the interaction as a success, failure, or unclear outcome.

1import data_designer.config as dd
2from data_designer.interface import DataDesigner
3from pydantic import BaseModel, Field
4from typing import Literal
5
6
7@dd.custom_column_generator(
8 required_columns=["messages"],
9 side_effect_columns=["tool_call", "tool_response", "tool_name"],
10)
11def explode_tool_interactions(row: dict) -> list[dict]:
12 rows = []
13 tool_calls_by_id = {}
14 context_messages = []
15
16 for message in row["messages"]:
17 context_messages.append(message)
18
19 for tool_call in message.get("tool_calls") or []:
20 tool_call_id = tool_call.get("id")
21 if tool_call_id:
22 tool_calls_by_id[tool_call_id] = tool_call
23
24 if message.get("role") != "tool":
25 continue
26
27 tool_call = tool_calls_by_id.get(
28 message.get("tool_call_id"),
29 {
30 "id": message.get("tool_call_id"),
31 "type": "function",
32 "function": {"name": "unknown", "arguments": "{}"},
33 },
34 )
35 tool_name = tool_call.get("function", {}).get("name", "unknown")
36
37 rows.append(
38 {
39 **row,
40 "tool_interaction_context": list(context_messages),
41 "tool_call": tool_call,
42 "tool_response": message,
43 "tool_name": tool_name,
44 }
45 )
46
47 return rows
48
49
50class ToolInteractionAnalysis(BaseModel):
51 outcome: Literal["success", "failure", "unclear"] = Field(
52 description="Whether the tool interaction appears to have succeeded, failed, or is ambiguous."
53 )
54 summary: str = Field(
55 description="One or two sentences summarizing what the tool was asked to do and what the response indicates."
56 )
57
58
59data_designer = DataDesigner()
60config_builder = dd.DataDesignerConfigBuilder(
61 model_configs=[
62 dd.ModelConfig(
63 alias="tool-analyst",
64 model="nvidia/nemotron-3-nano-30b-a3b",
65 provider="nvidia",
66 )
67 ]
68)
69
70config_builder.with_seed_dataset(
71 dd.AgentRolloutSeedSource(
72 format=dd.AgentRolloutFormat.CLAUDE_CODE,
73 )
74)
75
76config_builder.add_column(
77 dd.CustomColumnConfig(
78 name="tool_interaction_context",
79 generator_function=explode_tool_interactions,
80 allow_resize=True,
81 )
82)
83
84config_builder.add_column(
85 dd.LLMStructuredColumnConfig(
86 name="tool_interaction_analysis",
87 model_alias="tool-analyst",
88 output_format=ToolInteractionAnalysis,
89 prompt="""\
90You are analyzing one tool interaction from an imported agent rollout.
91
92Context up to the tool response:
93{{ tool_interaction_context }}
94
95Tool name: {{ tool_name }}
96
97Tool call:
98{{ tool_call }}
99
100Tool response:
101{{ tool_response }}
102
103Decide whether this interaction is a success, failure, or unclear outcome.
104Then summarize what the tool was asked to do and what happened.
105Base your answer on the tool call arguments, the tool response, and the immediate context.
106""",
107 )
108)
109
110preview = data_designer.preview(config_builder, num_records=5)
111preview.display_sample_record()

This pattern is useful when you want to curate evaluator or monitoring datasets from real traces. The resize-enabled custom column turns each tool interaction into its own record, and the structured column adds a consistent outcome label plus a grounded summary. Because the logic operates on normalized tool_calls and tool messages, the same pattern transfers across supported rollout formats. If your traces are long, consider adding a second custom or expression column that windows the context before sending it to a model.

Related Guides

  • For the general seed dataset model, see Seed Datasets.
  • For the normalized messages structure used in imported rollouts, see Message Traces.
  • For an end-to-end distillation example, see Agent Rollout Trace Distillation.