Agent Tracing

Attach trajectory identity and export Dynamo request and tool-event telemetry

Agent tracing records who called (nvext.agent_context) and what Dynamo measured on each LLM request (request_end). Tool-call understanding is built in: Dynamo autodetects tool calls and finish reasons from the response stream and records them as finish_reason_metadata on every request — no harness instrumentation. Richer harness tool spans (tool_*: tool timing, status, output sizes) are an optional add-on. Context is passive—it does not steer routing or caching. Output is best-effort profiling data, not an audit log.

Flow: Harness sends chat completions with agent_context → Dynamo emits request_end (with autodetected finish_reason_metadata) to trace sinks. Optionally, a harness also publishes its own tool events over ZMQ → same sinks.

Adding trace context to each LLM call

Direct LLM call

Inject agent_context into each LLM request

    {
      "model": "my-model",
      "messages": [{ "role": "user", "content": "..." }],
      "nvext": {
        "agent_context": {
          "session_type_id": "deep_research",
          "session_id": "research-run-42",
          "trajectory_id": "research-run-42:researcher",
          "parent_trajectory_id": "research-run-42:planner"
        }
      }
    }

| Field | Required | Meaning |
| --- | --- | --- |
| session_type_id | Yes | Workload class (e.g. deep_research). |
| session_id | Yes | Whole agent run. |
| trajectory_id | Yes | One reasoning/tool chain inside the run. |
| parent_trajectory_id | No | Parent trajectory when using subagents. |
| trajectory_final | No | true marks the trajectory’s last request (a cleanup hint). |

trajectory_final is an optional terminal marker: set it to true to signal that a trajectory is finished. Lifecycle-aware backends use it to release whatever per-trajectory state they hold (scheduling bookkeeping, routing affinity, cached identity) right away instead of waiting for an idle timeout; backends that don’t track per-trajectory lifecycle ignore it.

Send it as a dedicated minimal request (e.g. max_tokens: 1 with a placeholder message), not piggybacked on a real turn. A reactive agent loop only learns a turn was terminal from its response, so the run’s end is typically known only after the last real turn already returned — there is no live turn left to flag. (A harness with a hard turn budget may know earlier, but early termination still leaves the real last turn unflagged, so a post-hoc close is the robust contract.) Because a backend that acts on the marker may skip generation entirely, the request body is just a carrier — keep it minimal.
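A minimal sketch of such a close request, assuming the harness builds chat-completions bodies as plain dicts. The helper name `close_trajectory_request` and the placeholder message content are hypothetical; only `max_tokens: 1`, the placeholder message, and `trajectory_final` come from the text above.

```python
def close_trajectory_request(model: str, agent_context: dict) -> dict:
    """Build a minimal carrier request that marks a trajectory as finished."""
    ctx = dict(agent_context)          # copy so the caller's context is untouched
    ctx["trajectory_final"] = True     # terminal marker described above
    return {
        "model": model,
        # Placeholder content (hypothetical): a backend that acts on the
        # marker may skip generation, so the body is only a carrier.
        "messages": [{"role": "user", "content": "."}],
        "max_tokens": 1,
        "nvext": {"agent_context": ctx},
    }


researcher_ctx = {
    "session_type_id": "deep_research",
    "session_id": "research-run-42",
    "trajectory_id": "research-run-42:researcher",
}
close_body = close_trajectory_request("my-model", researcher_ctx)
```

The harness would send `close_body` once, after the last real turn has returned, reusing the same identifiers as the turns it closes.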

No Dynamo imports are required in the harness — agent_context is plain JSON under nvext; just propagate it across threads/processes wherever those paths call the model.
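Because the context is plain data, one way to propagate it is a small immutable value object that is passed explicitly to every code path that calls the model. The sketch below is an illustration only: the `AgentContext` class, its `child` and `attach` methods, and the `session_id:name` naming scheme (copied from the example ids above) are assumptions, not part of Dynamo.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentContext:
    """Hypothetical harness-side holder for the nvext.agent_context fields."""
    session_type_id: str
    session_id: str
    trajectory_id: str
    parent_trajectory_id: Optional[str] = None

    def child(self, name: str) -> "AgentContext":
        """Derive a subagent context whose parent is this trajectory."""
        return AgentContext(
            session_type_id=self.session_type_id,
            session_id=self.session_id,
            trajectory_id=f"{self.session_id}:{name}",
            parent_trajectory_id=self.trajectory_id,
        )

    def attach(self, body: dict) -> dict:
        """Return a copy of a request body with nvext.agent_context set."""
        ctx = {k: v for k, v in asdict(self).items() if v is not None}
        nvext = {**body.get("nvext", {}), "agent_context": ctx}
        return {**body, "nvext": nvext}


planner = AgentContext("deep_research", "research-run-42", "research-run-42:planner")
researcher = planner.child("researcher")
body = researcher.attach(
    {"model": "my-model", "messages": [{"role": "user", "content": "..."}]}
)
```

A frozen dataclass is picklable and safe to share between threads, so the same object can be handed to worker threads or subprocesses without extra machinery.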

Enable output

The fast path is one environment variable:

    export DYN_AGENT_TRACE=1

That picks jsonl_gz output at /tmp/dynamo-agent-trace.*.jsonl.gz. Tool-call understanding works immediately from request_end finish metadata — no harness tooling and no sockets (the optional ZMQ tool-event ingress is opt-in; see Tool call observability). Any of the per-knob variables below still wins when set explicitly, so you only need to reach for them to relocate output, add stderr, or tune buffers.

To relocate captures only:

    export DYN_AGENT_TRACE=1
    export DYN_AGENT_TRACE_OUTPUT_PATH=/mnt/captures/run-42

All agent trace environment variables

| Variable | Required | Default (when DYN_AGENT_TRACE=1) | Notes |
| --- | --- | --- | --- |
| DYN_AGENT_TRACE | Master switch | unset | Truthy (1, true, on, yes) enables tracing with all defaults below. |
| DYN_AGENT_TRACE_SINKS | No | jsonl_gz | jsonl, jsonl_gz, stderr, or comma-separated (e.g. jsonl_gz,stderr). |
| DYN_AGENT_TRACE_OUTPUT_PATH | No | /tmp/dynamo-agent-trace | File path for jsonl; segment prefix for jsonl_gz (files are named prefix.NNNNNN.jsonl.gz). |
| DYN_AGENT_TRACE_CAPACITY | No | 1024 | Trace bus capacity. |
| DYN_AGENT_TRACE_JSONL_BUFFER_BYTES | No | 1048576 | Buffer / gzip batch threshold. |
| DYN_AGENT_TRACE_JSONL_FLUSH_INTERVAL_MS | No | 1000 | Flush interval. |
| DYN_AGENT_TRACE_JSONL_GZ_ROLL_BYTES | No | 268435456 | Roll gzip segment by uncompressed bytes. |
| DYN_AGENT_TRACE_JSONL_GZ_ROLL_LINES | No | unset | Optional roll by line count. |
| DYN_AGENT_TRACE_REPLAY_HASHES | No | on | Falsey (0, no, …) disables replay hashes on requests. |
| DYN_AGENT_TRACE_TOOL_EVENTS_ZMQ_ENDPOINT | No | unset (opt-in) | Set a PULL bind address (e.g. tcp://127.0.0.1:20390) to enable tool-event ingress. |
| DYN_AGENT_TRACE_TOOL_EVENTS_ZMQ_TOPIC | No | unset | If set, first ZMQ frame must match. |

Without DYN_AGENT_TRACE=1, tracing is off; the other variables only take effect once the master switch is on.

Dynamo request_end record

Emitted after the response stream finishes or is dropped. Carries agent_context, output_tokens, and the autodetected finish_reason_metadata (tool-call names + finish reasons). request_id correlates with audit rows; the replay block feeds Mooncake replay (disable with DYN_AGENT_TRACE_REPLAY_HASHES=0). Tool-call metadata is ids and names only — arguments are intentionally not stored.

Full request_end record
    {
      "schema": "dynamo.agent.trace.v1",
      "event_type": "request_end",
      "event_time_unix_ms": 1777312801000,
      "event_source": "dynamo",
      "agent_context": {
        "session_type_id": "deep_research",
        "session_id": "research-run-42",
        "trajectory_id": "research-run-42:researcher",
        "parent_trajectory_id": "research-run-42:planner"
      },
      "request": {
        "request_id": "dynamo-request-id",
        "model": "my-model",
        "output_tokens": 16,
        "finish_reason_metadata": {
          "finish_reason": "tool_calls",
          "backend_finish_reason": "stop",
          "stop_reason": "END",
          "tool_calls": [
            {
              "choice_index": 0,
              "tool_call_index": 0,
              "id": "call-abc",
              "name": "web_search"
            }
          ],
          "choices": [
            {
              "choice_index": 0,
              "finish_reason": "tool_calls",
              "backend_finish_reason": "stop",
              "stop_reason": "END"
            }
          ]
        },
        "replay": {
          "trace_block_size": 64,
          "input_length": 128,
          "input_sequence_hashes": [14879255164371896291, 274632075616497421]
        }
      }
    }

finish_reason_metadata is optional. finish_reason is the final OpenAI-compatible reason after parser rewrites (e.g. tool_calls); backend_finish_reason / stop_reason come from the backend stop path. Top-level finish fields summarize the single-choice case; choices keeps per-choice finish fields when n > 1. For chat streams, finish metadata is recorded after parser/jail rewrites; completion streams record the final OpenAI-compatible completion finish reason. See AgentTraceRecord / AgentRequestMetrics in lib/llm/src/agents/trace/types.rs for the full Rust schema.
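As an illustration of reading these fields offline, the sketch below pulls the summary values out of one decoded request_end record. The function name `summarize_finish` is hypothetical; the key names are the ones shown in the record above, and every lookup tolerates a missing finish_reason_metadata because the block is optional.

```python
def summarize_finish(record: dict) -> dict:
    """Extract finish reason, tool-call names, and per-choice reasons."""
    meta = (record.get("request") or {}).get("finish_reason_metadata") or {}
    return {
        "finish_reason": meta.get("finish_reason"),
        "tool_call_names": [c.get("name") for c in meta.get("tool_calls") or []],
        # Per-choice reasons matter when n > 1; keyed by choice_index.
        "choice_finish_reasons": {
            c["choice_index"]: c.get("finish_reason")
            for c in meta.get("choices") or []
        },
    }


sample = {
    "event_type": "request_end",
    "request": {
        "request_id": "dynamo-request-id",
        "finish_reason_metadata": {
            "finish_reason": "tool_calls",
            "tool_calls": [
                {"choice_index": 0, "tool_call_index": 0,
                 "id": "call-abc", "name": "web_search"}
            ],
            "choices": [{"choice_index": 0, "finish_reason": "tool_calls"}],
        },
    },
}
summary = summarize_finish(sample)
```

Python's `json` module parses the 64-bit `input_sequence_hashes` values exactly; parsers that store numbers as doubles may round them.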

Tool call observability

Default — autodetected, no harness work. Dynamo parses each response stream and records the tool calls the model made into request_end.finish_reason_metadata: the per-turn finish_reason and each call’s name and id (arguments are never stored). Active whenever DYN_AGENT_TRACE=1 and the worker runs a tool-call parser (--dyn-tool-call-parser …). This tells you what the agent called and when each turn ended.

You can also recover tool-wait time offline, without any tool events. Within a trajectory the agent is sequential, so the gap between one turn finishing and the next arriving is the tool + agent-overhead time:

tool_wait(turn N) ~= next.request_received_ms - this.event_time_unix_ms

request_received_ms is stamped at the frontend before the request enters the router queue/pause, so server wait time lands in each request’s own duration, not in the inter-turn gap — the estimate holds under load. For agentic replay that gap is the inter-request delay you would inject, so autodetect alone reproduces realistic arrival timing. It cannot split tool execution from agent overhead (you get the sum, as the wall-clock union of any parallel calls).
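The gap formula can be sketched as a pass over decoded request_end records. This is an illustration under one loud assumption: the sample record in this page does not show where `request_received_ms` lives, so the code assumes it sits under `request`; adjust the lookup to the actual schema. The function name `tool_wait_gaps` is hypothetical.

```python
from collections import defaultdict


def tool_wait_gaps(records):
    """Estimate tool wait as next.request_received_ms - this.event_time_unix_ms."""
    by_trajectory = defaultdict(list)
    for rec in records:
        if rec.get("event_type") != "request_end":
            continue
        ctx = rec.get("agent_context") or {}
        by_trajectory[(ctx.get("session_id"), ctx.get("trajectory_id"))].append(rec)

    gaps = []
    for (session_id, trajectory_id), rows in by_trajectory.items():
        # ASSUMPTION: request_received_ms is stored under "request".
        rows.sort(key=lambda r: r["request"]["request_received_ms"])
        for this, nxt in zip(rows, rows[1:]):
            gaps.append({
                "session_id": session_id,
                "trajectory_id": trajectory_id,
                "request_id": this["request"]["request_id"],
                # Sum of tool execution and agent overhead for this turn.
                "tool_wait_ms": nxt["request"]["request_received_ms"]
                - this["event_time_unix_ms"],
            })
    return gaps


ctx = {"session_id": "research-run-42", "trajectory_id": "research-run-42:researcher"}
records = [
    {"event_type": "request_end", "event_time_unix_ms": 1500, "agent_context": ctx,
     "request": {"request_id": "r1", "request_received_ms": 1000}},
    {"event_type": "request_end", "event_time_unix_ms": 2400, "agent_context": ctx,
     "request": {"request_id": "r2", "request_received_ms": 1920}},
]
gaps = tool_wait_gaps(records)
```

The last turn of a trajectory has no successor, so it produces no gap.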

Optional — explicit tool events (ZMQ)

Opt-in: set DYN_AGENT_TRACE_TOOL_EVENTS_ZMQ_ENDPOINT to bind the ingress, and have the harness publish. Use it only when you need what autodetection and the timing gap can’t give: the attribution of tool time, per-tool duration_ms, status (succeeded/error/cancelled), and output sizes. Nothing emits tool events on its own.

Wire format: [topic, seq_be_u64, msgpack(AgentTraceRecord)]. Use a background publisher, bounded queue, monotonic sequence, and PUSH with HWM. Terminal tool_end / tool_error rows should carry timing (started_at_unix_ms, ended_at_unix_ms, duration_ms) even if tool_start was dropped. Same agent_context as the surrounding LLM calls; tool_call_id unique per trajectory. Join offline on session_id, trajectory_id, tool_call_id.
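A rough sketch of a harness-side publisher that follows these recommendations. It is not a reference client. It assumes the third-party `pyzmq` and `msgpack` packages, that the harness connects a PUSH socket to the PULL address Dynamo binds, that the topic frame is always sent (empty when unset), and that a record encoded as a msgpack map with the JSON field names is accepted. The class name `ToolEventPublisher` and the queue and HWM sizes are hypothetical.

```python
import queue
import struct
import threading


def build_frames(topic: bytes, seq: int, payload: bytes) -> list:
    """Wire format: [topic, sequence as big-endian u64, msgpack record]."""
    return [topic, struct.pack(">Q", seq), payload]


class ToolEventPublisher:
    """Background PUSH publisher with a bounded queue; drops instead of blocking."""

    def __init__(self, endpoint: str, topic: bytes = b"", max_queue: int = 1024):
        self._endpoint = endpoint
        self._topic = topic
        self._queue = queue.Queue(maxsize=max_queue)
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.dropped = 0

    def start(self) -> None:
        self._thread.start()

    def publish(self, record: dict) -> bool:
        """Enqueue without blocking the agent loop; count drops when full."""
        try:
            self._queue.put_nowait(record)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def close(self, timeout: float = 2.0) -> None:
        self._stop.set()
        self._thread.join(timeout=timeout)

    def _run(self) -> None:
        import msgpack  # third-party; imported lazily so the module loads without it
        import zmq      # third-party (pyzmq)

        sock = zmq.Context.instance().socket(zmq.PUSH)
        sock.setsockopt(zmq.SNDHWM, 1000)  # bound the socket's send buffer
        sock.setsockopt(zmq.LINGER, 0)     # never hang on shutdown
        sock.connect(self._endpoint)
        seq = 0
        while not self._stop.is_set() or not self._queue.empty():
            try:
                record = self._queue.get(timeout=0.1)
            except queue.Empty:
                continue
            payload = msgpack.packb(record, use_bin_type=True)
            try:
                sock.send_multipart(
                    build_frames(self._topic, seq, payload), flags=zmq.NOBLOCK
                )
            except zmq.Again:
                pass  # receiver slow or absent: drop, tracing is best-effort
            seq += 1  # advance even on a drop so the receiver can see gaps
        sock.close()
```

Typical use would be `pub = ToolEventPublisher("tcp://127.0.0.1:20390"); pub.start()`, then `pub.publish(record)` from tool wrappers and `pub.close()` at exit. With `LINGER` set to 0, events still buffered in the socket at close may be lost, which matches the best-effort contract.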

Example tool_end:

    {
      "schema": "dynamo.agent.trace.v1",
      "event_type": "tool_end",
      "event_time_unix_ms": 1777312801500,
      "event_source": "harness",
      "agent_context": {
        "session_type_id": "deep_research",
        "session_id": "research-run-42",
        "trajectory_id": "research-run-42:researcher"
      },
      "tool": {
        "tool_call_id": "call-abc",
        "tool_class": "web_search",
        "status": "succeeded",
        "started_at_unix_ms": 1777312801080,
        "ended_at_unix_ms": 1777312801500,
        "duration_ms": 420.5
      }
    }

Optional tool keys: output_tokens, output_bytes, tool_name_hash, error_type. Status values: running, succeeded, error, cancelled; synonyms ok/success, failed, timeout/canceled also deserialize.
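A small sketch of a helper that assembles a terminal tool_end row in the shape of the example above. The helper name `make_tool_end` is hypothetical, and setting `event_time_unix_ms` to the end time simply mirrors the example record rather than a documented rule.

```python
OPTIONAL_TOOL_KEYS = {"output_tokens", "output_bytes", "tool_name_hash", "error_type"}


def make_tool_end(agent_context, tool_call_id, tool_class,
                  started_at_unix_ms, ended_at_unix_ms,
                  status="succeeded", **optional):
    """Build a tool_end record that carries its own timing fields."""
    unknown = set(optional) - OPTIONAL_TOOL_KEYS
    if unknown:
        raise ValueError(f"unsupported tool keys: {sorted(unknown)}")
    return {
        "schema": "dynamo.agent.trace.v1",
        "event_type": "tool_end",
        "event_time_unix_ms": ended_at_unix_ms,  # mirrors the example record
        "event_source": "harness",
        "agent_context": dict(agent_context),
        "tool": {
            "tool_call_id": tool_call_id,
            "tool_class": tool_class,
            "status": status,
            # Timing travels on the terminal row so it survives a dropped tool_start.
            "started_at_unix_ms": started_at_unix_ms,
            "ended_at_unix_ms": ended_at_unix_ms,
            "duration_ms": float(ended_at_unix_ms - started_at_unix_ms),
            **optional,
        },
    }


row = make_tool_end(
    {"session_type_id": "deep_research", "session_id": "research-run-42",
     "trajectory_id": "research-run-42:researcher"},
    "call-abc", "web_search", 1777312801080, 1777312801500, output_bytes=2048,
)
```

A harness with a sub-millisecond clock can pass a measured `duration_ms` instead of the difference of the two timestamps, as the 420.5 value in the example suggests.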

By default, input and output payloads are not saved. To view them, use Dynamo's built-in audit_sink functionality.

Audit side-by-side (same gzip/jsonl machinery):

    # enable agent trace sinks (the master switch is required for the other trace variables to take effect)
    export DYN_AGENT_TRACE=1
    export DYN_AGENT_TRACE_SINKS=jsonl_gz
    export DYN_AGENT_TRACE_OUTPUT_PATH=/tmp/dynamo-trace
    # enable audit sinks
    export DYN_AUDIT_SINKS=jsonl_gz
    export DYN_AUDIT_OUTPUT_PATH=/tmp/dynamo-audit
    export DYN_AUDIT_FORCE_LOGGING=true

After the run, correlate by id:

    gzip -cd /tmp/dynamo-audit.*.jsonl.gz | jq -c '.event' > /tmp/audit.jsonl
    gzip -cd /tmp/dynamo-trace.*.jsonl.gz | jq -c '.event' > /tmp/trace.jsonl
    jq -s 'group_by(.request_id // .request.request_id)' /tmp/audit.jsonl /tmp/trace.jsonl

Each line of a sink's JSONL output wraps the record in an envelope, which is why the commands above select .event:

    {
      "timestamp": 1234,
      "event": { "schema": "dynamo.agent.trace.v1", "...": "..." }
    }

timestamp is sink-relative elapsed ms; use event.event_time_unix_ms for wall-clock ordering.
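The same correlation can be sketched in Python for cases where jq is not convenient. This is an illustration that assumes only what the commands above imply: each gzip segment holds one JSON envelope per line, audit events carry `request_id` at the top level, and trace events carry it under `request`. The function names are hypothetical.

```python
import glob
import gzip
import json
from collections import defaultdict


def read_events(pattern):
    """Yield the unwrapped .event of every line in the matching gzip segments."""
    for path in sorted(glob.glob(pattern)):
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    yield json.loads(line)["event"]


def request_id_of(event):
    # Mirrors the jq expression: .request_id // .request.request_id
    return event.get("request_id") or (event.get("request") or {}).get("request_id")


def correlate(audit_pattern, trace_pattern):
    """Group audit and trace events by request id; rows without one are skipped."""
    grouped = defaultdict(lambda: {"audit": [], "trace": []})
    for side, pattern in (("audit", audit_pattern), ("trace", trace_pattern)):
        for event in read_events(pattern):
            rid = request_id_of(event)
            if rid is not None:
                grouped[rid][side].append(event)
    for sides in grouped.values():
        # Wall-clock ordering comes from the event, not the envelope timestamp.
        sides["trace"].sort(key=lambda e: e.get("event_time_unix_ms", 0))
    return dict(grouped)
```

For the paths used above, the call would be `correlate("/tmp/dynamo-audit.*.jsonl.gz", "/tmp/dynamo-trace.*.jsonl.gz")`. Harness tool rows have no request id and are dropped by this grouping; join those on session_id, trajectory_id, and tool_call_id instead.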

Viewing traces in Perfetto

To visualize and optimize your agent graph, use the provided utility to convert agent trace JSONL files into a Perfetto trace file. We have found this extremely useful for pipelining the agents our team writes.

    uv run --no-project python benchmarks/agent_trace/convert_to_perfetto.py \
      "${DYN_AGENT_TRACE_OUTPUT_PATH}".*.jsonl.gz \
      --output "${DYN_AGENT_TRACE_OUTPUT_PATH}.perfetto.json"

Open in Perfetto UI. Flags: --include-markers, --no-stages, --separate-stage-tracks.

Request slices include flattened finish metadata when present, such as finish.finish_reason, finish.backend_finish_reason, finish.stop_reason, finish.tool_call_count, finish.tool_call_names, and per-choice summaries like finish.choice_finish_reasons.

[Experimental] Replaying agent traces using agentic Mooncake replay

You can convert a collected agent trace into an agentic Mooncake trace and replay it with python -m dynamo.replay. The converter uses Dynamo request_end rows for request timing, token lengths, worker placement, and replay hashes. It also uses terminal harness tool rows (tool_end / tool_error) to preserve tool-wait time between dependent LLM requests.

Replay ignores non-replay request fields such as finish_reason_metadata; use the Perfetto view above when you want to inspect final finish reasons, backend stop signals, or complete tool-call metadata inside the trace.

    cargo run -p dynamo-bench --bin agent_trace_to_mooncake -- \
      --agentic \
      --input-path "${DYN_AGENT_TRACE_OUTPUT_PATH}".*.jsonl.gz \
      --output-file /tmp/dynamo-agent-trace.agentic-mooncake.jsonl

The binary prints trace_block_size. Use that exact value for replay so hash segmentation matches what Dynamo recorded. Align the mock engine block size with the same number in --extra-engine-args.

    # example value; substitute the trace_block_size printed by the converter
    TRACE_BLOCK_SIZE=128
    uv run --no-sync python -m dynamo.replay /tmp/dynamo-agent-trace.agentic-mooncake.jsonl \
      --trace-format agentic_mooncake \
      --trace-block-size "${TRACE_BLOCK_SIZE}" \
      --replay-mode offline \
      --router-mode kv_router \
      --num-workers 4 \
      --extra-engine-args "{\"block_size\":${TRACE_BLOCK_SIZE}}" \
      --report-json /tmp/dynamo-agent-trace.replay-report.json

kv_router needs at least two mock workers; for a single-worker smoke test use --router-mode round_robin --num-workers 1.

Agentic Mooncake rows preserve:

  • request_id: the LLM request row identity.
  • session_id: the Dynamo trajectory_id.
  • wait_for: request ids that must complete before this row becomes eligible.
  • branches: child request ids spawned from this row.
  • prefix_reset: first request in a trajectory.
  • delay: non-tool delay after dependencies finish.
  • tool_wait_ms: tool time after dependencies finish, parallel-aware (the union of overlapping spans rather than their sum).
  • tool_events: per-tool spans attributed to this LLM request, each carrying tool_call_id, tool_class, status, started_at_unix_ms, ended_at_unix_ms, duration_ms, and optional output_bytes / output_tokens / error_type.
  • hash_ids, input_length, and output_length: prompt-prefix and length data for mocker replay.
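To make the "union rather than sum" wording for tool_wait_ms concrete, here is a minimal interval-union sketch. It illustrates the idea only and is not the converter's code; the function name `union_ms` and the `(start_ms, end_ms)` tuple input are assumptions.

```python
def union_ms(spans):
    """Wall-clock time covered by at least one (start_ms, end_ms) span."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(spans):
        if cur_end is None or start > cur_end:
            # Gap before this span: close the previous merged interval.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlap: extend the current merged interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


# Two parallel tools, 0-100 ms and 50-150 ms, overlap by 50 ms.
parallel = union_ms([(0, 100), (50, 150)])
# Two sequential tools with a gap between them.
sequential = union_ms([(0, 10), (20, 30)])
```

Here the parallel pair counts as 150 ms of wall-clock wait, where a plain sum would report 200 ms.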

Rows with no wait_for use their timestamp as the replay start time. Rows with dependencies wait for all listed requests to complete, then wait delay + tool_wait_ms before dispatch. For more flags and engine settings, see DynoSim Runs.
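The eligibility rule above can be sketched as a small function. This is a reading of the paragraph, not the replay engine's code, and it assumes that the row's start time is stored under a `timestamp` key and that `delay`, like `tool_wait_ms`, is in milliseconds. The function name `dispatch_time_ms` is hypothetical.

```python
def dispatch_time_ms(row: dict, completed_at_ms: dict) -> float:
    """Earliest dispatch time of a row, given completion times of earlier rows."""
    deps = row.get("wait_for") or []
    if not deps:
        # No dependencies: the row's own timestamp is the replay start time.
        return row["timestamp"]
    # Every listed request must have completed before the row is eligible.
    ready = max(completed_at_ms[dep] for dep in deps)
    # ASSUMPTION: delay is in milliseconds, like tool_wait_ms.
    return ready + row.get("delay", 0) + row.get("tool_wait_ms", 0)


root = {"request_id": "a", "timestamp": 100}
child = {"request_id": "c", "wait_for": ["a", "b"], "delay": 5, "tool_wait_ms": 420}
completed = {"a": 1000, "b": 1200}

root_start = dispatch_time_ms(root, completed)
child_start = dispatch_time_ms(child, completed)
```

In this example the child becomes eligible when the slower dependency finishes at 1200 ms and is dispatched 425 ms later.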

ATIF alignment

Dynamo emits dynamo.agent.trace.v1, not full ATIF logs—but identifiers match ATIF / Harbor so you can join harness trajectories to Dynamo rows on session_id + trajectory_id. Dynamo omits conversational payload by design.

| Dynamo | Role |
| --- | --- |
| session_id | Shared run id |
| trajectory_id | Branch within run |
| parent_trajectory_id | Subagent link |
| session_type_id | Profile / workload type |
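A minimal sketch of the join described above, assuming harness trajectories have already been reduced to flat dicts that carry `session_id` and `trajectory_id`. That row shape and the function name `join_on_trajectory` are hypothetical; ATIF files would need their own loader.

```python
from collections import defaultdict


def join_on_trajectory(harness_rows, dynamo_rows):
    """Pair each harness trajectory with the Dynamo rows sharing its ids."""
    index = defaultdict(list)
    for row in dynamo_rows:
        ctx = row.get("agent_context") or {}
        index[(ctx.get("session_id"), ctx.get("trajectory_id"))].append(row)
    return [
        (h, index.get((h["session_id"], h["trajectory_id"]), []))
        for h in harness_rows
    ]


harness_rows = [
    {"session_id": "research-run-42", "trajectory_id": "research-run-42:researcher"},
    {"session_id": "research-run-42", "trajectory_id": "research-run-42:planner"},
]
dynamo_rows = [
    {"event_type": "request_end",
     "agent_context": {"session_id": "research-run-42",
                       "trajectory_id": "research-run-42:researcher"},
     "request": {"request_id": "r1"}},
]
joined = join_on_trajectory(harness_rows, dynamo_rows)
```

Trajectories with no matching Dynamo rows come back with an empty list, which is expected for best-effort profiling data.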