Agent Tracing

Attach trajectory identity and export Dynamo request and tool-event telemetry

Dynamo agent tracing writes serving-oriented trace records for agentic requests. Each LLM call carries passive nvext.agent_context identity (session and trajectory IDs) that Dynamo records alongside its own request metrics, plus optional harness-published tool lifecycle events. Agent context does not change routing, scheduling, or cache behavior; trace output is best-effort profiling data, not durable audit data.

Request Schema

Each harness LLM call should include nvext.agent_context:

```json
{
  "model": "my-model",
  "messages": [
    { "role": "user", "content": "Research Dynamo agent tracing." }
  ],
  "nvext": {
    "agent_context": {
      "session_type_id": "deep_research",
      "session_id": "research-run-42",
      "trajectory_id": "research-run-42:researcher",
      "parent_trajectory_id": "research-run-42:planner"
    }
  }
}
```
| Field | Required | Meaning |
| --- | --- | --- |
| session_type_id | Yes | Reusable workload/profile class, such as deep_research or coding_agent. |
| session_id | Yes | Top-level agent run/session identifier. |
| trajectory_id | Yes | One reasoning/tool trajectory within the agent run. |
| parent_trajectory_id | No | Parent trajectory for subagents. |

A single session_id can contain multiple parent and child trajectories. The field names align with the Agent Trajectory Interchange Format so harness trajectory files and Dynamo serving traces join without renaming; see the collapsed section at the bottom of this page for details.

OpenAI Client Integration

When using the OpenAI Python client, pass Dynamo’s extension fields through extra_body and set x-request-id through extra_headers:

```python
import uuid


def instrument_llm_request(kwargs, agent_context):
    body = dict(kwargs.get("extra_body") or {})
    nvext = dict(body.get("nvext") or {})
    nvext["agent_context"] = dict(agent_context)
    body["nvext"] = nvext

    headers = dict(kwargs.get("extra_headers") or {})
    headers.setdefault("x-request-id", str(uuid.uuid4()))

    out = dict(kwargs)
    out["extra_body"] = body
    out["extra_headers"] = headers
    return out
```

x-request-id is the harness’s logical LLM-call ID. Dynamo copies it into request.x_request_id; it is separate from Dynamo’s internal request ID.
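For example, the helper can wrap the kwargs of a chat completion call before they are passed to the client; the model name and message content here are placeholders, and the helper is repeated so the sketch is self-contained:

```python
import uuid


def instrument_llm_request(kwargs, agent_context):
    # Same helper as above, repeated so this example runs on its own.
    body = dict(kwargs.get("extra_body") or {})
    nvext = dict(body.get("nvext") or {})
    nvext["agent_context"] = dict(agent_context)
    body["nvext"] = nvext
    headers = dict(kwargs.get("extra_headers") or {})
    headers.setdefault("x-request-id", str(uuid.uuid4()))
    out = dict(kwargs)
    out["extra_body"] = body
    out["extra_headers"] = headers
    return out


agent_context = {
    "session_type_id": "deep_research",
    "session_id": "research-run-42",
    "trajectory_id": "research-run-42:researcher",
}

kwargs = instrument_llm_request(
    {"model": "my-model", "messages": [{"role": "user", "content": "hi"}]},
    agent_context,
)
# kwargs can now be passed to client.chat.completions.create(**kwargs)
```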

Harness Integration Pattern

An existing harness does not need to import Dynamo packages or link against Dynamo runtime APIs. Framework integrations should use this shape:

  • Add a small helper module that stores the current agent_context in a context variable.
  • Wrap each agent run with that context so LLM calls and tool records share the same session_id and trajectory_id.
  • Call one helper before each OpenAI-compatible LLM request to merge extra_body.nvext.agent_context and set x-request-id.
  • Propagate context through thread pools, subprocesses, and subagent launches when those paths can make LLM calls or emit tool records.
  • Include parent_trajectory_id when launching a subagent from a known parent trajectory.
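A minimal sketch of the context-variable helper described above, using Python's contextvars; the names current_agent_context and agent_run are illustrative, not Dynamo APIs:

```python
import contextvars
from contextlib import contextmanager

# Holds the agent_context for the current run; illustrative, not a Dynamo API.
current_agent_context = contextvars.ContextVar("agent_context", default=None)


@contextmanager
def agent_run(session_type_id, session_id, trajectory_id, parent_trajectory_id=None):
    ctx = {
        "session_type_id": session_type_id,
        "session_id": session_id,
        "trajectory_id": trajectory_id,
    }
    if parent_trajectory_id is not None:
        ctx["parent_trajectory_id"] = parent_trajectory_id
    token = current_agent_context.set(ctx)
    try:
        yield ctx
    finally:
        current_agent_context.reset(token)


# LLM and tool call sites read the shared context:
with agent_run("deep_research", "research-run-42", "research-run-42:researcher",
               parent_trajectory_id="research-run-42:planner"):
    ctx = current_agent_context.get()
```

Note that contextvars flows through asyncio tasks automatically, but thread pools and subprocesses need explicit propagation (for example via contextvars.copy_context() or by passing the dict), which is why the bullet about thread pools and subprocesses matters.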

Enable Trace Output

For most local profiling runs, use rotating compressed JSONL:

```shell
export DYN_AGENT_TRACE_SINKS=jsonl_gz
export DYN_AGENT_TRACE_OUTPUT_PATH=/tmp/dynamo-agent-trace
```

This writes files like:

```
/tmp/dynamo-agent-trace.000000.jsonl.gz
/tmp/dynamo-agent-trace.000001.jsonl.gz
```

To ingest harness tool events, configure the local ZMQ endpoint that Dynamo will bind. Harness processes connect to this endpoint as producers:

```shell
export DYN_AGENT_TRACE_TOOL_EVENTS_ZMQ_ENDPOINT=tcp://127.0.0.1:20390
```

Then start any Dynamo OpenAI-compatible backend.

Environment variable reference
| Environment Variable | Required | Default | Description |
| --- | --- | --- | --- |
| DYN_AGENT_TRACE_SINKS | Yes | unset | Enables local trace sinks. Supported values: jsonl, jsonl_gz, stderr, or a comma-separated list such as jsonl_gz,stderr. |
| DYN_AGENT_TRACE_OUTPUT_PATH | If jsonl or jsonl_gz is selected | unset | Local trace output path. For jsonl, this is the literal file path. For jsonl_gz, this is the segment prefix used to derive .jsonl.gz files. |
| DYN_AGENT_TRACE_CAPACITY | No | 1024 | In-process trace bus capacity. |
| DYN_AGENT_TRACE_JSONL_BUFFER_BYTES | No | 1048576 | JSONL writer buffer size. For jsonl_gz, this is the max uncompressed batch size before appending a complete gzip member. |
| DYN_AGENT_TRACE_JSONL_FLUSH_INTERVAL_MS | No | 1000 | JSONL periodic flush interval. For jsonl_gz, each flush appends a complete gzip member. |
| DYN_AGENT_TRACE_JSONL_GZ_ROLL_BYTES | No | 268435456 | jsonl_gz segment roll threshold in uncompressed bytes. |
| DYN_AGENT_TRACE_JSONL_GZ_ROLL_LINES | No | unset | Optional jsonl_gz segment roll threshold in records. |
| DYN_AGENT_TRACE_REPLAY_HASHES | No | enabled | Replay-oriented prompt block hashes are emitted by default in request records. Set to a falsey value such as 0, false, off, or no to disable them. Hashes use the model deployment card's KV cache block size. |
| DYN_AGENT_TRACE_TOOL_EVENTS_ZMQ_ENDPOINT | No | unset | Local ZMQ PULL endpoint that Dynamo binds for harness tool events. Setting this enables tool event ingestion. |
| DYN_AGENT_TRACE_TOOL_EVENTS_ZMQ_TOPIC | No | unset | Optional topic filter applied to the first ZMQ message frame. |

DYN_AGENT_TRACE_SINKS is the local output enable switch. Setting DYN_AGENT_TRACE_OUTPUT_PATH alone does not enable tracing. Setting only the ZMQ endpoint enables tool ingestion but does not create local files unless a sink is also configured.

Tool Events

Harnesses connect a long-lived local ZMQ PUSH socket and publish tool lifecycle records to the endpoint Dynamo binds. Dynamo accepts tool_start, tool_end, and tool_error records from the harness and writes them to the same trace stream as LLM request records.

The ZMQ wire format is:

```
[topic, seq_be_u64, msgpack(AgentTraceRecord)]
```

Use a bounded queue, a background publisher thread, monotonically increasing sequence numbers, and a PUSH socket with a high-water mark. Terminal tool records should be self-contained with started_at_unix_ms, ended_at_unix_ms, and duration_ms because queue pressure, process exits, or network failures can still drop earlier tool_start records. Keep tool_start for live/in-flight status, but do not require it to reconstruct completed spans.
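A dependency-free sketch of that publisher shape, with the bounded queue, background thread, and monotonically increasing sequence numbers from the guidance above. To keep the sketch runnable without pyzmq or msgpack, json stands in for msgpack and the socket send is injected as a callable; a real publisher would hand the frames to a PUSH socket's send_multipart with a high-water mark configured:

```python
import json
import queue
import struct
import threading


def make_frames(topic: bytes, seq: int, record: dict) -> list:
    # [topic, seq_be_u64, payload]; json stands in for msgpack in this sketch.
    return [topic, struct.pack(">Q", seq), json.dumps(record).encode()]


class ToolEventPublisher:
    """Bounded queue + background thread; drops records instead of blocking the harness."""

    def __init__(self, send, topic=b"", maxsize=1024):
        self._send = send          # e.g. a zmq PUSH socket's send_multipart
        self._topic = topic
        self._queue = queue.Queue(maxsize=maxsize)
        self._seq = 0
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def publish(self, record: dict) -> bool:
        try:
            self._queue.put_nowait(record)
            return True
        except queue.Full:
            return False  # best-effort: drop under queue pressure

    def _run(self):
        while True:
            record = self._queue.get()
            if record is None:
                return
            self._send(make_frames(self._topic, self._seq, record))
            self._seq += 1  # monotonically increasing per connection

    def close(self):
        self._queue.put(None)
        self._thread.join()
```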

Endpoint Ownership

Dynamo owns the shared ZMQ bind. Harnesses are producers and only connect.

This direction matters for production process trees. Agent frameworks often run tools, subagents, plugins, or model wrappers in child processes. If every process that loads a tracing integration tries to bind the same local endpoint, only one process succeeds and the others fail during startup. With Dynamo as the single collector bind and all harness processes connecting as PUSH producers, parent and child processes can emit their own tool records independently while preserving their own agent_context.trajectory_id and parent_trajectory_id.

```
Dynamo frontend
  -> ZMQ PULL bind -> trace bus -> sinks
parent harness process
  -> queued ZMQ PUSH connect -> Dynamo
child tool / subagent process
  -> queued ZMQ PUSH connect -> Dynamo
```

The record must include agent_context. Tool events should use the same session_type_id, session_id, and trajectory_id as the surrounding LLM calls; include parent_trajectory_id for subagent tools when it is available. Dynamo uses these fields to group request and tool records into the same session/trajectory lanes. Treat tool_call_id as unique within a trajectory, not globally unique; offline consumers should join tool records on session_id, trajectory_id, and tool_call_id.

```json
{
  "schema": "dynamo.agent.trace.v1",
  "event_type": "tool_end",
  "event_time_unix_ms": 1777312801500,
  "event_source": "harness",
  "agent_context": {
    "session_type_id": "deep_research",
    "session_id": "research-run-42",
    "trajectory_id": "research-run-42:researcher"
  },
  "tool": {
    "tool_call_id": "call-abc",
    "tool_class": "web_search",
    "status": "succeeded",
    "started_at_unix_ms": 1777312801080,
    "ended_at_unix_ms": 1777312801500,
    "duration_ms": 420.5
  }
}
```
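A small sketch of building such a self-contained terminal record on the harness side; make_tool_end is an illustrative helper, not a Dynamo API, and it derives duration_ms from the two timestamps:

```python
def make_tool_end(agent_context, tool_call_id, tool_class, status,
                  started_at_unix_ms, ended_at_unix_ms):
    # Terminal records carry start/end/duration so they stand alone even if
    # the earlier tool_start record was dropped under queue pressure.
    return {
        "schema": "dynamo.agent.trace.v1",
        "event_type": "tool_end",
        "event_time_unix_ms": ended_at_unix_ms,
        "event_source": "harness",
        "agent_context": dict(agent_context),
        "tool": {
            "tool_call_id": tool_call_id,
            "tool_class": tool_class,
            "status": status,
            "started_at_unix_ms": started_at_unix_ms,
            "ended_at_unix_ms": ended_at_unix_ms,
            "duration_ms": float(ended_at_unix_ms - started_at_unix_ms),
        },
    }


record = make_tool_end(
    {"session_type_id": "deep_research",
     "session_id": "research-run-42",
     "trajectory_id": "research-run-42:researcher"},
    tool_call_id="call-abc",
    tool_class="web_search",
    status="succeeded",
    started_at_unix_ms=1777312801080,
    ended_at_unix_ms=1777312801500,
)
```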

Inspect the Trace

Read compressed trace records directly:

```shell
gzip -cd "${DYN_AGENT_TRACE_OUTPUT_PATH}".*.jsonl.gz | jq .
```

Each line is a recorder envelope:

```json
{ "timestamp": 1234, "event": { "schema": "dynamo.agent.trace.v1" } }
```

Convert traces to Chrome Trace JSON for Perfetto UI:

```shell
uv run --no-project python benchmarks/agent_trace/convert_to_perfetto.py \
  "${DYN_AGENT_TRACE_OUTPUT_PATH}".*.jsonl.gz \
  --output "${DYN_AGENT_TRACE_OUTPUT_PATH}.perfetto.json"
```

Open ${DYN_AGENT_TRACE_OUTPUT_PATH}.perfetto.json in Perfetto UI. Each LLM request becomes a timeline slice grouped by session and trajectory lane. Tool terminal records become tool slices on adjacent tool tracks.

Useful converter flags:

| Flag | Meaning |
| --- | --- |
| --include-markers | Emit first-token instant markers. |
| --no-stages | Show request slices without prefill/decode stage slices. |
| --separate-stage-tracks | Place prefill/decode stages on adjacent tracks for debugging timeline nesting. |

Replay the Trace with Mocker

Request trace rows include text-free replay hashes by default. Convert a trace shard to Mooncake JSONL, then replay it through mocker:

```shell
cargo run -p dynamo-bench --bin agent_trace_to_mooncake -- \
  --input-path "${DYN_AGENT_TRACE_OUTPUT_PATH}".*.jsonl.gz \
  --output-file /tmp/dynamo-agent-trace.mooncake.jsonl
```

Use the trace_block_size printed by the converter when launching replay. For a multi-worker KV-router replay:

```shell
TRACE_BLOCK_SIZE=128
uv run --no-sync python -m dynamo.replay /tmp/dynamo-agent-trace.mooncake.jsonl \
  --trace-format mooncake \
  --trace-block-size "${TRACE_BLOCK_SIZE}" \
  --replay-mode offline \
  --router-mode kv_router \
  --num-workers 4 \
  --extra-engine-args "{\"block_size\":${TRACE_BLOCK_SIZE}}" \
  --report-json /tmp/dynamo-agent-trace.replay-report.json
```

kv_router requires more than one mock worker. For a single aggregated-worker smoke test, use --router-mode round_robin --num-workers 1.

Replay Scope and Follow-ups

What works today:

  • Per-request_end cumulative input-block hashes are emitted on agent traces by default.
  • Single-turn agent traces convert to Mooncake JSONL with absolute timestamps and compacted hash_ids.
  • Mocker replay reads these rows as wall-clock arrivals and simulates a cache pattern from the configured engine, router, and capacity model.
  • Concurrent LLM fan-out from the same trajectory_id is preserved as parallel arrivals because converted rows do not share a session_id.

On the roadmap:

  • Live KV cache movement is simulated by the mocker, not replayed byte-for-byte from the original run. Higher-fidelity replay would need an explicit replay event stream or sidecar rather than inferring writes in the converter.
  • Output token text/ids are not reconstructed. Replay only drives max_output_tokens; the original response text is not regenerated.
  • Causal tool and turn dependencies are not modeled in single-row Mooncake output. A request that depended on an earlier tool result is replayed by its absolute arrival time, not as “wait for the tool to finish”.
  • End-to-end re-run of an agent run is on the roadmap. Replay today is request-level; reconstructing tool decisions, agent control flow, or external tool effects is follow-up work.

Record Semantics

Dynamo emits request_end after the response stream completes or is dropped. Nullable fields are omitted when the serving path did not record them.

```json
{
  "schema": "dynamo.agent.trace.v1",
  "event_type": "request_end",
  "event_time_unix_ms": 1777312801000,
  "event_source": "dynamo",
  "agent_context": {
    "session_type_id": "deep_research",
    "session_id": "research-run-42",
    "trajectory_id": "research-run-42:researcher",
    "parent_trajectory_id": "research-run-42:planner"
  },
  "request": {
    "request_id": "dynamo-request-id",
    "x_request_id": "llm-call-42",
    "model": "my-model",
    "input_tokens": 128,
    "output_tokens": 16,
    "cached_tokens": 112,
    "request_received_ms": 1777312800000,
    "prefill_wait_time_ms": 12.1,
    "prefill_time_ms": 70.3,
    "ttft_ms": 82.4,
    "total_time_ms": 1000.1,
    "avg_itl_ms": 1.8,
    "kv_hit_rate": 0.875,
    "kv_transfer_estimated_latency_ms": 4.2,
    "queue_depth": 3,
    "worker": {
      "prefill_worker_id": 0,
      "prefill_dp_rank": 0,
      "decode_worker_id": 1,
      "decode_dp_rank": 0
    },
    "replay": {
      "trace_block_size": 64,
      "input_length": 128,
      "input_sequence_hashes": [14879255164371896291, 274632075616497421]
    }
  }
}
```

Request records capture Dynamo-owned serving metrics:

| Field | Meaning |
| --- | --- |
| request_id | Dynamo request ID for the LLM call. |
| x_request_id | Caller-provided logical request ID when present. |
| model | Requested model name. |
| input_tokens | Prompt/input token count when known. |
| output_tokens | Final output token count when known. |
| cached_tokens | Prompt tokens served from prefix/KV cache when known. |
| request_received_ms | Request receive time in Unix epoch milliseconds. |
| prefill_wait_time_ms | Time from request receipt to prefill start. |
| prefill_time_ms | Time from prefill start to first token. |
| ttft_ms | Time from request receipt to first token. |
| total_time_ms | Time from request receipt to request completion. |
| avg_itl_ms | Average inter-token latency after first token. |
| kv_hit_rate | Effective KV-cache hit rate observed by the router. |
| kv_transfer_estimated_latency_ms | Upper-bound estimated disaggregated KV transfer latency. |
| queue_depth | Router queue depth observed when routing the request. |
| worker | Prefill/decode worker IDs and DP ranks when recorded. |
| replay | Text-free replay metadata for Mooncake/mocker conversion. Emitted by default when agent tracing is enabled unless DYN_AGENT_TRACE_REPLAY_HASHES is falsey. Strict trace consumers must accept this optional object before enabling tracing. |
| replay.trace_block_size | KV cache block size from the model deployment card, used to derive replay hashes. |
| replay.input_length | Prompt/input token count represented by the replay hashes. |
| replay.input_sequence_hashes | Stable sequence-aware prompt block hashes. These are replay labels, not raw tokens and not compact Mooncake hash_ids. |

Trace records do not include prompt/response content, raw token IDs, sampling parameters, finish reason, or error status. Replay hashes expose prompt prefix reuse structure without storing the prompt text. Use the audit sink for request/response payload capture and OpenTelemetry export for span-based observability.

For local payload debugging, enable audit logging alongside agent tracing. Audit and agent trace share the same jsonl and jsonl_gz sink primitives, so both streams can be captured to disk in parallel:

```shell
export DYN_AGENT_TRACE_SINKS=jsonl_gz
export DYN_AGENT_TRACE_OUTPUT_PATH=/tmp/dynamo-trace
export DYN_AUDIT_SINKS=jsonl_gz
export DYN_AUDIT_OUTPUT_PATH=/tmp/dynamo-audit
export DYN_AUDIT_FORCE_LOGGING=true
```

Audit records include the raw OpenAI-compatible request, the final aggregated response, and any nvext.agent_context supplied by the harness. Join audit records to agent trace records by request_id when correlating payload text with replay hashes and timing metrics:

```shell
gzip -cd /tmp/dynamo-audit.*.jsonl.gz | jq -c '.event' > /tmp/audit.jsonl
gzip -cd /tmp/dynamo-trace.*.jsonl.gz | jq -c '.event' > /tmp/trace.jsonl
jq -s 'group_by(.request_id // .request.request_id)' \
  /tmp/audit.jsonl /tmp/trace.jsonl
```
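The same join can be done in Python. This sketch follows the record shapes shown on this page and assumes, as the jq expression does, that audit records carry request_id at the top level while trace records nest it under request.request_id:

```python
import glob
import gzip
import json


def load_events(pattern):
    # Each line is a recorder envelope: {"timestamp": ..., "event": {...}}
    for path in sorted(glob.glob(pattern)):
        with gzip.open(path, "rt") as f:
            for line in f:
                yield json.loads(line)["event"]


def join_by_request_id(audit_events, trace_events):
    # Audit records expose request_id at the top level; trace records nest it
    # under request.request_id.
    joined = {}
    for ev in audit_events:
        joined.setdefault(ev.get("request_id"), {})["audit"] = ev
    for ev in trace_events:
        rid = (ev.get("request") or {}).get("request_id")
        joined.setdefault(rid, {})["trace"] = ev
    return joined
```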

Audit also accepts stderr and nats sinks; DYN_AUDIT_SINKS takes a comma-separated list (for example jsonl_gz,nats).

Replay hashes describe the cumulative input presented to each LLM request. They do not by themselves declare cache movement, observed reuse, or that a prior decode stored a block in KV cache. Mooncake conversion maps these sequence hashes to compact per-file hash_ids and writes an absolute request-arrival timestamp on every converted row.

Replay/mocker treats rows with explicit per-turn timestamps as wall-clock arrivals, so LLM calls from the same trajectory_id can overlap when the original agent issued them concurrently. Rows that use delay instead keep closed-loop session behavior: the next turn waits for the previous turn to complete plus the delay. Replay/mocker then treats those rows as request reads and simulates KV writes/events from the configured engine, router, capacity, admission, and timing model. The simulated cache pattern is only as exact as those replay parameters.

Consistency Model

Trace output is best-effort profiling data, not durable audit data. Dynamo writes LLM request records and harness tool records into the same trace stream, but it does not commit them transactionally.

Delayed tool records are expected. Each normalized record carries event_time_unix_ms, and offline tools should order records by event time rather than by JSONL line order. The Perfetto converter does this before rendering request and tool slices.
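Offline ordering can be a plain sort on that field; a minimal sketch assuming the envelope format shown earlier:

```python
def order_by_event_time(envelopes):
    # Delayed tool records can appear after later LLM request records in the
    # JSONL stream; reorder by the event's own timestamp before analysis.
    return sorted(envelopes, key=lambda env: env["event"]["event_time_unix_ms"])


envelopes = [
    {"timestamp": 2, "event": {"event_type": "tool_end", "event_time_unix_ms": 150}},
    {"timestamp": 1, "event": {"event_type": "request_end", "event_time_unix_ms": 100}},
]
ordered = order_by_event_time(envelopes)
```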

The trace file does not prove completeness. Records can be absent if Dynamo exits before sink workers drain, if the trace bus or sink lags and drops records, or if the ZMQ/event-plane path drops a harness event.

Current Scope

  • Agent context is passive metadata.
  • Agent request trace emission is currently wired for /v1/chat/completions.
  • Supported sinks are jsonl, jsonl_gz, and stderr.
  • Tool events enter through the Dynamo-owned ZMQ relay.
  • Dynamo does not expose a separate direct event-plane ingress path for harness tool events.
  • Future scheduler/profiler consumers should read the normalized trace bus.

ATIF alignment

The Agent Trajectory Interchange Format (ATIF) is the JSON format maintained as the Harbor framework data schema for complete agent trajectories (user inputs, agent steps, tool calls, observations, subagents, rewards). Dynamo does not emit ATIF; it emits dynamo.agent.trace.v1, a serving-oriented trace covering request timing, tokens, cache, queue depth, and worker placement. The two formats are complementary and join cleanly because identifier names match:

| Dynamo field | ATIF role | Meaning |
| --- | --- | --- |
| session_id | session_id | Agent run identity. Multiple trajectories share one session. |
| trajectory_id | trajectory_id | One parent or child trajectory within the run. |
| parent_trajectory_id | subagent relationship metadata | Optional parent trajectory for subagents. |
| session_type_id | producer-specific metadata | Reusable workload/profile class. |

A harness ATIF file and Dynamo’s trace stream can be joined offline on session_id + trajectory_id without schema changes. Full ATIF reconstruction still requires harness trajectory data; Dynamo trace records intentionally omit prompt and response content.