Agent Simulation
Use agent simulation to replay collected agent trajectories against mock Dynamo workers. Start with Agent Tracing to collect request rows, then convert those rows into an agentic Mooncake trace for python -m dynamo.replay.
Collect a Trace
Enable request tracing while running the agent workload:
For tool timing fidelity, publish explicit tool events over the optional ZMQ ingress described in Agent Tracing. Without tool events, Dynamo can still infer tool-wait time from the gap between adjacent LLM requests in the same session.
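The gap-based fallback can be pictured with a small sketch. This is an illustration only, not Dynamo's implementation: the row shape and the `start_ms` / `end_ms` field names are hypothetical, chosen to show how the idle time between adjacent LLM requests in one session could be read as tool-wait time.

```python
from collections import defaultdict


def infer_tool_wait_ms(rows):
    """Return {request_id: inferred wait in ms before that request}.

    `rows` is a list of dicts with hypothetical keys:
    request_id, session_id, start_ms, end_ms.
    """
    by_session = defaultdict(list)
    for row in rows:
        by_session[row["session_id"]].append(row)

    waits = {}
    for session_rows in by_session.values():
        # Order requests within the session by when they started.
        session_rows.sort(key=lambda r: r["start_ms"])
        for prev, cur in zip(session_rows, session_rows[1:]):
            # Gap between the previous request ending and the next starting;
            # clamp at zero in case adjacent requests overlap.
            waits[cur["request_id"]] = max(0, cur["start_ms"] - prev["end_ms"])
    return waits


example_rows = [
    {"request_id": "r1", "session_id": "s1", "start_ms": 0, "end_ms": 400},
    {"request_id": "r2", "session_id": "s1", "start_ms": 1000, "end_ms": 1500},
    {"request_id": "r3", "session_id": "s2", "start_ms": 200, "end_ms": 300},
]
inferred = infer_tool_wait_ms(example_rows)
```

In this example only `r2` gets an inferred wait (600 ms), because `r1` and `r3` are each the first request in their session. A gap inferred this way cannot separate tool time from other idle time, which is why explicit tool events give better fidelity.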
Convert to Agentic Mooncake
Experimental. The converter uses Dynamo request_end rows for request timing, token lengths, worker placement, and replay hashes. It also uses terminal harness tool rows (tool_end / tool_error) to preserve tool-wait time between dependent LLM requests.
Replay ignores non-replay request fields such as finish_reason_metadata; use the Perfetto view in Agent Tracing when you want to inspect final finish reasons, backend stop signals, or complete tool-call metadata inside the trace.
Replay Offline
The binary prints trace_block_size. Pass that exact value to replay so that hash segmentation matches what Dynamo recorded, and set the mock engine block size to the same value in --extra-engine-args.
kv_router needs at least two mock workers. For a single-worker smoke test, use --router-mode round_robin --num-workers 1.
Agentic Row Semantics
Agentic Mooncake rows preserve:
- request_id: the LLM request row identity.
- Mooncake session_id: derived from the Dynamo session_id.
- wait_for: request IDs that must complete before this row becomes eligible.
- branches: child request IDs spawned from this row.
- prefix_reset: first request in a session.
- delay: non-tool delay after dependencies finish.
- tool_wait_ms: tool time after dependencies finish, parallel-aware as the union of overlapping spans rather than their sum.
- tool_events: per-tool spans attributed to this LLM request, each carrying tool_call_id, tool_class, status, started_at_unix_ms, ended_at_unix_ms, duration_ms, and optional output_bytes, output_tokens, or error_type.
- hash_ids, input_length, and output_length: prompt-prefix and length data for mocker replay.
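The "union of overlapping spans rather than their sum" rule for tool_wait_ms can be illustrated with a standard interval merge. This is a minimal sketch, not the converter's code: the function name `union_duration_ms` is hypothetical, and it assumes each span is a (started_at_unix_ms, ended_at_unix_ms) pair taken from tool_events.

```python
def union_duration_ms(spans):
    """Total time covered by at least one span, counting overlaps once.

    `spans` is an iterable of (started_at_unix_ms, ended_at_unix_ms) pairs.
    """
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(spans):
        if cur_end is None or start > cur_end:
            # Disjoint from the current merged span: close it and start anew.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlaps the current merged span: extend it if needed.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


# Two tools run in parallel (0-100 and 50-150), then a third runs alone.
parallel_spans = [(0, 100), (50, 150), (300, 400)]
union_ms = union_duration_ms(parallel_spans)
summed_ms = sum(end - start for start, end in parallel_spans)
```

Here `union_ms` is 250 while `summed_ms` is 300: the 50 ms during which the first two tools overlap is counted once, so parallel tool calls do not inflate the replayed wait.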
Rows with no wait_for use their timestamp as the replay start time. Rows with dependencies wait for all listed requests to complete, then wait delay + tool_wait_ms before dispatch. For more flags and engine settings, see DynoSim Runs.
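The dispatch rule for agentic rows can be sketched as a single function. This is an illustration under stated assumptions, not the replay engine's code: `dispatch_time_ms` and the `completed_at_ms` mapping are hypothetical names, and the sketch assumes timestamp, delay, and tool_wait_ms are all expressed in milliseconds.

```python
def dispatch_time_ms(row, completed_at_ms):
    """Return when `row` would be dispatched.

    `row` is a dict with the agentic fields described above.
    `completed_at_ms` maps request_id to that request's completion time
    (a hypothetical structure a replay loop would maintain).
    """
    deps = row.get("wait_for") or []
    if not deps:
        # No dependencies: the row's own timestamp is the replay start time.
        return row["timestamp"]
    # Wait for every listed dependency, then add the non-tool and tool delays.
    ready_at = max(completed_at_ms[dep] for dep in deps)
    return ready_at + row.get("delay", 0) + row.get("tool_wait_ms", 0)


root = {"request_id": "a", "timestamp": 1000}
child = {
    "request_id": "c",
    "timestamp": 0,
    "wait_for": ["a", "b"],
    "delay": 5,
    "tool_wait_ms": 250,
}
completed = {"a": 2000, "b": 2400}

root_dispatch = dispatch_time_ms(root, completed)
child_dispatch = dispatch_time_ms(child, completed)
```

With these inputs `root_dispatch` is 1000 and `child_dispatch` is 2655: the child waits for the slower dependency `b` (2400), then for delay plus tool_wait_ms (5 + 250).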