Agent Context and Tracing
Dynamo supports passive agent request tracing. An agent harness can attach
identity metadata to each LLM request, and Dynamo can write normalized
request_end records to configured trace sinks.
This is observability only. It does not change routing, scheduling, or cache behavior.
Request Metadata
Set nvext.agent_context on chat completion requests:
For per-call correlation, set the HTTP x-request-id header to the harness LLM
call ID:
x-request-id is not Dynamo’s internal inference request ID. It is copied into
the trace record as request.x_request_id.
Enabling Trace Output
Set DYN_AGENT_TRACE_SINKS before starting Dynamo. Use jsonl for local
trace files, jsonl_gz for rotating compressed trace segments, stderr for
development logging, or a comma-separated list:
Minimum setup for rotating compressed traces:
The jsonl sink writes one recorder JSON object per line:
{"timestamp": <elapsed_ms>, "event": <normalized trace event>}. The
jsonl_gz sink writes the same JSONL records into numbered compressed segments
derived from DYN_AGENT_TRACE_OUTPUT_PATH, such as
/tmp/dynamo-agent-trace.000000.jsonl.gz and
/tmp/dynamo-agent-trace.000001.jsonl.gz. Each flush appends a complete gzip
member, so standard gzip tools can read the concatenated stream. The stderr
sink logs the normalized trace event as a structured agent_trace log record.
All sinks are best-effort telemetry for debugging and offline profiling. They
are not durable audit logs.
ms-agent End-to-End Smoke
To see this in action, use a fork of the ModelScope ms-agent DeepResearch
agent framework with Dynamo trace hooks. Until those hooks land upstream, this
branch injects nvext.agent_context and x-request-id on LLM requests:
Start Dynamo with a local compressed trace sink:
Run ms-agent against Dynamo. Set a stable workflow ID if you want to grep or query one smoke run:
Read the resulting compressed trace records:
Expected records should contain event.event_type = "request_end",
event.agent_context.workflow_id matching DYNAMO_AGENT_WORKFLOW_ID, the
caller x_request_id, token counts, TTFT, average ITL, cache metrics, queue
depth, and worker IDs when available.
Perfetto Timeline Conversion
Convert Dynamo agent trace shards to Chrome Trace JSON for Perfetto UI:
Open /tmp/dynamo-agent-trace.perfetto.json in
Perfetto UI. Each LLM request becomes a timeline
slice grouped by workflow and program lane. The slice args include request IDs,
model, token counts, cache metrics, TTFT, average ITL, queue depth, and worker
IDs. By default, the converter stacks prefill wait, prefill, and decode slices
under each request when those timings are present. Add --include-markers to
emit first-token instant markers, --no-stages for a compact request-only
view, or --separate-stage-tracks to place stages on adjacent tracks when
debugging Perfetto nesting or label rendering. Stage slice boundaries are
normalized to avoid same-thread overlap caused by independent metric rounding;
raw timing fields remain available in event args.
Operator Notes
- Agent request trace emission is currently wired for
/v1/chat/completions. DYN_AGENT_TRACE_SINKSis the enable switch. SettingDYN_AGENT_TRACE_OUTPUT_PATHalone does not enable tracing.- The
jsonlsink appends to the configured path and does not rotate or enforce a maximum file size. Enable it for bounded debug/profiling runs, not as a long-running production sink. - The
jsonl_gzsink rotates compressed segments and is the preferred local file sink for long profiling or RL runs.
Request-End Record
Dynamo emits request_end after the response stream completes or is dropped.
Nullable fields are omitted when the serving path did not record them.
The request object captures Dynamo-owned request performance fields:
This trace does not include prompt/response content, sampling parameters, finish reason, error status, or OpenTelemetry/OpenInference attributes. Use the audit sink for request/response payload capture and OTEL export for span-based observability.
Current Scope
agent_contextis passive metadata.- Dynamo emits request-end trace records when agent tracing is enabled.
jsonl,jsonl_gz, andstderrare local debug/profiling sinks.- Trace records are best-effort profiling data, not durable audit records.
- Future scheduler/profiler consumers should read the normalized trace bus.