Agent Context and Tracing | NVIDIA Dynamo Documentation

Dynamo supports passive agent request tracing. An agent harness can attach identity metadata to each LLM request, and Dynamo can write normalized request_end records to configured trace sinks.

This is observability only. It does not change routing, scheduling, or cache behavior.

Request Metadata

Set nvext.agent_context on chat completion requests:

{
  "model": "my-model",
  "messages": [{"role": "user", "content": "Research Dynamo agent tracing."}],
  "nvext": {
    "agent_context": {
      "workflow_type_id": "deep_research",
      "workflow_id": "research-run-42",
      "program_id": "research-run-42:researcher",
      "parent_program_id": "research-run-42:planner"
    }
  }
}

For per-call correlation, set the HTTP x-request-id header to the harness LLM call ID:

x-request-id: llm-call-42

x-request-id is not Dynamo’s internal inference request ID. It is copied into the trace record as request.x_request_id.

Field	Required	Meaning
`workflow_type_id`	Yes	Reusable workload/profile class, such as `deep_research` or `coding_agent`.
`workflow_id`	Yes	Top-level run identifier.
`program_id`	Yes	One schedulable reasoning/tool trajectory.
`parent_program_id`	No	Parent program for subagents.
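As a rough illustration of how a harness might attach these fields, the sketch below builds a chat completion request with the Python standard library. The helper name `build_chat_request` and the `http://localhost:8000` base URL are assumptions for this example, not part of Dynamo; the body fields and the `x-request-id` header follow the request shown above.

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt, agent_context, llm_call_id):
    """Build (but do not send) a chat completion request with agent metadata.

    Hypothetical helper for illustration; only the payload shape and the
    x-request-id header come from the documentation above.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "nvext": {"agent_context": agent_context},
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Harness-side LLM call ID, copied into request.x_request_id.
            "x-request-id": llm_call_id,
        },
        method="POST",
    )


request = build_chat_request(
    base_url="http://localhost:8000",  # assumed local Dynamo frontend
    model="my-model",
    prompt="Research Dynamo agent tracing.",
    agent_context={
        "workflow_type_id": "deep_research",
        "workflow_id": "research-run-42",
        "program_id": "research-run-42:researcher",
        "parent_program_id": "research-run-42:planner",
    },
    llm_call_id="llm-call-42",
)
# To actually send it: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes it easy to check the metadata in a unit test before pointing the harness at a live server.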

Enabling Trace Output

Set DYN_AGENT_TRACE_SINKS before starting Dynamo. Use jsonl for local trace files, jsonl_gz for rotating compressed trace segments, stderr for development logging, or a comma-separated list:

$ export DYN_AGENT_TRACE_SINKS=jsonl_gz,stderr
$ export DYN_AGENT_TRACE_OUTPUT_PATH=/tmp/dynamo-agent-trace
$ export DYN_AGENT_TRACE_CAPACITY=1024

Minimum setup for rotating compressed traces:

$ export DYN_AGENT_TRACE_SINKS=jsonl_gz
$ export DYN_AGENT_TRACE_OUTPUT_PATH=/tmp/dynamo-agent-trace

Environment Variable	Required	Default	Description
`DYN_AGENT_TRACE_SINKS`	Yes	unset	Enables agent tracing and selects sinks. Supported values: `jsonl`, `jsonl_gz`, `stderr`, or a comma-separated list such as `jsonl_gz,stderr`.
`DYN_AGENT_TRACE_OUTPUT_PATH`	If `jsonl` or `jsonl_gz` is selected	unset	Local trace output path. For `jsonl`, this is the literal `.jsonl` file path. For `jsonl_gz`, this is the segment prefix used to derive `.jsonl.gz` files.
`DYN_AGENT_TRACE_CAPACITY`	No	`1024`	In-process trace bus capacity.
`DYN_AGENT_TRACE_JSONL_BUFFER_BYTES`	No	`1048576`	JSONL writer buffer size. For `jsonl_gz`, this is the max uncompressed batch size before appending a complete gzip member.
`DYN_AGENT_TRACE_JSONL_FLUSH_INTERVAL_MS`	No	`1000`	JSONL periodic flush interval. For `jsonl_gz`, each flush appends a complete gzip member.
`DYN_AGENT_TRACE_JSONL_GZ_ROLL_BYTES`	No	`268435456`	`jsonl_gz` segment roll threshold in uncompressed bytes.
`DYN_AGENT_TRACE_JSONL_GZ_ROLL_LINES`	No	unset	Optional `jsonl_gz` segment roll threshold in records.

The jsonl sink writes one recorder JSON object per line: {"timestamp": <elapsed_ms>, "event": <normalized trace event>}. The jsonl_gz sink writes the same JSONL records into numbered compressed segments derived from DYN_AGENT_TRACE_OUTPUT_PATH, such as /tmp/dynamo-agent-trace.000000.jsonl.gz and /tmp/dynamo-agent-trace.000001.jsonl.gz. Each flush appends a complete gzip member, so standard gzip tools can read the concatenated stream. The stderr sink logs the normalized trace event as a structured agent_trace log record. All sinks are best-effort telemetry for debugging and offline profiling. They are not durable audit logs.
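For offline analysis, the segments can also be read from Python. The following is a minimal sketch, assuming the `<prefix>.<six-digit index>.jsonl.gz` naming shown in the example paths above; the function name `iter_trace_records` is made up for this example. Python's `gzip` module reads concatenated gzip members as one stream, which matches the one-member-per-flush layout.

```python
import glob
import gzip
import json


def iter_trace_records(output_path_prefix):
    """Yield recorder objects ({"timestamp": ..., "event": ...}) from all segments.

    Assumes segment names follow the pattern shown in the docs, e.g.
    /tmp/dynamo-agent-trace.000000.jsonl.gz. Zero-padded indexes sort
    lexically in the same order as numerically.
    """
    pattern = glob.escape(output_path_prefix) + ".*.jsonl.gz"
    for segment in sorted(glob.glob(pattern)):
        # gzip.open transparently reads every gzip member in the file.
        with gzip.open(segment, "rt", encoding="utf-8") as lines:
            for line in lines:
                if line.strip():
                    yield json.loads(line)


# Example usage (prefix is the value of DYN_AGENT_TRACE_OUTPUT_PATH):
# for record in iter_trace_records("/tmp/dynamo-agent-trace"):
#     print(record["event"]["event_type"])
```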

ms-agent End-to-End Smoke

To see this in action, use a fork of the ModelScope ms-agent DeepResearch agent framework with Dynamo trace hooks. Until those hooks land upstream, this branch injects nvext.agent_context and x-request-id on LLM requests:

$ uv pip install -e "git+ssh://git@github.com/ishandhanani/ms-agent.git@idhanani/dynamo-agent-trace#egg=ms-agent"

Start Dynamo with a local compressed trace sink:

$ export DYN_AGENT_TRACE_SINKS=jsonl_gz
$ export DYN_AGENT_TRACE_OUTPUT_PATH=/tmp/dynamo-agent-trace
$ # Launch any Dynamo OpenAI-compatible backend on :8000.

Run ms-agent against Dynamo. Set a stable workflow ID if you want to grep or query one smoke run:

$ export DYNAMO_AGENT_WORKFLOW_ID=ms-agent-smoke-$(date +%Y%m%d-%H%M%S)
$ ms-agent run \
>   --config /path/to/agent.yaml \
>   --query "What is 2 + 2? Answer with just the number." \
>   --trust_remote_code true

Read the resulting compressed trace records:

$ gzip -cd /tmp/dynamo-agent-trace.*.jsonl.gz | jq .

Each record should contain event.event_type = "request_end", an event.agent_context.workflow_id that matches DYNAMO_AGENT_WORKFLOW_ID, the caller's x_request_id, token counts, TTFT, average ITL, cache metrics, queue depth, and, when available, worker IDs.
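The same check can be scripted. The sketch below filters already-parsed recorder objects down to the request_end events for one workflow; `select_workflow_events` is a hypothetical helper, and the sample records are hand-written stand-ins for real trace output, included only so the example runs on its own.

```python
def select_workflow_events(records, workflow_id):
    """Return request_end events whose agent_context.workflow_id matches."""
    selected = []
    for record in records:
        event = record.get("event", {})
        context = event.get("agent_context") or {}
        if (
            event.get("event_type") == "request_end"
            and context.get("workflow_id") == workflow_id
        ):
            selected.append(event)
    return selected


# Hand-written sample records (not real trace output).
sample_records = [
    {
        "timestamp": 10,
        "event": {
            "event_type": "request_end",
            "agent_context": {"workflow_id": "ms-agent-smoke-example"},
            "request": {"x_request_id": "llm-call-1"},
        },
    },
    {
        "timestamp": 20,
        "event": {
            "event_type": "request_end",
            "agent_context": {"workflow_id": "some-other-run"},
            "request": {"x_request_id": "llm-call-2"},
        },
    },
]

events = select_workflow_events(sample_records, "ms-agent-smoke-example")
call_ids = [event["request"].get("x_request_id") for event in events]
```

In a real run, `records` would come from the decompressed trace segments and the workflow ID from DYNAMO_AGENT_WORKFLOW_ID.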

Perfetto Timeline Conversion

Convert Dynamo agent trace shards to Chrome Trace JSON for Perfetto UI:

$ python3 benchmarks/agent_trace/convert_to_perfetto.py \
>   "/tmp/dynamo-agent-trace.*.jsonl.gz" \
>   --output /tmp/dynamo-agent-trace.perfetto.json

Open /tmp/dynamo-agent-trace.perfetto.json in Perfetto UI. Each LLM request becomes a timeline slice grouped by workflow and program lane. The slice args include request IDs, model, token counts, cache metrics, TTFT, average ITL, queue depth, and worker IDs. By default, the converter stacks prefill wait, prefill, and decode slices under each request when those timings are present. Add --include-markers to emit first-token instant markers, --no-stages for a compact request-only view, or --separate-stage-tracks to place stages on adjacent tracks when debugging Perfetto nesting or label rendering. Stage slice boundaries are normalized to avoid same-thread overlap caused by independent metric rounding; raw timing fields remain available in event args.
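To give a feel for the target format, here is a simplified sketch of how one request_end event could map to a Chrome Trace "complete" event (`"ph": "X"`, with `ts` and `dur` in microseconds). This is not the implementation of convert_to_perfetto.py; the function name, the integer `pid`/`tid` lane assignment, and the choice of slice name are assumptions made for illustration.

```python
import json


def to_complete_event(event, pid, tid):
    """Map one request_end event to a Chrome Trace complete ("X") event.

    Illustrative only: the real converter may name, group, and nest
    slices differently.
    """
    request = event["request"]
    return {
        "name": request.get("model", "llm_request"),
        "ph": "X",
        # Chrome Trace timestamps and durations are in microseconds.
        "ts": round(request["request_received_ms"] * 1000),
        "dur": round(request["total_time_ms"] * 1000),
        "pid": pid,  # e.g. one process lane per workflow
        "tid": tid,  # e.g. one thread lane per program
        "args": {
            "x_request_id": request.get("x_request_id"),
            "workflow_id": event.get("agent_context", {}).get("workflow_id"),
        },
    }


# Values taken from the sample request_end record in this document.
sample_event = {
    "agent_context": {"workflow_id": "research-run-42"},
    "request": {
        "x_request_id": "llm-call-42",
        "model": "my-model",
        "request_received_ms": 1777312800000,
        "total_time_ms": 1000.1,
    },
}

slice_event = to_complete_event(sample_event, pid=1, tid=1)
trace_json = json.dumps({"traceEvents": [slice_event]})
```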

Operator Notes

Agent request trace emission is currently wired for /v1/chat/completions.
DYN_AGENT_TRACE_SINKS is the enable switch. Setting DYN_AGENT_TRACE_OUTPUT_PATH alone does not enable tracing.
The jsonl sink appends to the configured path and does not rotate or enforce a maximum file size. Enable it for bounded debug/profiling runs, not as a long-running production sink.
The jsonl_gz sink rotates compressed segments and is the preferred local file sink for long profiling or RL runs.
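The enable-switch rules above lend themselves to a small preflight check before launching Dynamo. The following is a minimal sketch; `check_trace_env` is a hypothetical helper, and how Dynamo itself treats whitespace or unknown sink names is not specified here, so the trimming and "unsupported" checks are assumptions.

```python
import os

FILE_SINKS = {"jsonl", "jsonl_gz"}
SUPPORTED_SINKS = FILE_SINKS | {"stderr"}


def check_trace_env(env):
    """Return a list of human-readable problems with the tracing variables."""
    problems = []
    raw = env.get("DYN_AGENT_TRACE_SINKS", "")
    sinks = [name.strip() for name in raw.split(",") if name.strip()]
    if not sinks:
        # The output path alone does not enable tracing.
        problems.append("DYN_AGENT_TRACE_SINKS is unset, so tracing is disabled")
        return problems
    unknown = sorted(set(sinks) - SUPPORTED_SINKS)
    if unknown:
        problems.append("unsupported sinks: " + ", ".join(unknown))
    if FILE_SINKS & set(sinks) and not env.get("DYN_AGENT_TRACE_OUTPUT_PATH"):
        problems.append("DYN_AGENT_TRACE_OUTPUT_PATH is required for jsonl and jsonl_gz")
    return problems


# Check the current process environment.
current_problems = check_trace_env(os.environ)
```

Passing the environment in as a mapping keeps the function easy to test with plain dictionaries.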

Request-End Record

Dynamo emits request_end after the response stream completes or is dropped. Nullable fields are omitted when the serving path did not record them.

{
  "schema": "dynamo.agent.trace.v1",
  "event_type": "request_end",
  "event_time_unix_ms": 1777312801000,
  "event_source": "dynamo",
  "agent_context": {
    "workflow_type_id": "deep_research",
    "workflow_id": "research-run-42",
    "program_id": "research-run-42:researcher",
    "parent_program_id": "research-run-42:planner"
  },
  "request": {
    "request_id": "dynamo-request-id",
    "x_request_id": "llm-call-42",
    "model": "my-model",
    "input_tokens": 4096,
    "output_tokens": 512,
    "cached_tokens": 3584,
    "request_received_ms": 1777312800000,
    "prefill_wait_time_ms": 12.1,
    "prefill_time_ms": 70.3,
    "ttft_ms": 82.4,
    "total_time_ms": 1000.1,
    "avg_itl_ms": 1.8,
    "kv_hit_rate": 0.875,
    "kv_transfer_estimated_latency_ms": 4.2,
    "queue_depth": 3,
    "worker": {
      "prefill_worker_id": 0,
      "prefill_dp_rank": 0,
      "decode_worker_id": 1,
      "decode_dp_rank": 0
    }
  }
}

The request object captures Dynamo-owned request performance fields:

Field	Meaning
`request_id`	Dynamo request ID for the LLM call.
`x_request_id`	Caller-provided logical request ID when present.
`model`	Requested model name.
`input_tokens`	Prompt/input token count when known.
`output_tokens`	Final output token count when known.
`cached_tokens`	Prompt tokens served from prefix/KV cache when known.
`request_received_ms`	Request receive time in Unix epoch milliseconds.
`prefill_wait_time_ms`	Time from request receipt to prefill start.
`prefill_time_ms`	Time from prefill start to first token.
`ttft_ms`	Time from request receipt to first token.
`total_time_ms`	Time from request receipt to request completion.
`avg_itl_ms`	Average inter-token latency after first token.
`kv_hit_rate`	Effective KV-cache hit rate observed by the router.
`kv_transfer_estimated_latency_ms`	Upper-bound estimated disaggregated KV transfer latency.
`queue_depth`	Router queue depth observed when routing the request.
`worker`	Prefill/decode worker IDs and DP ranks when recorded.
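As a sanity check on how these fields relate, the sketch below works through the sample record's numbers. The relationships used (TTFT as prefill wait plus prefill time, hit rate as cached over input tokens, average ITL as decode time over the number of inter-token gaps) are consistent with the sample values and the field descriptions, but they are assumptions for illustration rather than documented definitions; real records may differ, for example through independent rounding.

```python
import math

# Values copied from the sample request_end record.
request = {
    "input_tokens": 4096,
    "output_tokens": 512,
    "cached_tokens": 3584,
    "prefill_wait_time_ms": 12.1,
    "prefill_time_ms": 70.3,
    "ttft_ms": 82.4,
    "total_time_ms": 1000.1,
    "avg_itl_ms": 1.8,
    "kv_hit_rate": 0.875,
}

# Fraction of prompt tokens served from cache: 3584 / 4096.
hit_rate = request["cached_tokens"] / request["input_tokens"]

# Receipt -> prefill start -> first token: 12.1 + 70.3.
ttft_from_stages = request["prefill_wait_time_ms"] + request["prefill_time_ms"]

# Time spent after the first token, spread over the remaining token gaps.
decode_time_ms = request["total_time_ms"] - request["ttft_ms"]
approx_itl_ms = decode_time_ms / (request["output_tokens"] - 1)

print(f"hit_rate={hit_rate:.3f}")
print(f"ttft_from_stages={ttft_from_stages:.1f} ms")
print(f"approx_itl={approx_itl_ms:.1f} ms")
```

Checks like these can help flag records where a timing field is missing or was recorded by a different serving path.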

This trace does not include prompt/response content, sampling parameters, finish reason, error status, or OpenTelemetry/OpenInference attributes. Use the audit sink for request/response payload capture and OTEL export for span-based observability.

Current Scope

agent_context is passive metadata.
Dynamo emits request-end trace records when agent tracing is enabled.
jsonl, jsonl_gz, and stderr are local debug/profiling sinks.
Trace records are best-effort profiling data, not durable audit records.
Future scheduler/profiler consumers should read the normalized trace bus.