This guide explains the HTTP request lifecycle tracing metrics available in AIPerf, which provide granular timing information at the transport layer for performance analysis and debugging.
AIPerf captures detailed timing information throughout the HTTP request lifecycle using the aiohttp tracing system. These metrics follow industry-standard conventions from k6 load testing and the HAR (HTTP Archive) specification, making them familiar and compatible with existing performance analysis tools.
Key characteristics:
- Measurement uses time.perf_counter_ns() for nanosecond precision
- Exported timestamps are converted to wall-clock time (time.time_ns()) for correlation with logs and external systems
- Metric names use the http_req_ prefix to match k6’s metric naming
Enabling trace timing output:
To display HTTP trace timing metrics in the console output, use the --show-trace-timing flag:
This displays a separate table with the HTTP trace timing breakdown after the main metrics table.
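The two clocks named under the key characteristics serve different purposes: a monotonic counter is suited to measuring intervals, while a wall-clock reading is what lines up with logs. The following is a minimal sketch of one common way to bridge them, assuming a single offset captured once per process. It is an illustration only, not AIPerf's actual implementation, and the names ClockAnchor and to_wall_ns are hypothetical.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockAnchor:
    """Pairs one wall-clock reading with one monotonic reading (hypothetical helper)."""

    wall_ns: int
    perf_ns: int

    @classmethod
    def capture(cls) -> "ClockAnchor":
        # Read both clocks back to back so the offset between them is small.
        return cls(wall_ns=time.time_ns(), perf_ns=time.perf_counter_ns())

    def to_wall_ns(self, perf_ns: int) -> int:
        """Translate a perf_counter_ns() reading into nanoseconds since the epoch."""
        return self.wall_ns + (perf_ns - self.perf_ns)


anchor = ClockAnchor.capture()

# Measure an interval with the monotonic clock only.
start = time.perf_counter_ns()
time.sleep(0.01)
end = time.perf_counter_ns()
duration_ns = end - start

# Convert the endpoints to wall-clock time only when exporting.
start_wall_ns = anchor.to_wall_ns(start)
end_wall_ns = anchor.to_wall_ns(end)
```

Because both endpoints are shifted by the same offset, the exported timestamps preserve the measured duration exactly, even if the system clock is adjusted during the run.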
The HTTP request lifecycle breaks down into distinct phases, each measured independently:
These metrics capture the time spent establishing a connection before the HTTP request can be sent. They are specific to the aiohttp HTTP client.
These core timing metrics measure the actual HTTP request and response transfer. They are available for any HTTP client that populates the base trace data model.
The http_req_duration metric is measured directly from timestamps for maximum accuracy:
http_req_duration = response_receive_end - request_send_start
This measures from when the request started being sent to when the response was fully received/finalized. Conceptually this covers sending + waiting + receiving, but the direct measurement is more accurate than summing components.
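As a small sketch of the direct measurement, assume a trace exposes request_send_start and response_receive_end as nanosecond timestamps (the two names used in this guide). The function name, the conversion to milliseconds, and the sample values are illustrative assumptions.

```python
NANOS_PER_MILLI = 1_000_000


def http_req_duration_ms(request_send_start_ns: int, response_receive_end_ns: int) -> float:
    """Return the end-to-end request/response time in milliseconds."""
    if response_receive_end_ns < request_send_start_ns:
        raise ValueError("response_receive_end precedes request_send_start")
    return (response_receive_end_ns - request_send_start_ns) / NANOS_PER_MILLI


# Made-up sample: the response finishes 152.4 ms after sending starts.
sample_start_ns = 1_700_000_000_000_000_000
sample_end_ns = sample_start_ns + 152_400_000
duration_ms = http_req_duration_ms(sample_start_ns, sample_end_ns)
```

A single subtraction of two timestamps avoids accumulating the small gaps that can appear between separately measured phases.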
Connection overhead combines all pre-request setup time:
The http_req_total metric sums all 6 timing phases for a reconcilable breakdown:
http_req_total and http_req_duration may differ slightly because:
- http_req_duration is a single end-to-end measurement (response_receive_end - request_send_start)
- http_req_total sums 6 individual phase measurements, which may have small unmeasured gaps between phases
Use http_req_total when you need the breakdown to add up exactly. Use http_req_duration when you want the most accurate single measurement of request/response exchange time.
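The relationship between the summed and the directly measured values can be sketched with a few lines of arithmetic. This assumes the six phases are blocked, dns_lookup, connecting, sending, waiting, and receiving, and that connection overhead is the sum of the first three. Both are inferences from the metric names in this guide rather than confirmed definitions, and all numbers are made up.

```python
# Assumed phase names (inferred from the http_req_* metric names in this guide).
CONNECTION_PHASES = ("blocked", "dns_lookup", "connecting")
TRANSFER_PHASES = ("sending", "waiting", "receiving")
ALL_PHASES = CONNECTION_PHASES + TRANSFER_PHASES


def sum_phases(timings_ns: dict, phases: tuple) -> int:
    """Sum the named phases, treating a missing phase as 0 ns."""
    return sum(timings_ns.get(phase, 0) for phase in phases)


# Made-up per-phase durations for one request, in nanoseconds.
timings_ns = {
    "blocked": 200_000,
    "dns_lookup": 1_500_000,
    "connecting": 12_000_000,
    "sending": 300_000,
    "waiting": 95_000_000,
    "receiving": 40_000_000,
}
# Made-up direct measurement: response_receive_end - request_send_start.
measured_duration_ns = 135_450_000

connection_overhead_ns = sum_phases(timings_ns, CONNECTION_PHASES)
http_req_total_ns = sum_phases(timings_ns, ALL_PHASES)
transfer_sum_ns = sum_phases(timings_ns, TRANSFER_PHASES)

# Time inside the direct measurement that no individual phase accounts for.
unmeasured_gap_ns = measured_duration_ns - transfer_sum_ns
```

In this sample the summed breakdown reconciles exactly by construction, while the direct measurement includes 150,000 ns that falls between phases.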
TTFB vs TTFT: http_req_waiting measures Time to First Byte (specifically, the first body byte after headers), not Time to First Token. The server sends HTTP headers first, then body content. For LLM APIs, the first body byte may contain protocol overhead before actual tokens appear. Use the time_to_first_token metric for LLM-specific timing that measures when the first actual token content is received.
Connection reuse: When http_req_connection_reused = 1, both http_req_dns_lookup and http_req_connecting will be 0 since no new connection was established.
You may notice that http_req_total can be larger than request_latency. This is expected behavior — the two metrics measure different things:
Why http_req_total > request_latency:
For streaming LLM responses (SSE), the HTTP stream typically ends with:
The request_latency metric excludes trailing metadata ([DONE] markers, usage statistics) because those don’t represent meaningful content delivery. The HTTP trace metrics include all network traffic.
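To make the difference concrete, the sketch below walks a made-up SSE stream and compares the time of the last content-bearing event with the time the stream actually ends. It assumes OpenAI-style chunks in which a usage-only chunk has an empty choices list. The event layout and the is_trailing_metadata helper are hypothetical and are not taken from AIPerf.

```python
import json


def is_trailing_metadata(payload: str) -> bool:
    """Return True for events that carry no token content (assumed chunk format)."""
    if payload.strip() == "[DONE]":
        return True
    # Assumption: a usage-only chunk has an empty or missing "choices" list.
    return not json.loads(payload).get("choices")


# Made-up stream: (arrival time in ns relative to request start, SSE data payload).
events = [
    (10_000_000, '{"choices": [{"delta": {"content": "Hel"}}]}'),
    (20_000_000, '{"choices": [{"delta": {"content": "lo"}}]}'),
    (21_000_000, '{"choices": [], "usage": {"total_tokens": 7}}'),
    (22_000_000, "[DONE]"),
]

last_content_ns = max(ts for ts, payload in events if not is_trailing_metadata(payload))
stream_end_ns = events[-1][0]

# Time spent on trailing metadata after the last content-bearing event.
trailing_ns = stream_end_ns - last_content_ns
```

Here the HTTP-level view ends 2 ms after the last content event, which is the kind of gap that can make http_req_total exceed request_latency.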
Which metric should I use?
By default, raw HTTP trace data is not included in profile_export.jsonl to keep file sizes small. The computed metrics (http_req_duration, http_req_waiting, etc.) are always available regardless of this setting.
To include the full trace data (timestamps, chunks, headers, socket info), use the --export-http-trace flag:
The --export-http-trace flag works with records or raw export levels:
Example with both flags:
When exported to profile_export.jsonl, trace data uses wall-clock timestamps (nanoseconds since epoch) for cross-system correlation. The trace data is included in each record:
Computed duration fields (blocked_ns, dns_lookup_ns, connecting_ns) are omitted from trace_data when the underlying event did not occur. The corresponding metrics (e.g., http_req_blocked) will report 0 for aggregation purposes, but the trace field itself is absent.
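When post-processing the export, it helps to separate "the event did not occur" from "the event took 0 ns". The sketch below shows one way to read the optional fields, assuming each JSONL record carries a top-level trace_data object. That key placement, the helper names, and the sample record are assumptions for illustration.

```python
import json

# Optional computed durations named in this guide.
OPTIONAL_DURATIONS = ("blocked_ns", "dns_lookup_ns", "connecting_ns")


def event_occurred(trace_data: dict, field: str) -> bool:
    """True when the field is present and not null in the exported trace."""
    return trace_data.get(field) is not None


def duration_or_zero(trace_data: dict, field: str) -> int:
    """Mirror the metric behaviour: report 0 when the field is absent or null."""
    value = trace_data.get(field)
    return 0 if value is None else value


# Made-up record: only blocked_ns was recorded, e.g. on a reused connection.
line = '{"trace_data": {"blocked_ns": 120000}}'
trace_data = json.loads(line).get("trace_data", {})

durations = {name: duration_or_zero(trace_data, name) for name in OPTIONAL_DURATIONS}
occurred = {name: event_occurred(trace_data, name) for name in OPTIONAL_DURATIONS}
```

Keeping the presence check separate from the zero-filled value lets aggregation use 0 while per-request analysis can still tell that no DNS lookup or connection setup took place.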
The trace_data object contains both raw timestamps and computed durations:
Raw Timestamps (wall-clock nanoseconds):
Chunk Aggregates (always available):
Chunk Data (only with --export-http-trace, transport-layer granularity):
Computed Durations (nanoseconds):
Request/Response Metadata:
Connection Info (aiohttp only):
If http_req_blocked is consistently high, requests are waiting for a free connection, which usually means the connection pool is exhausted. Consider:
If http_req_dns_lookup is high:
http_req_waiting (TTFB) isolates server-side latency:
- Low sending + High waiting = Server is the bottleneck
- High receiving = Large response or slow network throughput
Track http_req_connection_reused aggregated values:
- Values near 1.0 (100% reuse) indicate efficient keep-alive usage
When --export-http-trace is enabled, the request_chunks and response_chunks arrays provide transport-layer granularity useful for:
Aggregate fields (request_chunks_count, request_bytes_total, response_chunks_count, response_bytes_total) are always available regardless of the export flag.
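The troubleshooting heuristics above (waiting-dominated versus receiving-dominated requests, and the connection reuse ratio) can be sketched as a small triage script. The 0.7 dominance threshold, the record layout, and the labels are arbitrary assumptions chosen for illustration and are not AIPerf defaults.

```python
from statistics import mean


def classify(sending_ms: float, waiting_ms: float, receiving_ms: float,
             dominance: float = 0.7) -> str:
    """Label a request by which transfer phase dominates (threshold is an assumption)."""
    total = sending_ms + waiting_ms + receiving_ms
    if total <= 0:
        return "unknown"
    if waiting_ms / total >= dominance:
        return "server-bound"      # low sending + high waiting
    if receiving_ms / total >= dominance:
        return "transfer-bound"    # high receiving
    return "mixed"


# Made-up per-request values in milliseconds; "reused" is 1 or 0.
records = [
    {"sending": 0.3, "waiting": 95.0, "receiving": 4.0, "reused": 1},
    {"sending": 0.2, "waiting": 10.0, "receiving": 80.0, "reused": 1},
    {"sending": 0.4, "waiting": 30.0, "receiving": 25.0, "reused": 0},
]

labels = [classify(r["sending"], r["waiting"], r["receiving"]) for r in records]

# Mean of the 0/1 flag gives the fraction of requests that reused a connection.
reuse_ratio = mean(r["reused"] for r in records)
```

A reuse ratio well below 1.0, as in this sample, would suggest checking keep-alive behaviour before drawing conclusions from the per-phase labels.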
AIPerf trace metrics align with industry standards for compatibility with existing tools:
Per the HAR 1.2 specification:
- blocked, dns, connect use -1 when not applicable (AIPerf uses 0 or null)
- send, wait, receive are required non-negative values
- time (duration) equals the sum of all applicable timing phases
- ssl timing is included within connect for backwards compatibility
The k6 http_req_tls_handshaking metric is not separated in AIPerf. TLS time is combined with TCP connection time in http_req_connecting because aiohttp’s tracing API provides a combined measurement via on_connection_create_start/end events.
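For feeding AIPerf traces into HAR-based tools, the conventions listed above suggest a straightforward mapping. The sketch below converts nanosecond durations into a HAR 1.2 timings object in milliseconds. The field names blocked_ns, dns_lookup_ns, and connecting_ns come from this guide, while sending_ns, waiting_ns, and receiving_ns are assumed by analogy, and to_har_timings is a hypothetical helper.

```python
NS_PER_MS = 1_000_000


def to_har_timings(trace: dict) -> dict:
    """Map nanosecond durations to a HAR 1.2 timings object (milliseconds)."""

    def optional(field: str) -> float:
        # HAR uses -1 for phases that do not apply; treat absent or null as such.
        value = trace.get(field)
        return -1 if value is None else value / NS_PER_MS

    def required(field: str) -> float:
        # send, wait, and receive must always be present and non-negative.
        return trace[field] / NS_PER_MS

    return {
        "blocked": optional("blocked_ns"),
        "dns": optional("dns_lookup_ns"),
        "connect": optional("connecting_ns"),
        "send": required("sending_ns"),
        "wait": required("waiting_ns"),
        "receive": required("receiving_ns"),
        # TLS time is not separated from connect, so ssl is reported as not applicable.
        "ssl": -1,
    }


def har_time(timings: dict) -> float:
    """HAR 'time': sum of applicable phases, skipping -1 and ssl (already in connect)."""
    return sum(v for k, v in timings.items() if k != "ssl" and v >= 0)


# Made-up trace: blocked_ns is absent and dns_lookup_ns is null.
trace = {
    "dns_lookup_ns": None,
    "connecting_ns": 12_000_000,
    "sending_ns": 300_000,
    "waiting_ns": 95_000_000,
    "receiving_ns": 40_000_000,
}
timings = to_har_timings(trace)
total_ms = har_time(timings)
```

A phase that AIPerf reports as 0 for aggregation cannot be told apart from a genuinely instantaneous one, so this mapping only emits -1 when the field is absent or null.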