HTTP Trace Metrics Guide

View as Markdown

This guide explains the HTTP request lifecycle tracing metrics available in AIPerf, which provide granular timing information at the transport layer for performance analysis and debugging.

Overview

AIPerf captures detailed timing information throughout the HTTP request lifecycle using the aiohttp tracing system. These metrics follow industry-standard conventions from k6 load testing and the HAR (HTTP Archive) specification, making them familiar and compatible with existing performance analysis tools.

Key characteristics:

  • Trace metrics are captured using time.perf_counter_ns() for nanosecond precision during measurement
  • When exported, timestamps are converted to wall-clock time (time.time_ns()) for correlation with logs and external systems
  • The naming convention uses the http_req_ prefix to match k6’s metric naming

Enabling trace timing output:

To display HTTP trace timing metrics in the console output, use the --show-trace-timing flag:

$aiperf profile ... --show-trace-timing

This displays a separate table with the HTTP trace timing breakdown after the main metrics table.

Request Lifecycle

The HTTP request lifecycle breaks down into distinct phases, each measured independently:

Request Lifecycle ────────────────────────────────────────────────────────────────────────────────────►
│ │ │ │ │ │
│◄─dns_lookup_ns─►│◄─connecting_ns─►│◄─ sending_ns ─►│◄──── waiting_ns ──►│◄──── receiving_ns ───►│
│ │ │ │ | │ │
dns resolution TCP+TLS request send request_send_end first body chunk last body chunk
(cache miss) handshake (last chunk) (ready for server) (response starts) (response complete)
│ │ │
└─ dns_cache_hit (skip lookup) └── response_headers_received
└─ connection_reused_ns (skip TCP/TLS)

Metric Reference

Connection Phase Metrics

These metrics capture the time spent establishing a connection before the HTTP request can be sent. They are specific to the aiohttp HTTP client.

Metrick6 EquivalentHAR EquivalentDescription
http_req_blockedhttp_req_blockedblockedTime spent waiting for a free connection slot from the connection pool. High values indicate pool saturation.
http_req_dns_lookuphttp_req_looking_updnsTime spent on DNS resolution. Returns 0 if DNS was cached or connection was reused.
http_req_connectinghttp_req_connectingconnectTime to establish TCP connection. For HTTPS, this includes TLS handshake time. Returns 0 if connection was reused.
http_req_connection_reusedBinary indicator (0 or 1) showing whether an existing connection was reused from the pool.
http_req_connection_overheadCombined overhead: blocked + dns_lookup + connecting. Total pre-request setup cost.

Request/Response Phase Metrics

These core timing metrics measure the actual HTTP request and response transfer. They are available for any HTTP client that populates the base trace data model.

Metrick6 EquivalentHAR EquivalentDescription
http_req_sendinghttp_req_sendingsendTime to transmit the complete HTTP request (headers + body) to the server.
http_req_waitinghttp_req_waitingwaitTime to First Byte (TTFB) — time from request completion to first response body byte. Represents server processing time + network latency. Note: This measures time to the first body chunk, not the first header byte.
http_req_receivinghttp_req_receivingreceiveTime to download the complete response body. Returns 0 for single-chunk responses.
http_req_durationhttp_req_durationtimeRequest/response exchange time (excluding connection overhead): sending + waiting + receiving
http_req_totalFull end-to-end time: sum of all 6 timing phases. See HTTP Total Time vs Request Latency.

Data Size Metrics

MetricDescription
http_req_data_sentTotal bytes transmitted in the request (transport layer).
http_req_data_receivedTotal bytes received in the response (transport layer).
http_req_chunks_sentNumber of transport-level write operations during the request.
http_req_chunks_receivedNumber of transport-level read operations during the response.

Key Relationships

Duration Metric

The http_req_duration metric is measured directly from timestamps for maximum accuracy:

http_req_duration = response_receive_end_perf_ns - request_send_start_perf_ns

This measures from when the request started being sent to when the response was fully received/finalized. Conceptually this covers sending + waiting + receiving, but the direct measurement is more accurate than summing components.

Total Connection Overhead

Connection overhead combines all pre-request setup time:

http_req_connection_overhead = http_req_blocked + http_req_dns_lookup + http_req_connecting

Total Time Formula

The http_req_total metric sums all 6 timing phases for a reconcilable breakdown:

http_req_total = http_req_blocked + http_req_dns_lookup + http_req_connecting
+ http_req_sending + http_req_waiting + http_req_receiving

http_req_total and http_req_duration may differ slightly because:

  • http_req_duration is a single end-to-end measurement (response_receive_end - request_send_start)
  • http_req_total sums 6 individual phase measurements, which may have small unmeasured gaps between phases

Use http_req_total when you need the breakdown to add up exactly. Use http_req_duration when you want the most accurate single measurement of request/response exchange time.

Important Distinctions

  • TTFB vs TTFT: http_req_waiting measures Time to First Byte (specifically, the first body byte after headers), not Time to First Token. The server sends HTTP headers first, then body content. For LLM APIs, the first body byte may contain protocol overhead before actual tokens appear. Use the time_to_first_token metric for LLM-specific timing that measures when the first actual token content is received.

  • Connection reuse: When http_req_connection_reused = 1, both http_req_dns_lookup and http_req_connecting will be 0 since no new connection was established.

HTTP Total Time vs Request Latency

You may notice that http_req_total can be larger than request_latency. This is expected behavior — the two metrics measure different things:

MetricStartEndWhat it measures
request_latencyBefore HTTP callLast content responseTime to receive all meaningful tokens
http_req_totalSum of phases starting at pool waitLast network chunkSum of all HTTP timing phases
http_req_durationRequest send startLast network chunkDirect measured request/response exchange

Why http_req_total > request_latency:

For streaming LLM responses (SSE), the HTTP stream typically ends with:

[content chunk 1] ─► included in both metrics
[content chunk 2] ─► included in both metrics
[content chunk N] ─► request_latency ends here (last actual token)
[usage info] ─► http_req_total includes this
[DONE] ─► http_req_total and http_req_duration end here (last network chunk)

The request_latency metric excludes trailing metadata ([DONE] markers, usage statistics) because those don’t represent meaningful content delivery. The HTTP trace metrics include all network traffic.

Which metric should I use?

Use CaseRecommended Metric
User-perceived latency (“when did I get the last useful token?”)request_latency
Transport-level timing with reconcilable breakdownhttp_req_total
Most accurate single request/response measurementhttp_req_duration
Debugging: gap between content and stream endhttp_req_total - request_latency

Accessing Trace Data

Enabling HTTP Trace Export

By default, raw HTTP trace data is not included in profile_export.jsonl to keep file sizes small. The computed metrics (http_req_duration, http_req_waiting, etc.) are always available regardless of this setting.

To include the full trace data (timestamps, chunks, headers, socket info), use the --export-http-trace flag:

$aiperf profile ... --export-http-trace

Export Levels

The --export-http-trace flag works with records or raw export levels:

Export LevelTrace Data (with --export-http-trace)Use Case
summaryNot availableQuick benchmark summaries
records✓ IncludedPer-request analysis with timing details
raw✓ IncludedFull debugging with complete request/response data

Example with both flags:

$aiperf profile ... --export-level records --export-http-trace

Output Format

When exported to profile_export.jsonl, trace data uses wall-clock timestamps (nanoseconds since epoch) for cross-system correlation. The trace data is included in each record:

1{
2 "metadata": { "x_request_id": "9568f1d7-10e9-4d42-bb69-b06c87caae9f", "..." : "..." },
3 "metrics": {
4 "http_req_duration": {"value": 37421.26, "unit": "ms"},
5 "http_req_waiting": {"value": 4473.26, "unit": "ms"},
6 "http_req_blocked": {"value": 0.0, "unit": "ms"},
7 "..." : "..."
8 },
9 "trace_data": {
10 "trace_type": "aiohttp",
11 "request_send_start_ns": 1768309400341882300,
12 "request_headers": {"Host": "localhost:8000", "Content-Type": "application/json", "...": "..."},
13 "request_headers_sent_ns": 1768309400342515706,
14 "request_chunks": [[1768309400342549481, 91586]],
15 "request_send_end_ns": 1768309400342549481,
16 "request_chunks_count": 1,
17 "request_bytes_total": 91586,
18 "response_status_code": 200,
19 "response_reason": "OK",
20 "response_receive_start_ns": 1768309404815807889,
21 "response_headers": {"Content-Type": "text/event-stream; charset=utf-8", "...": "..."},
22 "response_headers_received_ns": 1768309400369701553,
23 "response_chunks": [[1768309404815807889, 565], [1768309405191294711, 268], "..."],
24 "response_chunks_count": 42,
25 "response_bytes_total": 16384,
26 "response_receive_end_ns": 1768309437763141384,
27 "sending_ns": 667181,
28 "waiting_ns": 4473258408,
29 "receiving_ns": 32947333495,
30 "duration_ns": 37421259084,
31 "tcp_connect_start_ns": 1768309400341999707,
32 "tcp_connect_end_ns": 1768309400342480993,
33 "connecting_ns": 481286,
34 "dns_cache_hit_ns": 1768309400342023726,
35 "local_ip": "127.0.0.1",
36 "local_port": 48362,
37 "remote_ip": "127.0.0.1",
38 "remote_port": 8000
39 },
40 "error": null
41}

Computed duration fields (blocked_ns, dns_lookup_ns, connecting_ns) are omitted from trace_data when the underlying event did not occur. The corresponding metrics (e.g., http_req_blocked) will report 0 for aggregation purposes, but the trace field itself is absent.

Trace Data Fields

The trace_data object contains both raw timestamps and computed durations:

Raw Timestamps (wall-clock nanoseconds):

FieldDescription
request_send_start_nsWhen the HTTP request started being sent
request_headers_sent_nsWhen the request headers finished being sent
request_send_end_nsWhen the request body finished being sent (computed from last chunk)
response_headers_received_nsWhen response headers were received
response_receive_start_nsWhen the first response body chunk was received
response_receive_end_nsWhen the response finished being received
error_timestamp_nsWhen an error occurred during the request (if any)
connection_pool_wait_start_nsWhen waiting for a connection started (aiohttp only)
connection_pool_wait_end_nsWhen a connection was obtained (aiohttp only)
tcp_connect_start_nsWhen TCP connection establishment started (aiohttp only)
tcp_connect_end_nsWhen TCP connection completed (aiohttp only)
connection_reused_nsWhen an existing connection was reused (aiohttp only)
dns_lookup_start_nsWhen DNS resolution started (aiohttp only)
dns_lookup_end_nsWhen DNS resolution completed (aiohttp only)
dns_cache_hit_nsWhen a DNS cache hit occurred (aiohttp only)
dns_cache_miss_nsWhen a DNS cache miss occurred (aiohttp only)

Chunk Aggregates (always available):

FieldDescription
request_chunks_countNumber of request chunks sent
request_bytes_totalTotal bytes sent in request chunks
response_chunks_countNumber of response chunks received
response_bytes_totalTotal bytes received in response chunks

Chunk Data (only with --export-http-trace, transport-layer granularity):

FieldDescription
request_chunksArray of [timestamp_ns, size_bytes] tuples for each request chunk sent
response_chunksArray of [timestamp_ns, size_bytes] tuples for each response chunk received

Computed Durations (nanoseconds):

FieldDescription
sending_nsRequest send time
waiting_nsTime to first byte (TTFB)
receiving_nsResponse transfer time
duration_nsTotal request duration
blocked_nsConnection pool wait time
dns_lookup_nsDNS resolution time
connecting_nsTCP/TLS connection time

Request/Response Metadata:

FieldDescription
request_headersDictionary of request headers sent
response_status_codeHTTP status code of the response
response_reasonHTTP status reason phrase (e.g., “OK”, “Not Found”)
response_headersDictionary of response headers received

Connection Info (aiohttp only):

FieldDescription
local_ipLocal IP address used for the connection
local_portLocal (ephemeral) port used
remote_ipRemote server IP address
remote_portRemote server port

Common Use Cases

Identifying Connection Pool Saturation

If http_req_blocked is consistently high, your connection pool is exhausted. Consider:

  • Increasing the connection pool size
  • Reducing the number of concurrent requests
  • Investigating slow responses that hold connections

Detecting DNS Issues

If http_req_dns_lookup is high:

  • DNS resolution is slow
  • Consider using DNS caching or a faster resolver
  • Check if DNS TTLs are appropriate

Measuring Server Processing Time

http_req_waiting (TTFB) isolates server-side latency:

  • Low sending + High waiting = Server is the bottleneck
  • High receiving = Large response or slow network throughput

Analyzing Connection Efficiency

Track http_req_connection_reused aggregated values:

  • Values close to 1.0 (100% reuse) indicate efficient keep-alive usage
  • Low reuse rates suggest connection churn, adding overhead via DNS lookups and TCP/TLS handshakes

Chunk-Level Analysis

When --export-http-trace is enabled, the request_chunks and response_chunks arrays provide transport-layer granularity useful for:

  • Analyzing streaming response patterns from LLM APIs
  • Debugging chunked transfer encoding issues
  • Understanding network-level timing variations

Aggregate fields (request_chunks_count, request_bytes_total, response_chunks_count, response_bytes_total) are always available regardless of the export flag.

Standards Compliance

AIPerf trace metrics align with industry standards for compatibility with existing tools:

HAR Specification Notes

Per the HAR 1.2 specification:

  • blocked, dns, connect use -1 when not applicable (AIPerf uses 0 or null)
  • send, wait, receive are required non-negative values
  • time (duration) equals the sum of all applicable timing phases
  • ssl timing is included within connect for backwards compatibility

k6 Metric Differences

The k6 http_req_tls_handshaking metric is not separated in AIPerf. TLS time is combined with TCP connection time in http_req_connecting because aiohttp’s tracing API provides a combined measurement via on_connection_create_start/end events.

Quick Reference

What You Want to KnowMetric to Use
Full end-to-end HTTP timehttp_req_total
Request/response time (excl. connection)http_req_duration
Server processing timehttp_req_waiting (TTFB)
Network transfer efficiencyhttp_req_receiving / http_req_data_received
Connection pool healthhttp_req_blocked
Connection reuse ratehttp_req_connection_reused
DNS performancehttp_req_dns_lookup
Pre-request overheadhttp_req_connection_overhead
User-perceived latency (LLM)request_latency (not an HTTP trace metric)