Arrival Patterns: Simulating Realistic Traffic
When benchmarking with --request-rate, AIPerf can vary how requests arrive over time. The --arrival-pattern option controls the distribution of inter-arrival times, letting you simulate everything from perfectly regular traffic to bursty real-world patterns.
Real traffic doesn’t arrive at perfectly regular intervals. Traffic comes in bursts—quiet periods followed by sudden spikes. How your server handles this variance affects real-world performance.
With the constant pattern, requests arrive at perfectly regular intervals: exactly 1/rate seconds apart.
Use cases:
Requests arrive according to a Poisson process—the mathematical model for random events at a constant average rate. Inter-arrival times follow an exponential distribution.
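As a rough illustration of what exponential inter-arrival times look like, the sketch below draws gaps with Python's standard library. It is not AIPerf's implementation; the function names and the rate of 100 requests per second are assumptions made for the example.

```python
import random


def poisson_interarrival_times(rate, n, seed=0):
    """Draw n inter-arrival gaps (seconds) for a Poisson process at `rate` req/s."""
    rng = random.Random(seed)
    # expovariate(lambd) samples an exponential distribution with mean 1/lambd.
    return [rng.expovariate(rate) for _ in range(n)]


def arrival_timestamps(gaps):
    """Turn gaps into absolute send times, starting from t = 0."""
    t, times = 0.0, []
    for gap in gaps:
        t += gap
        times.append(t)
    return times


# Hypothetical target rate, chosen only for illustration.
gaps = poisson_interarrival_times(rate=100.0, n=50_000)
mean_gap = sum(gaps) / len(gaps)
print(f"mean gap: {mean_gap:.5f} s (target {1 / 100.0:.5f} s)")
```

Individual gaps vary widely, but their average settles near 1/rate, which is why the mean matches the constant pattern while the spacing does not.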
Characteristics:
Mean inter-arrival time: 1/rate (same as constant)
Variance: (1/rate)² (natural randomness)
Use cases:
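The Gamma distribution generalizes Poisson arrivals with a smoothness parameter that controls how bursty or regular arrivals are. Values below 1 produce burstier traffic, a value of 1 reproduces Poisson arrivals, and values above 1 produce more regular arrivals.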
Mathematical note: The smoothness parameter is the Gamma distribution’s shape parameter (k). Scale is automatically computed to maintain the correct mean rate.
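The relationship between shape, scale, and mean rate can be sketched with the standard library. This is an illustration under the assumption that the scale is 1/(rate × smoothness), which is what keeping the mean gap at 1/rate implies; the helper names are hypothetical and this is not AIPerf's code.

```python
import random
import statistics


def gamma_interarrival_times(rate, smoothness, n, seed=0):
    """Draw n gaps from Gamma(shape=smoothness) with mean 1/rate."""
    rng = random.Random(seed)
    # Assumed scale: mean = shape * scale, so scale = 1 / (rate * shape).
    scale = 1.0 / (rate * smoothness)
    return [rng.gammavariate(smoothness, scale) for _ in range(n)]


def coefficient_of_variation(samples):
    """Standard deviation divided by mean: a unitless measure of burstiness."""
    return statistics.pstdev(samples) / statistics.fmean(samples)


for k in (0.3, 1.0, 5.0):
    gaps = gamma_interarrival_times(rate=100.0, smoothness=k, n=50_000)
    print(
        f"smoothness={k}: mean gap={statistics.fmean(gaps):.5f} s, "
        f"CV={coefficient_of_variation(gaps):.2f}"
    )
```

In theory the coefficient of variation is 1/√k, so the mean gap stays near 1/rate for every smoothness value while the spread shrinks as smoothness grows.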
When you omit --request-rate and only specify --concurrency, AIPerf uses burst mode: zero delay between request dispatches, limited only by the concurrency semaphore.
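The idea of dispatching with zero delay while a semaphore caps in-flight requests can be sketched with asyncio. This is a minimal illustration, not AIPerf's dispatcher: `run_burst`, `fake_send`, and the numbers used are all hypothetical.

```python
import asyncio


async def run_burst(total_requests, concurrency, send):
    """Dispatch requests back-to-back, with at most `concurrency` in flight."""
    semaphore = asyncio.Semaphore(concurrency)
    in_flight = 0
    peak = 0

    async def worker(i):
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        try:
            await send(i)
        finally:
            in_flight -= 1
            semaphore.release()  # free the slot for the next dispatch

    tasks = []
    for i in range(total_requests):
        # No timer between dispatches: the only wait is for a free slot.
        await semaphore.acquire()
        tasks.append(asyncio.create_task(worker(i)))
    await asyncio.gather(*tasks)
    return peak


async def fake_send(i):
    """Stand-in for a real request; sleeps briefly instead of doing I/O."""
    await asyncio.sleep(0.01)


peak_in_flight = asyncio.run(
    run_burst(total_requests=20, concurrency=4, send=fake_send)
)
print("peak in flight:", peak_in_flight)
```

Because the semaphore is acquired before each task is created, the number of concurrent requests never exceeds the configured limit, and a new request starts as soon as any slot frees up.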
Use cases:
AIPerf’s --arrival-smoothness is compatible with vLLM’s --burstiness parameter:
This allows direct comparison between AIPerf and vLLM benchmark results when using the same smoothness/burstiness value.
Compare how your server handles ideal vs realistic traffic:
INFO Starting AIPerf System
INFO Using Request_Rate strategy with constant arrival pattern
INFO AIPerf System is PROFILING
Profiling: [01:00] - Running for 60 seconds…
INFO Benchmark completed successfully
INFO Results saved to: results/constant/
NVIDIA AIPerf | LLM Metrics
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ Metric                     ┃    avg ┃    min ┃    max ┃    p99 ┃    p50 ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ Request Latency (ms)       │ 178.45 │ 156.23 │ 212.34 │ 205.67 │ 176.89 │
│ Time to First Token (ms)   │  45.67 │  38.12 │  58.34 │  56.23 │  44.90 │
│ Inter Token Latency (ms)   │  11.23 │   9.45 │  14.67 │  14.12 │  11.01 │
│ Request Throughput (req/s) │  98.45 │      - │      - │      - │      - │
└────────────────────────────┴────────┴────────┴────────┴────────┴────────┘
JSON Export: results/constant/profile_export_aiperf.json
Expected Output (Run 2):
Compare TTFT and throughput between the two runs. A wider latency spread (for example, a larger gap between p50 and p99) under Poisson arrivals indicates that the server is sensitive to traffic patterns.
Test how your server handles request bursts:
Sample Output (Successful Run):
A smoothness of 0.3 creates highly bursty traffic: several requests arrive nearly simultaneously, followed by quiet periods.
Reduce variance in measurements for controlled experiments:
Sample Output (Successful Run):
A smoothness of 5.0 produces very regular arrivals, reducing measurement noise while retaining some natural variance.
Run multiple benchmarks with increasing burstiness to find where performance degrades:
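To build intuition for why burstier arrivals degrade performance at the same average rate, the sketch below simulates a toy single-server FIFO queue with a fixed service time. The server model, the 8 ms service time, and the function name are assumptions for illustration only; this does not invoke AIPerf or model a real inference server.

```python
import random


def mean_wait(rate, smoothness, service_time, n=100_000, seed=0):
    """Mean queueing delay (s) for a toy single FIFO server under Gamma arrivals."""
    rng = random.Random(seed)
    # Assumed scale so the mean gap stays at 1/rate for any smoothness.
    scale = 1.0 / (rate * smoothness)
    wait, total = 0.0, 0.0
    for _ in range(n):
        gap = rng.gammavariate(smoothness, scale)
        # Lindley recursion: the next request waits for whatever backlog
        # remains after the gap since the previous arrival.
        wait = max(0.0, wait + service_time - gap)
        total += wait
    return total / n


# Hypothetical load: 100 req/s against an 8 ms service time (80% utilization).
waits = {
    k: mean_wait(rate=100.0, smoothness=k, service_time=0.008)
    for k in (5.0, 1.0, 0.3)
}
for k, w in waits.items():
    print(f"smoothness={k}: mean wait={w * 1000:.2f} ms")
```

The average request rate is identical in all three cases; only the spacing changes, yet the mean wait grows as smoothness drops. A real sweep would look for the same trend in measured TTFT and request latency.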
Use constant arrivals during warmup, then realistic patterns for profiling:
Constraints:
--arrival-pattern requires --request-rate to be set
--arrival-smoothness only applies when --arrival-pattern gamma
--user-centric-rate (deterministic per-user scheduling)
--fixed-schedule (timestamp-based scheduling)
For those who want to understand the statistical properties:
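The standard moments of the inter-arrival time T for each pattern can be summarized as follows. This is a sketch from textbook results, assuming the Gamma scale is θ = 1/(kλ), which is what holding the mean at 1/λ implies; T, θ, and CV (coefficient of variation) are notation introduced here for illustration.

```latex
\begin{array}{lccc}
\text{Pattern} & \mathbb{E}[T] & \operatorname{Var}[T] & \mathrm{CV} \\
\hline
\text{Constant} & 1/\lambda & 0 & 0 \\
\text{Poisson} & 1/\lambda & 1/\lambda^{2} & 1 \\
\text{Gamma}\,(k,\ \theta = 1/(k\lambda)) & 1/\lambda & 1/(k\lambda^{2}) & 1/\sqrt{k}
\end{array}
```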
Where λ = request rate and k = smoothness.