Arrival Patterns: Simulating Realistic Traffic
When benchmarking with --request-rate, AIPerf can vary how requests arrive over time. The --arrival-pattern option controls the distribution of inter-arrival times, letting you simulate everything from perfectly regular traffic to bursty real-world patterns.
Why Arrival Patterns Matter
Real traffic doesn’t arrive at perfectly regular intervals. Traffic comes in bursts—quiet periods followed by sudden spikes. How your server handles this variance affects real-world performance.
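To make the effect of arrival variance concrete, the following is a minimal sketch of a single-server queue with a fixed service time, fed first by perfectly regular arrivals and then by random (exponential) arrivals at the same average rate. The function name `mean_wait`, the 8 ms service time, and the rate of 100 req/s are illustrative assumptions, not anything taken from AIPerf.

```python
import random

def mean_wait(gaps, service_time):
    """Average queueing delay for a single server with a fixed service time.

    Uses the Lindley recursion: the backlog seen by the next arrival is the
    previous backlog plus one service time, minus the gap that elapsed.
    """
    wait = 0.0
    total = 0.0
    for gap in gaps:
        wait = max(0.0, wait + service_time - gap)
        total += wait
    return total / len(gaps)

RATE = 100.0          # assumed average request rate (req/s)
SERVICE_TIME = 0.008  # assumed fixed service time (s), i.e. 80% utilization
COUNT = 100_000

rng = random.Random(0)
constant_gaps = [1.0 / RATE] * COUNT
random_gaps = [rng.expovariate(RATE) for _ in range(COUNT)]

constant_wait = mean_wait(constant_gaps, SERVICE_TIME)
random_wait = mean_wait(random_gaps, SERVICE_TIME)
print(f"constant arrivals: {constant_wait * 1000:.2f} ms average wait")
print(f"random arrivals:   {random_wait * 1000:.2f} ms average wait")
```

With regular arrivals the server always finishes before the next request shows up, so nothing queues. With random arrivals at the same average rate, clusters of requests build a backlog, which is why the same server can look very different under the two patterns.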
Quick Start
Available Patterns
Constant
Requests arrive at perfectly regular intervals: exactly 1/rate seconds apart.
Use cases:
- Baseline measurements with no variance
- Debugging timing issues
- Comparing against variable patterns
- Deterministic, reproducible tests
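As a rough illustration of the constant pattern, the sketch below computes dispatch offsets spaced exactly 1/rate seconds apart. The helper `constant_arrival_times` is hypothetical and only mirrors the description above; it is not AIPerf's scheduler.

```python
def constant_arrival_times(rate: float, count: int) -> list[float]:
    """Return `count` dispatch offsets (seconds), spaced exactly 1/rate apart."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    interval = 1.0 / rate
    return [i * interval for i in range(count)]

# At 4 requests per second, one request is dispatched every 0.25 s.
offsets = constant_arrival_times(rate=4.0, count=4)
print(offsets)  # [0.0, 0.25, 0.5, 0.75]
```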
Poisson (Default)
Requests arrive according to a Poisson process—the mathematical model for random events at a constant average rate. Inter-arrival times follow an exponential distribution.
Characteristics:
- Mean inter-arrival = 1/rate (same as constant)
- Variance = (1/rate)² (natural randomness)
- Sometimes requests cluster, sometimes gaps appear
- Models real user behavior where arrivals are independent
Use cases:
- Default realistic traffic simulation
- Standard load testing
- Comparing to theoretical queueing models
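The characteristics listed above can be checked empirically. The sketch below draws exponential inter-arrival times with Python's standard `random.expovariate` and compares the sample mean and variance against 1/rate and (1/rate)². The helper `poisson_intervals` and the fixed seed are assumptions made for the example, not part of AIPerf.

```python
import random
import statistics

def poisson_intervals(rate: float, count: int, seed: int = 0) -> list[float]:
    """Sample inter-arrival gaps (seconds) for a Poisson process at `rate` req/s."""
    rng = random.Random(seed)
    return [rng.expovariate(rate) for _ in range(count)]

RATE = 100.0
gaps = poisson_intervals(RATE, count=50_000)

sample_mean = statistics.fmean(gaps)
sample_var = statistics.pvariance(gaps)
print(f"mean gap: {sample_mean:.5f} s (theory {1 / RATE:.5f})")
print(f"variance: {sample_var:.7f} (theory {(1 / RATE) ** 2:.7f})")
```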
Gamma (Tunable Burstiness)
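The Gamma distribution generalizes Poisson with a smoothness parameter that controls how bursty or regular arrivals are. A smoothness of 1 is equivalent to Poisson, values below 1 (for example 0.3) produce burstier traffic, and values above 1 (for example 5.0) produce more regular arrivals.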
Mathematical note: The smoothness parameter is the Gamma distribution’s shape parameter (k). Scale is automatically computed to maintain the correct mean rate.
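A minimal sketch of that scaling, assuming the scale is chosen as 1 / (smoothness × rate) so the mean gap stays at 1/rate. It uses Python's standard `random.gammavariate(shape, scale)`; the helper names `gamma_intervals` and `coefficient_of_variation` are hypothetical and are not AIPerf functions.

```python
import random
import statistics

def gamma_intervals(rate: float, smoothness: float, count: int, seed: int = 0) -> list[float]:
    """Sample inter-arrival gaps from Gamma(shape=smoothness) with mean 1/rate."""
    rng = random.Random(seed)
    scale = 1.0 / (smoothness * rate)  # keeps the mean gap at shape * scale = 1/rate
    return [rng.gammavariate(smoothness, scale) for _ in range(count)]

def coefficient_of_variation(samples: list[float]) -> float:
    """Standard deviation divided by mean."""
    return statistics.pstdev(samples) / statistics.fmean(samples)

RATE = 100.0
results = {}
for k in (0.3, 1.0, 5.0):
    gaps = gamma_intervals(RATE, k, count=50_000)
    results[k] = (statistics.fmean(gaps), coefficient_of_variation(gaps))
    print(f"smoothness={k}: mean gap={results[k][0]:.5f} s, CV={results[k][1]:.2f}")
```

The mean gap stays close to 1/rate for every smoothness value, while the spread around it shrinks as smoothness grows.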
Concurrency Burst
When you omit --request-rate and only specify --concurrency, AIPerf uses burst mode: zero delay between request dispatches, limited only by the concurrency semaphore.
Use cases:
- Maximum throughput discovery
- Saturation testing
- Finding server capacity limits
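The idea of dispatching with no delay while a semaphore caps in-flight work can be sketched with `asyncio`. This is only an illustration of the technique: `run_burst` and the simulated 1 ms request are assumptions, and AIPerf's actual dispatch loop may be structured differently.

```python
import asyncio

async def run_burst(total_requests: int, concurrency: int) -> int:
    """Dispatch all requests immediately; a semaphore caps how many run at once.

    Returns the peak number of simultaneously in-flight requests.
    """
    semaphore = asyncio.Semaphore(concurrency)
    in_flight = 0
    peak = 0

    async def fake_request() -> None:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.001)  # stand-in for the real network call
            in_flight -= 1

    # No delay between dispatches: every task is created up front.
    await asyncio.gather(*(fake_request() for _ in range(total_requests)))
    return peak

peak = asyncio.run(run_burst(total_requests=50, concurrency=8))
print(f"peak in-flight requests: {peak}")
```

Because nothing throttles dispatch, the in-flight count climbs straight to the concurrency limit and stays there until the work runs out.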
vLLM Compatibility
AIPerf’s --arrival-smoothness is compatible with vLLM’s --burstiness parameter:
This allows direct comparison between AIPerf and vLLM benchmark results when using the same smoothness/burstiness value.
Examples
Baseline vs Realistic Comparison
Compare how your server handles ideal vs realistic traffic:
INFO Starting AIPerf System
INFO Using Request_Rate strategy with constant arrival pattern
INFO AIPerf System is PROFILING
Profiling: [01:00] - Running for 60 seconds…
INFO Benchmark completed successfully
INFO Results saved to: results/constant/
NVIDIA AIPerf | LLM Metrics
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ Metric                     ┃    avg ┃    min ┃    max ┃    p99 ┃    p50 ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ Request Latency (ms)       │ 178.45 │ 156.23 │ 212.34 │ 205.67 │ 176.89 │
│ Time to First Token (ms)   │  45.67 │  38.12 │  58.34 │  56.23 │  44.90 │
│ Inter Token Latency (ms)   │  11.23 │   9.45 │  14.67 │  14.12 │  11.01 │
│ Request Throughput (req/s) │  98.45 │      - │      - │      - │      - │
└────────────────────────────┴────────┴────────┴────────┴────────┴────────┘
JSON Export: results/constant/profile_export_aiperf.json
Expected Output (Run 2):
Compare TTFT and throughput between the two runs. A wider latency spread under Poisson arrivals (for example, a larger gap between p50 and p99) indicates that the server is sensitive to traffic patterns.
Stress Testing with Bursty Traffic
Test how your server handles request bursts:
Sample Output (Successful Run):
A smoothness of 0.3 creates highly bursty traffic: several requests arrive nearly simultaneously, followed by quiet periods.
Smooth Traffic for Noise Reduction
Reduce variance in measurements for controlled experiments:
Sample Output (Successful Run):
A smoothness of 5.0 produces very regular arrivals, which reduces measurement noise while retaining some natural variance.
Progressive Burstiness Test
Run multiple benchmarks with increasing burstiness to find where performance degrades:
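One way to organize such a sweep is to generate the arrival-related flags for each run up front, ordered from smooth to bursty. The sketch below only builds flag lists from options named on this page (--request-rate, --arrival-pattern, --arrival-smoothness); the helper `sweep_args` is hypothetical, and the executable name, model, and endpoint options are deliberately left out because they depend on your setup.

```python
def sweep_args(rate: float, smoothness_values: list[float]) -> list[list[str]]:
    """Build arrival-related flags for each run, from smoothest to burstiest."""
    runs = []
    # Higher smoothness means more regular traffic, so sort descending
    # to make burstiness increase from one run to the next.
    for k in sorted(smoothness_values, reverse=True):
        runs.append([
            "--request-rate", str(rate),
            "--arrival-pattern", "gamma",
            "--arrival-smoothness", str(k),
        ])
    return runs

for flags in sweep_args(rate=100.0, smoothness_values=[0.3, 1.0, 5.0]):
    print(" ".join(flags))
```

Keeping the request rate fixed across the sweep means any change in latency can be attributed to burstiness rather than to load.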
Warmup with Stable Pattern, Profile with Realistic
Use constant arrivals during warmup, then realistic patterns for profiling:
CLI Reference
Constraints:
- --arrival-pattern requires --request-rate to be set
- --arrival-smoothness only applies when --arrival-pattern gamma
- Cannot use with --user-centric-rate (deterministic per-user scheduling)
- Cannot use with --fixed-schedule (timestamp-based scheduling)
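The constraints above can be expressed as a small pre-flight check. This is a sketch only: `check_arrival_options` and its parameter names are hypothetical, and whether AIPerf rejects or silently ignores a stray --arrival-smoothness is an assumption here (the sketch reports it as a problem).

```python
from typing import Optional

def check_arrival_options(
    request_rate: Optional[float] = None,
    arrival_pattern: Optional[str] = None,
    arrival_smoothness: Optional[float] = None,
    user_centric_rate: Optional[float] = None,
    fixed_schedule: bool = False,
) -> list[str]:
    """Return a list of human-readable constraint violations (empty if none)."""
    problems = []
    if arrival_pattern is not None and request_rate is None:
        problems.append("--arrival-pattern requires --request-rate")
    if arrival_smoothness is not None and arrival_pattern != "gamma":
        problems.append("--arrival-smoothness only applies with --arrival-pattern gamma")
    if arrival_pattern is not None and user_centric_rate is not None:
        problems.append("--arrival-pattern cannot be used with --user-centric-rate")
    if arrival_pattern is not None and fixed_schedule:
        problems.append("--arrival-pattern cannot be used with --fixed-schedule")
    return problems

print(check_arrival_options(request_rate=100.0, arrival_pattern="gamma", arrival_smoothness=0.5))
print(check_arrival_options(arrival_pattern="poisson", arrival_smoothness=2.0))
```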
Pattern Selection Guide
Understanding the Math
For those who want to understand the statistical properties:
Where λ = request rate and k = smoothness.
- CV (Coefficient of Variation) = standard deviation / mean
- Lower CV = more regular arrivals
- Gamma with k=1 equals Poisson (CV=1)
- As k→∞, Gamma approaches Constant (CV→0)
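For reference, the properties above can be written out per pattern. These are the standard moments of the inter-arrival time T, assuming the Gamma scale is set to 1/(kλ) so that the mean stays at 1/λ, as described in the Gamma section:

```latex
\begin{array}{lccc}
\text{Pattern} & \mathbb{E}[T] & \operatorname{Var}[T] & \mathrm{CV} \\
\hline
\text{Constant} & 1/\lambda & 0 & 0 \\
\text{Poisson}  & 1/\lambda & 1/\lambda^{2} & 1 \\
\text{Gamma}(k) & 1/\lambda & 1/(k\lambda^{2}) & 1/\sqrt{k}
\end{array}
```

For Gamma, the variance is k·θ² with θ = 1/(kλ), which gives 1/(kλ²). Dividing its square root by the mean 1/λ yields CV = 1/√k, consistent with CV = 1 at k = 1 and CV → 0 as k → ∞.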
Related Documentation
- Request Rate with Concurrency — Combining rate and concurrency
- Warmup Phase — Configuring warmup with different patterns
- Timing Modes Reference — Complete CLI compatibility matrix