Load Generator Options Reference
This guide provides a comprehensive reference for all load generator CLI options in AIPerf, including a compatibility matrix showing which options work together.
Request Scheduling Options
AIPerf determines how to schedule requests based on which CLI options you specify:
Option Priority
When multiple options are specified, AIPerf uses this priority:
1. `--fixed-schedule` or a `mooncake_trace` dataset → timestamp-based scheduling
2. `--user-centric-rate` → per-user turn-gap scheduling
3. `--request-rate` → rate-based scheduling with arrival patterns
4. `--concurrency` only → burst mode (as fast as possible within limits)
Compatibility Matrix
Legend
- ✅ Compatible - Option works with this configuration
- ⚠️ Conditional - Works with restrictions (see notes)
- ❌ Incompatible - Option conflicts or is ignored
- 🔧 Required - Option is required for this configuration
Scheduling Options
Stop Conditions (at least one required)
Arrival Pattern Options
Arrival Pattern Values:
- `constant` - fixed inter-arrival times (`1/rate`)
- `poisson` - exponentially distributed inter-arrivals (the default with `--request-rate`)
- `gamma` - tunable smoothness via `--arrival-smoothness`
- `concurrency_burst` - as fast as possible within concurrency limits (auto-set when no rate is specified)
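To illustrate what the three rate-based patterns mean for inter-arrival gaps, here is a standalone sketch of the underlying distributions. This is not AIPerf code, and the shape/scale parameterization chosen for `gamma` is an assumption for illustration only:

```shell
# Sample inter-arrival gaps at rate = 10 req/s for each pattern.
python3 - <<'EOF'
import random

rate = 10.0
# constant: every gap is exactly 1/rate seconds
constant = [1.0 / rate for _ in range(5)]
# poisson: exponentially distributed gaps with mean 1/rate
poisson = [random.expovariate(rate) for _ in range(5)]
# gamma: shape k tunes smoothness; scale keeps the mean gap at 1/rate
k = 4.0
gamma = [random.gammavariate(k, 1.0 / (rate * k)) for _ in range(5)]

print(constant)  # [0.1, 0.1, 0.1, 0.1, 0.1]
EOF
```

All three keep the same average rate; they differ only in how regular the gaps are.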
Concurrency Options
Concurrency behavior by configuration:

- With `--request-rate`: concurrency acts as a ceiling; requests scheduled by the rate are blocked while the limit is reached
- With `--concurrency` only (no rate options): concurrency is the primary driver; requests are sent as fast as possible within the limit
- With `--fixed-schedule`: concurrency acts as a ceiling; requests fire at their scheduled times but are blocked while at the limit
- With `--user-centric-rate`: concurrency acts as a ceiling; user turns fire based on `turn_gap` but are blocked while at the limit

Important: If `--concurrency` is not set, session concurrency limiting is disabled (unlimited). For `--user-centric-rate` mode, consider setting `--concurrency` to at least `--num-users` to ensure all users can have in-flight requests.
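A sketch of the ceiling behavior when both options are combined (the `aiperf profile` subcommand and the `--model`, `--url`, and `--request-count` flags are assumptions here; substitute your own command and endpoint):

```shell
# Rate-based scheduling at 20 req/s; --concurrency only caps
# in-flight sessions at 8, it does not drive the schedule.
aiperf profile \
  --model my-model \
  --url http://localhost:8000 \
  --request-rate 20 \
  --concurrency 8 \
  --request-count 500
```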
See also: Prefill Concurrency Tutorial for detailed guidance on memory-safe long-context benchmarking.
Grace Period Options
Fixed Schedule Options
Request Cancellation Options
Dataset Options
Session Configuration
Warmup Options
Warmup options work independently of the main benchmark configuration. The warmup phase always uses rate-based scheduling internally.
Configuration Examples
Using --request-rate (Rate-Based Scheduling)
Sends requests at a target average rate with configurable arrival patterns.
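A minimal sketch of this mode (the `aiperf profile` subcommand and the `--model`, `--url`, and `--request-count` flags are assumptions; adjust for your setup):

```shell
# ~10 req/s on average with Poisson arrivals (the default pattern).
aiperf profile \
  --model my-model \
  --url http://localhost:8000 \
  --request-rate 10 \
  --request-count 1000
```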
Using --concurrency Only (Burst Mode)
Sends requests as fast as possible within concurrency limits. Triggered when no rate option is specified.
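A sketch of burst mode (subcommand and `--model`/`--url`/`--request-count` spellings are assumptions; substitute your own values):

```shell
# No rate option → burst mode: up to 32 requests in flight at all times.
aiperf profile \
  --model my-model \
  --url http://localhost:8000 \
  --concurrency 32 \
  --request-count 1000
```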
Using --fixed-schedule (Trace Replay)
Replays requests at exact timestamps from dataset metadata. Used for trace replay benchmarking.
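A sketch of trace replay (the trace path, `--input-file` flag spelling, and the `aiperf profile` / `--model` / `--url` flags are assumptions; consult your dataset documentation for the exact input options):

```shell
# Replay requests at the exact timestamps recorded in the trace file.
aiperf profile \
  --model my-model \
  --url http://localhost:8000 \
  --fixed-schedule \
  --input-file traces/mooncake_trace.jsonl
```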
Using --user-centric-rate (KV Cache Benchmarking)
Per-user rate limiting for KV cache benchmarking. Each user has a consistent gap between their turns.
Key formula: `turn_gap = num_users / user_centric_rate`

With `--num-users 15` and `--user-centric-rate 1.0`, each user has 15 seconds between their turns.
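The arithmetic from the example above can be checked directly:

```shell
# turn_gap = num_users / user_centric_rate
num_users=15
user_centric_rate=1.0
awk -v n="$num_users" -v r="$user_centric_rate" \
    'BEGIN { printf "turn_gap = %.1f s\n", n / r }'
```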
For complete KV cache benchmarking, also configure shared system prompts and user context prompts. See the User-Centric Timing Tutorial for the full configuration, including `--shared-system-prompt-length`, `--user-context-prompt-length`, and other prompt options.
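Putting the pieces together, a hedged sketch of a user-centric run (the `aiperf profile` subcommand and the `--model`, `--url`, and `--request-count` flags are assumptions; the prompt-length options from the tutorial are omitted here):

```shell
# 15 users at 1.0 turns/s overall → 15 s between each user's turns.
# --concurrency is set to --num-users so every user can be in flight.
aiperf profile \
  --model my-model \
  --url http://localhost:8000 \
  --user-centric-rate 1.0 \
  --num-users 15 \
  --concurrency 15 \
  --request-count 1000
```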
Common Validation Errors
Quick Reference: Which Options to Use
Full Options Reference
Scheduling Options
Concurrency Options
Stop Conditions
Request Cancellation
Warmup Options
Fixed Schedule Options
Session Configuration
Multi-URL Load Balancing
See also: Multi-URL Load Balancing Tutorial for detailed configuration and examples.