AIPerf supports benchmarking using BurstGPT, a real-world LLM traffic trace dataset from Microsoft Research. The dataset captures bursty request patterns with per-request token counts.
This guide covers replaying BurstGPT traces to reproduce real-world traffic patterns against your inference server.
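Replaying a trace means issuing each request at the same relative offset it had in the original trace, so bursts in the trace become bursts at the server. The sketch below is a hypothetical illustration of that idea, not AIPerf's implementation. It converts absolute timestamps into send offsets and launches requests on that schedule. The `replay_offsets`/`replay` names and the `speedup` knob are assumptions made for this example.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


def replay_offsets(timestamps: Sequence[float], speedup: float = 1.0) -> List[float]:
    """Turn absolute trace timestamps (seconds) into send offsets from the first request.

    `speedup` is a hypothetical knob for this sketch: 2.0 replays the trace twice as fast.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    if not timestamps:
        return []
    ordered = sorted(timestamps)
    start = ordered[0]
    return [(t - start) / speedup for t in ordered]


async def replay(
    timestamps: Sequence[float],
    send: Callable[[int], Awaitable[None]],
    speedup: float = 1.0,
) -> None:
    """Open-loop replay: launch send(i) at each offset without waiting for earlier responses.

    `i` is the request's position in timestamp order.
    """
    offsets = replay_offsets(timestamps, speedup)
    loop = asyncio.get_running_loop()
    t0 = loop.time()
    tasks = []
    for i, offset in enumerate(offsets):
        delay = t0 + offset - loop.time()
        if delay > 0:
            await asyncio.sleep(delay)  # wait until this request's scheduled send time
        tasks.append(asyncio.create_task(send(i)))  # fire and move on
    await asyncio.gather(*tasks)  # wait for all in-flight requests to finish
```

The loop is open-loop by design. A request is sent at its scheduled time even if earlier requests are still in flight. A closed-loop client that waits for each response would smooth out the very bursts the trace is meant to reproduce.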
Launch a vLLM server with a chat model:
Verify the server is ready:
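To script the readiness check, for example before launching a benchmark from CI, you can poll the server until it answers. This is a minimal sketch that assumes vLLM's OpenAI-compatible server is on its default port 8000 and exposes `/v1/models`. Adjust the URL if you launched the server differently. The `wait_until_ready` helper is hypothetical.

```python
import time
import urllib.request


def wait_until_ready(
    url: str = "http://localhost:8000/v1/models",  # assumed default vLLM address
    timeout_s: float = 300.0,
    interval_s: float = 2.0,
) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Covers connection refused, timeouts, and HTTP errors (e.g. 503
            # while the model is still loading): the server is not ready yet.
            pass
        time.sleep(interval_s)
    return False
```

Model loading can take minutes for large checkpoints, which is why the default timeout here is generous.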
BurstGPT traces are CSV files where each row represents a single independent request.
Example rows:
AIPerf treats each row as an independent, single-turn request. The trace stores only token counts and no prompt text, so AIPerf synthesizes prompts of the prescribed token lengths.
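The row-to-request mapping can be sketched as follows. The sketch assumes the column headers used in the public BurstGPT CSVs (`Timestamp`, `Request tokens`, `Response tokens`). Check your file's header row, because AIPerf's own loader may differ. The `TraceRequest`, `load_trace`, and `synthetic_prompt` names are hypothetical. The prompt builder counts whitespace-separated words as a stand-in for tokens. A real benchmark counts with the model's tokenizer.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, List

# Assumed BurstGPT column names; verify against your file's header row.
TS_COL, IN_COL, OUT_COL = "Timestamp", "Request tokens", "Response tokens"


@dataclass(frozen=True)
class TraceRequest:
    timestamp: float    # seconds, as recorded in the trace
    input_tokens: int   # prompt length to synthesize
    output_tokens: int  # response length to request


def load_trace(lines: Iterable[str]) -> List[TraceRequest]:
    """Parse each CSV row into an independent single-turn request."""
    return [
        TraceRequest(
            timestamp=float(row[TS_COL]),
            input_tokens=int(row[IN_COL]),
            output_tokens=int(row[OUT_COL]),
        )
        for row in csv.DictReader(lines)
    ]


def synthetic_prompt(n_tokens: int, filler: str = "hello") -> str:
    """Build placeholder text of `n_tokens` whitespace-separated words.

    Illustrative only: one word is not guaranteed to be one model token.
    """
    return " ".join([filler] * n_tokens)
```

Because `load_trace` accepts any iterable of lines, you can pass an open file directly, e.g. `with open(path, newline="") as f: reqs = load_trace(f)`.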
Download a trace file from the BurstGPT repository and run a benchmark:
Sample Output (Successful Run):