AIPerf supports benchmarking with the Bailian usage traces, a public dataset of anonymized production chat traces from Qwen model serving. The dataset contains both single-turn requests and multi-turn conversations.
This guide covers replaying Bailian traces with precise timing to reproduce real-world traffic patterns.
Launch a vLLM server with a chat model:
Verify the server is ready:
Bailian traces are JSONL files where each line represents a single request.
chat_id: Randomized unique chat identifier
timestamp: Request arrival time in seconds (converted to milliseconds internally)
input_length: Input token count
output_length: Output token count
parent_chat_id: Parent chat ID linking turns in a session; -1 for root (default: -1)
type: Request type (text, search, image, file)
turn: Conversation turn number (default: 1)
hash_ids: Salted SipHash block IDs for KV cache simulation (16 tokens per block)

Example entries:
Entries whose parent_chat_id chain leads back to the same root chat_id form a session, and the turns of a session are replayed in turn order.
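The session grouping described above can be sketched in a few lines of Python. This is an illustrative sketch only, not AIPerf's actual loader: the function names load_entries and group_sessions are hypothetical, the sample values are made up for demonstration, and the code assumes that parent_chat_id always refers to a chat_id in the same file (an entry with a missing parent is treated as a root).

```python
import json
from collections import defaultdict

# Hypothetical sample lines (values invented for illustration only).
SAMPLE_LINES = [
    '{"chat_id": 1, "timestamp": 0.5, "input_length": 32, "output_length": 8}',
    '{"chat_id": 3, "parent_chat_id": 2, "turn": 3, "timestamp": 4.0, "input_length": 96, "output_length": 8}',
    '{"chat_id": 2, "parent_chat_id": 1, "turn": 2, "timestamp": 2.0, "input_length": 64, "output_length": 8}',
    '{"chat_id": 9, "timestamp": 1.0, "input_length": 16, "output_length": 4}',
]


def load_entries(lines):
    """Parse JSONL lines into dicts, applying the documented defaults."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        entry.setdefault("parent_chat_id", -1)  # -1 marks a root entry
        entry.setdefault("turn", 1)
        entries.append(entry)
    return entries


def group_sessions(entries):
    """Group entries by root chat_id and sort each session by turn."""
    by_id = {e["chat_id"]: e for e in entries}

    def root_of(entry):
        seen = set()
        # Follow parent links until a root (or a missing parent) is reached.
        while entry["parent_chat_id"] != -1 and entry["parent_chat_id"] in by_id:
            if entry["chat_id"] in seen:
                break  # guard against malformed cyclic links
            seen.add(entry["chat_id"])
            entry = by_id[entry["parent_chat_id"]]
        return entry["chat_id"]

    sessions = defaultdict(list)
    for e in entries:
        sessions[root_of(e)].append(e)
    for turns in sessions.values():
        turns.sort(key=lambda e: e["turn"])
    return dict(sessions)


sessions = group_sessions(load_entries(SAMPLE_LINES))
```

Here sessions maps each root chat_id to its list of turns, so single-turn requests appear as sessions of length one. Multiplying each timestamp by 1000 would give the millisecond arrival times mentioned in the field list.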
Download a trace file from the public Bailian dataset:
The repository includes four traces representing different workload types: qwen_traceA_blksz_16.jsonl, qwen_traceB_blksz_16.jsonl, qwen_coder_blksz_16.jsonl, and qwen_thinking_blksz_16.jsonl. Substitute any of them in the command below.