Profile with Bailian Traces
AIPerf supports benchmarking using the Bailian usage traces, a public dataset of anonymized production chat traces from Qwen model serving. The dataset contains both single-turn requests and multi-turn conversations.
This guide covers replaying Bailian traces with precise timing to reproduce real-world traffic patterns.
Start a vLLM Server
Launch a vLLM server with a chat model:
Verify the server is ready:
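One way to do this from Python is sketched below. It assumes the server listens on vLLM's default address, http://localhost:8000, and polls the OpenAI-compatible /v1/models route until it answers with HTTP 200. The helper names (http_ok, wait_until_ready) and the timeout values are illustrative, not part of AIPerf or vLLM.

```python
import time
import urllib.error
import urllib.request


def http_ok(url: str) -> bool:
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, non-2xx status, or timeout.
        return False


def wait_until_ready(url, timeout_s=300.0, interval_s=2.0,
                     probe=http_ok, sleep=time.sleep):
    """Poll `url` until `probe` succeeds or `timeout_s` seconds elapse."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


# Example call (assumed address; adjust host and port to your deployment):
# wait_until_ready("http://localhost:8000/v1/models")
```

The probe and sleep functions are parameters so the polling loop can be exercised without a live server.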
Bailian Trace Format
Bailian traces are JSONL files where each line represents a single request.
- chat_id: Randomized unique chat identifier
- timestamp: Request arrival time in seconds (converted to milliseconds internally)
- input_length: Input token count
- output_length: Output token count
- parent_chat_id: Parent chat ID linking turns in a session; -1 for root (default: -1)
- type: Request type (text, search, image, file)
- turn: Conversation turn number (default: 1)
- hash_ids: Salted SipHash block IDs for KV cache simulation (16 tokens per block)
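The field list above can be read as a small schema. The following is a minimal sketch of parsing one trace line with the documented defaults and the seconds-to-milliseconds conversion. It is not AIPerf's actual loader: the BailianRecord class and parse_record function are hypothetical names, and treating type and hash_ids as optional is an assumption, since the format description gives no default for them.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BailianRecord:
    chat_id: int
    timestamp_ms: float          # arrival time, converted from seconds
    input_length: int
    output_length: int
    parent_chat_id: int = -1     # documented default: -1 marks a root request
    type: Optional[str] = None   # assumption: optional, no documented default
    turn: int = 1                # documented default
    hash_ids: List[int] = field(default_factory=list)  # assumption: optional


def parse_record(line: str) -> BailianRecord:
    """Parse one JSONL line of a Bailian trace into a BailianRecord."""
    raw = json.loads(line)
    return BailianRecord(
        chat_id=raw["chat_id"],
        timestamp_ms=raw["timestamp"] * 1000.0,
        input_length=raw["input_length"],
        output_length=raw["output_length"],
        parent_chat_id=raw.get("parent_chat_id", -1),
        type=raw.get("type"),
        turn=raw.get("turn", 1),
        hash_ids=raw.get("hash_ids", []),
    )
```

Calling parse_record on each non-empty line of a trace file yields one record per request, with root requests identifiable by parent_chat_id == -1.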
Example entries:
Entries that share a root chat_id (the entry whose parent_chat_id is -1) form a session and are replayed in turn order.
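The grouping rule can be sketched as follows. This is one reading of the rule, not AIPerf's implementation: it assumes parent_chat_id points at the previous turn's chat_id, so the root is found by following those links until an entry with parent -1 is reached. The function names find_root and group_sessions are hypothetical, and records are plain dicts as produced by json.loads.

```python
from collections import defaultdict


def find_root(chat_id, parent_of):
    """Follow parent_chat_id links until reaching an entry whose parent is -1."""
    seen = set()  # guards against accidental cycles in malformed data
    while parent_of.get(chat_id, -1) != -1 and chat_id not in seen:
        seen.add(chat_id)
        chat_id = parent_of[chat_id]
    return chat_id


def group_sessions(records):
    """Group trace entries by root chat_id and sort each session by turn."""
    parent_of = {r["chat_id"]: r.get("parent_chat_id", -1) for r in records}
    sessions = defaultdict(list)
    for r in records:
        sessions[find_root(r["chat_id"], parent_of)].append(r)
    for turns in sessions.values():
        turns.sort(key=lambda r: r.get("turn", 1))
    return dict(sessions)
```

Single-turn requests end up as sessions of length one, keyed by their own chat_id.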
Download and Profile
Download a trace file from the public Bailian dataset:
The repository includes four traces representing different workload types: qwen_traceA_blksz_16.jsonl, qwen_traceB_blksz_16.jsonl, qwen_coder_blksz_16.jsonl, and qwen_thinking_blksz_16.jsonl. Substitute any of them in the command below.
Related Tutorials
- Trace Benchmarking with Mooncake - Mooncake FAST’25 trace replay
- Fixed Schedule - Precise timestamp-based execution for any dataset
- Prefix Synthesis - KV cache testing with hash-based prefix data
- Multi-Turn Conversations - Multi-turn conversation benchmarking
- Conversation Context Mode - How conversation history accumulates in multi-turn conversations