Benchmark LLMs with your own data using single-turn requests, multi-turn conversations, or random sampling.
AIPerf supports three custom dataset types for benchmarking with your own data: single_turn, multi_turn, and random_pool.
All three support:
Start a vLLM server for testing:
Verify the server is ready:
Each line represents one independent single-turn request.
Use single_turn when you need deterministic, sequential execution where requests always run in the exact order they appear in the file:
Execution: Sequential by default (request 1, then 2, then 3, etc.)
Input: Single JSONL file only
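As a rough illustration, an input file with one JSON object per line could be generated as follows. The `text` key for the prompt and the sample prompts are assumptions made for this sketch and are not confirmed by this section, so check the dataset schema for your AIPerf version before relying on them.

```python
import json
import tempfile
from pathlib import Path


def write_single_turn_jsonl(prompts, path):
    """Write one JSON object per line; each line is one independent request."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for prompt in prompts:
            # "text" is an assumed field name for the prompt, not confirmed here.
            f.write(json.dumps({"text": prompt}) + "\n")
    return path


# Hypothetical sample prompts, written to a temporary directory.
prompts = ["What is deep learning?", "Explain attention in one sentence."]
out_path = write_single_turn_jsonl(
    prompts, Path(tempfile.gettempdir()) / "prompts.jsonl"
)

# Reading the file back keeps the line order, which is the order requests run in.
lines = out_path.read_text(encoding="utf-8").splitlines()
records = [json.loads(line) for line in lines]
```

Writing through `json.dumps` rather than string formatting keeps quotes and newlines inside prompts correctly escaped, so each request stays on exactly one line.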
Output:
Same content as prompts.jsonl, embedded in the AIPerf YAML config:
See Inline Datasets for the full feature reference.
Control the maximum output tokens per request using the output_length field:
Precedence: Per-line output_length takes priority over the global --osl flag. Lines without output_length fall back to --osl if it is set (200 in this example); otherwise the server decides the output length.
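The precedence rule can be sketched as a small function. This is an illustration of the rule as described here, not AIPerf's own code; `effective_output_length` and `global_osl` are hypothetical names, and the `text` key in the sample lines is an assumption.

```python
from typing import Optional


def effective_output_length(line: dict, global_osl: Optional[int] = None) -> Optional[int]:
    """Return the max output tokens for one dataset line.

    None means neither value is set, so the server decides the length.
    """
    if line.get("output_length") is not None:
        # A per-line value takes priority over the global setting.
        return line["output_length"]
    return global_osl


# Per-line value present: it wins over the global value of 200.
with_override = effective_output_length({"text": "hi", "output_length": 50}, global_osl=200)
# No per-line value: fall back to the global value.
fallback = effective_output_length({"text": "hi"}, global_osl=200)
# Neither is set: None, meaning the server decides.
unset = effective_output_length({"text": "hi"})
```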
The output_length field also works per-turn in multi_turn datasets.
Send vendor-specific or sampling parameters per request via the extra field. The dict is shallow-merged into the top of the request body at dispatch. Per-line keys win over --extra-inputs:
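A minimal sketch of what a shallow merge with this precedence means in practice. `build_request_body` is a hypothetical helper, the parameter names and values are made up for illustration, and the assumption that extras overwrite existing top-level body keys is not confirmed by this section.

```python
def build_request_body(base_body, global_extra=None, line_extra=None):
    """Shallow-merge extras into the top level of a request body.

    Later updates win, so per-line keys override global ones.
    """
    body = dict(base_body)           # copy so the caller's dict is untouched
    body.update(global_extra or {})  # values that would come from --extra-inputs
    body.update(line_extra or {})    # values from the line's "extra" field
    return body


base = {"model": "my-model", "max_tokens": 200}
global_extra = {"temperature": 0.7, "options": {"a": 1, "b": 2}}
line_extra = {"temperature": 0.0, "options": {"a": 9}}

body = build_request_body(base, global_extra, line_extra)
```

Because the merge is shallow, the per-line `options` dict replaces the global one wholesale: `body["options"]` is `{"a": 9}`, and the global `"b": 2` entry is not carried over.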
The extra field also works per-turn in multi_turn datasets.
Each entry represents a complete conversation with multiple turns.
Use multi_turn when you need conversations with context where each turn builds on previous turns in the conversation:
Execution: Sequential within each conversation (turn 1, then 2, then 3, etc.), but multiple conversations run concurrently
Input: Single JSONL file only
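The execution model can be pictured with a small asyncio sketch: turns inside a conversation are awaited one after another, while separate conversations are scheduled together. This is a conceptual illustration only, not AIPerf's scheduler. The function names, the `concurrency` parameter, and the use of a semaphore to cap in-flight conversations are all assumptions.

```python
import asyncio


async def run_conversation(conv_id, turns, semaphore, log):
    # One semaphore slot per conversation, so the cap applies across conversations.
    async with semaphore:
        history = []
        for index, turn in enumerate(turns):
            history.append(turn)    # each turn builds on the previous ones
            await asyncio.sleep(0)  # stand-in for sending the request and awaiting the reply
            log.append((conv_id, index, len(history)))


async def run_all(conversations, concurrency):
    semaphore = asyncio.Semaphore(concurrency)
    log = []
    await asyncio.gather(
        *(run_conversation(cid, turns, semaphore, log)
          for cid, turns in conversations.items())
    )
    return log


# Hypothetical conversations keyed by an illustrative id.
conversations = {
    "a": ["Hello", "Tell me more", "Thanks"],
    "b": ["What is a GPU?", "And a TPU?"],
}
log = asyncio.run(run_all(conversations, concurrency=2))

# Turn indices recorded for each conversation, in completion order.
order_a = [index for cid, index, _ in log if cid == "a"]
order_b = [index for cid, index, _ in log if cid == "b"]
```

Entries from different conversations may interleave in `log`, but the turn indices of any single conversation always appear in ascending order.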
Output:
Key Points:
--concurrency)
output_length and extra (same semantics as single_turn: vendor extras shallow-merged into the top of the wire body, latest turn wins for chat-style endpoints)
Randomly sample from one or more data pools for varied request patterns.
Use random_pool when you need random sampling with replacement for unpredictable, varied request patterns:
Execution: Random sampling with replacement (same entry can be selected multiple times)
Input: Single JSONL file OR directory of multiple JSONL files
Note: Does NOT support timing control or multi-turn conversations
Output:
Behavior:
--random-seed for reproducibility
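Sampling with replacement under a fixed seed can be sketched with Python's standard library. This illustrates the general idea rather than AIPerf's sampler: `sample_with_replacement` is a hypothetical helper, the pool entries use an assumed `text` key, and the seed value is arbitrary.

```python
import random


def sample_with_replacement(pool, n, seed=None):
    """Draw n entries from pool; any entry may be picked more than once."""
    rng = random.Random(seed)  # a dedicated generator keeps runs reproducible
    return [rng.choice(pool) for _ in range(n)]


# Hypothetical pool of two entries.
pool = [{"text": "Summarize this article."}, {"text": "Translate to French."}]

first = sample_with_replacement(pool, n=5, seed=42)
second = sample_with_replacement(pool, n=5, seed=42)
```

Drawing five samples from a two-entry pool necessarily repeats entries, and reusing the same seed produces the same sequence on every run.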