Multi-Turn Conversations
Multi-Turn Conversations
Multi-Turn Conversations
Multi-turn conversations allow you to benchmark chat-based models with realistic back-and-forth dialogue patterns. This feature simulates real-world scenarios where users engage in extended conversations with multiple exchanges, rather than isolated single-turn queries.
Multi-turn benchmarking provides several advantages:
Understanding Request Control Options
AIPerf provides different options for controlling the number of requests depending on whether you’re running single-turn or multi-turn benchmarks:
--request-count: Controls the total number of single-turn requests to send. Use this for traditional single-turn benchmarks.--conversation-num: Controls the total number of conversations (sessions) to send in multi-turn scenarios. Each conversation may contain multiple turns (requests).These options are mutually exclusive in their intent - use --request-count for single-turn benchmarking and --conversation-num for multi-turn benchmarking to avoid confusion.
Dataset Generation vs Request Execution
The --num-dataset-entries option controls how many unique prompts are generated in the dataset. This is separate from the number of requests or conversations:
--num-dataset-entries: Number of unique prompt entries to generate in the dataset--request-count: Number of single-turn requests to send (for single-turn benchmarks)--conversation-num: Number of conversations to send (for multi-turn benchmarks)The dataset entries are reused/sampled as needed to fulfill the total request or conversation count. For example, you might generate 100 unique prompts (--num-dataset-entries 100) but send 1000 requests that sample from those prompts. --dataset-sampling-strategy determines how the pool of prompts is sampled when building payloads.
--conversation-num <N>: Total number of unique conversation sessions to execute
--num-conversations, --num-sessions--conversation-turn-mean <N>: Average number of turns per conversation
--session-turns-mean--conversation-turn-stddev <N>: Standard deviation for number of turns
--session-turns-stddev--conversation-turn-delay-mean <MS>: Average delay between turns in milliseconds
--session-turn-delay-mean--conversation-turn-delay-stddev <MS>: Standard deviation for turn delays
--session-turn-delay-stddevRun a simple multi-turn benchmark with a fixed number of turns per conversation:
Sample Output (Successful Run):
This command will:
Add variance to the number of turns per conversation for more realistic patterns:
This creates conversations with varying lengths (typically 3-7 turns), simulating natural conversation patterns where some users ask quick questions and others engage in deeper discussions.
Simulate real user “think time” between turns to model actual human interaction patterns:
The turn delays simulate realistic pauses as users read responses and formulate follow-up questions. This is critical for:
Test how your server handles many simultaneous multi-turn conversations:
This benchmark:
Combine request rate control with multi-turn conversations for controlled, sustained load:
This approach is ideal for:
Simulate realistic customer support interactions with varying conversation lengths:
Test model performance with long conversations that accumulate substantial context:
Each turn in a conversation includes the full conversation history, so:
This helps identify performance degradation as context grows.
Simulate sudden spikes in conversation activity:
In multi-turn conversations, each subsequent turn includes the complete conversation history:
Turn 1:
Turn 2:
Turn 3:
This accumulation means:
AIPerf simulates realistic multi-turn conversations by modeling natural user behavior patterns. Here’s how a typical multi-turn conversation flows:
Turn 0 (First Turn):
Turn 1 (Second Turn):
Turn 2 (Third Turn):
…and so on for subsequent turns
This flow pattern ensures benchmarks reflect real-world usage where:
The delays between turns are controlled by:
--conversation-turn-delay-mean: Average delay in milliseconds (e.g., 2000ms = 2 seconds)--conversation-turn-delay-stddev: Variation in delays to simulate natural human behavior--conversation-turn-delay-ratio: Scaling factor for all delaysConversation Control:
--conversation-num <N> — Number of conversation sessions (for multi-turn)--request-count <N> — Number of requests (for single-turn)--num-dataset-entries <N> — Number of unique prompts to generateTurn Configuration:
--conversation-turn-mean <N> — Average turns per conversation (default: 1)--conversation-turn-stddev <N> — Standard deviation of turns (default: 0)Turn Delays:
--conversation-turn-delay-mean <MS> — Average delay between turns in ms (default: 0)--conversation-turn-delay-stddev <MS> — Standard deviation of delays in ms (default: 0)Best Practices:
--request-rate to control conversation start rate for more predictable load--random-seed for reproducible conversation patternsSee also: