Conversation Context Mode
Conversation context mode controls how prior turns are accumulated when building multi-turn chat requests. Different dataset formats imply different accumulation strategies, and AIPerf automatically selects the right one based on your data.
Two dimensions determine the mode:
- Turn format:
DELTAS(incremental per-turn content) vsMESSAGE_ARRAY(each turn carries its complete message list) - Response inclusion:
WITH_RESPONSES(pre-canned assistant turns in dataset) vsWITHOUT_RESPONSES(only user content; live responses captured at runtime)
Modes
deltas_without_responses
Standard multi-turn chat. Each dataset turn is a user-only delta. AIPerf accumulates turns and threads live inference responses into the history.
Dataset:
Replay:
Default for:
- Synthetic datasets
- Multi-turn JSONL
- ShareGPT
- Mooncake traces with
hash_ids
deltas_with_responses
Delta-compressed prompts. Each dataset turn only contains the new messages since the previous turn. AIPerf accumulates these deltas to reconstruct the full conversation. The live inference response is only used for measurement and discarded — the pre-canned assistant responses in the dataset are used instead.
Dataset (each turn is a delta):
Replay (deltas accumulated):
Default for:
- N/A (no built-in loader defaults to this mode yet)
message_array_with_responses
Self-contained prompts. Each turn already contains its full context (including assistant responses). No session accumulation.
Dataset:
Replay:
Each turn is sent exactly as it appears in the dataset.
Default for:
- Mooncake traces with pre-built
messagesarrays
message_array_without_responses
Reserved for future use. Each turn would carry a complete user-only message array, requiring live response merging between turns. Not yet implemented.
How It Works
Context mode is resolved through a priority chain:
- Per-conversation override — A conversation in the dataset can specify its own
context_mode - Loader default — The dataset loader can declare a default based on dataset format semantics
- Global fallback —
deltas_without_responses
This means most users never need to think about context mode. The loader picks the right default, and individual conversations can override it when needed.