Learn how to analyze and generate synthetic traces with controlled prefix-sharing patterns for KV cache benchmarking.
The prefix synthesis feature enables:
In Large Language Models, prefix caching allows reusing previously computed KV cache entries when the same text prefix appears in multiple requests. The prefix synthesis feature helps you:
Analyze an existing trace file to extract statistics:
Output example:
Summary metrics:
Percentile statistics (computed for ISL, OSL, context length, unique prompt length, and hit rate):
Percentiles are calculated using linear interpolation: for percentile p (expressed as a fraction between 0 and 1) and n sorted values, compute the index k = (n - 1) * p, then interpolate between values[floor(k)] and values[ceil(k)].
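The interpolation rule can be sketched in a few lines of Python. This is only an illustration of the formula as stated; `percentile` is a hypothetical helper name, and it assumes p is given as a fraction (0.5 for the median, 0.99 for p99).

```python
import math


def percentile(values, p):
    """Linear-interpolation percentile; p is a fraction in [0, 1] (assumption)."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    ordered = sorted(values)
    k = (len(ordered) - 1) * p           # fractional index into the sorted list
    lo, hi = math.floor(k), math.ceil(k)
    if lo == hi:                         # k landed exactly on an element
        return float(ordered[lo])
    # Weight the two neighbours by how far k sits between them.
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (k - lo)


print(percentile([10, 20, 30, 40], 0.5))  # k = 1.5, halfway between 20 and 30
```

With four sorted values and p = 0.5, k is 1.5, so the result lies halfway between the second and third values.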
These metrics help you understand how much prefix caching could benefit your workload.
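As a rough intuition for the hit-rate metric, the sketch below estimates a per-request hit rate from hash_ids alone. It assumes an idealized cache that never evicts and counts a block as a hit if any earlier request carried the same hash ID. The function name `theoretical_hit_rates` is hypothetical, and the analyzer may compute its figure differently.

```python
def theoretical_hit_rates(traces):
    """Per-request fraction of prefix blocks already seen (infinite-cache assumption)."""
    seen = set()
    rates = []
    for trace in traces:
        ids = trace.get("hash_ids") or []
        if not ids:
            continue                      # no prefix information for this request
        hits = sum(1 for h in ids if h in seen)
        rates.append(hits / len(ids))
        seen.update(ids)                  # blocks become cached after the request
    return rates


requests = [
    {"input_length": 1536, "hash_ids": [1, 2, 3]},
    {"input_length": 1536, "hash_ids": [1, 2, 4]},     # shares blocks 1 and 2
    {"input_length": 2048, "hash_ids": [1, 2, 3, 5]},  # shares blocks 1, 2, 3
]
print(theoretical_hit_rates(requests))
```

The first request always scores zero because nothing has been cached yet; later requests score higher the more blocks they share with earlier ones.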
Synthesis happens automatically when you run aiperf profile with mooncake traces and synthesis parameters. The trace is transformed in-memory before benchmarking:
This runs a benchmark using the original trace characteristics. Adjust the multipliers to scale different aspects.
Sample Output (Successful Run):
--synthesis-speedup-ratio (default: 1.0)
Scale timestamps to simulate faster or slower request rates:
1.0: No change, request times identical
2.0: 2x faster (timestamps halved)
0.5: 2x slower (timestamps doubled)
Example: Simulate 2x more concurrent load:
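To make the effect on the trace itself concrete, here is a minimal Python sketch of the timestamp scaling. It assumes timestamps are in milliseconds and that scaling is a plain division by the ratio. `scale_timestamps` is a hypothetical helper, not part of aiperf, and the tool may round or otherwise post-process the values.

```python
def scale_timestamps(timestamps_ms, speedup_ratio):
    """Divide each timestamp by the ratio: 2.0 halves them, 0.5 doubles them."""
    if speedup_ratio <= 0:
        raise ValueError("speedup_ratio must be positive")
    return [t / speedup_ratio for t in timestamps_ms]


original = [0, 1000, 3000]
print(scale_timestamps(original, 2.0))  # requests arrive twice as fast
print(scale_timestamps(original, 0.5))  # requests arrive half as fast
```

Because every timestamp is divided by the same factor, the relative spacing of requests is preserved while the overall rate changes.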
--synthesis-prefix-len-multiplier (default: 1.0)
Scale the length of core prefix paths (shared prefixes):
1.0: No change
1.5: Extend shared prefixes by 50%
0.5: Reduce shared prefixes by 50%
Example: Simulate longer context windows:
--synthesis-prefix-root-multiplier (default: 1)
Distribute traces across N independent radix trees:
1: All traces share the same prefix tree (default)
2: Traces randomly assigned to 2 independent trees (50% each)
3: Traces randomly assigned to 3 independent trees (33% each)
Each tree has identical structure but different hash IDs, so traces in different trees cannot share prefixes. This reduces the effective cache hit rate by splitting the workload.
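One way to picture the split is the Python sketch below. It assigns each trace to a random tree and shifts its hash IDs by a per-tree offset so that IDs from different trees never collide. The offset scheme, the `hash_id_stride` parameter, and the function name `assign_roots` are all assumptions for illustration; the source only states that each tree has different hash IDs.

```python
import random


def assign_roots(traces, num_roots, hash_id_stride, seed=0):
    """Randomly place each trace in one of num_roots trees with disjoint hash IDs.

    hash_id_stride must exceed the largest hash ID in the input (assumption),
    so that shifted IDs from different trees cannot overlap.
    """
    if num_roots < 1:
        raise ValueError("num_roots must be at least 1")
    rng = random.Random(seed)
    result = []
    for trace in traces:
        root = rng.randrange(num_roots)   # uniform choice among the trees
        offset = root * hash_id_stride
        shifted = dict(trace)             # leave the input record untouched
        shifted["hash_ids"] = [h + offset for h in trace.get("hash_ids", [])]
        result.append(shifted)
    return result


traces = [{"input_length": 1536, "hash_ids": [1, 2, 3]} for _ in range(6)]
for t in assign_roots(traces, num_roots=3, hash_id_stride=1000):
    print(t["hash_ids"])
```

Traces that previously shared blocks 1, 2, and 3 now share them only with the other traces that landed in the same tree, which is why the hit rate drops.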
Example: Simulate lower cache hit rates with more diverse prefix roots:
--synthesis-prompt-len-multiplier (default: 1.0)
Scale the length of unique prompts (non-shared portions):
1.0: No change
2.0: Double unique prompt lengths
0.5: Halve unique prompt lengths
Example: Simulate shorter user prompts:
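The two length multipliers act on different parts of the input, which the arithmetic sketch below tries to show. It assumes the input length is simply the shared prefix tokens plus the unique prompt tokens and that each part is scaled independently and rounded. The real tool works on hash blocks and may round differently, and `scale_lengths` is a hypothetical helper.

```python
def scale_lengths(prefix_tokens, unique_tokens, prefix_mult=1.0, prompt_mult=1.0):
    """Return (new_prefix, new_unique, new_input_length) after scaling each part."""
    if prefix_mult < 0 or prompt_mult < 0:
        raise ValueError("multipliers must be non-negative")
    new_prefix = round(prefix_tokens * prefix_mult)   # shared portion
    new_unique = round(unique_tokens * prompt_mult)   # non-shared portion
    return new_prefix, new_unique, new_prefix + new_unique


# 1000 shared + 200 unique tokens, with prefixes extended by 50%
# and unique prompts halved.
print(scale_lengths(1000, 200, prefix_mult=1.5, prompt_mult=0.5))
```

Raising the prefix multiplier while lowering the prompt multiplier increases the shared fraction of each request, which is the lever for simulating higher prefix reuse.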
--synthesis-max-isl (optional)
Filter traces by maximum input sequence length. Traces with input_length > max_isl are skipped:
4096: Skip traces with more than 4,096 input tokens
Example: Filter out long contexts:
--synthesis-max-osl (optional)
Cap traces to a maximum output sequence length. Traces with output_length > max_osl are capped to max_osl:
2048: Cap output_length to 2,048 tokens
Example: Cap output lengths to 2,048 tokens:
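The two limits behave differently: the input limit drops whole traces, while the output limit keeps the trace and clamps one field. A minimal Python sketch of that distinction follows; `apply_length_limits` is a hypothetical helper written from the descriptions above, not aiperf's implementation.

```python
def apply_length_limits(traces, max_isl=None, max_osl=None):
    """Skip traces whose input exceeds max_isl; clamp output_length to max_osl."""
    result = []
    for trace in traces:
        if max_isl is not None and trace["input_length"] > max_isl:
            continue                                  # filtered out entirely
        limited = dict(trace)                         # do not mutate the input
        if max_osl is not None and limited.get("output_length") is not None:
            limited["output_length"] = min(limited["output_length"], max_osl)
        result.append(limited)
    return result


traces = [
    {"input_length": 5000, "output_length": 100},   # dropped: input too long
    {"input_length": 4096, "output_length": 3000},  # kept, output capped
    {"input_length": 10},                           # kept, no output_length set
]
print(apply_length_limits(traces, max_isl=4096, max_osl=2048))
```

A trace with exactly 4,096 input tokens is kept, since only input_length > max_isl is skipped.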
Analyze original traces to understand their cache characteristics, then benchmark with boosted prefix reuse:
Compress timestamps to simulate 10x faster request rate:
Benchmark with longer contexts while maintaining prefix patterns:
Benchmark with more diverse prefix patterns for multi-turn scenarios:
The mooncake trace format is JSONL (JSON Lines), where each line is a JSON object representing one request:
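For readers who want to inspect a trace programmatically, here is a small Python sketch that reads JSONL lines and checks for the one required field. The sample records are invented for illustration, `parse_trace_lines` is a hypothetical helper, and skipping blank lines is an assumption rather than documented behavior.

```python
import json


def parse_trace_lines(lines):
    """Parse JSONL trace lines, requiring input_length on every record."""
    records = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue                       # tolerate blank lines (assumption)
        record = json.loads(line)
        if "input_length" not in record:
            raise ValueError(f"line {line_no}: missing required field 'input_length'")
        records.append(record)
    return records


# Illustrative records only; values are made up.
sample = [
    '{"input_length": 1024, "output_length": 128, "timestamp": 0, "hash_ids": [1, 2]}',
    '{"input_length": 512}',
]
for record in parse_trace_lines(sample):
    print(record["input_length"], record.get("hash_ids", []))
```

To read a real file, the same function can be given an open file handle, since iterating a text file yields one line at a time.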
Required fields:
input_length: Number of input tokens
Optional fields:
output_length: Expected output tokens
timestamp: Absolute timestamp in milliseconds (for fixed schedules)
hash_ids: List of hash IDs representing prefix blocks
session_id: Conversation/session identifier for multi-turn
delay: Milliseconds to wait before sending (for multi-turn)

Always run analyze-trace first to understand your data:
Test parameters incrementally rather than changing everything at once:
Run benchmarks with different synthesis parameters to compare:
The synthesis preserves statistical properties. For best results:
Solution: Verify the file path is correct:
Solution: Your trace file doesn’t have hash_ids. Synthesis will still work with input_length and output_length fields, but prefix caching information won’t be available.
Solution: Your workload has low prefix reuse. Try:
--synthesis-prefix-len-multiplier to extend shared prefixes
--synthesis-prefix-root-multiplier to create more diverse patterns