Prefix Synthesis API Reference | NVIDIA AIPerf Documentation

Complete API documentation for the prefix synthesis module.

Module: `aiperf.dataset.synthesis`

Classes

`RollingHasher`

Converts sequences of text blocks into globally unique hash IDs using rolling hash.

Constructor:

1 RollingHasher(block_size: int = 512) -> None

Parameters:

block_size (int): Number of tokens per block for hashing (default: 512)

Methods:

hash_blocks(blocks: Sequence[str]) -> list[int]

Convert a sequence of text blocks to hash IDs
Parameters:
- blocks: Sequence of text strings representing blocks
Returns: List of unique hash IDs

hash_token_blocks(blocks: Sequence[Sequence[int]]) -> list[int]

Convert a sequence of token blocks to hash IDs
Parameters:
- blocks: Sequence of token blocks (each block is a sequence of token IDs)
Returns: List of unique hash IDs

reset() -> None

Reset the hasher state for hashing new sequences
Note: Maintains global uniqueness across sequences (keeps _hash_to_id and _id_counter)

get_stats() -> dict[str, int]

Get statistics about the hasher
Returns: Dictionary with ‘total_hashes’ and ‘max_id’

Example:

1 from aiperf.dataset.synthesis import RollingHasher
2 
3 hasher = RollingHasher(block_size=512)
4 hash_ids = hasher.hash_blocks(["hello", "world", "test"])
5 # Result: [0, 1, 2]
6 
7 stats = hasher.get_stats()
8 # {'total_hashes': 3, 'max_id': 2}

`RadixTree`

Compact representation of prefix patterns using a radix tree data structure.

Constructor:

1 RadixTree() -> None

Methods:

add_path(path: list[int]) -> RadixNode

Add a path to the tree from root
Parameters:
- path: List of edge labels (hash IDs or token counts)
Returns: The leaf RadixNode at the end of the path
Side effects: Increments visit_count for all nodes in the path

get_node(node_id: int) -> RadixNode | None

Get node by ID
Returns: RadixNode or None if not found

get_all_nodes() -> list[RadixNode]

Get all nodes in the tree
Returns: List of all RadixNode instances

get_stats() -> RadixTreeStats

Get statistics about tree structure
Returns: RadixTreeStats object with ‘num_nodes’, ‘num_leaves’, ‘total_visits’, ‘max_depth’

Properties:

root: RadixNode

The root node of the tree (read-only)

Example:

1 from aiperf.dataset.synthesis import RadixTree
2 
3 tree = RadixTree()
4 
5 # Add paths
6 tree.add_path([1, 2, 3])
7 tree.add_path([1, 2, 4])
8 tree.add_path([1, 5, 6])
9 
10 # Get statistics
11 stats = tree.get_stats()
12 # {
13 #   'num_nodes': 7,
14 #   'num_leaves': 3,
15 #   'total_visits': 3,
16 #   'max_depth': 3
17 # }

`RadixNode`

A node in the radix tree representing a prefix path.

Constructor:

1 RadixNode(
2     node_id: int,
3     label: int | None = None,
4     visit_count: int = 0,
5     children: dict[int, RadixNode] | None = None,
6     parent: RadixNode | None = None
7 ) -> None

Attributes:

node_id: Unique node identifier
label: Edge label (token count)
visit_count: Number of times this node is visited
children: Dictionary of child nodes by edge label
parent: Parent node reference

Methods:

add_child(label: int, child: RadixNode) -> None

Add a child node with the given edge label

get_child(label: int) -> RadixNode | None

Get child with given label

is_leaf() -> bool

Check if this node is a leaf (no children)

`EmpiricalSampler`

Samples values from an empirical distribution learned from data.

Constructor:

1 EmpiricalSampler(data: list[int] | list[float]) -> None

Parameters:

data: List of observed values to learn distribution from

Methods:

sample() -> int | float

Draw a single sample from the learned distribution
Returns: A sampled value

sample_batch(size: int) -> list[int | float]

Draw multiple samples from the learned distribution
Parameters:
- size: Number of samples to draw
Returns: List of sampled values

get_stats() -> EmpiricalSamplerStats

Get statistics about the learned distribution
Returns: EmpiricalSamplerStats object with ‘min’, ‘max’, ‘mean’, ‘median’, ‘num_unique’

Example:

1 from aiperf.dataset.synthesis import EmpiricalSampler, EmpiricalSamplerStats
2 
3 # Create sampler from observed data
4 data = [100, 200, 150, 300, 250, 100, 200]
5 sampler = EmpiricalSampler(data)
6 
7 # Sample from distribution
8 sample = sampler.sample()  # Returns a value like 100, 150, 200, 250, or 300
9 
10 # Get distribution statistics
11 stats = sampler.get_stats()
12 # EmpiricalSamplerStats(min=100.0, max=300.0, mean=185.7, median=200.0, num_unique=5)

`PrefixAnalyzer`

Analyzes traces to extract ISL/OSL statistics and prefix patterns.

Constructor:

1 PrefixAnalyzer(block_size: int = 512) -> None

Parameters:

block_size (int): Number of tokens per block for analysis (default: 512)

Methods:

analyze_file(trace_file: Path | str) -> AnalysisStats

Analyze a mooncake trace file
Parameters:
- trace_file: Path to JSONL trace file
Returns: AnalysisStats object

analyze_traces(traces: list[dict]) -> AnalysisStats

Analyze a list of trace dictionaries
Parameters:
- traces: List of trace dictionaries
Returns: AnalysisStats object

Example:

1 from aiperf.dataset.synthesis import PrefixAnalyzer
2 from pathlib import Path
3 
4 analyzer = PrefixAnalyzer(block_size=512)
5 
6 # Analyze from file
7 stats = analyzer.analyze_file("traces/production.jsonl")
8 
9 print(f"Total requests: {stats.total_requests}")
10 print(f"Cache hit rate: {stats.cache_hit_rate:.2%}")
11 print(f"Average ISL: {stats.avg_isl:.1f}")

`Synthesizer`

Generates synthetic traces preserving prefix-sharing patterns.

Constructor:

1 Synthesizer(params: SynthesisParams | None = None) -> None

Parameters:

params (SynthesisParams): Generation configuration (optional, uses defaults if None)

Methods:

synthesize_from_file(trace_file: Path | str) -> list[dict]

Synthesize traces from an input trace file
Parameters:
- trace_file: Path to input JSONL trace file
Returns: List of synthetic trace dictionaries

synthesize_traces(traces: list[dict]) -> list[dict]

Synthesize traces from a list of trace dictionaries
Parameters:
- traces: List of input trace dictionaries
Returns: List of synthetic trace dictionaries

synthesize_grouped_traces(data: dict[str, list[dict]]) -> dict[str, list[dict]]

Synthesize traces while preserving session grouping
Parameters:
- data: Dictionary mapping session_id to list of trace dicts
Returns: Dictionary mapping session_id to list of synthesized trace dicts

get_stats() -> dict[str, Any]

Get synthesizer statistics
Returns: Dictionary with ‘tree_nodes’, ‘tree_depth’, ‘params’

Example:

1 from aiperf.dataset.synthesis import Synthesizer
2 from aiperf.dataset.synthesis.models import SynthesisParams
3 
4 # Create synthesizer with custom parameters
5 params = SynthesisParams(
6     speedup_ratio=2.0,
7     prefix_len_multiplier=1.5,
8     max_isl=4096
9 )
10 synthesizer = Synthesizer(params=params)
11 
12 # Synthesize from file
13 synthetic_traces = synthesizer.synthesize_from_file("input.jsonl")
14 
15 # Synthesize from list
16 synthetic_traces = synthesizer.synthesize_traces([
17     {"input_length": 512, "output_length": 64, "hash_ids": [1, 2, 3]},
18     {"input_length": 768, "output_length": 128, "hash_ids": [1, 2, 4]},
19 ])
20 
21 # Get statistics
22 stats = synthesizer.get_stats()
23 print(f"Tree nodes: {stats['tree_nodes']}")

Data Models

`AnalysisStats`

Statistics extracted from trace analysis.

Fields:

total_requests: int - Total number of requests in trace
unique_prefixes: int - Number of unique prefix patterns (all prefix subsequences)
num_prefix_groups: int - Number of distinct shared first blocks (prefix groups)
cache_hit_rate: float - Theoretical cache hit rate (0.0 to 1.0)
min_isl: int - Minimum input sequence length
max_isl: int - Maximum input sequence length
avg_isl: float - Average input sequence length
min_osl: int - Minimum output sequence length
max_osl: int - Maximum output sequence length
avg_osl: float - Average output sequence length
prefix_reuse_ratio: float - Ratio of reused prefixes (0.0 to 1.0)
isl_stats: MetricStats | None - Full statistics for input sequence length
osl_stats: MetricStats | None - Full statistics for output sequence length
context_length_stats: MetricStats | None - Full statistics for context (shared prefix) length
unique_prompt_length_stats: MetricStats | None - Full statistics for unique prompt length
hit_rate_stats: MetricStats | None - Full statistics for per-request cache hit rates

`MetricStats`

Statistics for a single metric with percentiles.

Fields:

mean: float - Mean value
std_dev: float - Standard deviation
min: float - Minimum value
p25: float - 25th percentile
median: float - Median (50th percentile)
p75: float - 75th percentile
max: float - Maximum value

`SynthesisParams`

Parameters for synthetic trace generation.

Fields:

speedup_ratio: float = 1.0 - Timestamp scaling multiplier (ge 0.0)
prefix_len_multiplier: float = 1.0 - Core prefix length multiplier (ge 0.0)
prefix_root_multiplier: int = 1 - Number of independent trees to distribute traces across (ge 1)
prompt_len_multiplier: float = 1.0 - Leaf prompt length multiplier (ge 0.0)
output_len_multiplier: float = 1.0 - Output length multiplier (ge 0.0)
max_isl: int | None = None - Maximum input sequence length filter
max_osl: int | None = None - Maximum output sequence length cap
block_size: int = 512 - KV cache page size (ge 1)

Class Methods:

from_synthesis_config(config: SynthesisConfig, block_size: int = 512) -> SynthesisParams - Create from SynthesisConfig

`RadixTreeStats`

Statistics about radix tree structure.

Fields:

num_nodes: int - Total number of nodes in tree
num_leaves: int - Number of leaf nodes (nodes with no children)
total_visits: int - Number of paths added to tree
max_depth: int - Maximum depth from root to leaf

`EmpiricalSamplerStats`

Statistics about learned empirical distribution.

Fields:

min: float - Minimum value in original data
max: float - Maximum value in original data
mean: float - Mean of original data
median: float - Median of original data
num_unique: int - Number of unique values in distribution

Utility Functions

`aiperf.dataset.synthesis.graph_utils`

Graph manipulation utilities for radix tree operations.

validate_tree(tree: RadixTree) -> bool

Validate tree structure consistency
Checks parent-child relationships and reachability

remove_leaves(tree: RadixTree, visit_threshold: int = 1) -> None

Remove leaf nodes visited N times or less
Parameters:
- tree: RadixTree to prune
- visit_threshold: Minimum visit count to keep

merge_unary_chains(tree: RadixTree) -> None

Merge unary chains (nodes with single children) into compressed edges
Modifies tree in-place

compute_transition_cdfs(tree: RadixTree) -> dict[int, np.ndarray]

Compute cumulative distribution functions for outgoing transitions
Returns: Dictionary mapping node IDs to CDF arrays

get_tree_stats(tree: RadixTree) -> dict[str, Any]

Get comprehensive statistics about tree structure
Returns: Dictionary with tree metrics (includes RadixTreeStats fields plus ‘internal_nodes’ and ‘branching_factor’)

`aiperf.dataset.synthesis.rolling_hasher`

Utility functions for converting between texts and hash IDs.

texts_to_hashes(tokenizer: Tokenizer, texts: list[str], block_size: int = 512) -> list[list[int]]

Convert a list of texts to hash ID sequences
Tokenizes texts, splits into blocks, and generates consecutive hash IDs
Parameters:
- tokenizer: Tokenizer for encoding texts
- texts: List of input text strings
- block_size: Number of tokens per block
Returns: List of hash ID sequences, one per input text

hashes_to_texts(prompt_generator: PromptGenerator, hash_ids_list: list[list[int]], input_lengths: list[int], block_size: int = 512) -> list[str]

Convert hash ID sequences back to text strings
Uses the PromptGenerator’s cache to ensure the same hash ID always produces the same token block
Parameters:
- prompt_generator: PromptGenerator instance for generating text from hash_ids
- hash_ids_list: List of hash ID sequences
- input_lengths: Target input lengths (in tokens) for each sequence
- block_size: Number of tokens per block
Returns: List of text strings, one per hash ID sequence
Raises: ValueError if len(hash_ids) * block_size < input_length for any sequence

CLI Commands

`aiperf analyze-trace`

Analyze a mooncake trace file for ISL/OSL distributions and cache hit rates.

Usage:

$ aiperf analyze-trace INPUT_FILE [OPTIONS]

Arguments:

INPUT_FILE: Path to input mooncake trace JSONL file

Options:

--block-size INT (default: 512) - KV cache block size
--output-file PATH - Optional output path for analysis report (JSON)

Example:

$ aiperf analyze-trace traces/prod.jsonl --output-file analysis.json

Configuration

Synthesis parameters can be configured via CLI or programmatically.

Via CLI Options (with aiperf profile):

Synthesis is applied automatically when running aiperf profile with mooncake traces and synthesis parameters:

$ aiperf profile \
>     --input-file traces/production.jsonl \
>     --custom-dataset-type mooncake_trace \
>     --synthesis-speedup-ratio 2.0 \
>     --synthesis-prefix-len-multiplier 1.5 \
>     --synthesis-output-len-multiplier 2.0 \
>     --synthesis-max-isl 4096 \
>     --model Qwen/Qwen3-0.6B \
>     --endpoint-type chat

Via SynthesisConfig:

1 from aiperf.config.flags._input import SynthesisConfig
2 
3 config = SynthesisConfig(
4     speedup_ratio=2.0,
5     prefix_len_multiplier=1.5,
6     output_len_multiplier=2.0,
7     max_isl=4096,
8 )
9 
10 # Check if synthesis would be triggered
11 if config.should_synthesize():
12     print("Synthesis will be applied")

Direct Instantiation:

1 from aiperf.dataset.synthesis import Synthesizer
2 from aiperf.dataset.synthesis.models import SynthesisParams
3 
4 params = SynthesisParams(
5     speedup_ratio=2.0,
6     prefix_len_multiplier=1.5,
7     output_len_multiplier=2.0,
8     max_isl=4096,
9 )
10 
11 synthesizer = Synthesizer(params=params)
12 synthetic_traces = synthesizer.synthesize_from_file("input.jsonl")

Integration with AIPerf Benchmark

Synthesis is applied automatically during benchmark when using mooncake traces with synthesis parameters:

$ # Run benchmark with synthesis applied in-memory
$ aiperf profile \
>     --input-file production.jsonl \
>     --custom-dataset-type mooncake_trace \
>     --synthesis-speedup-ratio 2.0 \
>     --synthesis-prefix-len-multiplier 1.5 \
>     --synthesis-output-len-multiplier 2.0 \
>     --model Qwen/Qwen3-0.6B \
>     --endpoint-type chat

Synthesis is triggered automatically when a synthesis transformation parameter differs from its default value. --synthesis-max-isl and --synthesis-max-osl are filters/caps and do not trigger synthesis by themselves.

Performance Considerations

Memory: Radix tree construction can be memory-intensive for large traces (>100k requests)
Time: Analysis is O(n) where n is number of requests
Synthesis: O(n) to generate synthetic traces from analyzed data
Optimization: Use remove_leaves() to prune infrequent paths for memory savings

Error Handling

All components use standard Python exceptions:

1 try:
2     stats = analyzer.analyze_file("nonexistent.jsonl")
3 except FileNotFoundError:
4     print("Trace file not found")
5 except ValueError as e:
6     print(f"Invalid trace format: {e}")

Module: aiperf.dataset.synthesis

Classes

RollingHasher

RadixTree

RadixNode

EmpiricalSampler

PrefixAnalyzer

Synthesizer

Data Models

AnalysisStats

MetricStats

SynthesisParams

RadixTreeStats

EmpiricalSamplerStats

Utility Functions

aiperf.dataset.synthesis.graph_utils

aiperf.dataset.synthesis.rolling_hasher

CLI Commands

aiperf analyze-trace

Configuration

Integration with AIPerf Benchmark

Performance Considerations

Error Handling

See Also

Module: `aiperf.dataset.synthesis`

`RollingHasher`

`RadixTree`

`RadixNode`

`EmpiricalSampler`

`PrefixAnalyzer`

`Synthesizer`

`AnalysisStats`

`MetricStats`

`SynthesisParams`

`RadixTreeStats`

`EmpiricalSamplerStats`

`aiperf.dataset.synthesis.graph_utils`

`aiperf.dataset.synthesis.rolling_hasher`

`aiperf analyze-trace`