# Agentic Code Dataset Generator
The Agentic Code dataset generator creates synthetic multi-turn coding-agent traces for long-context and KV-cache benchmarking. It models shared prompt layers, session-specific repository context, incremental conversation growth, inter-turn delays, resets, and restart continuations.
The generator writes Mooncake trace JSONL, so the output can be replayed with the existing `mooncake_trace` custom dataset loader.
## Prefix Layers
Agentic Code traces divide each session’s prompt into cache-reuse layers:
- L1: global tools and system prompt. These blocks are identical across all sessions and model globally reusable KV cache.
- L1.5: group-shared repository instructions and context. These blocks are shared by sessions in the same group, but differ across groups.
- L2: session-specific starting context, such as initially opened files. These blocks are unique to a session at turn 0.
- L3: conversation history added after turn 0. This layer grows as the session continues and is unique to that session.
Probabilistic resets and forced retires end a session; the next primary session gets fresh L2 and L3 blocks while still reusing any shared L1 and L1.5 blocks. Restart continuations are different: they split one logical run into Session A and Session B, and Session B carries the accumulated context and hash IDs from Session A so cache reuse is preserved across the split.
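As a rough sketch of how these layers map to cache-reusable blocks, the snippet below assigns hash IDs per layer. The block size and token counts are illustrative, and `block_ids` is a hypothetical helper, not the generator's actual API:

```python
BLOCK_SIZE = 512

def block_ids(start: int, num_tokens: int) -> list[int]:
    """One hash ID per BLOCK_SIZE-token block (illustrative)."""
    return list(range(start, start + num_tokens // BLOCK_SIZE))

# L1: identical for every session, so every session emits the same IDs.
l1 = block_ids(0, 32_000)

# L1.5: shared by all sessions in one group; each group gets its own range.
l1_5_group_a = block_ids(1_000_000, 8_192)

# L2: unique per session at turn 0.
l2_session_1 = block_ids(2_000_000, 15_360)

# A session's turn-0 hash IDs are the concatenation of its layers;
# two sessions in the same group share the L1 + L1.5 portion.
session_1_turn0 = l1 + l1_5_group_a + l2_session_1
```

After a reset, a fresh session would reuse `l1` (and its group's L1.5 range) verbatim but sample new L2 and L3 ID ranges.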
## Turns, Resets, and Restarts
The generator has two turn-management modes.
### Reset-Driven Mode
Reset-driven mode is the default. The generator does not choose a fixed turn count up front. Instead, each session grows turn by turn until one of the end conditions fires.
Turn construction works as follows:
- Turn 0 samples the initial context: L1 + L1.5 + sampled L2.
- Turn 0 has `delay_ms = 0` and `timestamp_ms = 0`.
- Later turns sample an inter-turn delay from the agentic/human delay mixture.
- Later turns sample `new_tokens_per_turn`.
- The cumulative in-memory input length is `previous_input + previous_output + new_tokens`.
- Accepted turns sample `generation_length` for output tokens and extend the session's L3 hash IDs.
The JSONL output stores incremental turn input in `input_length`, even though the in-memory `SynthesizedTurn.input_length` is cumulative. This is the Mooncake trace format expected by AIPerf replay.
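The cumulative-to-incremental relationship can be sketched as follows; the token counts are made up for illustration:

```python
# In-memory turn lengths are cumulative; JSONL rows store only the
# incremental input added at each turn.
cumulative_inputs = [47000, 54200, 61900]   # illustrative in-memory values
outputs = [900, 1100, 1000]

jsonl_input_lengths = []
prev_input, prev_output = 0, 0
for cum, out in zip(cumulative_inputs, outputs):
    # incremental = cumulative - (previous input + previous output)
    jsonl_input_lengths.append(cum - prev_input - prev_output)
    prev_input, prev_output = cum, out

print(jsonl_input_lengths)  # [47000, 6300, 6600]
```

Note how turn 0's JSONL `input_length` equals the full initial context, while later rows carry only the newly added tokens.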
Reset-driven sessions can end in these ways:
- Forced retire: the next candidate turn would reach or exceed `max_prompt_tokens`. The overflowing turn is not added.
- Probabilistic reset: after the context-limit check, the generator applies `p = base_probability * (1 + (context_scaling - 1) * input_length / max_prompt_tokens)`. If the draw succeeds, the session ends before adding that candidate turn.
- Restart split: if restart injection is enabled for that primary session, the session splits at a sampled turn index from `restart_turn_range`.
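The reset probability formula can be written out directly; the function name and parameter values below are illustrative, not the generator's API:

```python
def reset_probability(base_probability: float,
                      context_scaling: float,
                      input_length: int,
                      max_prompt_tokens: int) -> float:
    """Reset probability scaled by context utilization (formula above)."""
    utilization = input_length / max_prompt_tokens
    return base_probability * (1 + (context_scaling - 1) * utilization)

# An empty context uses the base probability; a full context scales it
# up to base_probability * context_scaling.
print(reset_probability(0.02, 5.0, 0, 167_000))        # 0.02
print(reset_probability(0.02, 5.0, 167_000, 167_000))  # base * context_scaling
```

With `context_scaling > 1`, long sessions therefore become progressively more likely to reset as they approach the context limit.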
Restart splits are controlled by `restart_initial_probability` and `restart_turn_range`. The restart probability decays linearly to zero over the first 75% of primary sessions. When a split occurs:
- Session A ends with `restart_split`.
- Session B gets a new `session_id`, keeps the same `group_id`, and is marked with `is_restart` on its first JSONL row.
- Session B starts from Session A's accumulated input/output context and carries forward the same hash IDs.
- Session B is inserted later in the generated session order so it does not immediately overlap with Session A in the same concurrency window.
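A sketch of the restart mechanics described above, assuming hypothetical helper names and a plain-dict session representation:

```python
def restart_probability(initial_p: float, session_index: int,
                        num_primary: int) -> float:
    """Linear decay to zero over the first 75% of primary sessions (sketch)."""
    cutoff = 0.75 * num_primary
    if session_index >= cutoff:
        return 0.0
    return initial_p * (1 - session_index / cutoff)

def split_session(session: dict, split_turn: int) -> tuple[dict, dict]:
    """Split one logical run into Session A and Session B (illustrative)."""
    session_a = {**session, "end_reason": "restart_split",
                 "split_turn": split_turn}
    session_b = {
        "session_id": session["session_id"] + "-restart",  # new id (assumed form)
        "group_id": session["group_id"],                   # same group
        "is_restart": True,
        # Session B carries A's accumulated context and hash IDs, so the
        # replayed requests still hit the same KV-cache blocks.
        "hash_ids": list(session["hash_ids"]),
    }
    return session_a, session_b
```

The carried-forward `hash_ids` are what preserve cache reuse across the split; only the session identifier changes.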
### Explicit Turn-Count Mode
If the config sets `turns`, the generator switches to explicit turn-count mode. In this mode it samples a target number of turns from the `turns` distribution and attempts to build a session with exactly that many turns.
Explicit turn-count mode cannot be combined with `reset` or `restart_initial_probability`; config validation rejects that combination. A session that reaches the sampled target ends with `target_turn_count`.
If the sampled session would hit `max_prompt_tokens` before reaching the target:
- With `allow_truncation: false`, the generator retries the whole session up to `max_session_attempts`, then raises an error if it still cannot fit.
- With `allow_truncation: true`, the generator returns the partial session and marks it as `forced_retire`.
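The retry/truncation policy can be sketched like this; all names here are hypothetical, not the generator's actual API:

```python
class SessionTooLongError(RuntimeError):
    """Raised when a target turn count cannot fit (name assumed)."""

def build_with_target(build_session, target_turns: int,
                      allow_truncation: bool, max_session_attempts: int):
    """Sketch of explicit turn-count mode's retry/truncation policy."""
    for _ in range(max_session_attempts):
        session, end_reason = build_session(target_turns)
        if end_reason == "target_turn_count":
            return session  # hit the sampled target exactly
        if allow_truncation:
            session["end_reason"] = "forced_retire"
            return session  # accept the partial session
        # otherwise retry the whole session with fresh samples
    raise SessionTooLongError(
        f"could not fit {target_turns} turns in {max_session_attempts} attempts")
```

`build_session` stands in for one full session-construction attempt; in the real generator each attempt re-samples turn inputs and outputs.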
## Generate a Dataset
Create a dataset with the built-in default configuration:
Each run creates a timestamped directory:
The directory contains:
- `dataset.jsonl`: Mooncake-compatible trace rows.
- `manifest.json`: seed, session count, config name, and generation parameters.
- `quality.json`: target-vs-observed distribution statistics.
- `report.html`: summary dashboard for generated sessions.
- `cache_explorer.html`: KV block reuse inspection view.
- `simulation.html`: browser-based KV cache pressure simulation.
`synthesize agentic-code` validates the generated `dataset.jsonl` before it prints the run summary. You can also validate a saved or edited trace directly:
## Replay With AIPerf
Use the generated `dataset.jsonl` as a Mooncake trace:
For longer runs, use the same generated trace with the usual Mooncake replay controls:
## Dataset Format
`dataset.jsonl` contains one JSON object per request turn:
Important fields:
- `session_id`: logical conversation identifier.
- `input_length`: new input tokens for this turn. Turn 0 includes the initial cached prefix; later turns contain only incremental tokens.
- `output_length`: generated output tokens for the turn.
- `hash_ids`: KV-cache block IDs for the new input tokens.
- `timestamp`: absolute start time in milliseconds for turn 0.
- `delay`: delay in milliseconds before a later turn in the same session.
- `group_id`: shared-prefix group, emitted on turn 0.
- `is_restart`: present on turn 0 when the session continues from an earlier split.
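An illustrative pair of rows built in Python; the field values are invented, and only the field names follow the list above:

```python
import json

# Turn 0: absolute timestamp, full initial prefix, group membership.
turn0 = {
    "session_id": "sess-0001",
    "input_length": 47_000,   # includes the initial cached prefix
    "output_length": 900,
    "hash_ids": [0, 1, 2],    # truncated for readability
    "timestamp": 0,           # absolute start time, turn 0 only
    "group_id": 0,
}

# A later turn: incremental input only, delay instead of timestamp.
turn1 = {
    "session_id": "sess-0001",
    "input_length": 6_300,    # incremental tokens only
    "output_length": 1_100,
    "hash_ids": [93, 94],
    "delay": 12_000,          # ms since the previous turn
}

print(json.dumps(turn0))
print(json.dumps(turn1))
```

Each row is emitted as one line of JSONL; rows of the same session share a `session_id` and their `hash_ids` extend monotonically as L3 grows.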
## Configuration
Pass a bundled config name, a config JSON path, or a prior run manifest.
Currently, the only bundled runnable config is `default`.
The default config models long coding-agent sessions with:
- `max_prompt_tokens`: `167000`.
- `block_size`: `512` tokens.
- A `32000`-token global L1 prefix shared by all sessions.
- No L1.5 group-shared prefix by default (`layer1_5_tokens: 0`, `num_groups: 1`).
- Session-specific initial context sampled around a `15000`-token mean.
- New turn input sampled around a `6000`-token mean, capped at `10000`.
- Output length sampled around a `1000`-token mean, capped at `1500`.
- A small reset probability that grows with context utilization.
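For orientation, here is a hypothetical flat config fragment using only parameter names that appear on this page; the key layout and the `restart_turn_range` value shape are assumptions, and the authoritative schema is the generated `spec.json`:

```json
{
  "max_prompt_tokens": 167000,
  "block_size": 512,
  "layer1_5_tokens": 0,
  "num_groups": 1,
  "restart_initial_probability": 0.1,
  "restart_turn_range": [4, 12]
}
```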
Use `--max-isl` and `--max-osl` for quick sequence-length overrides:
The config schema is generated at `src/aiperf/dataset/agentic_code_gen/configs/spec.json`.
## Related Tutorials
- Trace Benchmarking - deterministic trace replay.
- Prefix Synthesis - KV cache testing with shared prefixes.
- Fixed Schedule - timestamp-based execution.
- Multi-Turn Conversations - session replay and conversation state.