Replay pre-formatted multi-turn API payloads from AIPerf’s inputs.json file format.
Every AIPerf benchmark run produces an inputs.json artifact in the output directory. This file captures the exact API request payloads that were sent during the benchmark, organized by session. The inputs_json dataset type reads this file back and replays its payloads verbatim.
The file is a single JSON object with a top-level data array. Each element represents one session with an ordered list of API request payloads.
Each object inside payloads is sent directly to the server without modification. The loader does not inspect or validate payload contents.
After running any AIPerf benchmark, an inputs.json file is generated in the artifact directory. Replay it:
Raw payloads work with any endpoint type. The default chat endpoint provides structured response parsing (token counts, finish reasons). Use --endpoint-type raw only for non-standard APIs where no built-in endpoint matches.
--custom-dataset-type inputs_json is required when replaying AIPerf-generated inputs.json files because AIPerf writes them with pretty-printed formatting (multi-line JSON), which the line-based auto-detection cannot parse. Always specify the dataset type explicitly for reliability.
Run the same payloads against two different servers to compare performance:
Inputs JSON conversations use message_array_with_responses context mode by default. Each turn is sent exactly as written — AIPerf does not accumulate prior turns or inject server responses into subsequent requests.
This is the correct behavior because each payload already contains the complete message history for that point in the conversation.
Both inputs_json and raw_payload send payloads verbatim, but they differ in structure:
Choose inputs_json when you have a structured file with named sessions (especially from a prior AIPerf run). Choose raw_payload when you have flat JSONL logs or a directory of captured conversations.
--custom-dataset-type inputs_json when replaying AIPerf-generated files. Auto-detection uses line-based JSON parsing, which fails on pretty-printed (multi-line) JSON files.--concurrency.inputs.json — this is the file you can feed back for replay.