Benchmark Datasets

This document describes datasets that AIPerf can use to generate stimulus. Additional support is under development, so check back often.

Dataset Options

| Dataset | Support | Data Source |
| --- | --- | --- |
| Synthetic Text | | Synthetically generated text prompts pulled from Shakespeare |
| Synthetic Audio | | Synthetically generated audio samples |
| Synthetic Images | | Synthetically generated image samples |
| Custom Data | | `--input-file your_file.jsonl --custom-dataset-type single_turn` |
| Mooncake | | Mooncake trace file `--input-file your_trace_file.jsonl --custom-dataset-type mooncake_trace` |
| ShareGPT | | Conversations from `--public-dataset sharegpt` |
| Agentic Code | | Synthetic multi-turn coding-agent traces with shared prompt layers, repository context, and cache-aware turn growth. Generated via `aiperf synthesize agentic-code` and replayed as a Mooncake trace. |
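
For the Custom Data option, the input is a JSONL file, meaning one JSON object per line. The sketch below shows one way such a file could be produced. The helper name `write_single_turn_jsonl` is hypothetical, and the `"text"` key is an assumption about the per-line schema, not something this page specifies. Confirm the fields that `single_turn` expects before relying on it.

```python
import json
from pathlib import Path


def write_single_turn_jsonl(prompts, path):
    """Write one JSON object per line, one prompt per object."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for prompt in prompts:
            # ASSUMPTION: the "text" key is illustrative only; check the
            # AIPerf documentation for the fields single_turn expects.
            f.write(json.dumps({"text": prompt}, ensure_ascii=False) + "\n")
    return path


if __name__ == "__main__":
    out = write_single_turn_jsonl(
        ["What is deep learning?", "Summarize the plot of Hamlet."],
        "your_file.jsonl",
    )
    print(f"wrote {out}")
```

The resulting file would then be passed with `--input-file your_file.jsonl --custom-dataset-type single_turn`, as listed in the table.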