Benchmark Datasets

This document describes datasets that AIPerf can use to generate stimulus. Additional support is under development, so check back often.

Dataset Options

| Dataset | Support | Data Source |
| --- | --- | --- |
| Synthetic Text | | Synthetically generated text prompts pulled from Shakespeare |
| Synthetic Audio | | Synthetically generated audio samples |
| Synthetic Images | | Synthetically generated image samples |
| Custom Data | | `--input-file your_file.jsonl --custom-dataset-type single_turn` |
| Mooncake | | Mooncake trace file: `--input-file your_trace_file.jsonl --custom-dataset-type mooncake_trace` |
| ShareGPT | | Conversations from `--public-dataset sharegpt` |
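
The Custom Data option reads a JSONL file, that is, one JSON object per line. As a rough illustration, the sketch below writes such a file from a list of prompts. The helper name `write_single_turn_jsonl` is hypothetical, and the `"text"` field name is an assumption about the `single_turn` schema rather than something stated in this document, so check the AIPerf custom dataset documentation before relying on it.

```python
import json
from pathlib import Path


def write_single_turn_jsonl(prompts, path):
    """Write one JSON object per line to `path`.

    The "text" key is an assumed field name for single-turn prompts;
    it is not confirmed by this document.
    """
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for prompt in prompts:
            # One compact JSON object per line, as JSONL requires.
            f.write(json.dumps({"text": prompt}, ensure_ascii=False) + "\n")
    return path


if __name__ == "__main__":
    out = write_single_turn_jsonl(
        ["What is deep learning?", "Summarize Hamlet in one sentence."],
        "your_file.jsonl",
    )
    print(f"wrote {out}")
```

The resulting file could then be passed to AIPerf with the input-file and custom-dataset-type options listed for Custom Data, alongside whatever model and endpoint options the benchmark run requires.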