Profile with AIMO Dataset
AIPerf supports benchmarking using AIMO math reasoning datasets, which contain competition mathematics problems requiring chain-of-thought reasoning. These datasets are useful for measuring model throughput and latency under long-context, reasoning-heavy workloads.
Four variants are available:
This guide covers profiling OpenAI-compatible chat completions endpoints using the AIMO public datasets.
Start a vLLM Server
Launch a vLLM server with a chat model:
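For example, the server can be started as follows. The model name and port are illustrative, not requirements; any chat-capable model served through vLLM's OpenAI-compatible API works:

```shell
# Serve a chat model with an OpenAI-compatible API on port 8000
# (Qwen/Qwen2.5-7B-Instruct is an example; substitute your model)
vllm serve Qwen/Qwen2.5-7B-Instruct --port 8000
```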
Verify the server is ready:
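One way to check readiness, assuming the port used above, is to hit the server's health and model-listing endpoints:

```shell
# Health endpoint returns HTTP 200 once the server is up
curl -s http://localhost:8000/health

# Listing served models confirms the OpenAI-compatible API is responding
curl -s http://localhost:8000/v1/models
```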
Profile with AIMO Dataset
AIPerf loads the AIMO dataset from HuggingFace and uses each problem as a single-turn prompt.
AIMO problems elicit long chain-of-thought responses. Use --prompt-output-tokens-mean to cap output length and reduce benchmark duration:
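A minimal invocation might look like the sketch below. Aside from --prompt-output-tokens-mean, which the text above names, the flag names, model, URL, and concurrency level are assumptions; check them against the AIPerf CLI reference for your installed version:

```shell
# Hypothetical AIPerf run against the local vLLM server.
# --public-dataset is an assumed selector for the AIMO dataset;
# model, URL, and concurrency values are illustrative.
aiperf profile \
  --model Qwen/Qwen2.5-7B-Instruct \
  --url http://localhost:8000 \
  --endpoint-type chat \
  --public-dataset aimo \
  --prompt-output-tokens-mean 1024 \
  --concurrency 8
```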
Sample Output (Successful Run):
Expect higher request latency than with conversational datasets: AIMO problems require extended chain-of-thought reasoning and produce significantly longer outputs.