---
sidebar-title: Profile with AIMO Dataset
---

# Profile with AIMO Dataset

AIPerf supports benchmarking with the AIMO math reasoning datasets, which contain competition
mathematics problems that require chain-of-thought reasoning. These datasets are useful for measuring
model throughput and latency under long-context, reasoning-heavy workloads.

Four variants are available:

| Dataset | `--public-dataset` | Description |
|---|---|---|
| NuminaMath-TIR | `aimo` | Tool-integrated reasoning problems |
| NuminaMath-CoT | `aimo_numina_cot` | Chain-of-thought reasoning problems (~859k problems) |
| NuminaMath-1.5 | `aimo_numina_1_5` | Latest NuminaMath release |
| AIME Validation | `aimo_aime` | 90 AIME competition problems |

This guide covers profiling OpenAI-compatible chat completions endpoints using the AIMO public
datasets.

---

## Start a vLLM Server

Launch a vLLM server with a chat model:

```bash
docker pull vllm/vllm-openai:latest
docker run --gpus all -p 8000:8000 vllm/vllm-openai:latest \
  --model Qwen/Qwen3-0.6B
```

Verify the server is ready:

```bash
curl -s localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"Qwen/Qwen3-0.6B","messages":[{"role":"user","content":"test"}],"max_tokens":1}'
```
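If `jq` is available, the assistant reply can be pulled out of the verification response for a quicker visual check. This is a sketch, not part of the AIPerf tooling; the `// empty` fallback simply keeps the command quiet if the server is not up yet:

```shell
# Extract the assistant reply from the smoke-test response (requires jq).
# Assumes the vLLM server from the previous step is listening on localhost:8000.
RESPONSE=$(curl -s localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"Qwen/Qwen3-0.6B","messages":[{"role":"user","content":"test"}],"max_tokens":1}' || true)
echo "${RESPONSE:-null}" | jq -r '.choices[0].message.content // empty'
```

Any non-empty output confirms the endpoint is serving chat completions.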

---

## Profile with AIMO Dataset

AIPerf loads the AIMO dataset from HuggingFace and uses each problem as a single-turn prompt.

AIMO problems elicit long chain-of-thought responses. Use `--prompt-output-tokens-mean` to cap
output length and reduce benchmark duration:

```bash
aiperf profile \
    --model Qwen/Qwen3-0.6B \
    --endpoint-type chat \
    --streaming \
    --url localhost:8000 \
    --public-dataset aimo \
    --request-count 10 \
    --concurrency 4 \
    --prompt-output-tokens-mean 512
```

**Sample Output (Successful Run):**

```

                                        NVIDIA AIPerf | LLM Metrics
┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┓
┃               Metric ┃       avg ┃       min ┃       max ┃       p99 ┃       p90 ┃       p50 ┃      std ┃
┡━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━┩
│  Time to First Token │  1,509.04 │    687.93 │  1,879.53 │  1,879.53 │  1,879.51 │  1,563.66 │   430.29 │
│                 (ms) │           │           │           │           │           │           │          │
│ Time to Second Token │     56.44 │     32.27 │     86.48 │     84.59 │     67.58 │     53.16 │    14.62 │
│                 (ms) │           │           │           │           │           │           │          │
│ Request Latency (ms) │ 32,823.02 │ 18,943.44 │ 38,431.19 │ 38,431.18 │ 38,431.07 │ 34,163.15 │ 7,192.79 │
│  Inter Token Latency │     61.40 │     35.80 │     71.72 │     71.72 │     71.67 │     63.92 │    13.26 │
│                 (ms) │           │           │           │           │           │           │          │
│         Output Token │     17.42 │     13.94 │     27.94 │     27.94 │     27.93 │     15.64 │     5.31 │
│  Throughput Per User │           │           │           │           │           │           │          │
│    (tokens/sec/user) │           │           │           │           │           │           │          │
│      Output Sequence │    511.00 │    511.00 │    511.00 │    511.00 │    511.00 │    511.00 │     0.00 │
│      Length (tokens) │           │           │           │           │           │           │          │
│       Input Sequence │     56.90 │     31.00 │     84.00 │     83.46 │     78.60 │     53.00 │    17.59 │
│      Length (tokens) │           │           │           │           │           │           │          │
│         Output Token │     55.82 │       N/A │       N/A │       N/A │       N/A │       N/A │      N/A │
│           Throughput │           │           │           │           │           │           │          │
│         (tokens/sec) │           │           │           │           │           │           │          │
│   Request Throughput │      0.11 │       N/A │       N/A │       N/A │       N/A │       N/A │      N/A │
│       (requests/sec) │           │           │           │           │           │           │          │
│        Request Count │     10.00 │       N/A │       N/A │       N/A │       N/A │       N/A │      N/A │
│           (requests) │           │           │           │           │           │           │          │
└──────────────────────┴───────────┴───────────┴───────────┴───────────┴───────────┴───────────┴──────────┘
```
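As a rough cross-check, the aggregate numbers in this table are mutually consistent: output token throughput should approximate the per-request output length times the request throughput:

```shell
# Reported output sequence length (511 tokens/request) x request throughput
# (0.11 requests/sec) should land near the aggregate output token throughput
# reported above (55.82 tokens/sec).
awk 'BEGIN { printf "%.1f\n", 511 * 0.11 }'
```

The small gap comes from rounding in the reported request throughput.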


> Higher request latency compared to conversational datasets is expected — AIMO problems require
> extended chain-of-thought reasoning and produce significantly longer outputs.
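The remaining variants from the table above accept the same invocation. One way to compare them is a sketch like the following, which repeats the run once per `--public-dataset` value and stops on the first failure:

```shell
# Repeat the benchmark once per AIMO dataset variant (flag values are the
# ones listed in the variants table); stop the sweep if any run fails.
for DATASET in aimo aimo_numina_cot aimo_numina_1_5 aimo_aime; do
  aiperf profile \
      --model Qwen/Qwen3-0.6B \
      --endpoint-type chat \
      --streaming \
      --url localhost:8000 \
      --public-dataset "$DATASET" \
      --request-count 10 \
      --concurrency 4 \
      --prompt-output-tokens-mean 512 || break
done
```

Expect the `aimo_aime` run to show longer input sequences than the NuminaMath variants, since AIME problems tend to be wordier competition statements.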