Profile with ShareGPT Dataset

AIPerf supports benchmarking using the ShareGPT dataset, which contains real conversational data from user interactions.

This guide covers profiling OpenAI-compatible chat completions endpoints using the ShareGPT public dataset.

Start a vLLM Server

Launch a vLLM server with a chat model:

$ docker pull vllm/vllm-openai:latest
$ docker run --gpus all -p 8000:8000 vllm/vllm-openai:latest \
>   --model Qwen/Qwen3-0.6B

Verify the server is ready:

$ curl -s localhost:8000/v1/chat/completions \
>   -H "Content-Type: application/json" \
>   -d '{"model":"Qwen/Qwen3-0.6B","messages":[{"role":"user","content":"test"}],"max_tokens":1}'

AIPerf automatically downloads and caches the ShareGPT dataset from HuggingFace.

$ aiperf profile \
>     --model Qwen/Qwen3-0.6B \
>     --endpoint-type chat \
>     --streaming \
>     --url localhost:8000 \
>     --public-dataset sharegpt \
>     --request-count 20 \
>     --concurrency 4

Sample Output (Successful Run):

INFO     Starting AIPerf System
INFO     Downloading ShareGPT dataset from HuggingFace
INFO     Cached ShareGPT dataset loaded
INFO     AIPerf System is PROFILING
Profiling: 20/20 |████████████████████████| 100% [00:45<00:00]
INFO     Benchmark completed successfully
INFO     Results saved to: artifacts/Qwen_Qwen3-0.6B-chat-concurrency4/
            NVIDIA AIPerf | LLM Metrics
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┓
┃                      Metric ┃     avg ┃     min ┃     max ┃     p99 ┃     p50 ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━┩
│        Request Latency (ms) │ 1456.78 │ 1089.34 │ 1978.90 │ 1898.45 │ 1423.67 │
│    Time to First Token (ms) │  267.89 │  198.34 │  389.12 │  367.45 │  262.12 │
│    Inter Token Latency (ms) │   13.45 │   10.67 │   18.90 │   17.89 │   13.12 │
│ Output Token Count (tokens) │  187.00 │  142.00 │  245.00 │  239.00 │  184.00 │
│  Request Throughput (req/s) │    8.45 │       - │       - │       - │       - │
└─────────────────────────────┴─────────┴─────────┴─────────┴─────────┴─────────┘
JSON Export: artifacts/Qwen_Qwen3-0.6B-chat-concurrency4/profile_export_aiperf.json