Profile with MMStar Dataset


AIPerf can benchmark vision language models with the MMStar dataset, a multimodal visual question answering benchmark that tests fine-grained visual perception and reasoning. Each sample pairs an image with a question that cannot be answered without understanding the image.

This guide covers profiling vision language models served through an OpenAI-compatible API, using the MMStar public dataset.


Start a vLLM Server

Launch a vLLM server with a vision language model:

```bash
docker pull vllm/vllm-openai:latest
docker run --gpus all -p 8000:8000 vllm/vllm-openai:latest \
  --model Qwen/Qwen2-VL-2B-Instruct
```
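Pulling the image and loading the model weights can take several minutes, so a script that launches the container and then benchmarks right away may need to wait first. The following is a minimal sketch, not part of AIPerf: `wait_for_server` is a hypothetical helper, and it assumes the vLLM OpenAI-compatible server exposes a `/health` route that answers with a 2xx status once the server is up (polling `/v1/models` is an alternative).

```bash
# Hypothetical helper: poll a URL until it returns a 2xx status or time runs out.
# Assumes the server's /health route responds successfully once it is ready.
wait_for_server() {
  local url="${1:-http://localhost:8000/health}"
  local timeout_s="${2:-600}"   # give up after this many seconds
  local interval_s="${3:-5}"    # seconds to sleep between attempts
  local deadline=$(( SECONDS + timeout_s ))

  while (( SECONDS < deadline )); do
    # -f: non-zero exit on HTTP 4xx/5xx; -s: no progress output;
    # --max-time: cap each attempt so a hung connection cannot stall the loop.
    if curl -fs --max-time 5 -o /dev/null "$url"; then
      echo "ready: $url"
      return 0
    fi
    sleep "$interval_s"
  done
  echo "timed out after ${timeout_s}s waiting for $url" >&2
  return 1
}
```

For example, `wait_for_server http://localhost:8000/health 900 5` waits up to 15 minutes, checking every 5 seconds, and returns non-zero on timeout, so it can gate the benchmark command with `&&`.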

Verify the server is ready:

```bash
curl -s localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"Qwen/Qwen2-VL-2B-Instruct","messages":[{"role":"user","content":"test"}],"max_tokens":1}'
```
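The check above is text-only: it shows the server is up, but not that it accepts images. A request in the OpenAI chat format with an `image_url` content part carrying a base64 `data:` URL, next to a `text` part, exercises the vision path as well. This is a sketch under stated assumptions: `build_vision_payload` and `example.png` are hypothetical, the MIME type is hard-coded to PNG, the question is assumed to be a single line, and the payload is not necessarily the exact one AIPerf builds for MMStar rows.

```bash
# Hypothetical helper: print a chat-completions JSON body with one image and one question.
# Assumptions: the image is PNG; the question is a single line (only \ and " are escaped).
build_vision_payload() {
  local model="$1" image_path="$2" question="$3"
  local b64 escaped
  # Base64-encode the image and drop newlines so the data URL stays on one line.
  b64=$(base64 < "$image_path" | tr -d '\n')
  # Minimal JSON string escaping: backslashes first, then double quotes.
  escaped=$(printf '%s' "$question" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"model":"%s","max_tokens":16,"messages":[{"role":"user","content":[' "$model"
  printf '{"type":"image_url","image_url":{"url":"data:image/png;base64,%s"}},' "$b64"
  printf '{"type":"text","text":"%s"}]}]}\n' "$escaped"
}

# Example (needs the running server and a local PNG file):
#   build_vision_payload Qwen/Qwen2-VL-2B-Instruct example.png "What is shown in this image?" \
#     | curl -s localhost:8000/v1/chat/completions \
#         -H "Content-Type: application/json" -d @-
```

Reading the body from standard input with `-d @-` avoids putting a large base64 string on the command line.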

Profile with MMStar Dataset

AIPerf loads the MMStar dataset from Hugging Face, attaches each row's image to its question, and sends each image-question pair as a single-turn vision request.

```bash
aiperf profile \
  --model Qwen/Qwen2-VL-2B-Instruct \
  --endpoint-type chat \
  --streaming \
  --url localhost:8000 \
  --public-dataset mmstar \
  --request-count 10 \
  --concurrency 4
```
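`--request-count 10` and `--concurrency 4` together shape how long the run takes. Assuming concurrency mode behaves as a closed loop (at most four requests in flight, with a new one issued as soon as one finishes), at least one of the four slots must serve ⌈10 / 4⌉ = 3 requests back to back, so with similar per-request latencies the run lasts roughly three request latencies. The helpers below are hypothetical and only make that arithmetic explicit; real runs deviate because latency varies per request and tends to grow with concurrency.

```bash
# Hypothetical helper: ceil(requests / concurrency), the minimum number of
# requests that at least one concurrency slot serves back to back.
min_rounds() {
  local requests="$1" concurrency="$2"
  echo $(( (requests + concurrency - 1) / concurrency ))
}

# Hypothetical helper: rough wall-clock estimate in seconds if every request
# took about latency_ms. Illustrative only; real latencies are not uniform.
estimate_duration_s() {
  local requests="$1" concurrency="$2" latency_ms="$3"
  local rounds
  rounds=$(min_rounds "$requests" "$concurrency")
  awk -v r="$rounds" -v l="$latency_ms" 'BEGIN { printf "%.1f\n", r * l / 1000 }'
}
```

For example, `min_rounds 10 4` prints `3`, and `estimate_duration_s 10 4 2000` prints `6.0`.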

Sample Output (Successful Run):

NVIDIA AIPerf | LLM Metrics

| Metric | avg | min | max | p99 | p90 | p50 | std |
|---|---:|---:|---:|---:|---:|---:|---:|
| Time to First Token (ms) | 36,158.25 | 10,145.84 | 68,830.92 | 68,830.92 | 68,830.88 | 28,015.68 | 22,647.12 |
| Time to Second Token (ms) | 8,736.44 | 89.13 | 53,118.68 | 49,851.97 | 20,451.58 | 113.97 | 16,181.79 |
| Time to First Output Token (ms) | 36,158.25 | 10,145.84 | 68,830.92 | 68,830.92 | 68,830.88 | 28,015.68 | 22,647.12 |
| Request Latency (ms) | 70,665.03 | 10,972.01 | 164,879.38 | 158,399.81 | 100,083.68 | 65,599.64 | 40,227.69 |
| Inter Token Latency (ms) | 1,445.22 | 82.62 | 4,143.54 | 3,985.99 | 2,568.02 | 1,356.06 | 1,277.29 |
| Output Token Throughput Per User (tokens/sec/use… | 3.50 | 0.24 | 12.10 | 11.81 | 9.18 | 0.98 | 4.30 |
| Output Sequence Length (tokens) | 26.40 | 7.00 | 118.00 | 110.71 | 45.10 | 15.00 | 31.52 |
| Input Sequence Length (tokens) | 41.60 | 25.00 | 59.00 | 58.46 | 53.60 | 41.50 | 11.32 |
| Output Token Throughput (tokens/sec) | 1.47 | N/A | N/A | N/A | N/A | N/A | N/A |
| Image Throughput (images/sec) | 0.02 | 0.01 | 0.09 | 0.09 | 0.03 | 0.02 | 0.02 |
| Image Latency (ms/image) | 70,665.03 | 10,972.01 | 164,879.38 | 158,399.81 | 100,083.68 | 65,599.64 | 40,227.69 |
| Request Throughput (requests/sec) | 0.06 | N/A | N/A | N/A | N/A | N/A | N/A |
| Request Count (requests) | 10.00 | N/A | N/A | N/A | N/A | N/A | N/A |
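Several of these rows are related by simple arithmetic, which helps when sanity-checking a run. Assuming the usual definitions, per-request inter-token latency is (request latency - TTFT) / (output tokens - 1), and per-user output throughput is 1000 / ITL when ITL is in milliseconds. The extremes in the sample output are consistent with this (1000 / 82.62 ≈ 12.10 and 1000 / 4,143.54 ≈ 0.24); the averages do not map the same way, because the mean of reciprocals is not the reciprocal of the mean. The helper names below are hypothetical, and `approx_duration_s` further assumes output token throughput is total output tokens divided by benchmark duration.

```bash
# Hypothetical helper: inter-token latency (ms) for one request, assuming
#   ITL = (request_latency_ms - ttft_ms) / (output_tokens - 1)
# Prints NaN when there are fewer than two output tokens.
itl_ms() {
  awk -v lat="$1" -v ttft="$2" -v n="$3" \
    'BEGIN { if (n < 2) print "NaN"; else printf "%.2f\n", (lat - ttft) / (n - 1) }'
}

# Hypothetical helper: per-user output throughput (tokens/sec) from ITL in ms.
per_user_tps() {
  awk -v itl="$1" 'BEGIN { printf "%.2f\n", 1000 / itl }'
}

# Hypothetical helper: approximate benchmark duration (seconds), assuming
#   duration ≈ (avg output tokens * request count) / output token throughput
approx_duration_s() {
  awk -v osl="$1" -v n="$2" -v tps="$3" 'BEGIN { printf "%.1f\n", osl * n / tps }'
}
```

Applied to the sample run, `approx_duration_s 26.40 10 1.47` prints `179.6`, which lines up with the reported request throughput: 10 requests over roughly 180 seconds is about 0.056 requests/sec, shown rounded as 0.06.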