Profile with VisionArena Dataset
AIPerf supports benchmarking using the VisionArena dataset, a collection of real-world conversations between users and vision language models gathered from Chatbot Arena. Each sample contains a real user image and question, covering tasks like captioning, OCR, diagram interpretation, and visual reasoning.
This guide covers profiling OpenAI-compatible vision language models using the VisionArena public dataset.
Note: VisionArena requires HuggingFace authentication. Set your
HF_TOKENenvironment variable before running.
Start a vLLM Server
Launch a vLLM server with a vision language model:
Verify the server is ready:
Profile with VisionArena Dataset
Sample Output (Successful Run):
Higher input sequence length compared to text-only datasets is expected — each request includes an encoded image alongside the question text.