AIPerf supports benchmarking Vision Language Models (VLMs) that process both text and images.
This guide covers profiling vision models using OpenAI-compatible chat completions endpoints with vLLM.
Launch a vLLM server with a vision language model:
Verify the server is ready:
AIPerf can generate synthetic images for benchmarking. By default, images are generated as random noise at the requested dimensions — no on-disk assets required, and the pool is effectively unbounded so servers cannot dedupe on identical inputs. Pass --image-source assets to instead sample and resize the 4 bundled natural images (smaller payload bytes), or --image-source <path> to sample from your own directory.
Sample Output (Successful Run):
Create a JSONL file with text prompts and image URLs:
Run AIPerf using the custom input file:
Sample Output (Successful Run):