Profile Embedding Models with AIPerf
AIPerf supports benchmarking embedding models that convert text into dense vector representations.
This guide covers profiling OpenAI-compatible embedding endpoints using vLLM.
Section 1. Profile vLLM Embedding Models
Start a vLLM Embedding Server
Launch a vLLM server with an embedding model:
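A minimal launch sketch, assuming vLLM is installed and using `BAAI/bge-small-en-v1.5` as a stand-in embedding model (substitute your own; the `--task embed` flag has changed names across vLLM releases, so confirm with `vllm serve --help`):

```shell
# Serve an embedding model on the default OpenAI-compatible port (8000).
# BAAI/bge-small-en-v1.5 is an example model; substitute your own.
vllm serve BAAI/bge-small-en-v1.5 \
  --task embed \
  --port 8000
```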
Verify the server is ready:
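vLLM exposes a `/health` endpoint on the serving port. A readiness check, assuming the server is on `localhost:8000`:

```shell
# Prints 200 once the server is accepting requests.
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8000/health

# Alternatively, confirm the model is registered:
curl -s http://localhost:8000/v1/models
```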
Profile with Synthetic Inputs
Run AIPerf against the embeddings endpoint using synthetic inputs:
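A sketch of the profiling command. The flag names shown (`--endpoint-type`, `--synthetic-input-tokens-mean`, and friends) are assumptions based on AIPerf's genai-perf-style CLI; verify them against `aiperf profile --help` before running:

```shell
# Profile the embeddings endpoint with synthetic text inputs.
aiperf profile \
  --model BAAI/bge-small-en-v1.5 \
  --url http://localhost:8000 \
  --endpoint-type embeddings \
  --synthetic-input-tokens-mean 128 \
  --synthetic-input-tokens-stddev 32 \
  --concurrency 8 \
  --request-count 100
```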
Sample Output (Successful Run):
Embeddings endpoints report metrics focused on request latency and throughput. Token-level metrics such as Time to First Token (TTFT) and Inter-Token Latency (ITL) are not reported, because an embedding request returns a single vector per input rather than a token stream.
Profile with Custom Input File
Create a JSONL embeddings input file:
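One common layout, assuming each line is a JSON object with a `text` field holding the string to embed (the exact schema AIPerf expects is an assumption here; check the AIPerf input-file documentation for your version):

```shell
# Write a small example input file; each line is one embedding request.
cat > embeddings_input.jsonl <<'EOF'
{"text": "The quick brown fox jumps over the lazy dog."}
{"text": "Machine learning models convert text into dense vectors."}
{"text": "Vector databases enable semantic search at scale."}
EOF
```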
Run AIPerf using the custom input file:
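A sketch, assuming the same genai-perf-style flag conventions as above and that `--input-file` points AIPerf at the custom dataset (confirm the flag name with `aiperf profile --help`):

```shell
# Profile the embeddings endpoint using the custom JSONL inputs.
aiperf profile \
  --model BAAI/bge-small-en-v1.5 \
  --url http://localhost:8000 \
  --endpoint-type embeddings \
  --input-file embeddings_input.jsonl \
  --concurrency 8
```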
Sample Output (Successful Run):
When using custom inputs, AIPerf sends your own text samples instead of synthetic data, so the input sequence lengths vary with the content of your file.