Profile Ranking Models with AIPerf
AIPerf supports benchmarking ranking and reranking models, including those served through Hugging Face Text Embeddings Inference (TEI) or Cohere Re-Rank APIs. These models take a query and one or more passages and return a similarity or relevance score for each passage.
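To make the query-and-passages shape concrete, here is a minimal sketch of the request bodies and score extraction for the two API styles. The field names (`query`, `texts`, `documents`, `results`, `relevance_score`) follow the publicly documented TEI `/rerank` and Cohere rerank formats as commonly described; they are assumptions here, not something this guide specifies, and the sample responses are invented placeholders rather than real server output.

```python
import json


def tei_rerank_body(query, passages):
    """Build a TEI-style rerank request body (assumed fields: query, texts)."""
    return {"query": query, "texts": list(passages)}


def cohere_rerank_body(model, query, passages):
    """Build a Cohere-style rerank request body (assumed fields: model, query, documents)."""
    return {"model": model, "query": query, "documents": list(passages)}


def tei_scores(response):
    """Map passage index -> score for a TEI-style response: a list of {index, score}."""
    return {item["index"]: item["score"] for item in response}


def cohere_scores(response):
    """Map passage index -> score for a Cohere-style response: {"results": [{index, relevance_score}]}."""
    return {item["index"]: item["relevance_score"] for item in response["results"]}


if __name__ == "__main__":
    passages = ["AI is the study of intelligent agents.", "Bananas are yellow."]
    print(json.dumps(tei_rerank_body("What is AI?", passages)))
    # "example-reranker" is a placeholder model name, not a real model ID.
    print(json.dumps(cohere_rerank_body("example-reranker", "What is AI?", passages)))

    # Placeholder responses, written by hand to show the parsing only.
    print(tei_scores([{"index": 0, "score": 0.9}, {"index": 1, "score": 0.1}]))
    print(cohere_scores({"results": [{"index": 0, "relevance_score": 0.9}]}))
```

In both styles the server returns one score per passage, keyed by the passage's position in the request, which is why the helpers index by `index` rather than by list order.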
Section 1. Profile Hugging Face TEI Re-Rank Models
Start a Hugging Face TEI Server
Launch a Hugging Face Text Embeddings Inference (TEI) container in re-ranker mode:
Profile using Synthetic Inputs
Run AIPerf using the following command:
Sample Output (Successful Run):
The rankings-specific token options cannot be combined with --prompt-input-tokens-mean or --prompt-input-tokens-stddev. To control token counts for rankings queries and passages, use the rankings-specific options instead.
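If AIPerf commands are generated from a script, a pre-flight check along these lines can catch the conflicting combination before launching a run. This is only a sketch: the two `--prompt-input-tokens-*` flags come from the note above, but the `--rankings-` prefix and the example flag `--rankings-query-tokens-mean` are hypothetical stand-ins for the rankings-specific options, whose exact names should be taken from `aiperf` help output.

```python
# Flags named in the note above.
PROMPT_TOKEN_FLAGS = {"--prompt-input-tokens-mean", "--prompt-input-tokens-stddev"}

# ASSUMPTION: rankings-specific options share this prefix. Hypothetical; verify
# against the actual AIPerf CLI before relying on it.
RANKINGS_PREFIX = "--rankings-"


def flag_name(arg):
    """Return the flag part of an argument, so '--x=1' and '--x' both give '--x'."""
    return arg.split("=", 1)[0]


def find_conflict(argv):
    """Return (prompt_flags, rankings_flags) if both kinds are present, else None."""
    names = {flag_name(a) for a in argv if a.startswith("--")}
    prompt = sorted(n for n in names if n in PROMPT_TOKEN_FLAGS)
    rankings = sorted(n for n in names if n.startswith(RANKINGS_PREFIX))
    if prompt and rankings:
        return prompt, rankings
    return None


if __name__ == "__main__":
    # '--rankings-query-tokens-mean' is a hypothetical flag used for illustration.
    args = ["--prompt-input-tokens-mean", "128", "--rankings-query-tokens-mean=32"]
    conflict = find_conflict(args)
    if conflict:
        print("Conflicting options:", conflict[0], "vs", conflict[1])
```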
Profile using Custom Inputs
Create a file named rankings.jsonl where each line represents a ranking request with a query and one or more passages.
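The file can be written by hand or generated. The sketch below generates it with Python, one JSON object per line. The record layout used here (a `texts` list holding a `query` entry and a `passages` entry, each with a `contents` list) is an assumption about the expected schema and should be checked against the AIPerf custom-input documentation; the helper names are illustrative only.

```python
import json
from pathlib import Path


def make_record(query, passages):
    """Build one ranking request: a single query plus one or more passages.

    ASSUMPTION: the 'texts' / 'name' / 'contents' layout is a guess at the
    schema AIPerf expects; adjust it to match the official format.
    """
    passages = list(passages)
    if not passages:
        raise ValueError("each ranking request needs at least one passage")
    return {
        "texts": [
            {"name": "query", "contents": [query]},
            {"name": "passages", "contents": passages},
        ]
    }


def write_jsonl(path, records):
    """Write one compact JSON object per line (the JSONL convention)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read the file back, skipping blank lines, to confirm every line parses."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    records = [
        make_record("What is AI?", ["AI is the study of intelligent agents.",
                                    "Bananas are yellow."]),
        make_record("What is a GPU?", ["A GPU is a parallel processor."]),
    ]
    write_jsonl(Path("rankings.jsonl"), records)
    print(len(read_jsonl(Path("rankings.jsonl"))), "requests written")
```

Reading the file back after writing is a cheap way to confirm that no record spans multiple lines, which would break line-oriented parsing.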
Run AIPerf using the following command:
Section 2. Profile Cohere Re-Rank API
Start vLLM Server in Cohere Mode
Run vLLM with the --runner pooling flag to enable reranking behavior:
Profile using Synthetic Inputs
Run AIPerf using the following command:
Profile using Custom Inputs
Create a file named rankings.jsonl:
Run AIPerf using the following command: