AIPerf supports benchmarking ranking and reranking models, including those served through Hugging Face Text Embeddings Inference (TEI) or Cohere Re-Rank APIs. These models take a query and one or more passages and return a similarity or relevance score for each passage.
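To make the query-and-passages shape concrete, here is a minimal Python sketch of a rerank request body and of how per-passage scores might be consumed. It assumes the TEI-style `/rerank` convention (a `query` string plus a `texts` list, answered with a list of `{"index", "score"}` objects). The helper names `tei_rerank_payload` and `top_passage` are hypothetical and are not part of AIPerf or TEI, and the scores are made up for illustration rather than taken from a real model.

```python
import json


def tei_rerank_payload(query, passages):
    """Build a JSON-serializable body in the TEI-style rerank shape (assumed)."""
    if not passages:
        raise ValueError("at least one passage is required")
    return {"query": query, "texts": list(passages)}


def top_passage(passages, results):
    """Return the passage whose result entry carries the highest score.

    `results` is assumed to be a list of {"index": int, "score": float} dicts,
    where `index` points back into `passages`.
    """
    best = max(results, key=lambda r: r["score"])
    return passages[best["index"]]


passages = ["AI is the simulation of human intelligence.", "Bananas are yellow."]
payload = tei_rerank_payload("What is AI?", passages)
print(json.dumps(payload))

# Illustrative scores only; a real server computes these.
example_results = [{"index": 1, "score": 0.02}, {"index": 0, "score": 0.97}]
print(top_passage(passages, example_results))
```

Other backends may use different field names (for example, a `documents` list instead of `texts`), so the payload builder would need adjusting per endpoint.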
Launch a Hugging Face Text Embeddings Inference (TEI) container in re-ranker mode:
Run AIPerf using the following command:
Sample Output (Successful Run):
The rankings-specific token options cannot be combined with --prompt-input-tokens-mean or --prompt-input-tokens-stddev. To control token counts for rankings queries and passages, use the rankings-specific options instead.
Create a file named rankings.jsonl where each line represents a ranking request with a query and one or more passages.
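If the dataset is generated programmatically, a small script along the following lines could write the file. This is a sketch only: the per-line layout used here (a `texts` list holding a `query` entry and a `passages` entry, each with a `contents` list) is an assumption about the expected schema and should be checked against the AIPerf custom dataset documentation. The function name `write_rankings_jsonl` and the sample data are hypothetical.

```python
import json
from pathlib import Path


def write_rankings_jsonl(path, requests):
    """Write one JSON object per line, one line per ranking request.

    `requests` is an iterable of (query, passages) pairs.
    The field layout below is an assumed schema, not a confirmed one.
    """
    lines = []
    for query, passages in requests:
        if not passages:
            raise ValueError("each request needs at least one passage")
        record = {
            "texts": [
                {"name": "query", "contents": [query]},
                {"name": "passages", "contents": list(passages)},
            ]
        }
        lines.append(json.dumps(record))
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


# Sample data for illustration only.
sample_requests = [
    ("What is AI?", ["AI is the simulation of human intelligence.", "Bananas are yellow."]),
    ("What is a GPU?", ["A GPU is a processor built for parallel workloads."]),
]
written = write_rankings_jsonl("rankings.jsonl", sample_requests)
print(f"wrote {written} ranking requests")
```

Keeping each request on a single line matters for JSONL: `json.dumps` emits no newlines by default, so every record stays on its own line.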
Run AIPerf using the following command:
Run vLLM with the --runner pooling flag to enable reranking behavior:
Run AIPerf using the following command:
Create a file named rankings.jsonl:
Run AIPerf: