
Copyright © 2026, NVIDIA Corporation.


Profile Ranking Models with AIPerf


AIPerf supports benchmarking ranking and reranking models, including those served through Hugging Face Text Embeddings Inference (TEI) or Cohere Re-Rank APIs. These models take a query and one or more passages and return a similarity or relevance score for each passage.


Section 1. Profile Hugging Face TEI Re-Rank Models

Start a Hugging Face TEI Server

Launch a Hugging Face Text Embeddings Inference (TEI) container in re-ranker mode:

$docker run --gpus all --rm -it \
> -p 8080:80 \
> -e MODEL_ID=BAAI/bge-reranker-base \
> ghcr.io/huggingface/text-embeddings-inference:latest \
> --model-id BAAI/bge-reranker-base --port 80
$# Verify server is running
$curl -s http://localhost:8080/rerank \
> -H "Content-Type: application/json" \
> -d '{"query":"What is AI?", "texts":["AI is artificial intelligence.","Bananas are yellow."]}' | jq

Profile using Synthetic Inputs

Run AIPerf using the following command:

$aiperf profile \
> -m BAAI/bge-reranker-base \
> --endpoint-type hf_tei_rankings \
> --url localhost:8080 \
> --request-count 10 \
> --rankings-passages-mean 5 \
> --rankings-passages-stddev 1 \
> --rankings-passages-prompt-token-mean 32 \
> --rankings-passages-prompt-token-stddev 8 \
> --rankings-query-prompt-token-mean 16 \
> --rankings-query-prompt-token-stddev 4

Sample Output (Successful Run):

INFO Starting AIPerf System
INFO AIPerf System is PROFILING
Profiling: 10/10 |████████████████████████| 100% [00:02<00:00]
INFO Benchmark completed successfully
INFO Results saved to: artifacts/BAAI_bge-reranker-base-rankings/
NVIDIA AIPerf | LLM Metrics
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ Metric                     ┃   avg ┃   min ┃   max ┃   p99 ┃   p50 ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ Request Latency (ms)       │ 52.34 │ 45.12 │ 68.45 │ 65.23 │ 51.89 │
│ Request Throughput (req/s) │  5.12 │     - │     - │     - │     - │
└────────────────────────────┴───────┴───────┴───────┴───────┴───────┘
JSON Export: artifacts/BAAI_bge-reranker-base-rankings/profile_export_aiperf.json
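
As a rough sizing aid for the synthetic run above, the mean options imply an approximate input size per request: one query plus the mean number of passages times the mean passage length. The sketch below does that arithmetic in shell. The variable names are illustrative, and the result assumes passage count and passage length are sampled independently; individual requests will vary with the standard deviations, and the server's tokenizer may count differently.

```shell
# Values copied from the synthetic-input command above.
passages_mean=5          # --rankings-passages-mean
passage_tokens_mean=32   # --rankings-passages-prompt-token-mean
query_tokens_mean=16     # --rankings-query-prompt-token-mean

# Expected mean input tokens per request: one query plus all of its passages.
tokens_per_request=$(( query_tokens_mean + passages_mean * passage_tokens_mean ))
echo "approx. input tokens per request: ${tokens_per_request}"
```

With the values shown, this works out to 16 + 5 × 32 = 176 tokens per request on average.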

The rankings-specific token options cannot be used together with --prompt-input-tokens-mean or --prompt-input-tokens-stddev. Use the rankings-specific options for controlling token counts in rankings queries and passages.
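
The restriction above can be checked before launching a run. The following is a minimal sketch of such a pre-flight check; `check_rankings_flags` is a hypothetical helper written for this example and is not part of AIPerf. It only inspects the flag names mentioned on this page.

```shell
# Sketch: reject an argument list that mixes the two families of token options.
# check_rankings_flags is a hypothetical helper, not an AIPerf command.
check_rankings_flags() {
  local has_rankings=0
  local has_generic=0
  local arg
  for arg in "$@"; do
    case "$arg" in
      # e.g. --rankings-passages-prompt-token-mean, --rankings-query-prompt-token-stddev
      --rankings-*-prompt-token-*) has_rankings=1 ;;
      --prompt-input-tokens-mean*|--prompt-input-tokens-stddev*) has_generic=1 ;;
    esac
  done
  if [ "$has_rankings" -eq 1 ] && [ "$has_generic" -eq 1 ]; then
    echo "error: rankings token options cannot be combined with --prompt-input-tokens-*" >&2
    return 1
  fi
  return 0
}

# Passes: only rankings-specific token options are present.
check_rankings_flags --rankings-query-prompt-token-mean 16 --request-count 10 && echo "ok"
```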

Profile using Custom Inputs

Create a file named rankings.jsonl where each line represents a ranking request with a query and one or more passages.

$cat <<EOF > rankings.jsonl
> {"texts":[{"name":"query","contents":["What is AI topic 0?"]},{"name":"passages","contents":["AI passage 0"]}]}
> {"texts":[{"name":"query","contents":["What is AI topic 1?"]},{"name":"passages","contents":["AI passage 1"]}]}
> {"texts":[{"name":"query","contents":["What is AI topic 2?"]},{"name":"passages","contents":["AI passage 2"]}]}
> {"texts":[{"name":"query","contents":["What is AI topic 3?"]},{"name":"passages","contents":["AI passage 3"]}]}
> {"texts":[{"name":"query","contents":["What is AI topic 4?"]},{"name":"passages","contents":["AI passage 4"]}]}
> EOF
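
For larger files, writing each line by hand becomes tedious. The sketch below generates a file in the same shape with a loop. It assumes that several passages for one query are listed as additional strings in the passages `contents` array, which the array form suggests but this page does not show explicitly. The file name and the three variables are illustrative, and the strings are not JSON-escaped, so they must not contain double quotes or backslashes.

```shell
# Sketch: write num_requests ranking requests, each with one query and
# passages_per_query passages. All names here are illustrative.
out_file="rankings_generated.jsonl"
num_requests=5
passages_per_query=3

: > "$out_file"   # start with an empty file
i=0
while [ "$i" -lt "$num_requests" ]; do
  passages=""
  j=0
  while [ "$j" -lt "$passages_per_query" ]; do
    # Append a quoted passage, comma-separated after the first one.
    passages="${passages}${passages:+,}\"AI passage ${i}-${j}\""
    j=$((j + 1))
  done
  printf '{"texts":[{"name":"query","contents":["What is AI topic %d?"]},{"name":"passages","contents":[%s]}]}\n' \
    "$i" "$passages" >> "$out_file"
  i=$((i + 1))
done
```

The generated file can then be passed to --input-file in place of ./rankings.jsonl.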

Run AIPerf using the following command:

$aiperf profile \
> -m BAAI/bge-reranker-base \
> --endpoint-type hf_tei_rankings \
> --url localhost:8080 \
> --input-file ./rankings.jsonl \
> --custom-dataset-type single_turn \
> --request-count 10

Section 2. Profile Cohere Re-Rank API

Start vLLM Server in Cohere Mode

Run vLLM with the --runner pooling flag to enable reranking behavior:

$docker run --gpus all -p 8080:8000 \
> -e HF_TOKEN=<HF_TOKEN> \
> vllm/vllm-openai:latest \
> --model BAAI/bge-reranker-v2-m3 \
> --runner pooling
$# Verify the server
$curl -s http://localhost:8080/v1/rerank \
> -H "Content-Type: application/json" \
> -d '{"query":"What is AI?","documents":["Artificial intelligence overview","Bananas are yellow"]}' | jq
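
The verification requests in the two sections differ in path and in the name of the passage field: the TEI server takes "texts" at /rerank, while the Cohere-style endpoint takes "documents" at /v1/rerank. The sketch below makes that difference explicit by printing the request body for either style. `rerank_body` is a hypothetical helper written for this example; it performs no JSON escaping, so inputs must not contain double quotes or backslashes.

```shell
# Sketch: print a rerank request body in the shape used by each server above.
# rerank_body is a hypothetical helper for illustration only.
rerank_body() {
  local style="$1"
  local query="$2"
  shift 2
  local key=""
  local items=""
  local doc
  case "$style" in
    tei)    key="texts" ;;       # Hugging Face TEI, POST /rerank
    cohere) key="documents" ;;   # vLLM Cohere-style, POST /v1/rerank
    *) echo "unknown style: $style" >&2; return 1 ;;
  esac
  for doc in "$@"; do
    # Quote each passage and comma-separate after the first.
    items="${items}${items:+,}\"${doc}\""
  done
  printf '{"query":"%s","%s":[%s]}\n' "$query" "$key" "$items"
}

rerank_body tei "What is AI?" "AI is artificial intelligence." "Bananas are yellow."
rerank_body cohere "What is AI?" "Artificial intelligence overview" "Bananas are yellow"
```

The printed body can be piped to curl with -d @- instead of an inline -d string.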

Profile using Synthetic Inputs

Run AIPerf using the following command:

$aiperf profile \
> -m BAAI/bge-reranker-v2-m3 \
> --endpoint-type cohere_rankings \
> --url localhost:8080 \
> --request-count 10

Profile using Custom Inputs

Create a file named rankings.jsonl:

$cat <<EOF > rankings.jsonl
> {"texts":[{"name":"query","contents":["What is AI topic 0?"]},{"name":"passages","contents":["AI passage 0"]}]}
> {"texts":[{"name":"query","contents":["What is AI topic 1?"]},{"name":"passages","contents":["AI passage 1"]}]}
> {"texts":[{"name":"query","contents":["What is AI topic 2?"]},{"name":"passages","contents":["AI passage 2"]}]}
> {"texts":[{"name":"query","contents":["What is AI topic 3?"]},{"name":"passages","contents":["AI passage 3"]}]}
> {"texts":[{"name":"query","contents":["What is AI topic 4?"]},{"name":"passages","contents":["AI passage 4"]}]}
> EOF

Run AIPerf:

$aiperf profile \
> -m BAAI/bge-reranker-v2-m3 \
> --endpoint-type cohere_rankings \
> --url localhost:8080 \
> --input-file ./rankings.jsonl \
> --custom-dataset-type single_turn \
> --request-count 10
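
The custom-input commands in Section 1 and Section 2 differ only in the model name and the endpoint type. The sketch below captures that in one place and prints the resulting command for review rather than running it. `build_profile_cmd` is a hypothetical helper written for this example; it uses only the flags and values shown on this page. Both servers are mapped to port 8080 above, so only one of them can be running at a time with these settings.

```shell
# Sketch: print the custom-input aiperf command for either section.
# build_profile_cmd is a hypothetical helper; it does not run aiperf.
build_profile_cmd() {
  local target="$1"
  local input_file="${2:-./rankings.jsonl}"
  local model=""
  local endpoint=""
  case "$target" in
    tei)    model="BAAI/bge-reranker-base";  endpoint="hf_tei_rankings" ;;
    cohere) model="BAAI/bge-reranker-v2-m3"; endpoint="cohere_rankings" ;;
    *) echo "unknown target: $target" >&2; return 1 ;;
  esac
  printf 'aiperf profile -m %s --endpoint-type %s --url localhost:8080 --input-file %s --custom-dataset-type single_turn --request-count 10\n' \
    "$model" "$endpoint" "$input_file"
}

build_profile_cmd tei
build_profile_cmd cohere
```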