Retriever Evaluation Type#

The Retriever evaluation type measures the effectiveness of document retrieval pipelines on standard academic datasets and on custom datasets. Use it to assess retrieval accuracy with metrics such as recall@k and NDCG@k.

Prerequisites#

Before running Retriever evaluations, ensure you have:

For custom datasets:

For all Retriever evaluations:

  • Access to embedding models for document indexing and query processing

  • Optional reranking service endpoints (for improved retrieval accuracy)

  • Properly configured retrieval pipeline components

Tip

For a complete dataset creation walkthrough, see the dataset management tutorials or follow the end-to-end evaluation example.


Authentication for External Services#

Retriever evaluations support API key authentication for external embedding and reranking services. This enables secure integration with third-party providers such as OpenAI and Cohere.

Tip

For comprehensive authentication configuration examples and security best practices, refer to API Key Authentication.

Common Authentication Scenarios#

  • External query embedding models

  • External index embedding models

  • Third-party reranking services

Add the api_key field to any api_endpoint configuration:

{
  "api_endpoint": {
    "url": "https://api.cohere.ai/v1/rerank",
    "model_id": "rerank-english-v2.0",
    "api_key": "your-cohere-key"
  }
}
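To avoid hard-coding a key in the configuration, the api_endpoint block can be assembled at run time. The following is a minimal sketch, not part of the evaluation service: the helper name `endpoint_config` and the environment variable name `COHERE_API_KEY` are assumptions chosen for illustration, while the `url`, `model_id`, and `api_key` fields come from the example above.

```python
import json
import os


def endpoint_config(url, model_id, key_env_var, env=os.environ):
    """Build an api_endpoint block, reading the key from an environment mapping.

    `key_env_var` is a hypothetical variable name; pick whatever your
    deployment uses. `env` defaults to the process environment.
    """
    api_key = env.get(key_env_var)
    if not api_key:
        raise RuntimeError(f"Set {key_env_var} before building the config")
    return {
        "api_endpoint": {
            "url": url,
            "model_id": model_id,
            "api_key": api_key,
        }
    }


def to_json(config):
    """Serialize the block for inclusion in a larger evaluation config."""
    return json.dumps(config, indent=2)
```

Calling `endpoint_config("https://api.cohere.ai/v1/rerank", "rerank-english-v2.0", "COHERE_API_KEY")` returns the same structure as the JSON above, with the key taken from the environment instead of the file.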

Options#

Embedding + Reranking (Standard Data)#

Example configuration:

{
    "type": "retriever",
    "name": "retriever-standard",
    "namespace": "my-organization",
    "tasks": {
        "my-beir-task": {
            "type": "beir",
            "dataset": {
                "files_url": "file://fiqa/"
            },
            "metrics": {
                "recall_5": {"type": "recall_5"},
                "ndcg_cut_5": {"type": "ndcg_cut_5"},
                "recall_10": {"type": "recall_10"},
                "ndcg_cut_10": {"type": "ndcg_cut_10"}
            }
        }
    }
}
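The metrics block in the configuration above repeats each metric name as both the key and the type. As a sketch of how that block could be generated for several cutoffs, assuming only the naming pattern visible in the example (the helper `build_metrics` is hypothetical, not a service API):

```python
def build_metrics(ks, families=("recall", "ndcg_cut")):
    """Return a metrics dict such as {"recall_5": {"type": "recall_5"}, ...}.

    `ks` is an iterable of cutoff values; `families` lists the metric
    prefixes to generate for each cutoff.
    """
    return {
        f"{family}_{k}": {"type": f"{family}_{k}"}
        for k in ks
        for family in families
    }


# Metrics for the cutoffs used in the example configuration.
metrics = build_metrics([5, 10])
```

Here `metrics` holds the same four entries as the example: recall_5, ndcg_cut_5, recall_10, and ndcg_cut_10.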

Example record:

{
  "query": "What is the capital of France?",
  "retrieved_docs": [
    {"title": "France", "text": "Paris is the capital of France."},
    {"title": "Paris", "text": "Paris is a city in France."}
  ],
  "reference": "Paris"
}

Example results:

{
  "groups": {
    "evaluation": {
      "metrics": {
        "evaluation": {
          "scores": {
            "recall_5": {"value": 1.0},
            "ndcg_cut_5": {"value": 0.9},
            "recall_10": {"value": 1.0},
            "ndcg_cut_10": {"value": 0.85}
          }
        }
      }
    }
  }
}
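The scores in a results document sit several levels deep. A minimal sketch for flattening them into a name-to-value mapping, assuming the nesting shown in the example above (the function name `extract_scores` and its default arguments are illustrative):

```python
def extract_scores(results, group="evaluation", metric="evaluation"):
    """Flatten results["groups"][group]["metrics"][metric]["scores"].

    Returns a dict mapping each score name to its numeric value.
    """
    scores = results["groups"][group]["metrics"][metric]["scores"]
    return {name: entry["value"] for name, entry in scores.items()}


# The example results document from above, as a Python dict.
example_results = {
    "groups": {
        "evaluation": {
            "metrics": {
                "evaluation": {
                    "scores": {
                        "recall_5": {"value": 1.0},
                        "ndcg_cut_5": {"value": 0.9},
                        "recall_10": {"value": 1.0},
                        "ndcg_cut_10": {"value": 0.85},
                    }
                }
            }
        }
    }
}

flat_scores = extract_scores(example_results)
```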

Embedding + Reranking (Custom Data)#

Example configuration:

{
    "type": "retriever",
    "name": "retriever-custom",
    "namespace": "my-organization",
    "tasks": {
        "my-beir-task": {
            "type": "beir",
            "dataset": {
                "files_url": "hf://datasets/<my-dataset-namespace>/<my-dataset-name>"
            },
            "metrics": {
                "recall_5": {"type": "recall_5"},
                "ndcg_cut_5": {"type": "ndcg_cut_5"},
                "recall_10": {"type": "recall_10"},
                "ndcg_cut_10": {"type": "ndcg_cut_10"}
            }
        }
    }
}
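The only difference from the standard-data configuration is the files_url value. As a small sketch, the `hf://datasets/<namespace>/<name>` form shown above could be composed from its two parts like this (the helper `hf_files_url` and its validation rules are assumptions for illustration):

```python
def hf_files_url(namespace, name):
    """Compose a files_url of the form hf://datasets/<namespace>/<name>."""
    for part in (namespace, name):
        # Reject empty parts and embedded slashes, which would change the path.
        if not part or "/" in part:
            raise ValueError(f"invalid dataset path component: {part!r}")
    return f"hf://datasets/{namespace}/{name}"


def dataset_block(namespace, name):
    """Return the "dataset" block used inside a task definition."""
    return {"files_url": hf_files_url(namespace, name)}
```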

Example record:

{
  "query": "Who wrote Les Misérables?",
  "retrieved_docs": [
    {"title": "Les Misérables", "text": "Victor Hugo wrote Les Misérables."},
    {"title": "Victor Hugo", "text": "Victor Hugo was a French writer."}
  ],
  "reference": "Victor Hugo"
}

Example results:

{
  "groups": {
    "evaluation": {
      "metrics": {
        "evaluation": {
          "scores": {
            "recall_5": {"value": 1.0},
            "ndcg_cut_5": {"value": 0.95},
            "recall_10": {"value": 1.0},
            "ndcg_cut_10": {"value": 0.9}
          }
        }
      }
    }
  }
}

Metrics#

Supported Retriever Metrics#

| Metric | Description | Value Range | Example |
|---|---|---|---|
| recall_k | Fraction of relevant documents retrieved in the top k results | 0.0 – 1.0 | recall_5, recall_10 |
| ndcg_k | Normalized Discounted Cumulative Gain at rank k (ranking quality up to k) | 0.0 – 1.0 | ndcg_5, ndcg_10 |
| ndcg_cut_k | NDCG at rank k (cutoff variant, often equivalent to ndcg_k) | 0.0 – 1.0 | ndcg_cut_5, ndcg_cut_10 |
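
To make the metric definitions concrete, here is a minimal sketch of recall@k and NDCG@k using the common textbook formulas. It is an illustration only: the evaluation service's exact implementation (gain function, tie handling, treatment of queries with no relevant documents) is not described here, so those choices are assumptions, as are the function names and the toy document IDs.

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0  # assumption: no relevant documents scores 0
    return len(relevant.intersection(ranked_ids[:k])) / len(relevant)


def ndcg_at_k(ranked_ids, relevance, k):
    """NDCG@k with linear gain and a log2 rank discount.

    `relevance` maps document ID to a graded relevance (0 if absent).
    """
    def dcg(gains):
        # Rank is 0-based here, so the discount is log2(rank + 2).
        return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))

    gains = [relevance.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Toy ranking: two relevant documents, found at ranks 2 and 4.
ranked = ["d3", "d1", "d7", "d2"]
relevance = {"d1": 1, "d2": 1}
```

With this toy ranking, `recall_at_k(ranked, relevance, 2)` counts one of the two relevant documents, while `ndcg_at_k` additionally penalizes the relevant documents for appearing below an irrelevant one.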

Custom Dataset Format#

Refer to RAG’s Custom Dataset Format documentation.