# Retriever Evaluation Type

The retriever evaluation type measures the effectiveness of document retrieval pipelines on standard academic datasets and on custom datasets. Use this evaluation type to assess retrieval accuracy with metrics such as recall@k and NDCG@k.

## Options

### Embedding + Reranking (Standard Data)

Example configuration:

```json
{
    "type": "retriever",
    "name": "retriever-standard",
    "namespace": "my-organization",
    "tasks": {
        "my-beir-task": {
            "type": "beir",
            "dataset": {
                "files_url": "file://fiqa/"
            },
            "metrics": {
                "recall_5": {"type": "recall_5"},
                "ndcg_cut_5": {"type": "ndcg_cut_5"},
                "recall_10": {"type": "recall_10"},
                "ndcg_cut_10": {"type": "ndcg_cut_10"}
            }
        }
    }
}
```
Example data record:

```json
{
  "query": "What is the capital of France?",
  "retrieved_docs": [
    {"title": "France", "text": "Paris is the capital of France."},
    {"title": "Paris", "text": "Paris is a city in France."}
  ],
  "reference": "Paris"
}
```
Example results:

```json
{
  "groups": {
    "evaluation": {
      "metrics": {
        "evaluation": {
          "scores": {
            "recall_5": {"value": 1.0},
            "ndcg_cut_5": {"value": 0.9},
            "recall_10": {"value": 1.0},
            "ndcg_cut_10": {"value": 0.85}
          }
        }
      }
    }
  }
}
```
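A results payload like the one above can be read programmatically. Below is a minimal Python sketch that flattens the nested `groups` → `metrics` → `scores` structure into a simple metric-to-value mapping; the group and metric keys mirror the example output above, not a fixed schema:

```python
import json

# Results payload shaped like the example above.
results_json = """
{
  "groups": {
    "evaluation": {
      "metrics": {
        "evaluation": {
          "scores": {
            "recall_5": {"value": 1.0},
            "ndcg_cut_5": {"value": 0.9},
            "recall_10": {"value": 1.0},
            "ndcg_cut_10": {"value": 0.85}
          }
        }
      }
    }
  }
}
"""

results = json.loads(results_json)

# Walk groups -> metrics -> scores and flatten into {metric_name: value}.
scores = {
    name: score["value"]
    for group in results["groups"].values()
    for metric in group["metrics"].values()
    for name, score in metric["scores"].items()
}

print(scores)  # {'recall_5': 1.0, 'ndcg_cut_5': 0.9, 'recall_10': 1.0, 'ndcg_cut_10': 0.85}
```

This avoids hard-coding the group or metric names, so the same loop works for results files that use different group keys.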

### Embedding + Reranking (Custom Data)

Example configuration:

```json
{
    "type": "retriever",
    "name": "retriever-custom",
    "namespace": "my-organization",
    "tasks": {
        "my-beir-task": {
            "type": "beir",
            "dataset": {
                "files_url": "hf://datasets/<my-dataset-namespace>/<my-dataset-name>"
            },
            "metrics": {
                "recall_5": {"type": "recall_5"},
                "ndcg_cut_5": {"type": "ndcg_cut_5"},
                "recall_10": {"type": "recall_10"},
                "ndcg_cut_10": {"type": "ndcg_cut_10"}
            }
        }
    }
}
```
Example data record:

```json
{
  "query": "Who wrote Les Misérables?",
  "retrieved_docs": [
    {"title": "Les Misérables", "text": "Victor Hugo wrote Les Misérables."},
    {"title": "Victor Hugo", "text": "Victor Hugo was a French writer."}
  ],
  "reference": "Victor Hugo"
}
```
Example results:

```json
{
  "groups": {
    "evaluation": {
      "metrics": {
        "evaluation": {
          "scores": {
            "recall_5": {"value": 1.0},
            "ndcg_cut_5": {"value": 0.95},
            "recall_10": {"value": 1.0},
            "ndcg_cut_10": {"value": 0.9}
          }
        }
      }
    }
  }
}
```

## Metrics

### Supported Retriever Metrics

| Metric | Description | Value Range | Example |
|---|---|---|---|
| `recall_k` | Fraction of relevant documents retrieved in the top k results | 0.0 – 1.0 | `recall_5`, `recall_10` |
| `ndcg_k` | Normalized Discounted Cumulative Gain at rank k (ranking quality up to k) | 0.0 – 1.0 | `ndcg_5`, `ndcg_10` |
| `ndcg_cut_k` | NDCG at rank k (cutoff variant, often equivalent to `ndcg_k`) | 0.0 – 1.0 | `ndcg_cut_5`, `ndcg_cut_10` |
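To make the metrics above concrete, here is an illustrative Python sketch of recall@k and NDCG@k with binary relevance judgments. The function names and document IDs are hypothetical, for illustration only; the actual evaluation computes these metrics internally:

```python
import math

def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)

def ndcg_at_k(retrieved_ids, relevant_ids, k):
    """NDCG at rank k with binary relevance (1 if relevant, else 0)."""
    relevant = set(relevant_ids)
    # DCG: each hit's gain is discounted by log2(rank + 1), with 1-based ranks.
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc_id in enumerate(retrieved_ids[:k])
        if doc_id in relevant
    )
    # Ideal DCG: all relevant documents ranked first.
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0

# Both relevant docs appear in the top 5, but d2 sits at rank 3,
# so recall@5 is perfect while NDCG@5 is penalized for the ranking.
retrieved = ["d3", "d1", "d7", "d2", "d9"]
relevant = ["d1", "d2"]
print(recall_at_k(retrieved, relevant, 5))            # 1.0
print(round(ndcg_at_k(retrieved, relevant, 5), 3))    # 0.651
```

This illustrates why the two metrics complement each other: recall@k ignores ordering within the top k, while NDCG@k rewards placing relevant documents closer to the top.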

## Custom Dataset Format

Refer to RAG’s Custom Dataset Format documentation.