nat.plugins.ragas.rag_evaluator.utils#

Functions#

nan_to_zero(→ float)

Convert NaN or None to 0.0 for safe arithmetic/serialization.

extract_metric_score(→ float | None)

Extract scalar score from a ragas metric result object.

build_metric_kwargs(→ dict[str, str | list[str]])

Build kwargs payload for metric.ascore(**kwargs) from a ragas sample.

score_metric_result(→ ragas.metrics.result.MetricResult)

Run one metric and return raw ragas MetricResult.

Module Contents#

nan_to_zero(v: float | None) → float#

Convert NaN or None to 0.0 for safe arithmetic/serialization.

extract_metric_score(
metric_result: ragas.metrics.result.MetricResult,
) → float | None#

Extract scalar score from a ragas metric result object.

build_metric_kwargs(sample: object) → dict[str, str | list[str]]#

Build kwargs payload for metric.ascore(**kwargs) from a ragas sample.

async score_metric_result(
metric: ragas.metrics.base.SimpleBaseMetric,
sample: object,
) → ragas.metrics.result.MetricResult#

Run one metric and return raw ragas MetricResult.

We first build a superset of possible sample fields, then filter the kwargs against the concrete metric.ascore(...) signature so that each metric receives only the arguments it supports.

Examples:

  • AnswerAccuracy(self, user_input, response, reference) forwards user_input, response, reference.

  • AnswerCorrectness(self, user_input, response, reference) forwards user_input, response, reference.

  • AnswerRelevancy(self, user_input, response) forwards user_input, response.

  • BleuScore(self, reference, response) forwards reference, response.

  • ResponseGroundedness(self, response, retrieved_contexts) forwards response, retrieved_contexts.
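The signature-based filtering step can be illustrated with `inspect.signature`; the helper name `filter_supported_kwargs` and the stand-in `ascore` function are hypothetical, but the mechanism matches the behavior described above:

```python
import inspect

def filter_supported_kwargs(fn, kwargs: dict) -> dict:
    """Keep only the kwargs that appear in fn's parameter list."""
    params = inspect.signature(fn).parameters
    return {k: v for k, v in kwargs.items() if k in params}

# Stand-in for a metric's ascore that accepts only reference and response.
def ascore(reference: str, response: str) -> float:
    return 1.0 if reference == response else 0.0

superset = {"user_input": "q", "response": "a", "reference": "a"}
filtered = filter_supported_kwargs(ascore, superset)  # drops user_input
```

Each metric's `ascore` can then be called with `**filtered` without risking a `TypeError` from an unexpected keyword argument.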