nat.plugins.ragas.rag_evaluator.utils#

Functions#

nan_to_zero(→ float)

Convert NaN or None to 0.0 for safe arithmetic/serialization.

extract_metric_score(→ float | None)

Extract scalar score from a ragas metric result object.

build_metric_kwargs(→ dict[str, str | list[str]])

Build kwargs payload for metric.ascore(**kwargs) from a ragas sample.

score_metric_result(→ ragas.metrics.result.MetricResult)

Run one metric and return raw ragas MetricResult.

Module Contents#

nan_to_zero(v: float | None) → float#

Convert NaN or None to 0.0 for safe arithmetic/serialization.

extract_metric_score(
metric_result: ragas.metrics.result.MetricResult,
) → float | None#

Extract scalar score from a ragas metric result object.

build_metric_kwargs(sample: object) → dict[str, str | list[str]]#

Build kwargs payload for metric.ascore(**kwargs) from a ragas sample.

async score_metric_result(
metric: ragas.metrics.base.SimpleBaseMetric,
sample: object,
) → ragas.metrics.result.MetricResult#

Run one metric and return raw ragas MetricResult.

We first build a superset of possible sample fields, then filter the kwargs against the concrete metric.ascore(...) signature so that each metric receives only the arguments it supports.

Examples:

  • AnswerAccuracy(self, user_input, response, reference) forwards user_input, response, reference.

  • AnswerCorrectness(self, user_input, response, reference) forwards user_input, response, reference.

  • AnswerRelevancy(self, user_input, response) forwards user_input, response.

  • BleuScore(self, reference, response) forwards reference, response.

  • ResponseGroundedness(self, response, retrieved_contexts) forwards response, retrieved_contexts.
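The signature-based filtering step can be illustrated with `inspect.signature`; the helper name `filter_supported_kwargs` and the stand-in `ascore` function are hypothetical, but the mechanism matches the behavior described above:

```python
import inspect

def filter_supported_kwargs(fn, kwargs: dict) -> dict:
    """Keep only the kwargs that appear in fn's parameter list."""
    params = inspect.signature(fn).parameters
    return {k: v for k, v in kwargs.items() if k in params}

# Stand-in for a metric's ascore that accepts only reference and response.
def ascore(reference: str, response: str) -> float:
    return 1.0 if reference == response else 0.0

superset = {"user_input": "q", "response": "a", "reference": "a"}
filtered = filter_supported_kwargs(ascore, superset)  # drops user_input
```

Each metric's `ascore` can then be called with `**filtered` without risking a `TypeError` from an unexpected keyword argument.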