nat.plugins.eval.runtime
Classes
EvaluationHarness | Run ATIF-native evaluators against a shared sample list.
Package Contents
- class EvaluationHarness(logger_instance: logging.Logger | None = None)
Run ATIF-native evaluators against a shared sample list.
- _logger
- async _evaluate_single(
    evaluator_name: str,
    evaluator: nat.plugins.eval.evaluator.atif_evaluator.AtifEvaluator,
    atif_samples: nat.plugins.eval.evaluator.atif_evaluator.AtifEvalSampleList,
  )
Evaluate one evaluator using the ATIF lane.
- Returns:
A tuple of the evaluator name and its result on success; otherwise None.
- async evaluate(
    evaluators: dict[str, nat.plugins.eval.evaluator.atif_evaluator.AtifEvaluator],
    atif_samples: nat.plugins.eval.evaluator.atif_evaluator.AtifEvalSampleList,
  )
Evaluate ATIF-native evaluators concurrently.
- Args:
evaluators: Evaluators keyed by evaluator name.
atif_samples: Pre-built ATIF samples shared by all evaluators.
- Returns:
A mapping of evaluator name to EvalOutput for successful evaluators.