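nat.eval.eval_callbacks
Attributes
logger
Classes
EvalResultItem | Per-dataset-item result from evaluation.
EvalResult | Full result of a single evaluation run.
EvalCallback | Protocol for callbacks notified through on_eval_complete when an evaluation run finishes.
EvalCallbackManager | Holds registered EvalCallback instances.
Functions
build_eval_result | Build an EvalResult from raw evaluation data.
Module Contents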
- logger
- class EvalResultItem
Per-dataset-item result from evaluation.
- item_id: Any
- input_obj: Any
- expected_output: Any
- actual_output: Any
- class EvalResult
Full result of a single evaluation run.
- items: list[EvalResultItem]
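The two result types can be pictured as plain dataclasses. The sketch below is a minimal stand-in that assumes only the fields listed on this page. The real classes may be defined differently (for example as Pydantic models) and may carry additional fields.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class EvalResultItem:
    """Stand-in for EvalResultItem: one dataset item and its outputs."""
    item_id: Any
    input_obj: Any
    expected_output: Any
    actual_output: Any


@dataclass
class EvalResult:
    """Stand-in for EvalResult: only the documented `items` field is modeled."""
    items: list[EvalResultItem] = field(default_factory=list)


result = EvalResult(items=[
    EvalResultItem(item_id=1, input_obj="2 + 2?", expected_output="4", actual_output="4"),
    EvalResultItem(item_id=2, input_obj="3 * 3?", expected_output="9", actual_output="6"),
])

# Collect the ids of items whose actual output exactly matches the expected output.
matches = [it.item_id for it in result.items if it.expected_output == it.actual_output]
print(matches)  # [1]
```

Per-item fields stay untyped (`Any`), so a callback consuming these results should not assume a particular input or output type.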
- build_eval_result(
      *,
      eval_input_items: list,
      evaluation_results: list[tuple[str, Any]],
      metric_scores: dict[str, float],
      usage_stats: Any | None = None,
      item_span_ids: dict[str, int] | None = None,
  ) -> EvalResult
Build an EvalResult from raw evaluation data.
This is the single place that maps eval-input items and evaluator outputs into the callback-friendly EvalResult / EvalResultItem structure. All parameters are keyword-only.
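- class EvalCallback
Bases: Protocol
Base class for protocol classes. (This description is inherited from typing.Protocol. EvalCallback itself is the protocol that evaluation callbacks satisfy by implementing on_eval_complete.)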
Protocol classes are defined as:
    class Proto(Protocol):
        def meth(self) -> int:
            ...
Such classes are primarily used with static type checkers that recognize structural subtyping (static duck-typing).
For example:
    class C:
        def meth(self) -> int:
            return 0

    def func(x: Proto) -> int:
        return x.meth()

    func(C())  # Passes static type check
See PEP 544 for details. Protocol classes decorated with @typing.runtime_checkable act as simple-minded runtime protocols that check only the presence of given attributes, ignoring their type signatures. Protocol classes can be generic; they are defined as:
    class GenProto[T](Protocol):
        def meth(self) -> T:
            ...
- on_eval_complete(result: EvalResult) -> None
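Because EvalCallback is a Protocol, a callback does not need to subclass it. Any object with a compatible on_eval_complete method qualifies. The sketch below uses local stand-ins for EvalResult and EvalCallback. The stand-in protocol is marked @runtime_checkable only so the example can check conformance at runtime; this page does not state whether the real EvalCallback is runtime-checkable. PassRateCallback is a hypothetical example class.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol, runtime_checkable


@dataclass
class EvalResultItem:
    # Stand-in with the fields documented for EvalResultItem.
    item_id: Any
    input_obj: Any
    expected_output: Any
    actual_output: Any


@dataclass
class EvalResult:
    # Stand-in: only the documented `items` field is modeled.
    items: list[EvalResultItem] = field(default_factory=list)


@runtime_checkable
class EvalCallback(Protocol):
    # Stand-in mirroring the documented method signature.
    def on_eval_complete(self, result: EvalResult) -> None: ...


class PassRateCallback:
    """Hypothetical callback that records the share of exact-match items.

    It does not inherit from EvalCallback; it matches it structurally.
    """

    def __init__(self) -> None:
        self.pass_rate: float | None = None

    def on_eval_complete(self, result: EvalResult) -> None:
        total = len(result.items)
        passed = sum(1 for it in result.items if it.expected_output == it.actual_output)
        self.pass_rate = passed / total if total else 0.0


cb = PassRateCallback()
print(isinstance(cb, EvalCallback))  # True: only the method's presence is checked
cb.on_eval_complete(EvalResult(items=[
    EvalResultItem(1, "2 + 2?", "4", "4"),
    EvalResultItem(2, "3 * 3?", "9", "6"),
]))
print(cb.pass_rate)  # 0.5
```

As the inherited Protocol documentation notes, the runtime isinstance check ignores signatures. A static type checker is what actually verifies that on_eval_complete accepts an EvalResult.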
- class EvalCallbackManager
- _callbacks: list[EvalCallback] = []
- register(callback: EvalCallback) -> None
- property needs_root_span_ids: bool
Checks whether any registered callback declares that it needs pre-generated root span IDs.
- on_eval_complete(result: EvalResult) -> None
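A minimal sketch of how a manager like EvalCallbackManager could behave, under stated assumptions. EvalCallbackManagerSketch, Recorder and Broken are hypothetical names. The sketch assumes that a callback opts into root span IDs through a boolean attribute named needs_root_span_ids; this page does not say how callbacks declare that need. It also assumes the manager forwards each result to every registered callback in registration order. Two further design choices belong to the sketch and are not documented behavior. The callback list lives on the instance rather than in a class-level [] default, so separate managers never share registrations. A failing callback is logged and skipped so it cannot block the others.

```python
import logging
from dataclasses import dataclass, field
from typing import Any, Protocol

logger = logging.getLogger(__name__)


@dataclass
class EvalResult:
    # Stand-in: only the documented `items` field is modeled.
    items: list[Any] = field(default_factory=list)


class EvalCallback(Protocol):
    def on_eval_complete(self, result: EvalResult) -> None: ...


class EvalCallbackManagerSketch:
    """Hypothetical re-implementation for illustration only."""

    def __init__(self) -> None:
        # Per-instance list, so two managers never share registrations.
        self._callbacks: list[EvalCallback] = []

    def register(self, callback: EvalCallback) -> None:
        self._callbacks.append(callback)

    @property
    def needs_root_span_ids(self) -> bool:
        # Assumed convention: callbacks opt in via a boolean attribute.
        return any(getattr(cb, "needs_root_span_ids", False) for cb in self._callbacks)

    def on_eval_complete(self, result: EvalResult) -> None:
        for cb in self._callbacks:
            try:
                cb.on_eval_complete(result)
            except Exception:
                # Log and continue so one failing callback does not block the rest.
                logger.exception("Eval callback %r failed", cb)


class Recorder:
    needs_root_span_ids = True

    def __init__(self) -> None:
        self.seen: list[EvalResult] = []

    def on_eval_complete(self, result: EvalResult) -> None:
        self.seen.append(result)


class Broken:
    def on_eval_complete(self, result: EvalResult) -> None:
        raise RuntimeError("boom")


manager = EvalCallbackManagerSketch()
print(manager.needs_root_span_ids)  # False: nothing registered yet
recorder = Recorder()
manager.register(Broken())
manager.register(recorder)
print(manager.needs_root_span_ids)  # True: Recorder opts in
manager.on_eval_complete(EvalResult(items=["a", "b"]))
print(len(recorder.seen))  # 1, even though Broken raised first
```

Checking needs_root_span_ids before the run starts lets the caller generate root span IDs only when at least one callback will use them. Presumably those IDs are what reach build_eval_result as item_span_ids, but this page does not confirm that.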