nat.plugins.eval.eval_callbacks#
Attributes#
logger |
Classes#
EvalResultItem | Per-dataset-item result from evaluation. |
EvalResult | Full result of a single evaluation run. |
EvalCallback | Base class for protocol classes. |
EvalCallbackManager | Dispatches eval lifecycle callbacks to registered integrations. |
Functions#
build_eval_result | Build an EvalResult from raw evaluation data. |
Module Contents#
- logger#
- class EvalResultItem#
Per-dataset-item result from evaluation.
- item_id: Any#
- input_obj: Any#
- expected_output: Any#
- actual_output: Any#
- class EvalResult#
Full result of a single evaluation run.
The metric_scores and items fields are always populated. The remaining fields are optional context that exporters (e.g. FileEvalCallback) can use to persist richer output without breaking callbacks that only inspect scores.
- items: list[EvalResultItem]#
- output_dir: pathlib.Path | None = None#
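The split between always-populated and optional fields can be sketched as follows. The dataclasses here are simplified stand-ins written for this example, not the library's definitions; only the field names `metric_scores`, `items`, and `output_dir` come from the reference above, and `summarize` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Optional


# Stand-in types for illustration only; the real classes live in
# nat.plugins.eval.eval_callbacks and may carry additional fields.
@dataclass
class EvalResultItem:
    item_id: Any
    input_obj: Any
    expected_output: Any
    actual_output: Any


@dataclass
class EvalResult:
    metric_scores: dict[str, float]
    items: list[EvalResultItem] = field(default_factory=list)
    output_dir: Optional[Path] = None  # optional context, may be absent


def summarize(result: EvalResult) -> str:
    """Hypothetical consumer that relies only on the always-populated fields."""
    scores = ", ".join(
        f"{name}={value:.2f}" for name, value in sorted(result.metric_scores.items())
    )
    summary = f"{len(result.items)} item(s): {scores}"
    # Optional fields must be checked before use.
    if result.output_dir is not None:
        summary += f" (output in {result.output_dir})"
    return summary


result = EvalResult(
    metric_scores={"accuracy": 0.5},
    items=[EvalResultItem(item_id=1, input_obj="2+2", expected_output="4", actual_output="5")],
)
summary_text = summarize(result)
```

A consumer written this way keeps working whether or not the caller supplies the optional context.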
- build_eval_result(
    *,
    eval_input_items: list,
    evaluation_results: list[tuple[str, Any]],
    metric_scores: dict[str, float],
    usage_stats: Any | None = None,
    item_span_ids: dict[str, int] | None = None,
    workflow_output_json: str | None = None,
    atif_workflow_output_json: str | None = None,
    run_config: Any | None = None,
    effective_config: Any | None = None,
    output_dir: pathlib.Path | None = None,
  ) → EvalResult#
Build an EvalResult from raw evaluation data.
This is the single place that maps eval-input items + evaluator outputs into the callback-friendly EvalResult / EvalResultItem structure.
- class EvalCallback#
Bases: Protocol
Base class for protocol classes.
Protocol classes are defined as:

    class Proto(Protocol):
        def meth(self) -> int:
            ...

Such classes are primarily used with static type checkers that recognize structural subtyping (static duck-typing).
For example:

    class C:
        def meth(self) -> int:
            return 0

    def func(x: Proto) -> int:
        return x.meth()

    func(C())  # Passes static type check

See PEP 544 for details. Protocol classes decorated with @typing.runtime_checkable act as simple-minded runtime protocols that check only the presence of given attributes, ignoring their type signatures.
Protocol classes can be generic, they are defined as:

    class GenProto[T](Protocol):
        def meth(self) -> T:
            ...

- on_eval_complete(result: EvalResult) → None#
- evaluation_context()#
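Because EvalCallback is a protocol, an integration only needs to provide methods with matching names; it does not have to inherit from anything. The sketch below uses a stand-in protocol rather than the real one, and it assumes that `evaluation_context()` returns a context manager, which the reference above does not state. `RecordingCallback` is a hypothetical class.

```python
from contextlib import contextmanager
from typing import Any, Iterator, Protocol, runtime_checkable


# Stand-in protocol mirroring the two method names listed above. The real
# EvalCallback may differ; treating evaluation_context() as a context
# manager is an assumption made for this sketch.
@runtime_checkable
class EvalCallback(Protocol):
    def on_eval_complete(self, result: Any) -> None: ...

    def evaluation_context(self) -> Any: ...


class RecordingCallback:
    """Hypothetical integration; it does not inherit from EvalCallback."""

    def __init__(self) -> None:
        self.events: list[str] = []

    @contextmanager
    def evaluation_context(self) -> Iterator[None]:
        self.events.append("enter")
        try:
            yield
        finally:
            self.events.append("exit")

    def on_eval_complete(self, result: Any) -> None:
        self.events.append(f"complete:{result}")


callback = RecordingCallback()
with callback.evaluation_context():
    pass  # the evaluation run would happen here
callback.on_eval_complete("scores")

# runtime_checkable only verifies that the method names exist.
is_callback = isinstance(callback, EvalCallback)
```

The `isinstance` check succeeds through structural subtyping alone, which is what lets provider plugins supply callbacks without importing a base class.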
- class EvalCallbackManager#
Dispatches eval lifecycle callbacks to registered integrations.
Maintainer note: Keep this callback surface stable for provider plugins. If we later adopt an internal event-subscriber bus (typed events, async fan-out, retries), it can be introduced behind this manager as a near-term design evolution.
- _callbacks: list[EvalCallback] = []#
- register(callback: EvalCallback) → None#
- property needs_root_span_ids: bool#
Check if any registered callback declares it needs pre-generated root span_ids.
- evaluation_context()#
- on_eval_complete(result: EvalResult) → None#
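The dispatch pattern described for EvalCallbackManager can be illustrated with a simplified stand-in. This is a sketch under assumptions, not the library's implementation: `SketchCallbackManager` and `CollectingCallback` are hypothetical names, callbacks are assumed to opt in to root span IDs through a boolean `needs_root_span_ids` attribute, and `evaluation_context()` is assumed to be a context manager.

```python
from contextlib import ExitStack, contextmanager
from typing import Any, Iterator


class SketchCallbackManager:
    """Simplified stand-in, not the library's EvalCallbackManager."""

    def __init__(self) -> None:
        self._callbacks: list[Any] = []

    def register(self, callback: Any) -> None:
        self._callbacks.append(callback)

    @property
    def needs_root_span_ids(self) -> bool:
        # Assumption: a callback opts in via a boolean attribute of the same name.
        return any(getattr(cb, "needs_root_span_ids", False) for cb in self._callbacks)

    @contextmanager
    def evaluation_context(self) -> Iterator[None]:
        # Enter each callback's context if it provides one; ExitStack
        # unwinds them in reverse order when the block exits.
        with ExitStack() as stack:
            for cb in self._callbacks:
                ctx = getattr(cb, "evaluation_context", None)
                if ctx is not None:
                    stack.enter_context(ctx())
            yield

    def on_eval_complete(self, result: Any) -> None:
        # Fan the result out to every registered callback, in registration order.
        for cb in self._callbacks:
            cb.on_eval_complete(result)


class CollectingCallback:
    """Hypothetical callback that stores every result it receives."""

    needs_root_span_ids = True

    def __init__(self) -> None:
        self.seen: list[Any] = []

    def on_eval_complete(self, result: Any) -> None:
        self.seen.append(result)


manager = SketchCallbackManager()
needs_before = manager.needs_root_span_ids
collector = CollectingCallback()
manager.register(collector)
with manager.evaluation_context():
    manager.on_eval_complete({"accuracy": 1.0})
```

Keeping registration and fan-out behind one object is what allows the dispatch mechanism to change later without touching the callbacks themselves.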