nat.plugins.langchain.eval.trajectory_evaluator#
Attributes#

| logger | |
| _DEFAULT_EVENT_FILTER | |

Classes#

| TrajectoryEvaluatorConfig | Agent trajectory evaluator configuration. |
| TrajectoryEvaluator | Base class for custom evaluators. |

Functions#

| | Best-effort coercion to text for judge-chain inputs. |
| _extract_score_from_parser_error | Best-effort extraction of numeric judge score from parser failures. |
| _to_agent_actions | Convert intermediate steps to LangChain agent_trajectory tuples. |
| _message_to_text | Convert ATIF message payloads into text for LangChain trajectory scoring. |
| | Return whether a value is non-empty for trajectory scoring. |
| _dedupe_adjacent_actions | Drop adjacent duplicate trajectory rows to reduce evaluator noise. |
| _atif_to_agent_actions | Convert an ATIF trajectory into LangChain agent_trajectory tuples. |
| | Extract first user message from ATIF trajectory. |
Module Contents#
- logger#
- _DEFAULT_EVENT_FILTER#
- _extract_score_from_parser_error(error_text: str) → float | None#
Best-effort extraction of numeric judge score from parser failures.
- class TrajectoryEvaluatorConfig#
Bases: nat.data_models.evaluator.EvaluatorLLMConfig
Agent trajectory evaluator configuration.
- _to_agent_actions(
- intermediate_steps: list[nat.data_models.intermediate_step.IntermediateStep],
- )#
Convert intermediate steps to LangChain agent_trajectory tuples.
- _message_to_text(message) → str#
Convert ATIF message payloads into text for LangChain trajectory scoring.
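Message payloads can arrive as plain strings, dicts, or lists of content parts. A hedged sketch of such a coercion; the payload shapes handled here (a "text" key on dicts, nested lists of parts) are assumptions for illustration:

```python
def message_to_text(message) -> str:
    """Best-effort coercion of a message payload to plain text.

    Strings pass through, dicts contribute their "text" field, and
    lists of content parts are joined line by line; anything else
    falls back to str().
    """
    if message is None:
        return ""
    if isinstance(message, str):
        return message
    if isinstance(message, dict):
        return str(message.get("text", ""))
    if isinstance(message, list):
        return "\n".join(message_to_text(part) for part in message)
    return str(message)
```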
- _dedupe_adjacent_actions( ) → list[tuple[langchain_core.agents.AgentAction, str]]#
Drop adjacent duplicate trajectory rows to reduce evaluator noise.
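Adjacent-duplicate dropping can be sketched in a few lines; only rows identical to their immediate predecessor are removed, so a non-adjacent repeat (e.g. a genuine retry later in the trajectory) survives. The function name and the plain-tuple rows standing in for (AgentAction, observation) pairs are assumptions:

```python
def dedupe_adjacent(rows: list[tuple]) -> list[tuple]:
    """Drop rows equal to the immediately preceding row.

    Non-adjacent repeats are kept on purpose: the same tool call made
    again later may be a meaningful retry, not noise.
    """
    out: list[tuple] = []
    for row in rows:
        if not out or row != out[-1]:
            out.append(row)
    return out
```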
- _atif_to_agent_actions(
- trajectory,
- )#
Convert an ATIF trajectory into LangChain agent_trajectory tuples.
Action mapping is intentionally step-centric:
- Emit at most one LLM action for each agent step when the step message is meaningful.
- Emit one tool action for each structurally valid tool call in that step.
- Skip structurally empty artifacts and adjacent duplicate rows to reduce evaluator noise.
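The step-centric rules above can be sketched as follows. The ATIF step shape (dicts with "message" and "tool_calls" keys) and the plain (name, observation) tuples standing in for real langchain_core.agents.AgentAction objects are simplifying assumptions:

```python
def atif_steps_to_rows(steps: list[dict]) -> list[tuple[str, str]]:
    """Map ATIF-like steps to trajectory rows, step by step."""
    rows: list[tuple[str, str]] = []

    def emit(row: tuple[str, str]) -> None:
        # skip adjacent duplicate rows to reduce evaluator noise
        if not rows or row != rows[-1]:
            rows.append(row)

    for step in steps:
        message = (step.get("message") or "").strip()
        if message:  # at most one LLM action per step, only when meaningful
            emit(("llm", message))
        for call in step.get("tool_calls") or []:
            name = (call.get("name") or "").strip()
            if not name:  # skip structurally empty tool calls
                continue
            emit((name, str(call.get("output") or "")))
    return rows
```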
- class TrajectoryEvaluator(
- llm: langchain_core.language_models.BaseChatModel,
- tools: list[langchain_core.tools.BaseTool] | None = None,
- max_concurrency: int = 8,
- )#
Bases: nat.plugins.eval.evaluator.base_evaluator.BaseEvaluator
Base class for custom evaluators.
Warning
Experimental Feature: The Evaluation API is experimental and may change in future releases. Future versions may introduce breaking changes without notice.
Each custom evaluator must implement the evaluate_item method, which is used to evaluate a single EvalInputItem.
- traj_eval_chain#
- async _evaluate_with_trajectory(
- item_id,
- lane: str,
- question: str,
- generated_answer: str,
- agent_trajectory: list[tuple[langchain_core.agents.AgentAction, str]],
- )#
Run trajectory scoring for one item regardless of input lane.
- async evaluate_item( ) → nat.plugins.eval.data_models.evaluator_io.EvalOutputItem#
Each evaluator must implement this for item-level evaluation.
- async evaluate_atif_item( ) → nat.plugins.eval.data_models.evaluator_io.EvalOutputItem#
Evaluate a single ATIF-native sample.
- async evaluate_atif_fn(
- atif_samples: nat.plugins.eval.evaluator.atif_evaluator.AtifEvalSampleList,
- )#
ATIF-native evaluation lane for trajectory scoring.
- async register_trajectory_evaluator(
- config: TrajectoryEvaluatorConfig,
- builder: nat.builder.builder.EvalBuilder,
- )#