Verification Patterns


Verification is how an environment evaluates agent behavior and computes a score. Every environment implements some form of verification; which pattern you choose depends on your task.

Equivalence Match

Compare the agent’s output to a known reference answer.
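
As an illustration, a minimal sketch of an equivalence check might look like the following. The function names (`normalize`, `equivalence_reward`) and the normalization rules (trim, lowercase, collapse whitespace) are assumptions made for this example, not part of any environment API; a real task may need stricter or looser matching.

```python
import re


def normalize(text: str) -> str:
    """Trim, lowercase, and collapse whitespace so trivial formatting differences are ignored."""
    return re.sub(r"\s+", " ", text.strip().lower())


def equivalence_reward(agent_output: str, reference: str) -> float:
    """Return 1.0 when the normalized output equals the normalized reference, else 0.0."""
    return 1.0 if normalize(agent_output) == normalize(reference) else 0.0


if __name__ == "__main__":
    print(equivalence_reward("  Paris ", "paris"))
    print(equivalence_reward("London", "paris"))
```

Exact matching after normalization is the simplest choice; numeric answers or structured outputs usually call for a task-specific comparison instead.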

coming soon

Execution and State Match

Execute the agent’s actions, such as tool calls or generated code, and verify the output or resulting state.
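
One way to picture this pattern is to replay the agent's tool calls against a copy of the starting state and compare the result with the expected state. The sketch below assumes a hypothetical tool registry (`TOOLS`) with two toy tools and a tool-call shape of `{"name": ..., "arguments": {...}}`; none of these names come from the documentation itself.

```python
import copy
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


def set_value(state: State, key: str, value: Any) -> None:
    state[key] = value


def delete_value(state: State, key: str) -> None:
    state.pop(key, None)


# Hypothetical tool registry for this example only.
TOOLS: Dict[str, Callable[..., None]] = {
    "set_value": set_value,
    "delete_value": delete_value,
}


def state_match_reward(
    tool_calls: List[Dict[str, Any]],
    initial_state: State,
    expected_state: State,
) -> float:
    """Replay tool calls on a copy of the initial state and compare the final state."""
    state = copy.deepcopy(initial_state)
    for call in tool_calls:
        tool = TOOLS.get(call["name"])
        if tool is None:
            return 0.0  # unknown tool: treat as a failed episode
        try:
            tool(state, **call["arguments"])
        except TypeError:
            return 0.0  # wrong or missing arguments
    return 1.0 if state == expected_state else 0.0


if __name__ == "__main__":
    calls = [
        {"name": "set_value", "arguments": {"key": "a", "value": 1}},
        {"name": "delete_value", "arguments": {"key": "b"}},
    ]
    print(state_match_reward(calls, {"b": 2}, {"a": 1}))
```

Comparing the final state rather than the exact sequence of calls lets different but equally valid action sequences earn the same score.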

coming soon

LLM-as-Judge

Prompt an LLM to evaluate the agent’s output against rubrics, instructions, or reference answers.
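
A rough sketch of this pattern is shown below. It assumes the judge model is reachable through a caller-supplied function `call_llm(prompt) -> str` (a hypothetical stand-in for whatever client you use) and that the judge is instructed to end its reply with a `SCORE: <n>` line; the prompt template and the 0 to 5 scale are likewise illustrative choices.

```python
import re
from typing import Callable

# Illustrative prompt template; the wording and score scale are assumptions.
JUDGE_TEMPLATE = """You are grading an agent's answer.

Task:
{task}

Rubric:
{rubric}

Agent output:
{agent_output}

Explain your reasoning briefly, then end with a line of the form
SCORE: <integer from 0 to {max_score}>"""


def build_judge_prompt(task: str, agent_output: str, rubric: str, max_score: int = 5) -> str:
    return JUDGE_TEMPLATE.format(
        task=task, rubric=rubric, agent_output=agent_output, max_score=max_score
    )


def parse_score(judge_reply: str, max_score: int = 5) -> float:
    """Extract the last 'SCORE: n' in the reply and scale it to [0, 1]; 0.0 if absent."""
    matches = re.findall(r"SCORE:\s*(\d+)", judge_reply)
    if not matches:
        return 0.0
    return min(int(matches[-1]), max_score) / max_score


def judge_reward(
    task: str,
    agent_output: str,
    rubric: str,
    call_llm: Callable[[str], str],  # hypothetical: sends a prompt, returns the reply text
    max_score: int = 5,
) -> float:
    prompt = build_judge_prompt(task, agent_output, rubric, max_score)
    return parse_score(call_llm(prompt), max_score)


if __name__ == "__main__":
    # Stub judge used only to demonstrate the flow without a real model.
    stub = lambda prompt: "The answer matches the rubric.\nSCORE: 5"
    print(judge_reward("Add 2 + 2", "4", "The answer must be 4.", stub))
```

Passing the model call in as a function keeps the scoring logic testable with a stub and independent of any particular LLM provider.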

Reward Model

Use a model trained on human preference data, such as the reward models used in RLHF, to score outputs for alignment.
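
As a sketch of how such a score could feed into an environment, the example below assumes the reward model is exposed as a hypothetical function `score_fn(prompt, response) -> float` that returns an unbounded scalar, and squashes that scalar into the range 0 to 1 with a sigmoid. Both the interface and the squashing step are assumptions for illustration; some setups use the raw scalar directly.

```python
import math
from typing import Callable


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def reward_model_reward(
    prompt: str,
    response: str,
    score_fn: Callable[[str, str], float],  # hypothetical reward-model scorer
) -> float:
    """Map the reward model's raw scalar score into (0, 1)."""
    return sigmoid(score_fn(prompt, response))


if __name__ == "__main__":
    # Toy scorer standing in for a trained reward model: prefers longer responses.
    toy_scorer = lambda prompt, response: float(len(response)) - 10.0
    print(reward_model_reward("Explain recursion.", "It calls itself.", toy_scorer))
```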

coming soon