Model Evaluation How-To Guides
This section provides task-focused procedures for running eval/model_eval.
For your first run, start with Getting Started With Model Evaluation.
For agent-driven sessions, read Use The Model Evaluation Skill With Confidence first.
Choose A Guide
Discover The Step
List the step, read its contract, and decide whether it applies to the task.
Run A Hosted Evaluation
Run benchmarks against an already-running, OpenAI-compatible endpoint.
Evaluate A Deployed Checkpoint
Choose a deployment path, deploy the endpoint, and point the step at it.
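Both endpoint-based guides assume an OpenAI-compatible endpoint that is already serving requests. Before pointing the step at one, it can help to confirm that the endpoint answers at all. The sketch below is not part of eval/model_eval. It assumes only that the endpoint exposes the model-listing route that OpenAI-compatible servers conventionally provide under the base URL. The function names and the example base URL are hypothetical.

```python
import json
import urllib.request


def build_models_request(base_url, api_key=None):
    """Build a GET request for the endpoint's model listing.

    base_url is assumed to already include the API prefix,
    for example "http://localhost:8000/v1" (hypothetical).
    """
    url = base_url.rstrip("/") + "/models"
    headers = {"Accept": "application/json"}
    if api_key:
        # Some servers require a bearer token and others ignore it.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(url, headers=headers, method="GET")


def list_model_ids(base_url, api_key=None, timeout=10.0):
    """Return the model ids the endpoint reports; raises if it is unreachable."""
    request = build_models_request(base_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # OpenAI-compatible servers usually wrap the listing in a "data" array.
    return [entry["id"] for entry in payload.get("data", [])]
```

Calling `list_model_ids("http://localhost:8000/v1")` against a live server should return the model names it serves. A connection error at this point suggests fixing the deployment before starting an evaluation run.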