Concepts
The pages in this section cover the design rules behind eval/model_eval.
Read them before you change a configuration default or adapt a recipe to a new deployment, so that you understand why the how-to pages take the actions they do.
Pipeline And Architecture
Pipeline Overview
Artifact flow from a checkpoint or hosted endpoint, through eval/model_eval, into eval_results on disk.
Deployment Contract
Endpoint Types And Benchmark Families
Chat versus completions endpoints, and which benchmark families match each one.
Tokenizer Alignment
Why log-probability benchmarks need a tokenizer that matches the served model.