Concepts

The pages in this section explain the design rules behind eval/model_eval. Read them to understand why the how-to pages prescribe the steps they do, especially before you change a configuration default or adapt a recipe to a new deployment.

Pipeline And Architecture

Pipeline Overview

Artifact flow from a checkpoint or hosted endpoint, through eval/model_eval, into eval_results on disk.

Deployment Contract

Endpoint Types And Benchmark Families

Chat versus completions endpoints, and which benchmark families match each one.

Tokenizer Alignment

Why log-probability benchmarks need a tokenizer that matches the served model.