AIPerf can be driven entirely from a single YAML file instead of a long string of CLI flags. The YAML format is more readable, easier to version-control, and unlocks features that have no CLI equivalent — sweeps, multi-run aggregation, environment variable substitution, and computed values.
This tutorial walks through what a config file looks like, how to grow it from a tiny example to a full sweep, and how it compares to running everything through aiperf profile flags.
You don’t need to choose between the two: CLI flags still work, and they layer on top of a YAML file when you pass both.
A typical concurrency sweep on the command line looks like this:
The same run as a YAML file:
Run it with:
Note the two 500s map to different things. The CLI’s --request-count 500 is the stop condition — keep firing until 500 requests complete — and corresponds to phases.profiling.requests: 500. The dataset.entries: 500 is the dataset size — how many unique synthetic prompts to generate up front — and has no CLI shorthand; it’s recycled across requests if the phase runs longer than the dataset. They happen to share a value here but tune independently.
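The difference between the stop condition and the dataset size can be sketched in a few lines of Python. This is an illustration only: it assumes entries are recycled in simple round-robin order, which this guide does not specify, and `entry_for_request` is a hypothetical helper rather than part of AIPerf.

```python
def entry_for_request(request_index: int, dataset_entries: int) -> int:
    """Return the dataset entry a request would reuse (round-robin assumption)."""
    if dataset_entries <= 0:
        raise ValueError("dataset_entries must be positive")
    return request_index % dataset_entries


# 500 requests against a 200-entry dataset: entries are reused once the
# dataset has been walked through, so the two numbers tune independently.
requests, entries = 500, 200
used = [entry_for_request(i, entries) for i in range(requests)]
```

With these numbers every one of the 200 entries is used at least twice, and the phase still stops after exactly 500 requests.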
What you gain over the flag form:
- Substitute environment variables (${VAR:default}) and compute values with simple expressions ({{ var * 2 }}).

The smallest legal config is short:
Then:
That’s a complete benchmark — model, endpoint, dataset, and one profiling phase. The endpoint path (/v1/chat/completions) is auto-detected from endpoint.type (defaulting to chat).
You can scaffold this exact file from the bundle without typing it:
aiperf config init --list prints every bundled template, grouped by category.
A YAML config has two layers:
The envelope holds settings that apply across runs — sweep definitions, multi-run aggregation, the random seed, and reusable variables.
The benchmark: body holds everything that defines a single benchmark workload. When a sweep is active, this body is what gets varied across runs.
Short configs use singular keys. Bigger configs use plural lists with names:
You can mix and match — the loader auto-expands model: into a one-element models: list, dataset: into a one-entry datasets: list named default, and a flat phases: block into a one-element list named profiling. The normalized datasets: form is future-facing but currently accepts exactly one dataset; multiple datasets are a roadmap item.
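The auto-expansion described above can be pictured roughly as follows. This is a sketch, not the AIPerf loader: it assumes a flat phases: block is a mapping of phase fields, and it represents each named list entry as a dict with a hypothetical `name` key.

```python
from copy import deepcopy


def normalize(body: dict) -> dict:
    """Expand singular shorthand keys into named plural lists."""
    out = deepcopy(body)
    if "model" in out:
        # model: becomes a one-element models: list
        out["models"] = [out.pop("model")]
    if "dataset" in out:
        # dataset: becomes a one-entry datasets: list named "default"
        out["datasets"] = [{"name": "default", **out.pop("dataset")}]
    phases = out.get("phases")
    if isinstance(phases, dict):
        # a flat phases: block becomes a one-element list named "profiling"
        out["phases"] = [{"name": "profiling", **phases}]
    return out


short = {
    "model": "my-model",  # placeholder model name
    "dataset": {"entries": 500},
    "phases": {"concurrency": 8, "requests": 500},
}
normalized = normalize(short)
```

After normalization, dot-paths such as phases.profiling.requests and datasets.default.entries have a named entry to point at, even though the short form never spelled those names out.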
Instead of pointing at a prompts.jsonl file with dataset.path:, you can embed records directly in the YAML:
Useful for shareable repros, k8s ConfigMaps, and small regression fixtures. See Inline Datasets for full coverage including multi-turn, random_pool (with multi-pool dict-of-lists), and mooncake_trace examples.
AIPerf accepts either snake_case or camelCase for any field. These two are equivalent:
Pick one and stick with it within a file.
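One way to think about the equivalence is that camelCase keys are folded into snake_case before validation. The sketch below shows that idea under stated assumptions; the function names are hypothetical and AIPerf may implement the aliasing differently (for example through schema-level aliases).

```python
import re


def to_snake_case(key: str) -> str:
    """Convert a camelCase key to snake_case; snake_case keys pass through unchanged."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", key).lower()


def normalize_keys(value):
    """Recursively normalize every mapping key in a nested config structure."""
    if isinstance(value, dict):
        return {to_snake_case(k): normalize_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [normalize_keys(v) for v in value]
    return value


camel = {"phases": [{"prefillConcurrency": 4, "gracePeriod": 30}]}
snake = normalize_keys(camel)
```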
A bundled JSON Schema gives you autocomplete, type-checking, and inline docs in any editor that supports the YAML language server (VS Code, JetBrains, Vim/Neovim with coc-yaml, Helix, etc.). The schema lives at src/aiperf/config/schema/aiperf-config.schema.json in the AIPerf repo. Copy or symlink it next to your config and point your editor at it with a relative path:
Now the editor will:
- Flag type mismatches (for example, concurrency: "eight" instead of 8).

If your editor already has a workspace mapping for **/aiperf-config.yaml or **/benchmark.yaml, you can skip the header. See src/aiperf/config/schema/README.md for VS Code workspace and IntelliJ configuration examples.
Top-level envelope keys reject unknown names with a “did you mean” hint. Writing sweeps: instead of sweep: produces:
Inside the benchmark: body and inside sweep parameter paths, every section is set to reject unknown fields outright. A typo’d sweep parameter like phases.profiling.concurency (one r) is caught at validate time — aiperf config validate runs the same sweep-expansion pipeline profile does and surfaces the error before any compute is spent:
Use aiperf config validate <file> for routine linting. Use aiperf config expand <file> when you want to preview the actual variations a sweep will produce (see below). Both catch sweep-path typos; expand additionally renders the variation list.
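A "did you mean" hint of this kind is commonly built on fuzzy string matching. The following sketch uses Python's standard difflib to show the general technique; it is not AIPerf's validator, the message format is invented, and the key list contains only the envelope keys named in this guide.

```python
import difflib

# Only the envelope keys this guide mentions by name; the real set is larger.
ENVELOPE_KEYS = ["benchmark", "sweep", "multi_run"]


def check_envelope_keys(config: dict) -> list[str]:
    """Return one error message per unknown top-level key, with a hint if one is close."""
    errors = []
    for key in config:
        if key in ENVELOPE_KEYS:
            continue
        close = difflib.get_close_matches(key, ENVELOPE_KEYS, n=1)
        hint = f" Did you mean '{close[0]}'?" if close else ""
        errors.append(f"Unknown key '{key}'.{hint}")
    return errors


errors = check_envelope_keys({"sweeps": {}, "benchmark": {}})
```

Rejecting unknown keys outright, rather than ignoring them, is what turns a silent misconfiguration into an error you see before the run starts.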
Use ${VAR} for required values and ${VAR:default} for optional ones:
Run it across deployments without editing the file:
Strings are auto-coerced to the right type — TIMEOUT=600.0 becomes a float, STREAMING=true becomes a bool.
If a required ${VAR} is unset, you get a clean error naming the variable, not a silent fallback.
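The substitution rules above (required versus defaulted variables, type coercion, and a named error for missing values) can be sketched like this. It is a minimal illustration that assumes the whole field is one string; `substitute` and `coerce` are hypothetical helpers, and the exact coercion rules AIPerf applies may differ.

```python
import os
import re

# Matches ${VAR} and ${VAR:default}; the default may itself contain colons.
_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::([^}]*))?\}")


def coerce(text: str):
    """Turn a string into bool, int, or float when it looks like one."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def substitute(value: str, env=None):
    """Replace ${VAR} / ${VAR:default} in value, then coerce the result."""
    env = os.environ if env is None else env

    def repl(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        # Required variable with no default: fail loudly and name it.
        raise KeyError(f"required environment variable {name} is not set")

    return coerce(_PATTERN.sub(repl, value))


timeout = substitute("${TIMEOUT:30}", {"TIMEOUT": "600.0"})
streaming = substitute("${STREAMING:false}", {"STREAMING": "true"})
```

Passing an explicit `env` mapping keeps the sketch testable; omitting it falls back to the process environment.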
Define values once at the top, reference them anywhere with {{ }} Jinja expressions:
A few things worth knowing:
- Undefined variables such as {{ base_concurrancy }} raise an error immediately; they don't silently render as an empty string.
- Rendered strings such as "42" and "3.14" are coerced to int/float automatically, so you don't have to remember which fields expect numbers.
- Environment variables and expressions can be combined, for example entries: "{{ base * '${MULT:10}' | int }}".

A typical benchmark is a quick warmup followed by the real measurement. CLI warmup flags are limited to scalar values per phase shape (--warmup-request-count, --warmup-duration, --warmup-concurrency, --warmup-request-rate, --warmup-arrival-pattern, and a handful of ramp/grace-period siblings). YAML lets you describe warmup as a full phase with all the same fields available to profiling:
Each phase is a complete arrival pattern in its own right, with its own concurrency, duration, and arrival shape (concurrency, constant, poisson, gamma, fixed_schedule, user_centric, …).
Sweeps are the killer feature of YAML configs. The CLI only ever supported list-style flags like --concurrency 8,16,32. YAML lets you sweep any field, combine multiple parameters, or pull from a quasi-random distribution.
Here’s a 3 × 3 = 9-run grid sweep over input length and request rate:
The parameters: keys are dot-paths into the benchmark: body. For lists, the second segment is the entry’s name:
- phases.profiling.rate → the phase named profiling, field rate
- datasets.default.prompts.isl → the dataset named default (the singular dataset: shorthand auto-names it default)

The 12 most-swept phase fields also have bare-name sugar: concurrency, prefill_concurrency, rate, requests, duration, sessions, users, smoothness, grace_period, concurrency_ramp, prefill_ramp, rate_ramp. Each expands to phases.profiling.<name> (resolves to the unique non-warmup phase). The two forms are equivalent; see Bare-Name Aliases.
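The dot-path and bare-name rules can be sketched as a small resolver. This is an illustration under stated assumptions: named entries are dicts with a hypothetical `name` key, the non-warmup phase is simply assumed to be called profiling, and a field must already exist to be swept (real validation is schema-driven).

```python
BARE_NAMES = {
    "concurrency", "prefill_concurrency", "rate", "requests", "duration", "sessions",
    "users", "smoothness", "grace_period", "concurrency_ramp", "prefill_ramp", "rate_ramp",
}


def expand_alias(path: str, phase: str = "profiling") -> str:
    """Expand a bare phase-field name into its full dot-path."""
    return f"phases.{phase}.{path}" if path in BARE_NAMES else path


def set_by_path(body: dict, path: str, value) -> None:
    """Set a value at section.name.field[.field...] inside a normalized body."""
    parts = expand_alias(path).split(".")
    if len(parts) < 3:
        raise KeyError(f"path {path!r} must look like section.name.field")
    section, name, *fields = parts
    entry = next((e for e in body.get(section, []) if e.get("name") == name), None)
    if entry is None:
        raise KeyError(f"no entry named {name!r} in {section!r}")
    target = entry
    for field in fields[:-1]:
        target = target[field]  # a KeyError here means the path does not resolve
    if fields[-1] not in target:
        raise KeyError(f"unknown field {fields[-1]!r} in path {path!r}")
    target[fields[-1]] = value


body = {
    "phases": [{"name": "profiling", "rate": 10, "concurrency": 8}],
    "datasets": [{"name": "default", "prompts": {"isl": 128}}],
}
set_by_path(body, "rate", 50)                          # bare-name form
set_by_path(body, "datasets.default.prompts.isl", 512)  # full dot-path form
```

A misspelled path such as phases.profiling.concurency fails in `set_by_path` with a KeyError, which mirrors the validate-time rejection described earlier.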
Other sweep modes available in YAML:
- zip: pair parameters in lockstep instead of cross-product (useful for paired ISL/OSL).
- scenarios: hand-curated named workload profiles, each a deep-merge over the base body.
- sobol / latin_hypercube: quasi-random space-filling samples.
- adaptive_search: Bayesian optimization over multiple objectives.

For a guided picker, see Parameter Sweeps, "Choosing a sweep mode".
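The difference between a grid sweep and a zip sweep comes down to cross-product versus lockstep pairing. The sketch below shows both expansions with illustrative parameter values; it is not AIPerf's expansion code, and the equal-length requirement for zip is an assumption.

```python
from itertools import product


def grid(parameters: dict) -> list[dict]:
    """Cross-product: every combination of every parameter value."""
    keys = list(parameters)
    return [dict(zip(keys, combo)) for combo in product(*parameters.values())]


def zipped(parameters: dict) -> list[dict]:
    """Lockstep pairing: the i-th value of each parameter goes into run i."""
    lengths = {len(values) for values in parameters.values()}
    if len(lengths) != 1:
        raise ValueError("zip requires all parameter lists to have the same length")
    keys = list(parameters)
    return [dict(zip(keys, combo)) for combo in zip(*parameters.values())]


params = {
    "datasets.default.prompts.isl": [128, 512, 2048],  # illustrative values
    "phases.profiling.rate": [10, 20, 40],             # illustrative values
}
grid_runs = grid(params)    # 3 x 3 = 9 variations
zip_runs = zipped(params)   # 3 variations, paired by position
```

Grid cost grows multiplicatively with each added parameter, which is why zip or a space-filling mode is often the better fit once more than two or three fields vary.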
You can preview what a sweep will run before spending any compute:
Running the same benchmark several times and taking the mean ± confidence interval is a separate envelope-level setting:
multi_run and sweep compose: a 9-variation grid × 3 runs = 27 benchmarks, with confidence intervals computed per variation. See Multi-Run Confidence Reporting for what the report looks like.
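The arithmetic behind the 27 benchmarks, and the idea of a per-variation mean with a confidence interval, can be sketched as follows. The throughput numbers are made up for illustration, and the interval uses a normal approximation as a simplifying assumption; with only a handful of runs a t-distribution is the more appropriate choice, and this guide does not say which method AIPerf uses.

```python
from statistics import NormalDist, mean, stdev


def mean_with_ci(samples: list[float], confidence: float = 0.95) -> tuple[float, float]:
    """Return (mean, half_width) of a confidence interval using a normal approximation."""
    if len(samples) < 2:
        raise ValueError("need at least two runs to estimate a confidence interval")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(samples) / len(samples) ** 0.5
    return mean(samples), half_width


variations, runs_per_variation = 9, 3
total_benchmarks = variations * runs_per_variation  # 27

# Made-up results for the three runs of a single variation.
center, half_width = mean_with_ci([100.0, 104.0, 96.0])
```

Each variation gets its own interval, so a 9-variation sweep with 3 runs each yields nine separate mean ± interval figures rather than one pooled number.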
Three commands cover the common authoring tasks:
validate runs the same load pipeline profile does, so anything wrong shows up here — typos, missing required fields, sweep paths that don’t resolve, env vars that aren’t set.
YAML configs and CLI flags are not either/or. Flags overlay whatever’s in the file:
This loads benchmark.yaml as the base, then overrides the profiling phase’s concurrency with 32 and the artifact directory with the new path. (CLI loadgen flags overlay onto the phase named profiling — they don’t broadcast to every named phase, so multi-phase configs need YAML edits to tweak warmup or other phases.) Useful when most of your config is stable but you want to tweak one knob from a script or CI job.
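The overlay behaviour described above, where load-generation flags touch only the phase named profiling, can be sketched as a targeted merge. This is an illustration, not AIPerf's merge logic: `overlay_cli` is a hypothetical helper, `artifact_dir` is a hypothetical key name, and phases are represented as dicts with a `name` key.

```python
from copy import deepcopy


def overlay_cli(config: dict, phase_overrides: dict, top_level: dict) -> dict:
    """Apply CLI-style overrides on top of a loaded config without mutating it."""
    merged = deepcopy(config)
    for phase in merged["phases"]:
        # Loadgen overrides land on the profiling phase only, not on warmup.
        if phase["name"] == "profiling":
            phase.update(phase_overrides)
    merged.update(top_level)
    return merged


base = {
    "artifact_dir": "./artifacts",  # hypothetical key name
    "phases": [
        {"name": "warmup", "concurrency": 4},
        {"name": "profiling", "concurrency": 8},
    ],
}
result = overlay_cli(base, {"concurrency": 32}, {"artifact_dir": "./artifacts/c32"})
```

The warmup phase keeps its concurrency of 4, which is why multi-phase tweaks still need an edit to the YAML file itself.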
The precedence order, lowest to highest:
- Bundled templates, grouped by category (Getting Started, Load Testing, Datasets, Sweep & Multi-Run, Advanced, Multimodal, Specialized Endpoints).
- How multi_run propagates through reports.