This document is forward-looking. The shapes, field names, and behaviors described below are not all wired end-to-end yet. Some sections describe seams that exist in the code but are not reachable from a config file; others describe features that are still at the design stage. Do not treat any YAML in this document as a working example unless it appears in YAML Configuration Files. Field names may change before they ship.
This document describes planned extensions to the YAML configuration format. It exists so that contributors and power users can see where the format is headed, why the seams in the current loader were placed where they were, and which workloads will become expressible once the missing pieces land.
For the format as it works today, see YAML Configuration Files. For the schema, see src/aiperf/config/schema/aiperf-config.schema.json.
The v2 envelope is partway between single-config and the multi-phase / multi-dataset shape this document targets. The seams are intentional, but several stop short of being usable end-to-end.
What works today:
- benchmark.models is a ModelsAdvanced block (src/aiperf/config/models.py:113) with items: list[ModelItem] and a strategy field (round_robin, random, or weighted). modality_aware is roadmap-only and is not accepted by the current validator. The singular model: shorthand is normalized into the items list (src/aiperf/config/loader/normalizers.py:79-89). Multi-model in one run is a real feature, not a roadmap item.
- benchmark.phases: [...] is a list, validated as a discriminated union over phase types. The singular phases: { type: ..., ... } shorthand is normalized to a one-entry list named profiling (src/aiperf/config/loader/normalizers.py:99-103). Top-level warmup: / profiling: shorthand is normalized to a [warmup, profiling] list.
- dataset: is auto-promoted to a one-entry list with name: "default" (src/aiperf/config/loader/normalizers.py:92-97).
- Sweep expansion already addresses phases by name (src/aiperf/config/sweep/expand.py); see the phases.profiling.<X> special case at expand.py:472-477.

What does not yet hold end-to-end:
- BasePhaseConfig.name is typed as Literal["warmup", "profiling"] (src/aiperf/config/phases.py:71-80). Multiple phases of the same kind are allowed, but they must reuse one of those two canonical names. Truly user-named phases are not plumbed through credit issuance, the timing manager, the records pipeline, or the report layout.
- benchmark.datasets is hard-capped at one entry. The field is list[DatasetConfig] with min_length=1, max_length=1 (src/aiperf/config/config.py:166-177). The list shape exists only so the same schema can be shared between YAML and the AIPerfSweep CRD; the field’s own description states “the runtime currently loads exactly one dataset.” Multiple-dataset input is rejected at validation time, not at runtime.
- TimingResolver._validate_fixed_schedule_timing reads a per-phase dataset via getattr(phase, "dataset", None) or run.cfg.get_default_dataset_name() (src/aiperf/config/resolution/resolvers.py:353-355), but no dataset: field exists on BasePhaseConfig yet, so the lookup always falls through to the default. The seam is anticipating a feature that hasn’t landed.
- check_phase_dataset_compatibility (src/aiperf/config/resolution/predicates.py:201-243) currently rejects only two combinations: a phase that requires_sequential_sampling (today, just fixed_schedule) against a file dataset that doesn’t use sequential sampling, and a phase that requires_multi_turn (today, just user_centric) against a non-multi-turn file dataset. Other compatibility axes (synthetic-vs-trace for fixed_schedule, dataset format mismatches) are not yet enforced here.

The roadmap items below describe how each of those gaps closes.
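To make the current state concrete, here is a minimal sketch of the shorthand normalization and of the per-phase dataset lookup that always falls through. It is not the code in normalizers.py or resolvers.py: the function names, the PhaseStub class, the plain-dict representation, and the name key on model items are assumptions made for illustration only.

```python
from copy import deepcopy


def normalize_benchmark(raw: dict) -> dict:
    """Expand singular shorthands into list shapes (illustrative only)."""
    cfg = deepcopy(raw)

    # model: <name>  ->  models: {items: [{name: <name>}]}
    # The "name" key on a model item is an assumption.
    if "model" in cfg:
        cfg["models"] = {"items": [{"name": cfg.pop("model")}]}

    # dataset: {...}  ->  datasets: [{name: "default", ...}]
    if "dataset" in cfg:
        cfg["datasets"] = [{"name": "default", **cfg.pop("dataset")}]

    # phases: {type: ...}  ->  phases: [{name: "profiling", type: ...}]
    if isinstance(cfg.get("phases"), dict):
        cfg["phases"] = [{"name": "profiling", **cfg["phases"]}]

    # top-level warmup: / profiling:  ->  phases: [warmup, profiling]
    if "phases" not in cfg and ("warmup" in cfg or "profiling" in cfg):
        cfg["phases"] = [
            {"name": key, **cfg.pop(key)}
            for key in ("warmup", "profiling")
            if key in cfg
        ]
    return cfg


class PhaseStub:
    """Hypothetical stand-in for a phase config with no dataset attribute."""

    def __init__(self, name: str) -> None:
        self.name = name


def resolve_phase_dataset(phase, default_name: str = "default") -> str:
    # Same getattr(...) or default pattern as the seam described above:
    # with no dataset attribute on the phase, the default always wins.
    return getattr(phase, "dataset", None) or default_name
```

Because PhaseStub has no dataset attribute, resolve_phase_dataset(PhaseStub("profiling")) returns the default name, which mirrors why the seam is inert today.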
Two phases (one warmup, one profiling) cover most synthetic load tests, but that shape runs out of expressivity quickly:
All of these are expressible in YAML today only by collapsing distinct logical phases under the same name (profiling, profiling, profiling) and disambiguating later by index, which loses the clarity the named-phase shape was meant to give.
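One way a phase record could avoid that collapse is to carry a free-form name next to a separate kind. The sketch below is hypothetical throughout: the NamedPhase class, the identifier regex, and the choice to require an explicit kind for non-canonical names are assumptions, not decided behavior.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical permissive identifier pattern; the real regex is undecided.
_NAME_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_-]*$")
_KINDS = ("warmup", "profiling")


@dataclass
class NamedPhase:
    name: str
    kind: Optional[str] = None

    def __post_init__(self) -> None:
        if not _NAME_RE.match(self.name):
            raise ValueError(f"invalid phase name: {self.name!r}")
        if self.kind is None:
            # The two canonical names imply their kind. Requiring an
            # explicit kind for every other name is an assumption.
            if self.name not in _KINDS:
                raise ValueError(f"phase {self.name!r} needs an explicit kind")
            self.kind = self.name
        if self.kind not in _KINDS:
            raise ValueError(f"invalid phase kind: {self.kind!r}")

    @property
    def exclude_from_results(self) -> bool:
        # Driven by kind, not by string equality on name.
        return self.kind == "warmup"
```

With this shape, NamedPhase("steady_state_profile", kind="profiling") and NamedPhase("spike", kind="profiling") stay distinguishable by name while sharing the same result-handling behavior.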
Key changes:
- name becomes free-form (validated against a permissive identifier regex), rather than a Literal.
- A new kind field carries the warmup-vs-profiling distinction the credit pipeline currently derives from the name. exclude_from_results is then driven by kind, not by string equality on name.
- Sweep parameter paths address phases by their names (for example, phases.steady_state_profile.rate).
- For backward compatibility, name: warmup defaults to kind: warmup, and name: profiling defaults to kind: profiling.

End-to-end naming touches roughly five layers:
- src/aiperf/config/phases.py: BasePhaseConfig.name: str, plus a new kind: Literal["warmup", "profiling"] field with name-based defaults.
- Credit pipeline (PhaseRunner and CreditIssuer): index phases by name rather than by an is_warmup boolean.
- Sweep expansion (src/aiperf/config/sweep/expand.py): already addresses phases by name; minor changes are needed only if the keying logic assumes the two-element set.

datasets: is a one-element list today: the field declares min_length=1, max_length=1 so the schema can be shared with the AIPerfSweep CRD without forking. Lifting the cap is the prerequisite for every workload below.
- Lift the max_length=1 cap on BenchmarkConfig.datasets in src/aiperf/config/config.py:166-177, replacing the schema-share comment with a real multi-dataset contract.
- Add dataset: <name> to BasePhaseConfig so the partial scaffolding at src/aiperf/config/resolution/resolvers.py:353-355 becomes a real read instead of always falling through to get_default_dataset_name().
- Validate that phase.dataset resolves to an entry in benchmark.datasets. Use the existing “did you mean?” hinting infrastructure for typos.
- Extend check_phase_dataset_compatibility (src/aiperf/config/resolution/predicates.py:201-243). Today it only checks requires_sequential_sampling (file-dataset sampling strategy) and requires_multi_turn (file-dataset format). Add: synthetic-vs-trace mismatches for fixed_schedule, dataset-format compatibility per phase type, and any rules that fall out of multi-dataset semantics. The fixed-schedule timing-data check in TimingResolver._validate_fixed_schedule_timing (src/aiperf/config/resolution/resolvers.py:347-362) can move here once it has a real phase.dataset to read.

The user_centric and fixed_schedule constraints are partially enforced today: requires_multi_turn(USER_CENTRIC) and requires_sequential_sampling(FIXED_SCHEDULE) are checked against file datasets in check_phase_dataset_compatibility. The synthetic-vs-fixed_schedule rejection and the timing-data check (currently in TimingResolver._validate_fixed_schedule_timing) move into the same checker as part of this work.
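The reference validation and the extended compatibility rules could look roughly like the sketch below. Everything here is illustrative: the Dataset and Phase classes, their fields, and the error wording are assumptions, and difflib from the standard library stands in for the project's own “did you mean?” infrastructure, whose API is not shown in this document.

```python
import difflib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dataset:
    name: str
    kind: str                 # "file" or "synthetic" (hypothetical labels)
    sequential: bool = False  # uses sequential sampling
    multi_turn: bool = False  # multi-turn format


@dataclass
class Phase:
    name: str
    type: str                 # e.g. "fixed_schedule", "user_centric"
    dataset: Optional[str] = None


def resolve_dataset(phase: Phase, datasets: dict, default_name: str) -> Dataset:
    """Resolve phase.dataset by name, falling back to the default entry."""
    wanted = phase.dataset or default_name
    if wanted not in datasets:
        close = difflib.get_close_matches(wanted, list(datasets), n=1)
        hint = f" Did you mean {close[0]!r}?" if close else ""
        raise ValueError(
            f"phase {phase.name!r}: unknown dataset {wanted!r}.{hint}"
        )
    return datasets[wanted]


def check_compatibility(phase: Phase, ds: Dataset) -> list:
    """Return a list of human-readable problems; empty means compatible."""
    errors = []
    if phase.type == "fixed_schedule":
        if ds.kind == "synthetic":
            # Planned rule: synthetic-vs-trace mismatch.
            errors.append("fixed_schedule needs a trace dataset, not synthetic")
        elif not ds.sequential:
            # Rule enforced today for file datasets.
            errors.append("fixed_schedule needs sequential sampling")
    if phase.type == "user_centric" and ds.kind == "file" and not ds.multi_turn:
        # Rule enforced today for file datasets.
        errors.append("user_centric needs a multi-turn dataset")
    return errors
```

Returning a list of problems rather than raising on the first one is a design choice of this sketch; it lets a validator report every incompatible phase in one pass.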
Multi-model in one run is already supported via ModelsAdvanced.strategy (round_robin, random, weighted): a single phase can route across the full items list. modality_aware remains roadmap-only. What is not supported is binding a specific model to a specific phase, which would let you compare two models within one job under matched arrival patterns.
phases[].model would be a name reference into models.items, narrowing the selection strategy to a single fixed pick for the duration of the phase. This stays compatible with the project’s no-aggregate-across-runs rule: each phase’s results are reported independently, and the report makes the model name part of the phase header.
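The narrowing behavior can be sketched as a picker factory: with no per-phase model it applies the run-level strategy, and with one it returns a single fixed pick. The make_model_picker name and the list-of-strings representation of models.items are assumptions, and only round_robin is shown.

```python
import itertools
from typing import Callable, List, Optional


def make_model_picker(
    items: List[str], phase_model: Optional[str] = None
) -> Callable[[], str]:
    """Return a zero-argument function that yields the next model name."""
    if not items:
        raise ValueError("models.items must not be empty")

    if phase_model is not None:
        # A per-phase model is a name reference into models.items.
        if phase_model not in items:
            raise ValueError(f"unknown model reference: {phase_model!r}")
        # Fixed pick for the duration of the phase.
        return lambda: phase_model

    # No per-phase binding: fall back to round_robin across all items.
    cycle = itertools.cycle(items)
    return lambda: next(cycle)
```

Two phases bound to different models but sharing the same arrival settings would then each get their own picker, which keeps their results independent.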
Most users will not need this, but it falls out cleanly once datasets and models are per-phase: a phase that targets a different deployment (different URL, different endpoint.type) can be expressed without a separate job. Useful for side-by-side gateway-vs-direct comparisons or for benchmarking a fallback path. Likely gated behind explicit opt-in to discourage accidental misconfiguration.
The current model assumes a strict linear ordering of phases[]. Several enhancements compose:
These are deliberately listed as separate items: each is independently useful, and we should not bundle them into a single “phases v3” change.
Once configs grow to four or five phases, repetition becomes the readability problem. Two complementary mechanisms:
- A templates: block under the envelope defines a named partial config; a phase or dataset entry references it with extends: <name>. Resolution happens before sweep expansion so sweep parameter paths still address concrete phases.

Items deliberately not on this roadmap:
- {{ }} is intentionally restricted; arbitrary Python is not coming back.