Sweep + Orchestrator Developer Reference
Developer reference for AIPerf’s sweep + adaptive-search machinery. Three zoom levels below: mental model -> seven-stage tour -> class/module map.
Part 1 — Mental model
The big idea
Every AIPerf run — single benchmark, parameter grid, or Bayesian search — is the same pipeline with different cardinalities:
One pipeline, every scale — by design. A single benchmark, a local multi-run for confidence intervals, a grid or scenarios sweep, a Sobol or Latin-Hypercube characterization, a local Bayesian-Optimization adaptive search, and a cluster-side BO running across hundreds of pods are not separate code paths. They are different cardinalities of one pipeline:
BenchmarkPlan describes what to run, MultiRunOrchestrator decides when and in what order, a SearchPlanner (optional) decides what to try next, and a RunExecutor decides how to actually run one cell. Each piece owns exactly one concern and knows nothing about the others. That separation is what makes the system extensible without churn. Want a new sweep shape? Add a discriminated-union variant to SweepConfig — expand_sweep does the rest, and every executor, exporter, and analyzer picks it up for free.
Want a new planner (a different acquisition function, a 1D SLA-saturation algorithm, a multi-fidelity scheme)? Implement SearchPlanner.ask/tell and register it under the search_planner plugin category — the orchestrator and executors don't change. Want to run the whole thing on Kubernetes? Implement RunExecutor.execute to create an AIPerfJob CR and HTTP-pull its results instead of forking a subprocess (this is the coming-soon K8sChildJobExecutor) — the plan, orchestrator, planner, analyzer, and exporters are reused byte-for-byte. The progression from a single-shot aiperf profile to a cluster-distributed BO search isn't a rewrite; it's the same machinery at a different cardinality with a different executor at the bottom.
Execution
Today only LocalSubprocessExecutor ships: aiperf profile -f config.yaml runs the orchestrator in the same Python process and forks aiperf.orchestrator.subprocess_runner per cell.
Cluster execution (coming soon). K8sChildJobExecutor lives on the K8s integration branch (not main yet). It runs in-cluster in a sweep-controller pod (from an AIPerfSweep CR). Each cell becomes an AIPerfJob CR, watched to completion; the operator results server supplies the child export (same shape as local). Orchestrator logic is unchanged—only RunExecutor differs. CLI: aiperf kube sweep (alongside aiperf kube profile).
Key types
The whole flow uses about a dozen types. If you know these, you can read any sweep code.
End-to-end pipeline (canonical)
The orchestrator forks a subprocess per cell at stage 6; aggregation is pure post-hoc compute over the collected RunResults. YAML configs reach AIPerfConfig directly through load_config → AIPerfConfig.model_validate; only CLI flags travel through CLIConfig first so cyclopts can parse magic-list affordances (--concurrency 1,2,4) before they’re lifted into a typed SweepConfig.
What happens between runs (per-cell loop)
A “cell” is one (variation, trial) slot. Inside a cell, an ExecutionStrategy decides whether to keep going. FixedTrialsStrategy stops after M trials. AdaptiveStrategy (selected automatically when multi_run.convergence is set) keeps going until a ConvergenceCriterion is satisfied, capped by multi_run.num_runs. Around each executor.execute(run), the orchestrator threads cancel-checking, sweep-wide failure thresholds, and inter-run cooldowns. Two distinct cooldown fields are in play: multi_run.cooldown_seconds (between trials within a cell, returned by strategy.get_cooldown_seconds()) and sweep.cooldown_seconds (between variations, applied in the outer loop).
The strategy is fresh per cell in INDEPENDENT mode, so adaptive trial-convergence resets between variations. In REPEATED mode there’s only one trial per cell — the “outer trial loop” replays the whole grid.
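A compressed sketch of that per-cell loop, with the strategy reduced to a callable; cancel-checking and the sweep-wide failure threshold are omitted, and every name below is a paraphrase of the behavior described above rather than the orchestrator's actual internals:

```python
import time
from typing import Any, Callable

def run_cell(
    execute_trial: Callable[[int], Any],           # runs one trial, returns a RunResult-like object
    should_continue: Callable[[list[Any]], bool],  # the ExecutionStrategy decision
    cooldown_seconds: float = 0.0,                 # multi_run.cooldown_seconds (between trials in a cell)
) -> list[Any]:
    """One (variation, trial) cell: keep running trials until the strategy says stop."""
    results: list[Any] = []
    while should_continue(results):
        results.append(execute_trial(len(results)))
        if should_continue(results):               # only cool down if another trial is coming
            time.sleep(cooldown_seconds)
    return results

# FixedTrialsStrategy behaviour is just "len(results) < M":
def fixed_trials(m: int) -> Callable[[list[Any]], bool]:
    return lambda results: len(results) < m
```

sweep.cooldown_seconds (between variations) sits in the outer loop around calls like this, not inside the cell.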
REPEATED vs INDEPENDENT — loop nesting
Two ways to traverse the same N variations × M trials grid. sweep.iteration_order picks; default is REPEATED. The numbers below are the order in which cells execute (example: 3 variations, 3 trials).
REPEATED interleaves trials across variations so transient effects (warm caches, thermal drift) hit every variation similarly — better for cross-variation comparison. INDEPENDENT runs one variation to completion before moving on — required for convergence-based adaptive trials, since a strategy needs to observe all of one cell’s results in sequence. Cooldowns and per-cell strategy reuse follow from the nesting; see MultiRunOrchestrator.
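A minimal illustration of the two nestings, using the same 3 variations × 3 trials example; only the loop order differs:

```python
variations, trials = ["A", "B", "C"], 3

# REPEATED (default): outer loop over trials, inner loop over variations.
repeated = [(v, t) for t in range(trials) for v in variations]
# -> A0 B0 C0  A1 B1 C1  A2 B2 C2   (trials interleaved across variations)

# INDEPENDENT: outer loop over variations, inner loop over trials.
independent = [(v, t) for v in variations for t in range(trials)]
# -> A0 A1 A2  B0 B1 B2  C0 C1 C2   (each variation runs to completion first)
```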
Artifact directory layout reference
The artifact tree branches on three flags: whether a sweep is configured
(is_sweep), whether multiple trials run per cell (trials > 1), and
the sweep iteration order (REPEATED vs INDEPENDENT). Implemented in
_resolve_artifact_dir in src/aiperf/orchestrator/orchestrator.py.
<dir_name> is the {leaf_param_name}_{value} form (e.g.
concurrency_10, request_rate_5.0); multi-dim sweep cells join
components with __ (e.g. concurrency_10__isl_512). Inner-dir
naming is asymmetric on purpose — the no-sweep multi-run case uses
run_NNNN, the sweep + INDEPENDENT case uses trial_NNNN. Downstream
consumers (plotters, dashboards) account for this asymmetry.
The sweep-level aggregate path follows a parallel rule:
- REPEATED + multi-run: <base>/aggregate/sweep_aggregate/
- everything else (sweep-only, sweep + INDEPENDENT): <base>/sweep_aggregate/
Per-variation aggregates land at <base>/aggregate/<dir_name>/ in
REPEATED mode and <base>/<dir_name>/aggregate/ otherwise (INDEPENDENT
is the explicit default fallback in _per_variation_aggregate_dir; any
non-REPEATED mode takes the else branch).
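A minimal sketch of that branching; base, dir_name, and the helper names below are illustrative stand-ins, not the actual functions in orchestrator.py:

```python
from pathlib import Path

def sweep_aggregate_dir(base: Path, is_repeated: bool, multi_run: bool) -> Path:
    # REPEATED + multi-run nests under aggregate/; everything else is flat.
    if is_repeated and multi_run:
        return base / "aggregate" / "sweep_aggregate"
    return base / "sweep_aggregate"

def per_variation_aggregate_dir(base: Path, dir_name: str, is_repeated: bool) -> Path:
    # Any non-REPEATED order (i.e. INDEPENDENT) takes the else branch.
    if is_repeated:
        return base / "aggregate" / dir_name
    return base / dir_name / "aggregate"
```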
Adaptive outer loop (ask / tell)
Adaptive search is the same pipeline with one swap: instead of “expand a fixed grid into N configs up front,” the planner generates configs one at a time, learning from each result.
- The sweep block is AdaptiveSearchSweep (type: adaptive_search) instead of GridSweep / ZipSweep / ScenarioSweep.
- BenchmarkPlan.configs starts with one seed config; the planner extends it as it asks.
- MultiRunOrchestrator dispatches to execute_adaptive_search, which runs planner.ask() -> execute trials -> planner.tell(results) until planner.ask() returns None (or cancellation / abort).
- Four planner plugins ship: BayesianSearchPlanner (curated Optuna+BoTorch preset; auto-selects qLogNEI / qLogNEHVI based on objective count), MonotonicSLASearchPlanner (1D probe + bisection), SmoothIsotonicSLAPlanner (isotonic regression on bootstrap-resampled trials), and OptunaSearchPlanner (TPE / GP / BoTorch samplers, expert-mode flag exposure). The BayesianSearchPlanner is implemented as a thin subclass of OptunaSearchPlanner that locks in the BoTorch sampler and the curated acquisition; it is not a separate engine.
- Optional search_recipe plugins build the whole AdaptiveSearchSweep from a higher-level recipe (e.g. max-concurrency-under-sla, prefill-ttft-curve, pareto-sweep).
- An optional post_process handler (degradation_knee_detect, ttft_curve_fit, itl_surface_fit, sla_breach_knee, pareto_sweep_export) runs after the final iteration.
Each iteration adds one SearchIteration to planner.history(). Convergence terminates the loop via planner.ask() returning None; the reason (plateau / improvement-patience / max-iterations) comes from planner.convergence_reason(). search_history.json is rewritten after every iteration so a crashed sweep still has a usable trail.
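A sketch of that outer loop; run_trials and write_history stand in for the orchestrator's real trial-execution and history-writer code:

```python
def adaptive_search_loop(planner, run_trials, write_history):
    """Sketch of the ask/tell loop: ask -> run trials -> tell, until ask() returns None."""
    history = []
    while (proposal := planner.ask()) is not None:   # None => converged / max_iterations reached
        config, variation = proposal
        results = run_trials(config, variation)      # M trials at the proposed point
        planner.tell(results)                        # planner learns from the aggregate
        history.append((variation, results))
        write_history(history)                       # search_history.json rewritten every iteration
    return planner.convergence_reason()
```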
Fan-out math
The cardinality of any sweep is N variations × M trials = N×M cells. Where N and M come from depends on the path.
For adaptive search, N is the iteration count: bounded above by max_iterations, possibly less if the planner converges early. M (trials per iteration) still applies — adaptive runs M trials per planner-proposed point, then tell()s the planner the aggregate.
Where the code lives
For a fully-indexed file map covering every entry point, see Where to look in the code in Part 3.
Part 2 — Seven-stage tour
A guided tour of the sweep / multi-run / adaptive-search flow, focused on the big picture and the names of the types that move data between stages. Read this when you want to know what happens when you press enter.
The seven stages
Every aiperf profile invocation walks the same seven stages. The shape of each
stage’s input and output is named — those names are the things to remember.
The pipeline doesn’t change shape between a single benchmark, a multi-run, a grid sweep, a scenarios sweep, or a Bayesian search. Only how many cells stage 5 produces and what decides each next cell changes:
Each cell is one BenchmarkRun -> one RunResult. The next section unpacks
the N / M dimensions in detail.
Two dimensions: N variations × M trials
The sweep cardinality has two independent dimensions. Mixing them up is the single most common source of “wait, why did this run that many times?” surprise.
N comes from SweepConfig (the sweep block on AIPerfConfig):
the sweep.parameters cartesian product, the runs[] list, or the planner’s
proposals. Without a sweep block, N = 1.
M comes from MultiRunConfig (the multi-run block on AIPerfConfig): multi_run.num_runs trials per cell, default 1.
When M > 1, SweepAnalyzer.compute automatically produces a confidence
block (mean / std / 95% CI) per metric per variation. When M = 1 you get
point estimates only.
Trials are not iterations. For an adaptive search, --search-max-iterations
controls N (how many points the planner gets to try), and --num-profile-runs
controls M (how many times each proposed point is benchmarked before the
planner sees the aggregate). They multiply: a max_iterations=30, num_profile_runs=3
adaptive run executes up to 90 subprocess benchmarks.
Stages 1–2 — User input -> typed config
Two entry points converge on the same typed envelope AIPerfConfig, but by
different paths. YAML skips CLIConfig entirely: load_config /
load_config_from_string parse the file and call AIPerfConfig.model_validate
directly. CLI flags are parsed by cyclopts into a CLIConfig (the
human-friendly, CLI-shaped surface), then convert_cli_to_aiperf lifts magic
flags into the typed envelope. From here on, AIPerfConfig is the single
source of truth.
Why the CLI -> envelope hop? CLIConfig is the human-friendly CLI shape — magic-lists like
--concurrency 1,2,4, --prefill-concurrency 1,2,4, or --request-rate 10,20,50 mean
“sweep that field over those values.” The converter lifts those affordances into a typed
sweep block on AIPerfConfig. After conversion, every flag has one canonical home in the
envelope. YAML configs don’t need this hop — they’re already written in envelope shape, so
load_config constructs AIPerfConfig directly via model_validate and skips CLIConfig
entirely.
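A minimal sketch of the YAML entry path, assuming standard Pydantic v2 model_validate semantics; the stand-in model below is drastically simplified compared with the real envelope:

```python
import yaml
from pydantic import BaseModel

class AIPerfConfig(BaseModel):
    """Stand-in for the real envelope; the actual model has many more typed fields."""
    model_config = {"extra": "allow"}

def load_config_sketch(path: str) -> AIPerfConfig:
    # YAML path: parse the file and validate straight into the typed envelope.
    with open(path) as f:
        return AIPerfConfig.model_validate(yaml.safe_load(f))

# CLI path (conceptually): cyclopts parses flags into a CLIConfig, then a converter
# lifts magic-list flags like --concurrency 1,2,4 into a typed sweep block before
# the same AIPerfConfig envelope is constructed.
```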
SweepConfig is a discriminated union
Pydantic discriminates by a type field on the YAML / dump (each variant sets a
default for type, so YAML authors do not need to write it explicitly). The orchestrator
never inspects the variant directly — it reads BenchmarkPlan.is_adaptive_search,
which is true exactly when the variant is AdaptiveSearchSweep.
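A sketch of the discriminated-union mechanic with Pydantic v2; the variant fields and defaults shown are illustrative, and only the type discriminator matches the description above:

```python
from typing import Annotated, Literal, Union
from pydantic import BaseModel, Field

class GridSweep(BaseModel):
    type: Literal["grid"] = "grid"
    parameters: dict[str, list[float]] = Field(default_factory=dict)

class AdaptiveSearchSweep(BaseModel):
    type: Literal["adaptive_search"] = "adaptive_search"
    max_iterations: int = 30

# The union is discriminated by the `type` field (in the real config each variant
# also defaults it, so YAML authors rarely need to write it explicitly).
SweepConfig = Annotated[Union[GridSweep, AdaptiveSearchSweep], Field(discriminator="type")]

class Envelope(BaseModel):
    sweep: SweepConfig | None = None

cfg = Envelope.model_validate({"sweep": {"type": "adaptive_search"}})
assert isinstance(cfg.sweep, AdaptiveSearchSweep)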
Stage 3 — Expand into a BenchmarkPlan
AIPerfConfig describes intent; BenchmarkPlan lists the actual cells the
orchestrator will run. The plan-builder either short-circuits to a single seed
variation (for adaptive runs and for no-sweep runs) or calls expand_sweep
(cartesian product for grid, lockstep zip for zip, deep-merge for scenarios —
also Sobol / Latin-hypercube for QMC sweeps), then renders any per-variation
Jinja and emits one BenchmarkConfig per variation.
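A sketch of the two simplest expansion shapes on plain dicts; the real expand_sweep operates on typed configs and also handles scenarios deep-merge and QMC sampling:

```python
from itertools import product

def expand_grid(base: dict, parameters: dict[str, list]) -> list[dict]:
    """Cartesian-product expansion: one merged config dict per variation."""
    names = list(parameters)
    return [
        {**base, **dict(zip(names, combo))}
        for combo in product(*(parameters[n] for n in names))
    ]

def expand_zip(base: dict, parameters: dict[str, list]) -> list[dict]:
    """Lockstep (zip) expansion: the i-th value of every parameter forms variation i."""
    return [
        {**base, **dict(zip(parameters, values))}
        for values in zip(*parameters.values())
    ]

# expand_grid({"model": "m"}, {"concurrency": [1, 2], "isl": [128, 512]}) -> 4 variations
# expand_zip({"model": "m"}, {"concurrency": [1, 2], "isl": [128, 512]})  -> 2 variations
```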
A few useful invariants:
- SweepVariation — {index, label, values}. One per variation. values is the dict of swept parameters that differ from the base config; the label is built from those for artifact directory names.
- trials = M comes from MultiRunConfig.num_runs (default 1, max 10). It's the per-cell repeat count for confidence aggregation, not the total run count.
- For adaptive search, configs starts with one seed and grows as the planner asks. The plan-builder doesn't know the final length up front.
- plan.is_adaptive_search is the orchestrator's only branch on the sweep variant — every other piece of code is variant-agnostic.
Stages 4–5 — Orchestrator dispatch
MultiRunOrchestrator.execute(plan, executor, search_planner=...) is the single
entry point. It dispatches on plan.is_adaptive_search:
Grid / scenarios — REPEATED vs INDEPENDENT
For an N×M grid (N variations, M trials), there are two ways to interleave the
work. Both produce the same N×M cells; they differ only in which loop is outer.
iteration_order is a field on the grid family of sweeps (GridSweep, ZipSweep,
ScenarioSweep); AdaptiveSearchSweep does not expose this knob.
REPEATED is the default. It interleaves so transient effects (warm caches,
thermal drift) hit every variation similarly — better for cross-variation
comparison. INDEPENDENT runs one variation to completion before moving on;
required when each variation needs its own ExecutionStrategy to observe a full
cell’s worth of results before deciding to stop (the adaptive trial-convergence
case).
Adaptive — ask / tell loop
When plan.is_adaptive_search is true, execute_adaptive_search runs a tighter
loop driven by a SearchPlanner:
Stage 6 — Inside one cell
A “cell” is one (variation, trial) slot. Every cell runs a small state machine
driven by an ExecutionStrategy:
Three collaborators inside the cell — two ABCs and one Pydantic model:
FixedTrialsStrategy runs exactly M trials. AdaptiveStrategy runs until a
ConvergenceConfig says enough — capped by multi_run.num_runs so it can’t run
forever.
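A sketch of the strategy seam; method names are paraphrased from the description above, and the convergence criterion is reduced to a plain callable:

```python
from abc import ABC, abstractmethod

class ExecutionStrategy(ABC):
    """Decides, per cell, whether another trial should run (illustrative names)."""

    @abstractmethod
    def should_continue(self, results: list) -> bool: ...

    def get_cooldown_seconds(self) -> float:
        return 0.0

class FixedTrialsStrategy(ExecutionStrategy):
    def __init__(self, num_trials: int, cooldown: float = 0.0):
        self.num_trials, self.cooldown = num_trials, cooldown

    def should_continue(self, results: list) -> bool:
        return len(results) < self.num_trials

    def get_cooldown_seconds(self) -> float:
        return self.cooldown

class AdaptiveStrategy(ExecutionStrategy):
    def __init__(self, is_converged, max_trials: int):
        self.is_converged, self.max_trials = is_converged, max_trials

    def should_continue(self, results: list) -> bool:
        # Keep going until the convergence criterion fires, hard-capped at max_trials.
        return len(results) < self.max_trials and not self.is_converged(results)
```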
Stage 7 — Aggregate
After the orchestrator returns list[RunResult], the CLI runner groups by
RunResult.variation_values, builds a per_combination_stats dict, and hands
it to SweepAnalyzer.compute(per_combination_stats, sweep_parameters, sla_filters=…),
which computes summary stats per group, identifies the Pareto frontier, and
returns the aggregate dict the JSON / CSV exporters write.
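A sketch of the grouping step on simplified dict shapes; the real RunResult and metric models are richer than shown here:

```python
from collections import defaultdict
from statistics import mean, stdev

def group_by_variation(run_results: list[dict]) -> dict:
    """Group RunResult-like dicts by their swept-parameter values."""
    groups: dict[tuple, list[dict]] = defaultdict(list)
    for r in run_results:
        key = tuple(sorted(r["variation_values"].items()))
        groups[key].append(r)
    return groups

def summarize(group: list[dict], metric: str) -> dict:
    values = [r["metrics"][metric] for r in group]
    out = {"mean": mean(values)}
    if len(values) > 1:                      # confidence block only appears when M > 1
        out["std"] = stdev(values)
    return out
```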
The aggregate JSON has three result blocks plus a metadata block:
- metadata — num_combinations, swept parameter list, and (when set) sla_constraints. Downstream consumers key off this block.
- per_combination_metrics — one entry per unique variation_values, with swept parameters and a metric block (mean / p99 / etc.) for every metric.
- best_configurations — fixed post-hoc picks for highest throughput and lowest latency from the aggregate summary. These are not the adaptive search's configured objectives.
- pareto_optimal — fixed post-hoc throughput/latency frontier computed via _dominates. Adaptive configured objectives are reported in search_history.json["best_trials"].
Orthogonality note. best_configurations and pareto_optimal here are emitted by SweepAnalyzer, computed across the whole RunResult set, and live under sweep_aggregate/profile_export_aiperf_sweep.json. They are distinct from search_history.json["best_trials"], which is what the BO planner converged on (see Search History API). For a single-objective adaptive run with no failed iterations the two usually agree on the winner; they can disagree when iterations failed, when feasibility differs (search-history is feasibility-first lex over sla_filters, the analyzer ranks the full set), or when the analyzer's Pareto computation includes objectives the planner wasn't optimizing.
If the active sweep came from a search recipe with a PostProcessSpec, that
handler runs after the analyzer and emits its own JSON file (e.g.
degradation_knee.json for concurrency-ramp, pareto_sweep.json for
pareto-sweep).
How search recipes plug in
A search recipe is a named preset that bundles “search space + objective +
termination + SLA filters + optional post-process” into one CLI selector
(--search-recipe <name>). It runs before stage 3 and emits the typed sweep
config the rest of the pipeline expects.
SearchRecipeContext is the recipe’s read-only view of user intent — built
BenchmarkConfig, declared SLA targets (--ttft-sla-ms, etc.), and any
sweep-knob overrides (--concurrency-min, --isl-osl-pairs, etc.).
SearchRecipeOutput carries exactly one of adaptive_search,
sweep_parameters, or scenarios (validated mutually exclusive), plus optional
sla_filters, per-request slos, and a post_process spec.
The eight built-in recipes
After expansion, downstream stages don’t know a recipe ever existed — they just
see a normal AIPerfConfig.sweep with optional sla_filters attached.
End-to-end — putting it all together
One diagram from key-press to artifact:
Names worth remembering
If you remember nothing else from this doc, remember these eleven names — every other class in the sweep code is glue or helper.
For adaptive runs, three more:
Part 3 — Class & module map
End-to-end view of how a YAML config or CLI invocation becomes an AIPerfConfig envelope, expands into a BenchmarkPlan, and is executed by MultiRunOrchestrator against a backend RunExecutor. Multiple zoom levels — pick whichever matches what you’re trying to understand.
The same BenchmarkPlan / MultiRunOrchestrator / RunExecutor machinery handles single-run, grid sweep, zip sweep, scenario sweep, and adaptive search. Dispatch differs only inside MultiRunOrchestrator.execute.
30,000 ft — what happens, period
10,000 ft — local end-to-end (with cluster path coming soon)
Sub-flow — config layer (YAML/CLI -> BenchmarkPlan)
Sub-flow — orchestrator iteration
cli_runner.run_benchmark peels off single-run plans (plan.is_single_run) before the orchestrator is constructed; only multi-run plans reach MultiRunOrchestrator.execute. Inside execute(), dispatch is two-way: adaptive-search vs. grid/scenarios. Grid/scenarios further branch on _plan_iteration_order(plan) which reads plan.sweep.iteration_order (REPEATED default, or INDEPENDENT).
(The artifact-tree layout table is documented above in Part 1 — Artifact directory layout reference.)
Sub-flow — RunExecutor backends
RunExecutor is a 2-method ABC: execute(run) -> RunResult and derive_id(plan, var_idx, trial) -> str. The local executor derives a stable id from the plan/variation/trial tuple for artifact naming; the cluster executor (coming soon — finalized on the K8s integration branch, not yet on main) derives a deterministic K8s-name-safe id from (plan, var_idx, trial) so child AIPerfJob creation is idempotent.
The RunResult shape returned by both backends is identical — the cluster path fetches the same profile_export_aiperf.json schema over HTTP that the local path reads off disk. Downstream SweepAnalyzer.compute(), aggregate_and_export(), and the search_history.json writer don’t know which backend produced the inputs.
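A sketch of the two-method seam; the signatures follow the description above, while the local derive_id format shown is illustrative only:

```python
from abc import ABC, abstractmethod

class RunExecutor(ABC):
    """Two-method seam between the orchestrator and any execution backend (sketch)."""

    @abstractmethod
    def execute(self, run):
        """Run one (variation, trial) cell and return its RunResult."""

    @abstractmethod
    def derive_id(self, plan, var_idx: int, trial: int) -> str:
        """Stable, deterministic id for the cell (artifact dir locally; K8s-safe name in-cluster)."""

class LocalSubprocessExecutor(RunExecutor):
    def derive_id(self, plan, var_idx, trial):
        return f"{plan.name}-v{var_idx}-t{trial}"   # illustrative format, not the real naming scheme

    def execute(self, run):
        ...  # fork aiperf.orchestrator.subprocess_runner and read its export off disk
```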
Class / module map
Sequence — a sweep run end to end
Where to look in the code
ABC hierarchy — orchestrator-side
The orchestrator layer’s extension points are abstract base classes; implementations are registered as plugins or instantiated directly by category-aware factories.
Sweep execution flow — class module map in motion
How the types from the class diagram actually flow through a sweep run. Read it as: each box is an instance of a class from the class diagram; arrows show what produces what; cardinality annotations make the fan-out explicit (1 plan -> N variations × M trials -> N×M results -> 1 aggregate).
The two views together: the flowchart shows cardinality and which class produces which (the data shape of a sweep); the sequence shows the temporal call pattern between the same classes. Both use only the types from the class diagram — no module-internal helpers.
Adaptive search — class types
The adaptive search path layers atop the same BenchmarkPlan / MultiRunOrchestrator / RunExecutor core. Adaptive config is not a separate field — it’s the AdaptiveSearchSweep variant of the SweepConfig discriminated union (type: adaptive_search). Two plugin categories cooperate: a search_planner (drives the outer loop) and an optional search_recipe (curates the search space / objective / post-process from a higher-level recipe template). The optional terminal post_process is a single PostProcessSpec resolved via search_recipe_post_process plugins.
Built-in search_recipe plugins (src/aiperf/search_recipes/):
- max-throughput-ttft-sla, max-throughput-itl-sla
- concurrency-ramp
- prefill-ttft-curve, decode-itl-curve
- max-goodput-under-slo, max-concurrency-under-sla
- pareto-sweep
Recipes choose one of three output branches: adaptive_search (BO-style), sweep_parameters (grid-style — e.g. concurrency-ramp, prefill-ttft-curve, decode-itl-curve), or scenarios (deep-merge variants — e.g. pareto-sweep). The SearchRecipeOutput validator enforces exactly-one-of, so downstream code can branch cleanly.
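A sketch of the exactly-one-of constraint with a Pydantic v2 model validator; field types are simplified relative to the real SearchRecipeOutput:

```python
from pydantic import BaseModel, model_validator

class SearchRecipeOutput(BaseModel):
    """Sketch of the exactly-one-of output shape."""
    adaptive_search: dict | None = None    # BO-style AdaptiveSearchSweep
    sweep_parameters: dict | None = None   # grid-style parameters
    scenarios: list[dict] | None = None    # deep-merge variants
    sla_filters: dict | None = None
    post_process: str | None = None

    @model_validator(mode="after")
    def exactly_one_branch(self):
        branches = [self.adaptive_search, self.sweep_parameters, self.scenarios]
        if sum(b is not None for b in branches) != 1:
            raise ValueError("exactly one of adaptive_search / sweep_parameters / scenarios")
        return self
```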
Built-in search_recipe_post_process plugins: degradation_knee_detect, ttft_curve_fit, itl_surface_fit, sla_breach_knee, pareto_sweep_export.
Adaptive search — execution flow
The BO outer loop is a propose -> execute -> record cycle inside MultiRunOrchestrator.execute_adaptive_search. BenchmarkRun and RunExecutor are the same as in the grid path; the difference is that BenchmarkPlan.configs starts with one seed config and grows by one per iteration as the planner asks for the next point.
Adaptive search — recipe -> AdaptiveSearchSweep
A user can either author an AdaptiveSearchSweep directly under sweep: (low level) or pick a search_recipe plugin (high level) that builds one from a recipe + the user’s existing benchmark config. The adaptive block lives entirely on sweep; there is no separate adaptive-search field on MultiRunConfig.
SearchPlanner — protocols, planners, and extension points
How AIPerf’s Bayesian-Optimization outer loop is wired together: the protocols, the runtime sequence, and the config-to-execution flow.
The planner and the orchestrator talk through narrow protocols. MultiRunOrchestrator doesn’t know about Bayesian Optimization — it only knows SearchPlanner and RunExecutor. The Optuna+BoTorch dependency is hidden inside OptunaSearchPlanner and its BayesianSearchPlanner curated-preset subclass; MonotonicSLASearchPlanner and SmoothIsotonicSLAPlanner are 1D-feasibility-search planners that plug in at the same SearchPlanner ABC. Future planners (random-search baseline, MORBO, etc.) plug in identically.
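A sketch of the planner protocol as described in this doc; signatures are paraphrased, not copied from the source:

```python
from abc import ABC, abstractmethod

class SearchPlanner(ABC):
    """Narrow protocol the orchestrator drives."""

    @abstractmethod
    def ask(self):
        """Return the next (BenchmarkConfig, SweepVariation) to try, or None when done."""

    @abstractmethod
    def tell(self, results) -> None:
        """Record the trial results for the last asked point."""

    @abstractmethod
    def is_converged(self) -> bool: ...

    @abstractmethod
    def history(self) -> list: ...

    def convergence_reason(self) -> str | None:   # optional override; default None
        return None
```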
Registered planners
Planner modules below are relative to aiperf.orchestrator.search_planner. (e.g. bayesian.py -> aiperf.orchestrator.search_planner.bayesian).
All four are registered in src/aiperf/plugin/plugins.yaml under the search_planner: category and resolved via plugins.get_class(PluginType.SEARCH_PLANNER, name).
SearchPlanner class diagram
The CLI grammar lives in aiperf.orchestrator.search_planner.parsing.parse_search_space(values), which converts --search-space "path:lo,hi[:kind]" strings into SearchSpaceDimension instances. The v1->v2 converter (build_multi_run in aiperf.config.flags._converter_optionals) packages everything into a typed AdaptiveSearchSweep carried on AIPerfConfig.sweep.
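A sketch of the path:lo,hi[:kind] grammar on a simplified dimension type; the real parser adds validation and richer kinds, and the default kind here is an assumption:

```python
from dataclasses import dataclass

@dataclass
class SearchSpaceDimension:        # simplified stand-in for the real model
    path: str
    low: float
    high: float
    kind: str = "int"

def parse_dimension(spec: str) -> SearchSpaceDimension:
    """Parse one --search-space entry of the form path:lo,hi[:kind]."""
    path, bounds, *rest = spec.split(":")
    lo, hi = (float(x) for x in bounds.split(","))
    return SearchSpaceDimension(path, lo, hi, rest[0] if rest else "int")

# parse_dimension("load.concurrency:1,256:int")
# -> SearchSpaceDimension(path='load.concurrency', low=1.0, high=256.0, kind='int')
```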
Runtime sequence — one BO iteration
MultiRunOrchestrator.execute_adaptive_search is a thin loop. Every iteration: ask the planner for a (BenchmarkConfig, SweepVariation); run all configured trials at that point via the same _run_independent_cell grid sweeps use; tell the planner what happened; write search_history.json incrementally. When ask() returns None, surface the planner’s convergence_reason() and exit.
A few things this view makes explicit:
- Aggregate observations to the GP. The planner currently reports one Optuna trial per search point. With objective_pooling=mean it tells Optuna the mean of finite per-trial objective values; with pooled percentile mode it tells Optuna the pooled percentile objective computed from the raw record samples. Per-trial RunResult objects remain on SearchIteration.results in memory for search-history derivation, but separate per-trial Optuna observations are not recorded in v1.
- Failed-iteration handling. When zero trials produce a usable objective, the planner still calls study.tell(trial, fallback_objective) so Optuna's ask/tell pairing stays consistent. The fallback is a strictly worse-than-prior sentinel used only inside the study; search_history.json persists objective_values: null for that iteration.
- Three convergence signals. is_converged() checks max_iterations, then improvement-over-best patience, then coefficient-of-variation plateau. The first to fire wins; the reason is recorded in search_history.json. (A sketch of this ordering follows the list.)
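An illustrative ordering of the three signals; the thresholds, window sizes, and exact formulas below are invented for the sketch and are not the planner's real defaults:

```python
def convergence_check(iteration: int, objectives: list[float],
                      max_iterations: int = 30, patience: int = 5,
                      cv_threshold: float = 0.02, window: int = 5) -> str | None:
    """Return the first convergence reason that fires, or None to keep searching."""
    if iteration >= max_iterations:
        return "max-iterations"
    # Improvement-over-best patience: no new best within the last `patience` iterations.
    if len(objectives) > patience and max(objectives[-patience:]) <= max(objectives[:-patience]):
        return "improvement-patience"
    # Coefficient-of-variation plateau over a trailing window of objective values.
    recent = objectives[-window:]
    if len(recent) == window:
        m = sum(recent) / window
        cv = (sum((x - m) ** 2 for x in recent) / window) ** 0.5 / abs(m) if m else 0.0
        if cv < cv_threshold:
            return "plateau"
    return None
```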
Config flow — CLI / YAML -> execution
The CLI feeds a CLIConfig through src/aiperf/config/flags/converter.py, which packages the search-space + objective + filters into a typed AdaptiveSearchSweep carried on AIPerfConfig.sweep. From there the plan builder produces a BenchmarkPlan, and MultiRunOrchestrator.execute dispatches on plan.is_adaptive_search to the BO loop.
Notes on extension points
- Adding a new planner backend (random-search baseline, etc.): subclass SearchPlanner, implement the four abstract methods (ask / tell / is_converged / history); optionally override convergence_reason (default returns None) and boundary_summary (1D feasibility planners only). No orchestrator changes required — MonotonicSLASearchPlanner, SmoothIsotonicSLAPlanner, and OptunaSearchPlanner are existing examples reusing the same ABC. Wiring is already generic: AdaptiveSearchSweep.planner is a SearchPlannerType (ExtensibleStrEnum in src/aiperf/plugin/enums.pyi), and cli_runner._run_multi_benchmark instantiates the planner via plugins.get_class(PluginType.SEARCH_PLANNER, sweep.planner). To register a new backend, add an entry under search_planner: in src/aiperf/plugin/plugins.yaml and a matching enum value — no dispatch code changes.
- Adding a new executor backend: subclass RunExecutor. LocalSubprocessExecutor iterates one (variation, trial) at a time via execute(BenchmarkRun) — the seam is adaptive-shaped by construction.
- Replacing the BO backend. The OptunaSearchPlanner boundary (and its BayesianSearchPlanner curated-preset subclass) is the only Optuna-aware code in the project. BoTorch-specific acquisitions live behind the optional botorch extra in pyproject.toml. The qlognei / qlognehvi acquisitions, posterior-regret stopping (--optuna-terminator regret/emmr), pooled-percentile aggregation (--search-percentile-pooling pooled), and the Hvarfner-DSP Matern-5/2 kernel (arXiv:2402.02229) are all plumbed through _optuna_helpers.py. The remaining principled upgrade path is wiring per-iteration heteroscedastic noise estimates from the pooled-percentile JSONL helper into a HeteroskedasticSingleTaskGP-based custom candidates_func. Evidence-gated: ship only if observed within-trial variance varies meaningfully across the search space on real workloads.
smooth_isotonic as novel-in-composition
The SmoothIsotonicSLAPlanner algorithm (PAVA monotonic regression as denoiser -> PCHIP cubic Hermite interpolant -> root-find for SLA-threshold crossing, plus a PAVA-residual changepoint detector for the cliff-guard exit) does not appear in published BO literature in this exact composition. The components are textbook (PAVA: pool-adjacent-violators; PCHIP: shape-preserving piecewise-cubic Hermite interpolation; bracketed root-find: classical numerical analysis), but their composition for SLA-saturation in noisy GPU-serving benchmarking is original. Adjacent prior art:
- Letham et al. 2017 (arXiv:1706.07094) — the noise-modeling anchor; per-trial-observations in BO with feasibility-product constraints.
- DistServe (Zhong et al., OSDI '24, arXiv:2401.09670) — "DistServe simply enumerates the placements via binary search and finds the maximum rate that meets the SLO attainment target with simulation trials." MonotonicSLASearchPlanner reproduces DistServe's algorithm; SmoothIsotonicSLAPlanner is a strict improvement (denoised + continuous-space root-find).
- BOute (Jiang et al. 2026, arXiv:2602.10729) — closest contemporary work using BO for LLM serving; constrained qNEHVI on BoTorch with ModelListGP. Different problem (serving-system optimization rather than benchmark-side adaptive sweep), same machinery family.
smooth_isotonic is defensibly novel in the systems-benchmarking literature even though every individual piece is classical statistics; worth a section in a future technical report.
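A compact sketch of that numeric chain using scikit-learn and SciPy; the sample data, SLA threshold, and function name are illustrative, and the cliff-guard changepoint detector is omitted:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from scipy.interpolate import PchipInterpolator
from scipy.optimize import brentq

def sla_crossing(concurrency: np.ndarray, latency_ms: np.ndarray, sla_ms: float) -> float | None:
    """PAVA denoise -> PCHIP interpolant -> bracketed root-find for the SLA crossing."""
    # 1. PAVA (pool-adjacent-violators) fits the best monotone non-decreasing curve.
    iso = IsotonicRegression(increasing=True).fit(concurrency, latency_ms)
    fitted = iso.predict(concurrency)
    # 2. PCHIP gives a shape-preserving piecewise-cubic interpolant through the denoised points.
    curve = PchipInterpolator(concurrency, fitted)
    # 3. Bracketed root-find for curve(x) == sla_ms, if the SLA is crossed in-range.
    lo, hi = concurrency[0], concurrency[-1]
    if curve(lo) > sla_ms or curve(hi) < sla_ms:
        return None
    return brentq(lambda x: curve(x) - sla_ms, lo, hi)

# Noisy, roughly monotone synthetic data:
x = np.array([1, 2, 4, 8, 16, 32, 64], dtype=float)
y = np.array([40, 45, 44, 60, 95, 180, 400], dtype=float)
print(sla_crossing(x, y, sla_ms=150.0))   # concurrency where the denoised curve hits 150 ms
```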