Sweep Orchestrator (Dev Reference) | NVIDIA AIPerf Documentation

Developer reference for AIPerf’s sweep + adaptive-search machinery. Three zoom levels below: mental model -> seven-stage tour -> class/module map.

Part 1 — Mental model

The big idea

Every AIPerf run (single benchmark, parameter grid, or Bayesian search) is the same pipeline with different cardinalities.

One pipeline, every scale, by design. A single benchmark, a local multi-run for confidence intervals, a grid or scenarios sweep, a Sobol or Latin-Hypercube characterization, a local Bayesian-Optimization adaptive search, and a cluster-side BO running across hundreds of pods are not separate code paths. They are different cardinalities of one pipeline: BenchmarkPlan describes what to run, MultiRunOrchestrator decides when and in what order, a SearchPlanner (optional) decides what to try next, and a RunExecutor decides how to actually run one cell. Each piece owns exactly one concern and knows nothing about the others.

That separation is what makes the system extensible without churn. Want a new sweep shape? Add a discriminated-union variant to SweepConfig — expand_sweep does the rest, and every executor, exporter, and analyzer picks it up for free. Want a new planner (a different acquisition function, a 1D SLA-saturation algorithm, a multi-fidelity scheme)? Implement SearchPlanner.ask/tell and register it under the search_planner plugin category — the orchestrator and executors don’t change. Want to run the whole thing on Kubernetes? Implement RunExecutor.execute to create an AIPerfJob CR and HTTP-pull its results instead of forking a subprocess (this is the coming-soon K8sChildJobExecutor) — the plan, orchestrator, planner, analyzer, and exporters are reused byte-for-byte. The progression from a single-shot aiperf profile to a cluster-distributed BO search isn’t a rewrite; it’s the same machinery at a different cardinality with a different executor at the bottom.

Execution

Today only LocalSubprocessExecutor ships: aiperf profile -f config.yaml runs the orchestrator in the same Python process and forks aiperf.orchestrator.subprocess_runner per cell.

Cluster execution (coming soon). K8sChildJobExecutor lives on the K8s integration branch (not main yet). It runs in-cluster in a sweep-controller pod (from an AIPerfSweep CR). Each cell becomes an AIPerfJob CR, watched to completion; the operator results server supplies the child export (same shape as local). Orchestrator logic is unchanged—only RunExecutor differs. CLI: aiperf kube sweep (alongside aiperf kube profile).

Key types

The whole flow uses about a dozen types. If you know these, you can read any sweep code.

Type	Role
`AIPerfConfig`	Top-level envelope. Holds a `BenchmarkConfig` body plus envelope-level knobs: `sweep`, `multi_run`, `variables`, `random_seed`.
`BenchmarkConfig`	The actual benchmark settings (models, endpoint, datasets, phases, artifacts, …). The unit of “what to benchmark.”
`SweepConfig`	Discriminated union: `GridSweep` (YAML: `type: grid`, cartesian over `parameters`), `ZipSweep` (`type: zip`, lockstep / element-wise over `parameters`, all lists equal length), `ScenarioSweep` (`type: scenarios`, deep-merge `runs[i]`), or `AdaptiveSearchSweep` (`type: adaptive_search`, BO / monotonic).
`SobolSweep` / `LatinHypercubeSweep`	Fixed-budget space-filling samplers. N = `samples`; each variation drawn from `scipy.stats.qmc`. Reuses the grid-style `iteration_order` / `cooldown` / SLA-filter mechanics.
`MultiRunConfig`	Trial mechanics: `num_runs` (= trials per variation), cooldown, optional `convergence: ConvergenceConfig`.
`SweepVariation`	`{index, label, values}`. One per variation; carries the parameter values that differ from base. Also exposes `dir_name`: the `{leaf}_{value}` form (e.g. `concurrency_10`) used as the per-variation directory name.
`BenchmarkPlan`	The “expanded” form: `configs[N]`, `variations[N]`, `trials=M`, plus the originating `sweep` + `multi_run`. Output of `build_benchmark_plan`.
`BenchmarkRun`	One cell: `(cfg, variation, trial, artifact_dir)`. The smallest unit of work.
`RunResult`	`{success, summary_metrics, artifacts_path, variation_label, variation_values, trial_index, error}`. One per `BenchmarkRun`.
`MultiRunOrchestrator`	Drives the N×M loop. Picks REPEATED (trials outer) or INDEPENDENT (variations outer) based on `sweep.iteration_order`; dispatches to `execute_adaptive_search` if the sweep is adaptive.
`RunExecutor`	ABC with `execute(run) -> RunResult` plus a second abstract `derive_id(plan, var_idx, trial) -> str` for stable per-cell identifiers. `LocalSubprocessExecutor` is the only shipping implementation today; `K8sChildJobExecutor` (one child `AIPerfJob` CR per call) is finalized but unmerged — see Execution above.
`SweepAnalyzer`	Post-hoc aggregator. CLI helpers group `list[RunResult]` by `variation_values` into `per_combination_stats`; `SweepAnalyzer.compute()` then produces `best_configurations`, `pareto_optimal`, `per_combination_metrics`. Written to `sweep_aggregate/profile_export_aiperf_sweep.{json,csv}`.

End-to-end pipeline (canonical)

The orchestrator forks a subprocess per cell at stage 6; aggregation is pure post-hoc compute over the collected RunResults. YAML configs reach AIPerfConfig directly through load_config → AIPerfConfig.model_validate; only CLI flags travel through CLIConfig first so cyclopts can parse magic-list affordances (--concurrency 1,2,4) before they’re lifted into a typed SweepConfig.

What happens between runs (per-cell loop)

A “cell” is one (variation, trial) slot. Inside a cell, an ExecutionStrategy decides whether to keep going. FixedTrialsStrategy stops after M trials. AdaptiveStrategy (selected automatically when multi_run.convergence is set) keeps going until a ConvergenceCriterion is satisfied, capped by multi_run.num_runs. Around each executor.execute(run), the orchestrator threads cancel-checking, sweep-wide failure thresholds, and inter-run cooldowns. Two distinct cooldown fields are in play: multi_run.cooldown_seconds (between trials within a cell, returned by strategy.get_cooldown_seconds()) and sweep.cooldown_seconds (between variations, applied in the outer loop).

The strategy is fresh per cell in INDEPENDENT mode, so adaptive trial-convergence resets between variations. In REPEATED mode there’s only one trial per cell — the “outer trial loop” replays the whole grid.

REPEATED vs INDEPENDENT — loop nesting

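Two ways to traverse the same N variations × M trials grid. sweep.iteration_order picks; default is REPEATED. For example, with 3 variations (v1..v3) and 3 trials (t1..t3), REPEATED executes v1t1, v2t1, v3t1, then v1t2, v2t2, v3t2, and so on; INDEPENDENT executes v1t1, v1t2, v1t3, then v2t1, v2t2, v2t3, and so on.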

REPEATED interleaves trials across variations so transient effects (warm caches, thermal drift) hit every variation similarly — better for cross-variation comparison. INDEPENDENT runs one variation to completion before moving on — required for convergence-based adaptive trials, since a strategy needs to observe all of one cell’s results in sequence. Cooldowns and per-cell strategy reuse follow from the nesting; see MultiRunOrchestrator.
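The two loop nestings can be sketched in a few lines. This is an illustration only, not the orchestrator's code: the function name `cell_order` and the lowercase string values for the iteration order are assumptions made for the example.

```python
from itertools import product


def cell_order(num_variations: int, num_trials: int, iteration_order: str) -> list[tuple[int, int]]:
    """Return 1-based (variation, trial) pairs in the order the cells would execute."""
    variations = range(1, num_variations + 1)
    trials = range(1, num_trials + 1)
    if iteration_order == "repeated":
        # Trials are the outer loop: the whole grid is replayed once per trial.
        return [(v, t) for t, v in product(trials, variations)]
    if iteration_order == "independent":
        # Variations are the outer loop: each variation finishes before the next starts.
        return [(v, t) for v, t in product(variations, trials)]
    raise ValueError(f"unknown iteration order: {iteration_order!r}")


print(cell_order(3, 3, "repeated")[:4])     # [(1, 1), (2, 1), (3, 1), (1, 2)]
print(cell_order(3, 3, "independent")[:4])  # [(1, 1), (1, 2), (1, 3), (2, 1)]
```

Both orders visit the same nine cells; only the nesting differs.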

Artifact directory layout reference

The artifact tree branches on three flags: whether a sweep is configured (is_sweep), whether multiple trials run per cell (trials > 1), and the sweep iteration order (REPEATED vs INDEPENDENT). Implemented in _resolve_artifact_dir in src/aiperf/orchestrator/orchestrator.py.

sweep	trials	order	layout
no	1	-	`<base>/`
no	>1	-	`<base>/profile_runs/run_NNNN/`
yes	1	-	`<base>/<dir_name>/`
yes	>1	REPEATED	`<base>/profile_runs/trial_NNNN/<dir_name>/`
yes	>1	INDEPENDENT	`<base>/<dir_name>/profile_runs/trial_NNNN/`
adaptive	any	-	`<base>/search_iter_NNNN/profile_runs/run_NNNN/`

<dir_name> is the {leaf_param_name}_{value} form (e.g. concurrency_10, request_rate_5.0); multi-dim sweep cells join components with __ (e.g. concurrency_10__isl_512). Inner-dir naming is asymmetric on purpose — the no-sweep multi-run case uses run_NNNN, the sweep + INDEPENDENT case uses trial_NNNN. Downstream consumers (plotters, dashboards) account for this asymmetry.
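The non-adaptive rows of the layout table can be expressed as one small function. This is a sketch under assumptions, not the actual `_resolve_artifact_dir`: the signature, the 1-based trial index, the 4-digit zero padding for `NNNN`, and the dotted parameter path in the usage line are all hypothetical.

```python
from pathlib import Path
from typing import Optional


def dir_name(values: dict) -> str:
    """Build the {leaf}_{value} directory name; multi-dim cells are joined with '__'."""
    # Assumption: a dotted parameter path contributes only its last component.
    return "__".join(f"{key.rsplit('.', 1)[-1]}_{value}" for key, value in values.items())


def resolve_artifact_dir(base, *, values: Optional[dict] = None, trial: int = 1,
                         trials: int = 1, repeated: bool = True) -> Path:
    """Mirror the layout table for the non-adaptive rows."""
    base = Path(base)
    if values is None:  # no sweep configured
        return base if trials == 1 else base / "profile_runs" / f"run_{trial:04d}"
    leaf = dir_name(values)
    if trials == 1:  # sweep, single trial per variation
        return base / leaf
    if repeated:  # trials outer, variations inner
        return base / "profile_runs" / f"trial_{trial:04d}" / leaf
    return base / leaf / "profile_runs" / f"trial_{trial:04d}"  # INDEPENDENT


print(resolve_artifact_dir("out", values={"phases.profiling.concurrency": 10}, trial=2, trials=3))
```

The asymmetry between `run_NNNN` and `trial_NNNN` shows up as the two different f-strings.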

The sweep-level aggregate path follows a parallel rule:

REPEATED + multi-run: <base>/aggregate/sweep_aggregate/
everything else (sweep-only, sweep + INDEPENDENT): <base>/sweep_aggregate/

Per-variation aggregates land at <base>/aggregate/<dir_name>/ in REPEATED mode and <base>/<dir_name>/aggregate/ otherwise (INDEPENDENT is the explicit default fallback in _per_variation_aggregate_dir; any non-REPEATED mode takes the else branch).
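The aggregate-path rules reduce to two branches. The sketch below restates them as code; the function names and signatures are assumptions for illustration and are not the real helpers.

```python
from pathlib import Path


def sweep_aggregate_dir(base, *, repeated: bool, trials: int) -> Path:
    """Sweep-level aggregate: nested under aggregate/ only for REPEATED + multi-run."""
    base = Path(base)
    if repeated and trials > 1:
        return base / "aggregate" / "sweep_aggregate"
    return base / "sweep_aggregate"


def per_variation_aggregate_dir(base, variation_dir: str, *, repeated: bool) -> Path:
    """Per-variation aggregate: any non-REPEATED mode takes the else branch."""
    base = Path(base)
    if repeated:
        return base / "aggregate" / variation_dir
    return base / variation_dir / "aggregate"


print(sweep_aggregate_dir("out", repeated=True, trials=3))
print(per_variation_aggregate_dir("out", "concurrency_10", repeated=False))
```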

Adaptive outer loop (ask / tell)

Adaptive search is the same pipeline with one swap: instead of “expand a fixed grid into N configs up front,” the planner generates configs one at a time, learning from each result.

The sweep block is AdaptiveSearchSweep (type: adaptive_search) instead of GridSweep / ZipSweep / ScenarioSweep.
BenchmarkPlan.configs starts with one seed config; the planner extends it as it asks.
MultiRunOrchestrator dispatches to execute_adaptive_search, which runs planner.ask() -> execute trials -> planner.tell(results) until planner.ask() returns None (or cancellation / abort).
Four planner plugins ship: BayesianSearchPlanner (curated Optuna+BoTorch preset; auto-selects qLogNEI / qLogNEHVI based on objective count), MonotonicSLASearchPlanner (1D probe + bisection), SmoothIsotonicSLAPlanner (isotonic regression on bootstrap-resampled trials), OptunaSearchPlanner (TPE / GP / BoTorch samplers, expert-mode flag exposure). The BayesianSearchPlanner is implemented as a thin subclass of OptunaSearchPlanner that locks in the BoTorch sampler and the curated acquisition; it is not a separate engine.
Optional search_recipe plugins build the whole sweep block (an AdaptiveSearchSweep, a parameter grid, or scenarios) from a higher-level recipe (e.g. max-concurrency-under-sla, prefill-ttft-curve, pareto-sweep).
An optional post_process handler (degradation_knee_detect, ttft_curve_fit, itl_surface_fit, sla_breach_knee, pareto_sweep_export) runs after the final iteration.

Each iteration adds one SearchIteration to planner.history(). Convergence terminates the loop via planner.ask() returning None; the reason (plateau / improvement-patience / max-iterations) comes from planner.convergence_reason(). search_history.json is rewritten after every iteration so a crashed sweep still has a usable trail.
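To make the ask / tell contract concrete, here is a toy planner driven by the same loop shape. It is a minimal sketch, not one of the shipped planners: `ToyBisectionPlanner`, `run_search`, the `evaluate` callback, and the `"bracket-closed"` reason string are all hypothetical, and the real `ask()` returns a `(BenchmarkConfig, SweepVariation)` pair rather than an integer.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ToyBisectionPlanner:
    """Finds the largest feasible integer, assuming `low` is feasible and `high` is not."""
    low: int
    high: int
    max_iterations: int = 10
    history: list = field(default_factory=list)
    reason: Optional[str] = None

    def ask(self) -> Optional[int]:
        # Returning None is the only termination signal the loop needs.
        if len(self.history) >= self.max_iterations:
            self.reason = "max-iterations"
            return None
        if self.high - self.low <= 1:
            self.reason = "bracket-closed"  # hypothetical reason name
            return None
        return (self.low + self.high) // 2

    def tell(self, point: int, feasible: bool) -> None:
        self.history.append({"point": point, "feasible": feasible})
        if feasible:
            self.low = point
        else:
            self.high = point


def run_search(planner: ToyBisectionPlanner, evaluate: Callable[[int], bool]) -> int:
    # propose -> execute -> record, until the planner stops proposing
    while (point := planner.ask()) is not None:
        planner.tell(point, evaluate(point))
    return planner.low


planner = ToyBisectionPlanner(low=1, high=65)
print(run_search(planner, lambda concurrency: concurrency <= 40))  # 40
```

The loop never inspects the planner's internal state, which is why planners can be swapped without touching it.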

Fan-out math

The cardinality of any sweep is N variations × M trials = N×M cells. Where N and M come from depends on the path.

For adaptive search, N is the iteration count: bounded above by max_iterations, possibly less if the planner converges early. M (trials per iteration) still applies — adaptive runs M trials per planner-proposed point, then tell()s the planner the aggregate.

Where the code lives

Concept	File
Envelope, body	`src/aiperf/config/config.py` (`AIPerfConfig`, `BenchmarkConfig`)
Multi-run / convergence	`src/aiperf/config/sweep/multi_run.py`
Sweep variants	`src/aiperf/config/sweep/config.py` + `sampling.py` (QMC) + `adaptive.py`
Sweep expansion	`src/aiperf/config/sweep/expand.py` + `expand_qmc.py`
Plan loader (CLI/YAML -> plan)	`src/aiperf/config/loader/plan.py`
`BenchmarkPlan` / `BenchmarkRun` models	`src/aiperf/config/resolution/plan.py`
Orchestrator	`src/aiperf/orchestrator/orchestrator.py`
Executors	`src/aiperf/orchestrator/{executor,local_executor}.py`
Aggregation	`src/aiperf/orchestrator/aggregation/sweep.py`
Search planners + recipes	`src/aiperf/orchestrator/search_planner/`, `src/aiperf/search_recipes/`

For a fully-indexed file map covering every entry point, see Where to look in the code in Part 3.

Part 2 — Seven-stage tour

A guided tour of the sweep / multi-run / adaptive-search flow, focused on the big picture and the names of the types that move data between stages. Read this when you want to know what happens when you press enter.

The seven stages

Every aiperf profile invocation walks the same seven stages. The shape of each stage’s input and output is named — those names are the things to remember.

The pipeline doesn’t change shape between a single benchmark, a multi-run, a grid sweep, a scenarios sweep, or a Bayesian search. Only two things change: how many cells stage 5 produces, and what decides the next cell.

Mode	N (variations)	M (trials per variation)	Total cells
Single benchmark	1	1	1
Multi-run	1	`MultiRunConfig.num_runs` (1–10)	M
Grid sweep	cartesian product of `sweep.parameters`	`MultiRunConfig.num_runs` (default 1)	N × M
Scenarios sweep	`len(runs[])`	`MultiRunConfig.num_runs` (default 1)	N × M
Adaptive search	grows by 1 every `planner.ask()`; capped by `max_iterations`	`MultiRunConfig.num_runs` (default 1)	≤ max_iter × M

Each cell is one BenchmarkRun -> one RunResult. The next section unpacks the N / M dimensions in detail.

Two dimensions: N variations × M trials

The sweep cardinality has two independent dimensions. Mixing them up is the single most common source of “wait, why did this run that many times?” surprise.

N comes from SweepConfig (the sweep block on AIPerfConfig): the sweep.parameters cartesian product, the runs[] list, or the planner’s proposals. Without a sweep block, N = 1.

M comes from MultiRunConfig (the multi-run block on AIPerfConfig):

Field	Default	Meaning
`num_runs`	`1`	Trial count per variation. `M = 1` is “single run, no repeats.” Max `10` (both CLI flag and typed field share the cap).
`cooldown_seconds`	`0`	Sleep between trials so server caches / thermals reset.
`convergence`	unset	Optional `ConvergenceConfig` — stop early when results stabilize.

Total runs = N × M
  N=1, M=1   ->  1 run     single benchmark, no confidence
  N=1, M=5   ->  5 runs    one config, repeated for confidence intervals
  N=4, M=1   ->  4 runs    a 4-point sweep, one shot per point
  N=4, M=3   ->  12 runs   sweep with 3-trial confidence per variation

When M > 1, SweepAnalyzer.compute automatically produces a confidence block (mean / std / 95% CI) per metric per variation. When M = 1 you get point estimates only.
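A confidence block of this kind can be computed from the M trial values for one metric. The sketch below assumes a two-sided Student-t interval, which is a common choice for small M; whether `SweepAnalyzer.compute` uses exactly this interval is not stated here, and the function name and output keys are hypothetical.

```python
import math
import statistics

# Two-sided 95% Student-t critical values for df = 1..9 (covers M = 2..10 trials).
T95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
       6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def confidence_block(samples: list) -> dict:
    """Return mean, sample std, and a 95% confidence interval for one metric."""
    n = len(samples)
    if n - 1 not in T95:
        raise ValueError("need between 2 and 10 samples")
    mean = statistics.fmean(samples)
    std = statistics.stdev(samples)  # sample standard deviation (n - 1 denominator)
    half_width = T95[n - 1] * std / math.sqrt(n)
    return {"mean": mean, "std": std, "ci_low": mean - half_width, "ci_high": mean + half_width}


print(confidence_block([10.0, 12.0, 14.0]))
```

With a single sample there is no spread to estimate, which is why M = 1 yields point estimates only.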

Trials are not iterations. For an adaptive search, --search-max-iterations controls N (how many points the planner gets to try), and --num-profile-runs controls M (how many times each proposed point is benchmarked before the planner sees the aggregate). They multiply: a max_iterations=30, num_profile_runs=3 adaptive run executes up to 90 subprocess benchmarks.

Stages 1–2 — User input -> typed config

Two entry points converge on the same typed envelope AIPerfConfig, but by different paths. YAML skips CLIConfig entirely: load_config / load_config_from_string parse the file and call AIPerfConfig.model_validate directly. CLI flags are parsed by cyclopts into a CLIConfig (the human-friendly, CLI-shaped surface), then convert_cli_to_aiperf lifts magic flags into the typed envelope. From here on, AIPerfConfig is the single source of truth.

Why the CLI -> envelope hop? CLIConfig is the human-friendly CLI shape: magic-lists like --concurrency 1,2,4, --prefill-concurrency 1,2,4, or --request-rate 10,20,50 mean “sweep that field over those values.” The converter lifts those affordances into a typed sweep block on AIPerfConfig. After conversion, every flag has one canonical home in the envelope. YAML configs are already written in envelope shape, so they need no such hop.
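The lifting step can be pictured as splitting comma-separated flag values from scalar ones. This is a rough sketch with plain dicts, not `convert_cli_to_aiperf`: the function names are hypothetical, and treating every magic list as a grid parameter is an assumption.

```python
from typing import Optional


def _parse_number(text: str):
    """Parse an int when possible, otherwise a float."""
    text = text.strip()
    try:
        return int(text)
    except ValueError:
        return float(text)


def lift_magic_lists(flags: dict) -> tuple:
    """Split raw flag strings into (scalar settings, optional sweep block)."""
    scalars: dict = {}
    parameters: dict = {}
    for name, raw in flags.items():
        parts = [_parse_number(part) for part in raw.split(",")]
        if len(parts) > 1:
            parameters[name] = parts  # a magic list becomes a swept parameter
        else:
            scalars[name] = parts[0]  # a single value stays a plain setting
    # Assumption: lists are lifted into a grid-style sweep block.
    sweep: Optional[dict] = {"type": "grid", "parameters": parameters} if parameters else None
    return scalars, sweep


print(lift_magic_lists({"concurrency": "1,2,4", "request_rate": "10"}))
```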

`SweepConfig` is a discriminated union

Pydantic discriminates by a type field on the YAML / dump (each variant sets a default for type, so YAML authors do not need to write it explicitly). The orchestrator never inspects the variant directly — it reads BenchmarkPlan.is_adaptive_search, which is true exactly when the variant is AdaptiveSearchSweep.

Stage 3 — Expand into a `BenchmarkPlan`

AIPerfConfig describes intent; BenchmarkPlan lists the actual cells the orchestrator will run. The plan-builder either short-circuits to a single seed variation (for adaptive runs and for no-sweep runs) or calls expand_sweep (cartesian product for grid, lockstep zip for zip, deep-merge for scenarios — also Sobol / Latin-hypercube for QMC sweeps), then renders any per-variation Jinja and emits one BenchmarkConfig per variation.

A few useful invariants:

SweepVariation — {index, label, values}. One per variation. values is the dict of swept parameters that differ from the base config; the label is built from those for artifact directory names.
trials = M comes from MultiRunConfig.num_runs (default 1, max 10). It’s the per-cell repeat count for confidence aggregation, not the total run count.
For adaptive search, configs starts with one seed and grows as the planner asks. The plan-builder doesn’t know the final length up front.
plan.is_adaptive_search is the orchestrator’s only branch on the sweep variant — every other piece of code is variant-agnostic.
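The three fixed expansion shapes (cartesian, lockstep, deep-merge) can be sketched over plain dicts. This is an illustration of the idea only: the real `expand_sweep` works on typed configs, renders Jinja, and handles QMC sweeps, and the name `expand_sweep_sketch` and the flat parameter keys are assumptions.

```python
import copy
from itertools import product


def deep_merge(base: dict, override: dict) -> dict:
    """Recursively overlay `override` on a copy of `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def expand_sweep_sketch(base: dict, sweep: dict) -> list:
    """Return one config dict per variation."""
    kind = sweep["type"]
    if kind == "scenarios":
        return [deep_merge(base, run) for run in sweep["runs"]]
    names = list(sweep["parameters"])
    columns = [sweep["parameters"][name] for name in names]
    if kind == "grid":
        rows = product(*columns)  # cartesian product
    elif kind == "zip":
        if len({len(column) for column in columns}) > 1:
            raise ValueError("zip sweeps need equal-length lists")
        rows = zip(*columns)  # lockstep, element-wise
    else:
        raise ValueError(f"unsupported sweep type: {kind!r}")
    return [deep_merge(base, dict(zip(names, row))) for row in rows]


base = {"endpoint": {"url": "http://localhost:8000", "streaming": True}, "concurrency": 1}
grid = {"type": "grid", "parameters": {"concurrency": [1, 2], "isl": [128, 512]}}
print(len(expand_sweep_sketch(base, grid)))  # 4
```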

Stages 4–5 — Orchestrator dispatch

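MultiRunOrchestrator.execute(plan, executor, search_planner=...) is the single entry point. It dispatches on plan.is_adaptive_search: adaptive plans go to execute_adaptive_search, and all other plans run the grid / scenarios loop in REPEATED or INDEPENDENT order.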

Grid / scenarios — REPEATED vs INDEPENDENT

For an N×M grid (N variations, M trials), there are two ways to interleave the work. Both produce the same N×M cells; they differ only in which loop is outer. iteration_order is a field on the grid family of sweeps (GridSweep, ZipSweep, ScenarioSweep); AdaptiveSearchSweep does not expose this knob.

REPEATED is the default. It interleaves so transient effects (warm caches, thermal drift) hit every variation similarly — better for cross-variation comparison. INDEPENDENT runs one variation to completion before moving on; required when each variation needs its own ExecutionStrategy to observe a full cell’s worth of results before deciding to stop (the adaptive trial-convergence case).

Adaptive — `ask` / `tell` loop

When plan.is_adaptive_search is true, execute_adaptive_search runs a tighter loop driven by a SearchPlanner:

Step	What actually happens
`planner.ask()`	returns the next `(BenchmarkConfig, SweepVariation)` — or `None` to terminate. State: a Gaussian process (`BayesianSearchPlanner`), a bisection bracket (`MonotonicSLASearchPlanner`), an isotonic-fit history (`SmoothIsotonicSLAPlanner`), or an Optuna study (`OptunaSearchPlanner`).
`_run_independent_cell`	runs M trials at the proposed point — the same per-cell loop INDEPENDENT mode uses.
`planner.tell(...)`	feeds the M-trial aggregate back so the planner can update its model and propose a better next point.
`planner.is_converged()`	checked inside `ask()`. When max-iter / plateau / improvement-patience fires, `ask()` returns `None`.
`search_history.json`	rewritten after every iteration. A crashed run still has a usable trail.
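One way to keep a usable trail after a crash is to rewrite the file through a temporary file and an atomic rename. The sketch below shows that technique; it is an assumption about how such a writer could work, not a description of `write_search_history`, and the `iterations` key is a placeholder rather than the real schema.

```python
import json
import os
import tempfile
from pathlib import Path


def rewrite_history(path: Path, iterations: list) -> None:
    """Rewrite the history file so readers never observe a half-written document."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump({"iterations": iterations}, handle, indent=2)
        os.replace(tmp_name, path)  # atomic when both paths are on one filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


with tempfile.TemporaryDirectory() as workdir:
    target = Path(workdir) / "search_history.json"
    for count in range(1, 4):
        rewrite_history(target, [{"iteration": i} for i in range(count)])
    print(len(json.loads(target.read_text())["iterations"]))  # 3
```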

Stage 6 — Inside one cell

A “cell” is one (variation, trial) slot. Every cell runs a small state machine driven by an ExecutionStrategy.

Three collaborators inside the cell — two ABCs and one Pydantic model:

Type	Implementations	Job
`ExecutionStrategy` (ABC)	`FixedTrialsStrategy`, `AdaptiveStrategy`	Decide whether to run another trial in this cell.
`RunExecutor` (ABC)	`LocalSubprocessExecutor` (only one shipping)	Turn one `BenchmarkRun` into one `RunResult` by spawning a fresh subprocess of `aiperf.orchestrator.subprocess_runner`.
`BenchmarkRun` (Pydantic model)	—	The smallest unit of work — essentially `(cfg, variation, trial, artifact_dir)`, plus identity fields (`benchmark_id`, `sweep_id`, `label`, `cli_command`, `random_seed`) that the orchestrator uses for the artifact tree and sweep grouping.

FixedTrialsStrategy runs exactly M trials. AdaptiveStrategy runs until a ConvergenceConfig says enough — capped by multi_run.num_runs so it can’t run forever.
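The difference between the two strategies can be sketched with a single decision method. The class names match the ones above, but everything else is assumed for illustration: the method name `should_continue`, the coefficient-of-variation stopping rule, and the `rel_tolerance` and `min_runs` parameters are hypothetical and are not the actual ConvergenceConfig fields.

```python
import statistics
from abc import ABC, abstractmethod
from typing import Callable


class ExecutionStrategy(ABC):
    @abstractmethod
    def should_continue(self, results: list) -> bool:
        """Decide whether this cell needs another trial."""


class FixedTrialsStrategy(ExecutionStrategy):
    def __init__(self, num_runs: int) -> None:
        self.num_runs = num_runs

    def should_continue(self, results: list) -> bool:
        return len(results) < self.num_runs


class AdaptiveStrategy(ExecutionStrategy):
    def __init__(self, num_runs: int, rel_tolerance: float = 0.05, min_runs: int = 3) -> None:
        self.num_runs = num_runs
        self.rel_tolerance = rel_tolerance
        self.min_runs = max(min_runs, 2)  # stdev needs at least two samples

    def should_continue(self, results: list) -> bool:
        if len(results) >= self.num_runs:  # hard cap, so the cell cannot run forever
            return False
        if len(results) < self.min_runs:
            return True
        mean = statistics.fmean(results)
        if mean == 0:
            return True
        # Assumed criterion: stop once the coefficient of variation is small enough.
        return statistics.stdev(results) / abs(mean) > self.rel_tolerance


def run_cell(strategy: ExecutionStrategy, run_once: Callable[[int], float]) -> list:
    results: list = []
    while strategy.should_continue(results):
        results.append(run_once(len(results)))
    return results


print(len(run_cell(AdaptiveStrategy(num_runs=10), lambda trial: 100.0)))  # 3
```

A stable metric stops at the minimum sample count, while a noisy one runs to the cap.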

Stage 7 — Aggregate

After the orchestrator returns list[RunResult], the CLI runner groups by RunResult.variation_values, builds a per_combination_stats dict, and hands it to SweepAnalyzer.compute(per_combination_stats, sweep_parameters, sla_filters=…), which computes summary stats per group, identifies the Pareto frontier, and returns the aggregate dict the JSON / CSV exporters write.

The aggregate JSON has three result blocks plus a metadata block:

metadata — num_combinations, swept parameter list, and (when set) sla_constraints. Downstream consumers key off this block.
per_combination_metrics — one entry per unique variation_values, with swept parameters and a metric block (mean / p99 / etc.) for every metric.
best_configurations — fixed post-hoc picks for highest throughput and lowest latency from the aggregate summary. These are not the adaptive search’s configured objectives.
pareto_optimal — fixed post-hoc throughput/latency frontier computed via _dominates. Adaptive configured objectives are reported in search_history.json["best_trials"].
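The grouping and frontier steps can be sketched over plain dicts. This is an illustration, not `SweepAnalyzer`: the metric names `throughput` and `latency` are placeholders, the results are dicts rather than `RunResult` objects, and `dominates` here is a stand-in for the `_dominates` helper.

```python
from collections import defaultdict
from statistics import fmean


def group_by_variation(results: list) -> dict:
    """Average each metric over the successful trials of every unique variation."""
    groups = defaultdict(list)
    for result in results:
        if result["success"]:
            key = tuple(sorted(result["variation_values"].items()))
            groups[key].append(result["summary_metrics"])
    return {key: {metric: fmean(row[metric] for row in rows) for metric in rows[0]}
            for key, rows in groups.items()}


def dominates(a: dict, b: dict) -> bool:
    """True when `a` is no worse than `b` on both metrics and strictly better on one."""
    no_worse = a["throughput"] >= b["throughput"] and a["latency"] <= b["latency"]
    better = a["throughput"] > b["throughput"] or a["latency"] < b["latency"]
    return no_worse and better


def pareto_optimal(stats: dict) -> list:
    """Keep the variations that no other variation dominates."""
    return [key for key, candidate in stats.items()
            if not any(dominates(other, candidate)
                       for other_key, other in stats.items() if other_key != key)]


def make_result(concurrency: int, throughput: float, latency: float, success: bool = True) -> dict:
    return {"success": success, "variation_values": {"concurrency": concurrency},
            "summary_metrics": {"throughput": throughput, "latency": latency}}


runs = [make_result(1, 10.0, 100.0), make_result(1, 12.0, 110.0),
        make_result(2, 20.0, 150.0), make_result(4, 15.0, 200.0)]
print(pareto_optimal(group_by_variation(runs)))
```

In this data the concurrency-4 point is dropped because concurrency 2 has both higher throughput and lower latency.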

Orthogonality note. best_configurations and pareto_optimal here are emitted by SweepAnalyzer, computed across the whole RunResult set, and live under sweep_aggregate/profile_export_aiperf_sweep.json. They are distinct from search_history.json["best_trials"], which is what the BO planner converged on (see Search History API). For a single-objective adaptive run with no failed iterations the two usually agree on the winner. They can disagree when iterations failed, when feasibility differs (search history ranks feasibility-first, lexicographically over sla_filters, while the analyzer ranks the full set), or when the analyzer’s Pareto computation includes objectives the planner wasn’t optimizing.

If the active sweep came from a search recipe with a PostProcessSpec, that handler runs after the analyzer and emits its own JSON file (e.g. degradation_knee.json for concurrency-ramp, pareto_sweep.json for pareto-sweep).

How search recipes plug in

A search recipe is a named preset that bundles “search space + objective + termination + SLA filters + optional post-process” into one CLI selector (--search-recipe <name>). It runs before stage 3 and emits the typed sweep config the rest of the pipeline expects.

SearchRecipeContext is the recipe’s read-only view of user intent — built BenchmarkConfig, declared SLA targets (--ttft-sla-ms, etc.), and any sweep-knob overrides (--concurrency-min, --isl-osl-pairs, etc.).

SearchRecipeOutput carries exactly one of adaptive_search, sweep_parameters, or scenarios (validated mutually exclusive), plus optional sla_filters, per-request slos, and a post_process spec.
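The exactly-one-of rule is easy to picture as a constructor check. The sketch below uses a stdlib dataclass with loosely typed fields; the class name `RecipeOutputSketch` and the `branch` property are hypothetical, and the real `SearchRecipeOutput` is a typed model with its own validator.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecipeOutputSketch:
    adaptive_search: Optional[dict] = None
    sweep_parameters: Optional[dict] = None
    scenarios: Optional[list] = None
    sla_filters: Optional[list] = None
    post_process: Optional[str] = None

    def __post_init__(self) -> None:
        populated = [name for name in ("adaptive_search", "sweep_parameters", "scenarios")
                     if getattr(self, name) is not None]
        if len(populated) != 1:
            raise ValueError(f"expected exactly one output branch, got {populated}")
        self._branch = populated[0]

    @property
    def branch(self) -> str:
        """Name of the single populated branch, so callers can dispatch on it."""
        return self._branch


ramp = RecipeOutputSketch(sweep_parameters={"concurrency": [1, 2, 4, 8]},
                          post_process="degradation_knee_detect")
print(ramp.branch)  # sweep_parameters
```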

The eight built-in recipes

Recipe	Branch	What it builds
`max-throughput-ttft-sla`	`adaptive_search`	BO over concurrency, objective = throughput, SLA = TTFT
`max-throughput-itl-sla`	`adaptive_search`	BO over concurrency, objective = throughput, SLA = ITL
`max-concurrency-under-sla`	`adaptive_search` (`smooth_isotonic` default) or grid	1D feasibility — max concurrency where every SLA filter passes
`max-goodput-under-slo`	`adaptive_search`	BO maximizing goodput at >= attainment-fraction SLO compliance
`concurrency-ramp`	`sweep_parameters` + `degradation_knee_detect`	log-spaced concurrency grid, finds p99 degradation knee
`prefill-ttft-curve`	`sweep_parameters` + `ttft_curve_fit`	ISL grid at concurrency=1, linear / quadratic fit
`decode-itl-curve`	`sweep_parameters` + `itl_surface_fit`	2D (concurrency × OSL) grid, surface fit
`pareto-sweep`	`scenarios` + `pareto_sweep_export`	paired ISL/OSL × concurrency Pareto frontier

After expansion, downstream stages don’t know a recipe ever existed — they just see a normal AIPerfConfig.sweep with optional sla_filters attached.

End-to-end — putting it all together

(Diagram: the full flow from key-press to artifact.)

Names worth remembering

If you remember nothing else from this doc, remember these eleven names — every other class in the sweep code is glue or helper.

Name	What it is	Where in the flow
`AIPerfConfig`	Typed envelope. Everything user-supplied lands here.	Stage 2 out -> 3 in
`BenchmarkConfig`	Benchmark body — models, endpoint, datasets, phases.	Field of `AIPerfConfig`
`SweepConfig`	Discriminated union — `GridSweep`, `ZipSweep`, `ScenarioSweep`, `AdaptiveSearchSweep`, `SobolSweep`, `LatinHypercubeSweep`.	Field of `AIPerfConfig`
`SearchRecipe`	Pluggable preset that emits a `SearchRecipeOutput`.	Pre-stage 3
`BenchmarkPlan`	Expanded plan — `configs[]`, `variations[]`, trials, sweep.	Stage 3 out -> 4 in
`MultiRunOrchestrator`	Drives the cell loop; dispatches grid vs adaptive.	Stage 4
`ExecutionStrategy`	Per-cell “should I keep going?” — `FixedTrialsStrategy` / `AdaptiveStrategy`.	Stages 5–6
`BenchmarkRun`	One `(cfg, variation, trial)` plus identity (`benchmark_id`, `sweep_id`, `label`, `cli_command`, `random_seed`). Smallest unit of work.	Stage 6 in
`RunExecutor`	ABC. Only impl: `LocalSubprocessExecutor`.	Stage 6
`RunResult`	One `BenchmarkRun`’s output (metrics + variation metadata).	Stage 6 out -> 7 in
`SweepAnalyzer`	Pure compute: `list[RunResult]` -> grouped / best / Pareto JSON.	Stage 7

For adaptive runs, three more:

Name	What it is
`SearchPlanner`	ABC. `BayesianSearchPlanner`, `MonotonicSLASearchPlanner`, `SmoothIsotonicSLAPlanner`, `OptunaSearchPlanner`.
`SearchIteration`	Per-iteration record — proposal + measured objective + feasibility.
`PostProcessHandler`	Recipe artifact emitter — `degradation_knee_detect`, `ttft_curve_fit`, `itl_surface_fit`, `sla_breach_knee`, `pareto_sweep_export`.

Part 3 — Class & module map

End-to-end view of how a YAML config or CLI invocation becomes an AIPerfConfig envelope, expands into a BenchmarkPlan, and is executed by MultiRunOrchestrator against a backend RunExecutor. Multiple zoom levels — pick whichever matches what you’re trying to understand.

The same BenchmarkPlan / MultiRunOrchestrator / RunExecutor machinery handles single-run, grid sweep, zip sweep, scenario sweep, and adaptive search. Dispatch differs only inside MultiRunOrchestrator.execute.

30,000 ft — what happens, period

10,000 ft — local end-to-end (with cluster path coming soon)

Sub-flow — config layer (YAML/CLI -> BenchmarkPlan)

Sub-flow — orchestrator iteration

cli_runner.run_benchmark peels off single-run plans (plan.is_single_run) before the orchestrator is constructed; only multi-run plans reach MultiRunOrchestrator.execute. Inside execute(), dispatch is two-way: adaptive-search vs. grid/scenarios. Grid/scenarios further branch on _plan_iteration_order(plan), which reads plan.sweep.iteration_order (REPEATED by default, or INDEPENDENT).

(The artifact-tree layout table is documented above in Part 1 — Artifact directory layout reference.)

Sub-flow — RunExecutor backends

RunExecutor is a 2-method ABC: execute(run) -> RunResult and derive_id(plan, var_idx, trial) -> str. The local executor derives a stable id from the plan/variation/trial tuple for artifact naming; the cluster executor (coming soon — finalized on the K8s integration branch, not yet on main) derives a deterministic K8s-name-safe id from (plan, var_idx, trial) so child AIPerfJob creation is idempotent.
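A deterministic, Kubernetes-name-safe identifier can be built from a slug plus a short hash. This is a sketch of the general technique, not the cluster executor's code: it takes a plan name string instead of a plan object, and the `-v{idx}-t{trial}-{hash}` suffix format is an assumption.

```python
import hashlib
import re


def derive_id(plan_name: str, var_idx: int, trial: int, max_len: int = 63) -> str:
    """Return the same DNS-label-safe id every time for the same inputs."""
    digest = hashlib.sha256(f"{plan_name}/{var_idx}/{trial}".encode()).hexdigest()[:10]
    # Lowercase, replace anything outside [a-z0-9-] with '-', trim stray dashes.
    slug = re.sub(r"[^a-z0-9-]+", "-", plan_name.lower()).strip("-") or "sweep"
    suffix = f"-v{var_idx}-t{trial}-{digest}"
    # Truncate the readable part so the whole id fits in a 63-character label.
    return slug[: max_len - len(suffix)].rstrip("-") + suffix


first = derive_id("My Sweep/Llama_3", 2, 1)
print(first == derive_id("My Sweep/Llama_3", 2, 1))  # True
```

Because the id depends only on the inputs, retrying a create call for the same cell targets the same object name instead of producing a duplicate.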

The RunResult shape returned by both backends is identical — the cluster path fetches the same profile_export_aiperf.json schema over HTTP that the local path reads off disk. Downstream SweepAnalyzer.compute(), aggregate_and_export(), and the search_history.json writer don’t know which backend produced the inputs.

Class / module map

Sequence — a sweep run end to end

Where to look in the code

Concept	File
`AIPerfConfig` envelope, `BenchmarkConfig` body	`src/aiperf/config/config.py`
`BenchmarkPlan`, `BenchmarkRun`, `ResolvedConfig`	`src/aiperf/config/resolution/plan.py`
`MultiRunConfig`, `ConvergenceConfig`	`src/aiperf/config/sweep/multi_run.py`
`SweepConfig` union / `GridSweep` / `ZipSweep` / `ScenarioSweep` / `AdaptiveSearchSweep` / `Objective` / `OutcomeConstraint` / `SweepVariation`	`src/aiperf/config/sweep/config.py`
`SobolSweep` / `LatinHypercubeSweep` / QMC sampling helpers	`src/aiperf/config/sweep/sampling.py`, `src/aiperf/config/sweep/expand_qmc.py`
`expand_sweep` (definition)	`src/aiperf/config/sweep/expand.py` (re-exported from `src/aiperf/config/sweep/__init__.py`)
`SearchSpaceDimension`, `SLAFilter`	`src/aiperf/config/sweep/adaptive.py`
`PostProcessSpec`, `SearchRecipe`, `SearchRecipeContext`, `SearchRecipeOutput`	`src/aiperf/search_recipes/_base.py` (`PostProcessSpec` defined in `src/aiperf/search_recipes/_post_process.py`, re-exported from `_base.py`)
`PostProcessHandler` Protocol + built-ins	`src/aiperf/search_recipes/post_process.py`
`build_benchmark_plan` (load -> plan)	`src/aiperf/config/loader/plan.py`
`MultiRunOrchestrator`	`src/aiperf/orchestrator/orchestrator.py`
`RunExecutor` ABC + `RunResult`	`src/aiperf/orchestrator/executor.py`, `src/aiperf/orchestrator/models.py`
`LocalSubprocessExecutor`	`src/aiperf/orchestrator/local_executor.py`
Subprocess runner entry (`python -m`)	`src/aiperf/orchestrator/subprocess_runner.py`
`SearchPlanner` ABC + `SearchIteration`	`src/aiperf/orchestrator/search_planner/base.py`
`BayesianSearchPlanner` / `MonotonicSLASearchPlanner` / `SmoothIsotonicSLAPlanner` / `OptunaSearchPlanner`	`src/aiperf/orchestrator/search_planner/{bayesian,monotonic,smooth_isotonic,optuna_planner}.py`
`parse_sla_filter`, `parse_search_space`	`src/aiperf/orchestrator/search_planner/parsing.py`
`SweepAnalyzer` + exporters	`src/aiperf/orchestrator/aggregation/sweep.py`
`aggregate_sweep_and_export` (file writer)	`src/aiperf/cli_runner/_sweep_aggregate.py` (re-exported from `cli_runner/_aggregate.py`)
`write_search_history`	`src/aiperf/exporters/search_history.py`
`run_benchmark` (single vs multi dispatch) + `_reject_in_process_sweep_under_operator`	`src/aiperf/cli_runner.py`
Plugin registry + categories	`src/aiperf/plugin/{plugins.py,categories.yaml,types.py,schema/}`

ABC hierarchy — orchestrator-side

The orchestrator layer’s extension points are abstract base classes; implementations are registered as plugins or instantiated directly by category-aware factories.

Sweep execution flow — class module map in motion

How the types from the class diagram actually flow through a sweep run. Read it as: each box is an instance of a class from the class diagram; arrows show what produces what; cardinality annotations make the fan-out explicit (1 plan -> N variations × M trials -> N×M results -> 1 aggregate).

The two views together: the flowchart shows cardinality and which class produces which (the data shape of a sweep); the sequence shows the temporal call pattern between the same classes. Both use only the types from the class diagram — no module-internal helpers.

Adaptive search — class types

The adaptive search path layers atop the same BenchmarkPlan / MultiRunOrchestrator / RunExecutor core. Adaptive config is not a separate field — it’s the AdaptiveSearchSweep variant of the SweepConfig discriminated union (type: adaptive_search). Two plugin categories cooperate: a search_planner (drives the outer loop) and an optional search_recipe (curates the search space / objective / post-process from a higher-level recipe template). The optional terminal post_process is a single PostProcessSpec resolved via search_recipe_post_process plugins.

Built-in search_recipe plugins (src/aiperf/search_recipes/):

max-throughput-ttft-sla, max-throughput-itl-sla
concurrency-ramp
prefill-ttft-curve, decode-itl-curve
max-goodput-under-slo, max-concurrency-under-sla
pareto-sweep

Recipes choose one of three output branches: adaptive_search (BO-style), sweep_parameters (grid-style — e.g. concurrency-ramp, prefill-ttft-curve, decode-itl-curve), or scenarios (deep-merge variants — e.g. pareto-sweep). The SearchRecipeOutput validator enforces exactly-one-of, so downstream code can branch cleanly.

Built-in search_recipe_post_process plugins: degradation_knee_detect, ttft_curve_fit, itl_surface_fit, sla_breach_knee, pareto_sweep_export.

Adaptive search — execution flow

The BO outer loop is a propose -> execute -> record cycle inside MultiRunOrchestrator.execute_adaptive_search. BenchmarkRun and RunExecutor are the same as in the grid path; the difference is that BenchmarkPlan.configs starts with one seed config and grows by one per iteration as the planner asks for the next point.

Adaptive search — recipe -> AdaptiveSearchSweep

A user can either author an AdaptiveSearchSweep directly under sweep: (low level) or pick a search_recipe plugin (high level) that builds one from a recipe + the user’s existing benchmark config. The adaptive block lives entirely on sweep; there is no separate adaptive-search field on MultiRunConfig.

SearchPlanner — protocols, planners, and extension points

This section describes how AIPerf’s Bayesian-Optimization outer loop is wired together: the protocols, the runtime sequence, and the config-to-execution flow.

The planner and the orchestrator talk through narrow protocols. MultiRunOrchestrator doesn’t know about Bayesian Optimization — it only knows SearchPlanner and RunExecutor. The Optuna+BoTorch dependency is hidden inside OptunaSearchPlanner and its BayesianSearchPlanner curated-preset subclass; MonotonicSLASearchPlanner and SmoothIsotonicSLAPlanner are 1D-feasibility-search planners that plug in at the same SearchPlanner ABC. Future planners (random-search baseline, MORBO, etc.) plug in identically.

Registered planners

Planner modules below are relative to the aiperf.orchestrator.search_planner package (e.g. bayesian.py -> aiperf.orchestrator.search_planner.bayesian).

Plugin name	Class	Module	Purpose
`bayesian`	`BayesianSearchPlanner`	`bayesian.py`	Curated Optuna preset (subclass of `OptunaSearchPlanner`); uses BoTorch qLogNEI/qLogNEHVI when available and falls back to TPE with a warning when the optional BoTorch stack is unavailable
`monotonic_sla`	`MonotonicSLASearchPlanner`	`monotonic.py`	1D exponential probe + bisection mirroring perf_analyzer’s `--binary-search`. Margin-magnitude-blind.
`smooth_isotonic`	`SmoothIsotonicSLAPlanner`	`smooth_isotonic.py` (+ helpers `_smooth_isotonic_fit.py`, `_replicate_budget.py`, `_cliff_detect.py`, `_margin_normalize.py`)	1D PAVA + PCHIP smooth-isotonic fit; opt-in replicates and bootstrap CI; cliff-curve guard. Default for `max-concurrency-under-sla`.
`optuna`	`OptunaSearchPlanner`	`optuna_planner.py`	Expert-mode Optuna BO (TPE / GP / BoTorch samplers exposed via `--optuna-sampler`); Optuna ships by default, BoTorch requires the optional `botorch` extra.

All four are registered in src/aiperf/plugin/plugins.yaml under the search_planner: category and resolved via plugins.get_class(PluginType.SEARCH_PLANNER, name).
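The "exponential probe + bisection" idea in the monotonic_sla row can be sketched as follows. This is a generic illustration of the technique, not the MonotonicSLASearchPlanner code: `max_feasible`, its parameters, and the integer search domain are assumptions. It consumes only a pass/fail signal, which is what "margin-magnitude-blind" refers to.

```python
from typing import Callable


def max_feasible(
    is_feasible: Callable[[int], bool], lo: int = 1, hi_cap: int = 1024
) -> int:
    """Largest value in [lo, hi_cap] that passes, assuming feasibility is
    monotonic (once a value fails, every larger value fails too).
    Returns 0 when even `lo` fails."""
    if lo < 1:
        raise ValueError("lo must be >= 1 so that doubling makes progress")
    if not is_feasible(lo):
        return 0

    good, bad = lo, None
    # Phase 1: exponential probe. Double until a failure or the cap.
    probe = lo * 2
    while probe <= hi_cap:
        if is_feasible(probe):
            good = probe
            probe *= 2
        else:
            bad = probe
            break

    if bad is None:
        # No failure seen yet; test the cap itself if it was not probed.
        if good == hi_cap or is_feasible(hi_cap):
            return hi_cap
        bad = hi_cap

    # Phase 2: bisection. Invariant: `good` passes, `bad` fails.
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_feasible(mid):
            good = mid
        else:
            bad = mid
    return good


# Fake SLA check: the SLA holds up to concurrency 37.
print(max_feasible(lambda concurrency: concurrency <= 37))
```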

SearchPlanner class diagram

The CLI grammar lives in aiperf.orchestrator.search_planner.parsing.parse_search_space(values), which converts --search-space "path:lo,hi[:kind]" strings into SearchSpaceDimension instances. The v1->v2 converter (build_multi_run in aiperf.config.flags._converter_optionals) packages everything into a typed AdaptiveSearchSweep carried on AIPerfConfig.sweep.
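To make the "path:lo,hi[:kind]" grammar concrete, here is a minimal parsing sketch. It is not the real parse_search_space: `DimensionSketch`, `parse_dimension`, the float bounds, the lo < hi check, and the example path `load.concurrency` are all assumptions, and the set of accepted kind values is not stated in the text, so the sketch passes the kind through unvalidated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DimensionSketch:
    """Hypothetical stand-in for SearchSpaceDimension."""

    path: str
    lo: float
    hi: float
    kind: Optional[str]


def parse_dimension(value: str) -> DimensionSketch:
    parts = value.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected 'path:lo,hi[:kind]', got {value!r}")
    path, bounds = parts[0], parts[1]
    kind = parts[2] if len(parts) == 3 else None

    lo_text, comma, hi_text = bounds.partition(",")
    if not path or not comma:
        raise ValueError(f"expected 'path:lo,hi[:kind]', got {value!r}")

    lo, hi = float(lo_text), float(hi_text)  # raises ValueError on bad numbers
    if lo >= hi:
        # Assumption: an empty or inverted range is rejected.
        raise ValueError(f"lower bound must be below upper bound in {value!r}")
    return DimensionSketch(path=path, lo=lo, hi=hi, kind=kind)


print(parse_dimension("load.concurrency:1,64:int"))
```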

Runtime sequence — one BO iteration

MultiRunOrchestrator.execute_adaptive_search is a thin loop. Every iteration: ask the planner for a (BenchmarkConfig, SweepVariation); run all configured trials at that point via the same _run_independent_cell that grid sweeps use; tell the planner what happened; write search_history.json incrementally. When ask() returns None, surface the planner’s convergence_reason() and exit.

A few things this view makes explicit:

Aggregate observations to the GP. The planner currently reports one Optuna trial per search point. With objective_pooling=mean it tells Optuna the mean of finite per-trial objective values; with pooled percentile mode it tells Optuna the pooled percentile objective computed from the raw record samples. Per-trial RunResult objects remain on SearchIteration.results in memory for search-history derivation, but separate per-trial Optuna observations are not recorded in v1.
Failed-iteration handling. When zero trials produce a usable objective, the planner still calls study.tell(trial, fallback_objective) so Optuna’s ask/tell pairing stays consistent. The fallback is a strictly worse-than-prior sentinel used only inside the study; search_history.json persists objective_values: null for that iteration.
Three convergence signals. is_converged() checks max_iterations, then improvement-over-best patience, then coefficient-of-variation plateau. The first to fire wins; the reason is recorded in search_history.json.
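The mean pooling and the ordered convergence checks described above can be sketched together. This is an illustration under stated assumptions, not the planner's code: the function names, the reason strings, the window and threshold parameters, and the choice of a maximized objective are all hypothetical.

```python
import math
import statistics
from typing import Optional


def pool_mean(trial_objectives: list[float]) -> Optional[float]:
    """Mean of the finite per-trial objectives; None when no trial is usable."""
    finite = [v for v in trial_objectives if math.isfinite(v)]
    return statistics.fmean(finite) if finite else None


def convergence_reason(
    history: list[Optional[float]],  # one pooled objective per iteration
    max_iterations: int,
    patience: int,
    cv_window: int,
    cv_threshold: float,
) -> Optional[str]:
    """Check the three signals in order; the first one that fires wins."""
    if len(history) >= max_iterations:
        return "max_iterations"

    values = [v for v in history if v is not None]  # skip failed iterations

    # Patience: the last `patience` points did not beat the earlier best.
    if len(values) > patience:
        best_before = max(values[:-patience])
        if max(values[-patience:]) <= best_before:
            return "patience"

    # Plateau: coefficient of variation over the recent window is tiny.
    if len(values) >= cv_window:
        recent = values[-cv_window:]
        mean = statistics.fmean(recent)
        if mean != 0 and statistics.pstdev(recent) / abs(mean) < cv_threshold:
            return "cv_plateau"
    return None


print(pool_mean([1.0, float("nan"), 3.0]))
print(convergence_reason([5.0, 4.0, 4.5], 10, 2, 5, 0.01))
```

A `None` from `pool_mean` corresponds to the failed-iteration case: the planner would substitute its fallback sentinel inside the study while the persisted history keeps a null.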

Config flow — CLI / YAML -> execution

The CLI feeds a CLIConfig through src/aiperf/config/flags/converter.py, which packages the search-space + objective + filters into a typed AdaptiveSearchSweep carried on AIPerfConfig.sweep. From there the plan builder produces a BenchmarkPlan, and MultiRunOrchestrator.execute dispatches on plan.is_adaptive_search to the BO loop.

Notes on extension points

Adding a new planner backend (random-search baseline, etc.): subclass SearchPlanner, implement the four abstract methods (ask/tell/is_converged/history); optionally override convergence_reason (default returns None) and boundary_summary (1D feasibility planners only). No orchestrator changes required — MonotonicSLASearchPlanner, SmoothIsotonicSLAPlanner, and OptunaSearchPlanner are existing examples reusing the same ABC. Wiring is already generic: AdaptiveSearchSweep.planner is a SearchPlannerType (ExtensibleStrEnum in src/aiperf/plugin/enums.pyi), and cli_runner._run_multi_benchmark instantiates the planner via plugins.get_class(PluginType.SEARCH_PLANNER, sweep.planner). To register a new backend, add an entry under search_planner: in src/aiperf/plugin/plugins.yaml and a matching enum value — no dispatch code changes.
Adding a new executor backend: subclass RunExecutor. LocalSubprocessExecutor iterates one (variation, trial) at a time via execute(BenchmarkRun), so the executor seam already matches the one-point-at-a-time shape of the adaptive loop.
Replacing the BO backend. The OptunaSearchPlanner boundary (and its BayesianSearchPlanner curated-preset subclass) is the only Optuna-aware code in the project. BoTorch-specific acquisitions live behind the optional botorch extra in pyproject.toml. The qlognei / qlognehvi acquisitions, posterior-regret stopping (--optuna-terminator regret/emmr), pooled-percentile aggregation (--search-percentile-pooling pooled), and the Hvarfner-DSP Matern-5/2 kernel (arXiv:2402.02229) are all plumbed through _optuna_helpers.py. The remaining principled upgrade path is wiring per-iteration heteroscedastic noise estimates from the pooled-percentile JSONL helper into a HeteroskedasticSingleTaskGP-based custom candidates_func. Evidence-gated: ship only if observed within-trial variance varies meaningfully across the search space on real workloads.
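As an illustration of the first note (adding a planner backend), the sketch below implements a random-search baseline against a stand-in ABC. `SearchPlannerSketch` only mirrors the four abstract method names and the optional convergence_reason listed above; every signature, the dict-shaped points, and the `RandomSearchPlanner` constructor are assumptions, since the real SearchPlanner signatures are not shown here.

```python
import random
from abc import ABC, abstractmethod
from typing import Optional

Point = dict[str, float]


class SearchPlannerSketch(ABC):
    """Stand-in for the SearchPlanner ABC; signatures are assumed."""

    @abstractmethod
    def ask(self) -> Optional[Point]: ...

    @abstractmethod
    def tell(self, point: Point, objective: Optional[float]) -> None: ...

    @abstractmethod
    def is_converged(self) -> bool: ...

    @abstractmethod
    def history(self) -> list[tuple[Point, Optional[float]]]: ...

    def convergence_reason(self) -> Optional[str]:
        return None  # optional override


class RandomSearchPlanner(SearchPlannerSketch):
    """Uniform random sampling inside per-dimension bounds."""

    def __init__(
        self,
        bounds: dict[str, tuple[float, float]],
        max_iterations: int,
        seed: int = 0,
    ) -> None:
        self._bounds = bounds
        self._max_iterations = max_iterations
        self._rng = random.Random(seed)  # seeded for reproducible proposals
        self._history: list[tuple[Point, Optional[float]]] = []

    def ask(self) -> Optional[Point]:
        if self.is_converged():
            return None
        return {
            name: self._rng.uniform(lo, hi)
            for name, (lo, hi) in self._bounds.items()
        }

    def tell(self, point: Point, objective: Optional[float]) -> None:
        self._history.append((point, objective))

    def is_converged(self) -> bool:
        return len(self._history) >= self._max_iterations

    def history(self) -> list[tuple[Point, Optional[float]]]:
        return list(self._history)  # copy so callers cannot mutate state

    def convergence_reason(self) -> Optional[str]:
        return "max_iterations" if self.is_converged() else None


planner = RandomSearchPlanner({"concurrency": (1.0, 64.0)}, max_iterations=3)
while (point := planner.ask()) is not None:
    planner.tell(point, objective=point["concurrency"])  # fake objective
print(len(planner.history()), planner.convergence_reason())
```

In the real project the class would additionally need the plugins.yaml entry and enum value described in the first note; the sketch covers only the planner side of the contract.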

`smooth_isotonic` as novel-in-composition

The SmoothIsotonicSLAPlanner algorithm (PAVA monotonic regression as denoiser -> PCHIP cubic Hermite interpolant -> root-find for SLA-threshold crossing, plus a PAVA-residual changepoint detector for the cliff-guard exit) does not appear in published BO literature in this exact composition. The components are textbook (PAVA: pool-adjacent-violators; PCHIP: shape-preserving piecewise-cubic Hermite interpolation; bracketed root-find: classical numerical analysis), but their composition for SLA-saturation in noisy GPU-serving benchmarking is original. Adjacent prior art:

Letham et al. 2017 (arXiv:1706.07094) — the noise-modeling anchor; per-trial observations in BO with feasibility-product constraints.
DistServe (Zhong et al. OSDI ’24, arXiv:2401.09670) — “DistServe simply enumerates the placements via binary search and finds the maximum rate that meets the SLO attainment target with simulation trials.” MonotonicSLASearchPlanner reproduces DistServe’s algorithm; SmoothIsotonicSLAPlanner is a strict improvement (denoised + continuous-space root-find).
BOute (Jiang et al. 2026, arXiv:2602.10729) — closest contemporary work using BO for LLM serving; constrained qNEHVI on BoTorch with ModelListGP. Different problem (serving-system optimization rather than benchmark-side adaptive sweep), same machinery family.

smooth_isotonic is defensibly novel in the systems-benchmarking literature even though every individual piece is classical statistics, and it merits a section in a future technical report.
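The denoise-then-root-find pipeline described in this section can be sketched with the standard library only. This is not the SmoothIsotonicSLAPlanner code. The PAVA step is the textbook unit-weight algorithm, but the interpolant is deliberately swapped from PCHIP to plain piecewise-linear interpolation to keep the example dependency-free, and the cliff guard, replicates, and bootstrap CI are omitted. The function names and the sample latencies are hypothetical.

```python
from typing import Optional


def pava(y: list[float]) -> list[float]:
    """Least-squares non-decreasing fit via pool-adjacent-violators."""
    sums: list[float] = []
    counts: list[int] = []
    for value in y:
        sums.append(value)
        counts.append(1)
        # Merge the last two blocks while their means violate monotonicity.
        while len(sums) > 1 and sums[-2] / counts[-2] > sums[-1] / counts[-1]:
            total, count = sums.pop(), counts.pop()
            sums[-1] += total
            counts[-1] += count
    fitted: list[float] = []
    for total, count in zip(sums, counts):
        fitted.extend([total / count] * count)
    return fitted


def sla_crossing(
    x: list[float], y: list[float], threshold: float
) -> Optional[float]:
    """Largest x at which the denoised curve stays at or below `threshold`.

    Returns None when even the first point violates the SLA, and x[-1] when
    the curve never crosses the threshold in the sampled range.
    """
    fit = pava(y)
    if fit[0] > threshold:
        return None
    for i in range(1, len(x)):
        if fit[i] > threshold:
            # Here fit[i-1] <= threshold < fit[i], so the divisor is positive.
            frac = (threshold - fit[i - 1]) / (fit[i] - fit[i - 1])
            return x[i - 1] + frac * (x[i] - x[i - 1])
    return x[-1]


concurrency = [1.0, 2.0, 4.0, 8.0, 16.0]
ttft_ms = [100.0, 120.0, 110.0, 180.0, 400.0]  # noisy: dips at concurrency 4
print(pava(ttft_ms))
print(sla_crossing(concurrency, ttft_ms, threshold=200.0))
```

The dip at concurrency 4 is pooled with its neighbor before the crossing is located, which is the role the text assigns to PAVA as a denoiser. A shape-preserving cubic such as PCHIP would replace the linear segment with a smoother monotone curve between the same fitted points.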