This page covers configuration and runtime errors for both grid-style parameter sweeps and adaptive (Bayesian) search. For algorithm semantics, see Bayesian-Optimization Outer Loop. For the YAML reference, see Parameter Sweeps.
Each entry quotes the literal error/warning string raised by the code today, with a source-file pointer so you can verify against main.
Error Message (Pydantic, from CLI parse):
Cause: You provided a non-numeric value for --concurrency. parse_int_or_int_list calls int(s) directly, so the stdlib ValueError propagates and Pydantic wraps it as the int_parsing error above on the concurrency field.
Where it’s raised: src/aiperf/config/loader/parsing.py (parser), src/aiperf/config/flags/cli_config.py (field).
Solution:
Error Message (stdlib, surfaced through Pydantic):
Cause: One element of a comma-separated --concurrency list is not a valid integer. The list parser does [int(p) for p in parts] and the stdlib ValueError is raised on the first bad token, with no list context or position information.
Where it’s raised: src/aiperf/config/loader/parsing.py.
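Because the stdlib error carries no position information, it can help to pre-check a list before passing it to --concurrency. The following is a minimal sketch, not AIPerf code: find_bad_concurrency_tokens is a hypothetical helper that applies the same int() conversion described above, but keeps going after the first failure and records where each bad token sits. It assumes the list is split on commas only.

```python
def find_bad_concurrency_tokens(raw: str) -> list[tuple[int, str]]:
    """Return (position, token) pairs that int() would reject.

    Hypothetical helper (not part of AIPerf). It applies int() to each
    comma-separated token, as the list parser is described as doing,
    but collects every failure instead of stopping at the first one.
    """
    bad = []
    for position, token in enumerate(raw.split(",")):
        try:
            int(token)
        except ValueError:
            bad.append((position, token))
    return bad


# A typo ("2o") and a trailing comma (empty token) are both reported.
print(find_bad_concurrency_tokens("10,2o,30,"))
```

An empty result means every token converts cleanly; range constraints such as ge=1 are checked separately by Pydantic.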
Solution:
Error Message (Pydantic):
Cause: A concurrency value is zero or negative. PhaseConfig.concurrency is constrained to ge=1, so each value is rejected individually with the standard Pydantic greater_than_equal error — there is no aggregated, position-aware message.
Where it’s raised: src/aiperf/config/phases.py.
Solution:
Why: Concurrency is the number of in-flight requests, so a zero or negative value is meaningless.
Error Message (late-stage, plan validation — covers both sweep and multi-run):
Where it’s raised: src/aiperf/cli_runner.py (_validate_multi_benchmark_plan).
Earlier sweep-only message (fires first when --ui dashboard is explicitly set on a sweep config):
Where it’s raised: src/aiperf/config/config.py (validate_sweep_no_dashboard_ui, model-validator). Only triggers when runtime.ui is explicitly set by the user and a sweep is configured; multi-run alone does not trip this early check.
Cause: The dashboard UI requires exclusive terminal control and would overwrite itself between sequential runs.
Solution:
CLI path (Pydantic, fires first):
--parameter-sweep-cooldown-seconds has Field(ge=0), so any negative value is rejected at config-parse time before the strategy ever sees it.
Where it’s raised: src/aiperf/config/flags/cli_config.py.
Programmatic path (FixedTrialsStrategy direct construction):
Where it’s raised: src/aiperf/orchestrator/strategies.py.
Solution:
Error Message (grid sweep):
Error Message (zip sweep):
Cause: A sweep block (in a YAML config) declared a parameter with an empty values: list. This applies to YAML-defined sweeps only; the magic-list CLI path (e.g. --concurrency 10,20,30) collapses --concurrency "" to None and never enters this sweep-block code, so there is no CLI-side trigger for these messages.
Where it’s raised: src/aiperf/config/sweep/expand.py (both the grid and the zip expansion paths).
Warning Message (sweep mode, per-variation):
Where it’s raised: src/aiperf/cli_runner/_sweep_aggregate.py.
Note: Sweep mode does not require at least 2 successful runs. ConfidenceAggregation has a documented single-run degraded mode (std=0, CI collapsed to mean, single_run: True in metadata), and per-variation aggregation explicitly lets single-success cells through — see the comment at src/aiperf/cli_runner/_sweep_aggregate.py. Only cells with zero successful runs are skipped.
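The single-run degraded mode described in the note can be illustrated with a small sketch. This is not the ConfidenceAggregation implementation: summarize_runs is a hypothetical function, the output keys other than single_run are made up for illustration, and the interval uses a normal approximation as a simplifying assumption (the real aggregator may use a different distribution).

```python
import math
import statistics


def summarize_runs(values: list[float], confidence: float = 0.95) -> dict:
    """Aggregate one metric across runs (illustrative sketch only)."""
    if not values:
        # Mirrors the rule above: zero successful runs cannot be aggregated.
        raise ValueError("no successful runs to aggregate")
    mean = statistics.fmean(values)
    if len(values) == 1:
        # Degraded mode: no spread can be estimated from a single run,
        # so std is 0 and the interval collapses onto the mean.
        return {"mean": mean, "std": 0.0, "ci_low": mean,
                "ci_high": mean, "single_run": True}
    std = statistics.stdev(values)
    # Assumption: two-sided normal-approximation interval.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * std / math.sqrt(len(values))
    return {"mean": mean, "std": std, "ci_low": mean - half_width,
            "ci_high": mean + half_width, "single_run": False}


print(summarize_runs([120.0]))
print(summarize_runs([100.0, 110.0, 120.0]))
```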
Related sweep-level warnings:
"Skipping per-variation aggregate for '<label>': ConfidenceAggregation raised <exc>": aggregation crashed for that cell (cli_runner/_sweep_aggregate.py).
"Sweep aggregate skipped: no successful runs across all variations.": the whole-sweep summary is skipped only when every variation had zero successes (cli_runner/_sweep_aggregate.py).
Warning Message (non-sweep multi-run path):
Where it’s raised: src/aiperf/cli_runner.py. This message applies to plain --num-profile-runs runs (no sweep), where the “need at least 2” rule does hold.
Solution:
Some flag combinations that look incorrect do not currently raise. They are listed here so that users searching for an error message don’t waste time looking:
--parameter-sweep-mode, --parameter-sweep-cooldown-seconds, and --parameter-sweep-same-seed are silent no-ops when no sweep is configured. The sweep-override pathway in src/aiperf/config/flags/converter.py only consults these fields when a sweep block is present. No validator exists today.
--confidence-level, --profile-run-cooldown-seconds, and --profile-run-disable-warmup-after-first are silently ignored when --num-profile-runs is 1. The CLI help text for --confidence-level says “Only applies when --num-profile-runs > 1”, but this is informational, not enforced (src/aiperf/config/flags/cli_config.py). --set-consistent-seed also applies in sweep-without-multi-run mode (src/aiperf/config/config.py), so it is not strictly multi-run-only.
If you hit one of these and were expecting an error, please file an issue; these are good UX targets for future validators.
This section resolves errors and warnings from AIPerf’s adaptive-search feature — aiperf profile --search-space ... --search-metric ... --search-direction ... --search-max-iterations .... AIPerf wraps Optuna+BoTorch to drive a Bayesian-Optimization (BO) outer loop; most errors come from input validation and a small set of mutual-exclusion guards.
For the deeper “why does BO behave this way,” see /aiperf/dev/sweeping-adaptive-search/bayesian-optimization.
Error message:
Cause:
OptunaSearchPlanner uses Optuna core by default, but its implicit preferred sampler is BoTorch. Explicit --optuna-sampler botorch or BoTorch-only acquisitions require optuna-integration, botorch>=0.10, gpytorch, and torch. When BoTorch is only the implicit default, AIPerf falls back to TPE with a warning if this optional stack is unavailable; explicit BoTorch requests fail instead of silently changing semantics.
Fix:
--search-space String
Error message:
Other shapes from the same parser:
Cause:
parse_search_space in src/aiperf/orchestrator/search_planner/parsing.py implements the grammar PATH:LO,HI[:KIND] with KIND in {int, real} (default real). Common mistakes: omitting the : separator, swapping LO and HI, passing a non-numeric bound, or using a kind other than int or real.
Fix:
--search-space is repeatable; pass it once per dimension.
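To make the PATH:LO,HI[:KIND] grammar concrete, here is a minimal sketch of a parser for it. It is not the real parse_search_space: the names SearchDimension and parse_dimension, the error wording, and the choice to reject LO equal to HI are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchDimension:
    path: str
    low: float
    high: float
    kind: str = "real"


def parse_dimension(spec: str) -> SearchDimension:
    """Parse one PATH:LO,HI[:KIND] string (illustrative sketch only)."""
    parts = spec.split(":")
    if len(parts) not in (2, 3) or not parts[0]:
        raise ValueError(f"expected PATH:LO,HI[:KIND], got {spec!r}")
    path, bounds = parts[0], parts[1]
    kind = parts[2] if len(parts) == 3 else "real"  # KIND defaults to real
    if kind not in ("int", "real"):
        raise ValueError(f"kind must be int or real, got {kind!r}")
    pieces = bounds.split(",")
    if len(pieces) != 2:
        raise ValueError(f"expected LO,HI bounds, got {bounds!r}")
    try:
        low, high = (float(p) for p in pieces)
    except ValueError:
        raise ValueError(f"bounds must be numeric, got {bounds!r}") from None
    # Assumption: equal bounds are rejected along with swapped ones.
    if low >= high:
        raise ValueError(f"LO must be below HI, got {low} and {high}")
    return SearchDimension(path, low, high, kind)


print(parse_dimension("phases.profiling.concurrency:1,64:int"))
```

Running a candidate string through a checker like this before a long search catches the four common mistakes listed above.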
Error message:
Cause:
The dotted path is resolved by _set_nested_value in src/aiperf/config/sweep/expand.py against the dict form of BenchmarkConfig. Named-list segments (e.g. phases.profiling.*) match on the entry’s name field. Typos like phase.profiling.concurrency (no s) or phases.profilling.concurrency (extra l) error loudly rather than silently creating a phantom phase.
Fix:
Common top-level segments: phases.<name>.<field> (typically profiling or warmup; <field> is a BasePhaseConfig scalar like concurrency, request_rate, request_count), endpoint.<field>, runtime.<field>.
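The "fail loudly on a typo" behavior can be sketched as follows. This is not the real _set_nested_value: set_by_path is a hypothetical function, the config shape is a simplified stand-in for the dict form of BenchmarkConfig, and requiring the final field to already exist is an assumption of this sketch.

```python
def set_by_path(config: dict, dotted_path: str, value) -> None:
    """Set a value by dotted path, raising on unknown segments (sketch)."""
    *parents, leaf = dotted_path.split(".")
    node = config
    for segment in parents:
        if isinstance(node, dict):
            if segment not in node:
                raise KeyError(f"unknown segment {segment!r} in {dotted_path!r}")
            node = node[segment]
        elif isinstance(node, list):
            # Named-list segment: match on each entry's "name" field.
            matches = [e for e in node
                       if isinstance(e, dict) and e.get("name") == segment]
            if not matches:
                raise KeyError(f"no entry named {segment!r} in {dotted_path!r}")
            node = matches[0]
        else:
            raise KeyError(f"cannot descend into {segment!r} in {dotted_path!r}")
    # Assumption: the target field must already exist, so a typo in the
    # last segment cannot silently create a new key.
    if not isinstance(node, dict) or leaf not in node:
        raise KeyError(f"unknown field {leaf!r} in {dotted_path!r}")
    node[leaf] = value


config = {"phases": [{"name": "warmup", "concurrency": 1},
                     {"name": "profiling", "concurrency": 8}]}
set_by_path(config, "phases.profiling.concurrency", 32)
print(config["phases"][1])
```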
--search-metric Uses an Aggregator-Suffixed Key
Cause:
The BO objective is the bare metric tag (e.g. output_token_throughput, time_to_first_token) — not the flattened _avg / _p99 form that appears in CSV/JSON exports. The statistic is selected separately via --search-stat (one of avg, p50, p90, p95, p99; default avg). See _extract_objective_vector in src/aiperf/orchestrator/search_planner/optuna_planner.py and AdaptiveSearchSweep.objectives[0].metric in src/aiperf/config/sweep/config.py.
Fix:
See “Objective Semantics” in /aiperf/dev/sweeping-adaptive-search/bayesian-optimization for which metric tags are produced and how stats map to JSON fields.
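If you are copying a column name out of a CSV/JSON export, a small helper can separate the bare tag from the statistic. The sketch below is hypothetical (split_export_key is not an AIPerf function) and relies only on the stat names listed above; a bare tag that itself happened to end in one of those suffixes would be split incorrectly.

```python
from typing import Optional

# Statistics accepted by --search-stat, as listed above.
KNOWN_STATS = ("avg", "p50", "p90", "p95", "p99")


def split_export_key(key: str) -> tuple[str, Optional[str]]:
    """Split a flattened export key into (metric tag, stat) (sketch)."""
    for stat in KNOWN_STATS:
        suffix = "_" + stat
        if key.endswith(suffix) and len(key) > len(suffix):
            return key[: -len(suffix)], stat
    # No recognized suffix: treat the key as a bare metric tag.
    return key, None


print(split_export_key("output_token_throughput_avg"))
print(split_export_key("time_to_first_token"))
```

The first element is what belongs in --search-metric, and the second (when present) is what belongs in --search-stat.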
--search-metric Names a Metric the Run Doesn’t Produce
Warning message:
Cause:
_extract_objective_vector in src/aiperf/orchestrator/search_planner/optuna_planner.py keeps trials only if r.summary_metrics[self._cfg.objectives[0].metric] is present. If the metric never appears (e.g. time_to_first_token against a non-streaming endpoint, or inter_token_latency for a single-token completion), every trial is filtered out, the iteration produces no usable objective, and the planner feeds Optuna a per-objective sentinel vector — see entry 6 for the mechanics.
Fix:
Confirm the metric is produced before driving a long BO run:
If the desired metric is missing, pick one that is produced or adjust the run to produce it (e.g. enable streaming for time-to-first-token).
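The filtering rule described in the cause can be sketched in a few lines. This is an illustration under assumptions, not the planner's code: runs are modeled as plain dicts with a summary_metrics mapping, and runs_missing_metric is a hypothetical helper name.

```python
def runs_missing_metric(runs: list[dict], metric: str) -> list[int]:
    """Return indexes of runs whose summary_metrics lack `metric` (sketch)."""
    return [index for index, run in enumerate(runs)
            if metric not in run.get("summary_metrics", {})]


# Hypothetical results from a non-streaming endpoint: no TTFT anywhere.
runs = [
    {"summary_metrics": {"request_latency": 812.0}},
    {"summary_metrics": {"request_latency": 790.5}},
]
missing = runs_missing_metric(runs, "time_to_first_token")
if len(missing) == len(runs):
    print("every trial would be filtered out for time_to_first_token")
```

When every index comes back, the iteration has no usable objective, which is the situation that triggers the warning.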
Warning message:
Same as entry 5. The corresponding entry in search_history.json has objective_values: null.
Cause:
When every trial fails, the planner builds a per-objective sentinel via _failure_sentinel_vector (see src/aiperf/orchestrator/search_planner/optuna_planner.py) and feeds it to study.tell(trial, ...) so the ask/tell pairing stays consistent. Each sentinel is the worst-of-prior value for that objective plus a 10%-or-1.0 margin in the worse direction; if no prior history exists for that objective, it falls back to +/- NO_DATA_SENTINEL_LOSS. The sentinel value IS observed by Optuna’s surrogate (the GP sees a strictly-worse-than-anything-seen point so it deprioritizes that region), but the fallback value is NOT persisted to search_history.json — objective_values is set to null for that iteration, matching what /aiperf/dev/api/search-history-api-reference describes.
This keeps the ask/tell loop consistent and lets the loop continue rather than aborting.
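The sentinel rule can be written out as a short sketch. This is one reading of the description above, not the real _failure_sentinel_vector: the margin is taken to be the larger of 10% of the worst prior value and 1.0, the function handles a single objective, and the value given to NO_DATA_SENTINEL_LOSS is a placeholder because the real constant is not quoted in this guide.

```python
# Placeholder magnitude (assumption); the real constant may differ.
NO_DATA_SENTINEL_LOSS = 1e9


def failure_sentinel(prior: list[float], direction: str) -> float:
    """Return a strictly-worse-than-seen value for one objective (sketch)."""
    if direction not in ("minimize", "maximize"):
        raise ValueError(f"unknown direction {direction!r}")
    if not prior:
        # No history yet: fall back to the fixed sentinel, signed so that
        # it is bad for the given direction.
        return NO_DATA_SENTINEL_LOSS if direction == "minimize" else -NO_DATA_SENTINEL_LOSS
    if direction == "minimize":
        worst = max(prior)  # larger is worse
        return worst + max(0.1 * abs(worst), 1.0)
    worst = min(prior)  # smaller is worse
    return worst - max(0.1 * abs(worst), 1.0)


print(failure_sentinel([200.0, 350.0], "minimize"))
print(failure_sentinel([5.0, 2.0], "maximize"))
```

The 1.0 floor matters when the worst prior value is near zero, where a pure 10% margin would barely separate the sentinel from real observations.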
Fix:
The fallback is a degraded mode, not a clean signal — investigate the failures rather than letting them accumulate:
Common causes: server timeouts, OOM at high concurrency, endpoint refusing streaming, metric-collection error. Tighten server availability or narrow the search-space bounds before re-running. See /aiperf/dev/api/search-history-api-reference for the search_history.json schema and how to filter sentinel iterations.
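A quick way to see how many iterations fell back to the sentinel is to look for null objective_values entries. The sketch below assumes the iterations are available as a list of JSON objects; only the objective_values field is taken from this guide, and the overall layout of search_history.json should be checked against the API reference before relying on it. failed_iterations is a hypothetical helper.

```python
import json


def failed_iterations(records: list[dict]) -> list[int]:
    """Return positions of iterations with no recorded objective (sketch)."""
    return [index for index, record in enumerate(records)
            if record.get("objective_values") is None]


# Inline stand-in for parsed history data; the real file layout may differ.
sample = json.loads(
    '[{"objective_values": [412.5]},'
    ' {"objective_values": null},'
    ' {"objective_values": [398.1]}]'
)
print(failed_iterations(sample))
```

A growing list of positions is a sign to investigate the underlying failures rather than let the search continue.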
--search-* + Magic-List Flag
Error message:
Cause:
Magic-list flags (--concurrency 10,20,30) are promoted to a top-level sweep: block by _promote_magic_lists_to_sweep_block in src/aiperf/config/flags/converter.py. The converter’s Pydantic validation of AdaptiveSearchSweep (declared with extra="forbid" in src/aiperf/config/sweep/config.py) then rejects the combination — BO chooses iterations adaptively from continuous ranges, while a magic-list expects you to enumerate the discrete points up front.
Fix:
See the “grid vs BO” decision matrix in /aiperf/dev/sweeping-adaptive-search/bayesian-optimization.
--search-* + Explicit sweep: YAML Block
Error message:
Cause:
Same guard as entry 7: AdaptiveSearchSweep’s extra="forbid" validator in src/aiperf/config/sweep/config.py rejects the merged dict. Triggered when an aiperf-config.yaml contains a top-level sweep: block AND the CLI invocation passes --search-* flags.
Fix:
Drop one or the other. If your config carries a leftover sweep: block from an earlier experiment, remove it before adding --search-*:
--search-* + --convergence-metric
Error message:
Raised as TypeError from _reject_search_plus_convergence in src/aiperf/config/flags/_converter_optionals.py when both --search-space (with its companion --search-* flags) and --convergence-metric are set on the same aiperf profile invocation.
Cause:
--convergence-metric is a trial-level adaptive stop (stop trials at a single benchmark point once the metric stabilizes); --search-* is an outer-loop adaptive search (choose the next benchmark point). The two are conceptually orthogonal but their composition is not yet well-defined: which value to report to the planner under early-stop, and whether to count convergence-stopped trials toward the per-iteration trial budget, both need explicit semantics.
Fix:
Pick one until composition is supported:
--search-initial-points >= --search-max-iterations
Error message:
Cause:
AdaptiveSearchSweep._check_initial_points_below_max_iterations in src/aiperf/config/sweep/config.py rejects the configuration. BO needs at least one iteration after the random Sobol-seeded initial points so the GP can fit and the sampler can propose informed points. Default for --search-initial-points is 5; --search-max-iterations has no default and is required whenever --search-space is set.
Fix:
Why this rule exists:
The Sobol-random phase exists to seed the GP with diverse points before it can fit a meaningful posterior. If the entire iteration budget is consumed by the random phase, the run is just expensive uniform sampling — there’s no BO-shaped value left to extract. The strict < ensures at least one GP-driven iteration runs.
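The budget rule is simple arithmetic, sketched below. split_iteration_budget is a hypothetical helper (not the AdaptiveSearchSweep validator) that applies the same strict less-than check and reports how many iterations remain for the model-driven phase.

```python
def split_iteration_budget(initial_points: int, max_iterations: int) -> tuple[int, int]:
    """Return (random-phase iterations, model-driven iterations) (sketch)."""
    if initial_points >= max_iterations:
        raise ValueError(
            f"initial points ({initial_points}) must be below "
            f"max iterations ({max_iterations})"
        )
    return initial_points, max_iterations - initial_points


# With the default of 5 initial points, 20 iterations leave 15 for BO.
print(split_iteration_budget(5, 20))
```

With the default of 5 initial points, any --search-max-iterations of 5 or less is rejected, and 6 leaves exactly one model-driven iteration.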
If you encounter an error not covered in this guide:
Check the error message carefully: Pydantic errors include the field path, the constraint that failed, and the offending input value.
Review the documentation:
Report a bug if:
Include in your bug report:
AIPerf version (aiperf --version)
search_history.json schema and how to inspect per-iteration objective values.