# Sweep Aggregate API Reference


Complete API documentation for parameter sweep aggregate outputs, including JSON schema, CSV format, and programmatic analysis examples.

## Overview

When running parameter sweeps with AIPerf (e.g., --concurrency 10,20,30), the system generates sweep aggregate files that summarize performance across all parameter combinations. These aggregates enable:

  • Comparison of performance across parameter combinations
  • Identification of optimal configurations
  • Pareto frontier analysis for multi-objective optimization
  • Statistical analysis with confidence intervals (when using --num-profile-runs > 1)

## Output Files

Sweep aggregates are written to different locations depending on the sweep mode:

Independent Mode (sweep-only, no --num-profile-runs):

```
artifacts/
  {benchmark_name}/
    sweep_aggregate/
      profile_export_aiperf_sweep.json   # Structured data for programmatic analysis
      profile_export_aiperf_sweep.csv    # Tabular format for spreadsheet analysis
```

Repeated Mode (sweep with --num-profile-runs > 1):

```
artifacts/
  {benchmark_name}/
    aggregate/
      concurrency_10/                    # Per-value confidence aggregates
        profile_export_aiperf_aggregate.json
        profile_export_aiperf_aggregate.csv
      concurrency_20/
        ...
    sweep_aggregate/                     # Cross-value sweep analysis
      profile_export_aiperf_sweep.json
      profile_export_aiperf_sweep.csv
```

The sweep aggregate files contain cross-value analysis including best configurations and Pareto optimal points.
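
Since the sweep files have fixed names in both modes, they can be located without hard-coding the layout. A minimal sketch, assuming a hypothetical benchmark name my_benchmark:

```python
from pathlib import Path

artifact_root = Path("artifacts/my_benchmark")  # hypothetical benchmark name

# profile_export_aiperf_sweep.json has the same name in both modes,
# so a recursive search finds it wherever sweep_aggregate/ lives.
sweep_json = next(artifact_root.rglob("profile_export_aiperf_sweep.json"))
print(f"Sweep aggregate: {sweep_json}")
```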


## JSON Schema

### Top-Level Structure

```json
{
  "aggregation_type": "sweep",
  "num_profile_runs": 12,
  "num_successful_runs": 12,
  "failed_runs": [],
  "metadata": { ... },
  "per_combination_metrics": [ ... ],
  "best_configurations": { ... },
  "pareto_optimal": [ ... ]
}
```

Top-Level Fields:

| Field | Type | Description |
|-------|------|-------------|
| `aggregation_type` | string | Always `"sweep"` for sweep aggregates |
| `num_profile_runs` | int | Total number of profile runs executed |
| `num_successful_runs` | int | Number of successful profile runs |
| `failed_runs` | array | List of failed runs with error details (empty if all succeeded) |
| `metadata` | object | Sweep configuration and execution metadata |
| `per_combination_metrics` | array | List of metrics for each parameter combination |
| `best_configurations` | object | Best parameter combinations for key metrics |
| `pareto_optimal` | array | List of Pareto optimal parameter combinations |
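
Because failed_runs is empty only when every run succeeded, it is worth checking before further analysis. A minimal sketch, assuming sweep_data is the parsed JSON (see Example 1 below); the per-entry structure is not documented here, so this only counts entries:

```python
# Warn if any profile runs failed before trusting the aggregates
failed = sweep_data.get("failed_runs", [])
if failed:
    print(f"Warning: {len(failed)} of {sweep_data['num_profile_runs']} runs failed")
```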

### Metadata Section

Contains information about the sweep configuration.

```json
{
  "metadata": {
    "sweep_parameters": [
      {
        "name": "concurrency",
        "values": [10, 20, 30, 40]
      }
    ],
    "num_combinations": 4
  }
}
```

Fields:

| Field | Type | Description |
|-------|------|-------------|
| `sweep_parameters` | array | List of parameter definitions (name and values) |
| `num_combinations` | int | Total number of parameter combinations tested |

Sweep Parameters Structure:

Each parameter definition contains:

  • name: Parameter name (e.g., "concurrency", "request_rate")
  • values: List of values tested for this parameter
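
If a simple lookup is more convenient than the list form, the definitions can be flattened into a mapping; a small sketch, assuming sweep_data is the parsed JSON (as in Example 1 below):

```python
# Map each swept parameter name to the list of values it took
param_values = {p["name"]: p["values"] for p in sweep_data["metadata"]["sweep_parameters"]}
# e.g., {"concurrency": [10, 20, 30, 40]}
```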

### Per-Combination Metrics Section

Contains aggregated metrics for each parameter combination. This is a list where each entry represents one combination.

```json
{
  "per_combination_metrics": [
    {
      "parameters": {
        "concurrency": 10
      },
      "metrics": {
        "request_throughput_avg": {
          "mean": 100.5,
          "std": 5.2,
          "min": 95.0,
          "max": 108.0,
          "cv": 0.052,
          "se": 2.3,
          "ci_low": 94.3,
          "ci_high": 106.7,
          "t_critical": 2.776,
          "unit": "requests/sec"
        },
        "ttft_p99_ms": {
          "mean": 120.5,
          "std": 8.1,
          "min": 110.2,
          "max": 132.8,
          "cv": 0.067,
          "se": 3.6,
          "ci_low": 111.5,
          "ci_high": 129.5,
          "t_critical": 2.776,
          "unit": "ms"
        }
      }
    },
    {
      "parameters": {
        "concurrency": 20
      },
      "metrics": { ... }
    }
  ]
}
```

Combination Entry Fields:

| Field | Type | Description |
|-------|------|-------------|
| `parameters` | object | Dictionary of parameter names to values for this combination |
| `metrics` | object | Dictionary of metric names to statistics |

Metric Statistics Fields:

| Field | Type | Description |
|-------|------|-------------|
| `mean` | float | Mean value across trials |
| `std` | float | Standard deviation across trials |
| `min` | float | Minimum value observed |
| `max` | float | Maximum value observed |
| `cv` | float | Coefficient of variation (std/mean) |
| `se` | float | Standard error of the mean |
| `ci_low` | float | Lower bound of the confidence interval |
| `ci_high` | float | Upper bound of the confidence interval |
| `t_critical` | float | Critical t-value used for the confidence interval |
| `unit` | string | Unit of measurement |

Note: For single-trial sweeps (--num-profile-runs 1), only mean and unit fields are present.
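
Analysis code that should work for both single-trial and repeated sweeps therefore needs to treat the statistical fields as optional. A small defensive sketch (metric_summary is a hypothetical helper, not part of the output):

```python
def metric_summary(stats: dict) -> str:
    """Format a metric entry; CI fields exist only for multi-trial sweeps."""
    mean, unit = stats["mean"], stats["unit"]
    if "ci_low" in stats and "ci_high" in stats:
        return f"{mean:.2f} {unit} (95% CI: {stats['ci_low']:.2f} to {stats['ci_high']:.2f})"
    return f"{mean:.2f} {unit}"  # single-trial sweep: only mean and unit
```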

### Best Configurations Section

Identifies the parameter combinations that achieved the best performance for key metrics.

```json
{
  "best_configurations": {
    "best_throughput": {
      "parameters": {
        "concurrency": 40
      },
      "metric": 350.2,
      "unit": "requests/sec"
    },
    "best_latency_p99": {
      "parameters": {
        "concurrency": 10
      },
      "metric": 120.5,
      "unit": "ms"
    }
  }
}
```

Configuration Fields:

| Field | Type | Description |
|-------|------|-------------|
| `parameters` | object | Parameter combination that achieved the best performance |
| `metric` | float | The metric value achieved |
| `unit` | string | Unit of measurement |

Available Configurations:

  • best_throughput: Highest request_throughput_avg
  • best_latency_p99: Lowest ttft_p99_ms (or request_latency_p99 as fallback)

### Pareto Optimal Section

Lists parameter combinations that are Pareto optimal: configurations where no other configuration is strictly better on all objectives simultaneously.

```json
{
  "pareto_optimal": [
    {"concurrency": 10},
    {"concurrency": 30},
    {"concurrency": 40}
  ]
}
```

Default Objectives:

  • Maximize: request_throughput_avg (throughput)
  • Minimize: ttft_p99_ms (latency)

A configuration is Pareto optimal if:

  • No other configuration has both higher throughput AND lower latency
  • It represents a valid trade-off point on the efficiency frontier
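
The frontier reported in pareto_optimal can be reproduced from per_combination_metrics with a straightforward dominance check. A minimal sketch under the default objectives (illustrative, not necessarily the exact implementation AIPerf uses):

```python
def pareto_frontier(combos: list[dict]) -> list[dict]:
    """Return parameter dicts not dominated on (max throughput, min TTFT p99)."""
    points = [
        (
            c["parameters"],
            c["metrics"]["request_throughput_avg"]["mean"],  # maximize
            c["metrics"]["ttft_p99_ms"]["mean"],             # minimize
        )
        for c in combos
    ]
    frontier = []
    for params, tp, lat in points:
        # Dominated: another point is at least as good on both objectives
        # and strictly better on at least one.
        dominated = any(
            o_tp >= tp and o_lat <= lat and (o_tp > tp or o_lat < lat)
            for _, o_tp, o_lat in points
        )
        if not dominated:
            frontier.append(params)
    return frontier

# pareto_frontier(sweep_data["per_combination_metrics"]) should match
# sweep_data["pareto_optimal"] up to ordering.
```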

Example Interpretation:

  • Concurrency 10: Low latency, moderate throughput (latency-optimized)
  • Concurrency 30: Balanced latency and throughput
  • Concurrency 40: High throughput, higher latency (throughput-optimized)

Multi-Parameter Sweeps:

For sweeps with multiple parameters (e.g., --concurrency 10,20 --request-rate 5,10), each Pareto optimal entry contains all parameter values:

```json
{
  "pareto_optimal": [
    {"concurrency": 10, "request_rate": 5},
    {"concurrency": 20, "request_rate": 10}
  ]
}
```

## CSV Format

The CSV export provides a tabular view optimized for spreadsheet analysis and plotting.

### Structure

The CSV file contains multiple sections separated by blank lines:

  1. Per-Combination Metrics Table (main data)
  2. Best Configurations
  3. Pareto Optimal Points
  4. Metadata

### Per-Combination Metrics Table

The first section is a wide-format table with one row per parameter combination:

```csv
concurrency,request_throughput_avg_mean,request_throughput_avg_std,request_throughput_avg_min,request_throughput_avg_max,request_throughput_avg_cv,ttft_p99_ms_mean,ttft_p99_ms_std,ttft_p99_ms_min,ttft_p99_ms_max,ttft_p99_ms_cv
10,100.50,5.20,95.00,108.00,0.0520,120.50,8.10,110.20,132.80,0.0672
20,180.30,8.50,170.00,195.00,0.0471,135.20,9.30,125.00,148.00,0.0688
30,270.80,12.10,255.00,290.00,0.0447,155.80,11.20,142.00,172.00,0.0719
40,285.50,15.30,265.00,310.00,0.0536,180.30,13.50,165.00,200.00,0.0749
```

Columns:

  • Parameter columns (e.g., concurrency, request_rate)
  • For each metric: {metric}_mean, {metric}_std, {metric}_min, {metric}_max, {metric}_cv

Multi-Parameter Example:

```csv
concurrency,request_rate,request_throughput_avg_mean,request_throughput_avg_std,...
10,5,50.25,2.10,...
10,10,95.30,4.50,...
20,5,98.40,3.20,...
20,10,185.60,7.80,...
```

### Best Configurations Section

```csv
Best Configurations
Configuration,concurrency,Metric,Unit
Best Throughput,40,285.50,requests/sec
Best Latency P99,10,120.50,ms
```

For multi-parameter sweeps:

```csv
Best Configurations
Configuration,concurrency,request_rate,Metric,Unit
Best Throughput,40,10,350.20,requests/sec
Best Latency P99,10,5,95.30,ms
```

### Pareto Optimal Section

```csv
Pareto Optimal Points
concurrency
10
30
40
```

For multi-parameter sweeps:

```csv
Pareto Optimal Points
concurrency,request_rate
10,5
20,10
40,10
```

### Metadata Section

```csv
Metadata
Field,Value
Aggregation Type,sweep
Sweep Parameters,concurrency
Number of Combinations,4
Number of Profile Runs,12
Number of Successful Runs,12
```
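
Because the file holds several blank-line-separated sections, it cannot be read with a single pd.read_csv call over the whole file. A sketch that splits the sections and loads the main table (the path is hypothetical; see the artifact layouts below):

```python
import io

import pandas as pd

csv_path = "artifacts/my_benchmark/sweep_aggregate/profile_export_aiperf_sweep.csv"

with open(csv_path) as f:
    # Sections are separated by blank lines; the first is the
    # per-combination metrics table.
    sections = f.read().split("\n\n")

main_table = pd.read_csv(io.StringIO(sections[0]))
print(main_table.head())
```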

## Artifact Directory Structure

### Repeated Mode (`--parameter-sweep-mode repeated`)

Default mode where the full sweep is executed N times:

```
artifacts/
  {benchmark_name}/
    profile_runs/
      trial_0001/
        concurrency_10/
          profile_export_aiperf.json
          profile_export.jsonl
        concurrency_20/
          profile_export_aiperf.json
          profile_export.jsonl
        concurrency_30/
          profile_export_aiperf.json
          profile_export.jsonl
      trial_0002/
        concurrency_10/
        concurrency_20/
        concurrency_30/
      trial_0003/
        concurrency_10/
        concurrency_20/
        concurrency_30/
    aggregate/
      concurrency_10/
        profile_export_aiperf_aggregate.json
        profile_export_aiperf_aggregate.csv
      concurrency_20/
        profile_export_aiperf_aggregate.json
        profile_export_aiperf_aggregate.csv
      concurrency_30/
        profile_export_aiperf_aggregate.json
        profile_export_aiperf_aggregate.csv
    sweep_aggregate/
      profile_export_aiperf_sweep.json
      profile_export_aiperf_sweep.csv
```

Execution Pattern:

```
Trial 1: [10 → 20 → 30]
Trial 2: [10 → 20 → 30]
Trial 3: [10 → 20 → 30]
```

### Independent Mode (`--parameter-sweep-mode independent`)

Runs all trials at each parameter value before moving to the next:

```
artifacts/
  {benchmark_name}/
    concurrency_10/
      profile_runs/
        trial_0001/
          profile_export_aiperf.json
          profile_export.jsonl
        trial_0002/
        trial_0003/
      aggregate/
        profile_export_aiperf_aggregate.json
        profile_export_aiperf_aggregate.csv
    concurrency_20/
      profile_runs/
        trial_0001/
        trial_0002/
        trial_0003/
      aggregate/
        profile_export_aiperf_aggregate.json
        profile_export_aiperf_aggregate.csv
    concurrency_30/
      profile_runs/
        trial_0001/
        trial_0002/
        trial_0003/
      aggregate/
        profile_export_aiperf_aggregate.json
        profile_export_aiperf_aggregate.csv
    sweep_aggregate/
      profile_export_aiperf_sweep.json
      profile_export_aiperf_sweep.csv
```

Execution Pattern:

```
Concurrency 10: [trial1, trial2, trial3]
Concurrency 20: [trial1, trial2, trial3]
Concurrency 30: [trial1, trial2, trial3]
```

### Single-Trial Sweep

When --num-profile-runs 1 (or omitted), no trial directories are created:

```
artifacts/
  {benchmark_name}/
    concurrency_10/
      profile_export_aiperf.json
      profile_export.jsonl
    concurrency_20/
      profile_export_aiperf.json
      profile_export.jsonl
    concurrency_30/
      profile_export_aiperf.json
      profile_export.jsonl
    sweep_aggregate/
      profile_export_aiperf_sweep.json
      profile_export_aiperf_sweep.csv
```

## Programmatic Analysis Examples

### Example 1: Load and Inspect Sweep Results

```python
import json
from pathlib import Path

# Load sweep aggregate
sweep_file = Path("artifacts/my_benchmark/sweep_aggregate/profile_export_aiperf_sweep.json")
with open(sweep_file) as f:
    sweep_data = json.load(f)

# Inspect metadata
metadata = sweep_data["metadata"]
sweep_params = metadata["sweep_parameters"]
print(f"Sweep parameters: {[p['name'] for p in sweep_params]}")
print(f"Total combinations: {metadata['num_combinations']}")
print(f"Total runs: {sweep_data['num_profile_runs']}")
```

### Example 2: Find Optimal Configuration

```python
# Get best configurations
best_configs = sweep_data["best_configurations"]

best_throughput = best_configs["best_throughput"]
print(f"Best throughput: {best_throughput['metric']:.2f} {best_throughput['unit']}")
print(f"  Parameters: {best_throughput['parameters']}")

best_latency = best_configs["best_latency_p99"]
print(f"Best latency: {best_latency['metric']:.2f} {best_latency['unit']}")
print(f"  Parameters: {best_latency['parameters']}")
```

### Example 3: Analyze Pareto Frontier

```python
# Get Pareto optimal points
pareto_optimal = sweep_data["pareto_optimal"]
print(f"Found {len(pareto_optimal)} Pareto optimal configurations")

# Extract metrics for Pareto points
per_combination_metrics = sweep_data["per_combination_metrics"]

print("\nPareto Frontier:")
for combo in per_combination_metrics:
    params = combo["parameters"]
    # Check if this combination is Pareto optimal
    if params in pareto_optimal:
        metrics = combo["metrics"]
        throughput = metrics["request_throughput_avg"]["mean"]
        latency = metrics["ttft_p99_ms"]["mean"]
        print(f"  {params}: {throughput:.1f} req/s, {latency:.1f} ms p99")
```

### Example 4: Compare Confidence Intervals

```python
import matplotlib.pyplot as plt

# Extract data for a single-parameter sweep
combinations = sweep_data["per_combination_metrics"]

# Assuming a single parameter (e.g., concurrency); sort by its value
# so the plotted line is ordered along the x-axis
param_name = sweep_data["metadata"]["sweep_parameters"][0]["name"]
combinations = sorted(combinations, key=lambda c: c["parameters"][param_name])

param_values = []
throughputs = []
ci_lows = []
ci_highs = []

for combo in combinations:
    param_value = combo["parameters"][param_name]
    tp = combo["metrics"]["request_throughput_avg"]

    param_values.append(param_value)
    throughputs.append(tp["mean"])
    # Fall back to the mean when CI fields are absent (single-trial sweeps)
    ci_lows.append(tp.get("ci_low", tp["mean"]))
    ci_highs.append(tp.get("ci_high", tp["mean"]))

# Plot with confidence intervals
plt.figure(figsize=(10, 6))
plt.plot(param_values, throughputs, 'o-', label='Mean Throughput')
plt.fill_between(param_values, ci_lows, ci_highs, alpha=0.3, label='95% CI')
plt.xlabel(param_name.title())
plt.ylabel('Throughput (requests/sec)')
plt.title(f'Throughput vs {param_name.title()}')
plt.legend()
plt.grid(True)
plt.savefig('throughput_sweep.png')
```

### Example 5: Export to Pandas DataFrame

```python
import pandas as pd

# Convert per-combination metrics to a DataFrame
rows = []
for combo in sweep_data["per_combination_metrics"]:
    row = combo["parameters"].copy()

    # Add metrics
    for metric_name, metric_data in combo["metrics"].items():
        if isinstance(metric_data, dict):
            row[f"{metric_name}_mean"] = metric_data.get("mean")
            row[f"{metric_name}_std"] = metric_data.get("std")
            row[f"{metric_name}_cv"] = metric_data.get("cv")
        else:
            row[metric_name] = metric_data
    rows.append(row)

df = pd.DataFrame(rows)

# Sort by parameter values
param_names = [p["name"] for p in sweep_data["metadata"]["sweep_parameters"]]
df = df.sort_values(param_names)

# Analyze
print(df[[*param_names, "request_throughput_avg_mean", "ttft_p99_ms_mean"]])

# Export
df.to_csv("sweep_analysis.csv", index=False)
```

### Example 6: Multi-Parameter Sweep Analysis

```python
# For sweeps with multiple parameters
sweep_params = sweep_data["metadata"]["sweep_parameters"]
param_names = [p["name"] for p in sweep_params]

print(f"Multi-parameter sweep: {', '.join(param_names)}")

# Find the best combination for each parameter individually
for param_name in param_names:
    # Group combinations by this parameter's value
    param_groups = {}
    for combo in sweep_data["per_combination_metrics"]:
        param_value = combo["parameters"][param_name]
        if param_value not in param_groups:
            param_groups[param_value] = []
        param_groups[param_value].append(combo)

    # Find the best throughput for each value of this parameter
    print(f"\nBest throughput for each {param_name}:")
    for value, combos in sorted(param_groups.items()):
        best_combo = max(combos,
                         key=lambda c: c["metrics"]["request_throughput_avg"]["mean"])
        throughput = best_combo["metrics"]["request_throughput_avg"]["mean"]
        print(f"  {param_name}={value}: {throughput:.1f} req/s")
        print(f"    Full config: {best_combo['parameters']}")
```

### Example 7: Identify Diminishing Returns

```python
# For single-parameter sweeps, calculate efficiency
# (throughput per unit of the swept parameter)
combinations = sweep_data["per_combination_metrics"]
param_name = sweep_data["metadata"]["sweep_parameters"][0]["name"]

# Sort by parameter value
combinations_sorted = sorted(combinations,
                             key=lambda c: c["parameters"][param_name])

efficiencies = []
for combo in combinations_sorted:
    param_value = combo["parameters"][param_name]
    throughput = combo["metrics"]["request_throughput_avg"]["mean"]
    efficiency = throughput / param_value
    efficiencies.append((param_value, efficiency))

# Find the point of diminishing returns (where efficiency drops significantly)
threshold = 0.8  # flag a >20% drop between consecutive values
for i in range(1, len(efficiencies)):
    if efficiencies[i][1] < threshold * efficiencies[i - 1][1]:
        print(f"Diminishing returns detected at {param_name}={efficiencies[i][0]}")
        print(f"  Efficiency dropped from {efficiencies[i-1][1]:.2f} to {efficiencies[i][1]:.2f}")
        break
```

### Example 8: Multi-Objective Decision Making

```python
# Score configurations based on weighted objectives
weights = {
    "throughput": 0.6,  # 60% weight on throughput
    "latency": 0.4,     # 40% weight on latency
}

# Extract all throughputs and latencies
combinations = sweep_data["per_combination_metrics"]
throughputs = [c["metrics"]["request_throughput_avg"]["mean"] for c in combinations]
latencies = [c["metrics"]["ttft_p99_ms"]["mean"] for c in combinations]

max_tp = max(throughputs)
min_lat = min(latencies)
max_lat = max(latencies)

scores = []
for combo in combinations:
    tp = combo["metrics"]["request_throughput_avg"]["mean"]
    lat = combo["metrics"]["ttft_p99_ms"]["mean"]

    # Normalize so that higher is better for both objectives
    tp_score = tp / max_tp
    lat_score = 1 - (lat - min_lat) / (max_lat - min_lat) if max_lat > min_lat else 1.0

    # Weighted combination
    score = weights["throughput"] * tp_score + weights["latency"] * lat_score
    scores.append((combo["parameters"], score))

# Find the best configuration
best_params, best_score = max(scores, key=lambda x: x[1])
print(f"Best configuration for given weights: {best_params}")
print(f"  Score: {best_score:.3f}")
```

## See Also