Copyright © 2026, NVIDIA Corporation.

Sweep Aggregate API Reference

Complete API documentation for parameter sweep aggregate outputs, including JSON schema, CSV format, and programmatic analysis examples.

Overview

When you run a parameter sweep with AIPerf (e.g., --concurrency 10,20,30), it generates sweep aggregate files that summarize performance across all parameter combinations. These aggregates enable:

  • Comparison of performance across parameter combinations
  • Identification of optimal configurations
  • Pareto frontier analysis for multi-objective optimization
  • Statistical analysis with confidence intervals (when using --num-profile-runs > 1)

Output Files

Sweep aggregates are written to different locations depending on the sweep mode:

Sweep-only (no --num-profile-runs):

    artifacts/
      {benchmark_name}/
        sweep_aggregate/
          profile_export_aiperf_sweep.json   # Structured data for programmatic analysis
          profile_export_aiperf_sweep.csv    # Tabular format for spreadsheet analysis

Independent Mode (sweep + --num-profile-runs > 1 + --parameter-sweep-mode independent):

    artifacts/
      {benchmark_name}/
        concurrency_10/aggregate/            # Per-value confidence aggregates
          profile_export_aiperf_aggregate.json
          profile_export_aiperf_aggregate.csv
        concurrency_20/aggregate/
          ...
        sweep_aggregate/                     # Cross-value sweep analysis
          profile_export_aiperf_sweep.json
          profile_export_aiperf_sweep.csv

Repeated Mode (sweep + --num-profile-runs > 1, default mode):

    artifacts/
      {benchmark_name}/
        aggregate/
          concurrency_10/                    # Per-value confidence aggregates
            profile_export_aiperf_aggregate.json
            profile_export_aiperf_aggregate.csv
          concurrency_20/
            ...
          sweep_aggregate/                   # Cross-value sweep analysis
            profile_export_aiperf_sweep.json
            profile_export_aiperf_sweep.csv

See Artifact Directory Layout Reference below for the full table of layout cases.

The sweep aggregate files contain cross-value analysis including best configurations and Pareto optimal points.


JSON Schema

Top-Level Structure

    {
      "aggregation_type": "sweep",
      "num_profile_runs": 12,
      "num_successful_runs": 12,
      "failed_runs": [],
      "metadata": { ... },
      "per_combination_metrics": [ ... ],
      "best_configurations": { ... },
      "pareto_optimal": [ ... ]
    }

Top-Level Fields:

| Field | Type | Description |
|---|---|---|
| aggregation_type | string | Always "sweep" for sweep aggregates |
| num_profile_runs | int | Total number of profile runs executed |
| num_successful_runs | int | Number of successful profile runs |
| failed_runs | array | List of failed runs with error details (empty if all succeeded) |
| metadata | object | Sweep configuration and execution metadata |
| per_combination_metrics | array | List of metrics for each parameter combination |
| best_configurations | object | Best parameter combinations for key metrics |
| pareto_optimal | array | List of Pareto optimal parameter combinations |
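
As a quick sanity check on the top-level fields, a consumer might confirm the aggregation type and surface any failed runs before reading further. This is a minimal sketch: the helper name summarize_runs is hypothetical, and because the shape of each failed_runs entry is not specified here, the entries are passed through untouched.

```python
def summarize_runs(sweep_data: dict) -> dict:
    """Return run counts from a sweep aggregate, raising if it is not one."""
    if sweep_data.get("aggregation_type") != "sweep":
        raise ValueError("not a sweep aggregate")
    failed = sweep_data.get("failed_runs", [])
    return {
        "total": sweep_data["num_profile_runs"],
        "successful": sweep_data["num_successful_runs"],
        "num_failed_entries": len(failed),
        "failed_runs": failed,  # entry shape is not documented here; kept as-is
    }
```

Checking aggregation_type first guards against accidentally loading a per-variation confidence aggregate, which uses a different file name but a similar layout.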

Metadata Section

Contains information about the sweep configuration.

    {
      "metadata": {
        "sweep_parameters": [
          {
            "name": "concurrency",
            "values": [10, 20, 30, 40]
          }
        ],
        "num_combinations": 4
      }
    }

Fields:

| Field | Type | Description |
|---|---|---|
| sweep_parameters | array | List of parameter definitions (name and values) |
| num_combinations | int | Total number of parameter combinations tested |
| aggregation_type | string | Always "sweep" (duplicated from the top-level field so consumers that key off output["metadata"]["aggregation_type"] work without first checking the top-level key) |
| sla_constraints | object | Present only when plan.sweep.sla_filters is non-empty. Contains active_filters (list of filter dicts), feasible_count (int), and infeasible_count (int). See src/aiperf/orchestrator/aggregation/sweep_sla_filter.py for the filter shape. |

Note: For QMC sweeps, sampling_design.json is written to <base>/sweep_aggregate/sampling_design.json in single-trial and independent modes. In repeated multi-run mode the sweep aggregate can live under <base>/aggregate/sweep_aggregate/, so the sampling design is not necessarily a sibling of the repeated-mode aggregate directory.

Sweep Parameters Structure:

Each parameter definition contains:

  • name: Parameter name (e.g., "concurrency", "request_rate")
  • values: List of values tested for this parameter
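
Because sla_constraints is optional, code that reads the metadata block should treat it as possibly absent. The following sketch shows one way to do that; describe_sweep is a hypothetical helper name, and only the fields listed above are read.

```python
def describe_sweep(metadata: dict) -> str:
    """Build a one-line summary of the sweep from its metadata block."""
    names = [p["name"] for p in metadata["sweep_parameters"]]
    line = f"{metadata['num_combinations']} combinations over {', '.join(names)}"
    sla = metadata.get("sla_constraints")  # absent when no SLA filters are configured
    if sla is not None:
        line += f" ({sla['feasible_count']} feasible, {sla['infeasible_count']} infeasible)"
    return line
```

Using dict.get rather than direct indexing keeps the same code path working for sweeps with and without SLA filters.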

Per-Combination Metrics Section

Contains aggregated metrics for each parameter combination. This is a list where each entry represents one combination.

    {
      "per_combination_metrics": [
        {
          "parameters": {
            "concurrency": 10
          },
          "metrics": {
            "request_throughput_avg": {
              "mean": 100.5,
              "std": 5.2,
              "min": 95.0,
              "max": 108.0,
              "cv": 0.052,
              "ci_low": 94.3,
              "ci_high": 106.7,
              "unit": "requests/sec"
            },
            "time_to_first_token_p99": {
              "mean": 120.5,
              "std": 8.1,
              "min": 110.2,
              "max": 132.8,
              "cv": 0.067,
              "ci_low": 111.5,
              "ci_high": 129.5,
              "unit": "ms"
            }
          }
        },
        {
          "parameters": {
            "concurrency": 20
          },
          "metrics": { ... }
        }
      ]
    }

Combination Entry Fields:

| Field | Type | Description |
|---|---|---|
| parameters | object | Dictionary of parameter names to values for this combination |
| metrics | object | Dictionary of metric names to statistics |

Metric Statistics Fields:

| Field | Type | Description |
|---|---|---|
| mean | float | Mean value across trials |
| std | float | Standard deviation across trials |
| min | float | Minimum value observed |
| max | float | Maximum value observed |
| cv | float | Coefficient of variation (std/mean) |
| ci_low | float | Lower bound of confidence interval |
| ci_high | float | Upper bound of confidence interval |
| unit | string | Unit of measurement |

Note: Fields se (standard error) and t_critical (critical t-value) exist on the underlying ConfidenceMetric dataclass and are emitted by the per-variation confidence aggregate (profile_export_aiperf_aggregate.json), but the sweep aggregate’s per-combination block strips them.

Note: For single-trial sweeps (--num-profile-runs 1 or omitted), the per-combination metric block still emits the full field set, but the spread fields collapse to degenerate values: std=0, cv=0, ci_low=ci_high=mean. The single-trial projection also emits an avg alias of mean and passes through every populated percentile field (p1, p5, p10, p25, p50, p75, p90, p95, p99) directly from the underlying JsonMetricResult.
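
Since the spread fields collapse for single-trial sweeps, analysis code may want to detect that case before plotting error bars. The sketch below is a heuristic under that description, not part of AIPerf: both helper names are hypothetical, and a multi-trial sweep whose trials happened to produce identical values would look the same.

```python
import math


def looks_single_trial(stat: dict, tol: float = 1e-9) -> bool:
    """True when std is 0 and the confidence interval has collapsed onto the mean."""
    mean = stat["mean"]
    return (
        math.isclose(stat.get("std", 0.0), 0.0, abs_tol=tol)
        and math.isclose(stat.get("ci_low", mean), mean, abs_tol=tol)
        and math.isclose(stat.get("ci_high", mean), mean, abs_tol=tol)
    )


def ci_half_width(stat: dict) -> float:
    """Half the distance between ci_high and ci_low."""
    return (stat["ci_high"] - stat["ci_low"]) / 2
```

For the request_throughput_avg entry shown earlier (ci_low 94.3, ci_high 106.7), the half width works out to about 6.2 requests/sec.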

Best Configurations Section

Identifies the parameter combinations that achieved the best performance for key metrics.

    {
      "best_configurations": {
        "best_throughput": {
          "parameters": {
            "concurrency": 40
          },
          "metric": 350.2,
          "unit": "requests/sec"
        },
        "best_latency_p99": {
          "parameters": {
            "concurrency": 10
          },
          "metric": 120.5,
          "unit": "ms"
        }
      }
    }

Configuration Fields:

| Field | Type | Description |
|---|---|---|
| parameters | object | Parameter combination that achieved best performance |
| metric | float | The metric value achieved |
| unit | string | Unit of measurement |

Available Configurations:

  • best_throughput: Highest request_throughput_avg
  • best_latency_p99: Lowest time_to_first_token_p99 (or request_latency_p99 as fallback)
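
The selection rule above can be reproduced from per_combination_metrics, which is handy for cross-checking or for ranking by a different metric. This is a sketch rather than AIPerf's own logic: pick_best is a hypothetical name, it assumes the reported "metric" is the per-combination mean, and it assumes the request_latency_p99 fallback applies only when no combination reports time_to_first_token_p99.

```python
def pick_best(per_combination_metrics: list) -> dict:
    """Recompute best-throughput and best-latency entries from per-combination means."""

    def entry(combo: dict, metric_name: str) -> dict:
        stat = combo["metrics"][metric_name]
        return {
            "parameters": combo["parameters"],
            "metric": stat["mean"],  # assumption: "metric" is the mean across trials
            "unit": stat.get("unit"),
        }

    # Assumption: fall back only when TTFT p99 is missing from every combination.
    has_ttft = any(
        "time_to_first_token_p99" in c["metrics"] for c in per_combination_metrics
    )
    latency_name = "time_to_first_token_p99" if has_ttft else "request_latency_p99"

    best_tp = max(
        per_combination_metrics,
        key=lambda c: c["metrics"]["request_throughput_avg"]["mean"],
    )
    best_lat = min(
        (c for c in per_combination_metrics if latency_name in c["metrics"]),
        key=lambda c: c["metrics"][latency_name]["mean"],
    )
    return {
        "best_throughput": entry(best_tp, "request_throughput_avg"),
        "best_latency_p99": entry(best_lat, latency_name),
    }
```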

Pareto Optimal Section

Lists the Pareto optimal parameter combinations: configurations for which no other configuration is strictly better on all objectives simultaneously.

    {
      "pareto_optimal": [
        {"concurrency": 10},
        {"concurrency": 30},
        {"concurrency": 40}
      ]
    }

Default Objectives:

  • Maximize: request_throughput_avg (throughput)
  • Minimize: time_to_first_token_p99 (latency)

A configuration is Pareto optimal if:

  • No other configuration has both higher throughput AND lower latency
  • It represents a valid trade-off point on the efficiency frontier
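
The dominance rule stated above can be written as a small filter. This is a sketch of that rule for the two default objectives, not the code AIPerf runs: pareto_frontier is a hypothetical name, the input is a plain list of (parameters, throughput, latency) tuples, and the sample values are made up for illustration.

```python
def pareto_frontier(points: list) -> list:
    """Keep the parameters of every point that no other point beats on both objectives.

    Each point is (parameters, throughput, latency); throughput is maximized
    and latency is minimized.
    """
    frontier = []
    for params, tp, lat in points:
        dominated = any(
            other_tp > tp and other_lat < lat for _, other_tp, other_lat in points
        )
        if not dominated:
            frontier.append(params)
    return frontier


# Hypothetical data: the second point is slower AND has lower throughput than the first.
sample = [
    ({"concurrency": 1}, 100.0, 120.0),
    ({"concurrency": 2}, 90.0, 130.0),
    ({"concurrency": 3}, 200.0, 150.0),
]
frontier = pareto_frontier(sample)  # [{"concurrency": 1}, {"concurrency": 3}]
```

The pairwise comparison is quadratic in the number of combinations, which is fine for typical sweep sizes.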

Example Interpretation:

Concurrency 10: Low latency, moderate throughput (latency-optimized)
Concurrency 30: Balanced latency and throughput
Concurrency 40: High throughput, higher latency (throughput-optimized)

Multi-Parameter Sweeps:

For sweeps with multiple parameters (e.g., --concurrency 10,20 --request-rate 5,10), each Pareto optimal entry contains all parameter values:

    {
      "pareto_optimal": [
        {"concurrency": 10, "request_rate": 5},
        {"concurrency": 20, "request_rate": 10}
      ]
    }

CSV Format

The CSV export provides a tabular view optimized for spreadsheet analysis and plotting.

Structure

The CSV file contains multiple sections separated by blank lines:

  1. Per-Combination Metrics Table (main data)
  2. Best Configurations
  3. Pareto Optimal Points
  4. Metadata

Per-Combination Metrics Table

The first section is a wide-format table with one row per parameter combination:

    concurrency,request_throughput_avg_mean,request_throughput_avg_std,request_throughput_avg_min,request_throughput_avg_max,request_throughput_avg_cv,time_to_first_token_p99_mean,time_to_first_token_p99_std,time_to_first_token_p99_min,time_to_first_token_p99_max,time_to_first_token_p99_cv
    10,100.50,5.20,95.00,108.00,0.0520,120.50,8.10,110.20,132.80,0.0672
    20,180.30,8.50,170.00,195.00,0.0471,135.20,9.30,125.00,148.00,0.0688
    30,270.80,12.10,255.00,290.00,0.0447,155.80,11.20,142.00,172.00,0.0719
    40,285.50,15.30,265.00,310.00,0.0536,180.30,13.50,165.00,200.00,0.0749

Columns:

  • Parameter columns (e.g., concurrency, request_rate)
  • For each metric: {metric}_mean, {metric}_std, {metric}_min, {metric}_max, {metric}_cv

Multi-Parameter Example:

    concurrency,request_rate,request_throughput_avg_mean,request_throughput_avg_std,...
    10,5,50.25,2.10,...
    10,10,95.30,4.50,...
    20,5,98.40,3.20,...
    20,10,185.60,7.80,...

Best Configurations Section

    Best Configurations
    Configuration,concurrency,Metric,Unit
    Best Throughput,40,285.50,requests/sec
    Best Latency P99,10,120.50,ms

For multi-parameter sweeps:

    Best Configurations
    Configuration,concurrency,request_rate,Metric,Unit
    Best Throughput,40,10,350.20,requests/sec
    Best Latency P99,10,5,95.30,ms

Pareto Optimal Section

    Pareto Optimal Points
    concurrency
    10
    30
    40

For multi-parameter sweeps:

    Pareto Optimal Points
    concurrency,request_rate
    10,5
    20,10
    40,10

Empty frontier: When no frontier can be computed (a required objective metric is missing from the per-combination block, or every cell was filtered out by SLA constraints), the section renders a single literal None row beneath the Pareto Optimal Points header instead of the parameter-name header + rows.

Metadata Section

    Metadata
    Field,Value
    Aggregation Type,sweep
    Sweep Parameters,concurrency
    Number of Combinations,4
    Number of Profile Runs,12
    Number of Successful Runs,12
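
Because the file holds several blank-line-separated sections rather than one rectangular table, generic CSV loaders will not read it in one pass. One possible approach, sketched here with the standard csv module, is to split on blank lines first. The function names are hypothetical, all values are left as strings, and the section titles are taken from the examples above.

```python
import csv
import io


def split_sections(text: str) -> list:
    """Split the sweep CSV into sections at blank lines; each section is a list of rows."""
    sections, current = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row or all(not cell.strip() for cell in row):
            # A blank line closes the current section.
            if current:
                sections.append(current)
                current = []
        else:
            current.append(row)
    if current:
        sections.append(current)
    return sections


def parse_sweep_csv(text: str) -> dict:
    """Return the metrics table as dicts plus the raw rows of each titled section."""
    sections = split_sections(text)
    header, *rows = sections[0]  # first section is the per-combination table
    result = {"per_combination": [dict(zip(header, r)) for r in rows]}
    for section in sections[1:]:
        title = section[0][0]  # e.g. "Best Configurations", "Metadata"
        result[title] = section[1:]
    return result
```

With this layout, an empty frontier shows up as result["Pareto Optimal Points"] == [["None"]], so callers can test for the literal None row before treating the first row as a header.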

Artifact Directory Structure

Artifact Directory Layout Reference

The artifact tree branches on three flags: whether a sweep is configured (is_sweep), whether multiple trials run per cell (trials > 1), and the sweep iteration order (REPEATED vs INDEPENDENT).

| sweep | trials | order | layout |
|---|---|---|---|
| no | 1 | - | <base>/ |
| no | >1 | - | <base>/profile_runs/run_NNNN/ |
| yes | 1 | - | <base>/<dir_name>/ |
| yes | >1 | REPEATED | <base>/profile_runs/trial_NNNN/<dir_name>/ |
| yes | >1 | INDEPENDENT | <base>/<dir_name>/profile_runs/trial_NNNN/ |
| adaptive | any | - | <base>/search_iter_NNNN/profile_runs/run_NNNN/ |

<dir_name> is the {leaf_param_name}_{value} form (e.g. concurrency_10, request_rate_5.0); multi-dim sweep cells join components with __ (e.g. concurrency_10__isl_512). Inner-dir naming is asymmetric on purpose: the no-sweep multi-run and adaptive cases use run_NNNN, while the multi-trial sweep cases (REPEATED and INDEPENDENT) use trial_NNNN.
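
To locate a given run on disk, the naming rule and the non-adaptive rows of the layout table can be expressed as two small helpers. This is a sketch under stated assumptions: both function names are hypothetical, the dictionary keys are assumed to already be leaf parameter names, components are joined in dictionary order, and NNNN is taken to be a 1-based, zero-padded four-digit index as in trial_0001.

```python
from pathlib import Path
from typing import Optional


def cell_dir_name(parameters: dict) -> str:
    """Join {name}_{value} components with a double underscore."""
    return "__".join(f"{name}_{value}" for name, value in parameters.items())


def run_dir(
    base,
    dir_name: Optional[str] = None,
    trial: Optional[int] = None,
    order: Optional[str] = None,
) -> Path:
    """Directory of one run; dir_name=None means no sweep, trial=None means one trial."""
    base = Path(base)
    if dir_name is None:
        return base if trial is None else base / "profile_runs" / f"run_{trial:04d}"
    if trial is None:
        return base / dir_name
    if order == "repeated":
        return base / "profile_runs" / f"trial_{trial:04d}" / dir_name
    if order == "independent":
        return base / dir_name / "profile_runs" / f"trial_{trial:04d}"
    raise ValueError(f"unknown order: {order!r}")
```

The adaptive row is left out because its search_iter_NNNN numbering depends on the search history rather than on the sweep parameters.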

The sweep-level aggregate path follows a parallel rule:

  • REPEATED + multi-run: <base>/aggregate/sweep_aggregate/
  • everything else (sweep-only, sweep + INDEPENDENT): <base>/sweep_aggregate/

Per-variation aggregates land at <base>/aggregate/<dir_name>/ in REPEATED mode and <base>/<dir_name>/aggregate/ in INDEPENDENT mode.
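
Scripts that must work across modes can encode these path rules instead of hard-coding one location. The helpers below are a sketch with hypothetical names; find_sweep_json simply probes both documented locations and returns the first that exists.

```python
from pathlib import Path
from typing import Optional


def sweep_aggregate_dir(base, repeated_multi_run: bool) -> Path:
    """Sweep-level aggregate directory for the given mode."""
    base = Path(base)
    if repeated_multi_run:
        return base / "aggregate" / "sweep_aggregate"
    return base / "sweep_aggregate"


def variation_aggregate_dir(base, dir_name: str, mode: str) -> Path:
    """Per-variation aggregate directory in repeated or independent mode."""
    base = Path(base)
    if mode == "repeated":
        return base / "aggregate" / dir_name
    if mode == "independent":
        return base / dir_name / "aggregate"
    raise ValueError(f"unknown mode: {mode!r}")


def find_sweep_json(base) -> Optional[Path]:
    """Return the sweep aggregate JSON under base, or None if neither location has it."""
    for repeated in (False, True):
        candidate = sweep_aggregate_dir(base, repeated) / "profile_export_aiperf_sweep.json"
        if candidate.is_file():
            return candidate
    return None
```

Probing both locations avoids having to know which sweep mode produced a given artifact directory.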

Repeated Mode (--parameter-sweep-mode repeated)

The default mode, in which the full sweep is executed N times:

    artifacts/
      {benchmark_name}/
        profile_runs/
          trial_0001/
            concurrency_10/
              profile_export_aiperf.json
              profile_export.jsonl
            concurrency_20/
              profile_export_aiperf.json
              profile_export.jsonl
            concurrency_30/
              profile_export_aiperf.json
              profile_export.jsonl
          trial_0002/
            concurrency_10/
            concurrency_20/
            concurrency_30/
          trial_0003/
            concurrency_10/
            concurrency_20/
            concurrency_30/
        aggregate/
          concurrency_10/
            profile_export_aiperf_aggregate.json
            profile_export_aiperf_aggregate.csv
          concurrency_20/
            profile_export_aiperf_aggregate.json
            profile_export_aiperf_aggregate.csv
          concurrency_30/
            profile_export_aiperf_aggregate.json
            profile_export_aiperf_aggregate.csv
          sweep_aggregate/
            profile_export_aiperf_sweep.json
            profile_export_aiperf_sweep.csv

Execution Pattern:

Trial 1: [10 → 20 → 30]
Trial 2: [10 → 20 → 30]
Trial 3: [10 → 20 → 30]

Independent Mode (--parameter-sweep-mode independent)

Runs all trials at each parameter value before moving to the next:

    artifacts/
      {benchmark_name}/
        concurrency_10/
          profile_runs/
            trial_0001/
              profile_export_aiperf.json
              profile_export.jsonl
            trial_0002/
            trial_0003/
          aggregate/
            profile_export_aiperf_aggregate.json
            profile_export_aiperf_aggregate.csv
        concurrency_20/
          profile_runs/
            trial_0001/
            trial_0002/
            trial_0003/
          aggregate/
            profile_export_aiperf_aggregate.json
            profile_export_aiperf_aggregate.csv
        concurrency_30/
          profile_runs/
            trial_0001/
            trial_0002/
            trial_0003/
          aggregate/
            profile_export_aiperf_aggregate.json
            profile_export_aiperf_aggregate.csv
        sweep_aggregate/
          profile_export_aiperf_sweep.json
          profile_export_aiperf_sweep.csv

Execution Pattern:

Concurrency 10: [trial1, trial2, trial3]
Concurrency 20: [trial1, trial2, trial3]
Concurrency 30: [trial1, trial2, trial3]
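
The repeated and independent execution patterns differ only in which loop is outermost. The sketch below makes that explicit; execution_order is a hypothetical helper that returns (trial, value) pairs in run order and is not how AIPerf schedules runs internally.

```python
def execution_order(values: list, num_trials: int, mode: str) -> list:
    """Return (trial, value) pairs in the order the runs would execute."""
    trials = range(1, num_trials + 1)
    if mode == "repeated":
        # Outer loop over trials: the full sweep runs once per trial.
        return [(trial, value) for trial in trials for value in values]
    if mode == "independent":
        # Outer loop over values: all trials finish before the next value starts.
        return [(trial, value) for value in values for trial in trials]
    raise ValueError(f"unknown mode: {mode!r}")
```

For values [10, 20] and two trials, repeated mode yields (1, 10), (1, 20), (2, 10), (2, 20), while independent mode yields (1, 10), (2, 10), (1, 20), (2, 20).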

Single-Trial Sweep

When --num-profile-runs is 1 (or omitted), no trial directories are created:

    artifacts/
      {benchmark_name}/
        concurrency_10/
          profile_export_aiperf.json
          profile_export.jsonl
        concurrency_20/
          profile_export_aiperf.json
          profile_export.jsonl
        concurrency_30/
          profile_export_aiperf.json
          profile_export.jsonl
        sweep_aggregate/
          profile_export_aiperf_sweep.json
          profile_export_aiperf_sweep.csv

Programmatic Analysis Examples

Example 1: Load and Inspect Sweep Results

    import json
    from pathlib import Path

    # Load sweep aggregate.
    # In repeated multi-run mode the file lives under aggregate/sweep_aggregate/ instead.
    sweep_file = Path("artifacts/my_benchmark/sweep_aggregate/profile_export_aiperf_sweep.json")
    with open(sweep_file) as f:
        sweep_data = json.load(f)

    # Inspect metadata
    metadata = sweep_data["metadata"]
    sweep_params = metadata["sweep_parameters"]
    print(f"Sweep parameters: {[p['name'] for p in sweep_params]}")
    print(f"Total combinations: {metadata['num_combinations']}")
    print(f"Total runs: {sweep_data['num_profile_runs']}")

Example 2: Find Optimal Configuration

    # Get best configurations
    best_configs = sweep_data["best_configurations"]

    best_throughput = best_configs["best_throughput"]
    print(f"Best throughput: {best_throughput['metric']:.2f} {best_throughput['unit']}")
    print(f" Parameters: {best_throughput['parameters']}")

    best_latency = best_configs["best_latency_p99"]
    print(f"Best latency: {best_latency['metric']:.2f} {best_latency['unit']}")
    print(f" Parameters: {best_latency['parameters']}")

Example 3: Analyze Pareto Frontier

    # Get Pareto optimal points
    pareto_optimal = sweep_data["pareto_optimal"]
    print(f"Found {len(pareto_optimal)} Pareto optimal configurations")

    # Extract metrics for Pareto points
    per_combination_metrics = sweep_data["per_combination_metrics"]

    print("\nPareto Frontier:")
    for combo in per_combination_metrics:
        params = combo["parameters"]
        # Check if this combination is Pareto optimal
        if params in pareto_optimal:
            metrics = combo["metrics"]
            throughput = metrics["request_throughput_avg"]["mean"]
            latency = metrics["time_to_first_token_p99"]["mean"]
            print(f" {params}: {throughput:.1f} req/s, {latency:.1f} ms p99")

Example 4: Compare Confidence Intervals

    import matplotlib.pyplot as plt

    # Extract data for single-parameter sweep
    combinations = sweep_data["per_combination_metrics"]

    # Assuming single parameter (concurrency)
    param_name = sweep_data["metadata"]["sweep_parameters"][0]["name"]
    param_values = []
    throughputs = []
    ci_lows = []
    ci_highs = []

    for combo in combinations:
        param_value = combo["parameters"][param_name]
        tp = combo["metrics"]["request_throughput_avg"]

        param_values.append(param_value)
        throughputs.append(tp["mean"])
        ci_lows.append(tp.get("ci_low", tp["mean"]))
        ci_highs.append(tp.get("ci_high", tp["mean"]))

    # Plot with confidence intervals
    plt.figure(figsize=(10, 6))
    plt.plot(param_values, throughputs, 'o-', label='Mean Throughput')
    plt.fill_between(param_values, ci_lows, ci_highs, alpha=0.3, label='95% CI')
    plt.xlabel(param_name.title())
    plt.ylabel('Throughput (requests/sec)')
    plt.title(f'Throughput vs {param_name.title()}')
    plt.legend()
    plt.grid(True)
    plt.savefig('throughput_sweep.png')

Example 5: Export to Pandas DataFrame

    import pandas as pd

    # Convert per-combination metrics to DataFrame
    rows = []
    for combo in sweep_data["per_combination_metrics"]:
        row = combo["parameters"].copy()

        # Add metrics
        for metric_name, metric_data in combo["metrics"].items():
            if isinstance(metric_data, dict):
                row[f"{metric_name}_mean"] = metric_data.get("mean")
                row[f"{metric_name}_std"] = metric_data.get("std")
                row[f"{metric_name}_cv"] = metric_data.get("cv")
            else:
                row[metric_name] = metric_data
        rows.append(row)

    df = pd.DataFrame(rows)

    # Sort by parameter values
    param_names = [p["name"] for p in sweep_data["metadata"]["sweep_parameters"]]
    df = df.sort_values(param_names)

    # Analyze
    print(df[[*param_names, "request_throughput_avg_mean", "time_to_first_token_p99_mean"]])

    # Export
    df.to_csv("sweep_analysis.csv", index=False)

Example 6: Multi-Parameter Sweep Analysis

    # For sweeps with multiple parameters
    sweep_params = sweep_data["metadata"]["sweep_parameters"]
    param_names = [p["name"] for p in sweep_params]

    print(f"Multi-parameter sweep: {', '.join(param_names)}")

    # Find best combination for each parameter individually
    for param_name in param_names:
        # Group by this parameter
        param_groups = {}
        for combo in sweep_data["per_combination_metrics"]:
            param_value = combo["parameters"][param_name]
            if param_value not in param_groups:
                param_groups[param_value] = []
            param_groups[param_value].append(combo)

        # Find best throughput for each value of this parameter
        print(f"\nBest throughput for each {param_name}:")
        for value, combos in sorted(param_groups.items()):
            best_combo = max(combos,
                             key=lambda c: c["metrics"]["request_throughput_avg"]["mean"])
            throughput = best_combo["metrics"]["request_throughput_avg"]["mean"]
            print(f" {param_name}={value}: {throughput:.1f} req/s")
            print(f" Full config: {best_combo['parameters']}")

Example 7: Identify Diminishing Returns

    # For single-parameter sweeps, calculate efficiency
    combinations = sweep_data["per_combination_metrics"]
    param_name = sweep_data["metadata"]["sweep_parameters"][0]["name"]

    # Sort by parameter value
    combinations_sorted = sorted(combinations,
                                 key=lambda c: c["parameters"][param_name])

    efficiencies = []
    for combo in combinations_sorted:
        param_value = combo["parameters"][param_name]
        throughput = combo["metrics"]["request_throughput_avg"]["mean"]
        efficiency = throughput / param_value
        efficiencies.append((param_value, efficiency))

    # Find point of diminishing returns (where efficiency drops significantly)
    threshold = 0.8  # 20% drop
    for i in range(1, len(efficiencies)):
        if efficiencies[i][1] < threshold * efficiencies[i-1][1]:
            print(f"Diminishing returns detected at {param_name}={efficiencies[i][0]}")
            print(f" Efficiency dropped from {efficiencies[i-1][1]:.2f} to {efficiencies[i][1]:.2f}")
            break

Example 8: Multi-Objective Decision Making

    # Score configurations based on weighted objectives
    weights = {
        "throughput": 0.6,  # 60% weight on throughput
        "latency": 0.4,     # 40% weight on latency
    }

    # Extract all throughputs and latencies
    combinations = sweep_data["per_combination_metrics"]
    throughputs = [c["metrics"]["request_throughput_avg"]["mean"] for c in combinations]
    latencies = [c["metrics"]["time_to_first_token_p99"]["mean"] for c in combinations]

    max_tp = max(throughputs)
    min_lat = min(latencies)
    max_lat = max(latencies)

    scores = []
    for combo in combinations:
        tp = combo["metrics"]["request_throughput_avg"]["mean"]
        lat = combo["metrics"]["time_to_first_token_p99"]["mean"]

        # Normalize: higher is better for both
        tp_score = tp / max_tp
        lat_score = 1 - (lat - min_lat) / (max_lat - min_lat) if max_lat > min_lat else 1.0

        # Weighted combination
        score = weights["throughput"] * tp_score + weights["latency"] * lat_score
        scores.append((combo["parameters"], score))

    # Find best configuration
    best_params, best_score = max(scores, key=lambda x: x[1])
    print(f"Best configuration for given weights: {best_params}")
    print(f" Score: {best_score:.3f}")

See Also

  • Parameter Sweeping Tutorial - User guide with examples
  • Multi-Run Confidence Tutorial - Understanding confidence statistics
  • Working with Profile Exports - General export analysis
  • CLI Options Reference - Complete CLI documentation