Generate PNG visualizations from AIPerf profiling data with automatic mode detection, NVIDIA brand styling, and support for multi-run comparisons and single-run analysis.
The aiperf plot command automatically detects whether to generate multi-run comparison plots or single-run time series analysis based on your directory structure, including nested profile_runs/run_000N directories from multi-run profiles. It integrates GPU telemetry and timeslice data when available. Aggregate summary directories may not be directly plottable; point aiperf plot at the run root or concrete per-run directories that contain profile exports.
Key Features:
- Customizable plots via ~/.aiperf/plot_config.yaml
- Multi-Run Profile Discovery: When --num-profile-runs > 1 produces profile_runs/ subdirectories (e.g., artifacts/my_run/profile_runs/run_0001/ for no-sweep multi-run and trial_0001/ for sweep multi-run), the plot command auto-discovers them across no-sweep, REPEATED, INDEPENDENT, and adaptive Bayesian-optimization layouts. To plot a specific cell directly, you can also pass <base>/profile_runs/ explicitly.
Custom export filenames not supported: The plot command expects default export filenames (profile_export.jsonl, profile_export_aiperf.json). If you ran aiperf profile with --profile-export-file or a custom --profile-export-prefix, the output files will have different names and will not be detected by aiperf plot. To use the plot command, re-run profiling without custom export file options, or rename the files to match the default names.
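The discovery rules above combine two requirements: runs may be nested under profile_runs/ (as run_000N or trial_000N), and only the default export filenames are recognized. A minimal sketch of that kind of scan in Python, assuming a recursive search is enough to cover the layouts listed; find_plottable_runs is a hypothetical helper for illustration, not AIPerf's discovery code:

```python
from pathlib import Path

# Default export filenames; files exported under custom names are not matched.
DEFAULT_EXPORTS = ("profile_export.jsonl", "profile_export_aiperf.json")


def find_plottable_runs(root: Path) -> list[Path]:
    """Return every directory under root that holds a default-named export.

    Hypothetical helper: rglob searches recursively, so nested layouts such as
    <root>/profile_runs/run_0001/ and <root>/profile_runs/trial_0001/ are found.
    """
    run_dirs: set[Path] = set()
    for name in DEFAULT_EXPORTS:
        for export_file in root.rglob(name):
            run_dirs.add(export_file.parent)
    return sorted(run_dirs)
```

Because the scan matches exact filenames, a run exported with a custom prefix is skipped, which is the same symptom described in the note above.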
Analyze a single profiling run:
Compare multiple runs in a directory:
Other common invocations:
Launch interactive dashboard for exploration:
Use dark theme:
Output directory logic:
- --output specified: uses that path
- Otherwise: <first_input_path>/plots/
- Fallback: ./artifacts/plots/

Customize plots: Edit ~/.aiperf/plot_config.yaml (auto-created on first run) to enable or disable plots or customize visualizations. See Plot Configuration for details.
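The output-directory rules above read as a short precedence chain. A minimal sketch in Python, where resolve_output_dir is a hypothetical name and the condition for the ./artifacts/plots/ fallback (no input path available) is an assumption:

```python
from pathlib import Path
from typing import Optional, Sequence


def resolve_output_dir(output: Optional[str], input_paths: Sequence[str]) -> Path:
    """Pick the plot output directory (hypothetical sketch of the precedence)."""
    if output:
        # --output specified: use that path as-is.
        return Path(output)
    if input_paths:
        # Otherwise write next to the first input: <first_input_path>/plots/
        return Path(input_paths[0]) / "plots"
    # Assumed fallback when no input path is available: ./artifacts/plots/
    return Path("artifacts") / "plots"
```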
The plot command automatically detects visualization mode based on directory structure:
Compares metrics across multiple profiling runs to identify optimal configurations.
Auto-detected when:
Example:
Default plots (4):
Use Experiment Classification to assign semantic colors (grey for baselines, green for treatments) for clearer visual distinction.

Shows how time to first token varies with request throughput across concurrency levels. Potentially useful for finding the sweet spot between responsiveness and capacity: ideal configurations maintain low TTFT even at high throughput. If TTFT increases sharply at certain throughput levels, this may indicate a prefill bottleneck (batch scheduler contention or compute limitations).

Highlights optimal configurations on the Pareto frontier that maximize GPU efficiency while minimizing latency. Points on the frontier are Pareto-optimal: no other configuration has both lower latency and higher GPU efficiency. Points below the frontier are dominated, suboptimal configurations. Potentially useful for choosing GPU count and batch sizes to maximize hardware ROI. A steep curve may indicate opportunities to improve latency with minimal throughput loss, while a flat curve can suggest you're near the efficiency limit.
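Pareto optimality here means that no other run has both lower latency and higher per-GPU throughput. A minimal sketch of a frontier computation in Python; RunPoint, its field names, and any sample values are hypothetical, and AIPerf's plotting code may compute the frontier differently:

```python
from typing import NamedTuple


class RunPoint(NamedTuple):
    label: str
    latency_ms: float              # lower is better
    tokens_per_sec_per_gpu: float  # higher is better


def pareto_frontier(points: list[RunPoint]) -> list[RunPoint]:
    """Return the non-dominated points, ordered by increasing latency."""
    # Sort by latency; among equal latencies, put the highest throughput first.
    ordered = sorted(points, key=lambda p: (p.latency_ms, -p.tokens_per_sec_per_gpu))
    frontier: list[RunPoint] = []
    best_throughput = float("-inf")
    for point in ordered:
        # Every earlier point has latency <= this one, so this point is
        # dominated unless its throughput beats all of them.
        if point.tokens_per_sec_per_gpu > best_throughput:
            frontier.append(point)
            best_throughput = point.tokens_per_sec_per_gpu
    return frontier
```

Sorting by latency first turns the dominance check into a single pass: a point joins the frontier only if it beats the best throughput seen among all lower-latency points.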

Shows the trade-off between GPU efficiency and interactivity (TTFT). Potentially useful for determining max concurrency before user experience degrades: flat regions show where adding concurrency maintains interactivity, while steep sections may indicate diminishing returns. The “knee” of the curve can help identify where throughput gains start to significantly hurt responsiveness.
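One common way to locate a "knee" numerically is to normalize both axes and pick the point farthest from the straight line joining the first and last points. This max-distance-to-chord heuristic is a swapped-in illustration, not something the plot command computes; knee_index and the example data are hypothetical:

```python
import math


def knee_index(xs: list[float], ys: list[float]) -> int:
    """Index of the point farthest from the chord joining the first and last points.

    xs should be sorted ascending (e.g. throughput) and ys the matching
    values (e.g. TTFT). Both axes are min-max normalized so units do not matter.
    """
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least 3 (x, y) pairs of equal length")

    def normalize(values: list[float]) -> list[float]:
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # avoid division by zero on a flat axis
        return [(v - lo) / span for v in values]

    nx, ny = normalize(xs), normalize(ys)
    x0, y0, x1, y1 = nx[0], ny[0], nx[-1], ny[-1]
    chord_length = math.hypot(x1 - x0, y1 - y0)
    if chord_length == 0:
        return 0

    def distance_to_chord(i: int) -> float:
        # Perpendicular point-to-line distance via the 2D cross product.
        return abs((x1 - x0) * (y0 - ny[i]) - (x0 - nx[i]) * (y1 - y0)) / chord_length

    return max(range(len(xs)), key=distance_to_chord)
```

For example, knee_index([100, 200, 300, 400, 500], [50, 51, 53, 150, 300]) returns 2: TTFT is nearly flat up to the third point and climbs steeply after it.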
Analyzes performance over time for a single profiling run.
Auto-detected when:
- Directory contains profile_export.jsonl directly
Example:
Default plots (5, enabled in shipped single_run_defaults):
- ttft_over_time: Time to first token per request
- ttft_timeline: Per-request TTFT plotted against request start time
- timeslices_ttft: TTFT statistics per time window
- timeslices_itl: Inter-token latency statistics per time window
- gpu_utilization_and_throughput_over_time: Correlated GPU usage and token rate (requires GPU telemetry)

Commented out by default (uncomment in ~/.aiperf/plot_config.yaml to enable):
- itl_over_time: ITL per request
- latency_over_time: End-to-end latency progression
- dispersed_throughput_over_time: Continuous token generation rate

Additional plots (when data available):
- Timeslice plots (when --slice-duration is used during profiling)
- GPU telemetry plots (when --gpu-telemetry is used during profiling)
Time to first token for each request, revealing prefill latency patterns and potential warm-up effects. Initial spikes may indicate cold start; stable later values show steady-state performance. Potentially useful for determining necessary warmup period or identifying warmup configuration issues. Unexpected spikes during steady-state can suggest resource contention, garbage collection pauses, or batch scheduler interference.

Inter-token latency per request, showing generation performance consistency. Consistent ITL may indicate stable generation; variance can suggest batch scheduling issues. Potentially useful for identifying decode-phase bottlenecks separate from prefill issues. If ITL increases over time, this may indicate KV cache memory pressure or growing batch sizes causing decode slowdown.
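Per-request ITL is commonly computed from output-token timestamps as the mean gap between consecutive tokens, which excludes the wait for the first token (that interval is TTFT). A minimal sketch under that assumption; the function names and timestamps are hypothetical and may not match how AIPerf records token times:

```python
def inter_token_gaps(token_times_s: list[float]) -> list[float]:
    """Gaps between consecutive output tokens of one request, in seconds."""
    return [later - earlier for earlier, later in zip(token_times_s, token_times_s[1:])]


def mean_itl(token_times_s: list[float]) -> float:
    """Mean inter-token latency; the first token's wait is TTFT, not ITL."""
    if len(token_times_s) < 2:
        raise ValueError("ITL needs at least two output tokens")
    # The gaps telescope, so their mean equals (last - first) / (n - 1).
    return (token_times_s[-1] - token_times_s[0]) / (len(token_times_s) - 1)
```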

End-to-end latency progression throughout the run. Overall system health check: ramp-up at the start is normal, but sustained increases may indicate performance degradation. Potentially useful for identifying if your system maintains performance or degrades over time. Sudden jumps may correlate with other requests completing or starting, potentially revealing batch scheduling patterns.

Individual requests plotted as lines spanning their duration from start to end. Visualizes request scheduling and concurrency patterns: overlapping lines show concurrent execution, while gaps may indicate scheduling delays. Dense packing can suggest efficient utilization; sparse patterns may suggest underutilized capacity or rate limiting effects.
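The overlap the timeline shows visually can also be counted: sweeping over start and end events gives the peak number of in-flight requests. A minimal sketch, assuming each request is a (start, end) pair in seconds; max_concurrency is a hypothetical helper:

```python
def max_concurrency(intervals: list[tuple[float, float]]) -> int:
    """Peak number of requests in flight at the same time."""
    events: list[tuple[float, int]] = []
    for start, end in intervals:
        events.append((start, +1))  # request begins
        events.append((end, -1))    # request ends
    # Sort by time; at equal times -1 sorts before +1, so back-to-back
    # requests (one ends exactly when the next starts) do not overlap.
    events.sort()
    in_flight = peak = 0
    for _, delta in events:
        in_flight += delta
        peak = max(peak, in_flight)
    return peak
```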
The Dispersed Throughput Over Time plot uses an event-based approach to visualize token generation rate accurately. Unlike binning methods, which can create artificial spikes, it distributes tokens evenly across their actual generation time.
This provides smooth, continuous representation that correlates better with server metrics like GPU utilization.

Smooth ramps may show healthy scaling; drops can indicate bottlenecks. Potentially useful for correlating with GPU metrics to identify whether bottlenecks are GPU-bound, memory-bound, or CPU-bound. A plateau may indicate you’ve reached max sustainable throughput for your configuration. Sudden drops can potentially correlate with resource exhaustion or scheduler saturation.
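A minimal sketch of the event-based idea: each request contributes a constant rate of output_tokens / duration over its own generation window, and the throughput at any instant is the sum of the rates of the requests generating at that instant. The window boundaries are supplied by the caller (for example, first token to last token), and both that choice and the function name are assumptions, not AIPerf's exact definition:

```python
def dispersed_throughput(
    requests: list[tuple[float, float, int]], sample_times_s: list[float]
) -> list[float]:
    """Token rate (tokens/s) at each sample time.

    Each request is (gen_start_s, gen_end_s, output_tokens); its tokens are
    spread evenly over [gen_start_s, gen_end_s), giving a constant rate.
    """
    # Per-request constant rate; zero-length windows are skipped.
    rates = [(start, end, tokens / (end - start)) for start, end, tokens in requests if end > start]
    # Instantaneous throughput = sum of rates of requests generating at time t.
    return [sum(rate for start, end, rate in rates if start <= t < end) for t in sample_times_s]
```

Because each request's tokens are spread over its whole window, a request finishing does not dump all its tokens into one bin; the curve only steps when requests start or stop generating. This direct version costs O(requests * samples), which is fine for a sketch but slow for long runs.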
Customize which plots are generated and how they appear by editing ~/.aiperf/plot_config.yaml.
Multi-run plots:
Single-run plots:
Multi-run comparison plots group runs to create colored lines/series. Customize the groups: field in plot presets:
Group by model (useful for comparing different models):
Group by directory (useful for hierarchical experiments):
Group by run name (default - each run is separate):
When experiment classification is enabled, all multi-run plots automatically group by experiment_group to preserve treatment variants with semantic colors.
See the CONFIGURATION GUIDE section in ~/.aiperf/plot_config.yaml for detailed customization options.
Classify runs as “baseline” or “treatment” for semantic color assignment in multi-run comparisons.
Configuration (~/.aiperf/plot_config.yaml):
Result:
When enabled, all multi-run plots automatically group by experiment_group (directory name) to preserve individual treatment variants with semantic baseline/treatment colors.
Pattern notes: Uses glob syntax (* = wildcard), case-sensitive, first match wins.
Directory structure:
Result: 3 lines in plots (1 baseline + 2 treatments, each with semantic colors)
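The pattern rules (glob syntax, case-sensitive, first match wins) map directly onto Python's fnmatch.fnmatchcase. A minimal sketch; the rule list, patterns, and fallback label are hypothetical and do not reflect the actual keys in plot_config.yaml:

```python
from fnmatch import fnmatchcase

# Hypothetical ordered rules: (glob pattern, classification). Order matters.
EXAMPLE_RULES = [
    ("*baseline*", "baseline"),
    ("*_treatment_*", "treatment"),
]


def classify_run(
    dir_name: str,
    rules: list[tuple[str, str]] = EXAMPLE_RULES,
    default: str = "treatment",
) -> str:
    """Return the label of the first rule whose glob matches dir_name.

    fnmatchcase is case-sensitive on every platform (plain fnmatch is not on Windows).
    """
    for pattern, label in rules:
        if fnmatchcase(dir_name, pattern):
            return label  # first match wins; later rules are never checked
    return default  # hypothetical fallback for unmatched runs
```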
Advanced: Use group_extraction_pattern to aggregate variants:
See src/aiperf/plot/default_plot_config.yaml for all configuration options.

The dark theme uses a dark background optimized for presentations while maintaining NVIDIA brand colors.






Launch an interactive localhost-hosted dashboard for real-time exploration of profiling data with dynamic metric selection, filtering, and visualization customization.
Key Features:
The dashboard automatically detects visualization mode (multi-run comparison or single-run analysis) and displays appropriate tabs and controls. Press Ctrl+C in the terminal to stop the server.
The dashboard binds to 127.0.0.1 by default and requires no authentication. For remote access, either bind on all interfaces with aiperf plot --dashboard --host 0.0.0.0 --port 9000 (only on trusted networks) or use SSH port forwarding: ssh -L 8050:localhost:8050 user@remote-host
Dashboard mode and PNG mode are separate. To generate both static PNGs and launch the dashboard, run the commands separately.
Multi-run plots (when telemetry available):
Single-run plots (time series):

Correlates compute resources with token generation performance. High GPU utilization with low throughput may suggest compute-bound workloads (consider optimizing model/batch size). Low utilization with low throughput can indicate bottlenecks elsewhere (KV cache, memory bandwidth, CPU scheduling). Potentially useful for targeting >80% GPU utilization for efficient hardware usage.
When timeslice data is available (via --slice-duration during profiling), plots show performance evolution across time windows.
Generated timeslice plots:
Timeslices enable easy outlier identification and bucketing analysis. Each time window (bucket) shows avg/p50/p95 statistics, making it simple to spot which periods have outlier performance. Slice 0 often shows cold-start overhead, while later slices may reveal degradation. Flat bars across slices may indicate stable performance; increasing trends can suggest resource exhaustion. Potentially useful for quickly isolating performance issues to specific phases (warmup, steady-state, or degradation).
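Bucketing assigns each request to slice index floor(timestamp / slice_duration) and then summarizes each bucket. A minimal sketch, assuming timestamps are seconds since the start of the run and percentiles use linear interpolation; timeslice data is produced during profiling (--slice-duration), and AIPerf may use a different percentile method:

```python
import statistics


def percentile(sorted_values: list[float], q: float) -> float:
    """Linear-interpolation percentile (q in 0..100) of an already sorted list."""
    pos = (len(sorted_values) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * (pos - lower)


def timeslice_stats(
    samples: list[tuple[float, float]], slice_duration_s: float
) -> dict[int, tuple[float, float, float]]:
    """Map slice index -> (avg, p50, p95) for (timestamp_s, value) samples."""
    buckets: dict[int, list[float]] = {}
    for timestamp_s, value in samples:
        # Slice k covers [k * slice_duration_s, (k + 1) * slice_duration_s).
        buckets.setdefault(int(timestamp_s // slice_duration_s), []).append(value)
    stats: dict[int, tuple[float, float, float]] = {}
    for index, values in sorted(buckets.items()):
        values.sort()
        stats[index] = (statistics.fmean(values), percentile(values, 50), percentile(values, 95))
    return stats
```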



Plots are saved as PNG files in the output directory:
Consistent Configurations: When comparing runs, vary only one parameter (e.g., concurrency) while keeping others constant. This isolates the impact of that specific parameter.
Use Experiment Classification: Configure experiment classification to distinguish baselines from treatments with semantic colors.
Include Warmup: Use --warmup-request-count to ensure steady state before measurement, reducing noise in visualizations.
Directory Structure: Ensure consistent naming; runs to compare must be in subdirectories of a common parent.
GPU Metrics: GPU telemetry plots only appear when telemetry data is available. Ensure DCGM is running during profiling. See GPU Telemetry Tutorial.
Solutions:
- Verify that the input directories contain profile_export.jsonl files
- If you used --profile-export-file or --profile-export-prefix during profiling, the output files have non-default names and will not be detected by the plot command. Re-run without custom export file options, or rename the files to match the defaults (profile_export.jsonl, profile_export_aiperf.json)

Solutions:
- Verify that gpu_telemetry_export.jsonl exists and contains data

Solutions:
- For single-run analysis, the directory must contain profile_export.jsonl directly inside it
- For multi-run comparison, the subdirectories must contain profile_export.jsonl files
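The mode checks above come down to where profile_export.jsonl lives. A minimal sketch of that decision for a single input directory; detect_mode is hypothetical, and the real command also handles multiple input paths and profile_runs/ layouts:

```python
from pathlib import Path


def detect_mode(run_path: Path) -> str:
    """Classify one input directory by where profile_export.jsonl lives (sketch)."""
    if (run_path / "profile_export.jsonl").is_file():
        return "single-run"  # export sits directly inside the directory
    if any(run_path.rglob("profile_export.jsonl")):
        return "multi-run"  # exports live only in subdirectories
    return "no profile data found"
```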