aiq.profiler.inference_optimization.bottleneck_analysis.nested_stack_analysis
An enhanced script that:
1. Groups events by example_number.
2. Builds a nested call tree (stack-based) for each example_number, so calls from different examples never nest.
3. Combines all calls into one global list for concurrency analysis.
4. Computes:
   - self_time and subtree_time for each call
   - the concurrency distribution (p50, p90, p95, p99) across all examples
   - each node's midpoint concurrency
   - a custom 'bottleneck_score' (here equal to subtree_time)
5. Optionally saves a Gantt chart.
6. Returns a Pydantic object with concurrency stats, node metrics, top bottlenecks, and a textual report.
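Attributes
- logger
Functions
- build_call_tree_for_example – Stack-based approach for building the call tree of a single example.
- build_call_tree_per_example – Build a separate call tree per example_number and combine their top-level calls.
- compute_time_based_concurrency – Build a timeline of (start, +1), (end, -1) from all calls and compute concurrency percentiles.
- find_midpoint_concurrency – Approximate a node's concurrency from the timeline segments at its midpoint.
- save_gantt_chart – Save a Gantt chart as a PNG, color-coded by operation_type.
- analyze_calls_and_build_result – Compute concurrency and per-node metrics, rank the top bottlenecks, and build the result.
- multi_example_call_profiling – The high-level function: build per-example call trees, analyze them, and return a NestedCallProfilingResult.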
Module Contents
- logger
- build_call_tree_for_example(example_df: pandas.DataFrame)
Stack-based approach for a single example:
1. Sort events by timestamp, ascending.
2. On *_START, create a new node, attach it to the parent's (top-of-stack) children if the stack is not empty, then push it onto the stack.
3. On *_END, pop the stack if the event's UUID matches the top node's UUID, then finalize that node's end_time and duration.
Returns:
A list of top-level calls for this example.
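The three steps above can be sketched as follows. This is a minimal illustration, not the module's code: the `Node` class is a hypothetical stand-in for `CallNode`, and the event dicts with keys `event_type`, `uuid`, and `timestamp` are an assumed input shape rather than the real DataFrame columns.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for CallNode; only the fields this sketch needs."""

    uuid: str
    operation_type: str
    start_time: float
    end_time: float = 0.0
    duration: float = 0.0
    children: list["Node"] = field(default_factory=list)


def build_tree(events: list[dict]) -> list[Node]:
    """Build the nested call tree for ONE example; returns its top-level calls."""
    roots: list[Node] = []
    stack: list[Node] = []
    # Step 1: process events in ascending timestamp order.
    for ev in sorted(events, key=lambda e: e["timestamp"]):
        kind = ev["event_type"]
        if kind.endswith("_START"):
            node = Node(ev["uuid"], kind.removesuffix("_START"), ev["timestamp"])
            # Step 2: attach to the enclosing call (top of stack), or record as a root.
            if stack:
                stack[-1].children.append(node)
            else:
                roots.append(node)
            stack.append(node)
        elif kind.endswith("_END") and stack and stack[-1].uuid == ev["uuid"]:
            # Step 3: pop only when the END matches the innermost open call.
            node = stack.pop()
            node.end_time = ev["timestamp"]
            node.duration = node.end_time - node.start_time
    return roots


events = [
    {"event_type": "WORKFLOW_START", "uuid": "w", "timestamp": 0.0},
    {"event_type": "LLM_START", "uuid": "a", "timestamp": 1.0},
    {"event_type": "LLM_END", "uuid": "a", "timestamp": 3.0},
    {"event_type": "WORKFLOW_END", "uuid": "w", "timestamp": 4.0},
]
roots = build_tree(events)
print(roots[0].operation_type, [c.uuid for c in roots[0].children])  # prints: WORKFLOW ['a']
```

An *_END whose UUID does not match the top of the stack is simply ignored here, which mirrors the "pop only if it matches" rule; how the real function handles such mismatches beyond that is not stated.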
- build_call_tree_per_example(all_steps: list[list[aiq.data_models.intermediate_step.IntermediateStep]])
1. Group the steps (as a DataFrame) by example_number.
2. For each example, build a separate stack-based call tree.
3. Return a combined list of all top-level calls from all examples.
Because each example gets its own stack, a call from one example is never nested under a call from another.
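The benefit of grouping first is easiest to see when two examples overlap in time. The sketch below is illustrative only: `build_tree` and `build_forest_per_example` are hypothetical helpers, nodes are plain dicts, and events are assumed to carry `example_number`, `event_type`, `uuid`, and `timestamp` keys.

```python
from collections import defaultdict


def build_tree(events: list[dict]) -> list[dict]:
    """Minimal stack-based builder (hypothetical dict nodes); returns top-level calls."""
    roots: list[dict] = []
    stack: list[dict] = []
    for ev in sorted(events, key=lambda e: e["timestamp"]):
        if ev["event_type"].endswith("_START"):
            node = {"uuid": ev["uuid"], "children": []}
            (stack[-1]["children"] if stack else roots).append(node)
            stack.append(node)
        elif ev["event_type"].endswith("_END") and stack and stack[-1]["uuid"] == ev["uuid"]:
            stack.pop()
    return roots


def build_forest_per_example(events: list[dict]) -> list[dict]:
    """Group events by example_number, build one tree per group, concatenate the roots."""
    groups: dict[int, list[dict]] = defaultdict(list)
    for ev in events:
        groups[ev["example_number"]].append(ev)
    forest: list[dict] = []
    for example_number in sorted(groups):
        # Each example gets a fresh, empty stack, so nothing can nest across examples.
        forest.extend(build_tree(groups[example_number]))
    return forest


# Example 0's call spans t=0..10; example 1's call runs inside that window (t=1..2).
events = [
    {"example_number": 0, "event_type": "LLM_START", "uuid": "a", "timestamp": 0.0},
    {"example_number": 0, "event_type": "LLM_END", "uuid": "a", "timestamp": 10.0},
    {"example_number": 1, "event_type": "LLM_START", "uuid": "b", "timestamp": 1.0},
    {"example_number": 1, "event_type": "LLM_END", "uuid": "b", "timestamp": 2.0},
]
global_roots = build_tree(events)          # single global stack: "b" ends up nested under "a"
forest = build_forest_per_example(events)  # per-example stacks: two independent roots
print(len(global_roots), len(forest))      # prints: 1 2
```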
- compute_time_based_concurrency(...) -> aiq.profiler.inference_optimization.data_models.ConcurrencyDistribution
Build a timeline of (start, +1) and (end, -1) events from all calls, then:
1. Sort the events by time.
2. Create segments [(t_i, t_{i+1}, concurrency)].
3. Compute concurrency percentiles (p50, p90, p95, p99), weighted by the total time spent at each concurrency level.
The concurrency is computed across ALL calls from ALL examples.
Returns:
ConcurrencyDistribution with the piecewise segments and concurrency percentiles.
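A sketch of the sweep-line timeline and the time-weighted percentiles described above, assuming calls are plain `(start, end)` tuples. The helper names are hypothetical, and two details are this sketch's own choices rather than documented behavior: an end and a start at the same timestamp are treated as non-overlapping, and idle gaps (concurrency 0) between calls count toward the percentile weights.

```python
from collections import defaultdict


def concurrency_segments(calls: list[tuple[float, float]]) -> list[tuple[float, float, int]]:
    """Turn (start, end) intervals into piecewise-constant (t_i, t_{i+1}, concurrency) segments."""
    events: list[tuple[float, int]] = []
    for start, end in calls:
        events.append((start, +1))
        events.append((end, -1))
    # Tuples sort by time, then by delta, so at equal timestamps -1 comes before +1:
    # a call that ends exactly when another starts is not counted as overlapping.
    events.sort()
    segments = []
    level = 0
    for (t, delta), (t_next, _) in zip(events, events[1:]):
        level += delta
        if t_next > t:  # skip zero-width segments
            segments.append((t, t_next, level))
    return segments


def time_weighted_percentile(segments: list[tuple[float, float, int]], p: float) -> int:
    """Smallest level c such that at least p% of the covered time has concurrency <= c."""
    time_at_level: dict[int, float] = defaultdict(float)
    for t0, t1, level in segments:
        time_at_level[level] += t1 - t0
    if not time_at_level:
        return 0
    total = sum(time_at_level.values())
    cumulative = 0.0
    for level in sorted(time_at_level):
        cumulative += time_at_level[level]
        if cumulative >= p / 100 * total:
            return level
    return max(time_at_level)  # guard against floating-point shortfall


calls = [(0, 4), (1, 3), (2, 6)]
segments = concurrency_segments(calls)
stats = {f"p{p}": time_weighted_percentile(segments, p) for p in (50, 90, 95, 99)}
print(segments)  # prints: [(0, 1, 1), (1, 2, 2), (2, 3, 3), (3, 4, 2), (4, 6, 1)]
print(stats)     # prints: {'p50': 1, 'p90': 3, 'p95': 3, 'p99': 3}
```

Weighting by time rather than by event count means a level that persists for a long stretch dominates the percentiles, even if it was entered only once.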
- find_midpoint_concurrency(node: aiq.profiler.inference_optimization.data_models.CallNode, segments: list[tuple[float, float, int]])
Approximate the concurrency for a node by looking up the concurrency level in segments at the node's midpoint (or at its start time if the node has zero length).
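One way to do this lookup is a binary search over segment start times, sketched below. The function name and its `(start_time, end_time, segments)` parameters are hypothetical simplifications (the real function takes a `CallNode`), and returning 0 for a probe that falls in a gap or outside the timeline is an assumption of this sketch.

```python
from bisect import bisect_right


def midpoint_concurrency(start_time: float, end_time: float,
                         segments: list[tuple[float, float, int]]) -> int:
    """Concurrency level at the node's midpoint (or at start_time for zero-length nodes)."""
    probe = start_time if end_time <= start_time else (start_time + end_time) / 2
    starts = [seg_start for seg_start, _, _ in segments]  # segments are sorted by start
    i = bisect_right(starts, probe) - 1  # last segment starting at or before the probe
    if i < 0:
        return 0  # probe lies before the first segment (or there are no segments)
    _, seg_end, level = segments[i]
    return level if probe < seg_end else 0  # 0 if the probe falls in a gap or past the end


segments = [(0, 1, 1), (1, 2, 2), (2, 3, 3), (3, 4, 2), (4, 6, 1)]
print(midpoint_concurrency(0, 4, segments))  # midpoint 2.0 lies in (2, 3, 3); prints: 3
```

Building `starts` on every call keeps the sketch short; when many nodes are queried against the same segments, computing it once and reusing it avoids repeated O(n) work.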
- save_gantt_chart(all_nodes: list[aiq.profiler.inference_optimization.data_models.CallNode], output_path: str)
Save a Gantt chart as a PNG, color-coded by operation_type. Each node is displayed as a horizontal bar from start_time to end_time. The y-axis is the node index, with nodes sorted by start_time.
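A matplotlib sketch of such a chart, assuming each node is reduced to a hypothetical `(operation_type, start_time, end_time)` tuple. The colormap, figure size, and axis labels are illustrative choices, not the module's.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # render straight to a file; no display needed
import matplotlib.pyplot as plt
from matplotlib.patches import Patch


def save_gantt(nodes: list[tuple[str, float, float]], output_path: str) -> None:
    """Draw one horizontal bar per (operation_type, start_time, end_time) node."""
    ordered = sorted(nodes, key=lambda n: n[1])  # y index follows start_time order
    op_types = sorted({op for op, _, _ in ordered})
    palette = plt.get_cmap("tab10")
    color_of = {op: palette(i % 10) for i, op in enumerate(op_types)}

    fig, ax = plt.subplots(figsize=(10, max(2.0, 0.4 * len(ordered))))
    for y, (op, start, end) in enumerate(ordered):
        ax.barh(y, end - start, left=start, color=color_of[op])
    ax.set_xlabel("time")
    ax.set_ylabel("node index (sorted by start_time)")
    ax.legend(handles=[Patch(color=color_of[op], label=op) for op in op_types])
    fig.tight_layout()
    fig.savefig(output_path)  # PNG format is inferred from the .png extension
    plt.close(fig)


path = os.path.join(tempfile.mkdtemp(), "gantt_chart.png")
save_gantt([("LLM", 0.0, 4.0), ("TOOL", 1.0, 3.0), ("LLM", 2.0, 6.0)], path)
print(os.path.getsize(path) > 0)
```

Passing `left=start` with a width of `end - start` is what turns an ordinary horizontal bar plot into a timeline.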
- analyze_calls_and_build_result(roots: list[aiq.profiler.inference_optimization.data_models.CallNode], output_dir: str | None = None)
1. Compute the concurrency distribution (p50, p90, p95, p99) across ALL calls in all examples.
2. For each node, compute self_time, subtree_time, concurrency at midpoint, and bottleneck_score.
3. Identify the top 5 bottlenecks (by subtree_time).
4. Build a textual report.
5. Optionally save a Gantt chart to output_dir.
Returns a NestedCallProfilingResult.
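Steps 2 and 3 can be sketched as a post-order pass followed by a sort. The definitions here are assumptions, since the section does not spell them out: self_time is taken as the node's duration minus its direct children's durations (clamped at 0), and subtree_time as self_time plus the children's subtree_time. `Node`, `compute_times`, and `top_bottlenecks` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for CallNode."""

    name: str
    start_time: float
    end_time: float
    children: list["Node"] = field(default_factory=list)
    self_time: float = 0.0
    subtree_time: float = 0.0

    @property
    def duration(self) -> float:
        return self.end_time - self.start_time


def compute_times(node: Node) -> None:
    """Post-order pass: fill self_time and subtree_time for node and its descendants."""
    for child in node.children:
        compute_times(child)
    # Assumption: self_time is wall time not covered by direct children, clamped at 0.
    node.self_time = max(0.0, node.duration - sum(c.duration for c in node.children))
    # Assumption: subtree_time is own time plus every child's subtree_time.
    node.subtree_time = node.self_time + sum(c.subtree_time for c in node.children)


def top_bottlenecks(roots: list[Node], k: int = 5) -> list[Node]:
    """Flatten the forest and rank nodes by bottleneck_score (= subtree_time)."""
    for root in roots:
        compute_times(root)
    all_nodes: list[Node] = []
    stack = list(roots)
    while stack:
        node = stack.pop()
        all_nodes.append(node)
        stack.extend(node.children)
    return sorted(all_nodes, key=lambda n: n.subtree_time, reverse=True)[:k]


root = Node("workflow", 0, 10, children=[
    Node("llm", 1, 4),
    Node("tool", 5, 7, children=[Node("llm2", 5, 6)]),
])
top = top_bottlenecks([root], k=3)
print([n.name for n in top])  # prints: ['workflow', 'llm', 'tool']
```

Under these definitions, a parent whose children run in parallel gets a subtree_time larger than its wall-clock duration, because overlapping child time is summed rather than merged.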
- multi_example_call_profiling(all_steps: list[list[aiq.data_models.intermediate_step.IntermediateStep]], output_dir: str | None = None)
The high-level function:
1. Build a forest of calls by grouping by example_number, so calls never nest across examples.
2. Analyze concurrency across all calls in all examples.
3. Return a NestedCallProfilingResult with the concurrency distribution, node metrics, top bottlenecks, and a textual report, optionally saving a Gantt chart.
Parameters:
all_steps – Intermediate steps for each example.
output_dir – Directory in which to save gantt_chart.png (if provided).
Returns:
NestedCallProfilingResult (a Pydantic model).