aiq.profiler.inference_optimization.bottleneck_analysis.nested_stack_analysis#

An enhanced script that:

Groups events by example_number.
Builds a nested call tree (stack-based) for each example_number, so calls from different examples never nest.
Combines all calls into one global list for concurrency analysis.
Computes:

self_time, subtree_time for each call

concurrency distribution (p50, p90, p95, p99) across all examples

each node’s midpoint concurrency

a custom ‘bottleneck_score’ (here = subtree_time)

Optionally saves a Gantt chart.
Returns a Pydantic object with concurrency stats, node metrics, top bottlenecks, and a textual report.

Attributes#

logger

Functions#

`build_call_tree_for_example`(...)	Stack-based approach for a single example:
`build_call_tree_per_example`(...)
`compute_time_based_concurrency`(...)	Build a timeline of (start, +1), (end, -1) from all calls, then:
`find_midpoint_concurrency`(→ float)	Approximate concurrency for a node by finding the concurrency in timeline_segments at the node’s midpoint (or start if zero-length).
`save_gantt_chart`(→ None)	Save a Gantt chart as a PNG, color-coded by operation_type.
`analyze_calls_and_build_result`(...)
`multi_example_call_profiling`(...)	The high-level function:

Module Contents#

logger#

build_call_tree_for_example( example_df: pandas.DataFrame, ) → list[aiq.profiler.inference_optimization.data_models.CallNode]#

Stack-based approach for a single example:

Sort events by timestamp ascending.
On *_START => push a new node, attach to parent’s children if stack not empty.
On *_END => pop from stack if matches the top’s UUID, finalize end_time/duration.

Returns: A list of top-level calls for this example.
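The push/pop procedure above can be sketched as follows. This is a minimal illustration, not the module's implementation: the `Node` dataclass, the `build_tree` helper, and the event dictionary keys (`event_type`, `uuid`, `name`, `timestamp`) are hypothetical stand-ins for `CallNode` and the DataFrame columns, whose exact fields are not shown on this page.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for CallNode; the real model may differ."""
    uuid: str
    name: str
    start_time: float
    end_time: Optional[float] = None
    duration: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


def build_tree(events: List[dict]) -> List[Node]:
    """Build the top-level calls for one example from START/END events."""
    roots: List[Node] = []
    stack: List[Node] = []
    # Step 1: sort events by timestamp ascending.
    for ev in sorted(events, key=lambda e: e["timestamp"]):
        if ev["event_type"].endswith("_START"):
            # Step 2: push a new node and attach it to the current parent.
            node = Node(ev["uuid"], ev["name"], ev["timestamp"])
            if stack:
                stack[-1].children.append(node)
            else:
                roots.append(node)
            stack.append(node)
        elif ev["event_type"].endswith("_END"):
            # Step 3: pop only when the END matches the top of the stack.
            # Assumption: unmatched END events are silently ignored here.
            if stack and stack[-1].uuid == ev["uuid"]:
                node = stack.pop()
                node.end_time = ev["timestamp"]
                node.duration = node.end_time - node.start_time
    return roots


events = [
    {"event_type": "TOOL_START", "uuid": "a", "name": "agent", "timestamp": 0.0},
    {"event_type": "LLM_START", "uuid": "b", "name": "llm", "timestamp": 1.0},
    {"event_type": "LLM_END", "uuid": "b", "name": "llm", "timestamp": 3.0},
    {"event_type": "TOOL_END", "uuid": "a", "name": "agent", "timestamp": 4.0},
]
tree = build_tree(events)
```

In this sample, `tree` holds a single root (`agent`, duration 4.0) with one child (`llm`, duration 2.0).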

build_call_tree_per_example( all_steps: list[list[aiq.data_models.intermediate_step.IntermediateStep]], ) → list[aiq.profiler.inference_optimization.data_models.CallNode]#

Group the events from all_steps by example_number.
For each example, build a separate stack-based call tree.
Return a combined list of all top-level calls from all examples.

This ensures no cross-example nesting.

compute_time_based_concurrency( roots: list[aiq.profiler.inference_optimization.data_models.CallNode], ) → aiq.profiler.inference_optimization.data_models.ConcurrencyDistribution#

Build a timeline of (start, +1), (end, -1) from all calls, then:

Sort events by time
Create segments [ (t_i, t_{i+1}, concurrency) ]
Compute concurrency percentiles (p50, p90, p95, p99) based on total time spent at each concurrency.
This concurrency is across ALL calls from ALL examples.

Returns:#

ConcurrencyDistribution: the piecewise segments plus the concurrency percentiles.
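The sweep described above can be sketched as follows. The helper names `concurrency_segments` and `weighted_percentile` are hypothetical, calls are reduced to plain `(start, end)` tuples, and the percentile convention (the smallest concurrency level whose cumulative time reaches the requested share of total time) is an assumption; the module may interpolate or break ties differently.

```python
from typing import List, Tuple

Segment = Tuple[float, float, int]


def concurrency_segments(intervals: List[Tuple[float, float]]) -> List[Segment]:
    """Turn (start, end) intervals into (t_i, t_{i+1}, concurrency) segments."""
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    if not events:
        return []
    # Sort by time; at equal times, -1 sorts before +1 so ends are applied first.
    events.sort()
    segments: List[Segment] = []
    current = 0
    prev_t = events[0][0]
    for t, delta in events:
        if t > prev_t:
            segments.append((prev_t, t, current))
            prev_t = t
        current += delta
    return segments


def weighted_percentile(segments: List[Segment], pct: float) -> float:
    """Concurrency level at or below which `pct` percent of the time is spent."""
    total = sum(end - start for start, end, _ in segments)
    if total <= 0:
        return 0.0
    threshold = total * pct / 100.0
    elapsed = 0.0
    level = 0
    for start, end, level in sorted(segments, key=lambda s: s[2]):
        elapsed += end - start
        if elapsed >= threshold:
            break
    return float(level)


# Three overlapping calls, e.g. drawn from different examples.
segments = concurrency_segments([(0.0, 4.0), (1.0, 3.0), (2.0, 6.0)])
p50 = weighted_percentile(segments, 50)
p90 = weighted_percentile(segments, 90)
```

Here half of the 6 time units are spent at concurrency 1, so `p50` is 1.0, while `p90` reaches the peak level of 3.0.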

find_midpoint_concurrency( node: aiq.profiler.inference_optimization.data_models.CallNode, segments: list[tuple[float, float, int]], ) → float#

Approximate concurrency for a node by finding the concurrency in timeline_segments at the node’s midpoint (or start if zero-length).
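A midpoint lookup of this kind might look like the sketch below. The `midpoint_concurrency` helper is hypothetical and takes plain start and end times instead of a `CallNode`. Returning 0.0 when no segment contains the midpoint, and taking the first matching segment when the midpoint falls exactly on a boundary, are assumptions.

```python
from typing import List, Tuple


def midpoint_concurrency(
    start: float, end: float, segments: List[Tuple[float, float, int]]
) -> float:
    """Look up the concurrency of the segment containing the call's midpoint."""
    # Zero-length (or inverted) calls fall back to their start time.
    mid = start if end <= start else (start + end) / 2.0
    for seg_start, seg_end, level in segments:
        if seg_start <= mid <= seg_end:
            return float(level)
    return 0.0  # assumption: no covering segment means no concurrency


segments = [(0.0, 1.0, 1), (1.0, 2.0, 2), (2.0, 3.0, 3), (3.0, 4.0, 2), (4.0, 6.0, 1)]
regular = midpoint_concurrency(0.0, 3.0, segments)      # midpoint 1.5
zero_length = midpoint_concurrency(5.0, 5.0, segments)  # uses start 5.0
outside = midpoint_concurrency(10.0, 12.0, segments)    # no covering segment
```

A linear scan keeps the sketch short; for long timelines a binary search over segment start times would serve the same purpose.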

save_gantt_chart( all_nodes: list[aiq.profiler.inference_optimization.data_models.CallNode], output_path: str, ) → None#

Save a Gantt chart as a PNG, color-coded by operation_type. Each node is displayed as a horizontal bar from start_time to end_time. The y-axis is the node index (sorted by start_time).

analyze_calls_and_build_result( roots: list[aiq.profiler.inference_optimization.data_models.CallNode], output_dir: str | None = None, ) → aiq.profiler.inference_optimization.data_models.NestedCallProfilingResult#

Compute concurrency distribution (p50, p90, p95, p99) across ALL calls in all examples.
For each node, compute self_time, subtree_time, concurrency at midpoint, bottleneck_score.
Identify top 5 bottlenecks (by subtree_time).
Build a textual report.
Optionally save a Gantt chart to ‘output_dir’.

Returns NestedCallProfilingResult.
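The per-node metrics and the bottleneck ranking can be illustrated with the sketch below. The page does not define self_time or subtree_time, so this follows a common profiler convention as an assumption: subtree_time is the node's own wall-clock duration, and self_time is that duration minus the time covered by its direct children, clamped at zero. The `Call` dataclass and the `annotate`, `flatten`, and `top_bottlenecks` helpers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Call:
    """Hypothetical stand-in for CallNode."""
    name: str
    start_time: float
    end_time: float
    children: List["Call"] = field(default_factory=list)
    self_time: float = 0.0
    subtree_time: float = 0.0


def annotate(node: Call) -> None:
    """Fill in subtree_time and self_time, children first."""
    duration = max(0.0, node.end_time - node.start_time)
    child_total = 0.0
    for child in node.children:
        annotate(child)
        child_total += child.subtree_time
    node.subtree_time = duration
    # Clamp at zero in case overlapping children exceed the parent's duration.
    node.self_time = max(0.0, duration - child_total)


def flatten(roots: List[Call]) -> List[Call]:
    """Depth-first list of every node in the forest."""
    out: List[Call] = []
    for root in roots:
        out.append(root)
        out.extend(flatten(root.children))
    return out


def top_bottlenecks(roots: List[Call], k: int = 5) -> List[Call]:
    """Rank nodes by bottleneck_score, taken here to equal subtree_time."""
    for root in roots:
        annotate(root)
    return sorted(flatten(roots), key=lambda n: n.subtree_time, reverse=True)[:k]


retriever = Call("retriever", 6.0, 8.0)
tool = Call("tool", 5.0, 9.0, children=[retriever])
llm = Call("llm", 1.0, 4.0)
agent = Call("agent", 0.0, 10.0, children=[llm, tool])
ranked = top_bottlenecks([agent], k=2)
```

With this input the ranking starts with `agent` (subtree_time 10.0, self_time 3.0) followed by `tool` (subtree_time 4.0, self_time 2.0).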

multi_example_call_profiling( all_steps: list[list[aiq.data_models.intermediate_step.IntermediateStep]], output_dir: str | None = None, ) → aiq.profiler.inference_optimization.data_models.NestedCallProfilingResult#

The high-level function:

Build a forest of calls by grouping by example_number (so no cross-example nesting).
Analyze concurrency across all calls in all examples.
Return a NestedCallProfilingResult with concurrency distribution, node metrics, top bottlenecks, and textual report. Optionally saves a Gantt chart.

Parameters:

all_steps – Intermediate steps for each example.
output_dir – Directory in which to save gantt_chart.png, if provided.

Returns:

NestedCallProfilingResult (pydantic)