aiq.profiler.inference_optimization.bottleneck_analysis.nested_stack_analysis#

An enhanced script that:

  1. Groups events by example_number.

  2. Builds a nested call tree (stack-based) for each example_number, so calls from different examples never nest.

  3. Combines all calls into one global list for concurrency analysis.

  4. Computes:

  • self_time, subtree_time for each call

  • concurrency distribution (p50, p90, p95, p99) across all examples

  • each node’s midpoint concurrency

  • a custom ‘bottleneck_score’ (here = subtree_time)

  5. Optionally saves a Gantt chart.

  6. Returns a Pydantic object with concurrency stats, node metrics, top bottlenecks, and a textual report.

Attributes#

Functions#

build_call_tree_for_example(...)

Builds a stack-based call tree for a single example.

build_call_tree_per_example(...)

compute_time_based_concurrency(...)

Build a timeline of (start, +1), (end, -1) events from all calls, then compute time-weighted concurrency percentiles.

find_midpoint_concurrency(→ float)

Approximate concurrency for a node by finding the concurrency in timeline_segments at the node’s midpoint (or start if zero-length).

save_gantt_chart(→ None)

Save a Gantt chart as a PNG, color-coded by operation_type.

analyze_calls_and_build_result(...)

multi_example_call_profiling(...)

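The high-level function: builds per-example call trees, analyzes concurrency across all calls, and returns a NestedCallProfilingResult.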

Module Contents#

logger#
build_call_tree_for_example(
example_df: pandas.DataFrame,
) → list[aiq.profiler.inference_optimization.data_models.CallNode]#

Stack-based approach for a single example:

  1. Sort events by timestamp ascending.

  2. On *_START => push a new node, attach to parent’s children if stack not empty.

  3. On *_END => pop from stack if matches the top’s UUID, finalize end_time/duration.

Returns:

A list of top-level calls for this example.
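The three steps above can be sketched with plain Python structures. This is a minimal illustration, not the module's code: the `Node` dataclass, its field names, the `build_tree` helper, and the dict keys (`uuid`, `name`, `event_type`, `timestamp`) are all assumptions standing in for `CallNode` and the real DataFrame columns.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for CallNode; field names are assumptions."""
    uuid: str
    name: str
    start_time: float
    end_time: Optional[float] = None
    duration: Optional[float] = None
    children: list["Node"] = field(default_factory=list)


def build_tree(events: list[dict]) -> list[Node]:
    """Build the top-level nodes from one example's START/END events."""
    roots: list[Node] = []
    stack: list[Node] = []
    # Step 1: sort ascending by timestamp (sorted() is stable, so ties keep input order).
    for ev in sorted(events, key=lambda e: e["timestamp"]):
        if ev["event_type"].endswith("_START"):
            node = Node(ev["uuid"], ev["name"], ev["timestamp"])
            # Step 2: attach to the current parent if there is one, else record a root.
            (stack[-1].children if stack else roots).append(node)
            stack.append(node)
        elif ev["event_type"].endswith("_END"):
            # Step 3: pop only when the END matches the node on top of the stack.
            if stack and stack[-1].uuid == ev["uuid"]:
                node = stack.pop()
                node.end_time = ev["timestamp"]
                node.duration = node.end_time - node.start_time
    return roots


# Illustrative events: an LLM call nested inside a workflow call.
events = [
    {"uuid": "a", "name": "workflow", "event_type": "WORKFLOW_START", "timestamp": 0.0},
    {"uuid": "b", "name": "llm", "event_type": "LLM_START", "timestamp": 1.0},
    {"uuid": "b", "name": "llm", "event_type": "LLM_END", "timestamp": 3.0},
    {"uuid": "a", "name": "workflow", "event_type": "WORKFLOW_END", "timestamp": 4.0},
]
roots = build_tree(events)
```

In this sketch an END event that does not match the top of the stack is simply ignored; how the real function treats such events is not stated here.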

build_call_tree_per_example(
all_steps: list[list[aiq.data_models.intermediate_step.IntermediateStep]],
) → list[aiq.profiler.inference_optimization.data_models.CallNode]#
  1. Group the DataFrame by example_number.

  2. For each example, build a separate stack-based call tree.

  3. Return a combined list of all top-level calls from all examples.

This ensures no cross-example nesting.
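To see why grouping matters, the sketch below builds trees with and without grouping on two overlapping examples. The helpers `build_forest` and `nest`, and the dict keys used for events, are hypothetical; only the group-then-build-then-concatenate shape is taken from the description above.

```python
from collections import defaultdict
from typing import Callable


def build_forest(events: list[dict], build_one: Callable[[list[dict]], list]) -> list:
    """Group events by example_number, build each tree separately, concatenate roots."""
    by_example: dict[int, list[dict]] = defaultdict(list)
    for ev in events:
        by_example[ev["example_number"]].append(ev)

    all_roots: list = []
    for example_number in sorted(by_example):
        all_roots.extend(build_one(by_example[example_number]))
    return all_roots


def nest(example_events: list[dict]) -> list[dict]:
    """Tiny stack-based builder (an assumption) that returns nested dicts."""
    roots, stack = [], []
    for ev in sorted(example_events, key=lambda e: e["timestamp"]):
        if ev["kind"] == "START":
            node = {"uuid": ev["uuid"], "children": []}
            (stack[-1]["children"] if stack else roots).append(node)
            stack.append(node)
        elif stack and stack[-1]["uuid"] == ev["uuid"]:
            stack.pop()
    return roots


# Example 1 starts and ends while example 0 is still running.
events = [
    {"example_number": 0, "uuid": "a", "kind": "START", "timestamp": 0.0},
    {"example_number": 1, "uuid": "b", "kind": "START", "timestamp": 1.0},
    {"example_number": 1, "uuid": "b", "kind": "END", "timestamp": 5.0},
    {"example_number": 0, "uuid": "a", "kind": "END", "timestamp": 10.0},
]

forest = build_forest(events, nest)  # grouped: two independent roots
ungrouped = nest(events)             # not grouped: "b" wrongly nests under "a"
```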

compute_time_based_concurrency(
roots: list[aiq.profiler.inference_optimization.data_models.CallNode],
) → aiq.profiler.inference_optimization.data_models.ConcurrencyDistribution#
Build a timeline of (start, +1), (end, -1) from all calls, then:
  • Sort events by time

  • Create segments [ (t_i, t_{i+1}, concurrency) ]

  • Compute concurrency percentiles (p50, p90, p95, p99) based on total time spent at each concurrency.

  • This concurrency is across ALL calls from ALL examples.

Returns:

A ConcurrencyDistribution with the piecewise segments and the concurrency percentiles.
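The sweep described above can be sketched as follows. The function names `concurrency_segments` and `weighted_percentile` are hypothetical, calls are reduced to `(start, end)` tuples, and two details are assumptions rather than documented behavior: at equal timestamps an end is processed before a start, and a percentile is the smallest concurrency level whose cumulative time reaches the requested share of total time.

```python
def concurrency_segments(intervals: list[tuple[float, float]]) -> list[tuple[float, float, int]]:
    """Turn (start, end) intervals into piecewise-constant (t_i, t_next, concurrency) segments."""
    points: list[tuple[float, int]] = []
    for start, end in intervals:
        points.append((start, +1))
        points.append((end, -1))
    # Tuple sort puts (t, -1) before (t, +1), so back-to-back calls do not count as overlapping.
    points.sort()

    segments: list[tuple[float, float, int]] = []
    if not points:
        return segments
    current = 0
    prev_t = points[0][0]
    for t, delta in points:
        if t > prev_t:
            segments.append((prev_t, t, current))
        current += delta
        prev_t = t
    return segments


def weighted_percentile(segments: list[tuple[float, float, int]], pct: float) -> float:
    """Concurrency percentile weighted by the time spent at each concurrency level."""
    weighted = sorted((c, end - start) for start, end, c in segments)
    total = sum(w for _, w in weighted)
    if total <= 0:
        return 0.0
    threshold = total * pct / 100.0
    running = 0.0
    for concurrency, weight in weighted:
        running += weight
        if running >= threshold:
            return float(concurrency)
    return float(weighted[-1][0])


# Three overlapping calls, possibly from different examples.
segments = concurrency_segments([(0.0, 4.0), (1.0, 3.0), (2.0, 6.0)])
p50 = weighted_percentile(segments, 50)
p90 = weighted_percentile(segments, 90)
```

Here 3 of the 6 time units are spent at concurrency 1, 2 units at concurrency 2, and 1 unit at concurrency 3, so `p50` is 1.0 and `p90` is 3.0.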

find_midpoint_concurrency(
node: aiq.profiler.inference_optimization.data_models.CallNode,
segments: list[tuple[float, float, int]],
) → float#

Approximate concurrency for a node by finding the concurrency in timeline_segments at the node’s midpoint (or start if zero-length).
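A minimal sketch of the midpoint lookup, assuming segments are half-open `[start, end)` and that a probe time outside every segment yields 0.0. The helper name `midpoint_concurrency` and its `(start, end)` arguments are hypothetical; the documented function takes a `CallNode` instead.

```python
def midpoint_concurrency(
    start: float,
    end: float,
    segments: list[tuple[float, float, int]],
) -> float:
    """Return the concurrency of the segment containing the call's midpoint."""
    # Zero-length (or inverted) calls are probed at their start time.
    probe = start if end <= start else (start + end) / 2.0
    for seg_start, seg_end, concurrency in segments:
        if seg_start <= probe < seg_end:
            return float(concurrency)
    return 0.0  # assumption: no covering segment means no measured concurrency


# Segments for three calls spanning (0, 4), (1, 3), and (2, 6).
segments = [(0.0, 1.0, 1), (1.0, 2.0, 2), (2.0, 3.0, 3), (3.0, 4.0, 2), (4.0, 6.0, 1)]

mid_a = midpoint_concurrency(0.0, 4.0, segments)      # midpoint 2.0
mid_c = midpoint_concurrency(2.0, 6.0, segments)      # midpoint 4.0
zero_len = midpoint_concurrency(1.0, 1.0, segments)   # probed at start, 1.0
outside = midpoint_concurrency(10.0, 12.0, segments)  # no covering segment
```

The linear scan keeps the sketch short; because segments are sorted by time, a binary search (for example with the standard `bisect` module) would scale better for long timelines.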

save_gantt_chart(
all_nodes: list[aiq.profiler.inference_optimization.data_models.CallNode],
output_path: str,
) → None#

Save a Gantt chart as a PNG, color-coded by operation_type. Each node is displayed as a horizontal bar from start_time to end_time. The y-axis is the node index (sorted by start_time).

analyze_calls_and_build_result(
roots: list[aiq.profiler.inference_optimization.data_models.CallNode],
output_dir: str | None = None,
) → aiq.profiler.inference_optimization.data_models.NestedCallProfilingResult#
  1. Compute concurrency distribution (p50, p90, p95, p99) across ALL calls in all examples.

  2. For each node, compute self_time, subtree_time, concurrency at midpoint, bottleneck_score.

  3. Identify top 5 bottlenecks (by subtree_time).

  4. Build a textual report.

  5. Optionally save a Gantt chart to ‘output_dir’.

Returns NestedCallProfilingResult.
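Steps 2 and 3 can be illustrated with a small tree. This page does not define `self_time` or `subtree_time`, so the definitions below are assumptions: `self_time` is a node's duration minus the merged (union) time covered by its children, and `subtree_time` is `self_time` plus the children's `subtree_time`. The `Node` class and helper names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for CallNode."""
    name: str
    start_time: float
    end_time: float
    children: list["Node"] = field(default_factory=list)


def self_time(node: Node) -> float:
    """Duration minus the union of child intervals (overlaps are merged, not double-counted)."""
    covered = 0.0
    cur_start = cur_end = None
    for child in sorted(node.children, key=lambda c: c.start_time):
        if cur_end is None or child.start_time > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = child.start_time, child.end_time
        else:
            cur_end = max(cur_end, child.end_time)
    if cur_end is not None:
        covered += cur_end - cur_start
    return max(0.0, (node.end_time - node.start_time) - covered)


def subtree_time(node: Node) -> float:
    """Assumed definition: own self_time plus every child's subtree_time."""
    return self_time(node) + sum(subtree_time(c) for c in node.children)


def walk(nodes: list[Node]):
    """Yield every node in the forest, parents before children."""
    for n in nodes:
        yield n
        yield from walk(n.children)


def top_bottlenecks(roots: list[Node], k: int = 5) -> list[tuple[float, str]]:
    """Rank all nodes by bottleneck_score, which here equals subtree_time."""
    scored = [(subtree_time(n), n.name) for n in walk(roots)]
    return sorted(scored, key=lambda item: item[0], reverse=True)[:k]


inner = Node("inner", 5.0, 7.0)
tool = Node("tool", 4.0, 8.0, [inner])
llm = Node("llm", 1.0, 5.0)
workflow = Node("workflow", 0.0, 10.0, [llm, tool])

top = top_bottlenecks([workflow])
```

Under these assumed definitions, `workflow` has a `self_time` of 3.0 (its children jointly cover 1.0 to 8.0) and a `subtree_time` of 11.0, which exceeds its 10.0 duration because `llm` and `tool` overlap.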

multi_example_call_profiling(
all_steps: list[list[aiq.data_models.intermediate_step.IntermediateStep]],
output_dir: str | None = None,
) → aiq.profiler.inference_optimization.data_models.NestedCallProfilingResult#

The high-level function:

  1. Build a forest of calls by grouping by example_number (so no cross-example nesting).

  2. Analyze concurrency across all calls in all examples.

  3. Return a NestedCallProfilingResult with concurrency distribution, node metrics, top bottlenecks, and textual report. Optionally saves a Gantt chart.

Parameters:
  • all_steps – Intermediate steps for each example.

  • output_dir – Directory in which to save gantt_chart.png, if provided.

Returns:

NestedCallProfilingResult (pydantic)