nat.profiler.calc.calc_runner#

Attributes#

Classes#

LinearFitAnalyzer

Handles linear regression analysis for concurrency vs. time metrics.

CalcRunner

Calculator for GPU sizing based on concurrency vs. time metrics.

Module Contents#

logger#
class LinearFitAnalyzer(fit_config: nat.profiler.calc.data_models.FitConfig)#

Handles linear regression analysis for concurrency vs. time metrics.

fit_config#
llm_latency_fit: nat.profiler.calc.calculations.LinearFitResult | None = None#
wf_runtime_fit: nat.profiler.calc.calculations.LinearFitResult | None = None#
analyze_metrics(
sizing_metrics_per_concurrency: dict[int, nat.profiler.calc.data_models.SizingMetrics],
) dict[int, nat.profiler.calc.data_models.CalcAlerts]#

Analyze metrics and return alerts including outlier information.

Returns:

dict[int, CalcAlerts]: Alerts per concurrency including outlier flags
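The page does not show how outlier flags are computed. As a minimal illustrative sketch (not the library's actual algorithm), one common approach is to fit a least-squares line of time vs. concurrency and flag points whose residual exceeds a multiple of the residual standard deviation; all names and the threshold below are assumptions:

```python
# Hypothetical sketch of residual-based outlier flagging on
# concurrency vs. latency data. The threshold (n_sigma) and the plain
# dict in/out shapes are illustrative assumptions, not the real
# FitConfig / SizingMetrics / CalcAlerts models.
from statistics import mean, stdev


def fit_line(xs, ys):
    """Ordinary least squares: returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def flag_outliers(metrics, n_sigma=2.0):
    """metrics: {concurrency: latency_seconds}. Returns {concurrency: is_outlier}."""
    xs, ys = zip(*sorted(metrics.items()))
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    sigma = stdev(residuals)  # spread of deviations from the fit line
    return {x: abs(r) > n_sigma * sigma for x, r in zip(xs, residuals)}


# Latencies grow linearly with concurrency except at concurrency 8.
# The 1.5-sigma threshold here is purely for illustration.
alerts = flag_outliers(
    {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0, 5: 5.0, 6: 6.0, 7: 7.0, 8: 30.0},
    n_sigma=1.5,
)
```

Note that a single extreme point also drags the fitted line toward itself and inflates sigma, which is one reason a production implementation may use a more robust scheme.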

class CalcRunner(config: nat.profiler.calc.data_models.CalcRunnerConfig)#

Calculator for GPU sizing based on concurrency vs. time metrics.

Initialize CalcRunner with a config file and a list of concurrencies.

config#
metrics_per_concurrency: dict[int, nat.profiler.calc.data_models.SizingMetrics]#
valid_concurrencies: list = []#
gpu_estimates_per_concurrency: dict[int, nat.profiler.calc.data_models.GPUEstimates]#
alerts_per_concurrency: dict[int, nat.profiler.calc.data_models.CalcAlerts]#
linear_analyzer#
validate_config() None#

Validate the configuration parameters. Raises ValueError if the configuration is invalid.

property target_llm_latency: float#
property target_wf_runtime: float#
property target_users: int#
property test_gpu_count: int#
property append_job: bool#
property output_dir: pathlib.Path#
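The properties above suggest the parameters that `validate_config()` checks. A hypothetical sketch of that kind of validation follows; the simplified config class, field semantics, and error messages are assumptions for illustration, not the real `CalcRunnerConfig` model:

```python
# Hypothetical sketch of the kind of checks validate_config() might
# perform. Field names mirror the documented properties; their
# semantics (e.g. 0 meaning "SLA disabled") are assumptions.
from dataclasses import dataclass


@dataclass
class CalcRunnerConfig:  # simplified stand-in, not the real data model
    target_llm_latency: float = 0.0  # seconds; 0 disables the latency SLA
    target_wf_runtime: float = 0.0   # seconds; 0 disables the runtime SLA
    target_users: int = 0
    test_gpu_count: int = 0


def validate_config(config: CalcRunnerConfig) -> None:
    """Raise ValueError when GPU estimation is requested with unusable parameters."""
    if config.target_llm_latency < 0 or config.target_wf_runtime < 0:
        raise ValueError("SLA targets must be non-negative")
    estimating = config.target_llm_latency > 0 or config.target_wf_runtime > 0
    if estimating and config.target_users <= 0:
        raise ValueError("target_users must be positive when an SLA target is set")
    if estimating and config.test_gpu_count <= 0:
        raise ValueError("test_gpu_count must be positive when an SLA target is set")
```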
_calc_gpu_estimates_based_on_slope(
sizing_metrics_per_concurrency: dict[int, nat.profiler.calc.data_models.SizingMetrics],
use_latency: bool,
use_runtime: bool,
) nat.profiler.calc.data_models.GPUEstimates#

Calculate GPU estimates based on the linear fit results.

_calc_gpu_estimates_per_concurrency(
sizing_metrics_per_concurrency: dict[int, nat.profiler.calc.data_models.SizingMetrics],
)#

Calculate per-concurrency GPU estimates and existing alerts.

_validate_gpu_estimation_parameters(
use_latency: bool,
use_runtime: bool,
) bool#

Validate parameters required for GPU estimation.

_validate_metrics_data(sizing_metrics_per_concurrency: dict) dict#

Validate and filter metrics data.

_calc_fit_and_gpu_estimate(
sizing_metrics_per_concurrency: dict[int, nat.profiler.calc.data_models.SizingMetrics],
) nat.profiler.calc.data_models.GPUEstimates#

Estimate GPU count to meet target latency and/or workflow runtime SLA for a given target user load.

Returns:

- GPU estimates based on the slope of the time vs. concurrency fit
- GPU estimates per concurrency (rough estimates)
- Alerts per concurrency (outliers, etc.)
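The arithmetic behind a slope-based estimate can be sketched as follows. This is an illustrative heuristic consistent with the description above, not necessarily the exact formula `CalcRunner` uses: it assumes latency grows linearly with concurrency at a fixed GPU count and that capacity scales linearly with GPUs.

```python
# Illustrative slope-based GPU sizing heuristic (an assumption, not the
# library's confirmed formula).
import math


def estimate_gpus(slope: float, intercept: float,
                  target_latency: float, target_users: int,
                  test_gpu_count: int) -> int:
    """slope/intercept: linear fit of latency (s) vs. concurrency,
    measured on a deployment with test_gpu_count GPUs. Returns the GPU
    count estimated to keep target_users under target_latency."""
    # Max concurrency the test deployment sustains within the SLA.
    supported = (target_latency - intercept) / slope
    if supported <= 0:
        raise ValueError("SLA unreachable even at near-zero concurrency")
    # Scale the test deployment linearly up to the target user load.
    return math.ceil(test_gpu_count * target_users / supported)


# On 2 GPUs, latency ~ 1.0 + 0.5 * concurrency; SLA of 6 s allows
# concurrency 10, so 100 users need ~10x the test deployment.
gpus = estimate_gpus(slope=0.5, intercept=1.0, target_latency=6.0,
                     target_users=100, test_gpu_count=2)
```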

generate_calc_runner_output() nat.profiler.calc.data_models.CalcRunnerOutput#

Build CalcRunnerOutput from sizing metrics per concurrency.

plot_concurrency_vs_time_metrics(output_dir: pathlib.Path)#

Plot concurrency vs. time metrics using pre-computed fits.

write_output(
output_dir: pathlib.Path,
calc_runner_output: nat.profiler.calc.data_models.CalcRunnerOutput,
)#

Write the output to the output directory.

run_offline() nat.profiler.calc.data_models.CalcRunnerOutput#

Run in offline mode:

1. Read previous jobs run in online mode and build sizing metrics per concurrency.
2. Calculate GPU estimates.
3. Write the output to the offline subdirectory.

async run_online() nat.profiler.calc.data_models.CalcRunnerOutput#

Run in online mode, creating a MultiEvaluationRunner with concurrency overrides:

1. Run the workflow.
2. Build sizing metrics per concurrency from the profiler results and usage stats.
3. Calculate GPU estimates.
4. Write the output to the online subdirectory.

async run() nat.profiler.calc.data_models.CalcRunnerOutput#

Online mode:

1. Run the workflow.
2. Collect profiler results and usage stats.
3. Calculate GPU estimates.
4. Write the output to the online subdirectory.

Offline mode:

1. Read previous jobs run in online mode, appending only unique concurrency values to metrics_per_concurrency.
2. Calculate GPU estimates.
3. Write the output to the offline subdirectory.
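The dispatch between the two modes can be sketched with a minimal stand-in class. The `offline_mode` flag and the dict result are assumptions for illustration; in the real library, `run()`, `run_online()`, and `run_offline()` all return a `CalcRunnerOutput`:

```python
# Minimal sketch of run() dispatching between online and offline modes.
# The `offline_mode` config flag and the dict result shape are
# illustrative assumptions, not the real CalcRunner API.
import asyncio


class CalcRunnerSketch:
    def __init__(self, offline_mode: bool):
        self.offline_mode = offline_mode

    def run_offline(self) -> dict:
        # Real code: read prior online jobs, merge unique concurrencies,
        # calculate GPU estimates, write to the offline subdirectory.
        return {"mode": "offline"}

    async def run_online(self) -> dict:
        # Real code: run the workflow with concurrency overrides, build
        # sizing metrics from profiler results and usage stats, estimate
        # GPUs, write to the online subdirectory.
        return {"mode": "online"}

    async def run(self) -> dict:
        # Offline mode is synchronous (no workflow execution needed).
        return self.run_offline() if self.offline_mode else await self.run_online()


result = asyncio.run(CalcRunnerSketch(offline_mode=True).run())
```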