nat.profiler.calc.calculations#

Attributes#

Functions#

compute_slope(...)

Compute the slope of the linear relationship between concurrency (x-axis) and a time metric such as runtime or latency (y-axis).

_remove_outliers(→ tuple[numpy.ndarray, numpy.ndarray, list[int]])

Remove outliers using the Interquartile Range (IQR) method.

calc_gpu_estimate_based_on_slope(→ float)

Calculate the GPU estimate based on the slope of the time metric.

calc_gpu_estimate_for_single_concurrency(...)

ROUGH ESTIMATE: Calculate GPU count estimate for a single concurrency level.

Module Contents#

logger#
compute_slope(
concurrencies: list[float],
time_metrics: list[float],
fit_config: nat.profiler.calc.data_models.FitConfig | None = None,
) nat.profiler.calc.data_models.LinearFitResult#

Concurrency is the independent variable (x-axis) and time metric (which can be runtime or latency) is the dependent variable (y-axis). This function computes the slope of the linear relationship between concurrency and time metric.

Args:

concurrencies: List of concurrency values (x-axis)

time_metrics: List of time metric values (y-axis)

fit_config: Configuration for outlier detection and fit validation

Returns:

LinearFitResult containing slope, intercept, R-squared, and outliers removed

Raises:

ValueError: If the relationship is not linear (R² < min_r_squared)
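The fit described above can be sketched with numpy. This is an illustrative re-implementation, not the library's code: `fit_slope` and its `min_r_squared` default are hypothetical names, and the real `compute_slope` also applies the outlier removal configured via `FitConfig` before fitting.

```python
import numpy as np


def fit_slope(concurrencies, time_metrics, min_r_squared=0.9):
    """Least-squares fit of time metric vs. concurrency (sketch only)."""
    x = np.asarray(concurrencies, dtype=float)
    y = np.asarray(time_metrics, dtype=float)

    # First-degree polynomial fit: y ~= slope * x + intercept
    slope, intercept = np.polyfit(x, y, 1)

    # R-squared: fraction of variance explained by the linear fit
    residuals = y - (slope * x + intercept)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r_squared = 1.0 - ss_res / ss_tot

    if r_squared < min_r_squared:
        raise ValueError(f"Relationship is not linear (R^2={r_squared:.3f})")
    return slope, intercept, r_squared


# Perfectly linear data: time = 0.5 * concurrency + 1.0
slope, intercept, r2 = fit_slope([1, 2, 4, 8, 16], [1.5, 2.0, 3.0, 5.0, 9.0])
```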

_remove_outliers(
x: numpy.ndarray,
y: numpy.ndarray,
fit_config: nat.profiler.calc.data_models.FitConfig,
) tuple[numpy.ndarray, numpy.ndarray, list[int]]#

Remove outliers using the Interquartile Range (IQR) method. For small concurrency range (≤ threshold points), also checks raw y-values for extreme outliers.

Args:

x: Input x values (concurrencies)

y: Input y values (time metrics)

fit_config: Configuration for outlier detection

Returns:

Tuple of (cleaned_x, cleaned_y, list_of_removed_concurrencies)
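The IQR method referenced above can be sketched as follows. This is a generic illustration under an assumed 1.5x fence multiplier; the library's private helper also special-cases small concurrency ranges by checking raw y-values, which is omitted here.

```python
import numpy as np


def remove_outliers_iqr(x, y, iqr_factor=1.5):
    """Drop (x, y) points whose y falls outside the IQR fences (sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Fences at Q1 - k*IQR and Q3 + k*IQR, the standard IQR rule
    q1, q3 = np.percentile(y, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - iqr_factor * iqr, q3 + iqr_factor * iqr

    keep = (y >= lower) & (y <= upper)
    removed = [int(i) for i in np.where(~keep)[0]]
    return x[keep], y[keep], removed


# The last point (y=100) is an extreme outlier and gets removed
clean_x, clean_y, removed = remove_outliers_iqr(
    [1, 2, 4, 8, 16], [1.0, 2.0, 4.0, 8.0, 100.0])
```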

calc_gpu_estimate_based_on_slope(
target_time_metric: float,
target_users: int,
test_gpu_count: int,
observed_slope: float,
observed_intercept: float = 0.0,
) float#

Calculate the GPU estimate based on the slope of the time metric.

This function uses the linear relationship between concurrency and time metrics to estimate the required GPU count for a target user load.

Args:

target_time_metric: Target time metric (latency or runtime) in seconds

target_users: Target number of concurrent users

test_gpu_count: Number of GPUs used in the test

observed_slope: Slope from linear regression of time vs concurrency

observed_intercept: Y-intercept from linear regression (default: 0.0)

Returns:

Estimated number of GPUs required

Raises:

ValueError: If target_time_metric is less than or equal to intercept
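One plausible reading of this estimate, sketched below under stated assumptions (this is not necessarily the library's exact formula): with `test_gpu_count` GPUs, time ≈ intercept + slope × concurrency, and adding GPUs shrinks the effective slope proportionally. Solving `target = intercept + slope * target_users * (test_gpu_count / G)` for `G` gives the expression in the code; the guard on the intercept mirrors the documented ValueError.

```python
def gpu_estimate_from_slope(target_time_metric, target_users,
                            test_gpu_count, observed_slope,
                            observed_intercept=0.0):
    """Invert the linear fit to estimate required GPUs (hypothetical model)."""
    # Time budget left after the fixed (intercept) cost
    headroom = target_time_metric - observed_intercept
    if headroom <= 0:
        raise ValueError("target_time_metric must exceed the intercept")
    return observed_slope * target_users * test_gpu_count / headroom


# e.g. slope 0.05 s/user, intercept 1 s, target 2 s for 200 users on a 4-GPU test
gpus = gpu_estimate_from_slope(2.0, 200, 4, 0.05, 1.0)
```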

calc_gpu_estimate_for_single_concurrency(
target_llm_latency: float,
target_workflow_runtime: float,
target_users: int,
test_concurrency: int,
test_gpu_count: int,
observed_latency: float,
observed_runtime: float,
) nat.profiler.calc.data_models.GPUEstimates#

ROUGH ESTIMATE: Calculate GPU count estimate for a single concurrency level.

This is a simplified estimate that assumes linear scaling and should be used as a baseline only. For more accurate estimates, use slope-based estimation with multiple concurrency levels.

Formula based on the target latency:

G_required = (U_target / C_test) * (L_obs / L_target) * G_test

Formula based on the target runtime:

G_required = (U_target / C_test) * (R_obs / R_target) * G_test

where:
  • U_target: Target number of users

  • C_test: Test concurrency level

  • L_obs: Observed LLM latency

  • L_target: Target LLM latency

  • R_obs: Observed workflow runtime

  • R_target: Target workflow runtime

  • G_test: Test GPU count

WARNING: This is a rough estimate that:
  • Assumes perfect linear scaling (rarely true in practice)

  • Doesn't account for GPU utilization inefficiencies

  • May underestimate GPU requirements for high concurrency

  • Should be validated against slope-based estimates
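The two formulas above translate directly to code. This sketch returns a plain tuple rather than the library's `GPUEstimates` model; parameter names mirror the documented signature, but the function itself is illustrative.

```python
def gpu_estimate_single_concurrency(target_llm_latency, target_workflow_runtime,
                                    target_users, test_concurrency,
                                    test_gpu_count, observed_latency,
                                    observed_runtime):
    """Apply the latency- and runtime-based formulas (sketch)."""
    # G_required = (U_target / C_test) * (L_obs / L_target) * G_test
    by_latency = (target_users / test_concurrency) \
        * (observed_latency / target_llm_latency) * test_gpu_count
    # G_required = (U_target / C_test) * (R_obs / R_target) * G_test
    by_runtime = (target_users / test_concurrency) \
        * (observed_runtime / target_workflow_runtime) * test_gpu_count
    return by_latency, by_runtime


# 100 target users, tested at concurrency 10 on 2 GPUs:
# latency must halve (2 s -> 1 s), runtime budget doubles (30 s -> 60 s)
by_latency, by_runtime = gpu_estimate_single_concurrency(
    target_llm_latency=1.0, target_workflow_runtime=60.0,
    target_users=100, test_concurrency=10, test_gpu_count=2,
    observed_latency=2.0, observed_runtime=30.0)
```

As the warning notes, the linear-scaling assumption baked into both formulas makes this a baseline to be checked against the slope-based estimate.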