nat.profiler.calc.calculations#

Attributes#

Functions#

compute_slope(...)

Compute the slope of the linear relationship between concurrency (x-axis) and a time metric such as runtime or latency (y-axis).

_remove_outliers(→ tuple[numpy.ndarray, numpy.ndarray, list[int]])

Remove outliers using the Interquartile Range (IQR) method.

calc_gpu_estimate_based_on_slope(→ float)

Calculate the GPU estimate based on the slope of the time metric.

calc_gpu_estimate_for_single_concurrency(...)

ROUGH ESTIMATE: Calculate GPU count estimate for a single concurrency level.

Module Contents#

logger#
compute_slope(
concurrencies: list[float],
time_metrics: list[float],
fit_config: nat.profiler.calc.data_models.FitConfig | None = None,
) nat.profiler.calc.data_models.LinearFitResult#

Concurrency is the independent variable (x-axis) and time metric (which can be runtime or latency) is the dependent variable (y-axis). This function computes the slope of the linear relationship between concurrency and time metric.

Args:

concurrencies: List of concurrency values (x-axis)

time_metrics: List of time metric values (y-axis)

fit_config: Configuration for outlier detection and fit validation

Returns:

LinearFitResult containing slope, intercept, R-squared, and outliers removed

Raises:

ValueError: If the relationship is not linear (R² < min_r_squared)
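The fit described above can be sketched with numpy. This is an illustrative re-implementation, not the library's code: `fit_slope` and its `min_r_squared` default are hypothetical names, and the real `compute_slope` also applies the outlier removal configured via `FitConfig` before fitting.

```python
import numpy as np


def fit_slope(concurrencies, time_metrics, min_r_squared=0.9):
    """Least-squares fit of time metric vs. concurrency (sketch only)."""
    x = np.asarray(concurrencies, dtype=float)
    y = np.asarray(time_metrics, dtype=float)

    # First-degree polynomial fit: y ~= slope * x + intercept
    slope, intercept = np.polyfit(x, y, 1)

    # R-squared: fraction of variance explained by the linear fit
    residuals = y - (slope * x + intercept)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r_squared = 1.0 - ss_res / ss_tot

    if r_squared < min_r_squared:
        raise ValueError(f"Relationship is not linear (R^2={r_squared:.3f})")
    return slope, intercept, r_squared


# Perfectly linear data: time = 0.5 * concurrency + 1.0
slope, intercept, r2 = fit_slope([1, 2, 4, 8, 16], [1.5, 2.0, 3.0, 5.0, 9.0])
```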

_remove_outliers(
x: numpy.ndarray,
y: numpy.ndarray,
fit_config: nat.profiler.calc.data_models.FitConfig,
) tuple[numpy.ndarray, numpy.ndarray, list[int]]#

Remove outliers using the Interquartile Range (IQR) method. For small concurrency range (≤ threshold points), also checks raw y-values for extreme outliers.

Args:

x: Input x values (concurrencies)

y: Input y values (time metrics)

fit_config: Configuration for outlier detection

Returns:

Tuple of (cleaned_x, cleaned_y, list_of_removed_concurrencies)
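The IQR method referenced above can be sketched as follows. This is a generic illustration under an assumed 1.5x fence multiplier; the library's private helper also special-cases small concurrency ranges by checking raw y-values, which is omitted here.

```python
import numpy as np


def remove_outliers_iqr(x, y, iqr_factor=1.5):
    """Drop (x, y) points whose y falls outside the IQR fences (sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Fences at Q1 - k*IQR and Q3 + k*IQR, the standard IQR rule
    q1, q3 = np.percentile(y, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - iqr_factor * iqr, q3 + iqr_factor * iqr

    keep = (y >= lower) & (y <= upper)
    removed = [int(i) for i in np.where(~keep)[0]]
    return x[keep], y[keep], removed


# The last point (y=100) is an extreme outlier and gets removed
clean_x, clean_y, removed = remove_outliers_iqr(
    [1, 2, 4, 8, 16], [1.0, 2.0, 4.0, 8.0, 100.0])
```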

calc_gpu_estimate_based_on_slope(
target_time_metric: float,
target_users: int,
test_gpu_count: int,
observed_slope: float,
observed_intercept: float = 0.0,
) float#

Calculate the GPU estimate based on the slope of the time metric.

This function uses the linear relationship between concurrency and time metrics to estimate the required GPU count for a target user load.

Args:

target_time_metric: Target time metric (latency or runtime) in seconds

target_users: Target number of concurrent users

test_gpu_count: Number of GPUs used in the test

observed_slope: Slope from linear regression of time vs concurrency

observed_intercept: Y-intercept from linear regression (default: 0.0)

Returns:

Estimated number of GPUs required

Raises:

ValueError: If target_time_metric is less than or equal to intercept
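One plausible reading of this estimate, sketched below under stated assumptions (this is not necessarily the library's exact formula): with `test_gpu_count` GPUs, time ≈ intercept + slope × concurrency, and adding GPUs shrinks the effective slope proportionally. Solving `target = intercept + slope * target_users * (test_gpu_count / G)` for `G` gives the expression in the code; the guard on the intercept mirrors the documented ValueError.

```python
def gpu_estimate_from_slope(target_time_metric, target_users,
                            test_gpu_count, observed_slope,
                            observed_intercept=0.0):
    """Invert the linear fit to estimate required GPUs (hypothetical model)."""
    # Time budget left after the fixed (intercept) cost
    headroom = target_time_metric - observed_intercept
    if headroom <= 0:
        raise ValueError("target_time_metric must exceed the intercept")
    return observed_slope * target_users * test_gpu_count / headroom


# e.g. slope 0.05 s/user, intercept 1 s, target 2 s for 200 users on a 4-GPU test
gpus = gpu_estimate_from_slope(2.0, 200, 4, 0.05, 1.0)
```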

calc_gpu_estimate_for_single_concurrency(
target_llm_latency: float,
target_workflow_runtime: float,
target_users: int,
test_concurrency: int,
test_gpu_count: int,
observed_latency: float,
observed_runtime: float,
) nat.profiler.calc.data_models.GPUEstimates#

ROUGH ESTIMATE: Calculate GPU count estimate for a single concurrency level.

This is a simplified estimate that assumes linear scaling and should be used as a baseline only. For more accurate estimates, use slope-based estimation with multiple concurrency levels.

Formula based on the target latency:

G_required = (U_target / C_test) * (L_obs / L_target) * G_test

Formula based on the target runtime:

G_required = (U_target / C_test) * (R_obs / R_target) * G_test

where:
  • U_target: Target number of users

  • C_test: Test concurrency level

  • L_obs: Observed LLM latency

  • L_target: Target LLM latency

  • R_obs: Observed workflow runtime

  • R_target: Target workflow runtime

  • G_test: Test GPU count

WARNING: This is a rough estimate that:
  • Assumes perfect linear scaling (rarely true in practice)

  • Doesn't account for GPU utilization inefficiencies

  • May underestimate GPU requirements for high concurrency

  • Should be validated against slope-based estimates
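The two formulas above translate directly to code. This sketch returns a plain tuple rather than the library's `GPUEstimates` model; parameter names mirror the documented signature, but the function itself is illustrative.

```python
def gpu_estimate_single_concurrency(target_llm_latency, target_workflow_runtime,
                                    target_users, test_concurrency,
                                    test_gpu_count, observed_latency,
                                    observed_runtime):
    """Apply the latency- and runtime-based formulas (sketch)."""
    # G_required = (U_target / C_test) * (L_obs / L_target) * G_test
    by_latency = (target_users / test_concurrency) \
        * (observed_latency / target_llm_latency) * test_gpu_count
    # G_required = (U_target / C_test) * (R_obs / R_target) * G_test
    by_runtime = (target_users / test_concurrency) \
        * (observed_runtime / target_workflow_runtime) * test_gpu_count
    return by_latency, by_runtime


# 100 target users, tested at concurrency 10 on 2 GPUs:
# latency must halve (2 s -> 1 s), runtime budget doubles (30 s -> 60 s)
by_latency, by_runtime = gpu_estimate_single_concurrency(
    target_llm_latency=1.0, target_workflow_runtime=60.0,
    target_users=100, test_concurrency=10, test_gpu_count=2,
    observed_latency=2.0, observed_runtime=30.0)
```

As the warning notes, the linear-scaling assumption baked into both formulas makes this a baseline to be checked against the slope-based estimate.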