nat.profiler.calc.calculations#
Attributes#

| Attribute | Description |
| --- | --- |
| logger | |

Functions#

| Function | Description |
| --- | --- |
| compute_slope | Concurrency is the independent variable (x-axis) and the time metric (runtime or latency) is the dependent variable (y-axis). |
| _remove_outliers | Remove outliers using the Interquartile Range (IQR) method. |
| calc_gpu_estimate_based_on_slope | Calculate the GPU estimate based on the slope of the time metric. |
| calc_gpu_estimate_for_single_concurrency | ROUGH ESTIMATE: Calculate GPU count estimate for a single concurrency level. |
Module Contents#
- logger#
- compute_slope(
- concurrencies: list[float],
- time_metrics: list[float],
- fit_config: nat.profiler.calc.data_models.FitConfig | None = None,
- )
Concurrency is the independent variable (x-axis) and the time metric (runtime or latency) is the dependent variable (y-axis). This function computes the slope of the linear relationship between concurrency and the time metric; a usage sketch follows this entry.
- Args:
concurrencies: List of concurrency values (x-axis)
time_metrics: List of time metric values (y-axis)
fit_config: Configuration for outlier detection and fit validation
- Returns:
LinearFitResult containing slope, intercept, R-squared, and outliers removed
- Raises:
ValueError: If the relationship is not linear (R² < min_r_squared)
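A minimal usage sketch for compute_slope, assuming the import path shown on this page and that the returned LinearFitResult exposes slope, intercept, and r_squared fields (attribute names are inferred from the Returns section above, not confirmed here):

```python
from nat.profiler.calc.calculations import compute_slope

# Profiling measurements: concurrency levels (x) and the time metric (y), in seconds.
concurrencies = [1.0, 2.0, 4.0, 8.0, 16.0]
time_metrics = [0.9, 1.5, 2.8, 5.4, 10.7]

# fit_config is optional; None falls back to the default outlier/fit settings.
fit = compute_slope(concurrencies, time_metrics, fit_config=None)

# Attribute names below are assumptions based on the documented return value.
print(fit.slope, fit.intercept, fit.r_squared)
```

Per the Raises section, a non-linear relationship (R² below the configured minimum) raises ValueError, so callers may want to wrap the call in a try/except.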
- _remove_outliers(
- x: numpy.ndarray,
- y: numpy.ndarray,
- fit_config: nat.profiler.calc.data_models.FitConfig,
Remove outliers using the Interquartile Range (IQR) method. For small concurrency ranges (at or below the configured threshold number of points), raw y-values are also checked for extreme outliers; an illustrative sketch of the IQR rule follows this entry.
- Args:
x: Input x values (concurrencies)
y: Input y values (time metrics)
fit_config: Configuration for outlier detection
- Returns:
Tuple of (cleaned_x, cleaned_y, list_of_removed_concurrencies)
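Since _remove_outliers is a private helper, the sketch below illustrates the standard IQR rule it describes rather than calling it directly; the 1.5 multiplier is the conventional default and an assumption here (the actual thresholds come from FitConfig), and the additional raw y-value check for small concurrency ranges is omitted:

```python
import numpy as np

def iqr_filter(x: np.ndarray, y: np.ndarray, k: float = 1.5):
    """Drop (x, y) pairs whose y value falls outside the IQR fences."""
    q1, q3 = np.percentile(y, [25, 75])
    iqr = q3 - q1
    keep = (y >= q1 - k * iqr) & (y <= q3 + k * iqr)
    # Mirror the documented return shape: cleaned x, cleaned y, removed concurrencies.
    return x[keep], y[keep], x[~keep].tolist()

x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
y = np.array([0.9, 1.5, 2.8, 60.0, 10.7])   # 60.0 is an injected outlier
cleaned_x, cleaned_y, removed = iqr_filter(x, y)
print(removed)   # [8.0]
```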
- calc_gpu_estimate_based_on_slope(
- target_time_metric: float,
- target_users: int,
- test_gpu_count: int,
- observed_slope: float,
- observed_intercept: float = 0.0,
- )
Calculate the GPU estimate based on the slope of the time metric.
This function uses the linear relationship between concurrency and the time metric to estimate the GPU count required for a target user load; a usage sketch follows this entry.
- Args:
target_time_metric: Target time metric (latency or runtime) in seconds
target_users: Target number of concurrent users
test_gpu_count: Number of GPUs used in the test
observed_slope: Slope from linear regression of time vs. concurrency
observed_intercept: Y-intercept from linear regression (default: 0.0)
- Returns:
Estimated number of GPUs required
- Raises:
ValueError: If target_time_metric is less than or equal to intercept
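A hedged usage sketch for calc_gpu_estimate_based_on_slope; the numeric values are made up, and in practice observed_slope and observed_intercept would come from the LinearFitResult returned by compute_slope:

```python
from nat.profiler.calc.calculations import calc_gpu_estimate_based_on_slope

gpus_needed = calc_gpu_estimate_based_on_slope(
    target_time_metric=2.0,   # target latency/runtime in seconds; must exceed the intercept
    target_users=200,         # target concurrent users
    test_gpu_count=8,         # GPUs used during the profiling run
    observed_slope=0.05,      # seconds added per unit of concurrency
    observed_intercept=0.4,   # fitted time at zero concurrency
)
print(gpus_needed)
```

Per the Raises section, target_time_metric must be strictly greater than observed_intercept; otherwise no GPU count can reach the target and a ValueError is raised.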
- calc_gpu_estimate_for_single_concurrency(
- target_llm_latency: float,
- target_workflow_runtime: float,
- target_users: int,
- test_concurrency: int,
- test_gpu_count: int,
- observed_latency: float,
- observed_runtime: float,
- )
ROUGH ESTIMATE: Calculate GPU count estimate for a single concurrency level.
This is a simplified estimate that assumes linear scaling and should be used as a baseline only. For more accurate estimates, use slope-based estimation with multiple concurrency levels.
- Formula based on the target latency:
G_required = (U_target / C_test) * (L_obs / L_target) * G_test
- Formula based on the target runtime:
G_required = (U_target / C_test) * (R_obs / R_target) * G_test
- where:
U_target: Target number of users
C_test: Test concurrency level
L_obs: Observed LLM latency
L_target: Target LLM latency
R_obs: Observed workflow runtime
R_target: Target workflow runtime
G_test: Test GPU count
WARNING: This is a rough estimate that:
- Assumes perfect linear scaling (rarely true in practice)
- Does not account for GPU utilization inefficiencies
- May underestimate GPU requirements for high concurrency
- Should be validated against slope-based estimates
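To make the formulas above concrete, here is a hedged worked example; the values are made up, and how the function reconciles the latency-based and runtime-based estimates (for example, by taking the larger of the two) is not documented on this page:

```python
from nat.profiler.calc.calculations import calc_gpu_estimate_for_single_concurrency

# By hand, latency-based:  (U_target / C_test) * (L_obs / L_target) * G_test
#                        = (100 / 10) * (3.0 / 2.0)   * 4 = 60 GPUs
# By hand, runtime-based:  (U_target / C_test) * (R_obs / R_target) * G_test
#                        = (100 / 10) * (45.0 / 30.0) * 4 = 60 GPUs
estimate = calc_gpu_estimate_for_single_concurrency(
    target_llm_latency=2.0,
    target_workflow_runtime=30.0,
    target_users=100,
    test_concurrency=10,
    test_gpu_count=4,
    observed_latency=3.0,
    observed_runtime=45.0,
)
print(estimate)
```

As the warning notes, treat this as a baseline and cross-check it against the slope-based estimate when measurements at multiple concurrency levels are available.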