Goodput is the number of completed requests per second that meet specified metric constraints, also called service level objectives (SLOs).
For example, you might measure the user experience of your service by counting toward throughput only those requests whose time to first token (TTFT) is under 50 ms and whose inter-token latency (ITL) is under 10 ms.
AIPerf provides this value as goodput.
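A minimal sketch of the calculation, assuming each request is summarized by one time-to-first-token (TTFT) value and one inter-token latency (ITL) value in milliseconds, and that "under" means strictly less than. The `RequestRecord` type and the `goodput` function are hypothetical illustrations, not AIPerf's implementation or API.

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """Per-request timings in milliseconds (hypothetical record shape)."""
    ttft_ms: float   # time to first token
    itl_ms: float    # inter-token latency for this request
    ok: bool = True  # False if the request errored


def goodput(records: list[RequestRecord], duration_s: float,
            ttft_slo_ms: float = 50.0, itl_slo_ms: float = 10.0) -> float:
    """Completed requests per second that meet every SLO.

    Assumption: "under" is read as strictly less than the threshold.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    good = sum(
        1
        for r in records
        if r.ok and r.ttft_ms < ttft_slo_ms and r.itl_ms < itl_slo_ms
    )
    return good / duration_s


records = [
    RequestRecord(ttft_ms=40, itl_ms=8),   # meets both SLOs
    RequestRecord(ttft_ms=20, itl_ms=4),   # meets both SLOs
    RequestRecord(ttft_ms=60, itl_ms=5),   # TTFT too slow
    RequestRecord(ttft_ms=30, itl_ms=12),  # ITL too slow
]
print(goodput(records, duration_s=2.0))  # 1.0
```

Four requests complete in 2 s, so plain throughput is 2.0 requests/s, but only two of them meet both SLOs, which halves the goodput to 1.0 requests/s.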
Below you can find a tutorial on how to benchmark a model using goodput.
Example output:
good_request_fraction
The goodput metric above answers “how many SLO-compliant requests per second?”, but it does not by itself tell you whether your run met an attainment target such as “95% of requests under SLO.” For that, AIPerf exposes a sibling derived metric, good_request_fraction, defined as the number of requests that meet every SLO divided by the number of attempted requests (request_count + error_request_count).
The fraction is in [0.0, 1.0]. Errors land in the denominator on purpose: a backend that sheds load under pressure should not score 1.0 just because the requests it did serve happened to stay under the latency budget. On clean runs with zero errors, error_request_count is not produced (it carries the ERROR_ONLY flag and is only emitted for invalid records), so the denominator reduces to request_count and the fraction is computed from valid requests alone. When no requests were attempted at all, the metric returns 0.0.
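The definition and its edge cases can be sketched in a few lines of Python. This is an illustration, not AIPerf's source: the `good_count` argument is a hypothetical name standing in for the number of valid requests that meet every SLO, and passing `None` for `error_request_count` models a clean run where that metric is not emitted.

```python
from typing import Optional


def good_request_fraction(good_count: int, request_count: int,
                          error_request_count: Optional[int] = None) -> float:
    """Fraction of attempted requests that met every SLO, in [0.0, 1.0].

    error_request_count is None on clean runs, mirroring the metric
    not being emitted when there are no invalid records.
    """
    errors = error_request_count or 0   # absent metric -> zero errors
    attempted = request_count + errors  # errored requests stay in the denominator
    if attempted == 0:
        return 0.0                      # nothing was attempted at all
    return good_count / attempted


print(good_request_fraction(95, 100))      # clean run: 0.95
print(good_request_fraction(95, 100, 25))  # 25 shed requests: 0.76
print(good_request_fraction(0, 0))         # no requests: 0.0
```

The second call shows why errors are counted: the 100 served requests look 95% compliant, but once the 25 shed requests are included, attainment drops to 76%.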
The same --goodput invocation that produces goodput also produces good_request_fraction — no extra flag is required:
good_request_fraction is hidden from the console table (NO_CONSOLE) but is written to profile_export_aiperf.json, the CSV, and the Parquet output, so you can read it from CI:
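A minimal CI gate sketch in Python. It assumes the JSON export stores good_request_fraction as a top-level key whose value carries an `avg` field (the `avg` stat is the one named in the search-recipe filter); the real schema may nest metrics differently, so check it against your own profile_export_aiperf.json. The function names are hypothetical.

```python
import json
from pathlib import Path


def read_good_request_fraction(export: dict) -> float:
    """Return the avg value of good_request_fraction.

    ASSUMPTION: the metric is a top-level key whose value is a dict with
    an "avg" field. Adjust this lookup to the actual export schema.
    """
    try:
        return float(export["good_request_fraction"]["avg"])
    except (KeyError, TypeError) as exc:
        raise KeyError("good_request_fraction not found; was --goodput set?") from exc


def meets_attainment(export_path: str, target: float) -> bool:
    """True if the run's good_request_fraction reaches the target, e.g. 0.95."""
    export = json.loads(Path(export_path).read_text())
    return read_good_request_fraction(export) >= target
```

A CI step could call `meets_attainment("profile_export_aiperf.json", 0.95)` and fail the job when it returns False.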
The max-goodput-under-slo search recipe consumes this metric directly: it attaches the SLA filter good_request_fraction:avg:ge:<--slo-attainment-fraction> so Bayesian optimization marks any concurrency level that misses the attainment target as infeasible while still maximizing the underlying goodput rate.
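The following sketch shows how such a filter can split trials into feasible and infeasible sets. It assumes the filter string has the form metric:stat:op:threshold and that comparators other than `ge` follow the same naming (`gt`, `le`, `lt` are assumptions); the per-trial result layout and the `goodput` key are hypothetical. It picks the best feasible trial from a fixed list instead of running Bayesian optimization, so it illustrates only the feasibility constraint, not the search strategy.

```python
import operator
from dataclasses import dataclass
from typing import Optional

# Comparator tokens: only "ge" appears in the recipe; the rest are assumed.
_OPS = {"ge": operator.ge, "gt": operator.gt, "le": operator.le, "lt": operator.lt}


@dataclass(frozen=True)
class SlaFilter:
    metric: str
    stat: str
    op: str
    threshold: float

    @classmethod
    def parse(cls, spec: str) -> "SlaFilter":
        """Parse 'metric:stat:op:threshold', e.g. 'good_request_fraction:avg:ge:0.95'."""
        metric, stat, op, threshold = spec.split(":")
        if op not in _OPS:
            raise ValueError(f"unknown comparator {op!r}")
        return cls(metric, stat, op, float(threshold))

    def passes(self, results: dict) -> bool:
        """True if this trial's metric statistic satisfies the comparison."""
        return _OPS[self.op](results[self.metric][self.stat], self.threshold)


def best_feasible(trials: dict, sla: SlaFilter) -> Optional[tuple[int, float]]:
    """Highest-goodput (concurrency, goodput) among trials passing the filter, else None."""
    feasible = [(c, r["goodput"]["avg"]) for c, r in trials.items() if sla.passes(r)]
    return max(feasible, key=lambda item: item[1], default=None)


# Hypothetical per-concurrency results.
trials = {
    8: {"goodput": {"avg": 40.0}, "good_request_fraction": {"avg": 0.99}},
    16: {"goodput": {"avg": 70.0}, "good_request_fraction": {"avg": 0.96}},
    32: {"goodput": {"avg": 85.0}, "good_request_fraction": {"avg": 0.81}},
}
sla = SlaFilter.parse("good_request_fraction:avg:ge:0.95")
print(best_feasible(trials, sla))  # (16, 70.0)
```

Concurrency 32 has the highest raw goodput but misses the 95% attainment target, so it is treated as infeasible and concurrency 16 wins.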