Benchmark Goodput with AIPerf
Context
Goodput is defined as the number of completed requests per second that meet specified metric constraints, also called service level objectives.
For example, to measure the user experience of your service, you might count toward throughput only those requests whose time to first token is under 50 ms and whose inter-token latency is under 10 ms.
AIPerf reports this value as goodput.
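To make the definition concrete, here is a minimal Python sketch of the calculation. It is an illustration only and does not reflect how AIPerf computes or stores its metrics: the `RequestRecord` class, the `goodput` function, and the sample measurements are all hypothetical. It assumes that a request counts as good only when every constraint is strictly met, matching the "under 50 ms" and "under 10 ms" wording above.

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """Hypothetical per-request measurements (not an AIPerf data structure)."""
    ttft_ms: float  # time to first token, in milliseconds
    itl_ms: float   # inter-token latency for the request, in milliseconds


def goodput(records, duration_s, max_ttft_ms=50.0, max_itl_ms=10.0):
    """Return completed requests per second that satisfy every constraint."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    # Assumption: a request is "good" only if it is strictly under each limit.
    good = sum(
        1
        for r in records
        if r.ttft_ms < max_ttft_ms and r.itl_ms < max_itl_ms
    )
    return good / duration_s


# Made-up sample data: four requests completed over a 2-second window.
records = [
    RequestRecord(ttft_ms=32.0, itl_ms=8.5),   # meets both constraints
    RequestRecord(ttft_ms=75.0, itl_ms=6.0),   # fails the TTFT constraint
    RequestRecord(ttft_ms=41.0, itl_ms=12.0),  # fails the ITL constraint
    RequestRecord(ttft_ms=20.0, itl_ms=9.0),   # meets both constraints
]
duration_s = 2.0

throughput = len(records) / duration_s
good = goodput(records, duration_s)
print(f"throughput: {throughput} req/s, goodput: {good} req/s")
```

In this made-up sample, only two of the four requests satisfy both constraints, so goodput is half of the raw request throughput.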
Tutorial
The following steps show how to benchmark a model with goodput constraints.
Setting Up the Server
Run AIPerf with Goodput Constraints
Example output: