Multi-URL Load Balancing
AIPerf supports distributing requests across multiple inference server instances for horizontal scaling. This is useful for:
Specify multiple --url options to enable load balancing:
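As a minimal sketch of what such an invocation might look like, the snippet below builds a command line with one `--url` option per server instance and prints it rather than running it. The two localhost addresses are hypothetical placeholders, the `aiperf profile` subcommand is assumed from typical AIPerf usage, and any other options a real run needs are deliberately left out because this section does not cover them.

```shell
# Hypothetical endpoints; replace with the addresses of your own server instances.
URLS="http://localhost:8000 http://localhost:8001"

# Build one --url option per endpoint.
URL_ARGS=""
for url in $URLS; do
  URL_ARGS="$URL_ARGS --url $url"
done

# Assemble the command (assumed subcommand: profile); other required options are omitted.
CMD="aiperf profile$URL_ARGS"
echo "$CMD"
```

Printing the command first makes it easy to check that every instance is listed before starting a benchmark run.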
Sample Output (Successful Run):
Currently supported strategies:
You can explicitly set the strategy with --url-strategy:
--connection-reuse-strategy applies per-URL