Performance
You can use the `genai-perf` tool to benchmark the performance of the Text Embedding NIM under simulated production load. `genai-perf` comes pre-installed in the Triton Inference Server SDK container.
To run a performance benchmark, first create a dataset of text examples that `genai-perf` can send as requests to the embedding service. The examples should be representative of the data you expect to receive in production. Format the dataset as a JSONL file in which each line contains a `{"text": ...}` object, as shown in the following example.
Example (`embeddings.jsonl`):
```json
{"text": "What was the first car ever driven?"}
{"text": "Who served as the 5th President of the United States of America?"}
{"text": "Is the Sydney Opera House located in Australia?"}
{"text": "In what state did they film Shrek 2?"}
```
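If you are starting from scratch, the following shell sketch writes the sample rows above into `datasets/embeddings.jsonl`, the directory that gets mounted into the container in the next step, and then does a rough check that every line begins with `{"text":`. It is only a prefix check, not a JSON validator; use a real JSON parser if you need strict validation.

```bash
#!/usr/bin/env bash
# Sketch: write the sample dataset into ./datasets so it can be bind-mounted
# into the SDK container, then run a rough per-line format check.
set -euo pipefail

mkdir -p datasets
cat > datasets/embeddings.jsonl <<'EOF'
{"text": "What was the first car ever driven?"}
{"text": "Who served as the 5th President of the United States of America?"}
{"text": "Is the Sydney Opera House located in Australia?"}
{"text": "In what state did they film Shrek 2?"}
EOF

# Count lines that do not start with {"text": (a prefix check, not a JSON parse).
# grep exits 1 when it selects nothing, so "|| true" keeps set -e from aborting.
bad=$(grep -vc '^{"text":' datasets/embeddings.jsonl || true)
if [ "$bad" -ne 0 ]; then
  echo "found $bad malformed line(s)" >&2
  exit 1
fi
echo "datasets/embeddings.jsonl: $(wc -l < datasets/embeddings.jsonl) examples"
```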
Run the Triton Inference Server SDK Docker container as follows, mounting the directory where you created your JSONL file (`datasets/` in this example) at `/datasets` inside the container.
```bash
export RELEASE="yy.mm" # e.g. export RELEASE="24.07"
docker run -it --rm \
  --gpus=all \
  --network="host" \
  --mount type=bind,source=${PWD}/datasets,target=/datasets \
  nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk
```
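Because the SDK container runs with `--network="host"`, `localhost:8000` inside it reaches a NIM whose port 8000 is published on the host. Before benchmarking, it can help to wait until the NIM reports that it is ready. The sketch below assumes the NIM exposes `GET /v1/health/ready` on that port and that `curl` is available; `wait_for_nim` is a hypothetical helper name, and the endpoint path and port are assumptions to check against your deployment.

```bash
#!/usr/bin/env bash
# Sketch: poll the NIM readiness endpoint before starting a benchmark.
# Assumptions: the NIM listens on localhost:8000 (matching --url in the
# genai-perf command) and exposes GET /v1/health/ready.
set -uo pipefail

wait_for_nim() {
  local base_url="${1:-http://localhost:8000}"
  local timeout_s="${2:-300}"
  local waited=0
  # -f makes HTTP errors (for example 503 while the model loads) return non-zero;
  # --max-time stops a single probe from hanging.
  until curl -sf --max-time 5 -o /dev/null "${base_url}/v1/health/ready"; do
    if [ "$waited" -ge "$timeout_s" ]; then
      echo "NIM at ${base_url} not ready after ${timeout_s}s" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "NIM at ${base_url} is ready"
}
```

Call `wait_for_nim` (optionally with a base URL and a timeout in seconds) and start `genai-perf` only after it returns 0.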
Inside the container, run the following command to start a performance benchmark with the `genai-perf` command-line tool.
```bash
genai-perf \
  -m nvidia/nv-embedqa-e5-v5 \
  --service-kind openai \
  --endpoint-type embeddings \
  --batch-size 2 \
  --input-file /datasets/embeddings.jsonl \
  --extra-inputs input_type:query \
  --extra-inputs truncate:END \
  --concurrency 5 \
  --url http://localhost:8000
```
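The result tables vary input type, batch size, and concurrency. A shell sketch that repeats the `genai-perf` command above over a similar grid might look like the following. It uses only the flags already shown, and the grid values are read off the tables rather than taken from a published benchmark script. `run`, `bench`, `sweep`, and `DRY_RUN` are hypothetical names introduced here; because `DRY_RUN` defaults to 1, the script only prints the commands. Set `DRY_RUN=0` to execute them.

```bash
#!/usr/bin/env bash
# Sketch: sweep genai-perf over a batch-size/concurrency grid similar to the
# result tables, reusing only flags from the command above.
# DRY_RUN defaults to 1 (print commands only); set DRY_RUN=0 to execute.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
MODEL="nvidia/nv-embedqa-e5-v5"
URL="http://localhost:8000"
INPUT_FILE="/datasets/embeddings.jsonl"

# Print or execute a command depending on DRY_RUN.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# One genai-perf run: input type (query|passage), texts per request, concurrency.
bench() {
  local input_type="$1" batch="$2" conc="$3"
  run genai-perf \
    -m "$MODEL" \
    --service-kind openai \
    --endpoint-type embeddings \
    --batch-size "$batch" \
    --input-file "$INPUT_FILE" \
    --extra-inputs "input_type:${input_type}" \
    --extra-inputs truncate:END \
    --concurrency "$conc" \
    --url "$URL"
}

sweep() {
  # Most passage rows in the tables use 64 texts per request at concurrency 1, 3, 5.
  for c in 1 3 5; do bench passage 64 "$c"; done
  # Query rows use 1 text per request at concurrency 1 through 15.
  for c in 1 3 5 7 9 11 13 15; do bench query 1 "$c"; done
}

sweep
```

The 20-, 300-, and 512-token input lengths in the tables come from the input data, not from any flag in this sketch, so point `INPUT_FILE` at a dataset whose texts have the lengths you want to measure.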
You can see the full set of command-line options for `genai-perf` in the Command Line Options section of the GenAI-Perf documentation.
All latency measurements are reported in milliseconds.

Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency (ms) | P50 Latency (ms) | P90 Latency (ms) | P95 Latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 64 | 1 | 99.8 | 100.9 | 107.9 | 108.6 | 639.0 |
passage | 300 | 64 | 3 | 143.8 | 143.3 | 156.6 | 159.0 | 1330.0 |
passage | 300 | 64 | 5 | 239.7 | 239.7 | 259.1 | 265.0 | 1331.0 |
passage | 512 | 64 | 1 | 114.6 | 114.4 | 115.9 | 117.0 | 556.5 |
passage | 512 | 64 | 3 | 170.2 | 169.9 | 171.2 | 171.8 | 1124.2 |
passage | 512 | 64 | 5 | 284.6 | 284.5 | 285.6 | 286.1 | 1121.4 |
query | 20 | 1 | 1 | 5.1 | 5.1 | 5.4 | 5.4 | 196.3 |
query | 20 | 1 | 3 | 6.0 | 5.5 | 7.4 | 7.6 | 498.5 |
query | 20 | 1 | 5 | 11.9 | 12.3 | 12.8 | 12.9 | 418.3 |
query | 20 | 1 | 7 | 16.5 | 17.2 | 18.0 | 18.1 | 422.0 |
query | 20 | 1 | 9 | 21.4 | 22.3 | 23.3 | 23.6 | 418.3 |
query | 20 | 1 | 11 | 26.0 | 26.0 | 28.4 | 28.6 | 421.3 |
query | 20 | 1 | 13 | 30.7 | 30.9 | 33.1 | 33.6 | 422.2 |
query | 20 | 1 | 15 | 37.3 | 37.9 | 39.1 | 39.3 | 401.4 |
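
One way to sanity-check rows like these: at a fixed `--concurrency`, `genai-perf` keeps C requests in flight, each carrying B inputs, so throughput should be roughly C × B divided by the average latency (a form of Little's law). This relationship is an approximation offered for reading the tables, not a figure reported by the benchmark; worked against two rows of the table above:

```latex
\text{Throughput} \approx \frac{C \cdot B}{\bar{L} / 1000}
\qquad (\text{inputs/s};\ \bar{L} = \text{average latency in ms})

\text{passage, 300 tokens, } B = 64,\ C = 1,\ \bar{L} = 99.8:\quad
  \frac{1 \cdot 64}{0.0998} \approx 641.3 \quad (\text{reported: } 639.0)

\text{query, 20 tokens, } B = 1,\ C = 5,\ \bar{L} = 11.9:\quad
  \frac{5 \cdot 1}{0.0119} \approx 420.2 \quad (\text{reported: } 418.3)
```

Read the other way, once throughput stops growing with concurrency (for example, 1330.0 and 1331.0 inputs/s at concurrency 3 and 5 for 300-token passages), extra concurrency mostly adds queueing delay: average latency rises from 143.8 ms to 239.7 ms.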

Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency (ms) | P50 Latency (ms) | P90 Latency (ms) | P95 Latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 64 | 1 | 2554.3 | 2563.9 | 2678.1 | 2698.3 | 25.0 |
passage | 300 | 64 | 3 | 7349.2 | 7502.1 | 7889.3 | 7968.1 | 25.5 |
passage | 300 | 64 | 5 | 11913.2 | 12461.9 | 12893.4 | 12969.4 | 25.6 |
passage | 512 | 64 | 1 | 3701.9 | 3701.6 | 3703.1 | 3703.4 | 17.3 |
passage | 512 | 64 | 3 | 10730.2 | 10985.2 | 10987.0 | 11029.2 | 17.5 |
passage | 512 | 64 | 5 | 17355.4 | 14691.3 | 22035.4 | 22035.7 | 17.4 |
query | 20 | 1 | 1 | 32.4 | 32.4 | 32.7 | 32.8 | 30.7 |
query | 20 | 1 | 3 | 82.5 | 85.6 | 85.9 | 86.0 | 36.3 |
query | 20 | 1 | 5 | 135.5 | 142.9 | 143.3 | 143.3 | 36.8 |
query | 20 | 1 | 7 | 191.7 | 200.2 | 200.5 | 200.6 | 36.5 |
query | 20 | 1 | 9 | 246.9 | 257.4 | 257.8 | 257.9 | 36.4 |
query | 20 | 1 | 11 | 301.7 | 314.6 | 315.1 | 315.2 | 36.4 |
query | 20 | 1 | 13 | 356.6 | 371.6 | 372.2 | 372.4 | 36.4 |
query | 20 | 1 | 15 | 409.5 | 401.4 | 429.8 | 429.9 | 36.5 |

Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency (ms) | P50 Latency (ms) | P90 Latency (ms) | P95 Latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 64 | 1 | 176.5 | 177.1 | 188.6 | 190.4 | 362.2 |
passage | 300 | 64 | 3 | 336.1 | 337.0 | 359.1 | 365.8 | 570.2 |
passage | 300 | 64 | 5 | 560.2 | 562.9 | 592.3 | 634.6 | 569.8 |
passage | 512 | 64 | 1 | 205.3 | 204.7 | 208.2 | 210.8 | 311.4 |
passage | 512 | 64 | 3 | 410.9 | 411.1 | 412.5 | 412.7 | 466.4 |
passage | 512 | 64 | 5 | 681.5 | 682.0 | 683.6 | 684.1 | 468.7 |
query | 20 | 1 | 1 | 5.3 | 5.3 | 5.6 | 5.7 | 186.3 |
query | 20 | 1 | 3 | 7.4 | 7.4 | 7.5 | 7.7 | 403.8 |
query | 20 | 1 | 5 | 11.9 | 12.4 | 12.6 | 12.8 | 419.2 |
query | 20 | 1 | 7 | 16.6 | 17.3 | 17.5 | 17.6 | 421.5 |
query | 20 | 1 | 9 | 21.2 | 22.1 | 22.5 | 22.6 | 423.9 |
query | 20 | 1 | 11 | 26.1 | 27.2 | 27.7 | 27.8 | 420.5 |
query | 20 | 1 | 13 | 30.8 | 31.2 | 32.6 | 32.7 | 422.3 |
query | 20 | 1 | 15 | 36.4 | 37.3 | 37.9 | 38.0 | 411.7 |

Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency (ms) | P50 Latency (ms) | P90 Latency (ms) | P95 Latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 64 | 1 | 188.4 | 191.7 | 197.7 | 198.8 | 338.7 |
passage | 300 | 64 | 3 | 371.7 | 372.7 | 393.8 | 471.0 | 515.3 |
passage | 300 | 64 | 5 | 619.5 | 621.7 | 648.5 | 728.8 | 515.1 |
passage | 512 | 64 | 1 | 222.7 | 222.3 | 226.0 | 227.4 | 286.5 |
passage | 512 | 64 | 3 | 447.0 | 447.0 | 448.7 | 449.4 | 428.4 |
passage | 512 | 64 | 5 | 742.3 | 742.8 | 745.0 | 745.5 | 430.1 |
query | 20 | 1 | 1 | 6.6 | 6.6 | 7.0 | 7.1 | 149.3 |
query | 20 | 1 | 3 | 7.4 | 7.3 | 7.6 | 7.7 | 404.8 |
query | 20 | 1 | 5 | 11.8 | 12.2 | 12.5 | 12.6 | 421.5 |
query | 20 | 1 | 7 | 16.4 | 17.1 | 17.4 | 17.5 | 426.4 |
query | 20 | 1 | 9 | 20.9 | 21.9 | 22.3 | 22.4 | 429.9 |
query | 20 | 1 | 11 | 25.7 | 26.8 | 27.4 | 27.7 | 427.2 |
query | 20 | 1 | 13 | 30.4 | 31.5 | 32.1 | 32.2 | 427.4 |
query | 20 | 1 | 15 | 35.6 | 36.4 | 37.9 | 38.0 | 420.6 |

Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency (ms) | P50 Latency (ms) | P90 Latency (ms) | P95 Latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 64 | 1 | 377.9 | 376.4 | 396.5 | 404.5 | 169.1 |
passage | 300 | 64 | 3 | 929.9 | 932.3 | 972.0 | 979.2 | 205.8 |
passage | 300 | 64 | 5 | 1545.5 | 1555.2 | 1597.3 | 1610.3 | 205.9 |
passage | 512 | 64 | 1 | 469.5 | 468.6 | 473.9 | 475.1 | 136.2 |
passage | 512 | 64 | 3 | 1178.9 | 1182.4 | 1183.0 | 1183.2 | 162.2 |
passage | 512 | 64 | 5 | 1958.1 | 1970.9 | 1971.6 | 1971.8 | 162.2 |
query | 20 | 1 | 1 | 11.1 | 11.1 | 11.5 | 11.6 | 89.8 |
query | 20 | 1 | 3 | 19.3 | 20.3 | 20.8 | 21.0 | 154.9 |
query | 20 | 1 | 5 | 32.1 | 34.0 | 34.6 | 34.8 | 155.5 |
query | 20 | 1 | 7 | 44.8 | 47.4 | 48.1 | 48.2 | 156.0 |
query | 20 | 1 | 9 | 57.7 | 60.9 | 61.8 | 62.0 | 155.8 |
query | 20 | 1 | 11 | 70.6 | 74.0 | 75.5 | 75.7 | 155.5 |
query | 20 | 1 | 13 | 83.8 | 82.8 | 89.2 | 89.6 | 154.9 |
query | 20 | 1 | 15 | 97.5 | 96.6 | 103.1 | 103.4 | 153.7 |

Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency (ms) | P50 Latency (ms) | P90 Latency (ms) | P95 Latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 64 | 1 | 483.6 | 483.6 | 505.2 | 509.5 | 132.3 |
passage | 300 | 64 | 3 | 1328.0 | 1334.4 | 1367.9 | 1379.6 | 143.9 |
passage | 300 | 64 | 5 | 2181.8 | 2203.7 | 2241.9 | 2250.4 | 145.5 |
passage | 512 | 64 | 1 | 633.8 | 633.8 | 638.6 | 639.4 | 100.9 |
passage | 512 | 64 | 3 | 1744.9 | 1755.3 | 1761.5 | 1763.1 | 109.4 |
passage | 512 | 64 | 5 | 2892.2 | 2923.9 | 2934.8 | 2936.8 | 109.4 |
query | 20 | 1 | 1 | 8.0 | 8.0 | 8.3 | 8.3 | 124.1 |
query | 20 | 1 | 3 | 11.2 | 12.2 | 12.6 | 12.8 | 266.1 |
query | 20 | 1 | 5 | 19.9 | 20.6 | 21.1 | 21.2 | 250.3 |
query | 20 | 1 | 7 | 27.6 | 28.9 | 29.4 | 29.6 | 253.0 |
query | 20 | 1 | 9 | 35.1 | 36.7 | 37.3 | 37.5 | 256.1 |
query | 20 | 1 | 11 | 42.7 | 44.6 | 45.5 | 45.7 | 256.9 |
query | 20 | 1 | 13 | 50.7 | 50.3 | 54.0 | 54.2 | 255.9 |
query | 20 | 1 | 15 | 57.4 | 57.9 | 62.2 | 62.5 | 261.0 |

Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency (ms) | P50 Latency (ms) | P90 Latency (ms) | P95 Latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 64 | 1 | 1954.6 | 1957.3 | 2010.7 | 2029.4 | 32.7 |
passage | 300 | 64 | 3 | 5650.6 | 5734.4 | 5950.8 | 7649.8 | 33.3 |
passage | 300 | 64 | 5 | 9470.5 | 9790.5 | 9947.0 | 10511.1 | 32.7 |
passage | 512 | 64 | 1 | 3038.8 | 3045.9 | 3079.6 | 3080.5 | 21.0 |
passage | 512 | 64 | 3 | 8659.0 | 8835.5 | 8944.2 | 8960.5 | 21.8 |
passage | 512 | 64 | 5 | 14292.3 | 14782.0 | 14948.8 | 14986.1 | 21.6 |
query | 20 | 1 | 1 | 29.3 | 29.2 | 29.5 | 29.6 | 34.0 |
query | 20 | 1 | 3 | 71.2 | 73.1 | 73.3 | 73.4 | 42.0 |
query | 20 | 1 | 5 | 113.8 | 121.7 | 122.2 | 122.3 | 43.9 |
query | 20 | 1 | 7 | 159.3 | 170.2 | 171.0 | 171.1 | 43.9 |
query | 20 | 1 | 9 | 204.7 | 217.6 | 219.9 | 220.0 | 43.9 |
query | 20 | 1 | 11 | 253.3 | 266.7 | 268.8 | 268.9 | 43.4 |
query | 20 | 1 | 13 | 299.2 | 295.0 | 317.5 | 317.7 | 43.4 |
query | 20 | 1 | 15 | 346.4 | 342.2 | 366.3 | 366.4 | 43.2 |

Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency (ms) | P50 Latency (ms) | P90 Latency (ms) | P95 Latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 64 | 1 | 1322.4 | 1322.2 | 1362.2 | 1369.5 | 48.3 |
passage | 300 | 64 | 3 | 3670.5 | 3674.6 | 3798.4 | 3824.3 | 52.2 |
passage | 300 | 64 | 5 | 6188.9 | 6219.1 | 6368.4 | 6378.3 | 51.4 |
passage | 512 | 64 | 1 | 1990.0 | 1990.5 | 2013.8 | 2018.5 | 32.1 |
passage | 512 | 64 | 3 | 5586.0 | 5601.7 | 5683.0 | 5689.6 | 34.3 |
passage | 512 | 64 | 5 | 9358.7 | 9398.1 | 9525.5 | 9570.5 | 34.0 |
query | 20 | 1 | 1 | 21.5 | 21.5 | 21.8 | 21.8 | 46.3 |
query | 20 | 1 | 3 | 47.8 | 51.1 | 51.5 | 51.7 | 62.5 |
query | 20 | 1 | 5 | 82.1 | 85.3 | 85.8 | 85.9 | 60.8 |
query | 20 | 1 | 7 | 112.1 | 119.2 | 120.0 | 120.2 | 62.3 |
query | 20 | 1 | 9 | 143.5 | 151.5 | 154.2 | 154.4 | 62.6 |
query | 20 | 1 | 11 | 176.5 | 174.3 | 188.5 | 188.8 | 62.2 |
query | 20 | 1 | 13 | 208.2 | 205.8 | 222.2 | 222.4 | 62.3 |
query | 20 | 1 | 15 | 239.0 | 239.5 | 256.2 | 256.6 | 62.7 |

Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency (ms) | P50 Latency (ms) | P90 Latency (ms) | P95 Latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 64 | 1 | 693.1 | 702.0 | 719.3 | 721.3 | 92.0 |
passage | 300 | 64 | 3 | 1674.4 | 1687.0 | 1849.1 | 2192.7 | 114.0 |
passage | 300 | 64 | 5 | 2780.4 | 2797.6 | 3082.0 | 3389.9 | 113.8 |
passage | 512 | 64 | 1 | 930.9 | 931.5 | 935.9 | 936.9 | 68.6 |
passage | 512 | 64 | 3 | 2398.1 | 2395.9 | 2403.8 | 2407.5 | 79.6 |
passage | 512 | 64 | 5 | 4056.4 | 4079.8 | 4098.5 | 4315.1 | 78.5 |
query | 20 | 1 | 1 | 19.8 | 19.7 | 20.6 | 20.7 | 50.1 |
query | 20 | 1 | 3 | 42.3 | 44.0 | 45.2 | 45.5 | 70.8 |
query | 20 | 1 | 5 | 70.1 | 73.4 | 75.1 | 75.8 | 71.1 |
query | 20 | 1 | 7 | 97.7 | 102.6 | 104.5 | 104.9 | 71.6 |
query | 20 | 1 | 9 | 124.9 | 131.3 | 134.2 | 134.8 | 71.9 |
query | 20 | 1 | 11 | 151.7 | 149.8 | 163.6 | 164.3 | 72.4 |
query | 20 | 1 | 13 | 180.3 | 178.8 | 193.3 | 194.0 | 72.0 |
query | 20 | 1 | 15 | 208.4 | 207.8 | 222.5 | 223.4 | 71.9 |

Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency (ms) | P50 Latency (ms) | P90 Latency (ms) | P95 Latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 64 | 1 | 552.1 | 555.1 | 572.6 | 575.1 | 115.5 |
passage | 300 | 64 | 3 | 1228.5 | 1229.5 | 1325.3 | 1527.0 | 155.4 |
passage | 300 | 64 | 5 | 2045.3 | 2058.5 | 2153.8 | 2231.9 | 155.3 |
passage | 512 | 64 | 1 | 730.0 | 729.7 | 732.3 | 733.6 | 87.4 |
passage | 512 | 64 | 3 | 1775.8 | 1779.3 | 1784.0 | 1784.5 | 107.7 |
passage | 512 | 64 | 5 | 2945.8 | 2539.2 | 3431.0 | 3432.8 | 107.6 |
query | 20 | 1 | 1 | 14.6 | 14.6 | 15.2 | 15.4 | 67.9 |
query | 20 | 1 | 3 | 29.1 | 30.7 | 31.6 | 31.9 | 102.7 |
query | 20 | 1 | 5 | 48.7 | 51.4 | 52.6 | 52.9 | 102.3 |
query | 20 | 1 | 7 | 68.2 | 72.0 | 73.7 | 74.0 | 102.4 |
query | 20 | 1 | 9 | 86.7 | 90.2 | 94.0 | 94.6 | 103.7 |
query | 20 | 1 | 11 | 106.3 | 105.3 | 115.0 | 115.5 | 103.3 |
query | 20 | 1 | 13 | 125.3 | 125.0 | 134.9 | 135.8 | 103.6 |
query | 20 | 1 | 15 | 144.4 | 145.0 | 155.3 | 156.1 | 103.7 |

Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency (ms) | P50 Latency (ms) | P90 Latency (ms) | P95 Latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 16 | 1 | 1753.4 | 1745.5 | 1831.3 | 1850.1 | 9.1 |
passage | 300 | 16 | 3 | 5090.3 | 5173.0 | 5259.7 | 5416.0 | 9.3 |
passage | 300 | 16 | 5 | 8349.1 | 8605.0 | 8750.8 | 8830.5 | 9.3 |
passage | 300 | 64 | 1 | 7270.1 | 7290.9 | 7380.2 | 7390.7 | 8.8 |
passage | 300 | 64 | 3 | 20045.6 | 21451.2 | 21691.0 | 21695.3 | 8.9 |
passage | 300 | 64 | 5 | 31066.4 | 35673.0 | 36053.9 | 36088.0 | 8.9 |
query | 20 | 1 | 1 | 66.4 | 66.2 | 67.1 | 67.2 | 15.0 |
query | 20 | 1 | 3 | 168.6 | 179.1 | 180.8 | 181.5 | 17.8 |
query | 20 | 1 | 5 | 278.9 | 298.7 | 300.6 | 300.9 | 17.9 |
query | 20 | 1 | 7 | 388.5 | 417.5 | 419.9 | 420.6 | 18.0 |
query | 20 | 1 | 9 | 501.5 | 535.8 | 539.7 | 540.5 | 17.9 |
query | 20 | 1 | 11 | 616.4 | 603.0 | 659.1 | 659.8 | 17.8 |
query | 20 | 1 | 13 | 728.6 | 722.0 | 778.9 | 779.8 | 17.8 |
query | 20 | 1 | 15 | 838.3 | 840.9 | 897.7 | 898.6 | 17.8 |

Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency (ms) | P50 Latency (ms) | P90 Latency (ms) | P95 Latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 16 | 1 | 2927.9 | 2927.4 | 3059.5 | 3089.5 | 5.5 |
passage | 300 | 16 | 3 | 8379.6 | 8563.5 | 8739.8 | 9256.8 | 5.6 |
passage | 300 | 16 | 5 | 13629.2 | 14355.2 | 14642.3 | 14710.0 | 5.6 |
passage | 300 | 64 | 1 | 11385.6 | 11350.1 | 11646.0 | 11693.3 | 5.6 |
passage | 300 | 64 | 3 | 29783.1 | 33442.8 | 33609.1 | 33687.4 | 5.7 |
passage | 300 | 64 | 5 | 43320.7 | 55557.3 | 55833.8 | 55911.2 | 5.8 |
query | 20 | 1 | 1 | 39.8 | 39.7 | 40.2 | 40.4 | 25.1 |
query | 20 | 1 | 3 | 95.5 | 100.7 | 101.4 | 101.7 | 31.4 |
query | 20 | 1 | 5 | 157.1 | 167.9 | 168.8 | 169.0 | 31.8 |
query | 20 | 1 | 7 | 224.1 | 235.1 | 236.2 | 236.5 | 31.2 |
query | 20 | 1 | 9 | 284.4 | 302.1 | 303.6 | 303.9 | 31.6 |
query | 20 | 1 | 11 | 345.7 | 339.8 | 370.6 | 370.9 | 31.8 |
query | 20 | 1 | 13 | 410.6 | 406.0 | 437.9 | 438.2 | 31.6 |
query | 20 | 1 | 15 | 470.4 | 472.7 | 505.1 | 505.6 | 31.8 |

Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency (ms) | P50 Latency (ms) | P90 Latency (ms) | P95 Latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 64 | 1 | 1887.4 | 1890.5 | 1954.8 | 1970.8 | 33.9 |
passage | 300 | 64 | 3 | 5411.6 | 5587.9 | 5787.5 | 6121.0 | 34.9 |
passage | 300 | 64 | 5 | 8957.8 | 9469.9 | 11139.8 | 11486.4 | 34.6 |
passage | 512 | 64 | 1 | 2839.6 | 2851.3 | 2973.2 | 3008.1 | 22.5 |
passage | 512 | 64 | 3 | 8179.3 | 8529.3 | 8662.5 | 8678.7 | 23.0 |
passage | 512 | 64 | 5 | 13935.1 | 14520.8 | 14928.1 | 15156.9 | 22.5 |
query | 20 | 1 | 1 | 24.0 | 23.9 | 24.2 | 24.3 | 41.6 |
query | 20 | 1 | 3 | 51.3 | 54.4 | 55.3 | 55.5 | 58.4 |
query | 20 | 1 | 5 | 87.2 | 91.3 | 92.8 | 93.1 | 57.3 |
query | 20 | 1 | 7 | 120.8 | 126.9 | 129.5 | 129.8 | 57.9 |
query | 20 | 1 | 9 | 154.8 | 162.4 | 166.6 | 166.9 | 58.1 |
query | 20 | 1 | 11 | 187.7 | 185.9 | 203.5 | 203.8 | 58.5 |
query | 20 | 1 | 13 | 223.0 | 222.1 | 239.8 | 240.3 | 58.2 |
query | 20 | 1 | 15 | 256.2 | 258.2 | 276.6 | 277.3 | 58.5 |

Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency (ms) | P50 Latency (ms) | P90 Latency (ms) | P95 Latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 64 | 1 | 1618.5 | 1619.8 | 1666.8 | 1700.4 | 39.5 |
passage | 300 | 64 | 3 | 4245.1 | 4279.2 | 4465.5 | 5715.4 | 44.6 |
passage | 300 | 64 | 5 | 7009.0 | 7159.7 | 7465.8 | 8631.8 | 44.6 |
passage | 512 | 64 | 1 | 2316.9 | 2317.3 | 2321.5 | 2323.1 | 27.6 |
passage | 512 | 64 | 3 | 6328.9 | 6407.9 | 6414.3 | 6415.5 | 29.9 |
passage | 512 | 64 | 5 | 10559.8 | 10698.6 | 11012.5 | 11124.2 | 29.7 |
query | 20 | 1 | 1 | 22.5 | 22.5 | 22.8 | 22.9 | 44.4 |
query | 20 | 1 | 3 | 49.5 | 53.2 | 53.6 | 53.8 | 60.6 |
query | 20 | 1 | 5 | 81.2 | 88.5 | 89.1 | 89.2 | 61.6 |
query | 20 | 1 | 7 | 114.8 | 123.9 | 124.5 | 124.7 | 60.9 |
query | 20 | 1 | 9 | 147.6 | 145.4 | 160.0 | 160.1 | 60.9 |
query | 20 | 1 | 11 | 179.3 | 177.9 | 195.4 | 195.6 | 61.3 |
query | 20 | 1 | 13 | 212.8 | 213.6 | 231.3 | 231.5 | 61.0 |
query | 20 | 1 | 15 | 243.0 | 248.4 | 266.5 | 266.7 | 61.7 |

Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency (ms) | P50 Latency (ms) | P90 Latency (ms) | P95 Latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 64 | 1 | 489.9 | 488.4 | 519.3 | 521.3 | 130.6 |
passage | 300 | 64 | 3 | 1355.4 | 1354.3 | 1413.0 | 1423.2 | 141.0 |
passage | 300 | 64 | 5 | 2251.5 | 2271.3 | 2338.8 | 2345.4 | 140.9 |
passage | 512 | 64 | 1 | 641.5 | 640.7 | 647.5 | 648.6 | 99.7 |
passage | 512 | 64 | 3 | 1797.2 | 1807.8 | 1813.7 | 1814.9 | 106.2 |
passage | 512 | 64 | 5 | 2979.6 | 3014.9 | 3020.7 | 3021.9 | 106.2 |
query | 20 | 1 | 1 | 7.9 | 7.9 | 8.2 | 8.4 | 125.6 |
query | 20 | 1 | 3 | 11.9 | 12.3 | 12.6 | 12.7 | 251.4 |
query | 20 | 1 | 5 | 20.0 | 20.6 | 20.9 | 20.9 | 249.5 |
query | 20 | 1 | 7 | 27.7 | 28.9 | 29.4 | 29.5 | 251.5 |
query | 20 | 1 | 9 | 35.6 | 37.0 | 37.6 | 37.8 | 252.2 |
query | 20 | 1 | 11 | 43.6 | 45.3 | 45.9 | 46.1 | 251.9 |
query | 20 | 1 | 13 | 51.5 | 53.3 | 54.2 | 54.4 | 252.2 |
query | 20 | 1 | 15 | 59.6 | 59.3 | 63.0 | 63.3 | 251.0 |

Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency (ms) | P50 Latency (ms) | P90 Latency (ms) | P95 Latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 64 | 1 | 373.3 | 366.4 | 397.5 | 401.4 | 171.3 |
passage | 300 | 64 | 3 | 918.5 | 919.8 | 958.2 | 963.8 | 208.4 |
passage | 300 | 64 | 5 | 1527.0 | 1534.3 | 1589.6 | 1594.6 | 208.4 |
passage | 512 | 64 | 1 | 470.4 | 469.4 | 475.0 | 476.0 | 136.0 |
passage | 512 | 64 | 3 | 1180.7 | 1184.0 | 1184.5 | 1184.7 | 162.0 |
passage | 512 | 64 | 5 | 1960.7 | 1973.6 | 1974.0 | 1974.2 | 162.0 |
query | 20 | 1 | 1 | 10.7 | 10.7 | 11.0 | 11.0 | 93.2 |
query | 20 | 1 | 3 | 19.6 | 20.4 | 20.9 | 21.0 | 152.6 |
query | 20 | 1 | 5 | 32.2 | 34.2 | 34.9 | 35.3 | 154.9 |
query | 20 | 1 | 7 | 45.9 | 48.0 | 48.8 | 49.1 | 152.2 |
query | 20 | 1 | 9 | 59.5 | 62.0 | 63.0 | 63.2 | 151.1 |
query | 20 | 1 | 11 | 72.7 | 76.0 | 77.3 | 77.8 | 151.1 |
query | 20 | 1 | 13 | 85.7 | 89.0 | 90.7 | 90.9 | 151.6 |
query | 20 | 1 | 15 | 99.9 | 103.5 | 104.4 | 104.7 | 149.9 |

Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency (ms) | P50 Latency (ms) | P90 Latency (ms) | P95 Latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 64 | 1 | 189.4 | 187.6 | 203.0 | 204.9 | 337.5 |
passage | 300 | 64 | 3 | 371.2 | 372.3 | 396.3 | 404.4 | 516.2 |
passage | 300 | 64 | 5 | 621.1 | 622.6 | 655.5 | 677.5 | 513.7 |
passage | 512 | 64 | 1 | 228.0 | 227.0 | 233.5 | 234.5 | 280.4 |
passage | 512 | 64 | 3 | 462.9 | 467.3 | 559.3 | 570.4 | 414.1 |
passage | 512 | 64 | 5 | 840.0 | 807.1 | 1040.2 | 1089.9 | 379.9 |
query | 20 | 1 | 1 | 6.6 | 6.6 | 7.0 | 7.2 | 150.7 |
query | 20 | 1 | 3 | 7.4 | 7.4 | 7.5 | 7.6 | 399.7 |
query | 20 | 1 | 5 | 12.1 | 12.5 | 12.7 | 12.8 | 411.2 |
query | 20 | 1 | 7 | 16.8 | 17.4 | 17.8 | 17.9 | 413.2 |
query | 20 | 1 | 9 | 21.7 | 22.4 | 22.9 | 23.0 | 413.7 |
query | 20 | 1 | 11 | 26.3 | 27.3 | 27.6 | 27.7 | 417.1 |
query | 20 | 1 | 13 | 31.1 | 32.0 | 32.6 | 32.7 | 416.9 |
query | 20 | 1 | 15 | 36.4 | 37.3 | 37.7 | 37.8 | 411.0 |

Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency (ms) | P50 Latency (ms) | P90 Latency (ms) | P95 Latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 64 | 1 | 172.1 | 171.9 | 186.3 | 189.1 | 371.4 |
passage | 300 | 64 | 3 | 334.3 | 335.5 | 363.5 | 383.4 | 573.4 |
passage | 300 | 64 | 5 | 556.5 | 557.5 | 585.4 | 600.3 | 573.6 |
passage | 512 | 64 | 1 | 203.5 | 202.6 | 206.5 | 207.2 | 314.2 |
passage | 512 | 64 | 3 | 406.1 | 406.7 | 497.6 | 502.4 | 472.0 |
passage | 512 | 64 | 5 | 673.6 | 673.2 | 718.2 | 760.0 | 474.1 |
query | 20 | 1 | 1 | 5.3 | 5.2 | 5.6 | 5.7 | 188.6 |
query | 20 | 1 | 3 | 7.3 | 7.4 | 7.5 | 7.5 | 408.6 |
query | 20 | 1 | 5 | 11.9 | 12.3 | 12.5 | 12.5 | 417.7 |
query | 20 | 1 | 7 | 16.5 | 17.2 | 17.4 | 17.5 | 423.6 |
query | 20 | 1 | 9 | 21.2 | 22.1 | 22.3 | 22.4 | 424.4 |
query | 20 | 1 | 11 | 25.9 | 27.0 | 27.3 | 27.4 | 423.7 |
query | 20 | 1 | 13 | 30.8 | 31.9 | 32.4 | 32.5 | 421.9 |
query | 20 | 1 | 15 | 35.2 | 34.9 | 37.1 | 37.2 | 425.5 |

Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency (ms) | P50 Latency (ms) | P90 Latency (ms) | P95 Latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 64 | 1 | 2586.4 | 2617.0 | 2802.0 | 2803.0 | 24.7 |
passage | 300 | 64 | 3 | 7438.4 | 7622.4 | 7961.6 | 8037.8 | 25.2 |
passage | 300 | 64 | 5 | 12158.7 | 12724.5 | 13256.9 | 13319.0 | 25.1 |
passage | 512 | 64 | 1 | 3727.8 | 3727.5 | 3728.9 | 3729.3 | 17.2 |
passage | 512 | 64 | 3 | 10810.7 | 11063.3 | 11102.2 | 11154.1 | 17.3 |
passage | 512 | 64 | 5 | 17458.0 | 14878.2 | 22157.8 | 22183.8 | 17.3 |
query | 20 | 1 | 1 | 32.3 | 32.2 | 32.6 | 32.7 | 30.8 |
query | 20 | 1 | 3 | 81.1 | 85.5 | 85.9 | 86.0 | 36.9 |
query | 20 | 1 | 5 | 136.5 | 142.8 | 143.1 | 143.3 | 36.6 |
query | 20 | 1 | 7 | 189.4 | 199.9 | 200.4 | 200.5 | 36.9 |
query | 20 | 1 | 9 | 245.6 | 257.0 | 257.6 | 257.8 | 36.6 |
query | 20 | 1 | 11 | 297.5 | 313.4 | 314.5 | 314.7 | 36.9 |
query | 20 | 1 | 13 | 350.9 | 344.2 | 371.6 | 371.8 | 37.0 |
query | 20 | 1 | 15 | 409.1 | 427.2 | 429.0 | 429.3 | 36.6 |

Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency (ms) | P50 Latency (ms) | P90 Latency (ms) | P95 Latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 64 | 1 | 98.0 | 95.0 | 105.8 | 106.6 | 651.7 |
passage | 300 | 64 | 3 | 144.9 | 142.3 | 159.3 | 174.9 | 1320.4 |
passage | 300 | 64 | 5 | 243.3 | 238.8 | 283.4 | 298.1 | 1311.5 |
passage | 512 | 64 | 1 | 112.0 | 112.0 | 112.9 | 113.2 | 569.2 |
passage | 512 | 64 | 3 | 223.2 | 253.4 | 257.1 | 257.7 | 857.7 |
passage | 512 | 64 | 5 | 300.7 | 295.7 | 356.5 | 360.0 | 1061.4 |
query | 20 | 1 | 1 | 4.6 | 4.6 | 4.8 | 4.8 | 215.6 |
query | 20 | 1 | 3 | 7.0 | 7.2 | 7.5 | 7.8 | 426.4 |
query | 20 | 1 | 5 | 11.4 | 11.9 | 12.1 | 12.2 | 434.7 |
query | 20 | 1 | 7 | 16.0 | 16.7 | 16.9 | 17.0 | 434.7 |
query | 20 | 1 | 9 | 20.6 | 21.4 | 21.8 | 21.9 | 435.3 |
query | 20 | 1 | 11 | 25.2 | 26.2 | 26.7 | 26.9 | 435.8 |
query | 20 | 1 | 13 | 30.2 | 31.2 | 31.8 | 32.1 | 429.8 |
query | 20 | 1 | 15 | 34.9 | 35.8 | 36.4 | 36.6 | 429.0 |