Performance for NVIDIA NeMo Retriever Embedding NIM
To benchmark the performance of NVIDIA NeMo Retriever Embedding NIM, use the genai-perf tool, which is pre-installed in the Triton Inference Server SDK container.
The rest of this section uses genai-perf==0.0.11, the version packaged with the Triton Inference Server SDK 25.02.
To run a performance benchmark, first create a dataset of text examples that genai-perf can use when making requests to the embedding service. These examples should be representative of the data that you expect to receive in production. Format the dataset as a JSONL file in which each line contains a {"text": ...} object, as shown in the following example.
Example (embeddings.jsonl):
{"text": "What was the first car ever driven?"}
{"text": "Who served as the 5th President of the United States of America?"}
{"text": "Is the Sydney Opera House located in Australia?"}
{"text": "In what state did they film Shrek 2?"}
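A dataset in this format can also be generated with a short script. The following is a minimal sketch, not part of the NIM or genai-perf tooling: the function name `write_embedding_dataset` and the output path `datasets/embeddings.jsonl` are illustrative assumptions, and the sample texts are the ones from the example above.

```python
import json
from pathlib import Path


def write_embedding_dataset(texts, path):
    """Write one {"text": ...} JSON object per line and return the line count.

    Hypothetical helper for illustration; not part of genai-perf.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with path.open("w", encoding="utf-8") as handle:
        for text in texts:
            text = text.strip()
            if not text:
                continue  # skip blank entries so every line is a usable example
            handle.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    sample_texts = [
        "What was the first car ever driven?",
        "Who served as the 5th President of the United States of America?",
        "Is the Sydney Opera House located in Australia?",
        "In what state did they film Shrek 2?",
    ]
    written = write_embedding_dataset(sample_texts, "datasets/embeddings.jsonl")
    print(f"wrote {written} examples")
```

Using `json.dumps` rather than string formatting keeps quotes and non-ASCII characters in the source text correctly escaped.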
Use the following example to run the Triton Inference Server SDK Docker container. Mount the directory that contains your JSONL file, which appears as datasets/ in the following example.
export RELEASE="25.02"
docker run -it --rm \
--gpus=all \
--network="host" \
--mount type=bind,source=${PWD}/datasets,target=/datasets \
nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk
Execute the following command to run a performance benchmark using the
genai-perf command line tool.
genai-perf profile \
-m nvidia/llama-nemotron-embed-1b-v2 \
--service-kind openai \
--endpoint-type embeddings \
--batch-size 2 \
--input-file /datasets/embeddings.jsonl \
--extra-inputs input_type:query \
--extra-inputs truncate:END \
--concurrency 5 \
--url http://localhost:8000
You can see the full set of command line options for
genai-perf in the Command Line Options section of the GenAI-Perf documentation.
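Before starting a long benchmark run, it can help to confirm that the service answers a single request. The sketch below is an illustration under stated assumptions: it assumes the NIM exposes an OpenAI-compatible `/v1/embeddings` route on the same host and port used above, and that `input_type` and `truncate` are accepted as request body fields, mirroring the `--extra-inputs` values in the genai-perf command. The function names are hypothetical.

```python
import json
import urllib.request


def build_embedding_request(texts, model="nvidia/llama-nemotron-embed-1b-v2",
                            input_type="query", truncate="END"):
    """Build a request body that mirrors the genai-perf options shown above."""
    return {
        "model": model,
        "input": list(texts),
        "input_type": input_type,  # assumed body field, as in --extra-inputs
        "truncate": truncate,      # assumed body field, as in --extra-inputs
    }


def post_embeddings(payload, base_url="http://localhost:8000", timeout=30):
    """POST the payload to the assumed /v1/embeddings route and return parsed JSON."""
    request = urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `post_embeddings(build_embedding_request(["What was the first car ever driven?"]))` sends one query-type input to a service running locally; it raises an exception if the service is not reachable.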
Benchmarks
All latency measurements are reported in milliseconds.
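When reading the results, a rough consistency check relates the columns to each other: with a fixed number of concurrent requests, throughput is approximately concurrency × batch size ÷ average latency. The sketch below applies that approximation. It is an assumption used for sanity checking, not a description of how genai-perf computes its reported throughput, and the function name is hypothetical.

```python
def estimate_throughput(concurrency, batch_size, avg_latency_ms):
    """Approximate inputs per second for a closed-loop load test.

    Assumes every one of `concurrency` in-flight requests carries `batch_size`
    inputs and takes `avg_latency_ms` milliseconds on average.
    """
    if avg_latency_ms <= 0:
        raise ValueError("avg_latency_ms must be positive")
    return concurrency * batch_size / (avg_latency_ms / 1000.0)


if __name__ == "__main__":
    # Concurrency 1, batch size 1, average latency 7.8 ms
    print(round(estimate_throughput(1, 1, 7.8), 1))
```

For a row with concurrency 1, batch size 1, and an average latency of 7.8 ms, the estimate is about 128 inputs/s, close to the 126.0 inputs/s reported for that configuration. Larger gaps between the estimate and the reported value are expected at higher concurrency, where client-side overhead and queueing are not captured by this approximation.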

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency | P50 Latency | P90 Latency | P95 Latency | Throughput (inputs/s) |
|---|---|---|---|---|---|---|---|---|
| passage | 20 | 1 | 1 | 7.8 | 8.0 | 8.0 | 8.3 | 126.0 |
| passage | 20 | 1 | 3 | 8.3 | 8.0 | 9.0 | 8.8 | 348.0 |
| passage | 20 | 1 | 5 | 8.8 | 8.0 | 9.0 | 10.3 | 547.0 |
| passage | 20 | 1 | 7 | 10.8 | 11.0 | 12.0 | 11.9 | 585.0 |
| passage | 20 | 1 | 9 | 13.2 | 13.0 | 15.0 | 15.1 | 643.0 |
| passage | 20 | 1 | 11 | 15.5 | 16.0 | 18.0 | 18.4 | 652.0 |
| passage | 20 | 1 | 13 | 17.5 | 17.0 | 21.0 | 21.5 | 673.0 |
| passage | 20 | 1 | 15 | 20.6 | 21.0 | 24.0 | 24.0 | 662.0 |
| passage | 300 | 64 | 1 | 69.5 | 69.0 | 71.0 | 72.3 | 896.0 |
| passage | 300 | 64 | 3 | 103.5 | 104.0 | 107.0 | 108.3 | 1779.0 |
| passage | 300 | 64 | 5 | 172.3 | 173.0 | 178.0 | 179.9 | 1799.0 |
| passage | 512 | 64 | 1 | 103.6 | 101.0 | 116.0 | 120.1 | 606.0 |
| passage | 512 | 64 | 3 | 180.6 | 182.0 | 192.0 | 193.6 | 1040.0 |
| passage | 512 | 64 | 5 | 298.6 | 300.0 | 318.0 | 325.1 | 1041.0 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency | P50 Latency | P90 Latency | P95 Latency | Throughput (inputs/s) |
|---|---|---|---|---|---|---|---|---|
| passage | 20 | 1 | 1 | 8.0 | 8.0 | 9.0 | 8.9 | 122.0 |
| passage | 20 | 1 | 3 | 8.6 | 8.0 | 9.0 | 9.2 | 340.0 |
| passage | 20 | 1 | 5 | 8.8 | 9.0 | 9.0 | 9.2 | 548.0 |
| passage | 20 | 1 | 7 | 10.9 | 11.0 | 12.0 | 12.6 | 575.0 |
| passage | 20 | 1 | 9 | 13.3 | 14.0 | 15.0 | 15.2 | 638.0 |
| passage | 20 | 1 | 11 | 15.4 | 16.0 | 17.0 | 17.9 | 668.0 |
| passage | 20 | 1 | 13 | 17.6 | 17.0 | 21.0 | 21.3 | 688.0 |
| passage | 20 | 1 | 15 | 21.2 | 22.0 | 24.0 | 24.9 | 645.0 |
| passage | 300 | 64 | 1 | 70.9 | 71.0 | 73.0 | 74.3 | 880.0 |
| passage | 300 | 64 | 3 | 112.8 | 113.0 | 117.0 | 118.4 | 1652.0 |
| passage | 300 | 64 | 5 | 187.7 | 189.0 | 194.0 | 194.9 | 1653.0 |
| passage | 512 | 64 | 1 | 109.5 | 108.0 | 128.0 | 128.9 | 574.0 |
| passage | 512 | 64 | 3 | 202.5 | 203.0 | 216.0 | 219.4 | 919.0 |
| passage | 512 | 64 | 5 | 336.4 | 338.0 | 352.0 | 356.9 | 926.0 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency | P50 Latency | P90 Latency | P95 Latency | Throughput (inputs/s) |
|---|---|---|---|---|---|---|---|---|
| passage | 20 | 1 | 1 | 8.2 | 8.0 | 9.0 | 8.6 | 120.0 |
| passage | 20 | 1 | 3 | 8.5 | 8.0 | 9.0 | 9.2 | 345.0 |
| passage | 20 | 1 | 5 | 9.4 | 9.0 | 10.0 | 10.8 | 508.0 |
| passage | 20 | 1 | 7 | 11.4 | 11.0 | 13.0 | 13.4 | 576.0 |
| passage | 20 | 1 | 9 | 14.4 | 15.0 | 17.0 | 16.9 | 584.0 |
| passage | 20 | 1 | 11 | 16.6 | 16.0 | 20.0 | 20.2 | 602.0 |
| passage | 20 | 1 | 13 | 20.2 | 21.0 | 24.0 | 24.2 | 590.0 |
| passage | 20 | 1 | 15 | 22.6 | 23.0 | 26.0 | 26.7 | 600.0 |
| passage | 300 | 64 | 1 | 68.8 | 68.0 | 72.0 | 73.1 | 907.0 |
| passage | 300 | 64 | 3 | 119.3 | 120.0 | 125.0 | 126.4 | 1552.0 |
| passage | 300 | 64 | 5 | 199.8 | 202.0 | 208.0 | 210.3 | 1555.0 |
| passage | 512 | 64 | 1 | 112.7 | 112.0 | 125.0 | 131.9 | 559.0 |
| passage | 512 | 64 | 3 | 229.5 | 232.0 | 242.0 | 246.6 | 814.0 |
| passage | 512 | 64 | 5 | 383.5 | 391.0 | 401.0 | 407.0 | 813.0 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency | P50 Latency | P90 Latency | P95 Latency | Throughput (inputs/s) |
|---|---|---|---|---|---|---|---|---|
| passage | 20 | 1 | 1 | 8.2 | 8.0 | 8.0 | 8.7 | 120.0 |
| passage | 20 | 1 | 3 | 8.3 | 8.0 | 9.0 | 8.8 | 339.0 |
| passage | 20 | 1 | 5 | 8.7 | 9.0 | 9.0 | 9.3 | 535.0 |
| passage | 20 | 1 | 7 | 10.7 | 11.0 | 12.0 | 11.8 | 597.0 |
| passage | 20 | 1 | 9 | 13.2 | 13.0 | 15.0 | 15.2 | 642.0 |
| passage | 20 | 1 | 11 | 15.5 | 16.0 | 18.0 | 18.3 | 654.0 |
| passage | 20 | 1 | 13 | 18.4 | 19.0 | 21.0 | 21.4 | 650.0 |
| passage | 20 | 1 | 15 | 20.8 | 21.0 | 24.0 | 24.5 | 668.0 |
| passage | 300 | 64 | 1 | 78.7 | 79.0 | 82.0 | 82.8 | 795.0 |
| passage | 300 | 64 | 3 | 149.7 | 151.0 | 156.0 | 157.8 | 1250.0 |
| passage | 300 | 64 | 5 | 249.8 | 254.0 | 260.0 | 261.9 | 1246.0 |
| passage | 512 | 64 | 1 | 129.6 | 127.0 | 142.0 | 150.2 | 487.0 |
| passage | 512 | 64 | 3 | 283.1 | 286.0 | 299.0 | 302.8 | 660.0 |
| passage | 512 | 64 | 5 | 474.0 | 482.0 | 499.0 | 502.4 | 658.0 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency | P50 Latency | P90 Latency | P95 Latency | Throughput (inputs/s) |
|---|---|---|---|---|---|---|---|---|
| passage | 20 | 1 | 1 | 8.3 | 8.0 | 8.0 | 8.7 | 118.0 |
| passage | 20 | 1 | 3 | 9.0 | 9.0 | 9.0 | 9.5 | 320.0 |
| passage | 20 | 1 | 5 | 9.9 | 10.0 | 11.0 | 11.4 | 476.0 |
| passage | 20 | 1 | 7 | 11.5 | 12.0 | 14.0 | 13.7 | 573.0 |
| passage | 20 | 1 | 9 | 14.4 | 14.0 | 16.0 | 16.5 | 583.0 |
| passage | 20 | 1 | 11 | 17.8 | 18.0 | 21.0 | 21.7 | 518.0 |
| passage | 20 | 1 | 13 | 20.2 | 20.0 | 24.0 | 23.7 | 582.0 |
| passage | 20 | 1 | 15 | 20.5 | 21.0 | 24.0 | 24.7 | 664.0 |
| passage | 300 | 64 | 1 | 97.3 | 97.0 | 99.0 | 100.2 | 646.0 |
| passage | 300 | 64 | 3 | 200.0 | 202.0 | 204.0 | 204.9 | 940.0 |
| passage | 300 | 64 | 5 | 332.7 | 336.0 | 340.0 | 340.5 | 938.0 |
| passage | 512 | 64 | 1 | 199.9 | 189.0 | 261.0 | 268.1 | 317.0 |
| passage | 512 | 64 | 3 | 415.9 | 419.0 | 427.0 | 428.2 | 455.0 |
| passage | 512 | 64 | 5 | 690.1 | 701.0 | 711.0 | 712.8 | 453.0 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency | P50 Latency | P90 Latency | P95 Latency | Throughput (inputs/s) |
|---|---|---|---|---|---|---|---|---|
| passage | 20 | 1 | 1 | 8.6 | 9.0 | 9.0 | 9.1 | 114.0 |
| passage | 20 | 1 | 3 | 9.3 | 9.0 | 10.0 | 10.2 | 312.0 |
| passage | 20 | 1 | 5 | 10.5 | 10.0 | 11.0 | 11.2 | 459.0 |
| passage | 20 | 1 | 7 | 14.4 | 14.0 | 15.0 | 21.8 | 453.0 |
| passage | 20 | 1 | 9 | 17.6 | 18.0 | 20.0 | 20.2 | 476.0 |
| passage | 20 | 1 | 11 | 20.2 | 21.0 | 25.0 | 24.9 | 499.0 |
| passage | 20 | 1 | 13 | 24.5 | 24.0 | 27.0 | 28.4 | 480.0 |
| passage | 20 | 1 | 15 | 28.7 | 30.0 | 33.0 | 33.3 | 471.0 |
| passage | 300 | 64 | 1 | 124.1 | 124.0 | 126.0 | 127.2 | 508.0 |
| passage | 300 | 64 | 3 | 273.1 | 275.0 | 278.0 | 279.2 | 692.0 |
| passage | 300 | 64 | 5 | 451.2 | 458.0 | 463.0 | 463.7 | 692.0 |
| passage | 512 | 64 | 1 | 239.6 | 230.0 | 297.0 | 301.0 | 265.0 |
| passage | 512 | 64 | 3 | 529.9 | 534.0 | 541.0 | 542.7 | 357.0 |
| passage | 512 | 64 | 5 | 876.4 | 892.0 | 900.0 | 901.8 | 357.0 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency | P50 Latency | P90 Latency | P95 Latency | Throughput (inputs/s) |
|---|---|---|---|---|---|---|---|---|
| passage | 20 | 1 | 1 | 5.5 | 5.0 | 7.0 | 7.0 | 180.0 |
| passage | 20 | 1 | 3 | 7.0 | 7.0 | 7.0 | 7.1 | 421.0 |
| passage | 20 | 1 | 5 | 10.5 | 11.0 | 12.0 | 11.8 | 455.0 |
| passage | 20 | 1 | 7 | 12.6 | 12.0 | 16.0 | 16.4 | 529.0 |
| passage | 20 | 1 | 9 | 18.5 | 20.0 | 21.0 | 21.3 | 466.0 |
| passage | 20 | 1 | 11 | 19.6 | 19.0 | 23.0 | 25.2 | 524.0 |
| passage | 20 | 1 | 13 | 24.6 | 25.0 | 28.0 | 30.1 | 481.0 |
| passage | 20 | 1 | 15 | 26.4 | 28.0 | 30.0 | 30.3 | 509.0 |
| passage | 300 | 64 | 1 | 115.7 | 115.0 | 120.0 | 120.4 | 544.0 |
| passage | 300 | 64 | 3 | 229.7 | 231.0 | 236.0 | 239.2 | 817.0 |
| passage | 300 | 64 | 5 | 381.6 | 385.0 | 393.0 | 397.6 | 817.0 |
| passage | 512 | 64 | 1 | 199.6 | 195.0 | 237.0 | 242.4 | 317.0 |
| passage | 512 | 64 | 3 | 431.6 | 433.0 | 447.0 | 448.9 | 438.0 |
| passage | 512 | 64 | 5 | 715.0 | 724.0 | 740.0 | 746.6 | 437.0 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency | P50 Latency | P90 Latency | P95 Latency | Throughput (inputs/s) |
|---|---|---|---|---|---|---|---|---|
| passage | 20 | 1 | 1 | 9.6 | 9.0 | 10.0 | 11.0 | 103.0 |
| passage | 20 | 1 | 3 | 12.6 | 13.0 | 14.0 | 14.2 | 232.0 |
| passage | 20 | 1 | 5 | 20.6 | 22.0 | 23.0 | 23.1 | 237.0 |
| passage | 20 | 1 | 7 | 28.3 | 31.0 | 32.0 | 31.7 | 234.0 |
| passage | 20 | 1 | 9 | 35.1 | 36.0 | 41.0 | 41.8 | 244.0 |
| passage | 20 | 1 | 11 | 43.3 | 45.0 | 50.0 | 50.4 | 236.0 |
| passage | 20 | 1 | 13 | 52.6 | 54.0 | 59.0 | 58.8 | 234.0 |
| passage | 20 | 1 | 15 | 58.0 | 60.0 | 68.0 | 68.7 | 234.0 |
| passage | 300 | 64 | 1 | 304.9 | 305.0 | 309.0 | 310.3 | 209.0 |
| passage | 300 | 64 | 3 | 780.6 | 791.0 | 799.0 | 800.0 | 241.0 |
| passage | 300 | 64 | 5 | 1296.8 | 1320.0 | 1330.0 | 1331.8 | 241.0 |
| passage | 512 | 64 | 1 | 520.7 | 519.0 | 533.0 | 538.6 | 122.0 |
| passage | 512 | 64 | 3 | 1385.5 | 1404.0 | 1424.0 | 1428.4 | 137.0 |
| passage | 512 | 64 | 5 | 2294.3 | 2341.0 | 2362.0 | 2368.3 | 137.0 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency | P50 Latency | P90 Latency | P95 Latency | Throughput (inputs/s) |
|---|---|---|---|---|---|---|---|---|
| passage | 20 | 1 | 1 | 8.8 | 8.0 | 10.0 | 10.0 | 113.0 |
| passage | 20 | 1 | 3 | 11.1 | 11.0 | 12.0 | 11.8 | 259.0 |
| passage | 20 | 1 | 5 | 16.4 | 16.0 | 19.0 | 18.9 | 295.0 |
| passage | 20 | 1 | 7 | 23.3 | 22.0 | 26.0 | 26.8 | 290.0 |
| passage | 20 | 1 | 9 | 29.5 | 30.0 | 33.0 | 33.9 | 289.0 |
| passage | 20 | 1 | 11 | 34.2 | 34.0 | 40.0 | 40.6 | 302.0 |
| passage | 20 | 1 | 13 | 41.6 | 44.0 | 47.0 | 48.1 | 288.0 |
| passage | 20 | 1 | 15 | 45.7 | 48.0 | 55.0 | 55.2 | 300.0 |
| passage | 300 | 64 | 1 | 339.6 | 339.0 | 343.0 | 345.5 | 187.0 |
| passage | 300 | 64 | 3 | 918.3 | 927.0 | 933.0 | 934.6 | 207.0 |
| passage | 300 | 64 | 5 | 1517.3 | 1547.0 | 1554.0 | 1555.5 | 206.0 |
| passage | 512 | 64 | 1 | 642.3 | 643.0 | 657.0 | 661.9 | 99.0 |
| passage | 512 | 64 | 3 | 1795.3 | 1818.0 | 1835.0 | 1841.2 | 105.0 |
| passage | 512 | 64 | 5 | 2976.6 | 3034.0 | 3066.0 | 3075.5 | 105.0 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency | P50 Latency | P90 Latency | P95 Latency | Throughput (inputs/s) |
|---|---|---|---|---|---|---|---|---|
| query | 20 | 1 | 1 | 7.0 | 7.0 | 8.0 | 8.0 | 140.7 |
| query | 20 | 1 | 3 | 10.0 | 10.0 | 10.0 | 10.0 | 291.7 |
| query | 20 | 1 | 5 | 16.0 | 17.0 | 17.0 | 17.0 | 307.0 |
| query | 20 | 1 | 7 | 22.0 | 23.0 | 24.0 | 24.0 | 310.9 |
| query | 20 | 1 | 9 | 25.0 | 26.0 | 31.0 | 31.0 | 329.7 |
| query | 20 | 1 | 11 | 30.0 | 32.0 | 38.0 | 38.0 | 332.8 |
| query | 20 | 1 | 13 | 36.0 | 37.0 | 43.0 | 44.0 | 340.0 |
| query | 20 | 1 | 15 | 42.0 | 43.0 | 48.0 | 50.0 | 327.5 |
| passage | 300 | 64 | 1 | 159.0 | 159.0 | 162.0 | 163.0 | 401.8 |
| passage | 300 | 64 | 3 | 267.0 | 269.0 | 275.0 | 277.0 | 709.4 |
| passage | 300 | 64 | 5 | 392.0 | 390.0 | 401.0 | 411.0 | 814.5 |
| passage | 512 | 64 | 1 | 222.0 | 218.0 | 233.0 | 235.0 | 286.3 |
| passage | 512 | 64 | 3 | 431.0 | 430.0 | 450.0 | 455.0 | 440.1 |
| passage | 512 | 64 | 5 | 615.0 | 617.0 | 694.0 | 701.0 | 504.3 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency | P50 Latency | P90 Latency | P95 Latency | Throughput (inputs/s) |
|---|---|---|---|---|---|---|---|---|
| query | 20 | 1 | 1 | 8.0 | 8.0 | 8.0 | 9.0 | 125.6 |
| query | 20 | 1 | 3 | 12.0 | 12.0 | 13.0 | 13.0 | 246.1 |
| query | 20 | 1 | 5 | 20.0 | 21.0 | 22.0 | 22.0 | 247.9 |
| query | 20 | 1 | 7 | 25.0 | 26.0 | 30.0 | 30.0 | 267.5 |
| query | 20 | 1 | 9 | 34.0 | 34.0 | 38.0 | 39.0 | 251.5 |
| query | 20 | 1 | 11 | 43.0 | 46.0 | 47.0 | 48.0 | 237.7 |
| query | 20 | 1 | 13 | 46.0 | 49.0 | 53.0 | 54.0 | 260.9 |
| query | 20 | 1 | 15 | 55.0 | 58.0 | 65.0 | 65.0 | 248.4 |
| passage | 300 | 64 | 1 | 186.0 | 186.0 | 190.0 | 192.0 | 342.7 |
| passage | 300 | 64 | 3 | 345.0 | 346.0 | 354.0 | 356.0 | 550.7 |
| passage | 300 | 64 | 5 | 525.0 | 524.0 | 535.0 | 544.0 | 608.3 |
| passage | 512 | 64 | 1 | 269.0 | 265.0 | 280.0 | 284.0 | 237.2 |
| passage | 512 | 64 | 3 | 563.0 | 564.0 | 581.0 | 584.0 | 337.3 |
| passage | 512 | 64 | 5 | 846.0 | 861.0 | 946.0 | 956.0 | 366.0 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency | P50 Latency | P90 Latency | P95 Latency | Throughput (inputs/s) |
|---|---|---|---|---|---|---|---|---|
| query | 20 | 1 | 1 | 10.0 | 10.0 | 10.0 | 11.0 | 98.0 |
| query | 20 | 1 | 3 | 15.0 | 17.0 | 17.0 | 17.0 | 192.5 |
| query | 20 | 1 | 5 | 24.0 | 23.0 | 28.0 | 28.0 | 200.8 |
| query | 20 | 1 | 7 | 34.0 | 34.0 | 39.0 | 40.0 | 197.0 |
| query | 20 | 1 | 9 | 44.0 | 45.0 | 51.0 | 51.0 | 197.3 |
| query | 20 | 1 | 11 | 53.0 | 56.0 | 62.0 | 62.0 | 196.8 |
| query | 20 | 1 | 13 | 63.0 | 67.0 | 73.0 | 73.0 | 190.6 |
| query | 20 | 1 | 15 | 73.0 | 79.0 | 84.0 | 84.0 | 188.3 |
| passage | 300 | 64 | 1 | 277.0 | 278.0 | 280.0 | 281.0 | 230.3 |
| passage | 300 | 64 | 3 | 615.0 | 617.0 | 629.0 | 632.0 | 309.4 |
| passage | 300 | 64 | 5 | 976.0 | 975.0 | 983.0 | 987.0 | 327.4 |
| passage | 512 | 64 | 1 | 443.0 | 441.0 | 448.0 | 454.0 | 144.3 |
| passage | 512 | 64 | 3 | 1071.0 | 1077.0 | 1089.0 | 1090.0 | 177.6 |
| passage | 512 | 64 | 5 | 1736.0 | 1735.0 | 1752.0 | 1758.0 | 184.1 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency | P50 Latency | P90 Latency | P95 Latency | Throughput (inputs/s) |
|---|---|---|---|---|---|---|---|---|
| query | 20 | 1 | 1 | 12.0 | 12.0 | 12.0 | 12.0 | 85.7 |
| query | 20 | 1 | 3 | 19.0 | 21.0 | 22.0 | 22.0 | 150.3 |
| query | 20 | 1 | 5 | 32.0 | 29.0 | 36.0 | 36.0 | 152.8 |
| query | 20 | 1 | 7 | 46.0 | 50.0 | 50.0 | 50.0 | 148.0 |
| query | 20 | 1 | 9 | 55.0 | 57.0 | 65.0 | 65.0 | 156.3 |
| query | 20 | 1 | 11 | 68.0 | 72.0 | 79.0 | 79.0 | 154.5 |
| query | 20 | 1 | 13 | 79.0 | 86.0 | 93.0 | 93.0 | 154.4 |
| query | 20 | 1 | 15 | 84.0 | 86.0 | 95.0 | 101.0 | 165.0 |
| passage | 300 | 64 | 1 | 350.0 | 350.0 | 353.0 | 354.0 | 182.8 |
| passage | 300 | 64 | 3 | 821.0 | 823.0 | 831.0 | 842.0 | 231.7 |
| passage | 300 | 64 | 5 | 1320.0 | 1317.0 | 1330.0 | 1346.0 | 242.1 |
| passage | 512 | 64 | 1 | 567.0 | 566.0 | 573.0 | 575.0 | 112.7 |
| passage | 512 | 64 | 3 | 1440.0 | 1448.0 | 1464.0 | 1475.0 | 132.2 |
| passage | 512 | 64 | 5 | 2370.0 | 2371.0 | 2391.0 | 2396.0 | 134.8 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency | P50 Latency | P90 Latency | P95 Latency | Throughput (inputs/s) |
|---|---|---|---|---|---|---|---|---|
| query | 20 | 1 | 1 | 8.0 | 8.0 | 9.0 | 10.0 | 122.3 |
| query | 20 | 1 | 3 | 11.0 | 12.0 | 14.0 | 14.0 | 252.9 |
| query | 20 | 1 | 5 | 20.0 | 19.0 | 23.0 | 24.0 | 250.9 |
| query | 20 | 1 | 7 | 27.0 | 27.0 | 31.0 | 32.0 | 251.6 |
| query | 20 | 1 | 9 | 30.0 | 30.0 | 37.0 | 42.0 | 280.1 |
| query | 20 | 1 | 11 | 41.0 | 42.0 | 47.0 | 51.0 | 250.9 |
| query | 20 | 1 | 13 | 46.0 | 49.0 | 53.0 | 57.0 | 262.0 |
| query | 20 | 1 | 15 | 50.0 | 51.0 | 60.0 | 63.0 | 269.7 |
| passage | 300 | 64 | 1 | 325.0 | 324.0 | 331.0 | 333.0 | 196.9 |
| passage | 300 | 64 | 3 | 699.0 | 699.0 | 717.0 | 723.0 | 272.1 |
| passage | 300 | 64 | 5 | 1095.0 | 1093.0 | 1109.0 | 1113.0 | 291.8 |
| passage | 512 | 64 | 1 | 496.0 | 497.0 | 502.0 | 518.0 | 128.8 |
| passage | 512 | 64 | 3 | 1165.0 | 1169.0 | 1199.0 | 1209.0 | 163.2 |
| passage | 512 | 64 | 5 | 1872.0 | 1870.0 | 1904.0 | 1916.0 | 170.5 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency | P50 Latency | P90 Latency | P95 Latency | Throughput (inputs/s) |
|---|---|---|---|---|---|---|---|---|
| query | 20 | 1 | 1 | 10.0 | 9.0 | 10.0 | 10.0 | 104.7 |
| query | 20 | 1 | 3 | 12.0 | 12.0 | 12.0 | 12.0 | 246.7 |
| query | 20 | 1 | 5 | 18.0 | 19.0 | 20.0 | 20.0 | 276.7 |
| query | 20 | 1 | 7 | 25.0 | 24.0 | 28.0 | 28.0 | 268.9 |
| query | 20 | 1 | 9 | 30.0 | 31.0 | 36.0 | 36.0 | 291.4 |
| query | 20 | 1 | 11 | 38.0 | 39.0 | 44.0 | 44.0 | 270.3 |
| query | 20 | 1 | 13 | 43.0 | 44.0 | 52.0 | 52.0 | 272.4 |
| query | 20 | 1 | 15 | 49.0 | 52.0 | 56.0 | 57.0 | 276.4 |
| passage | 300 | 64 | 1 | 362.0 | 362.0 | 367.0 | 369.0 | 176.8 |
| passage | 300 | 64 | 3 | 786.0 | 789.0 | 803.0 | 812.0 | 241.9 |
| passage | 300 | 64 | 5 | 1240.0 | 1239.0 | 1258.0 | 1262.0 | 257.6 |
| passage | 512 | 64 | 1 | 550.0 | 548.0 | 565.0 | 570.0 | 116.2 |
| passage | 512 | 64 | 3 | 1327.0 | 1335.0 | 1354.0 | 1359.0 | 143.4 |
| passage | 512 | 64 | 5 | 2145.0 | 2145.0 | 2180.0 | 2187.0 | 149.0 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency | P50 Latency | P90 Latency | P95 Latency | Throughput (inputs/s) |
|---|---|---|---|---|---|---|---|---|
| query | 20 | 1 | 1 | 21.0 | 21.0 | 21.0 | 21.0 | 48.6 |
| query | 20 | 1 | 3 | 39.0 | 44.0 | 45.0 | 45.0 | 75.8 |
| query | 20 | 1 | 5 | 61.0 | 60.0 | 75.0 | 75.0 | 80.2 |
| query | 20 | 1 | 7 | 85.0 | 89.0 | 104.0 | 105.0 | 79.2 |
| query | 20 | 1 | 9 | 109.0 | 119.0 | 134.0 | 134.0 | 79.0 |
| query | 20 | 1 | 11 | 144.0 | 149.0 | 164.0 | 164.0 | 73.2 |
| query | 20 | 1 | 13 | 159.0 | 164.0 | 194.0 | 194.0 | 76.5 |
| query | 20 | 1 | 15 | 175.0 | 179.0 | 209.0 | 209.0 | 79.4 |
| passage | 300 | 64 | 1 | 888.0 | 888.0 | 899.0 | 899.0 | 72.1 |
| passage | 300 | 64 | 3 | 2272.0 | 2280.0 | 2329.0 | 2341.0 | 83.9 |
| passage | 300 | 64 | 5 | 3795.0 | 3801.0 | 3828.0 | 3840.0 | 84.2 |
| passage | 512 | 64 | 1 | 1451.0 | 1451.0 | 1471.0 | 1473.0 | 44.1 |
| passage | 512 | 64 | 3 | 3926.0 | 3947.0 | 3988.0 | 4009.0 | 48.6 |
| passage | 512 | 64 | 5 | 6577.0 | 6571.0 | 6632.0 | 6657.0 | 48.6 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency | P50 Latency | P90 Latency | P95 Latency | Throughput (inputs/s) |
|---|---|---|---|---|---|---|---|---|
| query | 20 | 1 | 1 | 13.0 | 13.0 | 14.0 | 15.0 | 74.2 |
| query | 20 | 1 | 3 | 21.0 | 23.0 | 23.0 | 23.0 | 138.0 |
| query | 20 | 1 | 5 | 32.0 | 31.0 | 39.0 | 39.0 | 152.2 |
| query | 20 | 1 | 7 | 46.0 | 46.0 | 53.0 | 54.0 | 145.8 |
| query | 20 | 1 | 9 | 56.0 | 61.0 | 69.0 | 69.0 | 150.3 |
| query | 20 | 1 | 11 | 67.0 | 69.0 | 76.0 | 77.0 | 154.0 |
| query | 20 | 1 | 13 | 86.0 | 91.0 | 98.0 | 99.0 | 141.2 |
| query | 20 | 1 | 15 | 98.0 | 105.0 | 113.0 | 114.0 | 141.0 |
| passage | 300 | 64 | 1 | 724.0 | 725.0 | 730.0 | 733.0 | 88.3 |
| passage | 300 | 64 | 3 | 1865.0 | 1876.0 | 1891.0 | 1892.0 | 102.1 |
| passage | 300 | 64 | 5 | 3117.0 | 3127.0 | 3147.0 | 3150.0 | 102.1 |
| passage | 512 | 64 | 1 | 1300.0 | 1300.0 | 1314.0 | 1318.0 | 49.2 |
| passage | 512 | 64 | 3 | 3551.0 | 3577.0 | 3606.0 | 3618.0 | 53.6 |
| passage | 512 | 64 | 5 | 5940.0 | 5968.0 | 6000.0 | 6007.0 | 53.6 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency | P50 Latency | P90 Latency | P95 Latency | Throughput (inputs/s) |
|---|---|---|---|---|---|---|---|---|
| query | 20 | 1 | 1 | 18.0 | 18.0 | 18.0 | 19.0 | 55.8 |
| query | 20 | 1 | 3 | 33.0 | 37.0 | 38.0 | 38.0 | 87.7 |
| query | 20 | 1 | 5 | 55.0 | 62.0 | 63.0 | 63.0 | 87.4 |
| query | 20 | 1 | 7 | 71.0 | 75.0 | 88.0 | 88.0 | 94.6 |
| query | 20 | 1 | 9 | 96.0 | 101.0 | 114.0 | 114.0 | 90.3 |
| query | 20 | 1 | 11 | 115.0 | 126.0 | 139.0 | 139.0 | 89.8 |
| query | 20 | 1 | 13 | 131.0 | 138.0 | 164.0 | 164.0 | 91.4 |
| query | 20 | 1 | 15 | 154.0 | 157.0 | 189.0 | 189.0 | 88.4 |
| passage | 300 | 64 | 1 | 1011.0 | 1011.0 | 1019.0 | 1019.0 | 63.3 |
| passage | 300 | 64 | 3 | 2702.0 | 2716.0 | 2734.0 | 2754.0 | 70.6 |
| passage | 300 | 64 | 5 | 4527.0 | 4527.0 | 4550.0 | 4571.0 | 70.6 |
| passage | 512 | 64 | 1 | 1705.0 | 1710.0 | 1733.0 | 1740.0 | 37.5 |
| passage | 512 | 64 | 3 | 4770.0 | 4799.0 | 4860.0 | 4872.0 | 39.9 |
| passage | 512 | 64 | 5 | 8001.0 | 8025.0 | 8079.0 | 8085.0 | 39.9 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency | P50 Latency | P90 Latency | P95 Latency | Throughput (inputs/s) |
|---|---|---|---|---|---|---|---|---|
| passage | 300 | 64 | 1 | 99.8 | 100.9 | 107.9 | 108.6 | 639.0 |
| passage | 300 | 64 | 3 | 143.8 | 143.3 | 156.6 | 159.0 | 1330.0 |
| passage | 300 | 64 | 5 | 239.7 | 239.7 | 259.1 | 265.0 | 1331.0 |
| passage | 512 | 64 | 1 | 114.6 | 114.4 | 115.9 | 117.0 | 556.5 |
| passage | 512 | 64 | 3 | 170.2 | 169.9 | 171.2 | 171.8 | 1124.2 |
| passage | 512 | 64 | 5 | 284.6 | 284.5 | 285.6 | 286.1 | 1121.4 |
| query | 20 | 1 | 1 | 5.1 | 5.1 | 5.4 | 5.4 | 196.3 |
| query | 20 | 1 | 3 | 6.0 | 5.5 | 7.4 | 7.6 | 498.5 |
| query | 20 | 1 | 5 | 11.9 | 12.3 | 12.8 | 12.9 | 418.3 |
| query | 20 | 1 | 7 | 16.5 | 17.2 | 18.0 | 18.1 | 422.0 |
| query | 20 | 1 | 9 | 21.4 | 22.3 | 23.3 | 23.6 | 418.3 |
| query | 20 | 1 | 11 | 26.0 | 26.0 | 28.4 | 28.6 | 421.3 |
| query | 20 | 1 | 13 | 30.7 | 30.9 | 33.1 | 33.6 | 422.2 |
| query | 20 | 1 | 15 | 37.3 | 37.9 | 39.1 | 39.3 | 401.4 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency | P50 Latency | P90 Latency | P95 Latency | Throughput (inputs/s) |
|---|---|---|---|---|---|---|---|---|
| passage | 300 | 64 | 1 | 2554.3 | 2563.9 | 2678.1 | 2698.3 | 25.0 |
| passage | 300 | 64 | 3 | 7349.2 | 7502.1 | 7889.3 | 7968.1 | 25.5 |
| passage | 300 | 64 | 5 | 11913.2 | 12461.9 | 12893.4 | 12969.4 | 25.6 |
| passage | 512 | 64 | 1 | 3701.9 | 3701.6 | 3703.1 | 3703.4 | 17.3 |
| passage | 512 | 64 | 3 | 10730.2 | 10985.2 | 10987.0 | 11029.2 | 17.5 |
| passage | 512 | 64 | 5 | 17355.4 | 14691.3 | 22035.4 | 22035.7 | 17.4 |
| query | 20 | 1 | 1 | 32.4 | 32.4 | 32.7 | 32.8 | 30.7 |
| query | 20 | 1 | 3 | 82.5 | 85.6 | 85.9 | 86.0 | 36.3 |
| query | 20 | 1 | 5 | 135.5 | 142.9 | 143.3 | 143.3 | 36.8 |
| query | 20 | 1 | 7 | 191.7 | 200.2 | 200.5 | 200.6 | 36.5 |
| query | 20 | 1 | 9 | 246.9 | 257.4 | 257.8 | 257.9 | 36.4 |
| query | 20 | 1 | 11 | 301.7 | 314.6 | 315.1 | 315.2 | 36.4 |
| query | 20 | 1 | 13 | 356.6 | 371.6 | 372.2 | 372.4 | 36.4 |
| query | 20 | 1 | 15 | 409.5 | 401.4 | 429.8 | 429.9 | 36.5 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency | P50 Latency | P90 Latency | P95 Latency | Throughput (inputs/s) |
|---|---|---|---|---|---|---|---|---|
| passage | 300 | 64 | 1 | 176.5 | 177.1 | 188.6 | 190.4 | 362.2 |
| passage | 300 | 64 | 3 | 336.1 | 337.0 | 359.1 | 365.8 | 570.2 |
| passage | 300 | 64 | 5 | 560.2 | 562.9 | 592.3 | 634.6 | 569.8 |
| passage | 512 | 64 | 1 | 205.3 | 204.7 | 208.2 | 210.8 | 311.4 |
| passage | 512 | 64 | 3 | 410.9 | 411.1 | 412.5 | 412.7 | 466.4 |
| passage | 512 | 64 | 5 | 681.5 | 682.0 | 683.6 | 684.1 | 468.7 |
| query | 20 | 1 | 1 | 5.3 | 5.3 | 5.6 | 5.7 | 186.3 |
| query | 20 | 1 | 3 | 7.4 | 7.4 | 7.5 | 7.7 | 403.8 |
| query | 20 | 1 | 5 | 11.9 | 12.4 | 12.6 | 12.8 | 419.2 |
| query | 20 | 1 | 7 | 16.6 | 17.3 | 17.5 | 17.6 | 421.5 |
| query | 20 | 1 | 9 | 21.2 | 22.1 | 22.5 | 22.6 | 423.9 |
| query | 20 | 1 | 11 | 26.1 | 27.2 | 27.7 | 27.8 | 420.5 |
| query | 20 | 1 | 13 | 30.8 | 31.2 | 32.6 | 32.7 | 422.3 |
| query | 20 | 1 | 15 | 36.4 | 37.3 | 37.9 | 38.0 | 411.7 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency | P50 Latency | P90 Latency | P95 Latency | Throughput (inputs/s) |
|---|---|---|---|---|---|---|---|---|
| passage | 300 | 64 | 1 | 188.4 | 191.7 | 197.7 | 198.8 | 338.7 |
| passage | 300 | 64 | 3 | 371.7 | 372.7 | 393.8 | 471.0 | 515.3 |
| passage | 300 | 64 | 5 | 619.5 | 621.7 | 648.5 | 728.8 | 515.1 |
| passage | 512 | 64 | 1 | 222.7 | 222.3 | 226.0 | 227.4 | 286.5 |
| passage | 512 | 64 | 3 | 447.0 | 447.0 | 448.7 | 449.4 | 428.4 |
| passage | 512 | 64 | 5 | 742.3 | 742.8 | 745.0 | 745.5 | 430.1 |
| query | 20 | 1 | 1 | 6.6 | 6.6 | 7.0 | 7.1 | 149.3 |
| query | 20 | 1 | 3 | 7.4 | 7.3 | 7.6 | 7.7 | 404.8 |
| query | 20 | 1 | 5 | 11.8 | 12.2 | 12.5 | 12.6 | 421.5 |
| query | 20 | 1 | 7 | 16.4 | 17.1 | 17.4 | 17.5 | 426.4 |
| query | 20 | 1 | 9 | 20.9 | 21.9 | 22.3 | 22.4 | 429.9 |
| query | 20 | 1 | 11 | 25.7 | 26.8 | 27.4 | 27.7 | 427.2 |
| query | 20 | 1 | 13 | 30.4 | 31.5 | 32.1 | 32.2 | 427.4 |
| query | 20 | 1 | 15 | 35.6 | 36.4 | 37.9 | 38.0 | 420.6 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency | P50 Latency | P90 Latency | P95 Latency | Throughput (inputs/s) |
|---|---|---|---|---|---|---|---|---|
| passage | 300 | 64 | 1 | 377.9 | 376.4 | 396.5 | 404.5 | 169.1 |
| passage | 300 | 64 | 3 | 929.9 | 932.3 | 972.0 | 979.2 | 205.8 |
| passage | 300 | 64 | 5 | 1545.5 | 1555.2 | 1597.3 | 1610.3 | 205.9 |
| passage | 512 | 64 | 1 | 469.5 | 468.6 | 473.9 | 475.1 | 136.2 |
| passage | 512 | 64 | 3 | 1178.9 | 1182.4 | 1183.0 | 1183.2 | 162.2 |
| passage | 512 | 64 | 5 | 1958.1 | 1970.9 | 1971.6 | 1971.8 | 162.2 |
| query | 20 | 1 | 1 | 11.1 | 11.1 | 11.5 | 11.6 | 89.8 |
| query | 20 | 1 | 3 | 19.3 | 20.3 | 20.8 | 21.0 | 154.9 |
| query | 20 | 1 | 5 | 32.1 | 34.0 | 34.6 | 34.8 | 155.5 |
| query | 20 | 1 | 7 | 44.8 | 47.4 | 48.1 | 48.2 | 156.0 |
| query | 20 | 1 | 9 | 57.7 | 60.9 | 61.8 | 62.0 | 155.8 |
| query | 20 | 1 | 11 | 70.6 | 74.0 | 75.5 | 75.7 | 155.5 |
| query | 20 | 1 | 13 | 83.8 | 82.8 | 89.2 | 89.6 | 154.9 |
| query | 20 | 1 | 15 | 97.5 | 96.6 | 103.1 | 103.4 | 153.7 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency | P50 Latency | P90 Latency | P95 Latency | Throughput (inputs/s) |
|---|---|---|---|---|---|---|---|---|
| passage | 300 | 64 | 1 | 483.6 | 483.6 | 505.2 | 509.5 | 132.3 |
| passage | 300 | 64 | 3 | 1328.0 | 1334.4 | 1367.9 | 1379.6 | 143.9 |
| passage | 300 | 64 | 5 | 2181.8 | 2203.7 | 2241.9 | 2250.4 | 145.5 |
| passage | 512 | 64 | 1 | 633.8 | 633.8 | 638.6 | 639.4 | 100.9 |
| passage | 512 | 64 | 3 | 1744.9 | 1755.3 | 1761.5 | 1763.1 | 109.4 |
| passage | 512 | 64 | 5 | 2892.2 | 2923.9 | 2934.8 | 2936.8 | 109.4 |
| query | 20 | 1 | 1 | 8.0 | 8.0 | 8.3 | 8.3 | 124.1 |
| query | 20 | 1 | 3 | 11.2 | 12.2 | 12.6 | 12.8 | 266.1 |
| query | 20 | 1 | 5 | 19.9 | 20.6 | 21.1 | 21.2 | 250.3 |
| query | 20 | 1 | 7 | 27.6 | 28.9 | 29.4 | 29.6 | 253.0 |
| query | 20 | 1 | 9 | 35.1 | 36.7 | 37.3 | 37.5 | 256.1 |
| query | 20 | 1 | 11 | 42.7 | 44.6 | 45.5 | 45.7 | 256.9 |
| query | 20 | 1 | 13 | 50.7 | 50.3 | 54.0 | 54.2 | 255.9 |
| query | 20 | 1 | 15 | 57.4 | 57.9 | 62.2 | 62.5 | 261.0 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency | P50 Latency | P90 Latency | P95 Latency | Throughput (inputs/s) |
|---|---|---|---|---|---|---|---|---|
| passage | 300 | 64 | 1 | 552.1 | 555.1 | 572.6 | 575.1 | 115.5 |
| passage | 300 | 64 | 3 | 1228.5 | 1229.5 | 1325.3 | 1527.0 | 155.4 |
| passage | 300 | 64 | 5 | 2045.3 | 2058.5 | 2153.8 | 2231.9 | 155.3 |
| passage | 512 | 64 | 1 | 730.0 | 729.7 | 732.3 | 733.6 | 87.4 |
| passage | 512 | 64 | 3 | 1775.8 | 1779.3 | 1784.0 | 1784.5 | 107.7 |
| passage | 512 | 64 | 5 | 2945.8 | 2539.2 | 3431.0 | 3432.8 | 107.6 |
| query | 20 | 1 | 1 | 14.6 | 14.6 | 15.2 | 15.4 | 67.9 |
| query | 20 | 1 | 3 | 29.1 | 30.7 | 31.6 | 31.9 | 102.7 |
| query | 20 | 1 | 5 | 48.7 | 51.4 | 52.6 | 52.9 | 102.3 |
| query | 20 | 1 | 7 | 68.2 | 72.0 | 73.7 | 74.0 | 102.4 |
| query | 20 | 1 | 9 | 86.7 | 90.2 | 94.0 | 94.6 | 103.7 |
| query | 20 | 1 | 11 | 106.3 | 105.3 | 115.0 | 115.5 | 103.3 |
| query | 20 | 1 | 13 | 125.3 | 125.0 | 134.9 | 135.8 | 103.6 |
| query | 20 | 1 | 15 | 144.4 | 145.0 | 155.3 | 156.1 | 103.7 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency | P50 Latency | P90 Latency | P95 Latency | Throughput (inputs/s) |
|---|---|---|---|---|---|---|---|---|
| passage | 300 | 64 | 1 | 693.1 | 702.0 | 719.3 | 721.3 | 92.0 |
| passage | 300 | 64 | 3 | 1674.4 | 1687.0 | 1849.1 | 2192.7 | 114.0 |
| passage | 300 | 64 | 5 | 2780.4 | 2797.6 | 3082.0 | 3389.9 | 113.8 |
| passage | 512 | 64 | 1 | 930.9 | 931.5 | 935.9 | 936.9 | 68.6 |
| passage | 512 | 64 | 3 | 2398.1 | 2395.9 | 2403.8 | 2407.5 | 79.6 |
| passage | 512 | 64 | 5 | 4056.4 | 4079.8 | 4098.5 | 4315.1 | 78.5 |
| query | 20 | 1 | 1 | 19.8 | 19.7 | 20.6 | 20.7 | 50.1 |
| query | 20 | 1 | 3 | 42.3 | 44.0 | 45.2 | 45.5 | 70.8 |
| query | 20 | 1 | 5 | 70.1 | 73.4 | 75.1 | 75.8 | 71.1 |
| query | 20 | 1 | 7 | 97.7 | 102.6 | 104.5 | 104.9 | 71.6 |
| query | 20 | 1 | 9 | 124.9 | 131.3 | 134.2 | 134.8 | 71.9 |
| query | 20 | 1 | 11 | 151.7 | 149.8 | 163.6 | 164.3 | 72.4 |
| query | 20 | 1 | 13 | 180.3 | 178.8 | 193.3 | 194.0 | 72.0 |
| query | 20 | 1 | 15 | 208.4 | 207.8 | 222.5 | 223.4 | 71.9 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency | P50 Latency | P90 Latency | P95 Latency | Throughput (inputs/s) |
|---|---|---|---|---|---|---|---|---|
| passage | 300 | 64 | 1 | 1322.4 | 1322.2 | 1362.2 | 1369.5 | 48.3 |
| passage | 300 | 64 | 3 | 3670.5 | 3674.6 | 3798.4 | 3824.3 | 52.2 |
| passage | 300 | 64 | 5 | 6188.9 | 6219.1 | 6368.4 | 6378.3 | 51.4 |
| passage | 512 | 64 | 1 | 1990.0 | 1990.5 | 2013.8 | 2018.5 | 32.1 |
| passage | 512 | 64 | 3 | 5586.0 | 5601.7 | 5683.0 | 5689.6 | 34.3 |
| passage | 512 | 64 | 5 | 9358.7 | 9398.1 | 9525.5 | 9570.5 | 34.0 |
| query | 20 | 1 | 1 | 21.5 | 21.5 | 21.8 | 21.8 | 46.3 |
| query | 20 | 1 | 3 | 47.8 | 51.1 | 51.5 | 51.7 | 62.5 |
| query | 20 | 1 | 5 | 82.1 | 85.3 | 85.8 | 85.9 | 60.8 |
| query | 20 | 1 | 7 | 112.1 | 119.2 | 120.0 | 120.2 | 62.3 |
| query | 20 | 1 | 9 | 143.5 | 151.5 | 154.2 | 154.4 | 62.6 |
| query | 20 | 1 | 11 | 176.5 | 174.3 | 188.5 | 188.8 | 62.2 |
| query | 20 | 1 | 13 | 208.2 | 205.8 | 222.2 | 222.4 | 62.3 |
| query | 20 | 1 | 15 | 239.0 | 239.5 | 256.2 | 256.6 | 62.7 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency | P50 Latency | P90 Latency | P95 Latency | Throughput (inputs/s) |
|---|---|---|---|---|---|---|---|---|
| passage | 300 | 64 | 1 | 1954.6 | 1957.3 | 2010.7 | 2029.4 | 32.7 |
| passage | 300 | 64 | 3 | 5650.6 | 5734.4 | 5950.8 | 7649.8 | 33.3 |
| passage | 300 | 64 | 5 | 9470.5 | 9790.5 | 9947.0 | 10511.1 | 32.7 |
| passage | 512 | 64 | 1 | 3038.8 | 3045.9 | 3079.6 | 3080.5 | 21.0 |
| passage | 512 | 64 | 3 | 8659.0 | 8835.5 | 8944.2 | 8960.5 | 21.8 |
| passage | 512 | 64 | 5 | 14292.3 | 14782.0 | 14948.8 | 14986.1 | 21.6 |
| query | 20 | 1 | 1 | 29.3 | 29.2 | 29.5 | 29.6 | 34.0 |
| query | 20 | 1 | 3 | 71.2 | 73.1 | 73.3 | 73.4 | 42.0 |
| query | 20 | 1 | 5 | 113.8 | 121.7 | 122.2 | 122.3 | 43.9 |
| query | 20 | 1 | 7 | 159.3 | 170.2 | 171.0 | 171.1 | 43.9 |
| query | 20 | 1 | 9 | 204.7 | 217.6 | 219.9 | 220.0 | 43.9 |
| query | 20 | 1 | 11 | 253.3 | 266.7 | 268.8 | 268.9 | 43.4 |
| query | 20 | 1 | 13 | 299.2 | 295.0 | 317.5 | 317.7 | 43.4 |
| query | 20 | 1 | 15 | 346.4 | 342.2 | 366.3 | 366.4 | 43.2 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency | P50 Latency | P90 Latency | P95 Latency | Throughput (inputs/s) |
|---|---|---|---|---|---|---|---|---|
| passage | 300 | 64 | 1 | 1618.5 | 1619.8 | 1666.8 | 1700.4 | 39.5 |
| passage | 300 | 64 | 3 | 4245.1 | 4279.2 | 4465.5 | 5715.4 | 44.6 |
| passage | 300 | 64 | 5 | 7009.0 | 7159.7 | 7465.8 | 8631.8 | 44.6 |
| passage | 512 | 64 | 1 | 2316.9 | 2317.3 | 2321.5 | 2323.1 | 27.6 |
| passage | 512 | 64 | 3 | 6328.9 | 6407.9 | 6414.3 | 6415.5 | 29.9 |
| passage | 512 | 64 | 5 | 10559.8 | 10698.6 | 11012.5 | 11124.2 | 29.7 |
| query | 20 | 1 | 1 | 22.5 | 22.5 | 22.8 | 22.9 | 44.4 |
| query | 20 | 1 | 3 | 49.5 | 53.2 | 53.6 | 53.8 | 60.6 |
| query | 20 | 1 | 5 | 81.2 | 88.5 | 89.1 | 89.2 | 61.6 |
| query | 20 | 1 | 7 | 114.8 | 123.9 | 124.5 | 124.7 | 60.9 |
| query | 20 | 1 | 9 | 147.6 | 145.4 | 160.0 | 160.1 | 60.9 |
| query | 20 | 1 | 11 | 179.3 | 177.9 | 195.4 | 195.6 | 61.3 |
| query | 20 | 1 | 13 | 212.8 | 213.6 | 231.3 | 231.5 | 61.0 |
| query | 20 | 1 | 15 | 243.0 | 248.4 | 266.5 | 266.7 | 61.7 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency | P50 Latency | P90 Latency | P95 Latency | Throughput (inputs/s) |
|---|---|---|---|---|---|---|---|---|
| passage | 300 | 64 | 1 | 1887.4 | 1890.5 | 1954.8 | 1970.8 | 33.9 |
| passage | 300 | 64 | 3 | 5411.6 | 5587.9 | 5787.5 | 6121.0 | 34.9 |
| passage | 300 | 64 | 5 | 8957.8 | 9469.9 | 11139.8 | 11486.4 | 34.6 |
| passage | 512 | 64 | 1 | 2839.6 | 2851.3 | 2973.2 | 3008.1 | 22.5 |
| passage | 512 | 64 | 3 | 8179.3 | 8529.3 | 8662.5 | 8678.7 | 23.0 |
| passage | 512 | 64 | 5 | 13935.1 | 14520.8 | 14928.1 | 15156.9 | 22.5 |
| query | 20 | 1 | 1 | 24.0 | 23.9 | 24.2 | 24.3 | 41.6 |
| query | 20 | 1 | 3 | 51.3 | 54.4 | 55.3 | 55.5 | 58.4 |
| query | 20 | 1 | 5 | 87.2 | 91.3 | 92.8 | 93.1 | 57.3 |
| query | 20 | 1 | 7 | 120.8 | 126.9 | 129.5 | 129.8 | 57.9 |
| query | 20 | 1 | 9 | 154.8 | 162.4 | 166.6 | 166.9 | 58.1 |
| query | 20 | 1 | 11 | 187.7 | 185.9 | 203.5 | 203.8 | 58.5 |
| query | 20 | 1 | 13 | 223.0 | 222.1 | 239.8 | 240.3 | 58.2 |
| query | 20 | 1 | 15 | 256.2 | 258.2 | 276.6 | 277.3 | 58.5 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency | P50 Latency | P90 Latency | P95 Latency | Throughput (inputs/s) |
|---|---|---|---|---|---|---|---|---|
| passage | 300 | 16 | 1 | 2927.9 | 2927.4 | 3059.5 | 3089.5 | 5.5 |
| passage | 300 | 16 | 3 | 8379.6 | 8563.5 | 8739.8 | 9256.8 | 5.6 |
| passage | 300 | 16 | 5 | 13629.2 | 14355.2 | 14642.3 | 14710.0 | 5.6 |
| passage | 300 | 64 | 1 | 11385.6 | 11350.1 | 11646.0 | 11693.3 | 5.6 |
| passage | 300 | 64 | 3 | 29783.1 | 33442.8 | 33609.1 | 33687.4 | 5.7 |
| passage | 300 | 64 | 5 | 43320.7 | 55557.3 | 55833.8 | 55911.2 | 5.8 |
| query | 20 | 1 | 1 | 39.8 | 39.7 | 40.2 | 40.4 | 25.1 |
| query | 20 | 1 | 3 | 95.5 | 100.7 | 101.4 | 101.7 | 31.4 |
| query | 20 | 1 | 5 | 157.1 | 167.9 | 168.8 | 169.0 | 31.8 |
| query | 20 | 1 | 7 | 224.1 | 235.1 | 236.2 | 236.5 | 31.2 |
| query | 20 | 1 | 9 | 284.4 | 302.1 | 303.6 | 303.9 | 31.6 |
| query | 20 | 1 | 11 | 345.7 | 339.8 | 370.6 | 370.9 | 31.8 |
| query | 20 | 1 | 13 | 410.6 | 406.0 | 437.9 | 438.2 | 31.6 |
| query | 20 | 1 | 15 | 470.4 | 472.7 | 505.1 | 505.6 | 31.8 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency (ms) | P50 Latency (ms) | P90 Latency (ms) | P95 Latency (ms) | Throughput (inputs/s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| passage | 300 | 16 | 1 | 1753.4 | 1745.5 | 1831.3 | 1850.1 | 9.1 |
| passage | 300 | 16 | 3 | 5090.3 | 5173.0 | 5259.7 | 5416.0 | 9.3 |
| passage | 300 | 16 | 5 | 8349.1 | 8605.0 | 8750.8 | 8830.5 | 9.3 |
| passage | 300 | 64 | 1 | 7270.1 | 7290.9 | 7380.2 | 7390.7 | 8.8 |
| passage | 300 | 64 | 3 | 20045.6 | 21451.2 | 21691.0 | 21695.3 | 8.9 |
| passage | 300 | 64 | 5 | 31066.4 | 35673.0 | 36053.9 | 36088.0 | 8.9 |
| query | 20 | 1 | 1 | 66.4 | 66.2 | 67.1 | 67.2 | 15.0 |
| query | 20 | 1 | 3 | 168.6 | 179.1 | 180.8 | 181.5 | 17.8 |
| query | 20 | 1 | 5 | 278.9 | 298.7 | 300.6 | 300.9 | 17.9 |
| query | 20 | 1 | 7 | 388.5 | 417.5 | 419.9 | 420.6 | 18.0 |
| query | 20 | 1 | 9 | 501.5 | 535.8 | 539.7 | 540.5 | 17.9 |
| query | 20 | 1 | 11 | 616.4 | 603.0 | 659.1 | 659.8 | 17.8 |
| query | 20 | 1 | 13 | 728.6 | 722.0 | 778.9 | 779.8 | 17.8 |
| query | 20 | 1 | 15 | 838.3 | 840.9 | 897.7 | 898.6 | 17.8 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency (ms) | P50 Latency (ms) | P90 Latency (ms) | P95 Latency (ms) | Throughput (inputs/s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| passage | 300 | 64 | 1 | 98.0 | 95.0 | 105.8 | 106.6 | 651.7 |
| passage | 300 | 64 | 3 | 144.9 | 142.3 | 159.3 | 174.9 | 1320.4 |
| passage | 300 | 64 | 5 | 243.3 | 238.8 | 283.4 | 298.1 | 1311.5 |
| passage | 512 | 64 | 1 | 112.0 | 112.0 | 112.9 | 113.2 | 569.2 |
| passage | 512 | 64 | 3 | 223.2 | 253.4 | 257.1 | 257.7 | 857.7 |
| passage | 512 | 64 | 5 | 300.7 | 295.7 | 356.5 | 360.0 | 1061.4 |
| query | 20 | 1 | 1 | 4.6 | 4.6 | 4.8 | 4.8 | 215.6 |
| query | 20 | 1 | 3 | 7.0 | 7.2 | 7.5 | 7.8 | 426.4 |
| query | 20 | 1 | 5 | 11.4 | 11.9 | 12.1 | 12.2 | 434.7 |
| query | 20 | 1 | 7 | 16.0 | 16.7 | 16.9 | 17.0 | 434.7 |
| query | 20 | 1 | 9 | 20.6 | 21.4 | 21.8 | 21.9 | 435.3 |
| query | 20 | 1 | 11 | 25.2 | 26.2 | 26.7 | 26.9 | 435.8 |
| query | 20 | 1 | 13 | 30.2 | 31.2 | 31.8 | 32.1 | 429.8 |
| query | 20 | 1 | 15 | 34.9 | 35.8 | 36.4 | 36.6 | 429.0 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency (ms) | P50 Latency (ms) | P90 Latency (ms) | P95 Latency (ms) | Throughput (inputs/s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| passage | 300 | 64 | 1 | 2586.4 | 2617.0 | 2802.0 | 2803.0 | 24.7 |
| passage | 300 | 64 | 3 | 7438.4 | 7622.4 | 7961.6 | 8037.8 | 25.2 |
| passage | 300 | 64 | 5 | 12158.7 | 12724.5 | 13256.9 | 13319.0 | 25.1 |
| passage | 512 | 64 | 1 | 3727.8 | 3727.5 | 3728.9 | 3729.3 | 17.2 |
| passage | 512 | 64 | 3 | 10810.7 | 11063.3 | 11102.2 | 11154.1 | 17.3 |
| passage | 512 | 64 | 5 | 17458.0 | 14878.2 | 22157.8 | 22183.8 | 17.3 |
| query | 20 | 1 | 1 | 32.3 | 32.2 | 32.6 | 32.7 | 30.8 |
| query | 20 | 1 | 3 | 81.1 | 85.5 | 85.9 | 86.0 | 36.9 |
| query | 20 | 1 | 5 | 136.5 | 142.8 | 143.1 | 143.3 | 36.6 |
| query | 20 | 1 | 7 | 189.4 | 199.9 | 200.4 | 200.5 | 36.9 |
| query | 20 | 1 | 9 | 245.6 | 257.0 | 257.6 | 257.8 | 36.6 |
| query | 20 | 1 | 11 | 297.5 | 313.4 | 314.5 | 314.7 | 36.9 |
| query | 20 | 1 | 13 | 350.9 | 344.2 | 371.6 | 371.8 | 37.0 |
| query | 20 | 1 | 15 | 409.1 | 427.2 | 429.0 | 429.3 | 36.6 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency (ms) | P50 Latency (ms) | P90 Latency (ms) | P95 Latency (ms) | Throughput (inputs/s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| passage | 300 | 64 | 1 | 172.1 | 171.9 | 186.3 | 189.1 | 371.4 |
| passage | 300 | 64 | 3 | 334.3 | 335.5 | 363.5 | 383.4 | 573.4 |
| passage | 300 | 64 | 5 | 556.5 | 557.5 | 585.4 | 600.3 | 573.6 |
| passage | 512 | 64 | 1 | 203.5 | 202.6 | 206.5 | 207.2 | 314.2 |
| passage | 512 | 64 | 3 | 406.1 | 406.7 | 497.6 | 502.4 | 472.0 |
| passage | 512 | 64 | 5 | 673.6 | 673.2 | 718.2 | 760.0 | 474.1 |
| query | 20 | 1 | 1 | 5.3 | 5.2 | 5.6 | 5.7 | 188.6 |
| query | 20 | 1 | 3 | 7.3 | 7.4 | 7.5 | 7.5 | 408.6 |
| query | 20 | 1 | 5 | 11.9 | 12.3 | 12.5 | 12.5 | 417.7 |
| query | 20 | 1 | 7 | 16.5 | 17.2 | 17.4 | 17.5 | 423.6 |
| query | 20 | 1 | 9 | 21.2 | 22.1 | 22.3 | 22.4 | 424.4 |
| query | 20 | 1 | 11 | 25.9 | 27.0 | 27.3 | 27.4 | 423.7 |
| query | 20 | 1 | 13 | 30.8 | 31.9 | 32.4 | 32.5 | 421.9 |
| query | 20 | 1 | 15 | 35.2 | 34.9 | 37.1 | 37.2 | 425.5 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency (ms) | P50 Latency (ms) | P90 Latency (ms) | P95 Latency (ms) | Throughput (inputs/s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| passage | 300 | 64 | 1 | 189.4 | 187.6 | 203.0 | 204.9 | 337.5 |
| passage | 300 | 64 | 3 | 371.2 | 372.3 | 396.3 | 404.4 | 516.2 |
| passage | 300 | 64 | 5 | 621.1 | 622.6 | 655.5 | 677.5 | 513.7 |
| passage | 512 | 64 | 1 | 228.0 | 227.0 | 233.5 | 234.5 | 280.4 |
| passage | 512 | 64 | 3 | 462.9 | 467.3 | 559.3 | 570.4 | 414.1 |
| passage | 512 | 64 | 5 | 840.0 | 807.1 | 1040.2 | 1089.9 | 379.9 |
| query | 20 | 1 | 1 | 6.6 | 6.6 | 7.0 | 7.2 | 150.7 |
| query | 20 | 1 | 3 | 7.4 | 7.4 | 7.5 | 7.6 | 399.7 |
| query | 20 | 1 | 5 | 12.1 | 12.5 | 12.7 | 12.8 | 411.2 |
| query | 20 | 1 | 7 | 16.8 | 17.4 | 17.8 | 17.9 | 413.2 |
| query | 20 | 1 | 9 | 21.7 | 22.4 | 22.9 | 23.0 | 413.7 |
| query | 20 | 1 | 11 | 26.3 | 27.3 | 27.6 | 27.7 | 417.1 |
| query | 20 | 1 | 13 | 31.1 | 32.0 | 32.6 | 32.7 | 416.9 |
| query | 20 | 1 | 15 | 36.4 | 37.3 | 37.7 | 37.8 | 411.0 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency (ms) | P50 Latency (ms) | P90 Latency (ms) | P95 Latency (ms) | Throughput (inputs/s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| passage | 300 | 64 | 1 | 373.3 | 366.4 | 397.5 | 401.4 | 171.3 |
| passage | 300 | 64 | 3 | 918.5 | 919.8 | 958.2 | 963.8 | 208.4 |
| passage | 300 | 64 | 5 | 1527.0 | 1534.3 | 1589.6 | 1594.6 | 208.4 |
| passage | 512 | 64 | 1 | 470.4 | 469.4 | 475.0 | 476.0 | 136.0 |
| passage | 512 | 64 | 3 | 1180.7 | 1184.0 | 1184.5 | 1184.7 | 162.0 |
| passage | 512 | 64 | 5 | 1960.7 | 1973.6 | 1974.0 | 1974.2 | 162.0 |
| query | 20 | 1 | 1 | 10.7 | 10.7 | 11.0 | 11.0 | 93.2 |
| query | 20 | 1 | 3 | 19.6 | 20.4 | 20.9 | 21.0 | 152.6 |
| query | 20 | 1 | 5 | 32.2 | 34.2 | 34.9 | 35.3 | 154.9 |
| query | 20 | 1 | 7 | 45.9 | 48.0 | 48.8 | 49.1 | 152.2 |
| query | 20 | 1 | 9 | 59.5 | 62.0 | 63.0 | 63.2 | 151.1 |
| query | 20 | 1 | 11 | 72.7 | 76.0 | 77.3 | 77.8 | 151.1 |
| query | 20 | 1 | 13 | 85.7 | 89.0 | 90.7 | 90.9 | 151.6 |
| query | 20 | 1 | 15 | 99.9 | 103.5 | 104.4 | 104.7 | 149.9 |

| Input Type | Input Tokens | Batch Size | Concurrency | Avg Latency (ms) | P50 Latency (ms) | P90 Latency (ms) | P95 Latency (ms) | Throughput (inputs/s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| passage | 300 | 64 | 1 | 489.9 | 488.4 | 519.3 | 521.3 | 130.6 |
| passage | 300 | 64 | 3 | 1355.4 | 1354.3 | 1413.0 | 1423.2 | 141.0 |
| passage | 300 | 64 | 5 | 2251.5 | 2271.3 | 2338.8 | 2345.4 | 140.9 |
| passage | 512 | 64 | 1 | 641.5 | 640.7 | 647.5 | 648.6 | 99.7 |
| passage | 512 | 64 | 3 | 1797.2 | 1807.8 | 1813.7 | 1814.9 | 106.2 |
| passage | 512 | 64 | 5 | 2979.6 | 3014.9 | 3020.7 | 3021.9 | 106.2 |
| query | 20 | 1 | 1 | 7.9 | 7.9 | 8.2 | 8.4 | 125.6 |
| query | 20 | 1 | 3 | 11.9 | 12.3 | 12.6 | 12.7 | 251.4 |
| query | 20 | 1 | 5 | 20.0 | 20.6 | 20.9 | 20.9 | 249.5 |
| query | 20 | 1 | 7 | 27.7 | 28.9 | 29.4 | 29.5 | 251.5 |
| query | 20 | 1 | 9 | 35.6 | 37.0 | 37.6 | 37.8 | 252.2 |
| query | 20 | 1 | 11 | 43.6 | 45.3 | 45.9 | 46.1 | 251.9 |
| query | 20 | 1 | 13 | 51.5 | 53.3 | 54.2 | 54.4 | 252.2 |
| query | 20 | 1 | 15 | 59.6 | 59.3 | 63.0 | 63.3 | 251.0 |
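
When reading the tables above, it can help to sanity-check how the latency and throughput columns relate. The following is a minimal sketch, not part of genai-perf. It assumes a closed-loop benchmark in which each of the concurrent clients always has one request in flight, so throughput is roughly concurrency × batch size ÷ average latency (Little's law). The function name `estimated_throughput` and the `rows` list are hypothetical; the numbers are copied from the last table above.

```python
def estimated_throughput(concurrency: int, batch_size: int, avg_latency_ms: float) -> float:
    """Approximate inputs/s, assuming every client keeps one request in flight."""
    return concurrency * batch_size / (avg_latency_ms / 1000.0)


# (input type, concurrency, batch size, avg latency in ms, reported inputs/s),
# copied from the last benchmark table above.
rows = [
    ("passage", 1, 64, 489.9, 130.6),
    ("passage", 5, 64, 2251.5, 140.9),
    ("query", 1, 1, 7.9, 125.6),
    ("query", 15, 1, 59.6, 251.0),
]

for input_type, concurrency, batch_size, avg_ms, reported in rows:
    estimate = estimated_throughput(concurrency, batch_size, avg_ms)
    rel_err = abs(estimate - reported) / reported
    print(f"{input_type:8s} c={concurrency:<2d} estimate={estimate:7.1f} "
          f"reported={reported:7.1f} rel_err={rel_err:.1%}")
```

For these rows the estimate lands within a few percent of the reported value. A much larger gap in your own runs may suggest client-side overhead or failed requests, though this approximation cannot tell you which.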