Performance
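This section reports latency and throughput measurements for Llama models served by NVIDIA NIM. Each table lists time to first token (TTFT), inter-token latency (ITL), and throughput in tokens per second at increasing levels of request concurrency. For the benchmarking procedure, see Using GenAI-Perf to Benchmark.
For the specifications of the hardware on which these measurements were collected, see the Hardware Specifications section.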
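As a rough guide to reading the tables, the two latency columns can be combined into an approximate per-request latency for a streamed response. The sketch below is a minimal illustration, assuming the common definition in which ITL is the average gap between consecutive output tokens after the first. The function names and the example values are hypothetical and are not taken from the benchmark configuration.

```python
def estimate_request_latency_ms(ttft_ms: float, itl_ms: float, output_tokens: int) -> float:
    """Approximate end-to-end latency of one streamed request, in milliseconds.

    Assumes the first token arrives after ttft_ms and each of the remaining
    (output_tokens - 1) tokens arrives itl_ms after the previous one.
    """
    if output_tokens < 1:
        raise ValueError("output_tokens must be at least 1")
    return ttft_ms + itl_ms * (output_tokens - 1)


def per_user_tokens_per_s(itl_ms: float) -> float:
    """Steady-state generation speed seen by a single user, in tokens per second."""
    if itl_ms <= 0:
        raise ValueError("itl_ms must be positive")
    return 1000.0 / itl_ms


# Hypothetical values, chosen only to illustrate the arithmetic.
latency_ms = estimate_request_latency_ms(ttft_ms=400.0, itl_ms=20.0, output_tokens=200)
user_speed = per_user_tokens_per_s(itl_ms=20.0)
print(f"estimated latency: {latency_ms:.0f} ms, per-user speed: {user_speed:.0f} tokens/s")
```

The throughput column, by contrast, aggregates output across all concurrent requests, so it can keep rising while the per-user speed implied by ITL falls.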
Llama-3.3-70b-instruct Results
NIM Container: Llama-3.3-70b-Instruct
Version: 1.8.0
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 406.77 | 18.58 | 51.67 |
5 | 546.7 | 21.83 | 218.48 |
25 | 688.69 | 38.75 | 623.4 |
50 | 834.51 | 59.73 | 814.37 |
100 | 7996.25 | 92.91 | 917.26 |
150 | 34794.22 | 92.86 | 920.9 |
200 | 61680.22 | 93.06 | 920.54 |
250 | 88605.32 | 93.2 | 920.12 |
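One way to read a table like the one above is to look for the concurrency at which throughput stops scaling. The following sketch uses the rows of that table and flags the first level after which the next step adds less than a chosen relative gain. The 5% threshold and the function name are illustrative assumptions, not part of the benchmark methodology.

```python
# Rows from the table above: (concurrency, TTFT ms, ITL ms, throughput tokens/s).
rows = [
    (1, 406.77, 18.58, 51.67),
    (5, 546.7, 21.83, 218.48),
    (25, 688.69, 38.75, 623.4),
    (50, 834.51, 59.73, 814.37),
    (100, 7996.25, 92.91, 917.26),
    (150, 34794.22, 92.86, 920.9),
    (200, 61680.22, 93.06, 920.54),
    (250, 88605.32, 93.2, 920.12),
]


def saturation_concurrency(rows, min_gain=0.05):
    """Return the first concurrency after which throughput grows by less than min_gain.

    min_gain is a relative threshold (0.05 means 5%) and is an arbitrary choice.
    If throughput keeps growing at every step, the highest concurrency is returned.
    """
    for (conc, _, _, tput), (_, _, _, next_tput) in zip(rows, rows[1:]):
        if (next_tput - tput) / tput < min_gain:
            return conc
    return rows[-1][0]


print(saturation_concurrency(rows))  # prints 100
```

For this table the throughput plateau at a concurrency of 100 coincides with a sharp rise in TTFT (from 834.51 ms to 7996.25 ms), which is consistent with requests waiting in a queue once the server is saturated, although the table alone does not establish the cause.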
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 47.77 | 18.91 | 52.85 |
5 | 126.53 | 19.04 | 261.83 |
25 | 147.63 | 21.69 | 1149.05 |
50 | 181.64 | 23.97 | 2078.43 |
100 | 229.77 | 29.98 | 3322.87 |
150 | 296.74 | 37.85 | 3948.26 |
200 | 4867.89 | 40.72 | 4624.14 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 31.22 | 18.8 | 53 |
5 | 88.7 | 18.83 | 260.67 |
25 | 138.98 | 20.61 | 1178.68 |
50 | 156.28 | 23.33 | 2081.24 |
100 | 176.8 | 29.87 | 3263.31 |
150 | 195.51 | 37.75 | 3883.1 |
200 | 253.36 | 42.95 | 4534.54 |
250 | 264.22 | 48.16 | 5067.05 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 82.46 | 18.67 | 53.37 |
5 | 145.62 | 19.07 | 260.48 |
25 | 250.49 | 22.45 | 1102.23 |
50 | 278.62 | 25.87 | 1912.93 |
100 | 334.23 | 34.21 | 2895.49 |
150 | 428.54 | 44.44 | 3342.67 |
200 | 558.23 | 50.66 | 3901.74 |
250 | 3499.83 | 56.3 | 4167.39 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 1833.01 | 19.58 | 48.8 |
5 | 2910.33 | 25.24 | 186.8 |
25 | 18781.26 | 48.26 | 415.33 |
50 | 123969.71 | 49.22 | 418.65 |
100 | 333564.85 | 49.81 | 425.1 |
150 | 543226.4 | 50.01 | 423.95 |
200 | 753417.03 | 50.06 | 425.74 |
250 | 963222.96 | 50.13 | 427.06 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 418.61 | 16.72 | 57.05 |
5 | 567.86 | 19.15 | 246.86 |
25 | 708 | 35.86 | 671.12 |
50 | 841.35 | 56.65 | 857.45 |
100 | 5701.4 | 91.4 | 972.2 |
150 | 31108.04 | 91.64 | 972.92 |
200 | 56556.61 | 91.81 | 972.7 |
250 | 82015.69 | 91.91 | 972.59 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 46.38 | 16.79 | 59.51 |
5 | 121.67 | 16.36 | 304.63 |
25 | 169.85 | 18.64 | 1335.79 |
50 | 190.84 | 19.79 | 2515.16 |
100 | 238.18 | 26.06 | 3820.26 |
150 | 302.88 | 33.98 | 4395.18 |
200 | 357.1 | 37.06 | 5370.23 |
250 | 576.85 | 39.99 | 6202.02 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 29.58 | 16.85 | 59.12 |
5 | 96.02 | 16.41 | 297.51 |
25 | 144.09 | 18.03 | 1339.22 |
50 | 157.84 | 20.17 | 2391.63 |
100 | 187.07 | 27.57 | 3520.02 |
150 | 201.75 | 36.76 | 3983.17 |
200 | 258.88 | 42.13 | 4617.01 |
250 | 294.21 | 47 | 5172.59 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 81.45 | 16.72 | 59.56 |
5 | 147.1 | 16.48 | 301.06 |
25 | 252.61 | 19.38 | 1274.05 |
50 | 278.5 | 21.84 | 2261.37 |
100 | 332.18 | 30.22 | 3273.32 |
150 | 423.88 | 41.08 | 3614.04 |
200 | 582.43 | 46.43 | 4251.5 |
250 | 666.7 | 52.75 | 4676.16 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 1892.93 | 17.44 | 54.38 |
5 | 2981.43 | 21.61 | 215.84 |
25 | 7173.02 | 44.36 | 516.05 |
50 | 12866.29 | 72.84 | 621.2 |
100 | 145188.49 | 81.99 | 615.1 |
150 | 288488.94 | 82.99 | 621.73 |
200 | 432638.5 | 83.58 | 620.62 |
250 | 576444.85 | 84.03 | 620.09 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 81.66 | 16.12 | 61.78 |
5 | 306.56 | 15.63 | 314.03 |
25 | 894.46 | 18.35 | 1299.94 |
50 | 1067.36 | 20.75 | 2293.14 |
100 | 1158.71 | 29.61 | 3252.32 |
150 | 1341.05 | 41.04 | 3541.71 |
200 | 1283.61 | 46.78 | 4164.44 |
250 | 1808.22 | 51.77 | 4669.86 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 214.63 | 13.89 | 69.96 |
5 | 640.52 | 14.34 | 320.72 |
25 | 592.2 | 24.78 | 964.48 |
50 | 661.04 | 36.53 | 1322.8 |
100 | 857.04 | 61.36 | 1587.86 |
150 | 1098.38 | 88 | 1665.56 |
200 | 1149.56 | 111.46 | 1760.61 |
250 | 6176.07 | 127.61 | 1785.89 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 32.41 | 13.85 | 72.17 |
5 | 93.74 | 12.93 | 385.51 |
25 | 356.42 | 14.19 | 1740.17 |
50 | 492.05 | 15.62 | 3153.24 |
100 | 566.05 | 19.44 | 5071.9 |
150 | 707 | 24.7 | 5988.23 |
200 | 799.72 | 26.52 | 7430.1 |
250 | 1152.75 | 28.57 | 8579.87 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 22.73 | 13.8 | 72.22 |
5 | 56.59 | 13.44 | 366.12 |
25 | 160.64 | 13.83 | 1716.65 |
50 | 266.81 | 14.83 | 3104.67 |
100 | 370.92 | 18.82 | 4849.98 |
150 | 423.34 | 24.16 | 5718.54 |
200 | 639.02 | 25.32 | 7031.28 |
250 | 890.53 | 27.6 | 7803.07 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 51.01 | 13.99 | 71.26 |
5 | 170.81 | 13.06 | 378.3 |
25 | 503.84 | 14.76 | 1639.43 |
50 | 608.55 | 16.86 | 2864.92 |
100 | 613.39 | 22.02 | 4421.55 |
150 | 743.73 | 28.79 | 5082.9 |
200 | 955.06 | 31.74 | 6120.83 |
250 | 1405.48 | 34.75 | 6918.63 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 941.17 | 14.1 | 68.68 |
5 | 2191.86 | 16.22 | 288.76 |
25 | 4157.12 | 31.48 | 744.05 |
50 | 6205.6 | 50.33 | 933.97 |
100 | 84784.81 | 57.29 | 958.23 |
150 | 176553.42 | 58.23 | 966.23 |
200 | 269826.81 | 58.42 | 957.89 |
250 | 362546.32 | 58.52 | 963.45 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 78.59 | 15.49 | 64.3 |
5 | 292.54 | 16.74 | 293.91 |
25 | 893.19 | 20.19 | 1186.59 |
50 | 996.37 | 24.49 | 1963.32 |
100 | 1091.68 | 33.11 | 2926.44 |
150 | 1204.74 | 41.27 | 3534.23 |
200 | 1249.37 | 48.75 | 4002.72 |
250 | 5689.49 | 54.54 | 4133.74 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 57.63 | 10.25 | 97.06 |
5 | 175.85 | 11.3 | 436.08 |
25 | 536.06 | 14.51 | 1663.36 |
50 | 602.94 | 18.33 | 2642.87 |
100 | 655.12 | 25.67 | 3801.4 |
150 | 687.9 | 31.68 | 4637.51 |
200 | 688.09 | 36.23 | 5421.19 |
250 | 795.95 | 40.31 | 6085.75 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 554.81 | 18.54 | 50.99 |
5 | 1005.5 | 23.9 | 193.29 |
25 | 1362.44 | 52 | 457.61 |
50 | 1723.18 | 82.19 | 584.77 |
100 | 2257.5 | 146.67 | 662.51 |
150 | 2644.06 | 213.3 | 687.31 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 79.98 | 18.57 | 53.75 |
5 | 259.87 | 19.88 | 249.99 |
25 | 947.31 | 24.89 | 985.94 |
50 | 1167.07 | 28.75 | 1705.2 |
100 | 1458.45 | 39.83 | 2465.98 |
150 | 1513.77 | 53.21 | 2780.56 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 47.73 | 18.11 | 54.77 |
5 | 135.15 | 19.63 | 247.48 |
25 | 528.55 | 23.47 | 961.89 |
50 | 848.5 | 26.59 | 1629.45 |
100 | 1208.62 | 38.47 | 2257.04 |
150 | 1381.12 | 53.42 | 2498.15 |
200 | 1415.91 | 61.44 | 2932.47 |
250 | 1491.94 | 69.03 | 3283.87 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 131.21 | 18.44 | 53.91 |
5 | 460.71 | 19.83 | 246.66 |
25 | 1345.99 | 26.36 | 903 |
50 | 1498.8 | 31.6 | 1512.02 |
100 | 1625.47 | 46.47 | 2080.95 |
150 | 1729.4 | 63.28 | 2309.02 |
200 | 1680.2 | 72.79 | 2687.61 |
250 | 1884.04 | 82.1 | 2979.05 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 2521.02 | 19.46 | 48.27 |
5 | 5331.09 | 26.89 | 169.17 |
25 | 10457.52 | 64.46 | 358.42 |
50 | 16794.16 | 107.78 | 429.76 |
100 | 178225.46 | 127.17 | 441.63 |
150 | 379534.42 | 128.79 | 444.7 |
200 | 578423.34 | 129.63 | 446.76 |
250 | 778536.81 | 130.36 | 447.46 |
Version: 1.5.0
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 368.11 | 19.94 | 48.44 |
5 | 446.68 | 23.89 | 199.17 |
25 | 751.74 | 42.12 | 564.5 |
50 | 1083.13 | 63.72 | 738.78 |
100 | 17754.93 | 87.55 | 777.31 |
150 | 46381.93 | 87.39 | 777.94 |
200 | 73326.6 | 87.64 | 777.02 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 71.82 | 19.26 | 51.84 |
5 | 109.62 | 20.04 | 237.72 |
25 | 159.62 | 23.28 | 1032.56 |
50 | 232.46 | 25.58 | 1880.75 |
100 | 293.19 | 32.83 | 2914.64 |
150 | 448.16 | 45.57 | 3096.87 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 32.78 | 19.11 | 52.15 |
5 | 81.17 | 19.9 | 247.26 |
25 | 247.81 | 21.99 | 1080.83 |
50 | 377.42 | 23.94 | 1936.88 |
100 | 583.54 | 31.79 | 2886.57 |
150 | 831.88 | 44.81 | 3067.43 |
200 | 925.3 | 45.92 | 3947.82 |
250 | 1105.17 | 49.83 | 4521.68 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 103.2 | 19.31 | 51.54 |
5 | 125.56 | 20.52 | 234.76 |
25 | 195.68 | 24.76 | 979.58 |
50 | 243.19 | 28.55 | 1706.26 |
100 | 344.86 | 38.65 | 2507.94 |
150 | 568.88 | 54.06 | 2663.03 |
200 | 1384.9 | 58.11 | 3238.57 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 190.36 | 19.6 | 49.57 |
5 | 430.84 | 21.64 | 216.96 |
25 | 870.9 | 35 | 654.15 |
50 | 1066.1 | 51.56 | 903.17 |
100 | 1332.94 | 88.91 | 1062.3 |
150 | 3853.57 | 124.93 | 1070.78 |
200 | 17105.6 | 125.13 | 1064.79 |
250 | 30309.49 | 125.27 | 1073.5 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 1637.36 | 20.5 | 46.92 |
5 | 3126.01 | 27 | 175.65 |
25 | 33539.02 | 52.4 | 323.8 |
50 | 127678.39 | 51.08 | 335.75 |
100 | 277353.39 | 48.51 | 345.84 |
150 | 339462.09 | 48.51 | 342.77 |
200 | 336890.58 | 51.03 | 338.3 |
250 | 515613.11 | 49.4 | 349.04 |
Llama-3.1-8b-instruct Results
NIM Container: Llama-3.1-8b-Instruct
Version: 1.8.0
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 10.26 | 4.63 | 214.56 |
5 | 16.81 | 4.58 | 1076.88 |
25 | 36.34 | 5.19 | 4678.88 |
50 | 67.41 | 5.91 | 8018.23 |
100 | 119.97 | 7.57 | 12214.97 |
150 | 346.53 | 9.81 | 12988.89 |
200 | 465.18 | 12.65 | 13348.4 |
250 | 605.6 | 16.14 | 12964.72 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 13.84 | 4.56 | 218.86 |
5 | 30.88 | 4.81 | 1035.85 |
25 | 97.07 | 5.82 | 4263.49 |
50 | 135.26 | 7.06 | 7009.01 |
100 | 239.92 | 9.51 | 10379.22 |
150 | 376.95 | 13.34 | 11053.75 |
200 | 1664.34 | 14.96 | 12549.74 |
250 | 6550.46 | 16.97 | 12268.82 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 19.03 | 4.53 | 220.1 |
5 | 55.87 | 4.8 | 1031.3 |
25 | 268.42 | 5.84 | 4098.01 |
50 | 414.87 | 7.26 | 6514.66 |
100 | 615.16 | 10.04 | 9392.43 |
150 | 885.53 | 14.14 | 9982.74 |
200 | 1258.79 | 16.02 | 11553.34 |
250 | 1885.04 | 19.68 | 11527.61 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 77.45 | 4.62 | 209.96 |
5 | 279.92 | 5.42 | 838.03 |
25 | 724.14 | 9.67 | 2251.44 |
50 | 817.73 | 16.47 | 2765.71 |
100 | 915.15 | 29.67 | 3180.16 |
150 | 1080 | 43.35 | 3301.68 |
200 | 4361.35 | 50.41 | 3378.48 |
250 | 11717.42 | 50.53 | 3372 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 403.2 | 5.06 | 189.96 |
5 | 1141.22 | 7.39 | 627.76 |
25 | 2371.46 | 19.35 | 1217.04 |
50 | 11071.41 | 30.73 | 1330.57 |
100 | 78187.07 | 31.08 | 1336.09 |
150 | 145290.54 | 31.31 | 1337 |
200 | 211735.71 | 31.46 | 1338.24 |
250 | 278580.85 | 31.62 | 1337.66 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 27.36 | 6.52 | 152.84 |
5 | 95.23 | 7.55 | 654.5 |
25 | 332.13 | 9.34 | 2585.01 |
50 | 589.89 | 11.46 | 4152.55 |
100 | 922.41 | 15.99 | 5914.93 |
150 | 1151.54 | 20.92 | 6800.02 |
200 | 1273.42 | 25.68 | 7424.87 |
250 | 5362.51 | 28.09 | 7435.16 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 9.97 | 4.35 | 228.55 |
5 | 19.02 | 4.11 | 1194.26 |
25 | 48.56 | 4.61 | 5170.73 |
50 | 69.42 | 5.32 | 8823.15 |
100 | 124.16 | 7.08 | 12949.5 |
150 | 318.5 | 9.67 | 13298.39 |
200 | 432.64 | 12.69 | 13456.29 |
250 | 553.89 | 15.82 | 13392.14 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 12.88 | 4.25 | 235.17 |
5 | 30.26 | 4.3 | 1160.07 |
25 | 106.08 | 5.12 | 4832.39 |
50 | 146.47 | 6.04 | 8175.67 |
100 | 305.31 | 8.3 | 11824.76 |
150 | 474.13 | 11.71 | 12530.29 |
200 | 3858.17 | 12.7 | 13534.56 |
250 | 8119.64 | 15.01 | 12973.67 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 19.79 | 4.24 | 234.95 |
5 | 57.78 | 4.31 | 1144.52 |
25 | 194.5 | 5.25 | 4598.71 |
50 | 254.85 | 6.4 | 7513.85 |
100 | 389.08 | 9.11 | 10533.71 |
150 | 713.87 | 13.01 | 10930.77 |
200 | 1362.27 | 14.27 | 12625.72 |
250 | 2313.26 | 16.82 | 12826.05 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 80.16 | 4.36 | 221.63 |
5 | 249.53 | 4.92 | 924.18 |
25 | 443.73 | 9.3 | 2458.55 |
50 | 431.31 | 15.69 | 3024.47 |
100 | 455.52 | 28.05 | 3458.25 |
150 | 643.97 | 40.83 | 3566.03 |
200 | 730.31 | 51.75 | 3763.37 |
250 | 804.18 | 63.22 | 3860.48 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 408.66 | 4.65 | 205.97 |
5 | 1118.18 | 6.38 | 720.3 |
25 | 1890.26 | 16.21 | 1455.56 |
50 | 2911.51 | 30.64 | 1555.38 |
100 | 19767.13 | 46.32 | 1736.48 |
150 | 71795.92 | 46.88 | 1735.44 |
200 | 123202.74 | 47.16 | 1734.91 |
250 | 175197.57 | 47.31 | 1735.35 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 28.82 | 7.11 | 140.14 |
5 | 97.14 | 7.27 | 679.46 |
25 | 420.1 | 8.81 | 2711.49 |
50 | 738.5 | 9.57 | 4854.67 |
100 | 1125.65 | 13.45 | 6864.22 |
150 | 1075.56 | 22.17 | 6456.17 |
200 | 1399.1 | 24.22 | 7811.53 |
250 | 2004.62 | 26.37 | 8813.31 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 19.42 | 13.75 | 72.58 |
5 | 49.22 | 14.12 | 349.76 |
25 | 160.89 | 15.56 | 1535.29 |
50 | 348.94 | 16.93 | 2688.88 |
100 | 486 | 21.19 | 4250.73 |
150 | 627.64 | 25.65 | 5232.81 |
200 | 678.66 | 31.16 | 5812.48 |
250 | 706.39 | 36.66 | 6245.55 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 26.98 | 14.06 | 71.1 |
5 | 94.1 | 14.87 | 335.23 |
25 | 381.83 | 18.2 | 1360.15 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 44.61 | 13.97 | 71.42 |
5 | 173.24 | 14.91 | 331.7 |
25 | 617.75 | 18.59 | 1302.3 |
50 | 704.53 | 23.59 | 2060.1 |
100 | 777.75 | 34.1 | 2869.49 |
150 | 770.03 | 44.38 | 3324.96 |
200 | 894.28 | 55.33 | 3559.97 |
250 | 5047.2 | 62.7 | 3676.16 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 243.92 | 14.72 | 65.86 |
5 | 455.05 | 18.14 | 262.89 |
25 | 644.27 | 36.93 | 655.22 |
50 | 830.47 | 60.47 | 806 |
100 | 8988.57 | 94.33 | 889.32 |
150 | 36691.18 | 94.4 | 890.18 |
200 | 64263.2 | 94.29 | 893.23 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 1299.33 | 17 | 56.67 |
5 | 2691.78 | 26.37 | 180.37 |
25 | 27548.98 | 70.21 | 289.26 |
50 | 182179.84 | 70.65 | 289.79 |
100 | 492175.92 | 70.71 | 291.05 |
150 | 798991.51 | 70.76 | 291.4 |
200 | 1106990.55 | 70.85 | 291.4 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 80.75 | 22.05 | 45.23 |
5 | 152.54 | 23.58 | 210.91 |
25 | 269.84 | 31.04 | 799 |
50 | 312.86 | 40.48 | 1226.46 |
100 | 2752.07 | 58.84 | 1613.38 |
150 | 33137.67 | 58.88 | 1616.93 |
200 | 63646.49 | 58.9 | 1618.87 |
Version: 1.3.0
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 8.98 | 4.37 | 227.44 |
5 | 15.31 | 5.1 | 971.34 |
25 | 21.08 | 6.02 | 4100.33 |
50 | 50.68 | 7.19 | 6745.78 |
100 | 209.66 | 8.62 | 10369.56 |
150 | 398.54 | 12.49 | 10383.16 |
200 | 501.44 | 17.76 | 9884.77 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 13.89 | 4.77 | 209.52 |
5 | 27.22 | 5.45 | 915.96 |
25 | 110.61 | 6.32 | 3924.39 |
50 | 112.02 | 8.3 | 5979.38 |
100 | 245.7 | 11.12 | 8878.96 |
150 | 5217.57 | 13.78 | 8955.7 |
200 | 9112.64 | 18.27 | 8443.89 |
250 | 21097.73 | 19.26 | 8256.58 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 19.97 | 4.77 | 208.88 |
5 | 57.87 | 5.45 | 908.9 |
25 | 170.91 | 6.57 | 3710.08 |
50 | 214.79 | 8.74 | 5582.41 |
100 | 373.73 | 12.41 | 7819.41 |
150 | 890.01 | 15.84 | 8935.06 |
200 | 3231.57 | 18.31 | 9151.55 |
250 | 6469.67 | 21.75 | 8725.87 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 82.87 | 6.04 | 161.51 |
5 | 249.06 | 7.04 | 664.39 |
25 | 372.59 | 11.61 | 2024.67 |
50 | 415.72 | 19.99 | 2398.11 |
100 | 501.51 | 33.71 | 2844.33 |
150 | 646.08 | 47.58 | 3047.56 |
200 | 6830.5 | 51.83 | 3008.29 |
250 | 14795.72 | 51.74 | 3030.76 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 443.88 | 11.3 | 86.79 |
5 | 1011.39 | 12.71 | 377.88 |
25 | 1432.91 | 21.41 | 1099.79 |
50 | 16505.76 | 32.16 | 1116.23 |
100 | 89292.14 | 32.21 | 1190.64 |
150 | 153371.08 | 32.17 | 1195.86 |
200 | 208680.03 | 32.14 | 1200.69 |
250 | 281213.01 | 32.19 | 1213.79 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 11.96 | 6.54 | 152.24 |
5 | 27.18 | 7.39 | 667.60 |
25 | 34.67 | 8.60 | 2858.90 |
50 | 56.65 | 10.04 | 4858.86 |
100 | 168.60 | 12.57 | 7478.74 |
150 | 479.48 | 14.66 | 8811.28 |
200 | 755.88 | 18.81 | 8864.97 |
250 | 957.59 | 24.77 | 8439.91 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 19.89 | 6.94 | 143.93 |
5 | 51.19 | 7.70 | 647.12 |
25 | 202.87 | 9.42 | 2625.41 |
50 | 329.35 | 11.62 | 4242.00 |
100 | 552.68 | 15.67 | 6268.04 |
150 | 782.65 | 20.32 | 7232.18 |
200 | 7996.68 | 21.76 | 7666.88 |
250 | 20815.08 | 21.96 | 7610.47 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 29.73 | 6.93 | 143.87 |
5 | 96.74 | 7.72 | 640.26 |
25 | 386.45 | 9.50 | 2530.05 |
50 | 625.88 | 12.03 | 3951.34 |
100 | 724.55 | 17.79 | 5326.15 |
150 | 1126.07 | 22.88 | 6241.50 |
200 | 1691.82 | 27.89 | 6548.23 |
250 | 5664.63 | 29.12 | 7150.32 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 128.32 | 8.14 | 119.30 |
5 | 503.39 | 8.88 | 506.79 |
25 | 1243.85 | 17.36 | 1259.32 |
50 | 1273.57 | 29.38 | 1561.19 |
100 | 6941.19 | 44.67 | 1689.99 |
150 | 21179.86 | 44.66 | 1690.45 |
200 | 35001.67 | 44.54 | 1694.93 |
250 | 49204.54 | 44.48 | 1703.87 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 637.84 | 13.16 | 74.21 |
5 | 1881.45 | 13.50 | 346.33 |
25 | 19476.78 | 31.76 | 579.50 |
50 | 92969.97 | 31.81 | 558.51 |
100 | 216580.75 | 31.83 | 582.54 |
150 | 301171.32 | 31.79 | 583.14 |
200 | 348383.77 | 31.81 | 582.88 |
250 | 484376.95 | 31.85 | 584.58 |
Llama-3.1-70b-instruct Results
NIM Container: llama-3.1-70b-instruct
Version: 1.3.0
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 25.05 | 13.21 | 75.36 |
5 | 64.82 | 12.64 | 387.55 |
25 | 146.11 | 15.31 | 1564.9 |
50 | 169.61 | 17.42 | 2743.6 |
100 | 235.17 | 23.28 | 4092.34 |
150 | 511.21 | 27.73 | 4962.78 |
200 | 582.07 | 31.22 | 5855.9 |
250 | 718.98 | 35.25 | 6429.95 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 38.64 | 14.24 | 70.14 |
5 | 109.06 | 13.68 | 364.24 |
25 | 393.58 | 15.86 | 1557.21 |
50 | 564.17 | 17.14 | 2868.22 |
100 | 640.42 | 22.64 | 4350.2 |
150 | 792.84 | 28.49 | 5183.08 |
200 | 1417.35 | 31.62 | 6168.81 |
250 | 2045.69 | 35.75 | 6511.88 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 59.89 | 14.24 | 69.98 |
5 | 180.35 | 13.72 | 360.12 |
25 | 571.16 | 16.38 | 1475.28 |
50 | 843.45 | 18.15 | 2631.13 |
100 | 740 | 25.55 | 3794.76 |
150 | 952.07 | 32.31 | 4493.57 |
200 | 980.89 | 37.92 | 5114.09 |
250 | 2632.73 | 41.92 | 5497.21 |
Concurrency | TTFT (ms) | ITL (ms) | Throughput (Tokens/s) |
---|---|---|---|
1 | 1018.5 | 30.32 | 32.45 |
5 | 2116.84 | 31.29 | 153.71 |
25 | 3987.46 | 43.16 | 541.65 |
50 | 7012.87 | 56.95 | 793.95 |
100 | 93862.88 | 61.72 | 815.75 |
150 | 175864.67 | 61.73 | 815.38 |
200 | 240432.76 | 61.73 | 815.52 |
250 | 335844.98 | 61.75 | 828.23 |
Hardware Specifications
NVIDIA H100
Motherboard Model | NVIDIA DGX H100 |
Server Model | NVIDIA DGX H100 |
Number of Nodes | 1 |
CPU Information | Platinum 8480CL @ 3.8GHz Turbo (Sapphire Rapids) HT On |
Number of CPU sockets enabled | 2 |
Number of CPU threads enabled | 224 |
GPU Information | H100 80GB HBM3 (GH100) |
Driver Information | 570.124.06 (r570_00) |
GPU Core Clock (MHz) | 1980 |
GPU Boost Clock (MHz) | 1980 |
GPU Memory Clock (MHz) | 2619 |
NVIDIA H200
Motherboard Model | NVIDIA DGX H200 |
Server Model | NVIDIA DGX H200 |
Number of Nodes | 1 |
CPU Information | Platinum 8480CL @ 3.8GHz Turbo (Sapphire Rapids) HT On |
Number of CPU sockets enabled | 2 |
Number of CPU threads enabled | 224 |
GPU Information | H200 141GB HBM3e (GH100) |
Driver Information | 570.124.06 (r570_00) |
GPU Core Clock (MHz) | 1980 |
GPU Boost Clock (MHz) | 1980 |
GPU Memory Clock (MHz) | 2619 |
NVIDIA A100
Motherboard Model | NVIDIA DGX A100 |
Server Model | NVIDIA DGX A100 |
Number of Nodes | 1 |
CPU Information | AMD EPYC 7742 @ 2.2GHz 3.4GHz Turbo (Rome) HT On |
Number of CPU sockets enabled | 2 |
Number of CPU threads enabled | 256 |
GPU Information | A100 SXM4 80GB (GA100) |
Driver Information | 570.124.06 (r570_00) |
GPU Core Clock (MHz) | 1155 |
GPU Boost Clock (MHz) | 1401 |
GPU Memory Clock (MHz) | 1593 |
NVIDIA L40S
Motherboard Model | Supermicro X13DEG-OA |
Server Model | Supermicro SYS-521GE-TNRT |
Number of Nodes | 1 |
CPU Information | Platinum 8570 @ 2.1GHz 4.0GHz Turbo (Emerald Rapids) HT On |
Number of CPU sockets enabled | 2 |
Number of CPU threads enabled | 224 |
GPU Information | L40S (AD102) |
Driver Information | 570.124.06 (r570_00) |
GPU Core Clock (MHz) | 2520 |
GPU Boost Clock (MHz) | 2520 |
GPU Memory Clock (MHz) | 9001 |