Llama-2 Results
Inference Performance
Inference performance was measured on:

- 1-8 × A100 80GB SXM4
- 1-8 × H100 80GB HBM3
Configuration 1: Chatbot Conversation use case
- batch size: 1-8
- input token length: 128
- output token length: 20
TP and PP are the tensor- and pipeline-parallel sizes; the GPUs column is the total GPU count (TP × PP).

| Model | Batch Size | Avg Latency, A100 80GB SXM4 [ms] | Avg Latency, H100 80GB HBM3 [ms] | Avg Throughput, A100 80GB SXM4 [sentences/s] | Avg Throughput, H100 80GB HBM3 [sentences/s] | TP | PP | GPUs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Llama-2-7B | 1 | 234.0 | 153.8 | 4.3 | 6.5 | 1 | 1 | 1 |
| Llama-2-7B | 2 | 249.6 | 160.8 | 8.0 | 12.4 | 1 | 1 | 1 |
| Llama-2-7B | 4 | 272.6 | 174.8 | 14.7 | 22.9 | 1 | 1 | 1 |
| Llama-2-7B | 8 | 329.5 | 199.2 | 24.3 | 40.2 | 1 | 1 | 1 |
| Llama-2-7B | 1 | 171.7 | 128.6 | 5.8 | 7.8 | 2 | 1 | 2 |
| Llama-2-7B | 2 | 180.3 | 132.0 | 11.1 | 15.1 | 2 | 1 | 2 |
| Llama-2-7B | 4 | 202.9 | 137.9 | 19.7 | 29.0 | 2 | 1 | 2 |
| Llama-2-7B | 8 | 237.6 | 156.2 | 33.7 | 51.2 | 2 | 1 | 2 |
| Llama-2-7B | 1 | 143.7 | 107.4 | 7.0 | 9.3 | 4 | 1 | 4 |
| Llama-2-7B | 2 | 149.9 | 114.0 | 13.3 | 17.5 | 4 | 1 | 4 |
| Llama-2-7B | 4 | 165.2 | 120.4 | 24.2 | 33.2 | 4 | 1 | 4 |
| Llama-2-7B | 8 | 196.4 | 134.6 | 40.7 | 59.5 | 4 | 1 | 4 |
| Llama-2-7B | 1 | 136.5 | 97.6 | 7.3 | 10.2 | 8 | 1 | 8 |
| Llama-2-7B | 2 | 143.3 | 109.1 | 14.0 | 18.3 | 8 | 1 | 8 |
| Llama-2-7B | 4 | 158.6 | 116.2 | 25.2 | 34.4 | 8 | 1 | 8 |
| Llama-2-7B | 8 | 181.9 | 129.1 | 44.0 | 62.0 | 8 | 1 | 8 |
| Llama-2-13B | 1 | 142.5 | 86.4 | 7.0 | 11.6 | 1 | 1 | 1 |
| Llama-2-13B | 2 | 163.3 | 94.2 | 12.2 | 21.2 | 1 | 1 | 1 |
| Llama-2-13B | 4 | 198.5 | 117.8 | 20.2 | 34.0 | 1 | 1 | 1 |
| Llama-2-13B | 8 | 282.6 | 146.7 | 28.3 | 54.5 | 1 | 1 | 1 |
| Llama-2-13B | 1 | 100.4 | 69.7 | 10.0 | 14.3 | 2 | 1 | 2 |
| Llama-2-13B | 2 | 112.2 | 73.7 | 17.8 | 27.1 | 2 | 1 | 2 |
| Llama-2-13B | 4 | 320.0 | 88.3 | 12.5 | 45.3 | 2 | 1 | 2 |
| Llama-2-13B | 8 | 188.6 | 109.8 | 42.4 | 72.8 | 2 | 1 | 2 |
| Llama-2-13B | 1 | 207.8 | 61.4 | 4.8 | 16.3 | 4 | 1 | 4 |
| Llama-2-13B | 2 | 84.6 | 62.0 | 23.6 | 32.3 | 4 | 1 | 4 |
| Llama-2-13B | 4 | 102.3 | 72.0 | 39.1 | 55.6 | 4 | 1 | 4 |
| Llama-2-13B | 8 | 143.0 | 88.6 | 56.0 | 90.3 | 4 | 1 | 4 |
| Llama-2-13B | 1 | 72.2 | 54.3 | 13.9 | 18.4 | 8 | 1 | 8 |
| Llama-2-13B | 2 | 76.3 | 59.3 | 26.2 | 33.7 | 8 | 1 | 8 |
| Llama-2-13B | 4 | 212.0 | 157.3 | 18.9 | 25.4 | 8 | 1 | 8 |
| Llama-2-13B | 8 | 242.6 | 81.8 | 33.0 | 97.8 | 8 | 1 | 8 |
| Llama-2-70B | 1 | 1,108.3 | 652.7 | 0.9 | 1.5 | 2 | 1 | 2 |
| Llama-2-70B | 2 | 1,156.6 | 668.2 | 1.7 | 3.0 | 2 | 1 | 2 |
| Llama-2-70B | 4 | 1,272.9 | 742.1 | 3.1 | 5.4 | 2 | 1 | 2 |
| Llama-2-70B | 8 | 1,520.3 | 818.2 | 5.3 | 9.8 | 2 | 1 | 2 |
| Llama-2-70B | 1 | 673.4 | 433.3 | 1.5 | 2.3 | 4 | 1 | 4 |
| Llama-2-70B | 2 | 715.3 | 446.9 | 2.8 | 4.5 | 4 | 1 | 4 |
| Llama-2-70B | 4 | 784.8 | 487.3 | 5.1 | 8.2 | 4 | 1 | 4 |
| Llama-2-70B | 8 | 941.4 | 537.3 | 8.5 | 14.9 | 4 | 1 | 4 |
| Llama-2-70B | 1 | 504.1 | 343.1 | 2.0 | 2.9 | 8 | 1 | 8 |
| Llama-2-70B | 2 | 542.0 | 359.8 | 3.7 | 5.6 | 8 | 1 | 8 |
| Llama-2-70B | 4 | 586.6 | 386.6 | 6.8 | 10.3 | 8 | 1 | 8 |
| Llama-2-70B | 8 | 695.6 | 428.2 | 11.5 | 18.7 | 8 | 1 | 8 |
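The latency and throughput columns are mutually consistent: reported throughput is batch size divided by the average end-to-end latency. As a sanity check, the minimal Python sketch below (not part of any benchmark harness; the values are copied from the first four Llama-2-7B rows above, A100, TP=1) re-derives the throughput column:

```python
# Throughput check: sentences/s = batch_size / (avg latency in seconds).
# Rows are (batch_size, avg_latency_ms, reported_throughput) from the
# chatbot table above (Llama-2-7B, A100 80GB SXM4, TP=1, PP=1).
rows = [
    (1, 234.0, 4.3),
    (2, 249.6, 8.0),
    (4, 272.6, 14.7),
    (8, 329.5, 24.3),
]

for batch_size, latency_ms, reported in rows:
    derived = batch_size / (latency_ms / 1000.0)  # sentences per second
    print(f"batch={batch_size}: derived={derived:.1f}, reported={reported}")
```

The derived values match the table to the reported precision, which also explains why throughput grows sublinearly with batch size: latency rises as the batch gets larger.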
Configuration 2: Translation / Style Transfer use case
- batch size: 1-8
- input token length: 200
- output token length: 200
| Model | Batch Size | Avg Latency, A100 80GB SXM4 [ms] | Avg Latency, H100 80GB HBM3 [ms] | Avg Throughput, A100 80GB SXM4 [sentences/s] | Avg Throughput, H100 80GB HBM3 [sentences/s] | TP | PP | GPUs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Llama-2-7B | 1 | 2,189.6 | 1,440.0 | 0.5 | 0.7 | 1 | 1 | 1 |
| Llama-2-7B | 2 | 2,227.9 | 1,463.8 | 0.9 | 1.4 | 1 | 1 | 1 |
| Llama-2-7B | 4 | 2,386.5 | 1,509.7 | 1.7 | 2.7 | 1 | 1 | 1 |
| Llama-2-7B | 8 | 2,611.4 | 1,653.7 | 3.1 | 4.8 | 1 | 1 | 1 |
| Llama-2-7B | 1 | 1,544.2 | 1,143.2 | 0.6 | 0.9 | 2 | 1 | 2 |
| Llama-2-7B | 2 | 1,588.9 | 1,163.0 | 1.3 | 1.7 | 2 | 1 | 2 |
| Llama-2-7B | 4 | 1,649.4 | 1,175.1 | 2.4 | 3.4 | 2 | 1 | 2 |
| Llama-2-7B | 8 | 1,841.0 | 1,238.2 | 4.3 | 6.5 | 2 | 1 | 2 |
| Llama-2-7B | 1 | 1,280.0 | 923.8 | 0.8 | 1.1 | 4 | 1 | 4 |
| Llama-2-7B | 2 | 1,313.0 | 991.0 | 1.5 | 2.0 | 4 | 1 | 4 |
| Llama-2-7B | 4 | 1,383.5 | 1,017.2 | 2.9 | 3.9 | 4 | 1 | 4 |
| Llama-2-7B | 8 | 1,463.5 | 1,070.9 | 5.5 | 7.5 | 4 | 1 | 4 |
| Llama-2-7B | 1 | 1,187.4 | 827.6 | 0.8 | 1.2 | 8 | 1 | 8 |
| Llama-2-7B | 2 | 1,248.4 | 936.5 | 1.6 | 2.1 | 8 | 1 | 8 |
| Llama-2-7B | 4 | 1,329.7 | 975.4 | 3.0 | 4.1 | 8 | 1 | 8 |
| Llama-2-7B | 8 | 1,416.6 | 1,020.7 | 5.6 | 7.8 | 8 | 1 | 8 |
| Llama-2-13B | 1 | 3,884.5 | 2,396.8 | 0.3 | 0.4 | 1 | 1 | 1 |
| Llama-2-13B | 2 | 4,020.7 | 2,413.6 | 0.5 | 0.8 | 1 | 1 | 1 |
| Llama-2-13B | 4 | 4,250.9 | 2,559.8 | 0.9 | 1.6 | 1 | 1 | 1 |
| Llama-2-13B | 8 | 4,590.2 | 2,722.8 | 1.7 | 2.9 | 1 | 1 | 1 |
| Llama-2-13B | 1 | 2,499.1 | 1,717.2 | 0.4 | 0.6 | 2 | 1 | 2 |
| Llama-2-13B | 2 | 2,620.4 | 1,746.2 | 0.8 | 1.1 | 2 | 1 | 2 |
| Llama-2-13B | 4 | 2,699.3 | 1,778.3 | 1.5 | 2.2 | 2 | 1 | 2 |
| Llama-2-13B | 8 | 2,967.1 | 1,944.8 | 2.7 | 4.1 | 2 | 1 | 2 |
| Llama-2-13B | 1 | 1,894.0 | 1,431.2 | 0.5 | 0.7 | 4 | 1 | 4 |
| Llama-2-13B | 2 | 1,945.1 | 1,407.0 | 1.0 | 1.4 | 4 | 1 | 4 |
| Llama-2-13B | 4 | 2,047.3 | 1,451.8 | 2.0 | 2.8 | 4 | 1 | 4 |
| Llama-2-13B | 8 | 2,117.4 | 1,498.0 | 3.8 | 5.3 | 4 | 1 | 4 |
| Llama-2-13B | 1 | 1,692.8 | 1,201.3 | 0.6 | 0.8 | 8 | 1 | 8 |
| Llama-2-13B | 2 | 1,735.4 | 1,304.1 | 1.2 | 1.5 | 8 | 1 | 8 |
| Llama-2-13B | 4 | 1,836.8 | 1,361.6 | 2.2 | 2.9 | 8 | 1 | 8 |
| Llama-2-13B | 8 | 1,926.9 | 1,420.0 | 4.2 | 5.6 | 8 | 1 | 8 |
| Llama-2-70B | 1 | 10,500.4 | 6,267.3 | 0.1 | 0.2 | 2 | 1 | 2 |
| Llama-2-70B | 2 | 10,695.1 | 6,288.4 | 0.2 | 0.3 | 2 | 1 | 2 |
| Llama-2-70B | 4 | 11,151.1 | 6,401.6 | 0.4 | 0.6 | 2 | 1 | 2 |
| Llama-2-70B | 8 | 11,858.6 | 6,731.0 | 0.7 | 1.2 | 2 | 1 | 2 |
| Llama-2-70B | 1 | 6,403.0 | 4,115.6 | 0.2 | 0.2 | 4 | 1 | 4 |
| Llama-2-70B | 2 | 6,604.8 | 4,146.6 | 0.3 | 0.5 | 4 | 1 | 4 |
| Llama-2-70B | 4 | 6,833.8 | 4,241.9 | 0.6 | 0.9 | 4 | 1 | 4 |
| Llama-2-70B | 8 | 7,394.9 | 4,367.1 | 1.1 | 1.8 | 4 | 1 | 4 |
| Llama-2-70B | 1 | 4,734.8 | 3,202.1 | 0.2 | 0.3 | 8 | 1 | 8 |
| Llama-2-70B | 2 | 4,995.7 | 3,311.5 | 0.4 | 0.6 | 8 | 1 | 8 |
| Llama-2-70B | 4 | 5,110.5 | 3,379.7 | 0.8 | 1.2 | 8 | 1 | 8 |
| Llama-2-70B | 8 | 5,577.7 | 3,450.4 | 1.4 | 2.3 | 8 | 1 | 8 |
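Because the two configurations generate different output lengths (20 vs. 200 tokens), their end-to-end latencies are not directly comparable; dividing latency by output length gives a rough per-output-token cost. A minimal sketch, assuming the one-time prompt-processing cost can be ignored (so the figure is only an approximation; values are copied from the batch-1, single-GPU H100 Llama-2-7B rows above):

```python
# Rough per-output-token generation latency, ignoring prompt processing.
# (config, output_tokens, avg_latency_ms) for Llama-2-7B, batch=1, TP=1, H100.
cases = [
    ("chatbot (20 output tokens)", 20, 153.8),
    ("translation (200 output tokens)", 200, 1440.0),
]

for config, out_tokens, latency_ms in cases:
    per_token_ms = latency_ms / out_tokens
    print(f"{config}: ~{per_token_ms:.1f} ms per output token")
```

Both configurations land in the same ballpark (roughly 7-8 ms per output token here), consistent with generation latency being dominated by the number of output tokens produced.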