Inference performance was measured on:
- 1-8 × A100 80GB SXM4
- 1-8 × H100 80GB HBM3
Configuration 1: Chatbot Conversation use case
- batch size: 1-8
- input token length: 128
- output token length: 20
Average Latency, Average Throughput, and Model Size
Model size | Batch Size | Avg Latency A100 80GB SXM4 [ms] | Avg Latency H100 80GB HBM3 [ms] | Avg Throughput A100 80GB SXM4 [sentences/s] | Avg Throughput H100 80GB HBM3 [sentences/s] | TP | PP | GPUs |
---|---|---|---|---|---|---|---|---|
Llama-2-7B | 1 | 234.0 | 153.8 | 4.3 | 6.5 | 1 | 1 | 1 |
Llama-2-7B | 2 | 249.6 | 160.8 | 8.0 | 12.4 | 1 | 1 | 1 |
Llama-2-7B | 4 | 272.6 | 174.8 | 14.7 | 22.9 | 1 | 1 | 1 |
Llama-2-7B | 8 | 329.5 | 199.2 | 24.3 | 40.2 | 1 | 1 | 1 |
Llama-2-7B | 1 | 171.7 | 128.6 | 5.8 | 7.8 | 2 | 1 | 2 |
Llama-2-7B | 2 | 180.3 | 132.0 | 11.1 | 15.1 | 2 | 1 | 2 |
Llama-2-7B | 4 | 202.9 | 137.9 | 19.7 | 29.0 | 2 | 1 | 2 |
Llama-2-7B | 8 | 237.6 | 156.2 | 33.7 | 51.2 | 2 | 1 | 2 |
Llama-2-7B | 1 | 143.7 | 107.4 | 7.0 | 9.3 | 4 | 1 | 4 |
Llama-2-7B | 2 | 149.9 | 114.0 | 13.3 | 17.5 | 4 | 1 | 4 |
Llama-2-7B | 4 | 165.2 | 120.4 | 24.2 | 33.2 | 4 | 1 | 4 |
Llama-2-7B | 8 | 196.4 | 134.6 | 40.7 | 59.5 | 4 | 1 | 4 |
Llama-2-7B | 1 | 136.5 | 97.6 | 7.3 | 10.2 | 8 | 1 | 8 |
Llama-2-7B | 2 | 143.3 | 109.1 | 14.0 | 18.3 | 8 | 1 | 8 |
Llama-2-7B | 4 | 158.6 | 116.2 | 25.2 | 34.4 | 8 | 1 | 8 |
Llama-2-7B | 8 | 181.9 | 129.1 | 44.0 | 62.0 | 8 | 1 | 8 |
Llama-2-13B | 1 | 142.5 | 86.4 | 7.0 | 11.6 | 1 | 1 | 1 |
Llama-2-13B | 2 | 163.3 | 94.2 | 12.2 | 21.2 | 1 | 1 | 1 |
Llama-2-13B | 4 | 198.5 | 117.8 | 20.2 | 34.0 | 1 | 1 | 1 |
Llama-2-13B | 8 | 282.6 | 146.7 | 28.3 | 54.5 | 1 | 1 | 1 |
Llama-2-13B | 1 | 100.4 | 69.7 | 10.0 | 14.3 | 2 | 1 | 2 |
Llama-2-13B | 2 | 112.2 | 73.7 | 17.8 | 27.1 | 2 | 1 | 2 |
Llama-2-13B | 4 | 320.0 | 88.3 | 12.5 | 45.3 | 2 | 1 | 2 |
Llama-2-13B | 8 | 188.6 | 109.8 | 42.4 | 72.8 | 2 | 1 | 2 |
Llama-2-13B | 1 | 207.8 | 61.4 | 4.8 | 16.3 | 4 | 1 | 4 |
Llama-2-13B | 2 | 84.6 | 62.0 | 23.6 | 32.3 | 4 | 1 | 4 |
Llama-2-13B | 4 | 102.3 | 72.0 | 39.1 | 55.6 | 4 | 1 | 4 |
Llama-2-13B | 8 | 143.0 | 88.6 | 56.0 | 90.3 | 4 | 1 | 4 |
Llama-2-13B | 1 | 72.2 | 54.3 | 13.9 | 18.4 | 8 | 1 | 8 |
Llama-2-13B | 2 | 76.3 | 59.3 | 26.2 | 33.7 | 8 | 1 | 8 |
Llama-2-13B | 4 | 212.0 | 157.3 | 18.9 | 25.4 | 8 | 1 | 8 |
Llama-2-13B | 8 | 242.6 | 81.8 | 33.0 | 97.8 | 8 | 1 | 8 |
Llama-2-70B | 1 | 1,108.3 | 652.7 | 0.9 | 1.5 | 2 | 1 | 2 |
Llama-2-70B | 2 | 1,156.6 | 668.2 | 1.7 | 3.0 | 2 | 1 | 2 |
Llama-2-70B | 4 | 1,272.9 | 742.1 | 3.1 | 5.4 | 2 | 1 | 2 |
Llama-2-70B | 8 | 1,520.3 | 818.2 | 5.3 | 9.8 | 2 | 1 | 2 |
Llama-2-70B | 1 | 673.4 | 433.3 | 1.5 | 2.3 | 4 | 1 | 4 |
Llama-2-70B | 2 | 715.3 | 446.9 | 2.8 | 4.5 | 4 | 1 | 4 |
Llama-2-70B | 4 | 784.8 | 487.3 | 5.1 | 8.2 | 4 | 1 | 4 |
Llama-2-70B | 8 | 941.4 | 537.3 | 8.5 | 14.9 | 4 | 1 | 4 |
Llama-2-70B | 1 | 504.1 | 343.1 | 2.0 | 2.9 | 8 | 1 | 8 |
Llama-2-70B | 2 | 542.0 | 359.8 | 3.7 | 5.6 | 8 | 1 | 8 |
Llama-2-70B | 4 | 586.6 | 386.6 | 6.8 | 10.3 | 8 | 1 | 8 |
Llama-2-70B | 8 | 695.6 | 428.2 | 11.5 | 18.7 | 8 | 1 | 8 |
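As a sanity check on the figures above, average throughput follows directly from batch size and average latency: throughput ≈ batch_size / (latency / 1000). The short Python sketch below recomputes throughput for a few rows of the chatbot table; the row values are copied from the table above, and small differences come from rounding in the reported numbers.

```python
# Recompute throughput [sentences/s] from batch size and average latency [ms].
# Rows copied from the Configuration 1 (chatbot) table above; minor deviations
# are due to rounding in the reported values.
rows = [
    # (configuration, batch_size, avg_latency_ms, reported_throughput)
    ("Llama-2-7B,  TP=1, A100", 8, 329.5, 24.3),
    ("Llama-2-7B,  TP=1, H100", 8, 199.2, 40.2),
    ("Llama-2-70B, TP=8, H100", 8, 428.2, 18.7),
]

for name, batch, latency_ms, reported in rows:
    throughput = batch / (latency_ms / 1000.0)  # sentences per second
    print(f"{name}: computed {throughput:.1f} vs reported {reported} sentences/s")
```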
Configuration 2: Translation / Style Transfer use case
- batch size: 1-8
- input token length: 200
- output token length: 200
Average Latency, Average Throughput, and Model Size
Model size | Batch Size | Avg Latency A100 80GB SXM4 [ms] | Avg Latency H100 80GB HBM3 [ms] | Avg Throughput A100 80GB SXM4 [sentences/s] | Avg Throughput H100 80GB HBM3 [sentences/s] | TP | PP | GPUs |
---|---|---|---|---|---|---|---|---|
Llama-2-7B | 1 | 2,189.6 | 1,440.0 | 0.5 | 0.7 | 1 | 1 | 1 |
Llama-2-7B | 2 | 2,227.9 | 1,463.8 | 0.9 | 1.4 | 1 | 1 | 1 |
Llama-2-7B | 4 | 2,386.5 | 1,509.7 | 1.7 | 2.7 | 1 | 1 | 1 |
Llama-2-7B | 8 | 2,611.4 | 1,653.7 | 3.1 | 4.8 | 1 | 1 | 1 |
Llama-2-7B | 1 | 1,544.2 | 1,143.2 | 0.6 | 0.9 | 2 | 1 | 2 |
Llama-2-7B | 2 | 1,588.9 | 1,163.0 | 1.3 | 1.7 | 2 | 1 | 2 |
Llama-2-7B | 4 | 1,649.4 | 1,175.1 | 2.4 | 3.4 | 2 | 1 | 2 |
Llama-2-7B | 8 | 1,841.0 | 1,238.2 | 4.3 | 6.5 | 2 | 1 | 2 |
Llama-2-7B | 1 | 1,280.0 | 923.8 | 0.8 | 1.1 | 4 | 1 | 4 |
Llama-2-7B | 2 | 1,313.0 | 991.0 | 1.5 | 2.0 | 4 | 1 | 4 |
Llama-2-7B | 4 | 1,383.5 | 1,017.2 | 2.9 | 3.9 | 4 | 1 | 4 |
Llama-2-7B | 8 | 1,463.5 | 1,070.9 | 5.5 | 7.5 | 4 | 1 | 4 |
Llama-2-7B | 1 | 1,187.4 | 827.6 | 0.8 | 1.2 | 8 | 1 | 8 |
Llama-2-7B | 2 | 1,248.4 | 936.5 | 1.6 | 2.1 | 8 | 1 | 8 |
Llama-2-7B | 4 | 1,329.7 | 975.4 | 3.0 | 4.1 | 8 | 1 | 8 |
Llama-2-7B | 8 | 1,416.6 | 1,020.7 | 5.6 | 7.8 | 8 | 1 | 8 |
Llama-2-13B | 1 | 3,884.5 | 2,396.8 | 0.3 | 0.4 | 1 | 1 | 1 |
Llama-2-13B | 2 | 4,020.7 | 2,413.6 | 0.5 | 0.8 | 1 | 1 | 1 |
Llama-2-13B | 4 | 4,250.9 | 2,559.8 | 0.9 | 1.6 | 1 | 1 | 1 |
Llama-2-13B | 8 | 4,590.2 | 2,722.8 | 1.7 | 2.9 | 1 | 1 | 1 |
Llama-2-13B | 1 | 2,499.1 | 1,717.2 | 0.4 | 0.6 | 2 | 1 | 2 |
Llama-2-13B | 2 | 2,620.4 | 1,746.2 | 0.8 | 1.1 | 2 | 1 | 2 |
Llama-2-13B | 4 | 2,699.3 | 1,778.3 | 1.5 | 2.2 | 2 | 1 | 2 |
Llama-2-13B | 8 | 2,967.1 | 1,944.8 | 2.7 | 4.1 | 2 | 1 | 2 |
Llama-2-13B | 1 | 1,894.0 | 1,431.2 | 0.5 | 0.7 | 4 | 1 | 4 |
Llama-2-13B | 2 | 1,945.1 | 1,407.0 | 1.0 | 1.4 | 4 | 1 | 4 |
Llama-2-13B | 4 | 2,047.3 | 1,451.8 | 2.0 | 2.8 | 4 | 1 | 4 |
Llama-2-13B | 8 | 2,117.4 | 1,498.0 | 3.8 | 5.3 | 4 | 1 | 4 |
Llama-2-13B | 1 | 1,692.8 | 1,201.3 | 0.6 | 0.8 | 8 | 1 | 8 |
Llama-2-13B | 2 | 1,735.4 | 1,304.1 | 1.2 | 1.5 | 8 | 1 | 8 |
Llama-2-13B | 4 | 1,836.8 | 1,361.6 | 2.2 | 2.9 | 8 | 1 | 8 |
Llama-2-13B | 8 | 1,926.9 | 1,420.0 | 4.2 | 5.6 | 8 | 1 | 8 |
Llama-2-70B | 1 | 10,500.4 | 6,267.3 | 0.1 | 0.2 | 2 | 1 | 2 |
Llama-2-70B | 2 | 10,695.1 | 6,288.4 | 0.2 | 0.3 | 2 | 1 | 2 |
Llama-2-70B | 4 | 11,151.1 | 6,401.6 | 0.4 | 0.6 | 2 | 1 | 2 |
Llama-2-70B | 8 | 11,858.6 | 6,731.0 | 0.7 | 1.2 | 2 | 1 | 2 |
Llama-2-70B | 1 | 6,403.0 | 4,115.6 | 0.2 | 0.2 | 4 | 1 | 4 |
Llama-2-70B | 2 | 6,604.8 | 4,146.6 | 0.3 | 0.5 | 4 | 1 | 4 |
Llama-2-70B | 4 | 6,833.8 | 4,241.9 | 0.6 | 0.9 | 4 | 1 | 4 |
Llama-2-70B | 8 | 7,394.9 | 4,367.1 | 1.1 | 1.8 | 4 | 1 | 4 |
Llama-2-70B | 1 | 4,734.8 | 3,202.1 | 0.2 | 0.3 | 8 | 1 | 8 |
Llama-2-70B | 2 | 4,995.7 | 3,311.5 | 0.4 | 0.6 | 8 | 1 | 8 |
Llama-2-70B | 4 | 5,110.5 | 3,379.7 | 0.8 | 1.2 | 8 | 1 | 8 |
Llama-2-70B | 8 | 5,577.7 | 3,450.4 | 1.4 | 2.3 | 8 | 1 | 8 |
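A quick way to compare the two GPU generations from these tables is the latency ratio A100/H100 at matched model, parallelism, and batch size. The sketch below computes that ratio for a few Configuration 2 rows; the values are copied from the table above, and the ratio is only a rough single-number summary (it ignores batch-size scaling and memory headroom).

```python
# H100 speedup over A100, taken as the ratio of average latencies at matched
# model / TP / batch size. Values copied from the Configuration 2
# (translation / style transfer) table above.
rows = [
    # (model, tp, batch_size, a100_latency_ms, h100_latency_ms)
    ("Llama-2-7B",  1, 1,  2189.6, 1440.0),
    ("Llama-2-13B", 1, 8,  4590.2, 2722.8),
    ("Llama-2-70B", 2, 8, 11858.6, 6731.0),
]

for model, tp, batch, a100_ms, h100_ms in rows:
    speedup = a100_ms / h100_ms
    print(f"{model} (TP={tp}, batch={batch}): H100 latency is {speedup:.2f}x lower than A100")
```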