Llama 3.1 Nemotron Safety Guard Multilingual 8B V1 NIM Performance
NVIDIA used the genai-perf tool to benchmark the performance of the microservice.
You can find more information about the tool in A Comprehensive Guide to NIM LLM Latency-Throughput Benchmarking.
The performance metrics were generated with the following configuration:
- NVIDIA A100 80GB SXM
- Driver 575.57.08
- CUDA 12.9
- BF16 precision
Tabs are organized by input and output sequence length. For example, the 500_10 tab shows the metrics for a benchmark that uses a 500-token input sequence and a 10-token output sequence.
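The tab naming convention can be read mechanically. The following is a minimal sketch, assuming every tab label follows the `<input>_<output>` pattern of the 500_10 example. The `parse_tab_label` helper and the `SequenceLengths` type are hypothetical names introduced here for illustration and are not part of genai-perf or the microservice.

```python
from typing import NamedTuple


class SequenceLengths(NamedTuple):
    """Token counts encoded in a tab label (hypothetical helper type)."""
    input_tokens: int
    output_tokens: int


def parse_tab_label(label: str) -> SequenceLengths:
    """Split a label such as '500_10' into input and output token counts.

    Assumes the '<input>_<output>' pattern described in the text.
    """
    input_part, separator, output_part = label.partition("_")
    if not separator or not input_part.isdigit() or not output_part.isdigit():
        raise ValueError(f"unexpected tab label: {label!r}")
    return SequenceLengths(int(input_part), int(output_part))


lengths = parse_tab_label("500_10")
print(f"input={lengths.input_tokens} tokens, output={lengths.output_tokens} tokens")
```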

Concurrency | TTFT (ms) | Avg Latency (ms) | Throughput (tokens/s) |
---|---|---|---|
1 | 48.3 | 136 | 66.6 |
5 | 141 | 272 | 165 |
25 | 657 | 958 | 217 |
50 | 1215 | 1922 | 231 |
100 | 2051 | 3562 | 250 |
200 | 3716 | 6336 | 263 |
500 | 5653 | 6791 | 614 |

Concurrency | TTFT (ms) | Avg Latency (ms) | Throughput (tokens/s) |
---|---|---|---|
1 | 80.3 | 5540 | 88.6 |
5 | 292 | 6093 | 404 |
25 | 1009 | 9022 | 1037 |
50 | 1995 | 12818 | 1913 |
100 | 3629 | 19002 | 2568 |
200 | 6950 | 32488 | 2996 |
500 | 28427 | 67861 | 2846 |

Concurrency | TTFT (ms) | Avg Latency (ms) | Throughput (tokens/s) |
---|---|---|---|
1 | 111.5 | 11112 | 88.7 |
5 | 435 | 12413 | 394 |
25 | 1324 | 18487 | 1015 |
50 | 2836 | 26736 | 1835 |
100 | 5337 | 40156 | 2430 |
200 | 15406 | 67881 | 2382 |
500 | 71601 | 140246 | 2570 |
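
As a rough cross-check on how the three columns relate, Little's law can be applied to a row: in steady state, throughput multiplied by average latency and divided by concurrency approximates the number of tokens produced per request. The sketch below applies this to the concurrency-1 row of each table above. It assumes that the throughput column counts output tokens only and that all concurrent requests are in flight for the whole measurement. Neither assumption is stated in the source, and the function name is hypothetical.

```python
def estimated_output_tokens_per_request(
    concurrency: int,
    avg_latency_ms: float,
    throughput_tokens_per_s: float,
) -> float:
    """Estimate tokens generated per request with Little's law.

    Assumes steady state and that throughput counts output tokens only.
    """
    if concurrency <= 0:
        raise ValueError("concurrency must be positive")
    latency_s = avg_latency_ms / 1000.0
    return throughput_tokens_per_s * latency_s / concurrency


# (avg latency in ms, throughput in tokens/s) copied from the
# concurrency-1 row of each table, in the order the tables appear.
CONCURRENCY_1_ROWS = [(136, 66.6), (5540, 88.6), (11112, 88.7)]

for index, (latency_ms, throughput) in enumerate(CONCURRENCY_1_ROWS, start=1):
    estimate = estimated_output_tokens_per_request(1, latency_ms, throughput)
    print(f"table {index}: about {estimate:.1f} tokens per request")
```

The estimates are only approximate because the reported values are rounded and per-request overhead is ignored, so treat them as a consistency check rather than a statement of the benchmark settings.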