Llama 3.1 Nemotron Safety Guard 8B NIM Performance
NVIDIA used the genai-perf tool to benchmark the performance of the microservice.
For more information about the tool, refer to "A Comprehensive Guide to NIM LLM Latency-Throughput Benchmarking".
Note: Some TensorRT-LLM profiles can experience lower throughput under high load compared to generic model profiles that use the vLLM engine. Refer to Known Issues for information about the affected GPU models and model profiles.

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 15.6545895 | 165.8306125 |
| 100 | 178.7885433 | 7590.530806 |
| 250 | 982.997063 | 7152.400699 |
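
As a reading aid, the following minimal Python sketch shows one way to interpret a table like the first one on this page. It computes how much throughput grows relative to concurrency 1 and how far that growth falls short of linear scaling. The helper names `scaling_report` and `peak_concurrency` are hypothetical and are not part of genai-perf or the microservice; the three rows are copied from the first table.

```python
# Rows copied from the first table on this page:
# (concurrency, TTFT in ms, throughput in inputs/s).
rows = [
    (1, 15.6545895, 165.8306125),
    (100, 178.7885433, 7590.530806),
    (250, 982.997063, 7152.400699),
]


def scaling_report(rows):
    """Hypothetical helper: per-row speedup over the first row.

    "speedup" is throughput divided by the first row's throughput.
    "efficiency" is speedup divided by the concurrency ratio, so 1.0
    would mean perfectly linear scaling.
    """
    base_conc, _, base_tput = rows[0]
    report = []
    for conc, ttft_ms, tput in rows:
        speedup = tput / base_tput
        efficiency = speedup / (conc / base_conc)
        report.append(
            {
                "concurrency": conc,
                "ttft_ms": ttft_ms,
                "speedup": speedup,
                "efficiency": efficiency,
            }
        )
    return report


def peak_concurrency(rows):
    """Hypothetical helper: concurrency level with the highest throughput."""
    return max(rows, key=lambda row: row[2])[0]


report = scaling_report(rows)
for entry in report:
    print(
        f"concurrency={entry['concurrency']:>3}  "
        f"TTFT={entry['ttft_ms']:.1f} ms  "
        f"speedup={entry['speedup']:.1f}x  "
        f"efficiency={entry['efficiency']:.2f}"
    )
print("highest throughput at concurrency", peak_concurrency(rows))
```

For these rows, throughput peaks at concurrency 100 and drops slightly at 250 while TTFT keeps rising, which suggests the configuration is saturated somewhere between those two load levels. The same functions can be applied to the rows of any other table on this page.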

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 593.9327107 | 132.4726447 |
| 100 | 163310.0012 | 763.8654089 |
| 250 | 454748.5627 | 805.5314916 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 13.5613847 | 164.7016311 |
| 100 | 192.6044143 | 7568.917439 |
| 250 | 2753.362218 | 6630.793525 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 21.3878205 | 155.7813191 |
| 100 | 1702.816789 | 3016.892643 |
| 250 | 7297.714699 | 4088.630242 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 13.8477973 | 236.2624885 |
| 100 | 249.0825608 | 8232.655092 |
| 250 | 8106.942208 | 7396.622128 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 341.6370894 | 201.1177031 |
| 100 | 51148.66617 | 1614.065726 |
| 250 | 157557.9661 | 1980.462341 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 12.1833057 | 234.527976 |
| 100 | 482.8972301 | 7721.103088 |
| 250 | 12225.75274 | 6795.367064 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 19.7319282 | 229.341217 |
| 100 | 1093.889076 | 5009.582525 |
| 250 | 764.1520131 | 7150.204643 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 22.7511721 | 290.4619198 |
| 100 | 249.1638954 | 6614.677925 |
| 250 | 13345.33812 | 5542.077368 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 184.5164652 | 249.4264918 |
| 100 | 2725.157671 | 3628.414993 |
| 250 | 16732.21006 | 4007.624481 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 18.6863878 | 289.4576394 |
| 100 | 477.7794938 | 6426.299474 |
| 250 | 26145.86429 | 5838.488146 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 22.694894 | 268.5684367 |
| 100 | 742.5481941 | 6008.20484 |
| 250 | 8567.845763 | 4714.935455 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 17.4770097 | 301.2449833 |
| 100 | 793.5024175 | 5542.199495 |
| 250 | 18906.84536 | 5608.381461 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 139.8522205 | 268.1804005 |
| 100 | 2106.931377 | 5160.006214 |
| 250 | 3588.933858 | 4425.731269 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 19.0400126 | 301.7000809 |
| 100 | 1073.416793 | 5880.862967 |
| 250 | 26214.59496 | 5797.166315 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 19.3430501 | 284.3361402 |
| 100 | 585.6628827 | 6951.932323 |
| 250 | 10251.53137 | 5965.680189 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 17.34858 | 250.0306794 |
| 100 | 664.0781269 | 5622.077664 |
| 250 | 15581.11085 | 5807.193842 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 321.7746258 | 207.3235571 |
| 100 | 16359.24696 | 2131.632291 |
| 250 | 85345.88784 | 2470.391422 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 20.8522016 | 244.3645439 |
| 100 | 437.8395997 | 5870.634227 |
| 250 | 16537.37561 | 5558.79099 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 20.8431517 | 238.8689469 |
| 100 | 1310.053187 | 5248.337379 |
| 250 | 2042.815603 | 4562.93301 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 19.3315999 | 340.5156469 |
| 100 | 169.6391558 | 7769.686042 |
| 250 | 10212.12246 | 7018.9646 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 225.9107516 | 295.5853835 |
| 100 | 3300.952902 | 3651.796948 |
| 250 | 3597.747918 | 4959.736042 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 16.0753613 | 335.1875393 |
| 100 | 196.0616131 | 7306.826508 |
| 250 | 19418.5877 | 6402.644384 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 18.7042257 | 330.4657613 |
| 100 | 890.8152144 | 6649.493977 |
| 250 | 4749.135821 | 6963.321731 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 15.488483 | 309.8071233 |
| 100 | 383.3940188 | 9386.526493 |
| 250 | 6636.969327 | 8299.543166 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 190.3753446 | 261.5278999 |
| 100 | 2762.170605 | 3650.551529 |
| 250 | 16738.56882 | 4416.482628 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 18.5821885 | 308.6919203 |
| 100 | 854.5007959 | 8527.14357 |
| 250 | 8183.213898 | 7687.718459 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 19.4822661 | 289.058214 |
| 100 | 801.0053271 | 7413.132203 |
| 250 | 3567.297511 | 7112.61126 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 17.8515908 | 287.9325127 |
| 100 | 725.7802599 | 9483.001136 |
| 250 | 6876.906131 | 8527.774856 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 145.1843638 | 257.9122399 |
| 100 | 2111.238508 | 5864.411942 |
| 250 | 7271.828313 | 6335.577469 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 16.7808959 | 291.5762973 |
| 100 | 630.0213229 | 8479.468982 |
| 250 | 11407.94757 | 7824.821327 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 20.6509091 | 277.1043267 |
| 100 | 610.723633 | 8160.29143 |
| 250 | 4036.741818 | 7578.288911 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 30.6239534 | 231.5447081 |
| 100 | 1026.587987 | 4734.589519 |
| 250 | 9279.70032 | 3444.055733 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 380.2782251 | 186.5537137 |
| 100 | 24124.51174 | 1575.020099 |
| 250 | 109055.8817 | 2021.823513 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 24.2338224 | 231.7511857 |
| 100 | 1126.30502 | 5384.5475 |
| 250 | 20339.90493 | 3799.946465 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 26.6594707 | 221.4366624 |
| 100 | 1528.796516 | 4263.476537 |
| 250 | 3961.807659 | 4162.024676 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 21.6234663 | 308.4842109 |
| 100 | 941.1811134 | 6630.379955 |
| 250 | 13648.67561 | 4748.452742 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 267.0568852 | 258.5046683 |
| 100 | 4420.057777 | 3369.875826 |
| 250 | 5281.827431 | 3024.796314 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 34.3629598 | 300.4972324 |
| 100 | 1931.667121 | 6070.254988 |
| 250 | 25670.12949 | 4619.601269 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 25.1463247 | 300.0440224 |
| 100 | 1585.653399 | 4817.11824 |
| 250 | 10208.6545 | 3930.694112 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 21.8740067 | 194.6338687 |
| 100 | 359.7905858 | 8613.112535 |
| 250 | 4177.360029 | 6840.552687 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 369.9057861 | 165.208596 |
| 100 | 23370.15202 | 1657.119314 |
| 250 | 99673.58774 | 2207.50523 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 21.3257452 | 192.0772661 |
| 100 | 489.696887 | 6453.298762 |
| 250 | 5386.3109 | 6597.558542 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 25.2491809 | 187.6106876 |
| 100 | 1329.641285 | 5007.470957 |
| 250 | 2090.66732 | 6845.862102 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 17.8311877 | 350.0587465 |
| 100 | 589.2003054 | 9056.710656 |
| 250 | 9024.624532 | 7724.907669 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 232.4134406 | 301.480574 |
| 100 | 3314.288603 | 3667.363492 |
| 250 | 2342.41047 | 5173.84704 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 19.2751396 | 344.5727621 |
| 100 | 714.0649047 | 8640.266702 |
| 250 | 9526.37373 | 7359.226721 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 21.3276246 | 338.4071101 |
| 100 | 1020.492537 | 6555.352081 |
| 250 | 5114.626249 | 6592.498288 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 13.9373806 | 214.4881363 |
| 100 | 282.76972 | 7223.175018 |
| 250 | 5171.347234 | 5917.030188 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 379.5539616 | 177.8862616 |
| 100 | 78589.89088 | 1260.850576 |
| 250 | 231234.3621 | 1421.552374 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 12.6173299 | 214.7544097 |
| 100 | 352.2493184 | 7334.561637 |
| 250 | 14098.96694 | 5928.436627 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 21.434416 | 204.7933438 |
| 100 | 1199.544142 | 4450.826852 |
| 250 | 1035.442222 | 5487.789918 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 13.1815457 | 303.9892503 |
| 100 | 1098.294285 | 7534.267087 |
| 250 | 5305.827633 | 5965.815079 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 246.3304811 | 263.763722 |
| 100 | 13639.90378 | 2434.563964 |
| 250 | 61280.99635 | 3346.231621 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 12.4615099 | 296.328391 |
| 100 | 1091.830535 | 6583.947155 |
| 250 | 11869.71198 | 5896.047103 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 19.4696787 | 293.4056187 |
| 100 | 914.1082314 | 6201.263137 |
| 250 | 6368.287733 | 6043.192757 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 15.8508156 | 221.4729669 |
| 100 | 205.4578216 | 9055.124211 |
| 250 | 3191.344 | 7294.44983 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 539.6980045 | 168.8321413 |
| 100 | 79774.92233 | 1075.780957 |
| 250 | 253708.4832 | 1228.352792 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 14.929404 | 219.9559647 |
| 100 | 191.9091008 | 8927.996951 |
| 250 | 2887.036268 | 6739.691038 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 25.1167806 | 210.9385502 |
| 100 | 1724.378724 | 3717.068064 |
| 250 | 727.5800222 | 5486.18123 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 17.0849956 | 302.4334352 |
| 100 | 361.1513149 | 9338.991282 |
| 250 | 5213.474047 | 7987.527882 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 482.0947651 | 257.4668354 |
| 100 | 24289.92169 | 2100.13367 |
| 250 | 25727.68444 | 3276.366944 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 14.0711775 | 300.5497791 |
| 100 | 343.903743 | 9065.327292 |
| 250 | 5171.40004 | 7148.158411 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 25.5055612 | 295.5423923 |
| 100 | 3212.073949 | 5072.025157 |
| 250 | 3568.199541 | 7700.407577 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 40.9950723 | 115.4078477 |
| 100 | 479.1837433 | 5008.259539 |
| 250 | 1917.479921 | 4406.781205 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 1117.272409 | 94.41492147 |
| 100 | 288429.9953 | 474.0749274 |
| 250 | 770625.0422 | 494.3553321 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 30.8507387 | 109.3008642 |
| 100 | 250.0695608 | 5054.84662 |
| 250 | 36731.51378 | 4600.587425 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 33.5472711 | 110.9781635 |
| 100 | 12235.87424 | 1244.970537 |
| 250 | 36576.28183 | 1812.556458 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 18.346242 | 147.7505127 |
| 100 | 193.616571 | 5670.698478 |
| 250 | 1472.41785 | 4545.918249 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 1141.287771 | 115.5365574 |
| 100 | 156339.011 | 632.812149 |
| 250 | 477114.7221 | 704.0269845 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 16.6655065 | 147.5776455 |
| 100 | 277.5501292 | 5487.363601 |
| 250 | 7699.541729 | 4279.513421 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 26.2753277 | 140.8825775 |
| 100 | 3011.62457 | 2279.213265 |
| 250 | 752.3530086 | 3610.464051 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 70.1936117 | 138.6855148 |
| 100 | 787.7898255 | 3258.396654 |
| 250 | 264.3988796 | 4450.522725 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 1779.270521 | 109.8893433 |
| 100 | 248842.0319 | 471.0145233 |
| 250 | 689711.2583 | 519.4243328 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 21.9969315 | 139.1156202 |
| 100 | 95.66431102 | 3361.082389 |
| 250 | 265.0402203 | 4375.726201 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 36.1017799 | 134.4942996 |
| 100 | 6213.811518 | 1475.641078 |
| 250 | 627.6555403 | 2711.690116 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 48.5861763 | 82.98485305 |
| 100 | 851.6268876 | 2626.44124 |
| 250 | 1429.625423 | 3784.380956 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 1838.920046 | 65.8556057 |
| 100 | 481709.7334 | 270.5867778 |
| 250 | 1294908.224 | 288.8582086 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 19.7706783 | 83.65236195 |
| 100 | 278.4867385 | 2704.67425 |
| 250 | 19591.67081 | 3384.499841 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 33.0240611 | 80.45214875 |
| 100 | 16469.54546 | 680.9105263 |
| 250 | 47279.21182 | 1161.488559 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 17.2760664 | 111.4152077 |
| 100 | 307.4797944 | 4301.806109 |
| 250 | 1372.971999 | 5435.719248 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 1321.959859 | 58.37161964 |
| 100 | 183375.0242 | 524.5310389 |
| 250 | 558141.3416 | 594.6159762 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 16.3127808 | 108.3558846 |
| 100 | 172.1732453 | 4350.827544 |
| 250 | 1312.802928 | 5444.319001 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 24.9359836 | 92.59825503 |
| 100 | 5590.739427 | 1635.38221 |
| 250 | 1947.168312 | 2970.58068 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 22.2433557 | 136.4111963 |
| 100 | 310.7136493 | 5415.7268 |
| 250 | 1873.663191 | 5594.150438 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 1129.624434 | 112.7406744 |
| 100 | 140104.993 | 607.2613134 |
| 250 | 437826.776 | 713.4995428 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 18.3535444 | 138.0106839 |
| 100 | 219.0899802 | 5538.906016 |
| 250 | 2249.620364 | 5714.927357 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
|---|---|---|
| 1 | 32.3353681 | 131.8131112 |
| 100 | 4221.889746 | 2002.485231 |
| 250 | 2005.005962 | 3291.051226 |