Llama 3.1 Nemotron Safety Guard 8B NIM Performance

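NVIDIA benchmarked the microservice with the genai-perf tool. For more information about the tool, refer to A Comprehensive Guide to NIM LLM Latency-Throughput Benchmarking. Each table reports the time to first token (TTFT) in milliseconds and the throughput in inputs per second at concurrency levels of 1, 100, and 250.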

Note

Some TensorRT-LLM profiles can experience lower throughput under high load compared to generic model profiles that use the vLLM engine. Refer to Known Issues for information about the affected GPU models and model profiles.

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 15.6545895 | 165.8306125 |
| 100 | 178.7885433 | 7590.530806 |
| 250 | 982.997063 | 7152.400699 |
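As a rough illustration of how to read one of these tables, the following Python sketch takes the three rows of the table above and reports which concurrency level gives the highest throughput and how throughput scales relative to a single concurrent request. The function name `summarize` and the tuple layout are assumptions made for this example; they are not part of genai-perf or the microservice.

```python
# Rows copied from the table above: (concurrency, TTFT in ms, throughput in inputs/s).
rows = [
    (1, 15.6545895, 165.8306125),
    (100, 178.7885433, 7590.530806),
    (250, 982.997063, 7152.400699),
]


def summarize(rows):
    """Return the concurrency with peak throughput and each row's scaling vs. the first row.

    Hypothetical helper for illustration only.
    """
    base_throughput = rows[0][2]
    peak_row = max(rows, key=lambda row: row[2])
    scaling = {concurrency: throughput / base_throughput
               for concurrency, _ttft_ms, throughput in rows}
    return peak_row[0], scaling


peak_concurrency, scaling = summarize(rows)
print(f"Peak throughput at concurrency {peak_concurrency}")
for concurrency, factor in scaling.items():
    print(f"Concurrency {concurrency}: {factor:.1f}x the single-request throughput")
```

For this table, throughput peaks at a concurrency of 100 and falls slightly at 250 while TTFT keeps rising, which suggests that the deployment is saturated somewhere between those two levels. The same helper can be applied to the rows of any other table on this page.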

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 593.9327107 | 132.4726447 |
| 100 | 163310.0012 | 763.8654089 |
| 250 | 454748.5627 | 805.5314916 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 13.5613847 | 164.7016311 |
| 100 | 192.6044143 | 7568.917439 |
| 250 | 2753.362218 | 6630.793525 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 21.3878205 | 155.7813191 |
| 100 | 1702.816789 | 3016.892643 |
| 250 | 7297.714699 | 4088.630242 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 13.8477973 | 236.2624885 |
| 100 | 249.0825608 | 8232.655092 |
| 250 | 8106.942208 | 7396.622128 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 341.6370894 | 201.1177031 |
| 100 | 51148.66617 | 1614.065726 |
| 250 | 157557.9661 | 1980.462341 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 12.1833057 | 234.527976 |
| 100 | 482.8972301 | 7721.103088 |
| 250 | 12225.75274 | 6795.367064 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 19.7319282 | 229.341217 |
| 100 | 1093.889076 | 5009.582525 |
| 250 | 764.1520131 | 7150.204643 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 22.7511721 | 290.4619198 |
| 100 | 249.1638954 | 6614.677925 |
| 250 | 13345.33812 | 5542.077368 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 184.5164652 | 249.4264918 |
| 100 | 2725.157671 | 3628.414993 |
| 250 | 16732.21006 | 4007.624481 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 18.6863878 | 289.4576394 |
| 100 | 477.7794938 | 6426.299474 |
| 250 | 26145.86429 | 5838.488146 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 22.694894 | 268.5684367 |
| 100 | 742.5481941 | 6008.20484 |
| 250 | 8567.845763 | 4714.935455 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 17.4770097 | 301.2449833 |
| 100 | 793.5024175 | 5542.199495 |
| 250 | 18906.84536 | 5608.381461 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 139.8522205 | 268.1804005 |
| 100 | 2106.931377 | 5160.006214 |
| 250 | 3588.933858 | 4425.731269 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 19.0400126 | 301.7000809 |
| 100 | 1073.416793 | 5880.862967 |
| 250 | 26214.59496 | 5797.166315 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 19.3430501 | 284.3361402 |
| 100 | 585.6628827 | 6951.932323 |
| 250 | 10251.53137 | 5965.680189 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 17.34858 | 250.0306794 |
| 100 | 664.0781269 | 5622.077664 |
| 250 | 15581.11085 | 5807.193842 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 321.7746258 | 207.3235571 |
| 100 | 16359.24696 | 2131.632291 |
| 250 | 85345.88784 | 2470.391422 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 20.8522016 | 244.3645439 |
| 100 | 437.8395997 | 5870.634227 |
| 250 | 16537.37561 | 5558.79099 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 20.8431517 | 238.8689469 |
| 100 | 1310.053187 | 5248.337379 |
| 250 | 2042.815603 | 4562.93301 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 19.3315999 | 340.5156469 |
| 100 | 169.6391558 | 7769.686042 |
| 250 | 10212.12246 | 7018.9646 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 225.9107516 | 295.5853835 |
| 100 | 3300.952902 | 3651.796948 |
| 250 | 3597.747918 | 4959.736042 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 16.0753613 | 335.1875393 |
| 100 | 196.0616131 | 7306.826508 |
| 250 | 19418.5877 | 6402.644384 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 18.7042257 | 330.4657613 |
| 100 | 890.8152144 | 6649.493977 |
| 250 | 4749.135821 | 6963.321731 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 15.488483 | 309.8071233 |
| 100 | 383.3940188 | 9386.526493 |
| 250 | 6636.969327 | 8299.543166 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 190.3753446 | 261.5278999 |
| 100 | 2762.170605 | 3650.551529 |
| 250 | 16738.56882 | 4416.482628 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 18.5821885 | 308.6919203 |
| 100 | 854.5007959 | 8527.14357 |
| 250 | 8183.213898 | 7687.718459 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 19.4822661 | 289.058214 |
| 100 | 801.0053271 | 7413.132203 |
| 250 | 3567.297511 | 7112.61126 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 17.8515908 | 287.9325127 |
| 100 | 725.7802599 | 9483.001136 |
| 250 | 6876.906131 | 8527.774856 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 145.1843638 | 257.9122399 |
| 100 | 2111.238508 | 5864.411942 |
| 250 | 7271.828313 | 6335.577469 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 16.7808959 | 291.5762973 |
| 100 | 630.0213229 | 8479.468982 |
| 250 | 11407.94757 | 7824.821327 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 20.6509091 | 277.1043267 |
| 100 | 610.723633 | 8160.29143 |
| 250 | 4036.741818 | 7578.288911 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 30.6239534 | 231.5447081 |
| 100 | 1026.587987 | 4734.589519 |
| 250 | 9279.70032 | 3444.055733 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 380.2782251 | 186.5537137 |
| 100 | 24124.51174 | 1575.020099 |
| 250 | 109055.8817 | 2021.823513 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 24.2338224 | 231.7511857 |
| 100 | 1126.30502 | 5384.5475 |
| 250 | 20339.90493 | 3799.946465 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 26.6594707 | 221.4366624 |
| 100 | 1528.796516 | 4263.476537 |
| 250 | 3961.807659 | 4162.024676 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 21.6234663 | 308.4842109 |
| 100 | 941.1811134 | 6630.379955 |
| 250 | 13648.67561 | 4748.452742 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 267.0568852 | 258.5046683 |
| 100 | 4420.057777 | 3369.875826 |
| 250 | 5281.827431 | 3024.796314 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 34.3629598 | 300.4972324 |
| 100 | 1931.667121 | 6070.254988 |
| 250 | 25670.12949 | 4619.601269 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 25.1463247 | 300.0440224 |
| 100 | 1585.653399 | 4817.11824 |
| 250 | 10208.6545 | 3930.694112 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 21.8740067 | 194.6338687 |
| 100 | 359.7905858 | 8613.112535 |
| 250 | 4177.360029 | 6840.552687 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 369.9057861 | 165.208596 |
| 100 | 23370.15202 | 1657.119314 |
| 250 | 99673.58774 | 2207.50523 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 21.3257452 | 192.0772661 |
| 100 | 489.696887 | 6453.298762 |
| 250 | 5386.3109 | 6597.558542 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 25.2491809 | 187.6106876 |
| 100 | 1329.641285 | 5007.470957 |
| 250 | 2090.66732 | 6845.862102 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 17.8311877 | 350.0587465 |
| 100 | 589.2003054 | 9056.710656 |
| 250 | 9024.624532 | 7724.907669 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 232.4134406 | 301.480574 |
| 100 | 3314.288603 | 3667.363492 |
| 250 | 2342.41047 | 5173.84704 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 19.2751396 | 344.5727621 |
| 100 | 714.0649047 | 8640.266702 |
| 250 | 9526.37373 | 7359.226721 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 21.3276246 | 338.4071101 |
| 100 | 1020.492537 | 6555.352081 |
| 250 | 5114.626249 | 6592.498288 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 13.9373806 | 214.4881363 |
| 100 | 282.76972 | 7223.175018 |
| 250 | 5171.347234 | 5917.030188 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 379.5539616 | 177.8862616 |
| 100 | 78589.89088 | 1260.850576 |
| 250 | 231234.3621 | 1421.552374 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 12.6173299 | 214.7544097 |
| 100 | 352.2493184 | 7334.561637 |
| 250 | 14098.96694 | 5928.436627 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 21.434416 | 204.7933438 |
| 100 | 1199.544142 | 4450.826852 |
| 250 | 1035.442222 | 5487.789918 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 13.1815457 | 303.9892503 |
| 100 | 1098.294285 | 7534.267087 |
| 250 | 5305.827633 | 5965.815079 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 246.3304811 | 263.763722 |
| 100 | 13639.90378 | 2434.563964 |
| 250 | 61280.99635 | 3346.231621 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 12.4615099 | 296.328391 |
| 100 | 1091.830535 | 6583.947155 |
| 250 | 11869.71198 | 5896.047103 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 19.4696787 | 293.4056187 |
| 100 | 914.1082314 | 6201.263137 |
| 250 | 6368.287733 | 6043.192757 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 15.8508156 | 221.4729669 |
| 100 | 205.4578216 | 9055.124211 |
| 250 | 3191.344 | 7294.44983 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 539.6980045 | 168.8321413 |
| 100 | 79774.92233 | 1075.780957 |
| 250 | 253708.4832 | 1228.352792 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 14.929404 | 219.9559647 |
| 100 | 191.9091008 | 8927.996951 |
| 250 | 2887.036268 | 6739.691038 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 25.1167806 | 210.9385502 |
| 100 | 1724.378724 | 3717.068064 |
| 250 | 727.5800222 | 5486.18123 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 17.0849956 | 302.4334352 |
| 100 | 361.1513149 | 9338.991282 |
| 250 | 5213.474047 | 7987.527882 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 482.0947651 | 257.4668354 |
| 100 | 24289.92169 | 2100.13367 |
| 250 | 25727.68444 | 3276.366944 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 14.0711775 | 300.5497791 |
| 100 | 343.903743 | 9065.327292 |
| 250 | 5171.40004 | 7148.158411 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 25.5055612 | 295.5423923 |
| 100 | 3212.073949 | 5072.025157 |
| 250 | 3568.199541 | 7700.407577 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 40.9950723 | 115.4078477 |
| 100 | 479.1837433 | 5008.259539 |
| 250 | 1917.479921 | 4406.781205 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 1117.272409 | 94.41492147 |
| 100 | 288429.9953 | 474.0749274 |
| 250 | 770625.0422 | 494.3553321 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 30.8507387 | 109.3008642 |
| 100 | 250.0695608 | 5054.84662 |
| 250 | 36731.51378 | 4600.587425 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 33.5472711 | 110.9781635 |
| 100 | 12235.87424 | 1244.970537 |
| 250 | 36576.28183 | 1812.556458 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 18.346242 | 147.7505127 |
| 100 | 193.616571 | 5670.698478 |
| 250 | 1472.41785 | 4545.918249 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 1141.287771 | 115.5365574 |
| 100 | 156339.011 | 632.812149 |
| 250 | 477114.7221 | 704.0269845 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 16.6655065 | 147.5776455 |
| 100 | 277.5501292 | 5487.363601 |
| 250 | 7699.541729 | 4279.513421 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 26.2753277 | 140.8825775 |
| 100 | 3011.62457 | 2279.213265 |
| 250 | 752.3530086 | 3610.464051 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 70.1936117 | 138.6855148 |
| 100 | 787.7898255 | 3258.396654 |
| 250 | 264.3988796 | 4450.522725 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 1779.270521 | 109.8893433 |
| 100 | 248842.0319 | 471.0145233 |
| 250 | 689711.2583 | 519.4243328 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 21.9969315 | 139.1156202 |
| 100 | 95.66431102 | 3361.082389 |
| 250 | 265.0402203 | 4375.726201 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 36.1017799 | 134.4942996 |
| 100 | 6213.811518 | 1475.641078 |
| 250 | 627.6555403 | 2711.690116 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 48.5861763 | 82.98485305 |
| 100 | 851.6268876 | 2626.44124 |
| 250 | 1429.625423 | 3784.380956 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 1838.920046 | 65.8556057 |
| 100 | 481709.7334 | 270.5867778 |
| 250 | 1294908.224 | 288.8582086 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 19.7706783 | 83.65236195 |
| 100 | 278.4867385 | 2704.67425 |
| 250 | 19591.67081 | 3384.499841 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 33.0240611 | 80.45214875 |
| 100 | 16469.54546 | 680.9105263 |
| 250 | 47279.21182 | 1161.488559 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 17.2760664 | 111.4152077 |
| 100 | 307.4797944 | 4301.806109 |
| 250 | 1372.971999 | 5435.719248 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 1321.959859 | 58.37161964 |
| 100 | 183375.0242 | 524.5310389 |
| 250 | 558141.3416 | 594.6159762 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 16.3127808 | 108.3558846 |
| 100 | 172.1732453 | 4350.827544 |
| 250 | 1312.802928 | 5444.319001 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 24.9359836 | 92.59825503 |
| 100 | 5590.739427 | 1635.38221 |
| 250 | 1947.168312 | 2970.58068 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 22.2433557 | 136.4111963 |
| 100 | 310.7136493 | 5415.7268 |
| 250 | 1873.663191 | 5594.150438 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 1129.624434 | 112.7406744 |
| 100 | 140104.993 | 607.2613134 |
| 250 | 437826.776 | 713.4995428 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 18.3535444 | 138.0106839 |
| 100 | 219.0899802 | 5538.906016 |
| 250 | 2249.620364 | 5714.927357 |

| Concurrency | TTFT (ms) | Throughput (inputs/s) |
| --- | --- | --- |
| 1 | 32.3353681 | 131.8131112 |
| 100 | 4221.889746 | 2002.485231 |
| 250 | 2005.005962 | 3291.051226 |