Performance

The DeepStream application is benchmarked across various NVIDIA TAO Toolkit and open source models. The measured performance represents the end-to-end performance of the entire video analytics application, including video capture and decode, pre-processing, batching, inference, and post-processing to generate metadata. Output rendering is turned off to achieve peak inference performance. For information on disabling output rendering, see the DeepStream Reference Application - deepstream-app chapter.

To run a higher number of streams (200+) on Hopper, Ampere, and Ada GPUs, follow the instructions below:

$ sudo service display-manager stop
# Make sure no process is running on the GPU, e.g. Xorg or Triton server
$ sudo pkill -9 Xorg
# Remove kernel modules
$ sudo rmmod nvidia_drm nvidia_modeset nvidia
# Load modules with regkeys
$ sudo modprobe nvidia NVreg_RegistryDwords="RMDebugOverridePerRunlistChannelRam = 1;RMIncreaseRsvdMemorySizeMB = 1024;RMDisableChIdIsolation = 0x1;RmGspFirmwareHeapSizeMB = 256"
$ sudo service display-manager start

TAO Pre-trained models

TAO Toolkit provides the set of pretrained models listed in the tables below. If these models satisfy your requirements, start with one of them. They can be used for various smart city or smart places applications. If your application is beyond the scope of these models, you can re-train one of the popular model architectures using TAO Toolkit. The tables below show the end-to-end performance of highly accurate pre-trained models from TAO Toolkit. All models are available on NGC. These models are natively integrated with DeepStream, and the instructions to run them are in /opt/nvidia/deepstream/deepstream-6.3/samples/configs/tao_pretrained_models/. The following numbers are obtained with sample_1080p_h265.mp4.

Performance Jetson - pretrained models

                                                     Jetson AGX Orin       Jetson Orin NX        Jetson AGX Xavier     Jetson Xavier NX      Jetson Orin Nano
Model Arch                    Inference   Precision  GPU        DLA1/DLA2  GPU        DLA1/DLA2  GPU        DLA1/DLA2  GPU        DLA1/DLA2  GPU
                              resolution             (FPS)      (FPS)      (FPS)      (FPS)      (FPS)      (FPS)      (FPS)      (FPS)      (FPS)
PeopleNet - ResNet34          960x544     INT8       456        140        136        68         137        30         79         24         111
TrafficCamNet - ResNet18      960x544     INT8       369        NA         164        NA         155        NA         81         NA         102
  License Plate Detection     640x480
  License Plate Recognition   96x48
TrafficCamNet - ResNet18      960x544     INT8       1107       479        524        267        487        135        288        71         398
DashCamNet - ResNet18         960x544     INT8       1108       486        509        259        463        127        275        101        379
FaceDetectIR - ResNet18       384x240     INT8       1116       556        965        482        1185       645        696        447        592
Action Recognition (3D Conv)  224x224x32  FP16       109        NA         37         NA         NA         NA         NA         NA         24

All the models in the table above can run solely on DLA. This saves valuable GPU resources to run more complex models.

Note

  • Running inference simultaneously on multiple models is not supported on the DLA. You can only run one model at a time on the DLA.

  • NA : Not available for Jetson

Performance dGPU - pretrained models

Model Arch                    Inference   Precision  Inference    T4         A100 PCIe  A30        A2         A10
                              resolution             Engine       GPU (FPS)  GPU (FPS)  GPU (FPS)  GPU (FPS)  GPU (FPS)
PeopleNet - ResNet34          960x544     INT8       TRT          418        2610       1518       235        1001
PeopleNet - ResNet34          960x544     INT8       Triton       380        2335       1400       224        974
PeopleNet - ResNet34          960x544     INT8       Triton gRPC  375        2084       1292       222        898
TrafficCamNet - ResNet18      960x544     INT8       TRT          451        2209       1330       294        1144
  License Plate Detection     640x480
  License Plate Recognition   96x48
TrafficCamNet - ResNet18      960x544     INT8       TRT          1328       5298       4196       941        2448
DashCamNet - ResNet18         960x544     INT8       TRT          1363       5399       4197       873        2445
FaceDetectIR - ResNet18       384x240     INT8       TRT          2492       5548       5610       3142       3129
Action Recognition (3D Conv)  224x224x32  FP16       TRT          124        713        401        76         354

Performance dGPU - pretrained models

Model Arch                    Inference   Precision  Inference    H100       L40        L4         Quadro (A6000)
                              resolution             Engine       GPU (FPS)  GPU (FPS)  GPU (FPS)  GPU (FPS)
PeopleNet - ResNet34          960x544     INT8       TRT          3565       2013       834        1458
PeopleNet - ResNet34          960x544     INT8       Triton       3224       1912       813        1352
PeopleNet - ResNet34          960x544     INT8       Triton gRPC  3017       1717       765        1268
TrafficCamNet - ResNet18      960x544     INT8       TRT          2698       2230       777        1519
  License Plate Detection     640x480
  License Plate Recognition   96x48
TrafficCamNet - ResNet18      960x544     INT8       TRT          8261       5188       2473       2932
DashCamNet - ResNet18         960x544     INT8       TRT          8228       5160       2498       2901
FaceDetectIR - ResNet18       384x240     INT8       TRT          8313       5665       5785       3319
Action Recognition (3D Conv)  224x224x32  FP16       TRT          962        827        253        496

Note

  • NA : Not available
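
The FPS figures in the tables above can be turned into a rough stream-count estimate by dividing by the per-stream frame rate. The sketch below is illustrative only: max_streams is a hypothetical helper, not part of DeepStream, and it assumes inference throughput is the only limit (decoder and memory limits are not modeled). The two inputs are the PeopleNet - ResNet34 TRT figures for T4 and H100 from the tables above.

```python
def max_streams(total_fps: float, stream_fps: float = 30.0) -> int:
    """Whole number of streams that fit into an aggregate FPS budget.

    Assumption: inference throughput is the only bottleneck.
    """
    if stream_fps <= 0:
        raise ValueError("stream_fps must be positive")
    return int(total_fps // stream_fps)


# PeopleNet - ResNet34, INT8, TRT figures taken from the dGPU tables
published_fps = {"T4": 418, "H100": 3565}

for gpu, fps in published_fps.items():
    print(f"{gpu}: about {max_streams(fps)} streams at 30 fps")
```

Treat the result as an upper bound for planning; the measured stream counts in the reference sections later in this chapter use a different pipeline and are the figures to rely on.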

DeepStream reference model and tracker

DeepStream SDK ships with a reference DetectNet_v2-ResNet10 model and three ResNet18 classifier models. The detailed instructions to run these models with DeepStream are provided in the next section. DeepStream provides four reference trackers: IOU, NvSORT, NvDeepSORT and NvDCF. For more information about trackers, see the Gst-nvtracker section.

Configuration File Settings for Performance Measurement

To achieve peak performance, make sure the devices are properly cooled. For Turing and Ampere GPUs, make sure you use a server that meets the thermal and airflow requirements. Along with the hardware setup, a few other options in the config file need to be set to achieve the published performance. Make the required changes to one of the config files from DeepStream SDK to replicate the peak performance.

Turn off output rendering, OSD, and tiler

OSD (on-screen display) is used to display bounding boxes, masks, and labels on the screen. If output rendering is disabled, drawing bounding boxes is not required unless the output needs to be streamed over RTSP or saved to disk. The tiler is used to display the output in an NxM tiled grid. It is not needed if rendering is disabled. Output rendering, OSD, and the tiler consume compute resources, so they can reduce inference performance.

To disable OSD, tiled display and output sink, make the following changes in the DeepStream config file.

  • To disable OSD, change enable to 0

    [osd]
    enable=0
    
  • To disable tiling, change enable to 0

    [tiled-display]
    enable=0
    
  • To turn off output rendering, change the sink to fakesink.

    [sink0]
    enable=1
    #Type - 1=FakeSink 2=EglSink 3=File
    type=1
    sync=0
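
The three changes above can also be applied programmatically. The following is a minimal sketch, assuming the deepstream-app config file can be parsed as INI-style text by Python's standard configparser module. disable_display is a hypothetical helper name, and comments in the file are not preserved when the text is written back.

```python
import configparser
import io


def disable_display(config_text: str) -> str:
    """Return config text with OSD and tiler disabled and sink0 set to fakesink."""
    parser = configparser.ConfigParser(
        delimiters=("=",),   # treat only '=' as the key/value separator
        interpolation=None,  # do not interpret '%' in values
        strict=False,        # tolerate repeated keys
    )
    parser.optionxform = str  # keep key names exactly as written
    parser.read_string(config_text)

    changes = {
        "osd": {"enable": "0"},
        "tiled-display": {"enable": "0"},
        "sink0": {"enable": "1", "type": "1", "sync": "0"},  # type 1 = FakeSink
    }
    for section, values in changes.items():
        if not parser.has_section(section):
            parser.add_section(section)
        for key, value in values.items():
            parser.set(section, key, value)

    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


example = "[osd]\nenable=1\n\n[tiled-display]\nenable=1\n\n[sink0]\nenable=1\ntype=2\nsync=1\n"
print(disable_display(example))
```

Because comments are dropped on rewrite, this approach suits generated benchmark configs better than hand-maintained ones.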
    

Use the max_perf setting for tracker

DeepStream SDK 6.2 introduces a new reference low-level tracker library, NvMultiObjectTracker, along with a set of configuration files:

  • config_tracker_IOU.yml

  • config_tracker_NvDCF_max_perf.yml

  • config_tracker_NvDCF_perf.yml

  • config_tracker_NvDCF_accuracy.yml

To achieve the peak performance shown in the table above when using the NvDCF tracker, make sure the max_perf configuration is used with the video frame resolution matched to that of the inference module. For example, if the inference module uses 480x272 resolution, use a similarly reduced resolution (e.g., 480x288) for the tracker module, as in the following:

[tracker]
enable=1
tracker-width=480
tracker-height=288
ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
#ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_IOU.yml
ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_NvDCF_max_perf.yml
gpu-id=0
enable-batch-process=1
display-tracking-id=1

When the IOU tracker is used, the video frame resolution doesn’t matter, and the default config_tracker_IOU.yml can be used.
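
The NvDCF example above pairs a 480x272 inference resolution with a 480x288 tracker resolution. One way to arrive at such a value is to round each dimension up to a fixed multiple. The sketch below assumes a multiple of 32, which matches the example but is an assumption not stated in this section; check the Gst-nvtracker documentation for the actual constraint. tracker_resolution is a hypothetical helper.

```python
def round_up(value: int, multiple: int = 32) -> int:
    """Round value up to the nearest multiple."""
    if value <= 0 or multiple <= 0:
        raise ValueError("value and multiple must be positive")
    return -(-value // multiple) * multiple  # ceiling division, then scale back


def tracker_resolution(infer_width: int, infer_height: int, multiple: int = 32):
    """Tracker width/height close to the inference resolution.

    Assumption: both dimensions should be multiples of `multiple`.
    """
    return round_up(infer_width, multiple), round_up(infer_height, multiple)


width, height = tracker_resolution(480, 272)
print(f"tracker-width={width}")
print(f"tracker-height={height}")
```

For the 480x272 case this reproduces the tracker-width and tracker-height values shown in the [tracker] group above.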

To use DLA on Jetson AGX Xavier and NX for performance measurement, refer to the DLA inference section in the Quickstart Guide.

DeepStream reference model

Data center GPU - GA100

This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - GA100.

System Configuration

The system configuration for the DeepStream SDK is listed below:

GA100 System configuration

System Configuration  Specification
CPU                   AMD EPYC 7742 @ 2.25GHz 3.4GHz Turbo (Rome) HT Off
GPU                   A100-PCIE-40GB(GA100) 1*40537 MiB 1*108 SM
Ubuntu                Ubuntu 20.04
GPU Driver            525.125.06
CUDA                  12.1
TensorRT              8.5.3.1
GPU clock frequency   1410 MHz

Application Configuration

Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

  • The inference resolution of Primary GIE is specified in the samples/models/Primary_detector/resnet10.prototxt. In this file, change the dim (i.e. height and width of input tensor) from 368x640 to 272x480.

  • Change batch size under streammux and primary-gie to match the number of streams.

  • Disable tiled display and rendering using instructions above.

  • Enable IoU tracker.
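
The batch-size change in the list above can be scripted when many stream counts are being tested. This is a minimal sketch, assuming the config file uses [section] headers and key=value lines as in the snippets earlier in this chapter. set_key and set_batch_size are hypothetical helpers. The edit is line-based so that comments and key order are preserved.

```python
import re


def set_key(config_text: str, section: str, key: str, value: str) -> str:
    """Set key=value inside [section], leaving every other line untouched."""
    out, current, done = [], None, False
    for line in config_text.splitlines():
        header = re.fullmatch(r"\s*\[(.+?)\]\s*", line)
        if header:
            if current == section and not done:
                out.append(f"{key}={value}")  # key was missing in the section
                done = True
            current = header.group(1)
        elif current == section and re.match(rf"\s*{re.escape(key)}\s*=", line):
            line = f"{key}={value}"  # commented-out lines start with '#' and are skipped
            done = True
        out.append(line)
    if current == section and not done:
        out.append(f"{key}={value}")
        done = True
    if not done:
        raise KeyError(f"section [{section}] not found")
    return "\n".join(out) + "\n"


def set_batch_size(config_text: str, num_streams: int) -> str:
    """Match batch-size under [streammux] and [primary-gie] to the stream count."""
    for section in ("streammux", "primary-gie"):
        config_text = set_key(config_text, section, "batch-size", str(num_streams))
    return config_text


example = "[streammux]\nbatch-size=4\n\n[primary-gie]\nenable=1\nbatch-size=4\n"
print(set_batch_size(example, 187))
```

The prototxt resolution change and the tracker selection from the same list are not covered by this sketch.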

The application configuration for the DeepStream SDK is listed below:

GA100 application configuration

Application Configuration  Specification
N×1080p 30 fps stream      sample_1080p_h265.mp4 (provided with the SDK) N=187
                           sample_1080p_h264.mp4 (provided with the SDK) N=98
Primary GIE                • Resnet10 (480×272)
                           • Batch Size = N
                           • Interval=0
Tracker                    Enabled. Processing at 480×272 resolution, IOU tracker enabled.
3 × Secondary GIEs         All batches size 32. Asynchronous mode enabled.
                           • Secondary_VehicleTypes (224×224, Resnet18)
                           • Secondary_CarColor (224×224, Resnet18)
                           • Secondary_CarMake (224×224, Resnet18)
Tiled Display              Disabled
Rendering                  Disabled

Achieved Performance

The table below shows the achieved performance of the DeepStream SDK under the specified system and application configuration:

Stream type  No. of Streams @ 30 FPS  CPU Utilization  GPU Utilization
H.265        187                      6.41%            40.20%
H.264        98                       2.53%            18.42%

Data center GPU - T4

This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - T4.

System Configuration

The system configuration for the DeepStream SDK is listed below:

T4 System configuration

System Configuration  Specification
CPU                   Dual Intel® Xeon® CPU E5-2650 v4 @ 2.20GHz (48 threads total)
GPU                   Tesla T4*
System Memory         360448Mb (22x16384) DDR42666, 2400MHz
Ubuntu                Ubuntu 20.04
GPU Driver            525.125.06
CUDA                  12.1
TensorRT              8.5.3.1
GPU clock frequency   1513 MHz

Application Configuration

Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

  • The inference resolution of Primary GIE is specified in the samples/models/Primary_detector/resnet10.prototxt. In this file, change the dim (i.e. height and width of input tensor) from 368x640 to 272x480.

  • Change batch size under streammux and primary-gie to match the number of streams.

  • Disable tiled display and rendering using instructions above.

  • Enable IoU tracker.

The application configuration for the DeepStream SDK is listed below:

T4 application configuration

Application Configuration  Specification
N×1080p 30 fps stream      sample_1080p_h265.mp4 (provided with the SDK) N=75
                           sample_1080p_h264.mp4 (provided with the SDK) N=43
Primary GIE                • Resnet10 (480×272)
                           • Batch Size = N
                           • Interval=0
Tracker                    Enabled. Processing at 480×272 resolution, IOU tracker enabled.
3 × Secondary GIEs         All batches size 32. Asynchronous mode enabled.
                           • Secondary_VehicleTypes (224×224, Resnet18)
                           • Secondary_CarColor (224×224, Resnet18)
                           • Secondary_CarMake (224×224, Resnet18)
Tiled Display              Disabled
Rendering                  Disabled

Achieved Performance

The table below shows the achieved performance of the DeepStream SDK under the specified system and application configuration:

Stream type  No. of Streams @ 30 FPS  CPU Utilization  GPU Utilization
H.265        75                       3.21%            44.77%
H.264        43                       1.93%            23.27%

Data center GPU - A30

This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - A30.

System Configuration

The system configuration for the DeepStream SDK is listed below:

A30 System configuration

System Configuration  Specification
CPU                   AMD EPYC 7763 @2430 MHz
GPU                   A30
Ubuntu                Ubuntu 20.04
GPU Driver            525.125.06
CUDA                  12.1
TensorRT              8.5.3.1
GPU clock frequency   1440 MHz

Application Configuration

Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

  • The inference resolution of Primary GIE is specified in the samples/models/Primary_detector/resnet10.prototxt. In this file, change the dim (i.e. height and width of input tensor) from 368x640 to 272x480.

  • Change batch size under streammux and primary-gie to match the number of streams.

  • Disable tiled display and rendering using instructions above.

  • Enable IoU tracker.

The application configuration for the DeepStream SDK is listed below:

A30 application configuration

Application Configuration  Specification
N×1080p 30 fps stream      sample_1080p_h265.mp4 (provided with the SDK) N=190
                           sample_1080p_h264.mp4 (provided with the SDK) N=71
Primary GIE                • Resnet10 (480×272)
                           • Batch Size = N
                           • Interval=0
Tracker                    Enabled. Processing at 480×272 resolution, IOU tracker enabled.
3 × Secondary GIEs         All batches size 32. Asynchronous mode enabled.
                           • Secondary_VehicleTypes (224×224, Resnet18)
                           • Secondary_CarColor (224×224, Resnet18)
                           • Secondary_CarMake (224×224, Resnet18)
Tiled Display              Disabled
Rendering                  Disabled

Achieved Performance

The table below shows the achieved performance of the DeepStream SDK under the specified system and application configuration:

Stream type  No. of Streams @ 30 FPS  CPU Utilization  GPU Utilization
H.265        190                      6.30%            53.86%
H.264        71                       1.80%            19.31%

Data center GPU - A2

This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - A2.

System Configuration

The system configuration for the DeepStream SDK is listed below:

A2 System configuration

System Configuration  Specification
CPU                   AMD EPYC 7763 @2430 MHz
GPU                   A2
Ubuntu                Ubuntu 20.04
GPU Driver            525.125.06
CUDA                  12.1
TensorRT              8.5.3.1
GPU clock frequency   1770 MHz

Application Configuration

Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

  • The inference resolution of Primary GIE is specified in the samples/models/Primary_detector/resnet10.prototxt. In this file, change the dim (i.e. height and width of input tensor) from 368x640 to 272x480.

  • Change batch size under streammux and primary-gie to match the number of streams.

  • Disable tiled display and rendering using instructions above.

  • Enable IoU tracker.

The application configuration for the DeepStream SDK is listed below:

A2 application configuration

Application Configuration  Specification
N×1080p 30 fps stream      sample_1080p_h265.mp4 (provided with the SDK) N=91
                           sample_1080p_h264.mp4 (provided with the SDK) N=48
Primary GIE                • Resnet10 (480×272)
                           • Batch Size = N
                           • Interval=0
Tracker                    Enabled. Processing at 480×272 resolution, IOU tracker enabled.
3 × Secondary GIEs         All batches size 32. Asynchronous mode enabled.
                           • Secondary_VehicleTypes (224×224, Resnet18)
                           • Secondary_CarColor (224×224, Resnet18)
                           • Secondary_CarMake (224×224, Resnet18)
Tiled Display              Disabled
Rendering                  Disabled

Achieved Performance

The table below shows the achieved performance of the DeepStream SDK under the specified system and application configuration:

Stream type  No. of Streams @ 30 FPS  CPU Utilization  GPU Utilization
H.265        91                       53.48%           99.99%
H.264        48                       1.64%            47.93%

Data center GPU - A10

This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - A10.

System Configuration

The system configuration for the DeepStream SDK is listed below:

A10 System configuration

System Configuration  Specification
CPU                   AMD EPYC 7763 @2430 MHz
GPU                   A10
Ubuntu                Ubuntu 20.04
GPU Driver            525.125.06
CUDA                  12.1
TensorRT              8.5.3.1
GPU clock frequency   1695 MHz

Application Configuration

Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

  • The inference resolution of Primary GIE is specified in the samples/models/Primary_detector/resnet10.prototxt. In this file, change the dim (i.e. height and width of input tensor) from 368x640 to 272x480.

  • Change batch size under streammux and primary-gie to match the number of streams.

  • Disable tiled display and rendering using instructions above.

  • Enable IoU tracker.

The application configuration for the DeepStream SDK is listed below:

A10 application configuration

Application Configuration  Specification
N×1080p 30 fps stream      sample_1080p_h265.mp4 (provided with the SDK) N=105
                           sample_1080p_h264.mp4 (provided with the SDK) N=45
Primary GIE                • Resnet10 (480×272)
                           • Batch Size = N
                           • Interval=0
Tracker                    Enabled. Processing at 480×272 resolution, IOU tracker enabled.
3 × Secondary GIEs         All batches size 32. Asynchronous mode enabled.
                           • Secondary_VehicleTypes (224×224, Resnet18)
                           • Secondary_CarColor (224×224, Resnet18)
                           • Secondary_CarMake (224×224, Resnet18)
Tiled Display              Disabled
Rendering                  Disabled

Achieved Performance

The table below shows the achieved performance of the DeepStream SDK under the specified system and application configuration:

Stream type  No. of Streams @ 30 FPS  CPU Utilization  GPU Utilization
H.265        105                      2.81%            29.60%
H.264        45                       1.39%            12.62%

Data center GPU - H100

This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - H100.

System Configuration

The system configuration for the DeepStream SDK is listed below:

H100 System configuration

System Configuration  Specification
CPU                   AMD EPYC 7763 @2430 MHz
GPU                   H100
Ubuntu                Ubuntu 20.04
GPU Driver            525.125.06
CUDA                  12.1
TensorRT              8.5.3.1

Application Configuration

Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

  • The inference resolution of Primary GIE is specified in the samples/models/Primary_detector/resnet10.prototxt. In this file, change the dim (i.e. height and width of input tensor) from 368x640 to 272x480.

  • Change batch size under streammux and primary-gie to match the number of streams.

  • Disable tiled display and rendering using instructions above.

  • Enable IoU tracker.

The application configuration for the DeepStream SDK is listed below:

H100 application configuration

Application Configuration  Specification
N×1080p 30 fps stream      sample_1080p_h265.mp4 (provided with the SDK) N=274
                           sample_1080p_h264.mp4 (provided with the SDK) N=153
Primary GIE                • Resnet10 (480×272)
                           • Batch Size = N
                           • Interval=0
Tracker                    Enabled. Processing at 480×272 resolution, IOU tracker enabled.
3 × Secondary GIEs         All batches size 32. Asynchronous mode enabled.
                           • Secondary_VehicleTypes (224×224, Resnet18)
                           • Secondary_CarColor (224×224, Resnet18)
                           • Secondary_CarMake (224×224, Resnet18)
Tiled Display              Disabled
Rendering                  Disabled

Achieved Performance

The table below shows the achieved performance of the DeepStream SDK under the specified system and application configuration:

Stream type  No. of Streams @ 30 FPS  CPU Utilization  GPU Utilization
H.265        274                      1.71%            65.36%
H.264        153                      1.73%            26.41%

Data center GPU - L40

This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - L40.

System Configuration

The system configuration for the DeepStream SDK is listed below:

L40 System configuration

System Configuration  Specification
CPU                   AMD EPYC 7763 @2430 MHz
GPU                   L40
Ubuntu                Ubuntu 20.04
GPU Driver            525.125.06
CUDA                  12.1
TensorRT              8.5.3.1

Application Configuration

Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

  • The inference resolution of Primary GIE is specified in the samples/models/Primary_detector/resnet10.prototxt. In this file, change the dim (i.e. height and width of input tensor) from 368x640 to 272x480.

  • Change batch size under streammux and primary-gie to match the number of streams.

  • Disable tiled display and rendering using instructions above.

  • Enable IoU tracker.

The application configuration for the DeepStream SDK is listed below:

L40 application configuration

Application Configuration  Specification
N×1080p 30 fps stream      sample_1080p_h265.mp4 (provided with the SDK) N=192
                           sample_1080p_h264.mp4 (provided with the SDK) N=82
Primary GIE                • Resnet10 (480×272)
                           • Batch Size = N
                           • Interval=0
Tracker                    Enabled. Processing at 480×272 resolution, IOU tracker enabled.
3 × Secondary GIEs         All batches size 32. Asynchronous mode enabled.
                           • Secondary_VehicleTypes (224×224, Resnet18)
                           • Secondary_CarColor (224×224, Resnet18)
                           • Secondary_CarMake (224×224, Resnet18)
Tiled Display              Disabled
Rendering                  Disabled

Achieved Performance

The table below shows the achieved performance of the DeepStream SDK under the specified system and application configuration:

Stream type  No. of Streams @ 30 FPS  CPU Utilization  GPU Utilization
H.265        192                      4.93%            33.55%
H.264        82                       1.92%            13.74%

Data center GPU - L4

This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - L4.

System Configuration

The system configuration for the DeepStream SDK is listed below:

L4 System configuration

System Configuration  Specification
CPU                   AMD EPYC 7763 @2430 MHz
GPU                   L4
Ubuntu                Ubuntu 20.04
GPU Driver            525.125.06
CUDA                  12.1
TensorRT              8.5.3.1

Application Configuration

Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

  • The inference resolution of Primary GIE is specified in the samples/models/Primary_detector/resnet10.prototxt. In this file, change the dim (i.e. height and width of input tensor) from 368x640 to 272x480.

  • Change batch size under streammux and primary-gie to match the number of streams.

  • Disable tiled display and rendering using instructions above.

  • Enable IoU tracker.

The application configuration for the DeepStream SDK is listed below:

L4 application configuration

Application Configuration  Specification
N×1080p 30 fps stream      sample_1080p_h265.mp4 (provided with the SDK) N=181
                           sample_1080p_h264.mp4 (provided with the SDK) N=91
Primary GIE                • Resnet10 (480×272)
                           • Batch Size = N
                           • Interval=0
Tracker                    Enabled. Processing at 480×272 resolution, IOU tracker enabled.
3 × Secondary GIEs         All batches size 32. Asynchronous mode enabled.
                           • Secondary_VehicleTypes (224×224, Resnet18)
                           • Secondary_CarColor (224×224, Resnet18)
                           • Secondary_CarMake (224×224, Resnet18)
Tiled Display              Disabled
Rendering                  Disabled

Achieved Performance

The table below shows the achieved performance of the DeepStream SDK under the specified system and application configuration:

Stream type  No. of Streams @ 30 FPS  CPU Utilization  GPU Utilization
H.265        181                      11.50%           71.68%
H.264        91                       2.19%            28.96%

Data center GPU - Quadro (A6000)

This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - Quadro (A6000).

System Configuration

The system configuration for the DeepStream SDK is listed below:

Quadro (A6000) System configuration

System Configuration  Specification
CPU                   AMD EPYC 7763 @2430 MHz
GPU                   Quadro (A6000)
Ubuntu                Ubuntu 20.04
GPU Driver            525.125.06
CUDA                  12.1
TensorRT              8.5.3.1

Application Configuration

Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

  • The inference resolution of Primary GIE is specified in the samples/models/Primary_detector/resnet10.prototxt. In this file, change the dim (i.e. height and width of input tensor) from 368x640 to 272x480.

  • Change batch size under streammux and primary-gie to match the number of streams.

  • Disable tiled display and rendering using instructions above.

  • Enable IoU tracker.

The application configuration for the DeepStream SDK is listed below:

Quadro (A6000) application configuration

Application Configuration  Specification
N×1080p 30 fps stream      sample_1080p_h265.mp4 (provided with the SDK) N=108
                           sample_1080p_h264.mp4 (provided with the SDK) N=48
Primary GIE                • Resnet10 (480×272)
                           • Batch Size = N
                           • Interval=0
Tracker                    Enabled. Processing at 480×272 resolution, IOU tracker enabled.
3 × Secondary GIEs         All batches size 32. Asynchronous mode enabled.
                           • Secondary_VehicleTypes (224×224, Resnet18)
                           • Secondary_CarColor (224×224, Resnet18)
                           • Secondary_CarMake (224×224, Resnet18)
Tiled Display              Disabled
Rendering                  Disabled

Achieved Performance

The table below shows the achieved performance of the DeepStream SDK under the specified system and application configuration:

Stream type  No. of Streams @ 30 FPS  CPU Utilization  GPU Utilization
H.265        108                      1.40%            26.62%
H.264        48                       0.61%            11.44%

Jetson

This section describes configuration and settings for the DeepStream SDK on NVIDIA Jetson™ platforms. JetPack 5.1.2 GA is used for software installation.

System Configuration

For the performance test:

  1. Max power mode is enabled: $ sudo nvpmodel -m 0. For Jetson NX, use $ sudo nvpmodel -m 8

  2. The GPU clocks are stepped to maximum: $ sudo jetson_clocks

For information about supported power modes, see the “Supported Modes and Power Efficiency” section in the power management topics of NVIDIA Tegra Linux Driver Package Development Guide, e.g., “Power Management for Jetson AGX Xavier Devices.”

Jetson AGX Xavier

Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

  • The inference resolution of Primary GIE is specified in the samples/models/Primary_detector/resnet10.prototxt. In this file, change the dim (i.e. height and width of input tensor) from 368x640 to 272x480.

  • Change batch size under streammux and primary-gie to match the number of streams.

  • Disable tiled display and rendering using instructions above.

  • Enable IOU tracker.

The following tables describe performance results for the NVIDIA Jetson AGX Xavier™.

Jetson AGX Xavier Pipeline Configuration (deepstream-app)

Application Configuration  Specification
N×1080p 30 fps streams     sample_1080p_h265.mp4 (provided with the SDK) N=54
                           sample_1080p_h264.mp4 (provided with the SDK) N=34
Primary GIE                • Resnet10 (480×272) Asynchronous mode enabled
                           • Batch Size = N
                           • Interval = 0
Tracker                    Enabled; processing at 480×272 resolution, IOU tracker enabled.
3× secondary GIEs          All batches are size 32.
                           • Secondary_VehicleTypes (224×224, Resnet18)
                           • Secondary_CarColor (224×224, Resnet18)
                           • Secondary_CarMake (224×224, Resnet18)
OSD/tiled display          Disabled
Renderer                   Disabled

Achieved Performance

Stream type  No. of Streams @ 30 FPS  CPU Utilization  GPU Utilization
H.265        54                       59.07%           84.66%
H.264        34                       38.89%           56.32%

Jetson NX

Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

  • The inference resolution of Primary GIE is specified in the samples/models/Primary_detector/resnet10.prototxt. In this file, change the dim (i.e. height and width of input tensor) from 368x640 to 272x480.

  • Change batch size under streammux and primary-gie to match the number of streams.

  • Disable tiled display and rendering using instructions above.

  • Enable IOU tracker.

The following tables describe performance results for the NVIDIA Jetson NX™.

Jetson NX Pipeline Configuration (deepstream-app)

Application Configuration  Specification
N×1080p 30 fps streams     sample_1080p_h265.mp4 (provided with the SDK) N=30
                           sample_1080p_h264.mp4 (provided with the SDK) N=23
Primary GIE                • Resnet10 (480×272) Asynchronous mode enabled
                           • Batch Size = N
                           • Interval = 0
Tracker                    Enabled; processing at 480×272 resolution, IOU tracker enabled.
3× secondary GIEs          All batches are size 32.
                           • Secondary_VehicleTypes (224×224, Resnet18)
                           • Secondary_CarColor (224×224, Resnet18)
                           • Secondary_CarMake (224×224, Resnet18)
OSD/tiled display          Disabled
Renderer                   Disabled

Achieved Performance

Stream type  No. of Streams @ 30 FPS  CPU Utilization  GPU Utilization
H.265        30                       71.59%           85.12%
H.264        23                       56.22%           66.10%

Jetson AGX Orin

Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

  • The inference resolution of Primary GIE is specified in the samples/models/Primary_detector/resnet10.prototxt. In this file, change the dim (i.e. height and width of input tensor) from 368x640 to 272x480.

  • Change batch size under streammux and primary-gie to match the number of streams.

  • Disable tiled display and rendering using instructions above.

  • Enable IOU tracker.

The following tables describe performance results for the NVIDIA Jetson AGX Orin™.

Jetson AGX Orin Pipeline Configuration (deepstream-app)

Application Configuration  Specification
N×1080p 30 fps streams     sample_1080p_h265.mp4 (provided with the SDK) N=37
                           sample_1080p_h264.mp4 (provided with the SDK) N=15
Primary GIE                • Resnet10 (480×272) Asynchronous mode enabled
                           • Batch Size = N
                           • Interval = 0
Tracker                    Enabled; processing at 480×272 resolution, IOU tracker enabled.
3× secondary GIEs          All batches are size 32.
                           • Secondary_VehicleTypes (224×224, Resnet18)
                           • Secondary_CarColor (224×224, Resnet18)
                           • Secondary_CarMake (224×224, Resnet18)
OSD/tiled display          Disabled
Renderer                   Disabled

Achieved Performance

Stream type  No. of Streams @ 30 FPS  CPU Utilization  GPU Utilization
H.265        37                       11.06%           23.24%
H.264        15                       5.10%            10.45%

Jetson Orin NX

Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

  • The inference resolution of Primary GIE is specified in the samples/models/Primary_detector/resnet10.prototxt. In this file, change the dim (i.e. height and width of input tensor) from 368x640 to 272x480.

  • Change batch size under streammux and primary-gie to match the number of streams.

  • Disable tiled display and rendering using instructions above.

  • Enable IOU tracker.

The following tables describe performance results for the NVIDIA Jetson Orin NX™.

Jetson Orin NX Pipeline Configuration (deepstream-app)

Application Configuration  Specification
N×1080p 30 fps streams     sample_1080p_h265.mp4 (provided with the SDK) N=32
                           sample_1080p_h264.mp4 (provided with the SDK) N=13
Primary GIE                • Resnet10 (480×272) Asynchronous mode enabled
                           • Batch Size = N
                           • Interval = 0
Tracker                    Enabled; processing at 480×272 resolution, IOU tracker enabled.
3× secondary GIEs          All batches are size 32.
                           • Secondary_VehicleTypes (224×224, Resnet18)
                           • Secondary_CarColor (224×224, Resnet18)
                           • Secondary_CarMake (224×224, Resnet18)
OSD/tiled display          Disabled
Renderer                   Disabled

Achieved Performance

Stream type  No. of Streams @ 30 FPS  CPU Utilization  GPU Utilization
H.265        32                       14.97%           44.39%
H.264        13                       7.22%            18.21%

Jetson Orin Nano

Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

  • The inference resolution of Primary GIE is specified in the samples/models/Primary_detector/resnet10.prototxt. In this file, change the dim (i.e. height and width of input tensor) from 368x640 to 272x480.

  • Change batch size under streammux and primary-gie to match the number of streams.

  • Disable tiled display and rendering using instructions above.

  • Enable IOU tracker.

The following tables describe performance results for the NVIDIA Jetson Orin Nano™.

Jetson Orin Nano Pipeline Configuration (deepstream-app)

Application Configuration  Specification
N×1080p 30 fps streams     sample_1080p_h265.mp4 (provided with the SDK) N=19
                           sample_1080p_h264.mp4 (provided with the SDK) N=8
Primary GIE                • Resnet10 (480×272) Asynchronous mode enabled
                           • Batch Size = N
                           • Interval = 0
Tracker                    Enabled; processing at 480×272 resolution, IOU tracker enabled.
3× secondary GIEs          All batches are size 32.
                           • Secondary_VehicleTypes (224×224, Resnet18)
                           • Secondary_CarColor (224×224, Resnet18)
                           • Secondary_CarMake (224×224, Resnet18)
OSD/tiled display          Disabled
Renderer                   Disabled

Achieved Performance

Stream type  No. of Streams @ 30 FPS  CPU Utilization  GPU Utilization
H.265        19                       17.12%           43.40%
H.264        8                        8.17%            21.66%