Performance¶

The DeepStream application is benchmarked across various NVIDIA TAO Toolkit and open source models. The measured performance represents the end-to-end performance of the entire video analytics application, including video capture and decode, pre-processing, batching, inference, and post-processing to generate metadata. Output rendering is turned off to achieve peak inference performance. For information on disabling output rendering, see the DeepStream Reference Application - deepstream-app chapter.

TAO Pre-trained models¶

TAO Toolkit provides a set of pretrained models, listed in the table below. If these models satisfy your requirements, start with one of them. They can be used for various smart city or smart places applications. If your application is beyond the scope of these models, you can re-train one of the popular model architectures using TAO Toolkit. The table below shows the end-to-end performance of highly accurate pretrained models from TAO Toolkit. All models are available on NGC. These models are natively integrated with DeepStream, and the instructions to run them are in /opt/nvidia/deepstream/deepstream-6.0/samples/configs/tao_pretrained_models/. The following numbers were obtained with sample_1080p_h265.mp4.

Performance - pretrained models¶

Model Arch	Inference resolution	Precision	Jetson Nano GPU (FPS)	Jetson TX2 GPU (FPS)	Jetson Xavier NX GPU (FPS)	Jetson Xavier NX DLA1 (FPS)	Jetson Xavier NX DLA2 (FPS)	Jetson AGX Xavier GPU (FPS)	Jetson AGX Xavier DLA1 (FPS)	Jetson AGX Xavier DLA2 (FPS)	T4 GPU (FPS)	A100 PCIe GPU (FPS)
PeopleNet-ResNet34	960x544	INT8	11.6	31	172	48	48	305	53	53	926	3345
TrafficCamNet-ResNet18	960x544	INT8	19.2	51	274	89	89	486	111	111	1353	3855
DashCamNet-ResNet18	960x544	INT8	17.6	46	261	91	91	460	116	116	1341	3870
FaceDetectIR-ResNet18	384x240	INT8	101	276	1126	455	455	2007	624	624	2516	5578

All the models in the table above can run solely on DLA. This saves valuable GPU resources to run more complex models.

Note

All inferences are done using INT8 precision except on Jetson Nano™. On Nano, it is FP16.
Running inference simultaneously on multiple models is not supported on the DLA. You can only run one model at a time on the DLA.
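
As a rough way to read the table, an FPS figure can be converted into an approximate number of 30 fps streams that a single model could sustain. The Python sketch below is illustrative arithmetic only: the helper name `max_streams` is hypothetical, and the estimate assumes that throughput scales linearly with the number of streams and ignores trackers, secondary models, and decoder limits.

```python
import math

# FPS values copied from the T4 GPU column of the table above.
T4_FPS = {
    "PeopleNet-ResNet34": 926,
    "TrafficCamNet-ResNet18": 1353,
    "DashCamNet-ResNet18": 1341,
    "FaceDetectIR-ResNet18": 2516,
}


def max_streams(model_fps: float, stream_fps: float = 30.0) -> int:
    """Upper bound on concurrent streams, assuming linear scaling.

    Hypothetical helper for illustration; not part of the DeepStream SDK.
    """
    return math.floor(model_fps / stream_fps)


for name, fps in T4_FPS.items():
    print(f"{name}: about {max_streams(fps)} streams at 30 fps")
```

Treat the result as a ceiling for planning purposes; the stream counts actually achieved with a full pipeline are reported in the reference model sections that follow.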

DeepStream reference model and tracker¶

DeepStream SDK ships with a reference DetectNet_v2-ResNet10 model and three ResNet18 classifier models. Detailed instructions to run these models with DeepStream are provided in the next section. DeepStream provides three reference trackers: IOU, NvDCF, and DeepSORT. For more information about trackers, see the Gst-nvtracker section.

Configuration File Settings for Performance Measurement¶

To achieve peak performance, make sure the devices are properly cooled. For Turing and Ampere GPUs, make sure you use a server that meets the thermal and airflow requirements. Along with the hardware setup, a few other options in the config file need to be set to achieve the published performance. Make the required changes to one of the config files from DeepStream SDK to replicate the peak performance.

Turn off output rendering, OSD, and tiler

OSD (on-screen display) is used to display bounding boxes, masks, and labels on the screen. If output rendering is disabled, drawing bounding boxes is not required unless the output needs to be streamed over RTSP or saved to disk. The tiler is used to display the output in an NxM tiled grid. It is not needed if rendering is disabled. Output rendering, OSD, and the tiler each use some percentage of compute resources, so they can reduce inference performance.

To disable OSD, tiled display and output sink, make the following changes in the DeepStream config file.
To disable OSD, change enable to 0
[osd]
enable=0
To disable tiling, change enable to 0
[tiled-display]
enable=0
To turn off output rendering, change the sink to fakesink.
[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=1
sync=0
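
The three changes above can also be applied with a script. The following Python sketch is one possible approach and is not part of the SDK: `set_key` is a hypothetical helper that edits the config text line by line so that comments and key order are preserved, and the `sample` string is a made-up stand-in for a real deepstream-app config file.

```python
import re


def set_key(text: str, section: str, key: str, value: str) -> str:
    """Set key=value inside [section] of a DeepStream-style config string.

    If the key is missing from the section, it is appended at the end of
    that section. If the section is missing, the text is returned unchanged.
    """
    out, in_section, done = [], False, False
    key_pattern = re.compile(rf"{re.escape(key)}\s*=")
    for line in text.splitlines():
        header = re.fullmatch(r"\[(.+)\]", line.strip())
        if header:
            # Leaving the target section without having seen the key: add it.
            if in_section and not done:
                out.append(f"{key}={value}")
                done = True
            in_section = header.group(1) == section
        elif in_section and not done and key_pattern.match(line.strip()):
            # Commented-out lines start with '#', so they never match here.
            line = f"{key}={value}"
            done = True
        out.append(line)
    if in_section and not done:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


# Made-up config fragment used only to exercise the helper.
sample = """[osd]
enable=1

[tiled-display]
enable=1
rows=2

[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=2
sync=1
"""

patched = sample
for section, key, value in [
    ("osd", "enable", "0"),            # disable OSD
    ("tiled-display", "enable", "0"),  # disable tiling
    ("sink0", "type", "1"),            # switch the sink to fakesink
    ("sink0", "sync", "0"),
]:
    patched = set_key(patched, section, key, value)

print(patched)
```

A line-based edit is used here instead of a generic INI parser because rewriting the file through a parser would typically drop the comments that the sample configs rely on.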

Use the max_perf setting for tracker

DeepStream SDK 6.0.1 GA introduces a new reference low-level tracker library, NvMultiObjectTracker, along with a set of configuration files:

config_tracker_IOU.yml

config_tracker_NvDCF_max_perf.yml

config_tracker_NvDCF_perf.yml

config_tracker_NvDCF_accuracy.yml

To achieve the peak performance shown in the table above when using the NvDCF tracker, make sure the max_perf configuration is used with the video frame resolution matched to that of the inference module. For example, if the inference module uses 480x272 resolution, use a similarly reduced resolution (e.g., 480x288) for the tracker module, as in the following:

[tracker]
enable=1
tracker-width=480
tracker-height=288
ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
#ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_IOU.yml
ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_NvDCF_max_perf.yml
gpu-id=0
enable-batch-process=1
display-tracking-id=1

When the IOU tracker is used, the video frame resolution doesn’t matter, and the default config_tracker_IOU.yml can be used.
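
For the NvDCF case, the 480x288 tracker resolution in the example can be reproduced by rounding each inference dimension up to a multiple of 32. The Python sketch below shows that arithmetic. The 32-pixel alignment is an assumption made here because it matches the example; it is not stated in this chapter, so check the Gst-nvtracker documentation for the actual constraint. The function names are hypothetical.

```python
from typing import Tuple


def align_up(value: int, multiple: int = 32) -> int:
    """Round value up to the nearest multiple (assumed alignment: 32)."""
    return -(-value // multiple) * multiple


def tracker_resolution(infer_width: int, infer_height: int) -> Tuple[int, int]:
    """Tracker width and height derived from the inference resolution."""
    return align_up(infer_width), align_up(infer_height)


width, height = tracker_resolution(480, 272)
print(f"tracker-width={width}")
print(f"tracker-height={height}")
```

With the 480x272 inference resolution used in this chapter, the width is already aligned and only the height changes.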

To use DLA on Jetson AGX Xavier and Xavier NX for performance measurement, refer to the “Using DLA for inference” section in the Quickstart Guide.

DeepStream reference model¶

Data center GPU - GA100¶

This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - GA100.

System Configuration¶

The system configuration for the DeepStream SDK is listed below:

GA100 System configuration¶

System Configuration	Specification
CPU	AMD EPYC 7742 @ 2.25GHz 3.4GHz Turbo (Rome) HT Off
GPU	A100-PCIE-40GB(GA100) 1*40537 MiB 1*108 SM
Ubuntu	Ubuntu 18.04
GPU Driver	470.63.01
CUDA	11.4
TensorRT	8.0.1
GPU clock frequency	1410 MHz

Application Configuration¶

Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

The inference resolution of Primary GIE is specified in the samples/models/Primary_detector/resnet10.prototxt. In this file, change the dim (i.e. height and width of input tensor) from 368x640 to 272x480.
Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IOU tracker.

The application configuration for the DeepStream SDK is listed below:

GA100 application configuration¶

Application Configuration	Specification
N×1080p 30 fps stream	sample_1080p_h265.mp4 (provided with the SDK) N=157; sample_1080p_h264.mp4 (provided with the SDK) N=92
Primary GIE	Resnet10 (480×272); Batch Size = N; Interval=0
Tracker	Enabled. Processing at 480×272 resolution, IOU tracker enabled.
3 × Secondary GIEs	All batches size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, Resnet18); Secondary_CarColor (224×224, Resnet18); Secondary_CarMake (224×224, Resnet18)
Tiled Display	Disabled
Rendering	Disabled

Achieved Performance

The table below shows the achieved performance of the DeepStream SDK under the specified system and application configuration:

Stream type	No. of Stream @ 30 FPS	CPU Utilization	GPU Utilization
H.265	157	4%	42%
H.264	92	22.28%	26.84%
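
The stream counts in this table can be turned into an aggregate frame rate by multiplying by the 30 fps rate of each stream. The Python sketch below shows that calculation with the GA100 figures; `aggregate_fps` is a hypothetical helper, and the same arithmetic applies to the tables for the other platforms.

```python
STREAM_FPS = 30  # each stream in the table runs at 30 fps

# Stream counts copied from the GA100 table above.
GA100_STREAMS = {"H.265": 157, "H.264": 92}


def aggregate_fps(streams: int, stream_fps: int = STREAM_FPS) -> int:
    """Total frames per second handled across all streams."""
    return streams * stream_fps


for codec, streams in GA100_STREAMS.items():
    print(f"{codec}: {streams} streams x {STREAM_FPS} fps = {aggregate_fps(streams)} frames/s")
```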

Data center GPU - T4¶

This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - T4.

System Configuration¶

The system configuration for the DeepStream SDK is listed below:

T4 System configuration¶

System Configuration	Specification
CPU	Dual Intel® Xeon® CPU E5-2650 v4 @ 2.20GHz (48 threads total)
GPU	Tesla T4*
System Memory	360448Mb (22x16384) DDR4 2666, 2400MHz
Ubuntu	Ubuntu 18.04
GPU Driver	470.63.01
CUDA	11.4
TensorRT	8.0.1
GPU clock frequency	1513 MHz

Application Configuration¶

Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

The inference resolution of Primary GIE is specified in the samples/models/Primary_detector/resnet10.prototxt. In this file, change the dim (i.e. height and width of input tensor) from 368x640 to 272x480.
Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IOU tracker.

The application configuration for the DeepStream SDK is listed below:

T4 application configuration¶

Application Configuration	Specification
N×1080p 30 fps stream	sample_1080p_h265.mp4 (provided with the SDK) N=78; sample_1080p_h264.mp4 (provided with the SDK) N=39
Primary GIE	Resnet10 (480×272); Batch Size = N; Interval=0
Tracker	Enabled. Processing at 480×272 resolution, IOU tracker enabled.
3 × Secondary GIEs	All batches size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, Resnet18); Secondary_CarColor (224×224, Resnet18); Secondary_CarMake (224×224, Resnet18)
Tiled Display	Disabled
Rendering	Disabled

Achieved Performance

The table below shows the achieved performance of the DeepStream SDK under the specified system and application configuration:

Stream type	No. of Stream @ 30 FPS	CPU Utilization	GPU Utilization
H.265	78	7.3%	58%
H.264	39	10.82%	31.82%

Jetson¶

This section describes configuration and settings for the DeepStream SDK on NVIDIA Jetson™ platforms. JetPack 4.6.1 is used for software installation.

System Configuration¶

For the performance test:

Max power mode is enabled: $ sudo nvpmodel -m 0
The GPU clocks are stepped to maximum: $ sudo jetson_clocks

For information about supported power modes, see the “Supported Modes and Power Efficiency” section in the power management topics of NVIDIA Tegra Linux Driver Package Development Guide, e.g., “Power Management for Jetson AGX Xavier Devices.”

Jetson Nano¶

Config file: source8_1080p_dec_infer-resnet_tracker_tiled_display_fp16_nano.txt

Change the following items in the config file:

Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable NvDCF tracker and change the tracker resolution to 480x288.

The following tables describe performance results for the NVIDIA Jetson Nano.

Jetson Nano application configuration¶

Application Configuration	Specification
N×1080p 30 fps streams	sample_1080p_h265.mp4 (provided with the SDK) N = 9; sample_1080p_h264.mp4 (provided with the SDK) N = 8
Primary GIE	Resnet10 (480×272); Batch Size = N; Interval = 5
Tracker	Enabled; processing at 480×288 resolution, NvDCF tracker enabled
OSD/tiled display	Disabled
Renderer	Disabled

Achieved Performance

Stream type	No. of Stream @ 30 FPS	CPU Utilization	GPU Utilization
H.265	9	27.14%	87.05%
H.264	8	25.67%	84.67%

Jetson AGX Xavier¶

Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

The inference resolution of Primary GIE is specified in the samples/models/Primary_detector/resnet10.prototxt. In this file, change the dim (i.e. height and width of input tensor) from 368x640 to 272x480.
Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IOU tracker.

The following tables describe performance results for the NVIDIA Jetson AGX Xavier™.

Jetson AGX Xavier Pipeline Configuration (deepstream-app)¶

Application Configuration	Specification
N×1080p 30 fps streams	sample_1080p_h265.mp4 (provided with the SDK) N=46; sample_1080p_h264.mp4 (provided with the SDK) N=34
Primary GIE	Resnet10 (480×272) Asynchronous mode enabled; Batch Size = N; Interval = 0
Tracker	Enabled; processing at 480×272 resolution, IOU tracker enabled.
3× secondary GIEs	All batches are size 32. Secondary_VehicleTypes (224×224, Resnet18); Secondary_CarColor (224×224, Resnet18); Secondary_CarMake (224×224, Resnet18)
OSD/tiled display	Disabled
Renderer	Disabled

Achieved Performance

Stream type	No. of Stream @ 30 FPS	CPU Utilization	GPU Utilization
H.265	46	27.81%	93.84%
H.264	34	22.58%	71.19%

Jetson NX¶

Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

The inference resolution of Primary GIE is specified in the samples/models/Primary_detector/resnet10.prototxt. In this file, change the dim (i.e. height and width of input tensor) from 368x640 to 272x480.
Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IOU tracker.

The following tables describe performance results for the NVIDIA Jetson NX™.

Jetson NX Pipeline Configuration (`deepstream-app`)¶
Application Configuration	Specification
N×1080p 30 fps streams	`sample_1080p_h265.mp4` (provided with the SDK) N=23 `sample_1080p_h264.mp4` (provided with the SDK) N=19
Primary GIE	Resnet10 (480×272) Asynchronous mode enabled Batch Size = N Interval = 0
Tracker	Enabled; processing at 480×272 resolution, IOU tracker enabled.
3× secondary GIEs	All batches are size 32. Secondary_VehicleTypes (224×224—Resnet18) Secondary_CarColor (224×224—Resnet18) Secondary_CarMake (224×224—Resnet18)
OSD/tiled display	Disabled
Renderer	Disabled

Achieved Performance

Stream type	No. of Stream @ 30 FPS	CPU Utilization	GPU Utilization
H.265	23	31.12%	96.11%
H.264	19	29.4%	76.94%

Jetson TX2¶

Config file: source12_1080p_dec_infer-resnet_tracker_tiled_display_fp16_tx2.txt

Change the following in the config file:

Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable NvDCF tracker and change the tracker resolution to 480x288.

The following tables describe performance results for the Jetson™ TX2.

Jetson TX2 Pipeline Configuration (`deepstream-app`)¶
Application Configuration	Specification
N×1080p 30 fps streams	`sample_1080p_h265.mp4` (provided with the SDK) N=21 `sample_1080p_h264.mp4` (provided with the SDK) N=17
Primary GIE	Resnet10 (480×272) Batch Size = N Interval = 5
Tracker	Enabled; processing at 480×288 resolution, NvDCF tracker enabled
OSD/tiled display	Disabled
Renderer	Disabled

Achieved Performance

Stream type	No. of Stream @ 30 FPS	CPU Utilization	GPU Utilization
H.265	21	24.55%	80.12%
H.264	17	20.69%	67.59%

Jetson TX1¶

Config file: source8_1080p_dec_infer-resnet_tracker_tiled_display_fp16_tx1.txt

Change the following in the config file:

Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable NvDCF tracker and change the tracker resolution to 480x288.

The following tables describe performance results for the Jetson™ TX1.

Jetson TX1 Pipeline Configuration (`deepstream-app`)¶
Application Configuration	Specification
N×1080p 30 fps streams	`sample_1080p_h265.mp4` (provided with the SDK) N=13 `sample_1080p_h264.mp4` (provided with the SDK) N=10
Primary GIE	Resnet10 (480×272) Batch Size = N Interval = 5
Tracker	Enabled; processing at 480×288 resolution, NvDCF tracker enabled
OSD/tiled display	Disabled
Renderer	Disabled

Achieved Performance

Stream type	No. of Stream @ 30 FPS	CPU Utilization	GPU Utilization
H.265	13	33.74%	84.5%
H.264	10	26.35%	64.91%