Performance#

The DeepStream application is benchmarked across various NVIDIA TAO Toolkit and open-source models. The measured performance is the end-to-end performance of the entire video analytics application. This includes video capture and decode, pre-processing, batching, inference, and post-processing to generate metadata. Output rendering is turned off to achieve peak inference performance. For information on disabling output rendering, see the DeepStream Reference Application - deepstream-app chapter.

TAO Pre-trained models#

The TAO Toolkit provides a set of pretrained models, listed in the table below. If one of these models satisfies your requirements, start with it. These models can be used for various smart city and smart places applications. If your application is beyond the scope of these models, you can re-train one of the popular model architectures using the TAO Toolkit. The table below shows the end-to-end performance of highly accurate pretrained models from the TAO Toolkit. All models are available on NGC. These models are natively integrated with DeepStream. The instructions to run them are in /opt/nvidia/deepstream/deepstream/samples/configs/tao_pretrained_models/. The following numbers were obtained with sample_1080p_h265.mp4.

Performance Jetson - pretrained models#
Model Arch	Inference resolution	Precision	Tracker	Jetson AGX Orin GPU (FPS)	Jetson AGX Orin DLA1/DLA2 (FPS)	Jetson Orin NX GPU (FPS)	Jetson Orin NX DLA1/DLA2 (FPS)	Jetson Orin Nano GPU (FPS)
PeopleNet-ResNet34 (v2.3.4)	960x544	INT8	No Tracker	976	330	336	166	250
PeopleNet-ResNet34 (v2.3.4)	960x544	INT8	NvDCF Tracker	626	312	286	153	217
PeopleNet-ViT (deployable_v1.1)	960x544	INT8	No Tracker	48	NA	17	NA	13
PeopleNet-ViT (deployable_v1.1)	960x544	INT8	NvDCF Tracker	43	NA	17	NA	13
YOLOv8	640x640	INT8	No Tracker	750	NA	236	NA	182
YOLOv8	640x640	INT8	NvDCF Tracker	579	NA	211	NA	155
YOLOv9	640x640	FP16	No Tracker	474	NA	164	NA	122
YOLOv9	640x640	FP16	NvDCF Tracker	396	NA	140	NA	104

The models with DLA results in the table above (PeopleNet-ResNet34) can run solely on DLA. Offloading inference to the DLA frees GPU resources to run more complex models.

Note

Running inference simultaneously on multiple models is not supported on the DLA. You can only run one model at a time on the DLA.
NA : Not available for Jetson
NA* : For these models DLA falls back to GPU
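The relative cost of enabling the NvDCF tracker can be read off the table as a fractional FPS drop. The sketch below computes this drop for the Jetson AGX Orin GPU column. The FPS pairs are copied from the table, and the helper name `tracker_overhead` is illustrative only.

```python
def tracker_overhead(fps_no_tracker: float, fps_with_tracker: float) -> float:
    """Fractional FPS drop caused by enabling the tracker (0.25 means 25% slower)."""
    if fps_no_tracker <= 0:
        raise ValueError("fps_no_tracker must be positive")
    return (fps_no_tracker - fps_with_tracker) / fps_no_tracker


# (No Tracker, NvDCF Tracker) GPU FPS on Jetson AGX Orin, copied from the table above.
AGX_ORIN_GPU_FPS = {
    "PeopleNet-ResNet34 (v2.3.4)": (976, 626),
    "PeopleNet-ViT (deployable_v1.1)": (48, 43),
    "YOLOv8": (750, 579),
    "YOLOv9": (474, 396),
}

for model, (base, tracked) in AGX_ORIN_GPU_FPS.items():
    print(f"{model}: {tracker_overhead(base, tracked):.1%} lower FPS with NvDCF")
```

On these figures, the proportional drop is about 36% for PeopleNet-ResNet34 and about 10% for the much slower PeopleNet-ViT.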

Performance dGPU - pretrained models (A100 PCIe, A30, A2, A10)#
Model Arch	Inference resolution	Precision	Inference Engine	Tracker	A100 PCIe GPU (FPS)	A30 GPU (FPS)	A2 GPU (FPS)	A10 GPU (FPS)
PeopleNet-ResNet34 (v2.3.4)	960x544	INT8	TRT	No Tracker	4955	3127	551	2056
PeopleNet-ResNet34 (v2.3.4)	960x544	INT8	TRT	NvDCF Tracker	4272	2672	515	1901
PeopleNet-ViT (deployable_v1.1)	960x544	INT8	TRT	No Tracker	347	167	18	148
PeopleNet-ViT (deployable_v1.1)	960x544	INT8	TRT	NvDCF Tracker	308	165	18	180
YOLOv8s	640x640	INT8	TRT	No Tracker	2593	1879	486	1465
YOLOv8s	640x640	INT8	TRT	NvDCF Tracker	2455	1803	418	1334
YOLOv9t	640x640	FP16	TRT	No Tracker	2396	1638	339	1262
YOLOv9t	640x640	FP16	TRT	NvDCF Tracker	2215	1476	326	1121

Performance dGPU - pretrained models (H100, L40, L4, Quadro (A6000), A4000, L4000)#
Model Arch	Inference resolution	Precision	Inference Engine	Tracker	H100 GPU (FPS)	L40 GPU (FPS)	L4 GPU (FPS)	Quadro (A6000) GPU (FPS)	A4000 GPU (FPS)	L4000 GPU (FPS)
PeopleNet-ResNet34 (v2.3.4)	960x544	INT8	TRT	No Tracker	6831	4571	1580	2834	1308	1361
PeopleNet-ResNet34 (v2.3.4)	960x544	INT8	TRT	NvDCF Tracker	5842	3818	1370	1459	1268	1183
PeopleNet-ViT (deployable_v1.1)	960x544	INT8	TRT	No Tracker	495	362	119	227	106	102
PeopleNet-ViT (deployable_v1.1)	960x544	INT8	TRT	NvDCF Tracker	461	349	116	209	107	73
YOLOv8s	640x640	INT8	TRT	No Tracker	2993	2213	1123	1845	1175	957
YOLOv8s	640x640	INT8	TRT	NvDCF Tracker	2826	2248	984	607	1027	844
YOLOv9t	640x640	FP16	TRT	No Tracker	2688	2260	1019	1660	939	865
YOLOv9t	640x640	FP16	TRT	NvDCF Tracker	2541	1931	893	1587	852	764

Note

NA : Not available
TBU : To Be Updated
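One rough way to relate these aggregate FPS figures to a camera count is to divide by the per-stream frame rate. This sketch assumes the FPS values are total throughput summed across all streams. It also assumes that decode limits, batching, and input resolution do not change that ratio. Treat the result as an optimistic ceiling, not a measured stream count. The helper name `equivalent_streams` is hypothetical.

```python
def equivalent_streams(aggregate_fps: float, per_stream_fps: float = 30.0) -> int:
    """Whole number of per_stream_fps streams that aggregate_fps could feed (rough ceiling)."""
    if per_stream_fps <= 0:
        raise ValueError("per_stream_fps must be positive")
    return int(aggregate_fps // per_stream_fps)


# YOLOv8s, INT8, TRT, No Tracker: aggregate FPS copied from the dGPU tables above.
YOLOV8S_FPS = {"A100 PCIe": 2593, "A2": 486, "L4": 1123, "H100": 2993}

for gpu, fps in YOLOV8S_FPS.items():
    print(f"{gpu}: at most ~{equivalent_streams(fps)} streams at 30 fps")
```

The per-GPU sections below measure stream counts directly for a different pipeline (ResNet18 TrafficCamNet with secondary classifiers), so those numbers are not expected to match this estimate.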

DeepStream reference model and tracker#

The DeepStream SDK ships with a reference DetectNet_v2-ResNet10 model and three ResNet18 classifier models. Detailed instructions to run these models with DeepStream are provided in the next section. DeepStream provides four reference trackers: IOU, NvSORT, NvDeepSORT, and NvDCF. For more information about trackers, see the Gst-nvtracker section.

Configuration File Settings for Performance Measurement#

To achieve peak performance, make sure the devices are properly cooled. For Turing and Ampere GPUs, use a server that meets the thermal and airflow requirements. In addition to the hardware setup, a few options in the config file must be set to achieve the published performance. Make the required changes to one of the config files shipped with the DeepStream SDK to replicate the peak performance.

Turn off output rendering, OSD, and tiler

OSD (on-screen display) is used to draw bounding boxes, masks, and labels on the screen. If output rendering is disabled, drawing bounding boxes is not required unless the output needs to be streamed over RTSP or saved to disk. Tiler is used to display the output in an NxM tiled grid, and it is not needed if rendering is disabled. Output rendering, OSD, and tiler consume compute resources, so they can reduce inference performance.

To disable OSD, tiled display, and output rendering, make the following changes in the DeepStream config file:
To disable OSD, change enable to 0
[osd]
enable=0
To disable tiling, change enable to 0
[tiled-display]
enable=0
To turn off output rendering, change the sink type to fakesink (type=1):
[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=1
sync=0
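The three changes above can also be applied with a short script. This sketch uses Python's standard `configparser` module and assumes the deepstream-app config file parses as plain INI (group headers plus `key=value` lines). `configparser` drops comments when it writes the file back, so keep a copy of the original. The function name `apply_perf_settings` is hypothetical.

```python
import configparser
import io

# Changes described above: OSD off, tiler off, sink0 -> non-synced FakeSink.
PERF_OVERRIDES = {
    "osd": {"enable": "0"},
    "tiled-display": {"enable": "0"},
    "sink0": {"enable": "1", "type": "1", "sync": "0"},  # type 1 = FakeSink
}


def apply_perf_settings(config_text: str) -> str:
    """Return a copy of a deepstream-app config with rendering-related groups disabled."""
    # Assumption: the file parses as INI. '=' is the only delimiter, matching the
    # key=value lines DeepStream uses; interpolation is off so '%' is kept literally.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",), strict=False)
    parser.optionxform = str  # keep key names exactly as written (case-sensitive)
    parser.read_string(config_text)

    for section, values in PERF_OVERRIDES.items():
        if not parser.has_section(section):
            parser.add_section(section)
        for key, value in values.items():
            parser.set(section, key, value)

    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)  # emit key=value without spaces
    return out.getvalue()
```

Keys that the script does not touch, such as `rows` under `[tiled-display]`, are preserved.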

Use the max_perf setting for tracker

DeepStream SDK 6.2 introduces a new reference low-level tracker library, NvMultiObjectTracker, along with a set of configuration files:

config_tracker_IOU.yml

config_tracker_NvDCF_max_perf.yml

config_tracker_NvDCF_perf.yml

config_tracker_NvDCF_accuracy.yml

To achieve the peak performance shown in the tables above when using the NvDCF tracker, use the max_perf configuration. Also match the tracker's video frame resolution to that of the inference module. For example, if the inference module uses 480x272 resolution, use a similarly reduced resolution (e.g., 480x288) for the tracker module, as follows:

[tracker]
enable=1
tracker-width=480
tracker-height=288
ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
#ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_IOU.yml
ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_NvDCF_max_perf.yml
gpu-id=0
enable-batch-process=1
display-tracking-id=1
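The 480x272 to 480x288 example is consistent with rounding each inference dimension up to a multiple of 32. The sketch below assumes that rule when choosing `tracker-width` and `tracker-height`. The rule is an assumption, not a documented requirement, so check the Gst-nvtracker documentation for the constraints of your tracker configuration. The helper names are hypothetical.

```python
def round_up(value: int, multiple: int = 32) -> int:
    """Smallest multiple of `multiple` that is >= value."""
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    return -(-value // multiple) * multiple  # integer ceiling division


def tracker_dims(infer_width: int, infer_height: int, multiple: int = 32) -> tuple[int, int]:
    """Assumed rule: tracker size = inference size rounded up to a multiple of 32."""
    return round_up(infer_width, multiple), round_up(infer_height, multiple)


w, h = tracker_dims(480, 272)
print(f"tracker-width={w}")   # tracker-width=480
print(f"tracker-height={h}")  # tracker-height=288
```

The 960x544 tracker resolution used in the benchmark configurations below is already a multiple of 32 in both dimensions, so this rule leaves it unchanged.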

When the IOU tracker is used, the video frame resolution doesn’t matter, and the default config_tracker_IOU.yml can be used.

To use DLA on Jetson AGX Orin and Orin NX for performance measurement, refer to the Using DLA for inference section in the Quickstart Guide.

cudaDeviceScheduleBlockingSync flag is set by default on dGPU

On dGPU only, the cudaDeviceScheduleBlockingSync flag is set by default on the GPU where the DeepStream pipeline runs. In general, for pipelines with multiple streams, this reduces CPU utilization without significantly affecting performance.

When sub-batches are enabled in the tracker, setting the cudaDeviceScheduleBlockingSync flag significantly reduces CPU utilization, with little or no dip in performance.

When the environment variable NVDS_DISABLE_CUDADEV_BLOCKINGSYNC is set to 1, the cudaDeviceScheduleBlockingSync flag is not set.

In rare cases, setting the cudaDeviceScheduleBlockingSync flag might degrade performance when the pipeline already runs with GPU utilization close to 100%. Therefore, if a DeepStream pipeline is GPU-bound but GPU utilization does not reach close to 100%, try setting NVDS_DISABLE_CUDADEV_BLOCKINGSYNC to 1. Then check whether this improves the pipeline's performance.
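To check whether the blocking-sync flag matters for a given pipeline, run the same config with and without the environment variable and compare the reported FPS and GPU utilization. The sketch below only builds the environment and launches `deepstream-app -c <config>`. The helper names are hypothetical, and reading the FPS from the application's output is not shown.

```python
import os
import subprocess
from typing import Mapping, Optional

ENV_VAR = "NVDS_DISABLE_CUDADEV_BLOCKINGSYNC"


def pipeline_env(disable_blocking_sync: bool, base: Optional[Mapping[str, str]] = None) -> dict:
    """Copy of the environment with the blocking-sync override set to 1 or removed."""
    env = dict(os.environ if base is None else base)
    if disable_blocking_sync:
        env[ENV_VAR] = "1"  # cudaDeviceScheduleBlockingSync will not be set
    else:
        env.pop(ENV_VAR, None)  # default dGPU behavior: the flag is set
    return env


def run_deepstream_app(config_path: str, disable_blocking_sync: bool) -> int:
    """Run deepstream-app once with the chosen setting and return its exit code."""
    cmd = ["deepstream-app", "-c", config_path]
    result = subprocess.run(cmd, env=pipeline_env(disable_blocking_sync), check=False)
    return result.returncode
```

For example, call `run_deepstream_app(path, False)` and then `run_deepstream_app(path, True)` on the same config, and keep whichever setting gives better throughput for your workload.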

DeepStream reference model#

Data center GPU - GA100#

This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - GA100.

System Configuration#

The system configuration for the DeepStream SDK is listed below:

GA100 System configuration#

System Configuration	Specification
CPU	AMD EPYC 7742 @ 2.25 GHz, 3.4 GHz Turbo (Rome), HT Off
GPU	A100-PCIE-40GB (GA100), 1x 40537 MiB, 1x 108 SM
Ubuntu	Ubuntu 22.04
GPU Driver	535.161.08
CUDA	12.2
TensorRT	8.6.1.6
GPU clock frequency	1410 MHz

Application Configuration#

Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IoU tracker.
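The batch-size and tracker changes listed above can be applied with a short script. This sketch assumes the config file parses as INI with Python's `configparser`. The tracker library and IOU config paths are the ones shown in the [tracker] example earlier in this document. The 960x544 tracker resolution comes from the application configuration tables. Adding the N input sources themselves is not shown, and the function name is hypothetical.

```python
import configparser
import io

# Paths from the [tracker] example earlier in this document.
LL_LIB = "/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so"
IOU_CONFIG = "/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_IOU.yml"


def configure_for_streams(config_text: str, num_streams: int) -> str:
    """Set streammux/primary-gie batch sizes to num_streams and enable the IOU tracker."""
    if num_streams < 1:
        raise ValueError("num_streams must be at least 1")
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",), strict=False)
    parser.optionxform = str  # keep DeepStream key names exactly as written
    parser.read_string(config_text)

    overrides = {
        "streammux": {"batch-size": str(num_streams)},
        "primary-gie": {"batch-size": str(num_streams)},
        "tracker": {
            "enable": "1",
            "tracker-width": "960",   # processing resolution from the tables
            "tracker-height": "544",
            "ll-lib-file": LL_LIB,
            "ll-config-file": IOU_CONFIG,
        },
    }
    for section, values in overrides.items():
        if not parser.has_section(section):
            parser.add_section(section)
        for key, value in values.items():
            parser.set(section, key, value)

    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)  # comments are not preserved
    return out.getvalue()
```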

The application configuration for the DeepStream SDK is listed below:

GA100 application configuration#

Application Configuration	Specification
N×1080p 30 fps streams	sample_1080p_h265.mp4 (provided with the SDK) N=180; sample_1080p_h264.mp4 (provided with the SDK) N=93
Primary GIE	resnet18_trafficcamnet_pruned.onnx; Batch Size = N; Interval = 0
Tracker	Enabled. Processing at 960x544 resolution, IOU tracker enabled.
2 × Secondary GIEs	All batches are size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, ResNet18); Secondary_VehicleMake (224×224, ResNet18)
Tiled Display	Disabled
Rendering	Disabled

Achieved Performance

The table below shows the achieved performance of the DeepStream SDK under the specified system and application configuration:

Stream type	No. of Stream @ 30 FPS	CPU Utilization	GPU Utilization
H.265	180	11%	74.17%
H.264	93	2.57%	41.63%

Data center GPU - T4#

This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - T4.

System Configuration#

The system configuration for the DeepStream SDK is listed below:

T4 System configuration#

System Configuration	Specification
CPU	Dual Intel® Xeon® CPU E5-2650 v4 @ 2.20 GHz (48 threads total)
GPU	Tesla T4*
System Memory	360448 MB (22x 16384 MB) DDR4-2666, 2400 MHz
Ubuntu	Ubuntu 22.04
GPU Driver	535.161.08
CUDA	12.2
TensorRT	8.6.1.6
GPU clock frequency	1513 MHz

Application Configuration#

Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IoU tracker.

The application configuration for the DeepStream SDK is listed below:

T4 application configuration#

Application Configuration	Specification
N×1080p 30 fps streams	sample_1080p_h265.mp4 (provided with the SDK) N=45; sample_1080p_h264.mp4 (provided with the SDK) N=31
Primary GIE	resnet18_trafficcamnet_pruned.onnx; Batch Size = N; Interval = 0
Tracker	Enabled. Processing at 960x544 resolution, IOU tracker enabled.
2 × Secondary GIEs	All batches are size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, ResNet18); Secondary_VehicleMake (224×224, ResNet18)
Tiled Display	Disabled
Rendering	Disabled

Achieved Performance

The table below shows the achieved performance of the DeepStream SDK under the specified system and application configuration:

Stream type	No. of Stream @ 30 FPS	CPU Utilization	GPU Utilization
H.265	45	51.81%	100%
H.264	31	2.72%	61.23%

Data center GPU - A30#

This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - A30.

System Configuration#

The system configuration for the DeepStream SDK is listed below:

A30 System configuration#

System Configuration	Specification
CPU	AMD EPYC 7763 @ 2430 MHz
GPU	A30
Ubuntu	Ubuntu 22.04
GPU Driver	535.161.08
CUDA	12.2
TensorRT	8.6.1.6
GPU clock frequency	1440 MHz

Application Configuration#

Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IoU tracker.

The application configuration for the DeepStream SDK is listed below:

A30 application configuration#

Application Configuration	Specification
N×1080p 30 fps streams	sample_1080p_h265.mp4 (provided with the SDK) N=150; sample_1080p_h264.mp4 (provided with the SDK) N=98
Primary GIE	resnet18_trafficcamnet_pruned.onnx; Batch Size = N; Interval = 0
Tracker	Enabled. Processing at 960x544 resolution, IOU tracker enabled.
2 × Secondary GIEs	All batches are size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, ResNet18); Secondary_VehicleMake (224×224, ResNet18)
Tiled Display	Disabled
Rendering	Disabled

Achieved Performance

The table below shows the achieved performance of the DeepStream SDK under the specified system and application configuration:

Stream type	No. of Stream @ 30 FPS	CPU Utilization	GPU Utilization
H.265	150	41.87%	96.9%
H.264	98	5.62%	61.33%

Data center GPU - A2#

This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - A2.

System Configuration#

The system configuration for the DeepStream SDK is listed below:

A2 System configuration#

System Configuration	Specification
CPU	AMD EPYC 7763 @ 2430 MHz
GPU	A2
Ubuntu	Ubuntu 22.04
GPU Driver	535.161.08
CUDA	12.2
TensorRT	8.6.1.6
GPU clock frequency	1770 MHz

Application Configuration#

Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IoU tracker.

The application configuration for the DeepStream SDK is listed below:

A2 application configuration#

Application Configuration	Specification
N×1080p 30 fps streams	sample_1080p_h265.mp4 (provided with the SDK) N=31; sample_1080p_h264.mp4 (provided with the SDK) N=31
Primary GIE	resnet18_trafficcamnet_pruned.onnx; Batch Size = N; Interval = 0
Tracker	Enabled. Processing at 960x544 resolution, IOU tracker enabled.
2 × Secondary GIEs	All batches are size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, ResNet18); Secondary_VehicleMake (224×224, ResNet18)
Tiled Display	Disabled
Rendering	Disabled

Achieved Performance

The table below shows the achieved performance of the DeepStream SDK under the specified system and application configuration:

Stream type	No. of Stream @ 30 FPS	CPU Utilization	GPU Utilization
H.265	31	21.91%	100%
H.264	31	21.99%	100%

Data center GPU - A10#

This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - A10.

System Configuration#

The system configuration for the DeepStream SDK is listed below:

A10 System configuration#

System Configuration	Specification
CPU	AMD EPYC 7763 @ 2430 MHz
GPU	A10
Ubuntu	Ubuntu 22.04
GPU Driver	535.161.08
CUDA	12.2
TensorRT	8.6.1.6
GPU clock frequency	1695 MHz

Application Configuration#

Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IoU tracker.

The application configuration for the DeepStream SDK is listed below:

A10 application configuration#

Application Configuration	Specification
N×1080p 30 fps streams	sample_1080p_h265.mp4 (provided with the SDK) N=79; sample_1080p_h264.mp4 (provided with the SDK) N=43
Primary GIE	resnet18_trafficcamnet_pruned.onnx; Batch Size = N; Interval = 0
Tracker	Enabled. Processing at 960x544 resolution, IOU tracker enabled.
2 × Secondary GIEs	All batches are size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, ResNet18); Secondary_VehicleMake (224×224, ResNet18)
Tiled Display	Disabled
Rendering	Disabled

Achieved Performance

The table below shows the achieved performance of the DeepStream SDK under the specified system and application configuration:

Stream type	No. of Stream @ 30 FPS	CPU Utilization	GPU Utilization
H.265	79	3.26%	65.59%
H.264	43	1.4%	31.18%

Data center GPU - H100#

This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - H100.

System Configuration#

The system configuration for the DeepStream SDK is listed below:

H100 System configuration#

System Configuration	Specification
CPU	AMD EPYC 7763 @ 2430 MHz
GPU	H100
Ubuntu	Ubuntu 22.04
GPU Driver	535.161.08
CUDA	12.2
TensorRT	8.6.1.6

Application Configuration#

Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IoU tracker.

The application configuration for the DeepStream SDK is listed below:

H100 application configuration#

Application Configuration	Specification
N×1080p 30 fps streams	sample_1080p_h265.mp4 (provided with the SDK) N=229; sample_1080p_h264.mp4 (provided with the SDK) N=148
Primary GIE	resnet18_trafficcamnet_pruned.onnx; Batch Size = N; Interval = 0
Tracker	Enabled. Processing at 960x544 resolution, IOU tracker enabled.
2 × Secondary GIEs	All batches are size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, ResNet18); Secondary_VehicleMake (224×224, ResNet18)
Tiled Display	Disabled
Rendering	Disabled

Achieved Performance

The table below shows the achieved performance of the DeepStream SDK under the specified system and application configuration:

Stream type	No. of Stream @ 30 FPS	CPU Utilization	GPU Utilization
H.265	229	2.76%	90.1%
H.264	148	2.6%	42.32%

Data center GPU - L40#

This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - L40.

System Configuration#

The system configuration for the DeepStream SDK is listed below:

L40 System configuration#

System Configuration	Specification
CPU	AMD EPYC 7763 @ 2430 MHz
GPU	L40
Ubuntu	Ubuntu 22.04
GPU Driver	535.161.08
CUDA	12.2
TensorRT	8.6.1.6

Application Configuration#

Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IoU tracker.

The application configuration for the DeepStream SDK is listed below:

L40 application configuration#

Application Configuration	Specification
N×1080p 30 fps streams	sample_1080p_h265.mp4 (provided with the SDK) N=166; sample_1080p_h264.mp4 (provided with the SDK) N=75
Primary GIE	resnet18_trafficcamnet_pruned.onnx; Batch Size = N; Interval = 0
Tracker	Enabled. Processing at 960x544 resolution, IOU tracker enabled.
2 × Secondary GIEs	All batches are size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, ResNet18); Secondary_VehicleMake (224×224, ResNet18)
Tiled Display	Disabled
Rendering	Disabled

Achieved Performance

The table below shows the achieved performance of the DeepStream SDK under the specified system and application configuration:

Stream type	No. of Stream @ 30 FPS	CPU Utilization	GPU Utilization
H.265	166	12.65%	71.63%
H.264	75	1.89%	34.57%

Data center GPU - L4#

This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - L4.

System Configuration#

The system configuration for the DeepStream SDK is listed below:

L4 System configuration#

System Configuration	Specification
CPU	AMD EPYC 7763 @ 2430 MHz
GPU	L4
Ubuntu	Ubuntu 22.04
GPU Driver	535.161.08
CUDA	12.2
TensorRT	8.6.1.6

Application Configuration#

Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IoU tracker.

The application configuration for the DeepStream SDK is listed below:

L4 application configuration#

Application Configuration	Specification
N×1080p 30 fps streams	sample_1080p_h265.mp4 (provided with the SDK) N=81; sample_1080p_h264.mp4 (provided with the SDK) N=68
Primary GIE	resnet18_trafficcamnet_pruned.onnx; Batch Size = N; Interval = 0
Tracker	Enabled. Processing at 960x544 resolution, IOU tracker enabled.
2 × Secondary GIEs	All batches are size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, ResNet18); Secondary_VehicleMake (224×224, ResNet18)
Tiled Display	Disabled
Rendering	Disabled

Achieved Performance

The table below shows the achieved performance of the DeepStream SDK under the specified system and application configuration:

Stream type	No. of Stream @ 30 FPS	CPU Utilization	GPU Utilization
H.265	81	46.1%	100%
H.264	68	8.06%	75.74%

Data center GPU - Quadro (A6000)#

This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - Quadro (A6000).

System Configuration#

The system configuration for the DeepStream SDK is listed below:

Quadro (A6000) System configuration#

System Configuration	Specification
CPU	AMD EPYC 7763 @ 2430 MHz
GPU	Quadro (A6000)
Ubuntu	Ubuntu 22.04
GPU Driver	535.161.08
CUDA	12.2
TensorRT	8.6.1.6

Application Configuration#

Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IoU tracker.

The application configuration for the DeepStream SDK is listed below:

Quadro (A6000) application configuration#

Application Configuration	Specification
N×1080p 30 fps streams	sample_1080p_h265.mp4 (provided with the SDK) N=101; sample_1080p_h264.mp4 (provided with the SDK) N=49
Primary GIE	resnet18_trafficcamnet_pruned.onnx; Batch Size = N; Interval = 0
Tracker	Enabled. Processing at 960x544 resolution, IOU tracker enabled.
2 × Secondary GIEs	All batches are size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, ResNet18); Secondary_VehicleMake (224×224, ResNet18)
Tiled Display	Disabled
Rendering	Disabled

Achieved Performance

The table below shows the achieved performance of the DeepStream SDK under the specified system and application configuration:

Stream type	No. of Stream @ 30 FPS	CPU Utilization	GPU Utilization
H.265	101	7.05%	60.17%
H.264	49	2.68%	28.57%

Data center GPU - A4000#

This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - A4000.

System Configuration#

The system configuration for the DeepStream SDK is listed below:

A4000 System configuration#

System Configuration	Specification
CPU	AMD EPYC 7763 @ 2430 MHz
GPU	A4000
Ubuntu	Ubuntu 22.04
GPU Driver	535.161.08
CUDA	12.2
TensorRT	8.6.1.6

Application Configuration#

Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IoU tracker.

The application configuration for the DeepStream SDK is listed below:

A4000 application configuration#

Application Configuration	Specification
N×1080p 30 fps streams	sample_1080p_h265.mp4 (provided with the SDK) N=49; sample_1080p_h264.mp4 (provided with the SDK) N=24
Primary GIE	resnet18_trafficcamnet_pruned.onnx; Batch Size = N; Interval = 0
Tracker	Enabled. Processing at 960x544 resolution, IOU tracker enabled.
2 × Secondary GIEs	All batches are size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, ResNet18); Secondary_VehicleMake (224×224, ResNet18)
Tiled Display	Disabled
Rendering	Disabled

Achieved Performance

The table below shows the achieved performance of the DeepStream SDK under the specified system and application configuration:

Stream type	No. of Stream @ 30 FPS	CPU Utilization	GPU Utilization
H.265	49	0.97%	49.87%
H.264	24	0.48%	24.56%

Data center GPU - L4000#

This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - L4000.

System Configuration#

The system configuration for the DeepStream SDK is listed below:

L4000 System configuration#

System Configuration	Specification
CPU	AMD EPYC 7763 @ 2430 MHz
GPU	L4000
Ubuntu	Ubuntu 22.04
GPU Driver	535.161.08
CUDA	12.2
TensorRT	8.6.1.6

Application Configuration#

Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IoU tracker.

The application configuration for the DeepStream SDK is listed below:

L4000 application configuration#

Application Configuration	Specification
N×1080p 30 fps streams	sample_1080p_h265.mp4 (provided with the SDK) N=76; sample_1080p_h264.mp4 (provided with the SDK) N=45
Primary GIE	resnet18_trafficcamnet_pruned.onnx; Batch Size = N; Interval = 0
Tracker	Enabled. Processing at 960x544 resolution, IOU tracker enabled.
2 × Secondary GIEs	All batches are size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, ResNet18); Secondary_VehicleMake (224×224, ResNet18)
Tiled Display	Disabled
Rendering	Disabled

Achieved Performance

The table below shows the achieved performance of the DeepStream SDK under the specified system and application configuration:

Stream type	No. of Stream @ 30 FPS	CPU Utilization	GPU Utilization
H.265	76	20%	99.25%
H.264	45	0.96%	53.02%

Jetson#

This section describes configuration and settings for the DeepStream SDK on NVIDIA Jetson™ platforms. JetPack 6.0 GA is used for software installation.

System Configuration#

For the performance test:

Max power mode is enabled: $ sudo nvpmodel -m 0
The GPU clocks are set to maximum: $ sudo jetson_clocks
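When benchmarking is scripted, the two commands above can be issued from the same script that launches the pipeline. This is a minimal sketch that runs exactly those commands in order and defaults to a dry run that only lists them. The function and constant names are hypothetical, and the real run requires sudo on the Jetson device.

```python
import subprocess

# Commands from the list above: max power mode, then maximum clocks.
MAX_PERF_COMMANDS = [
    ["sudo", "nvpmodel", "-m", "0"],
    ["sudo", "jetson_clocks"],
]


def apply_max_perf(dry_run: bool = True) -> list[str]:
    """Run (or, with dry_run=True, only list) the max-performance commands in order."""
    issued = []
    for cmd in MAX_PERF_COMMANDS:
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
        issued.append(" ".join(cmd))
    return issued


for line in apply_max_perf(dry_run=True):
    print(line)
```

Running `sudo nvpmodel -q` afterwards reports the active power mode, which is a quick way to confirm the setting before a measurement.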

For information about supported power modes, see the “Supported Modes and Power Efficiency” section in the power management topics of NVIDIA Tegra Linux Driver Package Development Guide, e.g., “Power Management for Jetson AGX Orin Devices.”

Jetson AGX Orin#

Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IOU tracker.

The following tables describe performance results for the NVIDIA Jetson AGX Orin™.

Jetson AGX Orin Pipeline Configuration (`deepstream-app`)#
Application Configuration	Specification
N×1080p 30 fps streams	`sample_1080p_h265.mp4` (provided with the SDK) N=37 `sample_1080p_h264.mp4` (provided with the SDK) N=15
Primary GIE	resnet18_trafficcamnet_pruned.onnx Batch Size = N Interval = 0
Tracker	Enabled; processing at 960x544 resolution, IOU tracker enabled.
2× secondary GIEs	All batches are size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224—Resnet18) Secondary_VehicleMake (224×224—Resnet18)
OSD/tiled display	Disabled
Renderer	Disabled

Achieved Performance

Stream type	No. of Stream @ 30 FPS	CPU Utilization	GPU Utilization
H.265	37	21.25%	82.30%
H.264	15	9.49%	36.42%

Jetson Orin NX#

Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IOU tracker.

The following tables describe performance results for the NVIDIA Jetson Orin NX™.

Jetson Orin NX Pipeline Configuration (`deepstream-app`)#
Application Configuration	Specification
N×1080p 30 fps streams	`sample_1080p_h265.mp4` (provided with the SDK) N=16 `sample_1080p_h264.mp4` (provided with the SDK) N=13
Primary GIE	resnet18_trafficcamnet_pruned.onnx Batch Size = N Interval = 0
Tracker	Enabled; processing at 960x544 resolution, IOU tracker enabled.
2× secondary GIEs	All batches are size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224—Resnet18) Secondary_VehicleMake (224×224—Resnet18)
OSD/tiled display	Disabled
Renderer	Disabled

Achieved Performance

Stream type	No. of Stream @ 30 FPS	CPU Utilization	GPU Utilization
H.265	16	19.26%	99%
H.264	13	15.22%	78.52%

Jetson Orin Nano#

Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

Change the following items in the config file:

Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IOU tracker.

The following tables describe performance results for the NVIDIA Jetson Orin Nano™.

Jetson Orin Nano Pipeline Configuration (`deepstream-app`)#
Application Configuration	Specification
N×1080p 30 fps streams	`sample_1080p_h265.mp4` (provided with the SDK) N=13 `sample_1080p_h264.mp4` (provided with the SDK) N=8
Primary GIE	resnet18_trafficcamnet_pruned.onnx Batch Size = N Interval = 0
Tracker	Enabled; processing at 960x544 resolution, IOU tracker enabled.
2× secondary GIEs	All batches are size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224—Resnet18) Secondary_VehicleMake (224×224—Resnet18)
OSD/tiled display	Disabled
Renderer	Disabled

Achieved Performance

Stream type	No. of Stream @ 30 FPS	CPU Utilization	GPU Utilization
H.265	13	20.65%	99%
H.264	8	12.49%	60.15%