Performance
The DeepStream application is benchmarked across various NVIDIA TAO Toolkit and open source models. The measured performance is the end-to-end performance of the entire video analytics application, including video capture and decode, pre-processing, batching, inference, and post-processing to generate metadata. Output rendering is turned off to achieve peak inference performance. For information on disabling output rendering, see the DeepStream Reference Application - deepstream-app chapter.
To run a higher number of streams (200+) on Hopper, Ampere, and Ada GPUs, follow the instructions below:
$ sudo service display-manager stop
#Make sure no process is running on the GPU, e.g. Xorg or Triton server
$ sudo pkill -9 Xorg
#Remove kernel modules
$ sudo rmmod nvidia_drm nvidia_modeset nvidia
#Load Modules with Regkeys
$ sudo modprobe nvidia NVreg_RegistryDwords="RMDebugOverridePerRunlistChannelRam = 1;RMIncreaseRsvdMemorySizeMB = 1024;RMDisableChIdIsolation = 0x1;RmGspFirmwareHeapSizeMB = 256"
$ sudo service display-manager start
TAO Pre-trained models
TAO Toolkit has a set of pretrained models listed in the table below. If these models satisfy your requirements, you should start with one of them. They can be used for various applications in smart cities or smart places. If your application is beyond the scope of these models, you may re-train one of the popular model architectures using TAO Toolkit. The table below shows the end-to-end performance of highly accurate pre-trained models from TAO Toolkit. All models are available on NGC. These models are natively integrated with DeepStream, and the instructions to run them are in /opt/nvidia/deepstream/deepstream-6.4/samples/configs/tao_pretrained_models/. The following numbers are obtained with sample_1080p_h265.mp4.
Model Arch | Inference resolution | Precision | Jetson AGX Orin GPU (FPS) | Jetson AGX Orin DLA1/DLA2 (FPS) | Jetson Orin NX GPU (FPS) | Jetson Orin NX DLA1/DLA2 (FPS) | Jetson Orin Nano GPU (FPS) |
---|---|---|---|---|---|---|---|
 | 960x544 | INT8 | 970 | 329 | 372 | 175 | 256 |
TrafficCamNet – ResNet18 License Plate Detection License Plate Recognition | 960x544 640x480 96x48 | INT8 | 370 | NA | 180 | NA | 120 |
 | 960x544 | INT8 | 1105 | 512 | 590 | 283 | 419 |
 | 960x544 | INT8 | 1107 | 516 | 574 | 271 | 406 |
 | 384x240 | INT8 | 1112 | 554 | 963 | 481 | 591 |
 | 224x224x32 | FP16 | 147 | NA | 51 | NA | 34 |
All the models in the table above can run solely on DLA. This saves valuable GPU resources to run more complex models.
Note
Running inference simultaneously on multiple models is not supported on the DLA. You can only run one model at a time on the DLA.
NA : Not available for Jetson
NA* : For these models DLA falls back to GPU
Model Arch | Inference resolution | Precision | Inference Engine | T4 GPU (FPS) | A100 PCIe GPU (FPS) | A30 GPU (FPS) | A2 GPU (FPS) | A10 GPU (FPS) |
---|---|---|---|---|---|---|---|---|
 | 960x544 | INT8 | TRT | 912 | 4952 | 3273 | 610 | 2059 |
 | 960x544 | INT8 | Triton | 797 | 4214 | 2730 | 522 | 2081 |
 | 960x544 | INT8 | Triton gRPC | 826 | 3161 | 2281 | 517 | 1929 |
TrafficCamNet – ResNet18 License Plate Detection License Plate Recognition | 960x544 640x480 96x48 | INT8 | TRT | 382 | 2150 | 1327 | 253 | 1071 |
 | 960x544 | INT8 | TRT | 1296 | 5292 | 4483 | 968 | 2388 |
 | 960x544 | INT8 | TRT | 1358 | 5322 | 4391 | 903 | 2359 |
 | 384x240 | INT8 | TRT | 2458 | 5637 | 5656 | 3141 | 3112 |
 | 224x224x32 | FP16 | TRT | 173 | 996 | 552 | 74 | 450 |
Model Arch | Inference resolution | Precision | Inference Engine | H100 GPU (FPS) | L40 GPU (FPS) | L4 GPU (FPS) | Quadro (A6000) GPU (FPS) | A4000 GPU (FPS) | L4000 GPU (FPS) |
---|---|---|---|---|---|---|---|---|---|
 | 960x544 | INT8 | TRT | 6920 | 4443 | 1674 | 2787 | 1282 | 1512 |
 | 960x544 | INT8 | Triton | 6150 | 4080 | 1506 | 2833 | 1278 | 1362 |
 | 960x544 | INT8 | Triton gRPC | 4822 | 3560 | 1451 | 2466 | 1284 | 1301 |
TrafficCamNet – ResNet18 License Plate Detection License Plate Recognition | 960x544 640x480 96x48 | INT8 | TRT | 2801 | 2280 | 741 | 1404 | 788 | 670 |
 | 960x544 | INT8 | TRT | 8259 | 5176 | 2485 | 3092 | 1433 | 2249 |
 | 960x544 | INT8 | TRT | 8311 | 5235 | 2527 | 3071 | 1433 | 2260 |
 | 384x240 | INT8 | TRT | 8372 | 5821 | 5775 | 3464 | 1746 | 3611 |
 | 224x224x32 | FP16 | TRT | 1270 | 870 | 313 | 638 | 319 | 300 |
Note
NA : Not available
DeepStream reference model and tracker
DeepStream SDK ships with a reference DetectNet_v2-ResNet10 model and three ResNet18 classifier models. The detailed instructions to run these models with DeepStream are provided in the next section. DeepStream provides four reference trackers: IOU, NvSORT, NvDeepSORT, and NvDCF. For more information about trackers, see the Gst-nvtracker section.
Configuration File Settings for Performance Measurement
To achieve peak performance, make sure the devices are properly cooled. For Turing and Ampere GPUs, make sure you use a server that meets the thermal and airflow requirements. Along with the hardware setup, a few other options in the config file need to be set to achieve the published performance. Make the required changes to one of the config files from DeepStream SDK to replicate the peak performance.
Turn off output rendering, OSD, and tiler
OSD (on-screen display) is used to display bounding boxes, masks, and labels on the screen. If output rendering is disabled, drawing bounding boxes is not required unless the output needs to be streamed over RTSP or saved to disk. The tiler is used to display the output in an NxM tiled grid. It is not needed if rendering is disabled. Output rendering, OSD, and tiler each use some percentage of compute resources, so they can reduce inference performance.
To disable OSD, tiled display and output sink, make the following changes in the DeepStream config file.
To disable OSD, change enable to 0:
[osd]
enable=0
To disable tiling, change enable to 0:
[tiled-display]
enable=0
To turn off output rendering, change the sink to fakesink:
[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=1
sync=0
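The three changes above can also be applied with a script. The following is a minimal sketch, assuming the config file's [group] / key=value layout can be read by Python's standard configparser module. The function name disable_rendering is hypothetical and not part of the DeepStream SDK.

```python
import configparser
import io


def disable_rendering(config_text: str) -> str:
    """Return config text with OSD and tiler disabled and sink0 set to fakesink.

    Hypothetical helper for illustration; not part of the DeepStream SDK.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names exactly as written
    parser.read_string(config_text)

    # [osd] and [tiled-display]: enable=0
    for section in ("osd", "tiled-display"):
        if not parser.has_section(section):
            parser.add_section(section)
        parser.set(section, "enable", "0")

    # [sink0]: keep the sink enabled but make it a fakesink (type=1)
    if not parser.has_section("sink0"):
        parser.add_section("sink0")
    parser.set("sink0", "enable", "1")
    parser.set("sink0", "type", "1")
    parser.set("sink0", "sync", "0")

    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()
```

Note that configparser discards comments when it writes the file back, so keep a copy of the original config if the comments matter.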
Use the max_perf setting for tracker
DeepStream SDK 6.2 introduces a new reference low-level tracker library, NvMultiObjectTracker, along with a set of configuration files:
config_tracker_IOU.yml
config_tracker_NvDCF_max_perf.yml
config_tracker_NvDCF_perf.yml
config_tracker_NvDCF_accuracy.yml
To achieve the peak performance shown in the table above when using the NvDCF tracker, make sure the max_perf configuration is used and that the tracker's video frame resolution matches that of the inference module. If the inference module uses 480x272 resolution, for example, it is recommended to use a similarly reduced resolution (e.g., 480x288) for the tracker module, as follows:
[tracker]
enable=1
tracker-width=480
tracker-height=288
ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
#ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_IOU.yml
ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_NvDCF_max_perf.yml
gpu-id=0
enable-batch-process=1
display-tracking-id=1
When the IOU tracker is used, the video frame resolution doesn’t matter, and the default config_tracker_IOU.yml can be used.
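The 480x272 to 480x288 example above is consistent with rounding each inference dimension up to the next multiple of 32. The sketch below assumes that alignment rule; the multiple-of-32 constraint and the helper names are assumptions for illustration, so check the Gst-nvtracker section for the actual requirement on tracker-width and tracker-height.

```python
def round_up(value: int, multiple: int = 32) -> int:
    """Round value up to the nearest multiple (assumed alignment: 32)."""
    return ((value + multiple - 1) // multiple) * multiple


def tracker_resolution(infer_width: int, infer_height: int, multiple: int = 32):
    """Return a (tracker-width, tracker-height) pair close to the inference size.

    Hypothetical helper; the alignment rule is an assumption, not SDK behavior.
    """
    return round_up(infer_width, multiple), round_up(infer_height, multiple)


if __name__ == "__main__":
    width, height = tracker_resolution(480, 272)
    print(f"tracker-width={width}")
    print(f"tracker-height={height}")
```

For a 480x272 inference resolution this yields 480x288, matching the [tracker] settings shown above. A 960x544 inference resolution is already aligned and is returned unchanged.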
To use DLA on Jetson AGX Orin and Orin NX for performance measurement, refer to the Using DLA for inference section in the Quickstart Guide.
cudaDeviceScheduleBlockingSync flag is set by default on dGPU
On dGPU only, the cudaDeviceScheduleBlockingSync flag is set by default on the GPU where the DeepStream pipeline runs. In general, for pipelines with multiple streams, this reduces CPU utilization without significantly affecting performance.
When sub-batches are enabled in the tracker, setting the cudaDeviceScheduleBlockingSync flag significantly reduces CPU utilization, with similar performance or a negligible dip.
When the environment variable NVDS_DISABLE_CUDADEV_BLOCKINGSYNC is set to 1, the cudaDeviceScheduleBlockingSync flag is not set.
There is a remote possibility that setting the cudaDeviceScheduleBlockingSync flag negatively affects pipeline performance when the pipeline already runs with GPU utilization close to 100%. Hence, if a DeepStream pipeline is GPU bound but GPU utilization does not reach close to 100%, you may experiment with setting NVDS_DISABLE_CUDADEV_BLOCKINGSYNC to 1 and check whether it improves pipeline performance.
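One way to run this experiment is to launch the same pipeline twice, once with the variable set and once without, and compare the reported throughput. The following is a minimal sketch that assumes deepstream-app is on the PATH and is started with its -c option; the helper names blocking_sync_env and run_pipeline are hypothetical.

```python
import os
import subprocess


def blocking_sync_env(disable: bool, base_env=None) -> dict:
    """Build an environment with NVDS_DISABLE_CUDADEV_BLOCKINGSYNC set or removed."""
    env = dict(os.environ if base_env is None else base_env)
    if disable:
        # 1 = do not set the cudaDeviceScheduleBlockingSync flag
        env["NVDS_DISABLE_CUDADEV_BLOCKINGSYNC"] = "1"
    else:
        # Variable absent = default behavior (flag is set on dGPU)
        env.pop("NVDS_DISABLE_CUDADEV_BLOCKINGSYNC", None)
    return env


def run_pipeline(config_path: str, disable_blocking_sync: bool) -> int:
    """Run deepstream-app with the given config and return its exit code."""
    cmd = ["deepstream-app", "-c", config_path]
    env = blocking_sync_env(disable_blocking_sync)
    return subprocess.run(cmd, env=env).returncode
```

Calling run_pipeline(path, False) and then run_pipeline(path, True) with the same config gives the two runs to compare.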
DeepStream reference model
Data center GPU - GA100
This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - GA100.
System Configuration
The system configuration for the DeepStream SDK is listed below:
System Configuration | Specification |
---|---|
CPU | AMD EPYC 7742 @ 2.25GHz 3.4GHz Turbo (Rome) HT Off |
GPU | A100-PCIE-40GB(GA100) 1*40537 MiB 1*108 SM |
Ubuntu | Ubuntu 22.04 |
GPU Driver | 535.161.08 |
CUDA | 12.2 |
TensorRT | 8.6.1.6 |
GPU clock frequency | 1410 MHz |
Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt
Change the following items in the config file:
Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IoU tracker.
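The batch-size change can be scripted when sweeping over stream counts. The following is a minimal sketch, assuming the config has [streammux] and [primary-gie] groups with a batch-size key and can be read by Python's configparser; set_batch_size is a hypothetical helper, not an SDK function.

```python
import configparser
import io


def set_batch_size(config_text: str, num_streams: int) -> str:
    """Set batch-size under [streammux] and [primary-gie] to num_streams.

    Hypothetical helper for illustration; not part of the DeepStream SDK.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names exactly as written
    parser.read_string(config_text)

    for section in ("streammux", "primary-gie"):
        if not parser.has_section(section):
            raise KeyError(f"missing [{section}] group in config")
        parser.set(section, "batch-size", str(num_streams))

    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()
```

For example, set_batch_size(text, 180) prepares the config for the 180-stream H.265 run described below.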
The application configuration for the DeepStream SDK is listed below:
Application Configuration | Specification |
---|---|
N×1080p 30 fps stream | sample_1080p_h265.mp4 (provided with the SDK) N=180; sample_1080p_h264.mp4 (provided with the SDK) N=93 |
Primary GIE | resnet18_trafficcamnet.etlt; Batch Size = N; Interval=0 |
Tracker | Enabled. Processing at 960x544 resolution, IOU tracker enabled. |
2 × Secondary GIEs | All batches size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, ResNet18); Secondary_VehicleMake (224×224, ResNet18) |
Tiled Display | Disabled |
Rendering | Disabled |
Achieved Performance
The table below shows the achieved performance of the DeepStream SDK under the specified system and application configuration:
Stream type | No. of Stream @ 30 FPS | CPU Utilization | GPU Utilization |
---|---|---|---|
H.265 | 180 | 11% | 74.17% |
H.264 | 93 | 2.57% | 41.63% |
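Because every stream in these tables runs at 30 fps, the aggregate frame rate sustained by the pipeline is the stream count multiplied by 30. The short sketch below works through that arithmetic for the A100 figures above; the function name is illustrative only.

```python
STREAM_FPS = 30  # each input stream is 1080p at 30 fps


def aggregate_fps(num_streams: int, stream_fps: int = STREAM_FPS) -> int:
    """Total frames per second processed across all streams."""
    return num_streams * stream_fps


if __name__ == "__main__":
    # Stream counts taken from the A100 table above
    for codec, streams in (("H.265", 180), ("H.264", 93)):
        print(f"{codec}: {streams} streams -> {aggregate_fps(streams)} frames/s")
```

For the A100 this gives 5400 frames/s for H.265 and 2790 frames/s for H.264. The same calculation applies to the stream counts reported for the other platforms.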
Data center GPU - T4
This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - T4.
System Configuration
The system configuration for the DeepStream SDK is listed below:
System Configuration | Specification |
---|---|
CPU | Dual Intel® Xeon® CPU E5-2650 v4 @ 2.20GHz (48 threads total) |
GPU | Tesla T4* |
System Memory | 360448 Mb (22x16384) DDR4 2666, 2400 MHz |
Ubuntu | Ubuntu 22.04 |
GPU Driver | 535.161.08 |
CUDA | 12.2 |
TensorRT | 8.6.1.6 |
GPU clock frequency | 1513 MHz |
Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt
Change the following items in the config file:
Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IoU tracker.
The application configuration for the DeepStream SDK is listed below:
Application Configuration | Specification |
---|---|
N×1080p 30 fps stream | sample_1080p_h265.mp4 (provided with the SDK) N=45; sample_1080p_h264.mp4 (provided with the SDK) N=31 |
Primary GIE | resnet18_trafficcamnet.etlt; Batch Size = N; Interval=0 |
Tracker | Enabled. Processing at 960x544 resolution, IOU tracker enabled. |
2 × Secondary GIEs | All batches size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, ResNet18); Secondary_VehicleMake (224×224, ResNet18) |
Tiled Display | Disabled |
Rendering | Disabled |
Achieved Performance
The table below shows the achieved performance of the DeepStream SDK under the specified system and application configuration:
Stream type | No. of Stream @ 30 FPS | CPU Utilization | GPU Utilization |
---|---|---|---|
H.265 | 45 | 51.81% | 100% |
H.264 | 31 | 2.72% | 61.23% |
Data center GPU - A30
This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - A30.
System Configuration
The system configuration for the DeepStream SDK is listed below:
System Configuration | Specification |
---|---|
CPU | AMD EPYC 7763 @ 2430 MHz |
GPU | A30 |
Ubuntu | Ubuntu 22.04 |
GPU Driver | 535.161.08 |
CUDA | 12.2 |
TensorRT | 8.6.1.6 |
GPU clock frequency | 1440 MHz |
Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt
Change the following items in the config file:
Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IoU tracker.
The application configuration for the DeepStream SDK is listed below:
Application Configuration | Specification |
---|---|
N×1080p 30 fps stream | sample_1080p_h265.mp4 (provided with the SDK) N=150; sample_1080p_h264.mp4 (provided with the SDK) N=98 |
Primary GIE | resnet18_trafficcamnet.etlt; Batch Size = N; Interval=0 |
Tracker | Enabled. Processing at 960x544 resolution, IOU tracker enabled. |
2 × Secondary GIEs | All batches size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, ResNet18); Secondary_VehicleMake (224×224, ResNet18) |
Tiled Display | Disabled |
Rendering | Disabled |
Achieved Performance
The table below shows the achieved performance of the DeepStream SDK under the specified system and application configuration:
Stream type | No. of Stream @ 30 FPS | CPU Utilization | GPU Utilization |
---|---|---|---|
H.265 | 150 | 41.87% | 96.9% |
H.264 | 98 | 5.62% | 61.33% |
Data center GPU - A2
This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - A2.
System Configuration
The system configuration for the DeepStream SDK is listed below:
System Configuration | Specification |
---|---|
CPU | AMD EPYC 7763 @ 2430 MHz |
GPU | A2 |
Ubuntu | Ubuntu 22.04 |
GPU Driver | 535.161.08 |
CUDA | 12.2 |
TensorRT | 8.6.1.6 |
GPU clock frequency | 1770 MHz |
Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt
Change the following items in the config file:
Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IoU tracker.
The application configuration for the DeepStream SDK is listed below:
Application Configuration | Specification |
---|---|
N×1080p 30 fps stream | sample_1080p_h265.mp4 (provided with the SDK) N=31; sample_1080p_h264.mp4 (provided with the SDK) N=31 |
Primary GIE | resnet18_trafficcamnet.etlt; Batch Size = N; Interval=0 |
Tracker | Enabled. Processing at 960x544 resolution, IOU tracker enabled. |
2 × Secondary GIEs | All batches size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, ResNet18); Secondary_VehicleMake (224×224, ResNet18) |
Tiled Display | Disabled |
Rendering | Disabled |
Achieved Performance
The table below shows the achieved performance of the DeepStream SDK under the specified system and application configuration:
Stream type | No. of Stream @ 30 FPS | CPU Utilization | GPU Utilization |
---|---|---|---|
H.265 | 31 | 21.91% | 100% |
H.264 | 31 | 21.99% | 100% |
Data center GPU - A10
This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - A10.
System Configuration
The system configuration for the DeepStream SDK is listed below:
System Configuration | Specification |
---|---|
CPU | AMD EPYC 7763 @ 2430 MHz |
GPU | A10 |
Ubuntu | Ubuntu 22.04 |
GPU Driver | 535.161.08 |
CUDA | 12.2 |
TensorRT | 8.6.1.6 |
GPU clock frequency | 1695 MHz |
Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt
Change the following items in the config file:
Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IoU tracker.
The application configuration for the DeepStream SDK is listed below:
Application Configuration | Specification |
---|---|
N×1080p 30 fps stream | sample_1080p_h265.mp4 (provided with the SDK) N=79; sample_1080p_h264.mp4 (provided with the SDK) N=43 |
Primary GIE | resnet18_trafficcamnet.etlt; Batch Size = N; Interval=0 |
Tracker | Enabled. Processing at 960x544 resolution, IOU tracker enabled. |
2 × Secondary GIEs | All batches size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, ResNet18); Secondary_VehicleMake (224×224, ResNet18) |
Tiled Display | Disabled |
Rendering | Disabled |
Achieved Performance
The table below shows the achieved performance of the DeepStream SDK under the specified system and application configuration:
Stream type | No. of Stream @ 30 FPS | CPU Utilization | GPU Utilization |
---|---|---|---|
H.265 | 79 | 3.26% | 65.59% |
H.264 | 43 | 1.4% | 31.18% |
Data center GPU - H100
This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - H100.
System Configuration
The system configuration for the DeepStream SDK is listed below:
System Configuration | Specification |
---|---|
CPU | AMD EPYC 7763 @ 2430 MHz |
GPU | H100 |
Ubuntu | Ubuntu 22.04 |
GPU Driver | 535.161.08 |
CUDA | 12.2 |
TensorRT | 8.6.1.6 |
Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt
Change the following items in the config file:
Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IoU tracker.
The application configuration for the DeepStream SDK is listed below:
Application Configuration | Specification |
---|---|
N×1080p 30 fps stream | sample_1080p_h265.mp4 (provided with the SDK) N=229; sample_1080p_h264.mp4 (provided with the SDK) N=148 |
Primary GIE | resnet18_trafficcamnet.etlt; Batch Size = N; Interval=0 |
Tracker | Enabled. Processing at 960x544 resolution, IOU tracker enabled. |
2 × Secondary GIEs | All batches size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, ResNet18); Secondary_VehicleMake (224×224, ResNet18) |
Tiled Display | Disabled |
Rendering | Disabled |
Achieved Performance
The table below shows the achieved performance of the DeepStream SDK under the specified system and application configuration:
Stream type | No. of Stream @ 30 FPS | CPU Utilization | GPU Utilization |
---|---|---|---|
H.265 | 229 | 2.76% | 90.1% |
H.264 | 148 | 2.6% | 42.32% |
Data center GPU - L40
This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - L40.
System Configuration
The system configuration for the DeepStream SDK is listed below:
System Configuration | Specification |
---|---|
CPU | AMD EPYC 7763 @ 2430 MHz |
GPU | L40 |
Ubuntu | Ubuntu 22.04 |
GPU Driver | 535.161.08 |
CUDA | 12.2 |
TensorRT | 8.6.1.6 |
Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt
Change the following items in the config file:
Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IoU tracker.
The application configuration for the DeepStream SDK is listed below:
Application Configuration | Specification |
---|---|
N×1080p 30 fps stream | sample_1080p_h265.mp4 (provided with the SDK) N=166; sample_1080p_h264.mp4 (provided with the SDK) N=75 |
Primary GIE | resnet18_trafficcamnet.etlt; Batch Size = N; Interval=0 |
Tracker | Enabled. Processing at 960x544 resolution, IOU tracker enabled. |
2 × Secondary GIEs | All batches size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, ResNet18); Secondary_VehicleMake (224×224, ResNet18) |
Tiled Display | Disabled |
Rendering | Disabled |
Achieved Performance
The table below shows the achieved performance of the DeepStream SDK under the specified system and application configuration:
Stream type | No. of Stream @ 30 FPS | CPU Utilization | GPU Utilization |
---|---|---|---|
H.265 | 166 | 12.65% | 71.63% |
H.264 | 75 | 1.89% | 34.57% |
Data center GPU - L4
This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - L4.
System Configuration
The system configuration for the DeepStream SDK is listed below:
System Configuration | Specification |
---|---|
CPU | AMD EPYC 7763 @ 2430 MHz |
GPU | L4 |
Ubuntu | Ubuntu 22.04 |
GPU Driver | 535.161.08 |
CUDA | 12.2 |
TensorRT | 8.6.1.6 |
Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt
Change the following items in the config file:
Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IoU tracker.
The application configuration for the DeepStream SDK is listed below:
Application Configuration | Specification |
---|---|
N×1080p 30 fps stream | sample_1080p_h265.mp4 (provided with the SDK) N=81; sample_1080p_h264.mp4 (provided with the SDK) N=68 |
Primary GIE | resnet18_trafficcamnet.etlt; Batch Size = N; Interval=0 |
Tracker | Enabled. Processing at 960x544 resolution, IOU tracker enabled. |
2 × Secondary GIEs | All batches size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, ResNet18); Secondary_VehicleMake (224×224, ResNet18) |
Tiled Display | Disabled |
Rendering | Disabled |
Achieved Performance
The table below shows the achieved performance of the DeepStream SDK under the specified system and application configuration:
Stream type | No. of Stream @ 30 FPS | CPU Utilization | GPU Utilization |
---|---|---|---|
H.265 | 81 | 46.1% | 100% |
H.264 | 68 | 8.06% | 75.74% |
Data center GPU - Quadro (A6000)
This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - Quadro (A6000).
System Configuration
The system configuration for the DeepStream SDK is listed below:
System Configuration | Specification |
---|---|
CPU | AMD EPYC 7763 @ 2430 MHz |
GPU | Quadro (A6000) |
Ubuntu | Ubuntu 22.04 |
GPU Driver | 535.161.08 |
CUDA | 12.2 |
TensorRT | 8.6.1.6 |
Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt
Change the following items in the config file:
Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IoU tracker.
The application configuration for the DeepStream SDK is listed below:
Application Configuration | Specification |
---|---|
N×1080p 30 fps stream | sample_1080p_h265.mp4 (provided with the SDK) N=101; sample_1080p_h264.mp4 (provided with the SDK) N=49 |
Primary GIE | resnet18_trafficcamnet.etlt; Batch Size = N; Interval=0 |
Tracker | Enabled. Processing at 960x544 resolution, IOU tracker enabled. |
2 × Secondary GIEs | All batches size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, ResNet18); Secondary_VehicleMake (224×224, ResNet18) |
Tiled Display | Disabled |
Rendering | Disabled |
Achieved Performance
The table below shows the achieved performance of the DeepStream SDK under the specified system and application configuration:
Stream type | No. of Stream @ 30 FPS | CPU Utilization | GPU Utilization |
---|---|---|---|
H.265 | 101 | 7.05% | 60.17% |
H.264 | 49 | 2.68% | 28.57% |
Data center GPU - A4000
This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - A4000.
System Configuration
The system configuration for the DeepStream SDK is listed below:
System Configuration | Specification |
---|---|
CPU | AMD EPYC 7763 @ 2430 MHz |
GPU | A4000 |
Ubuntu | Ubuntu 22.04 |
GPU Driver | 535.161.08 |
CUDA | 12.2 |
TensorRT | 8.6.1.6 |
Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt
Change the following items in the config file:
Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IoU tracker.
The application configuration for the DeepStream SDK is listed below:
Application Configuration | Specification |
---|---|
N×1080p 30 fps stream | sample_1080p_h265.mp4 (provided with the SDK) N=49; sample_1080p_h264.mp4 (provided with the SDK) N=24 |
Primary GIE | resnet18_trafficcamnet.etlt; Batch Size = N; Interval=0 |
Tracker | Enabled. Processing at 960x544 resolution, IOU tracker enabled. |
2 × Secondary GIEs | All batches size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, ResNet18); Secondary_VehicleMake (224×224, ResNet18) |
Tiled Display | Disabled |
Rendering | Disabled |
Achieved Performance
The table below shows the achieved performance of the DeepStream SDK under the specified system and application configuration:
Stream type | No. of Stream @ 30 FPS | CPU Utilization | GPU Utilization |
---|---|---|---|
H.265 | 49 | 0.97% | 49.87% |
H.264 | 24 | 0.48% | 24.56% |
Data center GPU - L4000
This section describes configuration and settings for the DeepStream SDK on NVIDIA Data center GPU - L4000.
System Configuration
The system configuration for the DeepStream SDK is listed below:
System Configuration | Specification |
---|---|
CPU | AMD EPYC 7763 @ 2430 MHz |
GPU | L4000 |
Ubuntu | Ubuntu 22.04 |
GPU Driver | 535.161.08 |
CUDA | 12.2 |
TensorRT | 8.6.1.6 |
Application Configuration
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt
Change the following items in the config file:
Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IoU tracker.
The application configuration for the DeepStream SDK is listed below:
Application Configuration | Specification |
---|---|
N×1080p 30 fps stream | sample_1080p_h265.mp4 (provided with the SDK) N=76; sample_1080p_h264.mp4 (provided with the SDK) N=45 |
Primary GIE | resnet18_trafficcamnet.etlt; Batch Size = N; Interval=0 |
Tracker | Enabled. Processing at 960x544 resolution, IOU tracker enabled. |
2 × Secondary GIEs | All batches size 32. Asynchronous mode enabled. Secondary_VehicleTypes (224×224, ResNet18); Secondary_VehicleMake (224×224, ResNet18) |
Tiled Display | Disabled |
Rendering | Disabled |
Achieved Performance
The table below shows the achieved performance of the DeepStream SDK under the specified system and application configuration:
Stream type | No. of Stream @ 30 FPS | CPU Utilization | GPU Utilization |
---|---|---|---|
H.265 | 76 | 20% | 99.25% |
H.264 | 45 | 0.96% | 53.02% |
Jetson
This section describes configuration and settings for the DeepStream SDK on NVIDIA Jetson™ platforms. JetPack 6.0 GA is used for software installation.
System Configuration
For the performance test:
Max power mode is enabled:
$ sudo nvpmodel -m 0
The GPU clocks are stepped to maximum:
$ sudo jetson_clocks
For information about supported power modes, see the “Supported Modes and Power Efficiency” section in the power management topics of NVIDIA Tegra Linux Driver Package Development Guide, e.g., “Power Management for Jetson AGX Orin Devices.”
Jetson AGX Orin
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt
Change the following items in the config file:
Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IOU tracker.
The following tables describe performance results for the NVIDIA Jetson AGX Orin™.
Application Configuration | Specification |
---|---|
N×1080p 30 fps streams | sample_1080p_h265.mp4 (provided with the SDK) N=37; sample_1080p_h264.mp4 (provided with the SDK) N=15 |
Primary GIE | |
Tracker | Enabled; processing at 960x544 resolution, IOU tracker enabled. |
2× secondary GIEs | All batches are size 32. Asynchronous mode enabled. |
OSD/tiled display | Disabled |
Renderer | Disabled |
Achieved Performance
Stream type | No. of Stream @ 30 FPS | CPU Utilization | GPU Utilization |
---|---|---|---|
H.265 | 37 | 21.25% | 82.30% |
H.264 | 15 | 9.49% | 36.42% |
Jetson Orin NX
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt
Change the following items in the config file:
Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IOU tracker.
The following tables describe performance results for the NVIDIA Jetson Orin NX™.
Application Configuration | Specification |
---|---|
N×1080p 30 fps streams | sample_1080p_h265.mp4 (provided with the SDK) N=16; sample_1080p_h264.mp4 (provided with the SDK) N=13 |
Primary GIE | |
Tracker | Enabled; processing at 960x544 resolution, IOU tracker enabled. |
2× secondary GIEs | All batches are size 32. Asynchronous mode enabled. |
OSD/tiled display | Disabled |
Renderer | Disabled |
Achieved Performance
Stream type | No. of Stream @ 30 FPS | CPU Utilization | GPU Utilization |
---|---|---|---|
H.265 | 16 | 19.26% | 99% |
H.264 | 13 | 15.22% | 78.52% |
Jetson Orin Nano
Config file: source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt
Change the following items in the config file:
Change batch size under streammux and primary-gie to match the number of streams.
Disable tiled display and rendering using instructions above.
Enable IOU tracker.
The following tables describe performance results for the NVIDIA Jetson Orin Nano™.
Application Configuration | Specification |
---|---|
N×1080p 30 fps streams | sample_1080p_h265.mp4 (provided with the SDK) N=13; sample_1080p_h264.mp4 (provided with the SDK) N=8 |
Primary GIE | |
Tracker | Enabled; processing at 960x544 resolution, IOU tracker enabled. |
2× secondary GIEs | All batches are size 32. Asynchronous mode enabled. |
OSD/tiled display | Disabled |
Renderer | Disabled |
Achieved Performance
Stream type | No. of Stream @ 30 FPS | CPU Utilization | GPU Utilization |
---|---|---|---|
H.265 | 13 | 20.65% | 99% |
H.264 | 8 | 12.49% | 60.15% |