Performance
=============

The DeepStream application is benchmarked across various NVIDIA TAO Toolkit and open source models. The measured performance is the end-to-end performance of the entire video analytics application. This includes video capture and decode, pre-processing, batching, inference, and post-processing to generate metadata. Output rendering is turned off to achieve peak inference performance. For information on disabling output rendering, see the :doc:`DS_ref_app_deepstream` chapter.

TAO Pre-trained models
--------------------------

The `TAO toolkit `_ has a set of pre-trained models, listed in the table below. If one of these models meets your requirements, start with it. These models can be used for a variety of smart city and smart places applications. If your application is beyond the scope of these models, you can re-train one of the popular model architectures using the TAO toolkit.

The table below shows the end-to-end performance of highly accurate pre-trained models from the TAO toolkit. All models are available on NGC. These models are natively integrated with DeepStream, and the instructions to run them are in ``/opt/nvidia/deepstream/deepstream-6.1/samples/configs/tao_pretrained_models/``. The following numbers were obtained with ``sample_1080p_h265.mp4``.

.. csv-table:: Performance of pre-trained models on Jetson
   :file: ../text/tables/DS_performance_TLT_pretrained_jetson.csv
   :widths: 12, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7
   :header-rows: 1

All the models in the table above can run solely on DLA. This frees GPU resources to run more complex models.

.. note::

   * Running inference simultaneously on multiple models is not supported on the DLA. You can only run one model at a time on the DLA.
   * NA: Not available for the Jetson developer preview release.

.. csv-table:: Performance of pre-trained models on dGPU
   :file: ../text/tables/DS_performance_TLT_pretrained_dgpu.csv
   :widths: 15, 7, 7, 7, 7, 7, 7, 7
   :header-rows: 1

.. note::

   * NA: Not available

DeepStream reference model and tracker
---------------------------------------

The DeepStream SDK ships with a reference `DetectNet_v2-ResNet10` model and three `ResNet18` classifier models. Detailed instructions to run these models with DeepStream are provided in the next section. DeepStream provides three reference trackers: `IOU`, `NvDCF`, and `DeepSORT`. For more information about trackers, see the :doc:`DS_plugin_gst-nvtracker` section.

Configuration File Settings for Performance Measurement
--------------------------------------------------------

To achieve peak performance, make sure the devices are properly cooled. For Turing and Ampere GPUs, use a server that meets the thermal and airflow requirements. Along with the hardware setup, a few options in the config file need to be set to achieve the published performance. Make the required changes to one of the config files from the DeepStream SDK to replicate the peak performance.

**Turn off output rendering, OSD, and tiler**

OSD (on-screen display) is used to draw bounding boxes, masks, and labels on the screen. If output rendering is disabled, drawing bounding boxes is not required unless the output needs to be streamed over RTSP or saved to disk. The tiler is used to display the output in an `NxM` tiled grid; it is not needed if rendering is disabled. Output rendering, OSD, and tiler all consume compute resources, so they can reduce inference performance.

To disable OSD, tiled display, and output sink, make the following changes in the DeepStream config file.

* To disable OSD, set ``enable`` to ``0`` in the ``[osd]`` group:
  ::

    [osd]
    enable=0

* To disable tiling, set ``enable`` to ``0`` in the ``[tiled-display]`` group:
  ::

    [tiled-display]
    enable=0

* To turn off output rendering, change the sink type to ``1`` (FakeSink):
  ::

    [sink0]
    enable=1
    #Type - 1=FakeSink 2=EglSink 3=File
    type=1
    sync=0

**Use the max_perf setting for tracker**

DeepStream SDK 6.1 introduces a new reference low-level tracker library, *NvMultiObjectTracker*, along with a set of configuration files:

* ``config_tracker_IOU.yml``
* ``config_tracker_NvDCF_max_perf.yml``
* ``config_tracker_NvDCF_perf.yml``
* ``config_tracker_NvDCF_accuracy.yml``

To achieve the peak performance shown in the table above when using the NvDCF tracker, use the max_perf configuration with a video frame resolution matched to that of the inference module. For example, if the inference module uses 480x272 resolution, we recommend a similarly reduced resolution (for example, 480x288) for the tracker module:

::

   [tracker]
   enable=1
   tracker-width=480
   tracker-height=288
   ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
   #ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_IOU.yml
   ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_NvDCF_max_perf.yml
   gpu-id=0
   enable-batch-process=1
   display-tracking-id=1

When the IOU tracker is used, the video frame resolution does not matter, and the default ``config_tracker_IOU.yml`` can be used.

To use DLA on Jetson AGX Xavier and Xavier NX for performance measurement, see the "Using DLA for inference" section in the Quickstart Guide.

DeepStream reference model
----------------------------

Data center GPU - GA100
~~~~~~~~~~~~~~~~~~~~~~~

This section describes configuration and settings for the DeepStream SDK on the NVIDIA data center GPU GA100.

System Configuration
^^^^^^^^^^^^^^^^^^^^^^

The system configuration for the DeepStream SDK is listed below:

.. csv-table:: GA100 System configuration
   :file: ../text/tables/DS_performance_Ampere_system_configuration.csv
   :widths: 30, 40
   :header-rows: 1

Application Configuration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Config file**: ``source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt``

Change the following items in the config file:

* The inference resolution of the Primary GIE is specified in ``samples/models/Primary_detector/resnet10.prototxt``. In this file, change the `dim` (that is, the height and width of the input tensor) from ``368x640`` to ``272x480``.
* Change the batch size under ``streammux`` and ``primary-gie`` to match the number of streams.
* Disable tiled display and rendering as described above.
* Enable the `IOU` tracker.

The application configuration for the DeepStream SDK is listed below:

.. csv-table:: GA100 application configuration
   :file: ../text/tables/DS_performance_Ampere_application_configuration.csv
   :widths: 30, 40
   :header-rows: 1

**Achieved Performance**

The table below shows the achieved performance of the DeepStream SDK under the specified system and application configuration:

============= ======================== =================== ====================
Stream type   No. of streams @ 30 FPS  CPU utilization     GPU utilization
============= ======================== =================== ====================
H.265         186                      6%                  58.53%
H.264         98                       19.68%              26.79%
============= ======================== =================== ====================

Data center GPU - T4
~~~~~~~~~~~~~~~~~~~~~~~

This section describes configuration and settings for the DeepStream SDK on the NVIDIA data center GPU T4.

System Configuration
^^^^^^^^^^^^^^^^^^^^^^

The system configuration for the DeepStream SDK is listed below:

.. csv-table:: T4 System configuration
   :file: ../text/tables/DS_performance_Tesla_system_configuration.csv
   :widths: 30, 40
   :header-rows: 1

Application Configuration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Config file**: ``source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt``

Change the following items in the config file:

* The inference resolution of the Primary GIE is specified in ``samples/models/Primary_detector/resnet10.prototxt``. In this file, change the `dim` (that is, the height and width of the input tensor) from ``368x640`` to ``272x480``.
* Change the batch size under ``streammux`` and ``primary-gie`` to match the number of streams.
* Disable tiled display and rendering as described above.
* Enable the `IOU` tracker.

The application configuration for the DeepStream SDK is listed below:

.. csv-table:: T4 application configuration
   :file: ../text/tables/DS_performance_Tesla_application_configuration.csv
   :widths: 30, 40
   :header-rows: 1

**Achieved Performance**

The table below shows the achieved performance of the DeepStream SDK under the specified system and application configuration:

============= ======================== =================== ====================
Stream type   No. of streams @ 30 FPS  CPU utilization     GPU utilization
============= ======================== =================== ====================
H.265         78                       15.96%              58%
H.264         41                       9.59%               31.65%
============= ======================== =================== ====================

Jetson
~~~~~~~

This section describes configuration and settings for the DeepStream SDK on NVIDIA Jetson™ platforms. JetPack 5.0 DP is used for software installation.

System Configuration
^^^^^^^^^^^^^^^^^^^^^^^

For the performance test:

1. Max power mode is enabled: ``$ sudo nvpmodel -m 0``
2. The GPU clocks are set to maximum: ``$ sudo jetson_clocks``

For information about supported power modes, see the “Supported Modes and Power Efficiency” section in the power management topics of the `NVIDIA Tegra Linux Driver Package Development Guide`, e.g., “Power Management for Jetson AGX Xavier Devices.”

Jetson AGX Xavier
^^^^^^^^^^^^^^^^^^^

**Config file**: ``source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt``

Change the following items in the config file:

* The inference resolution of the Primary GIE is specified in ``samples/models/Primary_detector/resnet10.prototxt``. In this file, change the `dim` (that is, the height and width of the input tensor) from ``368x640`` to ``272x480``.
* Change the batch size under ``streammux`` and ``primary-gie`` to match the number of streams.
* Disable tiled display and rendering as described above.
* Enable the `IOU` tracker.

The following tables describe performance results for the NVIDIA Jetson AGX Xavier™.

.. csv-table:: Jetson AGX Xavier Pipeline Configuration (``deepstream-app``)
   :file: ../text/tables/DS_performance_Jetson_AGX_Xavier_app_configuration.csv
   :widths: 30, 40
   :header-rows: 1

**Achieved Performance**

============= ======================== =================== ====================
Stream type   No. of streams @ 30 FPS  CPU utilization     GPU utilization
============= ======================== =================== ====================
H.265         46                       40.18%              93.25%
H.264         34                       32.24%              69.84%
============= ======================== =================== ====================

Jetson NX
^^^^^^^^^^^

**Config file**: ``source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt``

Change the following items in the config file:

* The inference resolution of the Primary GIE is specified in ``samples/models/Primary_detector/resnet10.prototxt``. In this file, change the `dim` (that is, the height and width of the input tensor) from ``368x640`` to ``272x480``.
* Change the batch size under ``streammux`` and ``primary-gie`` to match the number of streams.
* Disable tiled display and rendering as described above.
* Enable the `IOU` tracker.

The following tables describe performance results for the NVIDIA Jetson NX™.

.. csv-table:: Jetson NX Pipeline Configuration (``deepstream-app``)
   :file: ../text/tables/DS_performance_Jetson_NX_app_configuration.csv
   :widths: 30, 40
   :header-rows: 1

**Achieved Performance**

============= ======================== =================== ====================
Stream type   No. of streams @ 30 FPS  CPU utilization     GPU utilization
============= ======================== =================== ====================
H.265         27                       49.82%              92.48%
H.264         22                       45.06%              81.86%
============= ======================== =================== ====================

Jetson Orin
^^^^^^^^^^^

**Config file**: ``source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt``

Change the following items in the config file:

* The inference resolution of the Primary GIE is specified in ``samples/models/Primary_detector/resnet10.prototxt``. In this file, change the `dim` (that is, the height and width of the input tensor) from ``368x640`` to ``272x480``.
* Change the batch size under ``streammux`` and ``primary-gie`` to match the number of streams.
* Disable tiled display and rendering as described above.
* Enable the `IOU` tracker.

The following tables describe performance results for the NVIDIA Jetson Orin™.

.. csv-table:: Jetson Orin Pipeline Configuration (``deepstream-app``)
   :file: ../text/tables/DS_performance_Jetson_Orin_app_configuration.csv
   :widths: 30, 40
   :header-rows: 1

**Achieved Performance**

============= ======================== =================== ====================
Stream type   No. of streams @ 30 FPS  CPU utilization     GPU utilization
============= ======================== =================== ====================
H.265         37                       9.29%               25.8%
H.264         15                       4.38%               11.92%
============= ======================== =================== ====================
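
The config-file changes repeated for each platform in this section (batch size, OSD, tiled display, rendering, and tracker) can also be scripted. The following is a minimal Python sketch, not a DeepStream utility. The function ``apply_perf_settings`` is hypothetical. The sketch assumes that a deepstream-app config file can be parsed as INI by Python's standard ``configparser`` module. Comments in the input file are not preserved in the output, and the ``dim`` change in ``resnet10.prototxt`` is not covered.

.. code-block:: python

   import configparser
   import io

   # Paths from the [tracker] example in this section.
   TRACKER_LIB = "/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so"
   IOU_CONFIG = ("/opt/nvidia/deepstream/deepstream/samples/configs/"
                 "deepstream-app/config_tracker_IOU.yml")


   def apply_perf_settings(text: str, num_streams: int) -> str:
       """Hypothetical helper: return `text` with the measurement settings applied.

       Assumes the deepstream-app config parses as INI; comments are dropped.
       """
       if num_streams < 1:
           raise ValueError("num_streams must be at least 1")

       # Only '=' is treated as the key/value separator; interpolation=None keeps
       # '%' literal, and strict=False lets later duplicate keys win.
       cfg = configparser.ConfigParser(
           interpolation=None, delimiters=("=",), strict=False)
       cfg.optionxform = str  # keep key names exactly as written
       cfg.read_string(text)

       def set_key(section: str, key: str, value: object) -> None:
           # Create the group if the input file does not have it yet.
           if not cfg.has_section(section):
               cfg.add_section(section)
           cfg.set(section, key, str(value))

       # Batch size under streammux and primary-gie matches the stream count.
       for section in ("streammux", "primary-gie"):
           set_key(section, "batch-size", num_streams)
       # Disable OSD and the tiler.
       set_key("osd", "enable", 0)
       set_key("tiled-display", "enable", 0)
       # Turn off rendering: sink type 1 is FakeSink, with sync=0.
       set_key("sink0", "enable", 1)
       set_key("sink0", "type", 1)
       set_key("sink0", "sync", 0)
       # Enable the IOU tracker; frame resolution does not matter for IOU.
       set_key("tracker", "enable", 1)
       set_key("tracker", "ll-lib-file", TRACKER_LIB)
       set_key("tracker", "ll-config-file", IOU_CONFIG)

       # Write key=value lines without spaces around '='.
       out = io.StringIO()
       cfg.write(out, space_around_delimiters=False)
       return out.getvalue()

For example, to reproduce the GA100 H.265 run with 186 streams, you would pass the contents of ``source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt`` with ``num_streams=186`` and save the returned text as a new config file. The original, commented file is left unchanged.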