DeepStream Reference Application - deepstream-app
===================================================

Application Architecture
---------------------------

The image below shows the architecture of the NVIDIA\ :sup:`®` DeepStream reference application.

.. image:: /content/DS_reference_ds_app.png
   :align: center
   :alt: DeepStream Reference Application Architecture

The DeepStream reference application is a GStreamer-based solution consisting of a set of GStreamer plugins that encapsulate low-level APIs to form a complete graph. The reference application can accept input from various sources such as a camera, an RTSP stream, or an encoded file, and additionally supports multiple simultaneous streams/sources. The GStreamer plugins implemented by NVIDIA and provided as part of the DeepStream SDK include:

* The Stream Muxer plugin (Gst-nvstreammux) to form a batch of buffers from multiple input sources.
* The NVIDIA TensorRT™ based plugin (Gst-nvinfer) for primary detection and secondary classification (attribute classification of primary objects).
* The OpenCV based tracker plugin (Gst-nvtracker) for object tracking with unique IDs.
* The Multi Stream Tiler plugin (Gst-nvmultistreamtiler) for forming a 2D array of frames.
* The Onscreen Display (OSD) plugin (Gst-nvdsosd) to draw shaded boxes, rectangles, and text on the composited frame using the generated metadata.
* The Message Converter (Gst-nvmsgconv) and Message Broker (Gst-nvmsgbroker) plugins, used in combination to send analytics data to a server in the cloud.

Reference Application Configuration
-------------------------------------

The NVIDIA DeepStream SDK reference application uses one of the sample configuration files from the ``samples/configs/deepstream-app`` directory in the DeepStream package to:

* Enable or disable components
* Change the properties or behavior of components
* Customize other application configuration settings that are unrelated to the pipeline and its components

The configuration file uses a key file format, based on the freedesktop specification at: https://specifications.freedesktop.org/desktop-entry-spec/latest

Expected Output for the DeepStream Reference Application (deepstream-app)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The image below shows the expected output:

.. image:: /content/DS_reference_ds_app_expected_output.png
   :align: center
   :alt: DeepStream Reference Application Expected Output

.. _config-groups-label:

Configuration Groups
---------------------

The application configuration is divided into groups of configurations, one for each pipeline component and one for each application-specific setting. The configuration groups are:

.. csv-table:: Configuration Groups - deepstream app
   :file: ../text/tables/DS_ref_app_config_grp.csv
   :widths: 30, 40
   :header-rows: 1

Application Group
~~~~~~~~~~~~~~~~~~~

The application group properties are:

.. csv-table:: Application group
   :file: ../text/tables/DS_ref_app_application_grp.csv
   :widths: 20, 30, 10, 30, 10
   :header-rows: 1

Tiled-display Group
~~~~~~~~~~~~~~~~~~~~

The tiled-display group properties are:

.. csv-table:: Tiled display group
   :file: ../text/tables/DS_ref_app_tiled_display_grp.csv
   :widths: 20, 30, 10, 30, 10
   :header-rows: 1

Source Group
~~~~~~~~~~~~~

The source group specifies the source properties. The DeepStream application supports multiple simultaneous sources. For each source, a separate group with a name of the form ``source%d`` must be added to the configuration file. For example::

   [source0]
   key1=value1
   key2=value2
   ...

   [source1]
   key1=value1
   key2=value2
   ...
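As a concrete sketch, a source group for a file source decoded four times might look like the following. The values are illustrative only; the numeric ``type`` codes in the comment follow the convention used by the sample configurations (see the Source group table below for the authoritative property list)::

   [source0]
   enable=1
   # type: 1=CameraV4L2, 2=URI, 3=MultiURI, 4=RTSP
   type=3
   uri=file:///path/to/sample_1080p_h264.mp4
   num-sources=4
   gpu-id=0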
The source group properties are:

.. csv-table:: Source group
   :file: ../text/tables/DS_ref_app_source_grp.csv
   :widths: 20, 30, 10, 30, 10
   :header-rows: 1

Streammux Group
~~~~~~~~~~~~~~~~

The ``[streammux]`` group specifies and modifies properties of the Gst-nvstreammux plugin.

.. csv-table:: Streammux group
   :file: ../text/tables/DS_ref_app_streammux_grp.csv
   :widths: 20, 30, 10, 30, 10
   :header-rows: 1

Primary GIE and Secondary GIE Group
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The DeepStream application supports multiple secondary GIEs. For each secondary GIE, a separate group with a name of the form ``secondary-gie%d`` must be added to the configuration file. For example::

   [primary-gie]
   key1=value1
   key2=value2
   ...

   [secondary-gie1]
   key1=value1
   key2=value2
   ...

The primary and secondary GIE configurations are as follows. For each configuration, the "Valid for" column indicates whether the property is valid for the primary TensorRT model, the secondary TensorRT model, or both.

.. csv-table:: Primary and Secondary GIE* group
   :file: ../text/tables/DS_ref_app_prim_sec_Gie_grp.csv
   :widths: 20, 30, 10, 30, 10
   :header-rows: 1

.. note::
   \* The GIEs are the GPU Inference Engines.

Tracker Group
~~~~~~~~~~~~~~

The tracker group properties are:

.. csv-table:: Tracker group
   :file: ../text/tables/DS_ref_app_tracker_grp.csv
   :widths: 20, 30, 10, 30, 10
   :header-rows: 1

Message Converter Group
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The message converter group properties are:

.. csv-table:: Message converter group
   :file: ../text/tables/DS_ref_app_msg_conv_grp.csv
   :widths: 20, 30, 10, 30, 10
   :header-rows: 1

Message Consumer Group
~~~~~~~~~~~~~~~~~~~~~~~

The message consumer group properties are:

.. csv-table:: Message consumer group
   :file: ../text/tables/DS_ref_app_msg_consumer_grp.csv
   :widths: 20, 30, 10, 30, 10
   :header-rows: 1

OSD Group
~~~~~~~~~~~~

The OSD group specifies the properties and modifies the behavior of the OSD component, which overlays text and rectangles on the video frame.

.. csv-table:: OSD group
   :file: ../text/tables/DS_ref_app_osd_grp.csv
   :widths: 20, 30, 10, 30, 10
   :header-rows: 1

Sink Group
~~~~~~~~~~~

The sink group specifies the properties and modifies the behavior of the sink components for rendering, encoding, and file saving.

.. csv-table:: Sink group
   :file: ../text/tables/DS_ref_app_sink_grp.csv
   :widths: 20, 30, 10, 30, 10
   :header-rows: 1

Tests Group
~~~~~~~~~~~~~

The tests group is for diagnostics and debugging.

.. csv-table:: Tests group
   :file: ../text/tables/DS_ref_app_tests_grp.csv
   :widths: 20, 30, 10, 30, 10
   :header-rows: 1

NvDs-analytics Group
~~~~~~~~~~~~~~~~~~~~

The ``[nvds-analytics]`` group adds the nvds-analytics plugin to the pipeline.

.. csv-table:: NvDs-analytics group
   :file: ../text/tables/DS_ref_app_NvDsAnalytics_grp.csv
   :widths: 20, 30, 10, 30, 10
   :header-rows: 1

.. note::
   See the DeepStream Plugin Guide for plugin-specific configuration file specifications (for the Gst-nvinfer, Gst-nvtracker, Gst-nvdewarper, Gst-nvmsgconv, Gst-nvmsgbroker and Gst-nvdsanalytics plugins).
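Putting several of the groups above together, a minimal deepstream-app configuration might look like the following sketch. It is modeled on the sample configurations under ``samples/configs/deepstream-app``; all values (including the file path and the ``config_infer_primary.txt`` inference config) are illustrative, and the group tables above remain the authoritative reference::

   [application]
   enable-perf-measurement=1
   perf-measurement-interval-sec=5

   [tiled-display]
   enable=1
   rows=2
   columns=2
   width=1280
   height=720

   [source0]
   enable=1
   type=3
   uri=file:///path/to/sample_1080p_h264.mp4
   num-sources=4

   [streammux]
   gpu-id=0
   # Match batch-size, width, and height to the input sources (see best practices below)
   batch-size=4
   width=1920
   height=1080

   [primary-gie]
   enable=1
   batch-size=4
   config-file=config_infer_primary.txt

   [osd]
   enable=1

   [sink0]
   enable=1
   # type: 1=FakeSink, 2=EGL windowed sink, 3=encode+file save, 4=RTSP streaming
   type=2
   sync=0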
Application Tuning for DeepStream SDK
----------------------------------------

This section provides application tuning tips for the DeepStream SDK using the following parameters in the configuration file.

Performance Optimization
~~~~~~~~~~~~~~~~~~~~~~~~~

This section covers performance optimization steps that you can try for maximum performance.

DeepStream Best Practices
^^^^^^^^^^^^^^^^^^^^^^^^^^

Here are a few best practices for optimizing a DeepStream application and removing bottlenecks:

1. Set the batch size of the stream muxer and the primary detector equal to the number of input sources. These settings are in the ``[streammux]`` and ``[primary-gie]`` groups of the configuration file. This keeps the pipeline running at full capacity; a batch size higher or lower than the number of input sources can add latency in the pipeline.

2. Set the height and width of the stream muxer to the input resolution, in the ``[streammux]`` group of the configuration file. This ensures that the stream does not go through any unwanted image scaling.

3. If you are streaming from live sources such as RTSP or a USB camera, set ``live-source=1`` in the ``[streammux]`` group of the configuration file. This enables proper timestamping for live sources, producing smoother playback.

4. Tiling and visual output consume GPU resources. When you do not need to render the output on screen (for example, when you run inference at the edge and transmit only the metadata to the cloud for further processing), you can disable three components to maximize throughput:

   1. Disable the on-screen display (OSD). The OSD plugin draws bounding boxes and other artifacts and adds labels to the output frame. To disable it, set ``enable=0`` in the ``[osd]`` group of the configuration file.

   2. The tiler creates an ``NxM`` grid for displaying the output streams. To disable the tiled output, set ``enable=0`` in the ``[tiled-display]`` group of the configuration file.

   3. Disable the output sink for rendering by choosing ``fakesink``, that is, ``type=1`` in the ``[sink]`` group of the configuration file.

   All the performance benchmarks in the Performance section are run with tiling, OSD, and the output sink disabled.

5. If CPU/GPU utilization is low, one possibility is that the elements in the pipeline are being starved for buffers. Try increasing the number of buffers allocated by the decoder by setting the ``num-extra-surfaces`` property of the ``[source#]`` group in the application, or the ``num-extra-surfaces`` property of the ``Gst-nvv4l2decoder`` element.

6. If you are running the application inside a docker container and it delivers low FPS, set ``qos=0`` in the configuration file's ``[sink0]`` group. The issue is caused by the initial load: with ``qos`` set to 1, the property's default value in the ``[sink0]`` group, decodebin starts dropping frames.

7. To optimize the end-to-end latency of the processing pipeline, use the latency measurement facility in DeepStream:

   * To enable frame latency measurement, run this command on the console::

        $ export NVDS_ENABLE_LATENCY_MEASUREMENT=1

   * To enable latency measurement for all plugins, run this command on the console::

        $ export NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1

Jetson Optimization
^^^^^^^^^^^^^^^^^^^^

1. Ensure that the Jetson clocks are set high. Run these commands to set the Jetson clocks high::

      $ sudo nvpmodel -m 0      # for MAX perf, the power mode is 0
      $ sudo jetson_clocks

   .. note::
      For Jetson Xavier NX, use mode 2.

2. On Jetson, use ``Gst-nvoverlaysink`` instead of ``Gst-nveglglessink``, as nveglglessink requires GPU utilization.

Triton
^^^^^^^

1. If you are using Triton with DeepStream, tune the ``tf_gpu_memory_fraction`` value, which controls TensorFlow GPU memory usage per process; the suggested range is [0.2, 0.6]. Too large a value can cause out-of-memory failures, and too small a value can cause low performance.

2. Enable TensorRT optimization when using TensorFlow or ONNX with Triton. Update the Triton config file to enable TensorFlow/ONNX TensorRT online optimization; note that online optimization takes several minutes during initialization each time. Alternatively, you can generate TF-TRT ``graphdef``/``savedmodel`` models offline. A configuration sketch follows this list.
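As a minimal sketch of the second point, TensorRT optimization for a TensorFlow or ONNX model can be requested through the ``optimization`` block of the model's Triton ``config.pbtxt``. The syntax below follows the Triton model-configuration documentation; the ``precision_mode`` value is an example only, and the rest of the model configuration is omitted::

   # Excerpt from a Triton config.pbtxt (illustrative)
   optimization {
     execution_accelerators {
       gpu_execution_accelerator : [
         {
           name : "tensorrt"
           parameters { key: "precision_mode" value: "FP16" }
         }
       ]
     }
   }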
Inference Throughput
^^^^^^^^^^^^^^^^^^^^^^

Here are a few steps to help you increase channel density for your application:

1. If you are using Jetson Xavier or Xavier NX, you can use the DLA (Deep Learning Accelerator) for inferencing. This frees the GPU for other models or more streams.

2. With DeepStream, you can infer on every second or third frame and use a tracker to predict the object's location in the skipped frames. This requires only a simple configuration change, and any of the three available trackers can be used to track the object across frames. In the inference configuration file, change the ``interval`` parameter under ``[property]``. This is a skip interval, that is, the number of frames to skip between inferences. An interval of 0 means infer on every frame; an interval of 1 means skip one frame and infer on every other frame. Going from an interval of 0 to an interval of 1 can effectively double your overall channel throughput.

3. Choose a lower precision such as FP16 or INT8 for inference. Using FP16 requires no new model; it is a simple change in DeepStream: update the ``network-mode`` option in the inference configuration file. Running INT8 additionally requires an INT8 calibration cache, which contains the FP32-to-INT8 quantization table.

4. The DeepStream app can also be configured with cascaded neural networks: a first network performs detection, and a second network classifies the detected objects. To enable secondary inference, enable the secondary GIE in the configuration file and set appropriate batch sizes. The batch size depends on the number of objects typically sent to the secondary inference from the primary inference; you will have to experiment to find the appropriate batch size for your use case. To reduce the number of inferences run by the secondary classifier, the objects to infer on can be filtered by setting ``input-object-min-width``, ``input-object-min-height``, ``input-object-max-width``, ``input-object-max-height``, ``operate-on-gie-id``, and ``operate-on-class-ids`` appropriately.

Reducing Spurious Detections
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. csv-table:: Reducing spurious detections
   :file: ../text/tables/DS_ref_app_reducing_spurious_detections.csv
   :widths: 30, 30, 30
   :header-rows: 1
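Complementing the table above, a common place to apply such filtering is the per-class attributes group of the Gst-nvinfer configuration file. The sketch below uses property names from the Gst-nvinfer configuration specification (in earlier DeepStream releases the clustering threshold key is ``threshold`` rather than ``pre-cluster-threshold``); the values are examples only and must be tuned per model and scene::

   [class-attrs-all]
   # Raise the clustering threshold to discard low-confidence detections
   pre-cluster-threshold=0.4
   # Discard implausibly small or large boxes (in pixels)
   detected-min-w=32
   detected-min-h=32
   detected-max-w=1920
   detected-max-h=1080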