DeepStream overview

DeepStream is a streaming analytic toolkit for building AI-powered applications. It takes streaming data as input, from a USB/CSI camera, from a video file, or from streams over RTSP, and uses AI and computer vision to generate insights about the environment, converting pixels to insights. The DeepStream SDK can be used to build video analytic solutions for traffic and pedestrian understanding in smart cities, health and safety monitoring in hospitals, self-checkout and analytics in retail, detection of component defects at manufacturing facilities, and more.
DeepStream supports application development in C/C++ and in Python through the Python bindings. To make it easier to get started, DeepStream ships with several reference applications in both C/C++ and Python. See the C/C++ sample apps and Python sample apps sections to learn more about the available apps. Additional apps are released on the NVIDIA-AI-IOT GitHub page.
The core SDK consists of several hardware-accelerated plugins that use accelerators such as VIC, GPU, DLA, NVDEC and NVENC. By performing all the compute-heavy operations in dedicated accelerators, DeepStream can achieve the highest performance for video analytic applications. One of the key capabilities of DeepStream is secure bi-directional communication between edge and cloud. DeepStream ships with several out-of-the-box security protocols, such as SASL/Plain authentication using username/password and 2-way TLS authentication. To learn more about these security features, read the “Security” section in the DeepStream Plugin Development Guide. To learn more about the bi-directional capabilities, see the “Bi-directional messaging” section in the DeepStream Plugin Development Guide.
DeepStream builds on top of several NVIDIA libraries from the CUDA-X stack, such as CUDA, TensorRT, Triton Inference Server and multimedia libraries. TensorRT accelerates AI inference on NVIDIA GPUs. DeepStream abstracts these libraries behind DeepStream plugins, so developers can build video analytic pipelines without having to learn each library individually.
DeepStream is optimized for NVIDIA GPUs; an application can be deployed on an embedded edge device running the Jetson platform, or on larger edge or datacenter GPUs such as T4. DeepStream applications can be deployed in containers using the NVIDIA Container Runtime. The containers are available on NGC, the NVIDIA GPU Cloud registry. To learn more about deployment with Docker, see the Docker container chapter. DeepStream applications can be orchestrated on the edge using Kubernetes on GPU. A sample Helm chart to deploy a DeepStream application is available on NGC.

DeepStream graph architecture

DeepStream is an optimized graph architecture built using the open source GStreamer framework. DeepStream provides building blocks in the form of GStreamer plugins that can be used to construct an efficient video analytic pipeline. More than 15 of these plugins are hardware accelerated for various tasks. For more details about the different plugins, see the “GStreamer Plugin Details” section in the DeepStream Plugin Development Guide. The graph below shows a typical video analytic application, from input video to output insights. Each individual block is a plugin. At the bottom are the hardware engines that are used throughout the application. Optimized memory management, with zero memory copies between plugins, together with the use of the various accelerators, provides the highest performance.
Figure: DeepStream graph architecture, with the pipeline plugins at the top and the hardware engines at the bottom.
Streaming data can come over the network through RTSP, from a local file system, or directly from a camera. The streams are captured using the CPU. Once the frames are in memory, they are sent for decoding using the NVDEC accelerator. The plugin for decoding is called Gst-nvvideo4linux2. After decoding, there is an optional pre-processing step where the input image can be processed before inference. The pre-processing can be image dewarping or color space conversion. The Gst-nvdewarper plugin can dewarp the image from a fisheye or 360-degree camera. The Gst-nvvideoconvert plugin can perform color format conversion on the frame. These plugins use the GPU or the VIC (Video Image Compositor).
The next step is to batch the frames for optimal inference performance. Batching is done using the Gst-nvstreammux plugin. Once frames are batched, they are sent for inference. Inference can be done using TensorRT, NVIDIA’s inference accelerator runtime, or in a native framework such as TensorFlow or PyTorch using Triton Inference Server. Native TensorRT inference is performed using the Gst-nvinfer plugin, and inference using Triton is done using the Gst-nvinferserver plugin. Inference can use the GPU or, on Jetson AGX Xavier and Xavier NX, the DLA (Deep Learning Accelerator). After inference, the next step could involve tracking the object. There are several built-in reference trackers in the SDK, ranging from high performance to high accuracy. Object tracking is performed using the Gst-nvtracker plugin.
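The batching step can be pictured with a small sketch. It is only a conceptual illustration in plain Python, assuming a simple round-robin policy over sources; it is not the Gst-nvstreammux algorithm, and the function name and the (source, frame number) tuples are hypothetical stand-ins for decoded frames.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

# Hypothetical stand-in for a decoded frame: (source_id, frame_number).
Frame = Tuple[int, int]


def form_batches(sources: Dict[int, List[int]], batch_size: int) -> List[List[Frame]]:
    """Round-robin frames from several sources into batches of at most batch_size.

    Assumption: a plain round-robin policy, chosen for illustration only.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    queues: Dict[int, Deque[int]] = {sid: deque(frames) for sid, frames in sources.items()}
    batches: List[List[Frame]] = []
    current: List[Frame] = []
    # Keep visiting sources in order until every queue is empty.
    while any(queues.values()):
        for sid in sorted(queues):
            if queues[sid]:
                current.append((sid, queues[sid].popleft()))
                if len(current) == batch_size:
                    batches.append(current)
                    current = []
    if current:
        batches.append(current)  # final, partially filled batch
    return batches


batches = form_batches({0: [0, 1, 2], 1: [0, 1], 2: [0]}, batch_size=3)
for batch in batches:
    print(batch)
```

Each inner list is what a single inference call would receive in this simplified picture: frames from different sources processed together rather than one at a time.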
For creating visualization artifacts such as bounding boxes, segmentation masks and labels, there is a visualization plugin called Gst-nvdsosd. Finally, to output the results, DeepStream presents various options: render the output with the bounding boxes on the screen, save the output to the local disk, stream out over RTSP, or just send the metadata to the cloud. For sending metadata to the cloud, DeepStream uses the Gst-nvmsgconv and Gst-nvmsgbroker plugins. Gst-nvmsgconv converts the metadata into a schema payload, and Gst-nvmsgbroker establishes the connection to the cloud and sends the telemetry data. There are several built-in broker protocols, such as Kafka, MQTT, AMQP and Azure IoT. Custom broker adapters can also be created.
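To make the metadata-to-payload step concrete, here is a minimal sketch of turning per-frame detections into a JSON message. It assumes a made-up message layout: the Detection class, its fields and the JSON keys are hypothetical and do not reproduce the DeepStream metadata structures or the schema that Gst-nvmsgconv emits.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Detection:
    # Hypothetical fields chosen for illustration only.
    label: str
    confidence: float
    left: float
    top: float
    width: float
    height: float
    tracking_id: int


def to_payload(sensor_id: str, frame_number: int, detections: List[Detection]) -> str:
    """Serialize one frame's detections into a JSON string (assumed layout)."""
    message = {
        "sensorId": sensor_id,
        "frame": frame_number,
        "objects": [asdict(d) for d in detections],
    }
    # sort_keys gives a stable key order, which makes payloads easy to compare.
    return json.dumps(message, sort_keys=True)


payload = to_payload("camera-0", 42, [Detection("car", 0.91, 100.0, 50.0, 80.0, 40.0, 7)])
print(payload)
```

In a real pipeline the serialized message would then be handed to a broker client (Kafka, MQTT, AMQP or Azure IoT); that part is left out here.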

DeepStream reference app

To get started, developers can use the provided reference applications. The source code for these applications is also included. The end-to-end application is called ‘deepstream-app’. This app is fully configurable: it allows users to configure any type and number of sources. Users can also select the type of networks used for inference. It comes pre-built with an inference plugin for object detection, followed by cascaded inference plugins for image classification. There is an option to configure a tracker. For the output, users can select between rendering on screen, saving the output to a file, or streaming the video out over RTSP.
This is a good reference application to start learning the capabilities of DeepStream. This application is covered in greater detail in the DeepStream Reference Application - deepstream-app section. The source code for this application is available in /opt/nvidia/deepstream/deepstream-5.0/sources/apps/sample_apps/deepstream-app. This application works with all AI models; detailed instructions are provided in the individual READMEs. The performance benchmark is also run using this application.
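As a rough illustration of how configurable deepstream-app is, the sketch below generates a skeleton configuration file with one group per pipeline stage. It is an assumption-laden fragment, not a complete or validated configuration: the group and key names and the numeric type values are recalled from the reference application documentation and should be checked against the sample configuration files shipped with the SDK, and the URI and detector config path are hypothetical placeholders.

```python
import configparser
import io


def build_config(uri: str, num_sources: int, detector_config: str) -> str:
    """Return a skeleton deepstream-app style config as text (incomplete sketch)."""
    config = configparser.ConfigParser()
    # Assumption: type=3 selects a multi-URI source; verify against the SDK docs.
    config["source0"] = {"enable": "1", "type": "3", "uri": uri, "num-sources": str(num_sources)}
    # Batch size is matched to the number of sources in this sketch.
    config["streammux"] = {"batch-size": str(num_sources)}
    # Primary detector; detector_config is a hypothetical file name.
    config["primary-gie"] = {"enable": "1", "config-file": detector_config}
    # A real tracker group needs more keys than shown here.
    config["tracker"] = {"enable": "1"}
    # Assumption: type=2 selects on-screen rendering; verify against the SDK docs.
    config["sink0"] = {"enable": "1", "type": "2"}
    buffer = io.StringIO()
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


config_text = build_config("file:///path/to/video.mp4", 4, "detector_config.txt")
print(config_text)
```

The sample configuration files that ship with the SDK remain the authoritative starting point; this fragment only shows the overall shape of sources, batching, inference, tracking and sink groups.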

Getting started with building apps

For developers looking to build a custom application, ‘deepstream-app’ can be a bit overwhelming as a starting point. The SDK ships with several simple applications in which developers can learn the basic concepts of DeepStream, construct a simple pipeline, and then progress to building more complex applications.
Developers can start with deepstream-test1, which is almost like a DeepStream hello world. In this app, developers learn how to build a GStreamer pipeline using various DeepStream plugins. The app takes video from a file, decodes it, batches the frames, performs object detection, and finally renders the boxes on the screen. deepstream-test2 builds on test1 and cascades a secondary network to the primary network. deepstream-test3 shows how to add multiple video sources, and finally test4 shows how to connect to IoT services using the message broker plugin. These four starter applications are available in both native C/C++ and Python. To read more about these apps and other sample apps in DeepStream, see the C/C++ sample apps and Python sample apps.
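The file, decode, batch, detect and render flow of deepstream-test1 can be written down as a gst-launch style pipeline description. The sketch below only assembles that string and does not require GStreamer. It makes several assumptions: the input is an H.264 elementary stream (hence h264parse), rendering uses nveglglessink as on a discrete GPU (Jetson setups may need an extra element before the sink), and the file names are hypothetical. The sample app itself builds the pipeline element by element in code rather than from a string.

```python
from typing import List


def build_test1_pipeline(video_path: str, infer_config: str,
                         width: int = 1920, height: int = 1080) -> str:
    """Return a gst-launch style description: file -> decode -> batch -> detect -> draw -> render.

    Assumptions: paths contain no spaces; the input is an H.264 elementary stream.
    """
    stages: List[str] = [
        f"filesrc location={video_path}",   # read the video file
        "h264parse",                        # parse the H.264 stream
        "nvv4l2decoder",                    # hardware decode
        # Link the decoder to the muxer's first sink pad, then start the muxer chain.
        f"m.sink_0 nvstreammux name=m batch-size=1 width={width} height={height}",
        f"nvinfer config-file-path={infer_config}",  # primary detector
        "nvvideoconvert",                   # convert to a format the OSD accepts
        "nvdsosd",                          # draw bounding boxes and labels
        "nveglglessink",                    # render on screen
    ]
    return " ! ".join(stages)


pipeline = build_test1_pipeline("sample_720p.h264", "detector_config.txt")
print(pipeline)
```

On a system with DeepStream installed, a string like this could be passed to gst-launch-1.0, or to Gst.parse_launch from Gst-Python, to try the pipeline before writing the element-by-element version.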

DeepStream in Python

Python is easy to use and widely adopted by data scientists and deep learning experts when creating AI models. NVIDIA introduced Python bindings to help you build high-performance AI applications using Python. DeepStream pipelines can be constructed using Gst-Python, the GStreamer framework’s Python bindings.
The DeepStream Python application uses the Gst-Python API to construct the pipeline and uses probe functions to access data at various points in the pipeline. The data types are all in native C and require a shim layer through PyBindings or NumPy to access them from the Python app. Tensor data is the raw tensor output that comes out after inference. If you are trying to detect an object, this tensor data needs to be post-processed by a parsing and clustering algorithm to create bounding boxes around the detected object. To get started with Python, see the Sample Python apps in the development guide and “DeepStream Python” in the DeepStream Python API Guide.
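The clustering part of that post-processing can be sketched with non-maximum suppression (NMS), one common way to merge overlapping candidate boxes. This is an illustrative choice: the text above does not say which clustering algorithm a given model uses, and the box layout, thresholds and function names here are assumptions. The parsing step that turns raw tensor output into candidate boxes is model specific and is not shown.

```python
from typing import List, Tuple

# Assumed box layout: (left, top, right, bottom, score).
Box = Tuple[float, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], score_threshold: float = 0.5, iou_threshold: float = 0.5) -> List[Box]:
    """Drop low-score boxes, then keep the best box of each overlapping group."""
    candidates = sorted((b for b in boxes if b[4] >= score_threshold),
                        key=lambda b: b[4], reverse=True)
    kept: List[Box] = []
    for box in candidates:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(box, k) <= iou_threshold for k in kept):
            kept.append(box)
    return kept


raw = [
    (10.0, 10.0, 110.0, 110.0, 0.9),    # strong detection
    (12.0, 12.0, 112.0, 112.0, 0.8),    # near-duplicate of the first box
    (200.0, 200.0, 260.0, 260.0, 0.7),  # separate object
    (0.0, 0.0, 5.0, 5.0, 0.2),          # below the score threshold
]
final_boxes = nms(raw)
print(final_boxes)
```

Here the near-duplicate and the low-score candidate are discarded, leaving one box per object.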