Gst-nvdsucx (Alpha)
===================

Gst-NvDsUcx is a GStreamer plugin that provides a set of elements for sending and receiving pipeline data over RDMA. This allows a GStreamer pipeline to be distributed across multiple hosts so that it can use distributed GPU resources. The plugin is built on top of the Unified Communication X (UCX) library, which it uses to send and receive GStreamer buffers over an RDMA-enabled network. UCX is an open-source library that accelerates data transfer over high-performance networks and can use GPUDirect RDMA technology for minimal network latency and maximum throughput of distributed GPU traffic. For more details on UCX, see https://openucx.org.

Description
-----------
Gst-NvDsUcx provides separate sink elements (which receive data from the upstream pipeline) and source elements (which feed data into the downstream pipeline); a sink and a source connect to each other over the RDMA network. In addition, each sink or source element can act as either a server or a client, and the server element must be started before the client. As a result, the Gst-NvDsUcx plugin provides four elements: **nvdsucxserversink, nvdsucxclientsink, nvdsucxserversrc, nvdsucxclientsrc**.

Because the Gst-NvDsUcx plugin must present itself as a sink to one part of the DeepStream pipeline and as a source to the other, pair the elements according to which part of the pipeline must be started first:

* ``nvdsucxserversink`` <-> ``nvdsucxclientsrc`` (Sink side starts first)

* ``nvdsucxclientsink`` <-> ``nvdsucxserversrc`` (Source side starts first)
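
A launch script that has to decide which host to start first can encode this pairing rule directly. The following is a minimal POSIX shell sketch; ``ucx_peer`` and ``ucx_start_first`` are hypothetical helper names, not part of DeepStream. ``ucx_start_first`` relies only on the rule stated above that the server element of a pair is always started first.

.. code-block:: shell

   #!/bin/sh
   # Hypothetical helpers, not part of DeepStream.

   # Print the element that pairs with the given Gst-NvDsUcx element:
   # nvdsucxserversink <-> nvdsucxclientsrc, nvdsucxclientsink <-> nvdsucxserversrc.
   ucx_peer() {
       case "$1" in
           nvdsucxserversink) echo nvdsucxclientsrc ;;
           nvdsucxclientsrc)  echo nvdsucxserversink ;;
           nvdsucxclientsink) echo nvdsucxserversrc ;;
           nvdsucxserversrc)  echo nvdsucxclientsink ;;
           *) echo "unknown Gst-NvDsUcx element: $1" >&2; return 1 ;;
       esac
   }

   # Print the element of the pair that must be started first (always the server).
   ucx_start_first() {
       ucx_peer "$1" >/dev/null || return 1
       case "$1" in
           nvdsucxserver*) echo "$1" ;;
           *) ucx_peer "$1" ;;
       esac
   }

For example, ``ucx_start_first nvdsucxclientsrc`` prints ``nvdsucxserversink``, so the pipeline containing the serversink has to be launched before the one containing the clientsrc.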

|
|
.. image:: /content/DS_plugin_gst-nvdsucx.png
         :align: center
         :alt: Gst-nvdsucx


Requirements
------------

The Gst-NvDsUcx plugin has the following requirements (in addition to the DeepStream 6.1 SDK requirements):

* NVIDIA ConnectX-6 Dx NIC or later.

   * For more information on installing and configuring NICs, see: https://docs.nvidia.com/networking/display/ConnectX6VPI/Introduction

* Mellanox OpenFabrics Enterprise Distribution for Linux (MLNX_OFED), version 5.5 or later; see https://network.nvidia.com/products/infiniband-drivers/linux/mlnx_ofed/

   * For installation instructions, see https://docs.nvidia.com/networking/display/MLNXOFEDv551032/Installing+MLNX_OFED
   * If installing MLNX_OFED inside a container:

     * Install the kernel drivers on the host OS by passing the ``--all`` flag to the ``mlnxofedinstall`` script.
     * Inside the container, only the user-space libraries can be installed; pass the ``--user-space-only`` flag to the ``mlnxofedinstall`` script.
 
* UCX version 1.12 or later, built with CUDA support: either compile UCX with CUDA enabled or use the CUDA-enabled UCX packages available at https://github.com/openucx/ucx/releases

   * For installation instructions, follow the release build instructions at https://github.com/openucx/ucx#release-builds. The UCX library must be configured with CUDA support, for example::

     $ ./contrib/configure-release --prefix=/install/path --with-cuda=/path/to/cuda
     $ make -j8 install

* For additional metadata processing, Gst-NvDsUcx depends on the serialization library provided by the Gst-NvDsMetaUtils plugin. Refer to the Gst-NvDsMetaUtils documentation to configure and install the serialization library.
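
To confirm that an installed UCX meets the requirements above, ``ucx_info -v`` prints its version and ``ucx_info -d`` lists the available devices and transports. The sketch below wraps the transport check in a helper; the name ``ucx_has_cuda`` is hypothetical, and treating the presence of the ``cuda_copy`` transport in the ``ucx_info -d`` output as evidence of a CUDA-enabled build is an assumption about that output, not a documented DeepStream check.

.. code-block:: shell

   #!/bin/sh
   # Hypothetical helper, not part of UCX or DeepStream.
   # Succeeds if `ucx_info -d` output read from stdin mentions the cuda_copy
   # transport (assumption: a CUDA-enabled UCX build lists it there).
   ucx_has_cuda() {
       grep -q 'cuda_copy'
   }

   # Check the local installation only if ucx_info is on PATH.
   if command -v ucx_info >/dev/null 2>&1; then
       ucx_info -v
       if ucx_info -d | ucx_has_cuda; then
           echo "UCX lists CUDA transports"
       else
           echo "UCX lists no CUDA transports; rebuild it with --with-cuda" >&2
       fi
   fi

Reading from stdin keeps the helper independent of how the ``ucx_info`` output is obtained, for example when it is captured on a remote node and checked elsewhere.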

|

.. note::
  This plugin is only supported on x86_64 platforms.

Inputs and Outputs
------------------
* Inputs (for ``nvdsucxserversink`` or ``nvdsucxclientsink``)

  Any one of the following:

    * NV12/RGBA NVMM Gst Buffer + (NvDsBatchMeta + Serialized NvDsUserMeta/Gst Meta - optional)
    * NVMM or Raw Audio Buffers + (NvDsBatchMeta - optional)
    * Raw Text Gst Buffers

* Control parameters

  * addr
  * port
  * buf-type
  * gpu-id
  * raw-buf-size
  * nvbuf-memory-type
  * num-nvbuf
  * nvbuf-batch-size
  * num-conns

* Output (from ``nvdsucxserversrc`` or ``nvdsucxclientsrc``)

  Any one of the following:

    * NV12/RGBA NVMM Gst Buffer + (NvDsBatchMeta + Serialized Video NvDsUserMeta/Gst Meta - optional)
    * NVMM or Raw Audio Buffers + (NvDsBatchMeta + Serialized Audio NvDsUserMeta/Gst Meta - optional)
    * Raw Text Gst Buffers

Gst Properties
--------------
The Gst-nvdsucx plugin has the following properties; which of them apply depends on the type of element used:

  .. csv-table:: Gst-nvdsucx gst properties
   :file: ../text/tables/Gst-nvdsucx tables/DS_Plugin_gst-nvdsucx_gst-properties.csv
   :widths: 30, 30, 30, 30, 30
   :header-rows: 1

.. rubric:: Footnote

.. [#f1] These connections are established synchronously: the serversink element always waits until all clients have connected before starting the pipeline. Only the serversink element supports more than one connecting clientsrc; the serversrc element supports only one connection from a clientsink.

Examples
--------

The DeepStream 6.1 SDK includes three examples showing how to use the Gst-NvDsUcx plugin to split a GStreamer pipeline so that its parts run in separate processes or on separate servers. Each example consists of a server program and a client program that run different parts of the pipeline. Always start the server program before the client program.
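
When both parts of an example run on the same host as separate processes (for instance, while testing), a small wrapper can enforce this start order. The following POSIX shell sketch starts the server command in the background, waits a fixed delay, and then starts one or more client commands. The function name and the fixed delay are assumptions: the script does not check whether the server is actually ready to accept connections. Passing several client commands only makes sense with a serversink, the only element that accepts more than one connection.

.. code-block:: shell

   #!/bin/sh
   # Hypothetical helper, not part of DeepStream.
   # Usage: run_server_then_clients DELAY SERVER_CMD CLIENT_CMD [CLIENT_CMD...]
   #   DELAY      seconds to wait after starting the server; a heuristic only,
   #              the server's readiness is not actually checked
   #   SERVER_CMD command line of the server pipeline (serversink or serversrc)
   #   CLIENT_CMD command line(s) of the client pipeline(s)
   # Returns 0 only if the server and every client exit successfully.
   run_server_then_clients() {
       delay=$1
       server_cmd=$2
       shift 2

       # Start the server first so that it is listening before any client connects.
       sh -c "$server_cmd" &
       server_pid=$!
       sleep "$delay"

       # Start all clients concurrently; a serversink waits for all of them.
       client_pids=""
       for client_cmd in "$@"; do
           sh -c "$client_cmd" &
           client_pids="$client_pids $!"
       done

       # Collect exit statuses; any failure makes the whole run fail.
       rc=0
       for pid in $client_pids; do
           wait "$pid" || rc=1
       done
       wait "$server_pid" || rc=1
       return "$rc"
   }

For instance, to run both halves of Example 1 on one host, pass the Node 1 command as ``SERVER_CMD`` and the Node 2 command as ``CLIENT_CMD``, with ``addr`` set to an address of that host on the RDMA network.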

Example 1:

This example shows how to send and receive video data in a GStreamer pipeline using the serversink and clientsrc elements of the Gst-NvDsUcx plugin. On Node 1, uridecodebin decodes the input file and nvvideoconvert converts the frames to the format set by the caps filter (NV12, 1920x1080, in NVMM memory) before passing them to the serversink element. The serversink forwards the video data over RDMA to the clientsrc element on Node 2 (another node or process), which passes it to nvvideoconvert. Finally, the frames are encoded to H.264, muxed into an MP4 container, and written to a file.

On DS Node 1::

   gst-launch-1.0 uridecodebin uri="file:///sample_1080p.mp4" async-handling=1 name=src1 src1. ! \
   queue ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=NV12,width=1920,height=1080' ! \
   nvdsucxserversink addr=192.168.100.1 port=4000 buf-type=nvdsucx-buf-video

On DS Node 2::

   gst-launch-1.0 nvdsucxclientsrc addr=192.168.100.1 port=4000 nvbuf-memory-type=2 num-nvbuf=4 buf-type=nvdsucx-buf-video ! \
   'video/x-raw(memory:NVMM),format=NV12,width=1920,height=1080,framerate=30/1' ! \
   queue ! nvvideoconvert ! nvv4l2h264enc ! h264parse ! qtmux name=mux_0 ! \
   filesink sync=1 async=0 qos=0 location=~/out_1080p.mp4

Example 2:

This example shows how to distribute a DeepStream pipeline with the Gst-NvDsUcx plugin and use the serialization and deserialization components to send metadata over the RDMA network. On Node 1, the input file is read by filesrc and decoded, and nvstreammux batches the decoded frames and passes them to nvinfer, which detects objects in the frames and attaches the corresponding metadata. The serialization element (nvdsmetainsert, part of the Gst-NvDsMetaUtils library) then creates a binary object from this metadata and attaches it to the frame. This example uses the clientsink and serversrc elements to show the other element pairing, in which the source side starts first, so start the Node 2 pipeline before the Node 1 pipeline. The clientsink sends the serialized metadata together with each video frame over RDMA to the serversrc.

On Node 2, the serversrc passes the data to the deserialization element (nvdsmetaextract), which extracts the binary object and reattaches the metadata to the frame. The nvdsosd element then draws the metadata (bounding boxes) on the frames, which are encoded and written to a file.

On DS Node 1::

   gst-launch-1.0 filesrc location=~/sample_1080p.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! m.sink_0 nvstreammux name=m batch-size=1 width=1920 height=1080 ! \
   nvvideoconvert ! nvinfer config-file-path=/opt/nvidia/deepstream/deepstream-6.1/samples/configs/deepstream-app/config_infer_primary.txt ! \
   nvdsmetainsert serialize-lib="/opt/nvidia/deepstream/deepstream-6.1/lib/libnvds_video_metadata_serialization.so" ! \
   nvdsucxclientsink addr=192.168.100.1 port=4000 buf-type=nvdsucx-buf-video

On DS Node 2::

   gst-launch-1.0 nvdsucxserversrc addr=192.168.100.1 port=4000 nvbuf-memory-type=2 num-nvbuf=8 buf-type=nvdsucx-buf-video nvbuf-batch-size=1 ! \
   'video/x-raw(memory:NVMM),format=NV12,width=1920,height=1080,framerate=30/1' ! nvvideoconvert ! \
   nvdsmetaextract deserialize-lib="/opt/nvidia/deepstream/deepstream-6.1/lib/libnvds_video_metadata_serialization.so" ! \
   nvdsosd ! nvvideoconvert ! nvv4l2h264enc ! h264parse ! qtmux ! filesink location=~/out_1080p.mp4

Example 3:

This example demonstrates how audio data in a DeepStream pipeline can be distributed across processes or nodes using Gst-NvDsUcx together with the audio metadata serialization library (part of Gst-NvDsMetaUtils). On Node 1, nvstreammux batches the audio buffers produced by the upstream audio elements and forwards them to the Gst-NvDsUcx plugin. As with the video metadata serialization in Example 2, the audio metadata serialization library creates a binary object, which the serversink element forwards to the clientsrc element along with the audio data. On Node 2, nvdsmetaextract extracts the audio metadata and attaches it to the buffer for downstream elements to interpret.

The nvstreammux and nvstreamdemux plugins support audio only in their new implementation, which is selected by setting the ``USE_NEW_NVSTREAMMUX=yes`` environment variable; both commands below set it.

On DS Node 1::

   USE_NEW_NVSTREAMMUX=yes gst-launch-1.0 uridecodebin uri="file:///sample_1080p_h264.mp4" ! audioconvert ! \
   audioresample ! 'audio/x-raw,format=F32LE,rate=48000,channels=1,layout=interleaved' ! audiobuffersplit ! \
   a_streammux.sink_0 nvstreammux name=a_streammux batch-size=1 sync-inputs=1 max-latency=250000000 ! \
   nvdsmetainsert serialize-lib="libnvds_audio_metadata_serialization.so" ! \
   nvdsucxserversink addr=192.168.100.2 port=4000 sync=1 async=0 buf-type=nvdsucx-buf-nv-audio

On DS Node 2::

   USE_NEW_NVSTREAMMUX=yes gst-launch-1.0 nvdsucxclientsrc addr=192.168.100.2 port=4000 nvbuf-memory-type=2 num-nvbuf=4 buf-type=nvdsucx-buf-nv-audio ! \
   'audio/x-raw(memory:NVMM),format=F32LE,rate=48000,channels=1,layout=interleaved' ! \
   nvdsmetaextract deserialize-lib="libnvds_audio_metadata_serialization.so" ! nvstreamdemux name=asd asd.src_0 ! \
   audioconvert ! "audio/x-raw,format=S16LE" ! wavenc ! filesink sync=0 async=1 qos=0 location=out.wav