DeepStream 3D Action Recognition App
========================================

The ``deepstream-3d-action-recognition`` sample application is provided at `sources/apps/sample_apps/deepstream-3d-action-recognition` for your reference. This example demonstrates a sequence-batching based 3D or 2D model inference pipeline for action recognition. The image below shows the architecture of this reference app.

.. image:: /content/DS_Action_Recognition_Pipeline.png
   :align: center
   :alt: DeepStream 3D Action Recognition Application Architecture

The ``Gst-nvdspreprocess`` plugin preprocesses the input tensors for the ``Gst-nvinfer`` plugin. ``Gst-nvdspreprocess`` loads a custom sequence preprocess lib (in the `custom_sequence_preprocess` subfolder) to perform temporal sequence batching and ROI spatial batching, and delivers the preprocessed batched tensor buffers to the downstream ``Gst-nvinfer`` plugin for inference. The application probes the tensor data and action classification results, and converts them into display metadata printed on screen.

The 3D/2D models are pretrained with the NVIDIA TAO Toolkit. The 3D model has an NCDHW (NCSHW) input shape and the 2D model has an NSHW input shape. ::

    N: max batch size, the total number of ROIs across all streams, value > 0
    C: number of channels, must be 3
    D/S: sequence length of consecutive frames, value > 1
    H: height, value > 0
    W: width, value > 0
    2D S: channels x sequence_length, reshaped from [C, D]

A custom sequence preprocessing lib, `libnvds_custom_sequence_preprocess.so`, is also provided at `sources/apps/sample_apps/deepstream-3d-action-recognition/custom_sequence_preprocess` to demonstrate how to implement sequence batching and preprocessing methods with the ``Gst-nvdspreprocess`` plugin. This custom lib normalizes each incoming ROI-cropped image and accumulates the data into a buffer sequence for temporal batching. Once temporal batching is ready, it continues with spatial batching across multiple ROIs and multiple streams. Finally, it returns the temporally and spatially batched buffer (tensor) to the ``Gst-nvdspreprocess`` plugin, which attaches the buffer as preprocess input metadata and delivers it to the downstream ``Gst-nvinfer`` plugin for inference.
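As a conceptual illustration of that temporal-then-spatial batching, consider the minimal C++ sketch below. It is illustrative only: the names (``SequenceBatcher``, ``pushFrame``, ``buildBatch``) are hypothetical and are not part of the DeepStream or ``Gst-nvdspreprocess`` API; the sample's actual custom lib implements this logic on GPU device buffers through the plugin's custom library interface. ::

    // Illustrative sketch of temporal + spatial sequence batching.
    // All names here are hypothetical; the sample's custom lib implements
    // this logic on device buffers via the Gst-nvdspreprocess custom lib API.
    #include <cstdint>
    #include <deque>
    #include <map>
    #include <vector>

    struct Frame { std::vector<float> chw; };  // one normalized ROI crop, C*H*W floats

    class SequenceBatcher {
     public:
      explicit SequenceBatcher(size_t seqLen) : seqLen_(seqLen) {}

      // Temporal batching: accumulate frames per ROI (keyed by stream/ROI id).
      void pushFrame(uint64_t roiId, Frame f) {
        auto &q = queues_[roiId];
        q.push_back(std::move(f));
        if (q.size() > seqLen_) q.pop_front();  // keep a sliding window of seqLen_ frames
      }

      // Ready once every tracked ROI holds a full sequence.
      bool ready() const {
        if (queues_.empty()) return false;
        for (const auto &kv : queues_)
          if (kv.second.size() < seqLen_) return false;
        return true;
      }

      // Spatial batching: concatenate per-ROI sequences along the batch (N)
      // dimension. Appending whole CHW frames yields NSCHW order; producing
      // NCSHW (the 3D model input) additionally interleaves the channel planes.
      std::vector<float> buildBatch() const {
        std::vector<float> batch;
        for (const auto &kv : queues_)          // N: one entry per ROI
          for (const auto &frame : kv.second)   // S: seqLen_ frames
            batch.insert(batch.end(), frame.chw.begin(), frame.chw.end());
        return batch;
      }

     private:
      size_t seqLen_;
      std::map<uint64_t, std::deque<Frame>> queues_;
    };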
Getting Started
#####################

Prerequisites
----------------

* Go to the folder `sources/apps/sample_apps/deepstream-3d-action-recognition`.
* Search for and download the 3D and 2D RGB-based ``tao_iva_action_recognition_pretrained`` models from NGC, https://ngc.nvidia.com/catalog/models/nvidia:tao:actionrecognitionnet (Version 5):

  * `resnet18_3d_rgb_hmdb5_32`
  * `resnet18_2d_rgb_hmdb5_32`

* These models support the following classes: ``push``, ``fall_floor``, ``walk``, ``run``, ``ride_bike``.
* Update the source stream `uri-list` in the action recognition config file `deepstream_action_recognition_config.txt`: ::

    uri-list=file:///path/to/sample_action1.mov;file:///path/to/sample_action2.mov;file:///path/to/sample_action3.mov;file:///path/to/sample_action4.mov;

* Export the DISPLAY environment variable to the correct display, e.g. `export DISPLAY=:0.0`.

Run 3D Action Recognition Examples
--------------------------------------

* Make sure the 3D preprocess config and 3D inference config are enabled in ``deepstream_action_recognition_config.txt``. ::

    # Enable 3D preprocess and inference
    preprocess-config=config_preprocess_3d_custom.txt
    infer-config=config_infer_primary_3d_action.txt

* Run the following command: ::

    $ deepstream-3d-action-recognition -c deepstream_action_recognition_config.txt

Run 2D Action Recognition Examples
--------------------------------------

* Make sure the 2D preprocess config and 2D inference config are enabled in ``deepstream_action_recognition_config.txt``. ::

    # Enable 2D preprocess and inference
    preprocess-config=config_preprocess_2d_custom.txt
    infer-config=config_infer_primary_2d_action.txt

* Run the following command: ::

    $ deepstream-3d-action-recognition -c deepstream_action_recognition_config.txt

DeepStream 3D Action Recognition App Configuration Specifications
####################################################################

deepstream-3d-action-recognition ``[action-recognition]`` group settings
----------------------------------------------------------------------------

The table below shows the group settings for `deepstream_action_recognition_config.txt` as an example.

.. csv-table:: 3D action recognition supported settings
   :file: ../text/tables/DS_3D_Action tables/DS_3D_Action_config_settings.csv
   :widths: 20, 20, 20, 40
   :header-rows: 1

Custom sequence preprocess lib user settings ``[user-configs]`` for ``Gst-nvdspreprocess``
-------------------------------------------------------------------------------------------

The table below shows the `config_preprocess_3d_custom.txt` settings of `libnvds_custom_sequence_preprocess.so` as an example.

.. csv-table:: user-configs properties for custom sequence preprocess
   :file: ../text/tables/Gst-nvdspreprocess tables/DS_Plugin_gst-nvdspreprocess_custom_sequence.csv
   :widths: 20, 20, 20, 40
   :header-rows: 1

Custom lib and ``Gst-nvdspreprocess`` Settings for Action Recognition
---------------------------------------------------------------------------

* You need to set the input order to CUSTOM, `network-input-order=2`, for this custom sequence preprocess lib.
* 3D models (NCDHW/NCSHW) require a 5-dimensional `network-input-shape`. For example: ::

    network-input-shape=4;3;32;224;224

  This means max_batch_size: 4, channels: 3, sequence_len: 32, height: 224, width: 224.

* 2D models (NSHW) require a 4-dimensional `network-input-shape`. For example: ::

    network-input-shape=4;96;224;224

  This means max_batch_size: 4, channels: 3, sequence_len: 32, height: 224, width: 224, where 96 = channels x sequence_len.

* Assume the incoming frame numbers are 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15... When `subsample=1`, the preprocessing custom lib picks up frame numbers 1, 3, 5, 7, 9... to preprocess sequentially and passes them on to inference as the next step.
* Assuming the same incoming frame numbers as above, when `subsample=0, stride=1`, the 2 consecutive sliding sequences are (see the sketch after this list): ::

    Batch A: [1,2,3,4,5...]
    Batch B: [2,3,4,5,6...]

  When `subsample=0, stride=2`, the 2 consecutive sliding sequences are: ::

    Batch A: [1,2,3,4,5...]
    Batch B: [3,4,5,6,7...]

  When `subsample=1, stride=2`, subsampling is performed first and the sliding sequences operate on top of the subsampled results. The frame numbers processed after subsampling are 1, 3, 5, 7, 9, 11, 13, 15, 17, 19... The consecutive sliding sequences on top of them are: ::

    Batch A: [1,3,5,7,9...]
    Batch B: [5,7,9,11,13...]   # 1st frame slides from frame 1 of Batch A to frame 5
    Batch C: [9,11,13,15,17...] # 1st frame slides from frame 5 of Batch B to frame 9
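To make the subsample/stride arithmetic above concrete, here is a minimal C++ sketch. It is illustrative only: ``sequenceBatch`` is a hypothetical helper, not a DeepStream API; it merely computes which incoming frame numbers land in each sequence batch. ::

    // Illustrative sketch: which incoming frame numbers (starting at 1) end
    // up in each sequence batch for given subsample/stride values.
    // Hypothetical helper, not part of the DeepStream API.
    #include <cstdio>
    #include <vector>

    std::vector<int> sequenceBatch(int batchIdx, int seqLen, int subsample, int stride) {
      int step = subsample + 1;                  // subsample=1 keeps every 2nd frame
      int start = 1 + batchIdx * stride * step;  // stride moves over subsampled frames
      std::vector<int> frames;
      for (int i = 0; i < seqLen; ++i) frames.push_back(start + i * step);
      return frames;
    }

    int main() {
      // subsample=1, stride=2, seqLen=5 -> A:[1,3,5,7,9] B:[5,7,9,11,13] C:[9,11,13,15,17]
      for (int b = 0; b < 3; ++b) {
        std::printf("Batch %c:", 'A' + b);
        for (int f : sequenceBatch(b, 5, /*subsample=*/1, /*stride=*/2))
          std::printf(" %d", f);
        std::printf("\n");
      }
      return 0;
    }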
The image below shows the frame batches with different subsample and stride settings.

.. image:: /content/DS_Action_Recognition_Sequence_Batching.png
   :align: center
   :alt: 3D Action Sequence batching

Build Custom Sequence Preprocess Lib and Application From Source
########################################################################

* Go to the folder `sources/apps/sample_apps/deepstream-3d-action-recognition`.
* Run the following commands: ::

    $ make
    $ make install

* Check the source code and comments to learn how to implement other order formats, e.g. NSCHW (NDCHW); the sketch below illustrates the index arithmetic involved.
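For orientation when reading that code, the following is a minimal, illustrative sketch of the index arithmetic for placing one CHW frame into an NCSHW-ordered tensor. ``placeFrameNCSHW`` is a hypothetical function; the sample's actual custom lib performs the equivalent copies on CUDA device buffers. ::

    // Illustrative index arithmetic only (hypothetical helper); the sample's
    // custom lib does the equivalent copies on CUDA device buffers.
    #include <cstddef>

    // Copy one normalized CHW frame into sequence slot `s` of ROI `n`
    // inside an NCSHW-ordered destination tensor of shape [N, C, S, H, W].
    void placeFrameNCSHW(float *dst, const float *chwFrame,
                         std::size_t n, std::size_t s,
                         std::size_t C, std::size_t S,
                         std::size_t H, std::size_t W) {
      const std::size_t plane = H * W;
      for (std::size_t c = 0; c < C; ++c) {
        const float *src = chwFrame + c * plane;           // source channel plane
        float *out = dst + ((n * C + c) * S + s) * plane;  // dst[n][c][s][...]
        for (std::size_t i = 0; i < plane; ++i) out[i] = src[i];
      }
    }

    // For an NSCHW-ordered destination, the offset would instead be
    // ((n * S + s) * C + c) * plane, i.e. whole CHW frames laid out per slot.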