DeepStream 3D Action Recognition App

The deepstream-3d-action-recognition sample application is provided at sources/apps/sample_apps/deepstream-3d-action-recognition for your reference. It demonstrates a sequence-batching-based inference pipeline for action recognition with either a 3D or a 2D model. The image below shows the architecture of this reference app.

DeepStream 3D Action Recognition Application Architecture

The Gst-nvdspreprocess plugin preprocesses the input tensors for the Gst-nvinfer plugin. Gst-nvdspreprocess loads a custom_sequence_preprocess lib (built from the custom_sequence_preprocess subfolder) to perform temporal sequence batching and ROI spatial batching, then delivers the preprocessed, batched tensor buffers to the downstream Gst-nvinfer plugin for inference. The application probes the tensor data and the action classification results and converts them into display metadata to print on screen. Both the 3D and the 2D models are pretrained with the NVIDIA TAO Toolkit. The 3D model takes NCDHW (NCSHW) input and the 2D model takes NSHW input, where:

  • N: maximum batch size, counted as the total number of ROIs in all streams; value > 0.
  • C: number of channels; must be 3.
  • D/S: sequence length of consecutive frames; value > 1.
  • H: height; value > 0.
  • W: width; value > 0.
  • S in the 2D NSHW shape: channels x sequence_length, reshaped from [C, D].
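The relationship between the 3D and 2D input shapes can be sketched in Python. This is a minimal illustration that assumes dense, row-major tensors; the helpers parse_shape, shape_3d_to_2d, offset_ncdhw and offset_nshw are hypothetical and not part of DeepStream. It shows that indexing the 2D S dimension with s = c x D + d reaches the same element as (c, d) in the NCDHW layout, which is what reshaping [C, D] into S means for a contiguous buffer.

```python
def parse_shape(text):
    """Parse a semicolon-delimited shape string such as '4;3;32;224;224'."""
    return [int(v) for v in text.split(";") if v.strip()]


def shape_3d_to_2d(shape_3d):
    """Map an NCDHW shape [N, C, D, H, W] to NSHW [N, S, H, W] with S = C x D."""
    n, c, d, h, w = shape_3d
    # Constraints taken from the dimension list: N > 0, C == 3, D > 1, H, W > 0.
    if n <= 0 or c != 3 or d <= 1 or h <= 0 or w <= 0:
        raise ValueError(f"invalid NCDHW shape: {shape_3d}")
    return [n, c * d, h, w]


def offset_ncdhw(n, c, d, h, w, shape):
    """Flat element offset of (n, c, d, h, w) in a row-major NCDHW buffer."""
    _, C, D, H, W = shape
    return (((n * C + c) * D + d) * H + h) * W + w


def offset_nshw(n, s, h, w, shape):
    """Flat element offset of (n, s, h, w) in a row-major NSHW buffer."""
    _, S, H, W = shape
    return ((n * S + s) * H + h) * W + w


if __name__ == "__main__":
    shape_3d = parse_shape("4;3;32;224;224")
    shape_2d = shape_3d_to_2d(shape_3d)
    print(shape_2d)  # [4, 96, 224, 224]
    # s = c * D + d addresses the same element in both views of one flat buffer.
    n, c, d, h, w = 1, 2, 5, 10, 20
    s = c * shape_3d[2] + d
    assert offset_ncdhw(n, c, d, h, w, shape_3d) == offset_nshw(n, s, h, w, shape_2d)
```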

A custom sequence preprocessing lib, libnvds_custom_sequence_preprocess.so, is also provided at sources/apps/sample_apps/deepstream-3d-action-recognition/custom_sequence_preprocess to demonstrate how to implement sequence batching and preprocessing with the Gst-nvdspreprocess plugin. This custom lib normalizes each incoming cropped ROI image and accumulates the data into a buffer sequence for temporal batching. When a temporal batch is ready, it continues with spatial batching across multiple ROIs and multiple streams. Finally, it returns the temporally and spatially batched buffer (tensor) to the Gst-nvdspreprocess plugin, which attaches the buffer as preprocess input metadata and delivers it to the downstream Gst-nvinfer plugin for inference.
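The temporal-then-spatial batching flow described above can be sketched as follows. This is a simplified Python model, not the custom lib's C++ code: it assumes stride=1 and subsample=0, treats each cropped ROI as a flat list of pixel values, uses a single scale and mean for all channels, and the SequenceBatcher class and its methods are hypothetical.

```python
from collections import deque


class SequenceBatcher:
    """Hypothetical model: temporal batching per ROI, then spatial batching."""

    def __init__(self, seq_len, max_batch, scale, mean):
        self.seq_len = seq_len        # D/S: frames per sequence
        self.max_batch = max_batch    # N: max number of ROI sequences per batch
        self.scale = scale            # single scale factor (assumption)
        self.mean = mean              # single mean offset (assumption)
        self.buffers = {}             # (stream_id, roi_id) -> recent frames

    def push(self, stream_id, roi_id, pixels):
        """Normalize one cropped ROI image and append it to that ROI's sequence."""
        normalized = [self.scale * (p - self.mean) for p in pixels]
        key = (stream_id, roi_id)
        buf = self.buffers.setdefault(key, deque(maxlen=self.seq_len))
        buf.append(normalized)        # once full, the oldest frame is dropped (stride=1)

    def spatial_batch(self):
        """Stack every ROI whose temporal sequence is full, up to max_batch of them."""
        ready = [list(b) for b in self.buffers.values() if len(b) == self.seq_len]
        return ready[: self.max_batch]


if __name__ == "__main__":
    batcher = SequenceBatcher(seq_len=3, max_batch=4, scale=1 / 127.5, mean=127.5)
    for _ in range(3):
        batcher.push(stream_id=0, roi_id=0, pixels=[0, 255])
        batcher.push(stream_id=1, roi_id=0, pixels=[127.5, 127.5])
    batch = batcher.spatial_batch()
    # Nested as [ROIs][frames][pixels]: 2 x 3 x 2 in this run.
    print(len(batch), len(batch[0]), len(batch[0][0]))
```

Keying the buffers by (stream_id, roi_id) keeps the sequences of different ROIs and streams independent, so the spatial step only has to gather the ones that are complete.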

Getting Started

Prerequisites

  • Go to the folder sources/apps/sample_apps/deepstream-3d-action-recognition

  • Search for and download the 3D and 2D RGB-based tao_iva_action_recognition_pretrained models from NGC https://ngc.nvidia.com/catalog/models/nvidia:tao:actionrecognitionnet (version 5):

    • resnet18_3d_rgb_hmdb5_32

    • resnet18_2d_rgb_hmdb5_32

  • These models support the following classes: push; fall_floor; walk; run; ride_bike

  • Update the source stream uri-list in the action recognition config file deepstream_action_recognition_config.txt:

    uri-list=file:///path/to/sample_action1.mov;file:///path/to/sample_action2.mov;file:///path/to/sample_action3.mov;file:///path/to/sample_action4.mov;
    
  • Set the DISPLAY environment variable to the correct display, e.g. export DISPLAY=:0.0

Run 3D Action Recognition Examples

  • Make sure the 3D preprocess config and the 3D inference config are enabled in deepstream_action_recognition_config.txt:

    # Enable 3D preprocess and inference
    preprocess-config=config_preprocess_3d_custom.txt
    infer-config=config_infer_primary_3d_action.txt
    
  • Run the following command:

    $ deepstream-3d-action-recognition -c deepstream_action_recognition_config.txt
    

Run 2D Action Recognition Examples

  • Make sure the 2D preprocess config and the 2D inference config are enabled in deepstream_action_recognition_config.txt:

    # Enable 2D preprocess and inference
    preprocess-config=config_preprocess_2d_custom.txt
    infer-config=config_infer_primary_2d_action.txt
    
  • Run the following command:

    $ deepstream-3d-action-recognition -c deepstream_action_recognition_config.txt
    

DeepStream 3D Action Recognition App Configuration Specifications

deepstream-3d-action-recognition [action-recognition] group settings

The table below lists the [action-recognition] group settings, using deepstream_action_recognition_config.txt as an example.

3D action recognition supported settings

Property             Meaning                                          Type and Range                                Example
-------------------  -----------------------------------------------  --------------------------------------------  -------
uri-list             Source video file or stream list                 Semicolon-delimited string list               uri-list=file:///path/to/sample_action1.mp4;file:///path/to/sample_action2.mp4;
display-sync         Whether display is synchronized on timestamps    Boolean                                       display-sync=1
preprocess-config    Gst-nvdspreprocess plugin config file path       String                                        preprocess-config=config_preprocess_3d_custom.txt
infer-config         Gst-nvinfer plugin config file path              String                                        infer-config=config_infer_primary_3d_action.txt
muxer-height         Gst-nvstreammux output height                    Unsigned integer                              muxer-height=720
muxer-width          Gst-nvstreammux output width                     Unsigned integer                              muxer-width=1280
muxer-batch-timeout  Gst-nvstreammux batched push timeout, in usec    Unsigned integer                              muxer-batch-timeout=40000
tiler-height         Gst-nvmultistreamtiler output height             Unsigned integer                              tiler-height=720
tiler-width          Gst-nvmultistreamtiler output width              Unsigned integer                              tiler-width=1280
debug                Debug log level                                  Integer; 0: disabled, 1: debug, 2: verbose    debug=0
enable-fps           Whether to print FPS on screen                   Boolean                                       enable-fps=1
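To make the value types in the table concrete, the sketch below reads an [action-recognition] group with Python's standard configparser module. It is only an illustration: it assumes the group uses plain key=value lines as in the examples above, the inline SAMPLE text is a made-up file assembled from the table's example values, and load_settings is a hypothetical helper; the sample app does not use this code.

```python
import configparser

# Made-up example group built from the table's example values.
SAMPLE = """
[action-recognition]
uri-list=file:///path/to/sample_action1.mp4;file:///path/to/sample_action2.mp4;
display-sync=1
preprocess-config=config_preprocess_3d_custom.txt
infer-config=config_infer_primary_3d_action.txt
muxer-height=720
muxer-width=1280
muxer-batch-timeout=40000
tiler-height=720
tiler-width=1280
debug=0
enable-fps=1
"""


def load_settings(text):
    """Hypothetical helper: convert each property to the type listed in the table."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    group = parser["action-recognition"]
    return {
        # Semicolon-delimited string list; a trailing ';' adds no empty entry.
        "uri_list": [u for u in group["uri-list"].split(";") if u],
        "display_sync": group.getboolean("display-sync"),
        "preprocess_config": group["preprocess-config"],
        "infer_config": group["infer-config"],
        "muxer_size": (group.getint("muxer-width"), group.getint("muxer-height")),
        "muxer_batch_timeout_us": group.getint("muxer-batch-timeout"),
        "tiler_size": (group.getint("tiler-width"), group.getint("tiler-height")),
        "debug": group.getint("debug"),
        "enable_fps": group.getboolean("enable-fps"),
    }


if __name__ == "__main__":
    print(load_settings(SAMPLE)["uri_list"])
```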

Custom sequence preprocess lib user settings [user-configs] for gst-nvdspreprocess

The table below lists the [user-configs] settings of libnvds_custom_sequence_preprocess.so, using config_preprocess_3d_custom.txt as an example.

user-configs properties for custom sequence preprocess

Property               Meaning                                               Type and Range                   Example
---------------------  ----------------------------------------------------  -------------------------------  -------
channel-scale-factors  Scale factor for each channel                         Semicolon-delimited float array  channel-scale-factors=0.007843137;0.007843137;0.007843137
channel-mean-offsets   Mean offset for each channel                          Semicolon-delimited float array  channel-mean-offsets=127.5;127.5;127.5
stride                 Sliding stride between consecutive batched sequences  Unsigned integer, value >= 1     stride=1
subsample              Subsample rate for inference frames in each sequence  Unsigned integer, value >= 0     subsample=0
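The per-channel normalization controlled by channel-scale-factors and channel-mean-offsets can be sketched as below. This assumes the custom lib computes y = scale[c] * (x - mean[c]) for each pixel value x of channel c, the same form Gst-nvinfer uses for net-scale-factor and offsets; confirm against the custom lib source. parse_floats and normalize_pixel are hypothetical helpers. With the example values, 0.007843137 is approximately 1/127.5, so pixel values in [0, 255] map to roughly [-1, 1].

```python
def parse_floats(text):
    """Parse a semicolon-delimited float array such as '127.5;127.5;127.5'."""
    return [float(v) for v in text.split(";") if v.strip()]


def normalize_pixel(rgb, scales, means):
    """Assumed formula: y[c] = scales[c] * (x[c] - means[c]) for each channel c."""
    if not (len(rgb) == len(scales) == len(means)):
        raise ValueError("need one scale factor and one mean offset per channel")
    return [s * (x - m) for x, s, m in zip(rgb, scales, means)]


scales = parse_floats(" 0.007843137;0.007843137;0.007843137")
means = parse_floats("127.5;127.5;127.5")
print(normalize_pixel([0, 127.5, 255], scales, means))  # approximately [-1.0, 0.0, 1.0]
```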

Custom lib and `gst-nvdspreprocess` Settings for Action Recognition

  • Set the input order to CUSTOM (network-input-order=2) for this custom sequence preprocess lib.

  • 3D models with NCDHW (NCSHW) input require a 5-dimension network-input-shape. For example:

       network-input-shape= 4;3;32;224;224

    It means max_batch_size 4, channels 3, sequence_len 32, height 224, width 224.

  • 2D models with NSHW input require a 4-dimension network-input-shape. For example:

       network-input-shape= 4;96;224;224

    It means max_batch_size 4, channels 3, sequence_len 32, height 224, width 224, where 96 = channels x sequence_len (3 x 32).
    
  • Assume the incoming frame numbers are 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15… When subsample=1, the custom preprocessing lib picks frame numbers 1,3,5,7,9… to preprocess sequentially and then passes them on to inference.

  • Assuming the same incoming frame numbers, when subsample=0 and stride=1, two consecutive sliding sequences are:

       Batch A: [1,2,3,4,5...]
       Batch B: [2,3,4,5,6...]

    When subsample=0 and stride=2, two consecutive sliding sequences are:

       Batch A: [1,2,3,4,5...]
       Batch B: [3,4,5,6,7...]

    When subsample=1 and stride=2, subsampling is performed first, and the sliding sequences are built on top of the subsampled frames. The frame numbers after subsampling are 1,3,5,7,9,11,13,15,17,19… The consecutive sliding sequences on top of them are:

       Batch A: [1,3,5,7,9...]
       Batch B: [5,7,9,11,13...]    # 1st frame slides from frame 1 of Batch A to frame 5
       Batch C: [9,11,13,15,17...]  # 1st frame slides from frame 5 of Batch B to frame 9
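The subsample and stride rules above can be reproduced with a short Python sketch. It assumes subsample=k keeps one frame out of every k+1, stride is counted in subsampled frames, and only complete sequences of seq_len frames are emitted. sliding_sequences is a hypothetical helper, and seq_len=5 is chosen only to keep the output short (the network-input-shape example above uses 32).

```python
def sliding_sequences(frames, seq_len, stride, subsample):
    """Hypothetical helper: subsample first, then slide a window by `stride`."""
    if seq_len < 1 or stride < 1 or subsample < 0:
        raise ValueError("need seq_len >= 1, stride >= 1, subsample >= 0")
    kept = frames[:: subsample + 1]  # keep one frame out of every subsample + 1
    return [
        kept[start:start + seq_len]  # each window starts `stride` kept frames later
        for start in range(0, len(kept) - seq_len + 1, stride)
    ]


frames = list(range(1, 20))  # incoming frame numbers 1..19
for sub, st in [(0, 1), (0, 2), (1, 2)]:
    seqs = sliding_sequences(frames, seq_len=5, stride=st, subsample=sub)
    print(f"subsample={sub} stride={st}:", seqs[:3])
```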

The image below shows the frame batches with different subsample and stride settings.

3D Action Sequence batching

Build the Custom Sequence Preprocess Lib and Application from Source

  • Go to the folder sources/apps/sample_apps/deepstream-3d-action-recognition.

  • Run the following commands:

    $ make
    $ make install
    
  • Check the source code and comments to learn how to implement other input orders, e.g. NSCHW (NDCHW).
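For the other input orders, the core change is how elements are laid out in memory. As a hedged illustration (plain Python, assuming dense row-major buffers; ncdhw_to_ndchw is a hypothetical helper, not code from the custom lib), reordering an NCDHW (NCSHW) buffer into NDCHW (NSCHW) amounts to copying each H x W plane from position (n, c, d) to position (n, d, c):

```python
def ncdhw_to_ndchw(buf, shape):
    """Hypothetical helper: reorder a flat NCDHW buffer into NDCHW order."""
    n_, c_, d_, h_, w_ = shape
    if len(buf) != n_ * c_ * d_ * h_ * w_:
        raise ValueError("buffer size does not match shape")
    hw = h_ * w_                                  # elements per H x W plane
    out = [0] * len(buf)
    for n in range(n_):
        for c in range(c_):
            for d in range(d_):
                src = ((n * c_ + c) * d_ + d) * hw  # plane (n, c, d) in NCDHW
                dst = ((n * d_ + d) * c_ + c) * hw  # plane (n, d, c) in NDCHW
                out[dst:dst + hw] = buf[src:src + hw]
    return out


print(ncdhw_to_ndchw(list(range(6)), (1, 2, 3, 1, 1)))  # [0, 3, 1, 4, 2, 5]
```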