Gst-nvinfer#
The Gst-nvinfer plugin does inferencing on input data using NVIDIA® TensorRT™.
The plugin accepts batched NV12/RGBA buffers from upstream. The NvDsBatchMeta structure must already be attached to the Gst Buffers. The low-level library (libnvds_infer) operates on any of INT8 RGB, BGR, or GRAY data with dimensions of network height and network width. The Gst-nvinfer plugin performs transforms (format conversion and scaling) on the input frame based on network requirements, and passes the transformed data to the low-level library. The low-level library preprocesses the transformed frames (performs normalization and mean subtraction) and produces final float RGB/BGR/GRAY planar data, which is passed to the TensorRT engine for inferencing. The output type generated by the low-level library depends on the network type. The pre-processing function is:
y = net-scale-factor * (x - mean)
Where:
x is the input pixel value. It is an unsigned 8-bit integer with range [0,255].
mean is the corresponding mean value, read either from the mean file or as offsets[c], where c is the channel to which the input pixel belongs, and offsets is the array specified in the configuration file. It is a float.
net-scale-factor is the pixel scaling factor specified in the configuration file. It is a float.
y is the corresponding output pixel value. It is a float.
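For example, with net-scale-factor = 1/255 ≈ 0.0039216 and all offsets set to 0 (a common choice for networks trained on input in the range [0,1]), an input pixel value of 128 maps to 128/255 ≈ 0.502. A minimal sketch of the per-pixel math (the function name is illustrative, not part of the plugin API):

```c
#include <stdio.h>

/* Per-pixel pre-processing as described above:
 * y = net-scale-factor * (x - mean). */
static float preprocess_pixel(unsigned char x, float mean, float net_scale_factor)
{
    return net_scale_factor * ((float) x - mean);
}

int main(void)
{
    /* net-scale-factor = 1/255, mean = 0: scales [0,255] to [0,1]. */
    float y = preprocess_pixel(128, 0.0f, 1.0f / 255.0f);
    printf("y = %f\n", y); /* prints y = 0.501961 */
    return 0;
}
```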
Gst-nvinfer currently works on the following types of networks:
Multi-class object detection
Multi-label classification
Segmentation (semantic)
Instance Segmentation
The Gst-nvinfer plugin can work in three modes:
Primary mode: Operates on full frames
Secondary mode: Operates on objects added in the meta by upstream components
Preprocessed Tensor Input mode: Operates on tensors attached by upstream components
When operating in preprocessed tensor input mode, the pre-processing inside Gst-nvinfer is completely skipped. The plugin looks for GstNvDsPreProcessBatchMeta attached to the input buffer and passes the tensor as is to the TensorRT inference function without any modifications. This mode currently supports processing on full frames and ROIs. The GstNvDsPreProcessBatchMeta is attached by the Gst-nvdspreprocess plugin.
When the plugin operates as a secondary classifier along with the tracker, it tries to improve performance by avoiding re-inferencing on the same objects in every frame. It does this by caching the classification output in a map with the object’s unique ID as the key. The object is inferred upon only when it is first seen in a frame (based on its object ID) or when its size (bounding box area) increases by 20% or more. This optimization is possible only when the tracker is added as an upstream element.

Detailed documentation of the TensorRT interface is available at: https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html

The plugin supports the IPlugin interface for custom layers. Refer to the section IPlugin Interface for details. The plugin also supports the interface for custom functions for parsing outputs of object detectors and for initialization of non-image input layers in cases where there is more than one input layer. Refer to sources/includes/nvdsinfer_custom_impl.h for the custom method implementations for custom models.
Downstream components receive a Gst Buffer with unmodified contents plus the metadata created from the inference output of the Gst-nvinfer plugin. The plugin can be used for cascaded inferencing. That is, it can perform primary inferencing directly on input data, then perform secondary inferencing on the results of primary inferencing, and so on. See the sample application deepstream-test2 for more details.
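As an illustration of such a cascaded setup, the sketch below builds a pipeline with a primary detector, a tracker, and a secondary Gst-nvinfer instance using gst_parse_launch(). The URI, config file names, and tracker library path are placeholders; deepstream-test2 is the authoritative example.

```c
#include <gst/gst.h>

int main (int argc, char *argv[])
{
  gst_init (&argc, &argv);

  /* Primary nvinfer infers on full frames; the secondary instance
   * (process-mode=2) infers on the objects the primary one detected. */
  GError *error = NULL;
  GstElement *pipeline = gst_parse_launch (
      "uridecodebin uri=file:///path/to/input.mp4 ! m.sink_0 "
      "nvstreammux name=m batch-size=1 width=1280 height=720 ! "
      "nvinfer config-file-path=pgie_config.txt ! "
      "nvtracker ll-lib-file=/path/to/libnvds_nvmultiobjecttracker.so ! "
      "nvinfer config-file-path=sgie_config.txt process-mode=2 ! "
      "fakesink", &error);
  if (!pipeline) {
    g_printerr ("Failed to build pipeline: %s\n", error->message);
    g_clear_error (&error);
    return -1;
  }

  gst_element_set_state (pipeline, GST_STATE_PLAYING);
  GstBus *bus = gst_element_get_bus (pipeline);
  GstMessage *msg = gst_bus_timed_pop_filtered (bus, GST_CLOCK_TIME_NONE,
      GST_MESSAGE_EOS | GST_MESSAGE_ERROR);
  if (msg)
    gst_message_unref (msg);
  gst_element_set_state (pipeline, GST_STATE_NULL);
  gst_object_unref (bus);
  gst_object_unref (pipeline);
  return 0;
}
```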
Inputs and Outputs#
This section summarizes the inputs, outputs, and communication facilities of the Gst-nvinfer plugin.
Inputs
Gst Buffer
NvDsBatchMeta (attaching NvDsFrameMeta)
ONNX
TAO Encoded Model and Key
Offline: Supports engine files generated by TAO Toolkit SDK Model converters
Layers: Supports all layers supported by TensorRT, see: https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html.
Control parameters
Gst-nvinfer gets control parameters from a configuration file. You can specify this file by setting the config-file-path property. For details, see Gst-nvinfer File Configuration Specifications. Other control parameters that can be set through GObject properties are:
Batch size
Inference interval
Attach inference tensor outputs as buffer metadata
Attach instance mask output in object metadata
The parameters set through the GObject properties override the parameters in the Gst-nvinfer configuration file.
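For example, a minimal sketch of overriding configuration-file parameters programmatically (assumes gst_init() has been called; values are illustrative):

```c
#include <gst/gst.h>

/* Create an nvinfer element and override selected config-file parameters
 * through GObject properties (see the Gst Properties table below). */
static GstElement *
make_primary_gie (void)
{
  GstElement *pgie = gst_element_factory_make ("nvinfer", "primary-infer");
  g_object_set (G_OBJECT (pgie),
      "config-file-path", "config_infer_primary.txt",
      "batch-size", 4,            /* overrides batch-size from the file */
      "interval", 1,              /* skip every other batch */
      "output-tensor-meta", TRUE, /* attach raw tensor outputs as metadata */
      NULL);
  return pgie;
}
```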
Outputs
Gst Buffer
Depending on network type and configured parameters, one or more of:
NvDsObjectMeta
NvDsClassifierMeta
NvDsInferSegmentationMeta
NvDsInferTensorMeta
Features#
The following table summarizes the features of the plugin.
| Feature | Description | Release |
|---|---|---|
| Explicit full-dimension network support | Refer to https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#work_dynamic_shapes for more details. | DS 5.0 |
| Non-maximum suppression (NMS) | New bounding box clustering algorithm. | DS 5.0 |
| On-the-fly model update (engine file only) | Update the model-engine-file on the fly in a running pipeline. | DS 5.0 |
| Configurable frame scaling params | Configurable options to select the compute hardware and the filter to use while scaling frames/object crops to network resolution. | DS 5.0 |
| TAO toolkit encoded model support | — | DS 4.0 |
| Gray input model support | Support for models with single-channel gray input. | DS 4.0 |
| Tensor output as meta | Raw tensor output is attached as metadata to Gst Buffers and flowed through the pipeline. | DS 4.0 |
| Segmentation model | Supports segmentation models. | DS 4.0 |
| Maintain input aspect ratio | Configurable support for maintaining the aspect ratio when scaling input frames to network resolution. | DS 4.0 |
| Custom CUDA engine creation interface | Interface for generating CUDA engines from TensorRT INetworkDefinition and IBuilder APIs instead of model files. | DS 4.0 |
| ONNX model support | — | DS 3.0 |
| Multiple modes of operation | Support for cascaded inferencing. | DS 2.0 |
| Asynchronous mode of operation for secondary inferencing | Infer asynchronously for secondary classifiers. | DS 2.0 |
| Grouping using cv::groupRectangles | For detector bounding box clustering. | DS 2.0 |
| Configurable batch-size processing | User can configure the batch size for processing. | DS 2.0 |
| No restriction on number of output blobs | Supports any number of output blobs. | DS 3.0 |
| Configurable number of detected classes (detectors) | Supports a configurable number of detected classes. | DS 3.0 |
| Support for classes: configurable (> 32) | Supports any number of classes. | DS 3.0 |
| Application access to raw inference output | Application can access inference output buffers for user-specified layers. | DS 3.0 |
| Support for single shot detector (SSD) | — | DS 3.0 |
| Secondary GPU Inference Engines (GIEs) operate as detector on primary bounding box | Supports secondary inferencing as detector. | DS 2.0 |
| Multiclass secondary support | Supports multiple classifier network outputs. | DS 2.0 |
| Grouping using DBSCAN | For detector bounding box clustering. | DS 3.0 |
| Loading an external lib containing IPlugin implementation for custom layers (IPluginCreator & IPluginFactory) | Supports loading (dlopen()) a library containing IPlugin implementations for custom layers. | DS 3.0 |
| Multi-GPU | Select the GPU on which to run inference. | DS 2.0 |
| Detection width/height configuration | Filter out detected objects based on min/max object size thresholds. | DS 2.0 |
| Allow user to register custom parser | Supports final output layer bounding box parsing for custom detector networks. | DS 2.0 |
| Bounding box filtering based on configurable object size | Supports inferencing in secondary mode on objects meeting min/max size thresholds. | DS 2.0 |
| Configurable operation interval | Interval for inferencing (number of batched buffers skipped). | DS 2.0 |
| Select top and bottom regions of interest (RoIs) | Removes detected objects in top and bottom areas. | DS 2.0 |
| Operate on specific object type (secondary mode) | Processes only objects of defined classes for secondary inferencing. | DS 2.0 |
| Configurable blob names for parsing bounding boxes (detector) | Supports configurable names for output blobs for detectors. | DS 2.0 |
| Allow configuration file input | Supports a configuration file as input (mandatory in DS 3.0). | DS 2.0 |
| Allow selection of class ID for operation | Supports secondary inferencing based on class ID. | DS 2.0 |
| Support for full-frame inference: primary as a classifier | Can also work as a classifier in primary mode. | DS 2.0 |
| Supports FP16, FP32 and INT8 models | FP16 and INT8 support are platform dependent. | DS 2.0 |
| Supports TensorRT engine file as input | — | DS 2.0 |
| Inference input layer initialization | Initializing non-video input layers when there is more than one input layer. | DS 3.0 |
| Support for FasterRCNN | — | DS 3.0 |
| Support for Yolo detector (YoloV3/V3-tiny/V2/V2-tiny) | — | DS 4.0 |
| Support for yolov3-spp detector | — | DS 5.0 |
| Support for instance segmentation with MaskRCNN | Supports instance segmentation using MaskRCNN; includes an output parser and attaches the mask in object metadata. | DS 5.0 |
| Support for NHWC network input | — | DS 6.0 |
| Added support for TAO ONNX models | — | DS 6.0 |
| Support for input tensor meta | Infers using already-preprocessed raw tensors from input tensor meta (attached as user meta at the batch level) and skips preprocessing in nvinfer. In this mode, the batch-size of nvinfer must equal the sum of the ROIs set in the Gst-nvdspreprocess plugin config file. | DS 6.0 |
| Support for clipping bounding boxes to ROI boundary | — | DS 6.2 |
Gst-nvinfer File Configuration Specifications#
The Gst-nvinfer configuration file uses a “Key File” format described in https://specifications.freedesktop.org/desktop-entry-spec/latest.
The [property] group configures the general behavior of the plugin. It is the only mandatory group.
The [class-attrs-all] group configures detection parameters for all classes.
The [class-attrs-<class-id>] group configures detection parameters for the class specified by <class-id>. For example, the [class-attrs-23] group configures detection parameters for class ID 23. This type of group has the same keys as [class-attrs-all].
The following table describes the keys supported for [class-attrs-all] and [class-attrs-<class-id>] groups.
| Name | Description | Type and Range | Example and Notes | Network Type / Applicable GIEs (Primary/Secondary) |
|---|---|---|---|---|
| threshold | Detection threshold | Float, ≥0 | threshold=0.5 | Object detector / Both |
| pre-cluster-threshold | Detection threshold to be applied prior to the clustering operation | Float, ≥0 | pre-cluster-threshold=0.5 | Object detector / Both |
| post-cluster-threshold | Detection threshold to be applied after the clustering operation | Float, ≥0 | post-cluster-threshold=0.5 | Object detector / Both |
| eps | Epsilon values for the OpenCV groupRectangles() function and the DBSCAN algorithm | Float, ≥0 | eps=0.2 | Object detector / Both |
| group-threshold | Threshold value for rectangle merging for the OpenCV groupRectangles() function | Integer, ≥0 | group-threshold=1 (0 disables clustering) | Object detector / Both |
| minBoxes | Minimum number of points required to form a dense region for the DBSCAN algorithm | Integer, ≥0 | minBoxes=1 (0 disables clustering) | Object detector / Both |
| dbscan-min-score | Minimum sum of the confidences of all neighbors in a cluster for it to be considered a valid cluster | Float, ≥0 | dbscan-min-score=0.7 | Object detector / Both |
| nms-iou-threshold | Maximum IoU score between two proposals, beyond which the proposal with the lower confidence is rejected | Float, ≥0 | nms-iou-threshold=0.2 | Object detector / Both |
| roi-top-offset | Offset of the RoI from the top of the frame. Only objects within the RoI are output. | Integer, ≥0 | roi-top-offset=200 | Object detector / Both |
| roi-bottom-offset | Offset of the RoI from the bottom of the frame. Only objects within the RoI are output. | Integer, ≥0 | roi-bottom-offset=200 | Object detector / Both |
| detected-min-w | Minimum width in pixels of detected objects to be output by the GIE | Integer, ≥0 | detected-min-w=64 | Object detector / Both |
| detected-min-h | Minimum height in pixels of detected objects to be output by the GIE | Integer, ≥0 | detected-min-h=64 | Object detector / Both |
| detected-max-w | Maximum width in pixels of detected objects to be output by the GIE | Integer, ≥0 | detected-max-w=200 (0 disables the property) | Object detector / Both |
| detected-max-h | Maximum height in pixels of detected objects to be output by the GIE | Integer, ≥0 | detected-max-h=200 (0 disables the property) | Object detector / Both |
| topk | Keep only the top K objects with the highest detection scores | Integer, ≥0 (-1 to disable) | topk=10 | Object detector / Both |
Note
UFF model support is removed from TensorRT 10.3.
Gst Properties#
The values set through Gst properties override the values of properties in the configuration file. The application does this for certain properties that it needs to set programmatically. The following table describes the Gst-nvinfer plugin’s Gst properties.
| Property | Meaning | Type and Range | Example and Notes |
|---|---|---|---|
| config-file-path | Absolute pathname of the configuration file for the Gst-nvinfer element | String | config-file-path=config_infer_primary.txt |
| process-mode | Infer processing mode: 1=Primary, 2=Secondary | Integer, 1 or 2 | process-mode=1 |
| unique-id | Unique ID identifying metadata generated by this GIE | Integer, 0 to 4,294,967,295 | unique-id=1 |
| infer-on-gie-id | See operate-on-gie-id in the configuration file table | Integer, 0 to 4,294,967,295 | infer-on-gie-id=1 |
| operate-on-class-ids | See operate-on-class-ids in the configuration file table | An array of colon-separated integers (class IDs) | operate-on-class-ids=1:2:4 |
| filter-out-class-ids | See filter-out-class-ids in the configuration file table | Semicolon-delimited integer array | filter-out-class-ids=1;2 |
| model-engine-file | Absolute pathname of the pre-generated serialized engine file for the model | String | model-engine-file=model_b1_fp32.engine |
| batch-size | Number of frames/objects to be inferred together in a batch | Integer, 1 to 4,294,967,295 | batch-size=4 |
| interval | Number of consecutive batches to be skipped for inference | Integer, 0 to 32 | interval=0 |
| gpu-id | Device ID of the GPU to use for pre-processing/inference (dGPU only) | Integer, 0 to 4,294,967,295 | gpu-id=1 |
| raw-output-file-write | Enable writing the raw inference output to file | Boolean | raw-output-file-write=1 |
| raw-output-generated-callback | Pointer to the raw-output-generated callback function | Pointer | Cannot be set through gst-launch |
| raw-output-generated-userdata | Pointer to user data to be supplied with raw-output-generated-callback | Pointer | Cannot be set through gst-launch |
| output-tensor-meta | Indicates whether to attach tensor outputs as meta on GstBuffer | Boolean | output-tensor-meta=0 |
| output-instance-mask | Gst-nvinfer attaches instance mask output in object metadata | Boolean | output-instance-mask=1 |
| input-tensor-meta | Use preprocessed input tensors attached as metadata instead of preprocessing inside the plugin | Boolean | input-tensor-meta=1 |
| crop-objects-to-roi-boundary | Clip the object bounding boxes to fit within the specified ROI boundary | Boolean | crop-objects-to-roi-boundary=1 |
Clustering algorithms supported by nvinfer#
cluster-mode = 0 | GroupRectangles#
GroupRectangles is a clustering algorithm from the OpenCV library that clusters rectangles of similar size and location using rectangle equivalence criteria. Link to the API documentation: https://docs.opencv.org/3.4/d5/d54/group__objdetect.html#ga3dba897ade8aa8227edda66508e16ab9
cluster-mode = 1 | DBSCAN#
Density-based spatial clustering of applications with noise (DBSCAN) is a clustering algorithm that identifies clusters by checking whether a specific rectangle has a minimum number of neighbors in its vicinity, defined by the eps value. The algorithm then normalizes each valid cluster to a single rectangle, which is output as a valid bounding box if its confidence is greater than the threshold.
cluster-mode = 2 | NMS#
Non-maximum suppression (NMS) is a clustering algorithm that filters overlapping rectangles based on their degree of overlap (IoU), which is used as the threshold. The rectangle with the highest confidence score is preserved, while rectangles whose overlap with it is greater than the threshold are removed iteratively.
cluster-mode = 3 | Hybrid#
The hybrid clustering algorithm uses both DBSCAN and NMS in a two-step process. DBSCAN is first applied to form unnormalized clusters of proposals while removing outliers. NMS is then applied on these clusters to select the final rectangles for output.
cluster-mode=4 | No clustering#
No clustering is applied, and all bounding box proposals are returned as is.
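The mode is selected with the cluster-mode key in the [property] group, paired with the relevant keys from the [class-attrs-…] table above. An illustrative fragment:

```ini
[property]
# 0=GroupRectangles, 1=DBSCAN, 2=NMS, 3=Hybrid, 4=No clustering
cluster-mode=2

[class-attrs-all]
# Used by NMS (cluster-mode=2) and Hybrid (cluster-mode=3)
nms-iou-threshold=0.2
# Used by GroupRectangles (cluster-mode=0) and DBSCAN (cluster-mode=1)
eps=0.2
# Used by DBSCAN (cluster-mode=1) and Hybrid (cluster-mode=3)
minBoxes=1
```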
Tensor Metadata#
The Gst-nvinfer plugin can attach raw output tensor data generated by a TensorRT inference engine as metadata. It is added as an NvDsInferTensorMeta in the frame_user_meta_list member of NvDsFrameMeta for primary (full-frame) mode, or in the obj_user_meta_list member of NvDsObjectMeta for secondary (object) mode.
To read or parse inference raw tensor data of output layers#
Enable the output-tensor-meta property, or enable the same-named attribute in the configuration file for the Gst-nvinfer plugin.
When operating as the primary GIE, NvDsInferTensorMeta is attached to each frame's (each NvDsFrameMeta object's) frame_user_meta_list. When operating as a secondary GIE, NvDsInferTensorMeta is attached to each NvDsObjectMeta object's obj_user_meta_list.
Metadata attached by Gst-nvinfer can be accessed in a GStreamer pad probe attached downstream from the Gst-nvinfer instance.
The NvDsInferTensorMeta object's metadata type is set to NVDSINFER_TENSOR_OUTPUT_META. To get this metadata, you must iterate over the NvDsUserMeta user metadata objects in the list referenced by frame_user_meta_list or obj_user_meta_list.
For more information about Gst-nvinfer tensor metadata usage, see the source code in sources/apps/sample_apps/deepstream_infer_tensor_meta-test.cpp, provided in the DeepStream SDK samples.
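A minimal pad-probe sketch (error handling omitted; assumes the DeepStream headers and the metadata layout described above) that locates the tensor metadata on each frame in primary mode:

```c
#include <gst/gst.h>
#include "gstnvdsmeta.h"
#include "gstnvdsinfer.h"

/* Attach downstream of nvinfer with:
 * gst_pad_add_probe (nvinfer_src_pad, GST_PAD_PROBE_TYPE_BUFFER,
 *                    tensor_meta_probe, NULL, NULL); */
static GstPadProbeReturn
tensor_meta_probe (GstPad *pad, GstPadProbeInfo *info, gpointer user_data)
{
  GstBuffer *buf = GST_PAD_PROBE_INFO_BUFFER (info);
  NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta (buf);
  if (!batch_meta)
    return GST_PAD_PROBE_OK;

  for (NvDsMetaList *l_frame = batch_meta->frame_meta_list; l_frame;
      l_frame = l_frame->next) {
    NvDsFrameMeta *frame_meta = (NvDsFrameMeta *) l_frame->data;
    for (NvDsMetaList *l_user = frame_meta->frame_user_meta_list; l_user;
        l_user = l_user->next) {
      NvDsUserMeta *user_meta = (NvDsUserMeta *) l_user->data;
      if (user_meta->base_meta.meta_type != NVDSINFER_TENSOR_OUTPUT_META)
        continue;
      NvDsInferTensorMeta *tensor_meta =
          (NvDsInferTensorMeta *) user_meta->user_meta_data;
      for (guint i = 0; i < tensor_meta->num_output_layers; i++) {
        NvDsInferLayerInfo *layer = &tensor_meta->output_layers_info[i];
        g_print ("output layer %u: %s\n", i, layer->layerName);
        /* Raw host-side tensor data for this layer (per the SDK headers):
         * float *data = (float *) tensor_meta->out_buf_ptrs_host[i]; */
      }
    }
  }
  return GST_PAD_PROBE_OK;
}
```

For secondary mode, iterate the obj_user_meta_list of each NvDsObjectMeta in frame_meta->obj_meta_list instead.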
Segmentation Metadata#
The Gst-nvinfer plugin attaches the output of the segmentation model as user meta in an instance of NvDsInferSegmentationMeta with meta_type set to NVDSINFER_SEGMENTATION_META. The user meta is added to the frame_user_meta_list member of NvDsFrameMeta for primary (full-frame) mode, or the obj_user_meta_list member of NvDsObjectMeta for secondary (object) mode.
For guidance on how to access user metadata, see User/Custom Metadata Addition inside NvDsBatchMeta and Tensor Metadata sections.
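As with tensor metadata, the segmentation meta can be read in a downstream pad probe. A sketch for primary (full-frame) mode, with field names as declared for NvDsInferSegmentationMeta in the SDK headers:

```c
#include <gst/gst.h>
#include "gstnvdsmeta.h"
#include "gstnvdsinfer.h"

/* Walk a frame's user meta list and read the segmentation output. */
static void
read_segmentation_meta (NvDsFrameMeta *frame_meta)
{
  for (NvDsMetaList *l = frame_meta->frame_user_meta_list; l; l = l->next) {
    NvDsUserMeta *user_meta = (NvDsUserMeta *) l->data;
    if (user_meta->base_meta.meta_type != NVDSINFER_SEGMENTATION_META)
      continue;
    NvDsInferSegmentationMeta *seg =
        (NvDsInferSegmentationMeta *) user_meta->user_meta_data;
    /* class_map holds one class index per pixel of the width x height mask. */
    g_print ("mask %ux%u, %u classes, class at (0,0): %d\n",
        seg->width, seg->height, seg->classes, seg->class_map[0]);
  }
}
```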