GStreamer Plugin Details

Gst-nvinfer

The Gst-nvinfer plugin does inferencing on input data using NVIDIA® TensorRT™.
The plugin accepts batched NV12/RGBA buffers from upstream. The NvDsBatchMeta structure must already be attached to the Gst Buffers.
The low-level library (libnvds_infer) operates on any of INT8 RGB, BGR, or GRAY data with dimensions equal to the network height and network width.
The Gst-nvinfer plugin performs transforms (format conversion and scaling) on the input frame based on network requirements, and passes the transformed data to the low-level library.
The low-level library preprocesses the transformed frames (performs normalization and mean subtraction) and produces final float RGB/BGR/GRAY planar data which is passed to the TensorRT engine for inferencing. The output type generated by the low-level library depends on the network type.
The pre-processing function is:
y = net-scale-factor * (x - mean)
Where:
x is the input pixel value. It is a uint8 with range [0,255].
mean is the corresponding mean value, read either from the mean file or as offsets[c], where c is the channel to which the input pixel belongs, and offsets is the array specified in the configuration file. It is a float.
net-scale-factor is the pixel scaling factor specified in the configuration file. It is a float.
y is the corresponding output pixel value. It is a float.
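For example, with net-scale-factor = 1/255 ≈ 0.0039 and offsets of 0 for every channel, an input pixel value of 255 maps to y ≈ 1.0, so the input is normalized to the range [0,1].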
Gst-nvinfer currently works on the following types of networks:
Multi-class object detection
Multi-label classification
Segmentation (semantic)
Instance Segmentation
The Gst-nvinfer plugin can work in two modes:
Primary mode: Operates on full frames
Secondary mode: Operates on objects added in the meta by upstream components
When the plugin is operating as a secondary classifier along with the tracker, it tries to improve performance by avoiding re-inferencing on the same objects in every frame. It does this by caching the classification output in a map with the object’s unique ID as the key. The object is inferred upon only when it is first seen in a frame (based on its object ID) or when the size (bounding box area) of the object increases by 20% or more. This optimization is possible only when the tracker is added as an upstream element.
Detailed documentation of the TensorRT interface is available at:
The plugin supports the IPlugin interface for custom layers. Refer to section IPlugin Interface for details.
The plugin also supports an interface for custom functions to parse the outputs of object detectors and to initialize non-image input layers in cases where the network has more than one input layer.
Refer to sources/includes/nvdsinfer_custom_impl.h for the custom method implementations for custom models.
Downstream components receive a Gst Buffer with unmodified contents plus the metadata created from the inference output of the Gst-nvinfer plugin.
The plugin can be used for cascaded inferencing. That is, it can perform primary inferencing directly on input data, then perform secondary inferencing on the results of primary inferencing, and so on. See the sample application deepstream-test2 for more details.
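As a rough sketch of such a cascade (the configuration file names below are placeholders, and the secondary configuration is assumed to set process-mode=2 and operate-on-gie-id=1), a gst-launch-1.0 pipeline could look like:
gst-launch-1.0 filesrc location=sample_720p.h264 ! h264parse ! nvv4l2decoder ! \
  m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 ! \
  nvinfer config-file-path=config_infer_primary.txt unique-id=1 ! \
  nvinfer config-file-path=config_infer_secondary.txt unique-id=2 ! \
  nvvideoconvert ! nvdsosd ! fakesink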

Inputs and Outputs

This section summarizes the inputs, outputs, and communication facilities of the Gst-nvinfer plugin.
Inputs
Gst Buffer
NvDsBatchMeta (attaching NvDsFrameMeta)
Caffe Model and Caffe Prototxt
ONNX
UFF file
TLT Encoded Model and Key
Offline: Supports engine files generated by Transfer Learning Toolkit SDK Model converters
Layers: Supports all layers supported by TensorRT, see:
Control parameters: Gst-nvinfer gets control parameters from a configuration file. You can specify this by setting the property config-file-path. For details, see Gst-nvinfer File Configuration Specifications. Other control parameters that can be set through GObject properties are:
Batch size
Inference interval
Attach inference tensor outputs as buffer metadata
Attach instance mask output in object metadata
The parameters set through the GObject properties override the parameters in the Gst-nvinfer configuration file.
Outputs
Gst Buffer
Depending on network type and configured parameters, one or more of:
NvDsObjectMeta
NvDsClassifierMeta
NvDsInferSegmentationMeta
NvDsInferTensorMeta

Features

The following table summarizes the features of the plugin.
Features of the Gst-nvinfer plugin
Feature
Description
Release
Explicit Full Dimension Network Support
DS 5.0
Non-maximum Suppression (NMS)
New bounding box clustering algorithm.
DS 5.0
On-the-fly model update (Engine file only)
Update the model-engine-file on-the-fly in a running pipeline.
DS 5.0
Configurable frame scaling params
Configurable options to select the compute hardware and the filter to use while scaling frame/object crops to network resolution
DS 5.0
Transfer-Learning-Toolkit encoded model support
DS 4.0
Gray input model support
Support for models with single channel gray input
DS 4.0
Tensor output as meta
Raw tensor output is attached as meta data to Gst Buffers and flowed through the pipeline
DS 4.0
Segmentation model
Supports segmentation model
DS 4.0
Maintain input aspect ratio
Configurable support for maintaining aspect ratio when scaling input frame to network resolution
DS 4.0
Custom cuda engine creation interface
Interface for generating CUDA engines from TensorRT INetworkDefinition and IBuilder APIs instead of model files
DS 4.0
Caffe Model support
DS 2.0
UFF Model support
DS 3.0
ONNX Model support
DS 3.0
Multiple modes of operation
Support for cascaded inferencing
DS 2.0
Asynchronous mode of operation for secondary inferencing
Infer asynchronously for secondary classifiers
DS 2.0
Grouping using CV::Group rectangles
For detector bounding box clustering
DS 2.0
Configurable batch-size processing
User can configure batch size for processing
DS 2.0
No Restriction on number of output blobs
Supports any number of output blobs
DS 3.0
Configurable number of detected classes (detectors)
Supports configurable number of detected classes
DS 3.0
Support for Classes: configurable (> 32)
Supports any number of classes
DS 3.0
Application access to raw inference output
Application can access inference output buffers for user specified layer
DS 3.0
Support for single shot detector (SSD)
DS 3.0
Secondary GPU Inference Engines (GIEs) operate as detector on primary bounding box
Supports secondary inferencing as detector
DS 2.0
Multiclass secondary support
Supports multiple classifier network outputs
DS 2.0
Grouping using DBSCAN
For detector bounding box clustering
DS 3.0
Loading an external lib containing IPlugin implementation for custom layers (IPluginCreator & IPluginFactory)
Supports loading (dlopen()) a library containing IPlugin implementation for custom layers
DS 3.0
Multi GPU
Select GPU on which we want to run inference
DS 2.0
Detection width height configuration
Filter out detected objects based on min/max object size threshold
DS 2.0
Allow user to register custom parser
Supports final output layer bounding box parsing for custom detector network
DS 2.0
Bounding box filtering based on configurable object size
Supports inferencing in secondary mode on objects meeting min/max size thresholds
DS 2.0
Configurable operation interval
Interval for inferencing (number of batched buffers skipped)
DS 2.0
Select Top and bottom regions of interest (RoIs)
Removes detected objects in top and bottom areas
DS 2.0
Operate on Specific object type (Secondary mode)
Process only objects of defined classes for secondary inferencing
DS 2.0
Configurable blob names for parsing bounding box (detector)
Support configurable names for output blobs for detectors
DS 2.0
Allow configuration file input
Support configuration file as input (mandatory in DS 3.0)
DS 2.0
Allow selection of class id for operation
Supports secondary inferencing based on class ID
DS 2.0
Support for Full Frame Inference: Primary as a classifier
Can work as classifier as well in primary mode
DS 2.0
Multiclass secondary support
Support multiple classifier network outputs
DS 2.0
Secondary GIEs operate as detector on primary bounding box
Support secondary inferencing as detector
DS 2.0
Supports FP16, FP32 and INT8 models
FP16 and INT8 are platform dependent
DS 2.0
Supports TensorRT Engine file as input
 
DS 2.0
Inference input layer initialization
Initializing non-video input layers in cases with more than one input layer
DS 3.0
Support for FasterRCNN
DS 3.0
Support for Yolo detector (YoloV3/V3-tiny/V2/V2-tiny)
DS 4.0
Support for yolov3-spp detector
DS 5.0
Support Instance segmentation with MaskRCNN
Support for instance segmentation using MaskRCNN. Includes an output parser and attaches the mask in object metadata.
DS 5.0

Gst-nvinfer File Configuration Specifications

The Gst-nvinfer configuration file uses a “Key File” format described in:
The [property] group configures the general behavior of the plugin. It is the only mandatory group.
The [class-attrs-all] group configures detection parameters for all classes.
The [class-attrs-<class-id>] group configures detection parameters for a class specified by <class-id>. For example, the [class-attrs-23] group configures detection parameters for class ID 23. This type of group has the same keys as [class-attrs-all].
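As an illustrative sketch only (the pathnames and values below are placeholders, not a tested configuration), a minimal configuration file could look like:
[property]
net-scale-factor=0.0039215686
model-file=/home/ubuntu/model.caffemodel
proto-file=/home/ubuntu/model.prototxt
labelfile-path=/home/ubuntu/model_labels.txt
batch-size=1
network-mode=0
network-type=0
num-detected-classes=4
interval=0
gie-unique-id=1
output-blob-names=coverage;bbox

[class-attrs-all]
pre-cluster-threshold=0.2
eps=0.2
group-threshold=1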
The following two tables respectively describe the keys supported for [property] groups and [class-attrs-…] groups.
Gst-nvinfer plugin, [property] group, supported keys
Property
Meaning
Type and Range
Example
Notes
Network Types/ Applicable to GIEs (Primary/Secondary)
num-detected-classes
Number of classes detected by the network
Integer, >0
num-detected-classes=91
Detector
Both
net-scale-factor
Pixel normalization factor
Float, >0.0
net-scale-factor=0.031
All
Both
model-file
Pathname of the caffemodel file. Not required if model-engine-file is used
String
model-file=/home/ubuntu/model.caffemodel
All
Both
proto-file
Pathname of the prototxt file. Not required if model-engine-file is used
String
proto-file=/home/ubuntu/model.prototxt
All
Both
int8-calib-file
Pathname of the INT8 calibration file for dynamic range adjustment with an FP32 model
String
int8-calib-file=/home/ubuntu/int8_calib
All
Both
batch-size
Number of frames or objects to be inferred together in a batch
Integer, >0
batch-size=30
All
Both
model-engine-file
Pathname of the serialized model engine file
String
model-engine-file=/home/ubuntu/model.engine
All
Both
uff-file
Pathname of the UFF model file
String
uff-file=/home/ubuntu/model.uff
All
Both
onnx-file
Pathname of the ONNX model file
String
onnx-file=/home/ubuntu/model.onnx
All
Both
enable-dbscan
Indicates whether to use DBSCAN or the OpenCV groupRectangles() function for grouping detected objects.
DEPRECATED. Use cluster-mode instead.
Boolean
enable-dbscan=1
Detector
Both
labelfile-path
Pathname of a text file containing the labels for the model
String
labelfile-path=/home/ubuntu/model_labels.txt
Detector & classifier
Both
mean-file
Pathname of mean data file (PPM format)
String
mean-file=/home/ubuntu/model_meanfile.ppm
All
Both
gie-unique-id
Unique ID to be assigned to the GIE to enable the application and other elements to identify detected bounding boxes and labels
Integer, >0
gie-unique-id=2
All
Both
operate-on-gie-id
Unique ID of the GIE on whose metadata (bounding boxes) this GIE is to operate on
Integer, >0
operate-on-gie-id=1
All
Both
operate-on-class-ids
Class IDs of the parent GIE on which this GIE is to operate on
Semicolon delimited integer array
operate-on-class-ids=1;2
Operates on objects with class IDs 1, 2 generated by parent GIE
All
Both
interval
Specifies the number of consecutive batches to be skipped for inference
Integer, ≥0
interval=1
All
Primary
input-object-min-width
Secondary GIE infers only on objects with this minimum width
Integer, ≥0
input-object-min-width=40
All
Secondary
input-object-min-height
Secondary GIE infers only on objects with this minimum height
Integer, ≥0
input-object-min-height=40
All
Secondary
input-object-max-width
Secondary GIE infers only on objects with this maximum width
Integer, ≥0
input-object-max-width=256
0 disables the threshold
All
Secondary
input-object-max-height
Secondary GIE infers only on objects with this maximum height
Integer, ≥0
input-object-max-height=256
0 disables the threshold
All
Secondary
uff-input-dims
DEPRECATED. Use infer-dims and uff-input-order instead.
 
Dimensions of the UFF model
 
channel;
height;
width;
input-order
All integers, ≥0
input-dims=3;224;224;0
Possible values for input-order are:
0: NCHW
1: NHWC
All
Both
network-mode
Data format to be used by inference
Integer
0: FP32
1: INT8
2: FP16
network-mode=0
All
Both
offsets
Array of mean values of color components to be subtracted from each pixel. Array length must equal the number of color components in the frame. The plugin multiplies mean values by net-scale-factor.
Semicolon delimited float array,
all values ≥0
offsets=77.5;21.2;11.8
All
Both
output-blob-names
Array of output layer names
Semicolon delimited string array
For detector:
output-blob-names=coverage;bbox
For multi-label classifiers:
output-blob-names=coverage_attrib1;coverage_attrib2
All
Both
parse-bbox-instance-mask-func-name
Name of the custom instance segmentation parsing function. Mandatory for instance segmentation networks, as there is no internal parsing function.
String
parse-bbox-instance-mask-func-name=NvDsInferParseCustomMrcnnTLT
Instance Segmentation Primary
custom-lib-path
Absolute pathname of a library containing custom method implementations for custom models
String
custom-lib-path=/home/ubuntu/libresnet_custom_impl.so
All
Both
model-color-format
Color format required by the model
Integer
0: RGB
1: BGR
2: GRAY
model-color-format=0
All
Both
classifier-async-mode
Enables inference on detected objects and asynchronous metadata attachment. Works only when tracker IDs are attached. Pushes the buffer downstream without waiting for inference results, and attaches the metadata to the next Gst Buffer in its internal queue once the results are available.
Boolean
classifier-async-mode=1
Classifier
Secondary
process-mode
Mode (primary or secondary) in which the element is to operate
Integer
1=Primary
2=Secondary
process-mode=1
All
Both
classifier-threshold
Minimum threshold label probability. The GIE outputs the label having the highest probability if it is greater than this threshold
Float, ≥0
classifier-threshold=0.4
Classifier
Both
uff-input-blob-name
Name of the input blob in the UFF file
String
uff-input-blob-name=Input_1
All
Both
secondary-reinfer-interval
Re-inference interval for objects, in frames
Integer, ≥0
secondary-reinfer-interval=15
Classifier
Secondary
output-tensor-meta
Gst-nvinfer attaches raw tensor output as Gst Buffer metadata.
Boolean
output-tensor-meta=1
All
Both
output-instance-mask
Gst-nvinfer attaches instance mask output in object metadata.
Boolean
output-instance-mask=1
Instance Segmentation Primary
enable-dla
Indicates whether to use the DLA engine for inferencing.
Note: DLA is supported only on NVIDIA® Jetson AGX Xavier™. Currently work in progress.
Boolean
enable-dla=1
All
Both
use-dla-core
DLA core to be used.
Note: Supported only on Jetson AGX Xavier. Currently work in progress.
Integer, ≥0
use-dla-core=0
All
Both
network-type
Type of network
Integer
0: Detector
1: Classifier
2: Segmentation
3: Instance Segmentation
network-type=1
All
Both
maintain-aspect-ratio
Indicates whether to maintain aspect ratio while scaling input.
Boolean
maintain-aspect-ratio=1
All
Both
parse-classifier-func-name
Name of the custom classifier output parsing function. If not specified, Gst-nvinfer uses the internal parsing function for softmax layers.
String
parse-classifier-func-name=parse_bbox_softmax
Classifier
Both
custom-network-config
Pathname of the configuration file for custom networks available in the custom interface for creating CUDA engines.
String
custom-network-config=/home/ubuntu/network.config
All
Both
tlt-encoded-model
Pathname of the Transfer Learning Toolkit (TLT) encoded model.
String
tlt-encoded-model=/home/ubuntu/model.etlt
All
Both
tlt-model-key
Key for the TLT encoded model.
String
tlt-model-key=abc
All
Both
segmentation-threshold
Confidence threshold for the segmentation model to output a valid class for a pixel. If confidence is less than this threshold, class output for that pixel is −1.
Float, ≥0.0
segmentation-threshold=0.3
Segmentation, Instance segmentation
Both
workspace-size
Workspace size to be used by the engine, in MB
Integer, >0
workspace-size=45
All
Both
force-implicit-batch-dim
When a network supports both implicit batch dimension and full dimension, force the implicit batch dimension mode.
Boolean
force-implicit-batch-dim=1
All
Both
infer-dims
Binding dimensions to set on the image input layer.
channel;
height;
width
infer-dims=3;224;224
All
Both
uff-input-order
UFF input layer order
Integer
0: NCHW
1: NHWC
2: NC
uff-input-order=1
All
Both
engine-create-func-name
Name of the custom TensorRT CudaEngine creation function. Refer to the “Custom Model Implementation Interface” section for details
String
engine-create-func-name=NvDsInferYoloCudaEngineGet
All
Both
cluster-mode
Clustering algorithm to use. Refer to the next table for configuring the algorithm specific parameters.
Integer
0: OpenCV groupRectangles()
1: DBSCAN
2: Non Maximum Suppression
3: DBSCAN + NMS Hybrid
4: No clustering
cluster-mode=2
Detector
Both
cluster-mode=4 for instance segmentation
filter-out-class-ids
Filter out detected objects belonging to specified class-ids
Semicolon delimited integer array
filter-out-class-ids=1;2
Detector
Both
scaling-filter
The filter to use for scaling frames / object crops to network resolution
Integer, refer to enum NvBufSurfTransform_Inter in nvbufsurftransform.h for valid values
scaling-filter=1
All
Both
scaling-compute-hw
Compute hardware to use for scaling frames / object crops to network resolution
Integer
0: Platform default – GPU (dGPU), VIC (Jetson)
1: GPU
2: VIC (Jetson only)
scaling-compute-hw=2
All
Both
output-io-formats
Specifies the data type and order for bound output layers. For layers not specified, defaults to FP32 and CHW
Semicolon separated list of formats:
<output-layer1-name>:<data-type>:<order>;<output-layer2-name>:<data-type>:<order>
 
data-type should be one of
[fp32, fp16, int32, int8]
 
order should be one of [chw, chw2, chw4, hwc8, chw16, chw32]
output-io-formats=conv2d_bbox:fp32:chw;conv2d_cov/Sigmoid:fp32:chw
All
Both
layer-device-precision
Specifies the device type and precision for any layer in the network
Semicolon separated list of formats:
<layer1-name>:<precision>:<device-type>;<layer2-name>:<precision>:<device-type>;
 
precision should be one of
[fp32, fp16, int8]
 
device-type should be one of [gpu, dla]
layer-device-precision=output_cov/Sigmoid:fp32:gpu;output_bbox/BiasAdd:fp32:gpu;
All
Both
 
Gst-nvinfer plugin, [class-attrs-...] groups, supported keys
Name
Description
Type and Range
Example
Notes
(Primary/Secondary)
threshold
Detection threshold
Float, ≥0
threshold=0.5
Object detector
Both
pre-cluster-threshold
Detection threshold to be applied prior to clustering operation
Float, ≥0
pre-cluster-threshold=0.5
Object detector
Both
post-cluster-threshold
Detection threshold to be applied post clustering operation
Float, ≥0
post-cluster-threshold=0.5
Object detector
Both
eps
Epsilon values for the OpenCV groupRectangles() function and DBSCAN algorithm
Float, ≥0
eps=0.2
Object detector
Both
group-threshold
Threshold value for rectangle merging for the OpenCV groupRectangles() function
Integer, ≥0
group-threshold=1
0 disables the clustering functionality
Object detector
Both
minBoxes
Minimum number of points required to form a dense region for DBSCAN algorithm
Integer, ≥0
minBoxes=1
0 disables the clustering functionality
Object detector
Both
dbscan-min-score
Minimum sum of confidence of all the neighbors in a cluster for it to be considered a valid cluster.
Float, ≥0
dbscan-min-score=0.7
Object detector
Both
nms-iou-threshold
Maximum IOU score between two proposals after which the proposal with the lower confidence will be rejected.
Float, ≥0
nms-iou-threshold=0.2
Object detector
Both
roi-top-offset
Offset of the RoI from the top of the frame. Only objects within the RoI are output.
Integer, ≥0
roi-top-offset=200
Object detector
Both
roi-bottom-offset
Offset of the RoI from the bottom of the frame. Only objects within the RoI are output.
Integer, ≥0
roi-bottom-offset=200
Object detector
Both
detected-min-w
Minimum width in pixels of detected objects to be output by the GIE
Integer, ≥0
detected-min-w=64
Object detector
Both
detected-min-h
Minimum height in pixels of detected objects to be output by the GIE
Integer, ≥0
detected-min-h=64
Object detector
Both
detected-max-w
Maximum width in pixels of detected objects to be output by the GIE
Integer, ≥0
detected-max-w=200
0 disables the property
Object detector
Both
detected-max-h
Maximum height in pixels of detected objects to be output by the GIE
Integer, ≥0
detected-max-h=200
0 disables the property
Object detector
Both
topk
Keep only top K objects with highest detection scores.
Integer, ≥0.
-1 to disable
topk=10
Object detector
Both
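As an example of per-class overrides (values are illustrative only), a configuration could keep loose defaults for all classes while tightening the filter for class ID 2:
[class-attrs-all]
pre-cluster-threshold=0.2
eps=0.7
minBoxes=3

[class-attrs-2]
pre-cluster-threshold=0.6
detected-min-w=64
detected-min-h=64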

Gst Properties

The values set through Gst properties override the values of properties in the configuration file. The application does this for certain properties that it needs to set programmatically.
The following table describes the Gst-nvinfer plugin’s Gst properties.
Gst-nvinfer plugin, Gst properties
Property
Meaning
Type and Range
Example
Notes
config-file-path
Absolute pathname of configuration file for the Gst-nvinfer element
String
config-file-path=config_infer_primary.txt
process-mode
Infer Processing Mode
1=Primary Mode
2=Secondary Mode
Integer, 1 or 2
process-mode=1
unique-id
Unique ID identifying metadata generated by this GIE
Integer,
0 to 4,294,967,295
unique-id=1
infer-on-gie-id
See operate-on-gie-id in the configuration file table
Integer,
0 to 4,294,967,295
infer-on-gie-id=1
operate-on-class-ids
See operate-on-class-ids in the configuration file table
An array of colon- separated integers (class-ids)
operate-on-class-ids=1:2:4
filter-out-class-ids
See filter-out-class-ids in the configuration file table
Semicolon delimited integer array
filter-out-class-ids=1;2
model-engine-file
Absolute pathname of the pre-generated serialized engine file for the model
String
model-engine-file=model_b1_fp32.engine
batch-size
Number of frames/objects to be inferred together in a batch
Integer,
1 – 4,294,967,295
batch-size=4
interval
Number of consecutive batches to be skipped for inference
Integer, 0 to 32
interval=0
gpu-id
Device ID of GPU to use for pre-processing/inference (dGPU only)
Integer,
0-4,294,967,295
gpu-id=1
raw-output-file-write
Indicates whether to write the raw inference output to a file
Boolean
raw-output-file-write=1
raw-output-generated-callback
Pointer to the raw output generated callback function
Pointer
Cannot be set through gst-launch
raw-output-generated-userdata
Pointer to user data to be supplied with raw-output-generated-callback
Pointer
Cannot be set through gst-launch
output-tensor-meta
Indicates whether to attach tensor outputs as meta on GstBuffer.
Boolean
output-tensor-meta=0
output-instance-mask
Gst-nvinfer attaches instance mask output in object metadata.
Boolean
output-instance-mask=1

Tensor Metadata

The Gst-nvinfer plugin can attach raw output tensor data generated by a TensorRT inference engine as metadata. It is added as an NvDsInferTensorMeta in the frame_user_meta_list member of NvDsFrameMeta for primary (full frame) mode, or in the obj_user_meta_list member of NvDsObjectMeta for secondary (object) mode.
To read or parse inference raw tensor data of output layers:
1. Enable property output-tensor-meta or enable the same-named attribute in the configuration file for the Gst-nvinfer plugin.
2. When operating as primary GIE, NvDsInferTensorMeta is attached to each frame’s (each NvDsFrameMeta object’s) frame_user_meta_list. When operating as secondary GIE, NvDsInferTensorMeta is attached to each NvDsObjectMeta object’s obj_user_meta_list.
Metadata attached by Gst-nvinfer can be accessed in a GStreamer pad probe attached downstream from the Gst-nvinfer instance.
3. The NvDsInferTensorMeta object’s metadata type is set to NVDSINFER_TENSOR_OUTPUT_META. To get this metadata you must iterate over the NvDsUserMeta user metadata objects in the list referenced by frame_user_meta_list or obj_user_meta_list.
For more information about Gst-infer tensor metadata usage, see the source code in sources/apps/sample_apps/deepstream_infer_tensor_meta-test.cpp, provided in the DeepStream SDK samples.
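The following C sketch outlines such a pad probe. It follows the pattern of the SDK sample mentioned above; the header names and NvDsInferTensorMeta field names used here are assumptions to be verified against your DeepStream SDK version.
#include <gst/gst.h>
#include "gstnvdsmeta.h"   /* NvDsBatchMeta, NvDsFrameMeta, NvDsUserMeta */
#include "gstnvdsinfer.h"  /* NvDsInferTensorMeta (assumed header name) */

/* Pad probe attached downstream of Gst-nvinfer to read raw tensor output. */
static GstPadProbeReturn
tensor_meta_probe (GstPad * pad, GstPadProbeInfo * info, gpointer user_data)
{
  GstBuffer *buf = (GstBuffer *) info->data;
  NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta (buf);
  if (!batch_meta)
    return GST_PAD_PROBE_OK;
  for (NvDsMetaList * l_frame = batch_meta->frame_meta_list; l_frame;
      l_frame = l_frame->next) {
    NvDsFrameMeta *frame_meta = (NvDsFrameMeta *) l_frame->data;
    /* In secondary (object) mode, iterate obj_meta_list and each object's
     * obj_user_meta_list instead of frame_user_meta_list. */
    for (NvDsMetaList * l_user = frame_meta->frame_user_meta_list; l_user;
        l_user = l_user->next) {
      NvDsUserMeta *user_meta = (NvDsUserMeta *) l_user->data;
      if (user_meta->base_meta.meta_type != NVDSINFER_TENSOR_OUTPUT_META)
        continue;
      NvDsInferTensorMeta *tensor_meta =
          (NvDsInferTensorMeta *) user_meta->user_meta_data;
      /* tensor_meta->num_output_layers layers are available; each layer is
       * described by tensor_meta->output_layers_info[i], with the host copy
       * of its data in tensor_meta->out_buf_ptrs_host[i]. */
      (void) tensor_meta;
    }
  }
  return GST_PAD_PROBE_OK;
}
Attach the probe with gst_pad_add_probe() (GST_PAD_PROBE_TYPE_BUFFER) on a pad downstream of the Gst-nvinfer instance.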

Segmentation Metadata

The Gst-nvinfer plugin attaches the output of the segmentation model as user meta in an instance of NvDsInferSegmentationMeta with meta_type set to NVDSINFER_SEGMENTATION_META. The user meta is added to the frame_user_meta_list member of NvDsFrameMeta for primary (full frame) mode, or the obj_user_meta_list member of NvDsObjectMeta for secondary (object) mode.
For guidance on how to access user metadata, see User/Custom Metadata Addition Inside NvDsBatchMeta and Tensor Metadata, above.

Gst-nvinferserver

The Gst-nvinferserver plugin does inferencing on input data using NVIDIA® Triton Inference Server (previously called TensorRT Inference Server) Release 1.12.0 corresponding to NGC container 20.03. Refer to https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html for Triton Inference Server (Triton) documentation.
The plugin accepts batched NV12/RGBA buffers from upstream. The NvDsBatchMeta structure must already be attached to the Gst Buffers.
The low-level library (libnvds_infer_server) operates on any of NV12 or RGBA buffers.
The Gst-nvinferserver plugin passes the input batched buffers to the low-level library and waits for the results to be available. Meanwhile, it keeps queuing input buffers to the low-level library as they are received. Once the results are available from the low-level library, the plugin translates them and attaches them back to the Gst Buffer for downstream plugins.
The low-level library preprocesses the transformed frames (performs color conversion and scaling, normalization and mean subtraction) and produces final FP32/FP16/INT8/UINT8/INT16/UINT16/INT32/UINT32 RGB/BGR/GRAY planar/packed data which is passed to Triton for inferencing. The output type generated by the low-level library depends on the network type.
The pre-processing function is:
y = net-scale-factor * (x - mean)
Where:
x is the input pixel value. It is a uint8 with range [0,255].
mean is the corresponding mean value, read either from the mean file or as offsets[c], where c is the channel to which the input pixel belongs, and offsets is the array specified in the configuration file. It is a float.
net-scale-factor is the pixel scaling factor specified in the configuration file. It is a float.
y is the corresponding output pixel value. It can be of type float / half / int8 / uint8 / int16 / uint16 / int32 / uint32.
As a specific example, for uint8-to-int8 conversion, set net-scale-factor = 1.0 and mean = [128, 128, 128]; the function then becomes y = x - 128, which maps the input range [0,255] to [-128,127].
Gst-nvinferserver currently works on the following types of networks:
Multi-class object detection
Multi-label classification
Segmentation
The Gst-nvinferserver plugin can work in two modes:
Primary mode: Operates on full frames
Secondary mode: Operates on objects added in the meta by upstream components
When the plugin is operating as a secondary classifier in async mode along with the tracker, it tries to improve performance by avoiding re-inferencing on the same objects in every frame. It does this by caching the classification output in a map with the object’s unique ID as the key. The object is inferred upon only when it is first seen in a frame (based on its object ID) or when the size (bounding box area) of the object increases by 20% or more. This optimization is possible only when the tracker is added as an upstream element.
Detailed documentation of the Triton Inference Server is available at:
The plugin supports Triton features along with multiple deep-learning frameworks such as TensorRT, TensorFlow (GraphDef / SavedModel), ONNX and PyTorch on Tesla platforms. On Jetson, it also supports TensorRT and TensorFlow (GraphDef / SavedModel). TensorFlow and ONNX can be configured with TensorRT acceleration. For details, see Framework-Specific Optimization.
The plugin requires a configurable model repository root directory path where all the models need to reside. All the plugin instances in a single process must share the same model root. For details, see Model Repository. Each model also needs a specific config.pbtxt file in its subdirectory. For details, see Model Configuration.
The plugin supports Triton ensemble mode in case users need to do preprocessing or postprocessing with a Triton custom backend.
The plugin also supports the interface for custom functions for parsing outputs of object detectors, classifiers, and initialization of non-image input layers in cases where there is more than one input layer.
Refer to sources/includes/nvdsinfer_custom_impl.h for the custom method implementations for custom models.
Downstream components receive a Gst Buffer with unmodified contents plus the metadata created from the inference output of the Gst-nvinferserver plugin.
The plugin can be used for cascaded inferencing. That is, it can perform primary inferencing directly on input data, then perform secondary inferencing on the results of primary inferencing, and so on. This is similar to Gst-nvinfer; see Gst-nvinfer for more details.

Inputs and Outputs

This section summarizes the inputs, outputs, and communication facilities of the Gst-nvinferserver plugin.
Inputs
Gst Buffer
NvDsBatchMeta (attaching NvDsFrameMeta)
Model repository directory path (model_repo.root)
Runtime model file with config.pbtxt file in model repository
Control parameters:
Gst-nvinferserver gets control parameters from a configuration file. You can specify this by setting the property config-file-path. For details, see Gst-nvinferserver File Configuration Specifications. Other control parameters that can be set through GObject properties are:
Batch size
Process mode
Unique id
Inference on GIE id and operate on class ids [secondary mode only]
Inference interval
Raw output generated callback function
The parameters set through the GObject properties override the parameters in the Gst-nvinferserver configuration file.
Outputs
Gst Buffer
Depending on network type and configured parameters, one or more of:
NvDsObjectMeta
NvDsClassifierMeta
NvDsInferSegmentationMeta
NvDsInferTensorMeta

Features

The following table summarizes the features of the plugin.
Features of the Gst-nvinferserver plugin
Feature
dGPU
Jetson
Release
Gst-nvinferserver Running on Host
No
Yes
DS 5.0
Running on Docker Image
Yes
No
DS 5.0
DS Preprocessing: Network input format: RGB/BGR/Gray
Yes
Yes
DS 5.0
DS Preprocessing: Network input data types FP32/FP16/UINT8/INT8/UINT16/INT16/UINT32/INT32
Yes
Yes
DS 5.0
DS Preprocessing: Network input tensor orders
NCHW / NHWC
Yes
Yes
DS 5.0
Mem: Cuda(GPU) buf-sharing for Input Tensors
Yes
Yes
DS 5.0
Mem: Cuda Memory (GPU / CPU-pinned) for output tensors
Yes
Yes
DS 5.0
Backend: TensorRT runtime (plan engine file)
Yes
Yes
DS 5.0
Backend: Tensorflow Runtime CPU/GPU (graphdef/savedmodel)
Yes
Yes
DS 5.0
Backend: Tensorflow Runtime with TF-TRT acceleration
Yes
Yes
DS 5.0
Backend: ONNX Runtime
Yes
No
DS 5.0
Backend: ONNX Runtime with ONNX-TRT acceleration
Yes
No
DS 5.0
Backend: Pytorch Runtime
Yes
No
DS 5.0
Backend: Caffe2 Runtime [deprecated, will be removed in a future version]
Yes
(very few tests)
No
DS 5.0
Postprocessing: DS Detection / Classification/ Segmentation
Yes
Yes
DS 5.0
Postprocessing: DS Detection cluster method: NMS / GroupRectangle / DBSCan / None
Yes
Yes
DS 5.0
Postprocessing: custom parsing (NvDsInferParseCustomTfSSD)
Yes
Yes
DS 5.0
Postprocessing: Triton native classification
Yes
Yes
DS 5.0
Triton Ensemble Mode (Triton preproc/postproc) with specified media-format (RGB/BGR/Gray) with Cuda GPU buffer as inputs
Yes
Yes
DS 5.0
Postprocessing: Attach Triton raw tensor output in NvDsInferTensorMeta for downstream or application postprocessing
Yes
Yes
DS 5.0
deepstream-app: pipeline works with PGIE / SGIE / nvtracker
Yes
Yes
DS 5.0
Sample App: deepstream-segmentation-test / deepstream-infer-tensor-meta-test
Yes
Yes
DS 5.0
Basic LSTM features on single batch and single stream (beta version, config file might be changed in future version)
Yes
Yes
DS 5.0

Gst-nvinferserver File Configuration Specifications

The Gst-nvinferserver configuration file uses prototxt format described in:
https://developers.google.com/protocol-buffers
The protobuf message structures of this configuration file are defined by nvdsinferserver_plugin.proto and nvdsinferserver_config.proto. Following protobuf conventions, all basic data-type fields default to 0 or false, and maps, arrays, and oneof fields are empty by default. See the details for each message definition.
The message PluginControl in nvdsinferserver_plugin.proto is the entry point for this config-file.
The message InferenceConfig configures the low-level settings for libnvds_infer_server.
The message PluginControl::InputControl configures the input buffers, objects filtering policy for model inference.
The message PluginControl::OutputControl configures inference output policy for detections and raw tensor metadata.
The message BackendParams configures backend input/output layers and Triton settings in InferenceConfig.
The message PreProcessParams configures network preprocessing information in InferenceConfig.
The message PostProcessParams configures the output tensor parsing methods such as detection, classification, segmentation and others in InferenceConfig.
There are also other messages (e.g. CustomLib, ExtraControl) and enum types (e.g. MediaFormat, TensorOrder, ...) defined in the proto file for miscellaneous settings for InferenceConfig and PluginControl.
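As an illustrative sketch only, assembled from the fields described in the tables below (the model name, paths, and values are placeholders, not a tested configuration), a configuration file could look like:
infer_config {
  unique_id: 1
  gpu_ids: [0]
  max_batch_size: 1
  backend {
    trt_is {
      model_name: "ssd_inception_graphdef"
      version: -1
      model_repo {
        root: "../trtis_model_repo"
        log_level: 2
      }
    }
  }
  preprocess {
    network_format: IMAGE_FORMAT_RGB
    tensor_order: TENSOR_ORDER_NONE
    normalize {
      scale_factor: 0.0078
      channel_offsets: [128, 128, 128]
    }
  }
  postprocess {
    labelfile_path: "/home/ubuntu/model_labels.txt"
    detection {
      num_detected_classes: 4
      custom_parse_bbox_func: "NvDsInferParseCustomTfSSD"
      nms {
        confidence_threshold: 0.3
        iou_threshold: 0.4
      }
    }
  }
  custom_lib {
    path: "/home/ubuntu/lib_custom_impl.so"
  }
}
input_control {
  process_mode: PROCESS_MODE_FULL_FRAME
  interval: 0
}
output_control {
  output_tensor_meta: false
}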
Gst-nvinferserver plugin, message PluginControl definition details
Fields
Meaning
Type and Range
Example
Notes
Network Types/ Applicable to GIEs (Primary/Secondary)
infer_config
Low-level libnvds_infer_server inference configuration settings
InferenceConfig
infer_config { … }
see details in InferenceConfig
All
Both
input_control
Control plugin input buffers, objects filtering policy for inference
PluginControl ::InputControl
input_control{
process_mode: PROCESS_MODE_FULL_FRAME
}
see details in InputControl
All
Both
output_control
Control plugin output metadata filtering policy after inference
PluginControl ::OutputControl
output_control { … }
see details in OutputControl
All
Both
 
Gst-nvinferserver plugin, message PluginControl::InputControl definition details
Property
Meaning
Type and Range
Example
Notes
Network Types/ Applicable to GIEs (Primary/Secondary)
process_mode
Processing mode, selected from PluginControl::ProcessMode. In deepstream-app, PGIEs use PROCESS_MODE_FULL_FRAME by default and SGIEs use PROCESS_MODE_CLIP_OBJECTS by default
enum PluginControl::ProcessMode
process_mode: PROCESS_MODE_FULL_FRAME
All
Both
operate_on_gie_id
Unique ID of the GIE on whose metadata (bounding boxes) this GIE is to operate on
int32, >=0, valid gie-id.
-1, disable gie-id check, inference on all GIE Ids
operate_on_gie_id: 1
All
Secondary
operate_on_class_ids
Class IDs of the parent GIE on which this GIE is to operate on
Comma delimited int32 array
operate_on_class_ids: [1, 2]
Operates on objects with class IDs 1, 2 generated by parent GIE
All
Secondary
interval
Specifies the number of consecutive batches to be skipped for inference. Default is 0.
uint32
interval: 1
All
Primary
async_mode
Enables inference on detected objects and asynchronous metadata attachment. Works only when tracker IDs are attached. Pushes the buffer downstream without waiting for inference results and attaches the metadata once the inference results are available
bool
async_mode: false
Classifier
Secondary
object_control
input object filter settings
PluginControl::InputObjectControl
object_control {
bbox_filter {
min_width: 64
min_height: 64
}
}
see details in
InputObjectControl
All
Secondary
 
Gst-nvinferserver plugin, message PluginControl::OutputControl definition details
Property
Meaning
Type and Range
Example
Notes
Network Types/ Applicable to GIEs (Primary/Secondary)
output_tensor_meta
Enable attaching inference output tensor metadata; the tensor buffer pointer is valid for host memory only
bool
output_tensor_meta: false
All
Both
detect_control
Specifies detection output filter policy
PluginControl::OutputDetectionControl
detect_control {
default_filter {
bbox_filter {
min_width: 32
min_height: 32
}
}
}
see details in OutputDetectionControl
Detector
Both
 
Gst-nvinferserver plugin, message PluginControl::InputObjectControl definition details
Fields
Meaning
Type and Range
Example
Notes
Network Types/ Applicable to GIEs (Primary/Secondary)
bbox_filter
Bounding box filter
PluginControl::BBoxFilter
 
bbox_filter {
min_width: 32
min_height: 32
}
see details in BBoxFilter
All
Secondary
 
Gst-nvinferserver plugin, message PluginControl::BBoxFilter definition details for Input and Output controls
Fields
Meaning
Type and Range
Example
Notes
Network Types / Applicable to GIEs (Primary/Secondary)
min_width
Bounding box minimum width 
uint32
min_width: 64
All
Both
min_height
Bounding box minimum height
uint32
min_height: 64
All
Both
max_width
Bounding box maximum width, default 0, max_width is ignored 
uint32
max_width: 640
All
Both
max_height
Bounding box maximum height.
default 0, max_height is ignored 
uint32
max_height: 640
All
Both
 
Gst-nvinferserver plugin, message PluginControl::OutputDetectionControl definition details
Fields
Meaning
Type and Range
Example
Notes
Network Types / Applicable to GIEs (Primary/Secondary)
default_filter
default detection filter for output controls
PluginControl::DetectClassFilter
default_filter {
bbox_filter {
min_width: 32
min_height: 32
}
}
see details in DetectClassFilter
All
Both
specific_class_filters
specifies detection filters per class to replace default filter
map<uint32, DetectClassFilter>
specific_class_filters: [
{ key: 1, value {...} },
{ key: 2,
value {...} }
]
All
Both
 
Gst-nvinferserver plugin, message PluginControl::DetectClassFilter definition details
Fields
Meaning
Type and Range
Example
Notes
Network Types/ Applicable to GIEs (Primary/Secondary)
bbox_filter
detection bounding box filter
PluginControl::BBoxFilter
bbox_filter {
min_width: 64
min_height: 64
}
Detection
Both
roi_top_offset
Offset of the RoI from the top of the frame. Only objects within the RoI are output.
 
uint32
roi_top_offset: 128
Detection
Both
roi_bottom_offset
Offset of the RoI from the bottom of the frame. Only objects within the RoI are output.
uint32
roi_bottom_offset: 
Detection
Both
border_color
specify border color for detection bounding boxes
PluginControl::Color
border_color {
r: 1.0
g: 0.0
b: 0.0
a: 1.0
}
Detection
Both
bg_color
specify background color for detection bounding boxes
PluginControl::Color
bg_color {
r: 0.0
g: 1.0
b: 0.0
a: 0.5
}
Detection
Both
 
Gst-nvinferserver plugin, message PluginControl::Color definition details
Fields
Meaning
Type and Range
Example
Notes
Network Types/ Applicable to GIEs (Primary/Secondary)
r
Red color value
float. Range[0.0, 1.0]
r: 0.5
All
Both
g
Green color value
float. Range[0.0, 1.0]
g: 0.5
All
Both
b
Blue color value
float. Range[0.0, 1.0]
b: 0.3
All
Both
a
Alpha blending value
float. Range[0.0, 1.0]
a: 1.0
All
Both
The message InferenceConfig defines all the low-level structure fields in nvdsinferserver_config.proto. It has major settings for inference backend, network preprocessing and postprocessing.
message InferenceConfig definition details
Name
Description
Type and Range
Example
Notes
Detector or Classifier / Applicable to GIEs (Primary/Secondary)
unique_id
Unique ID identifying metadata generated by this GIE
uint32, ≥0
unique_id: 1
All
Both
gpu_ids
Device IDs of GPU to use for pre-processing/inference (single GPU support only)
int32 array, ≥0
gpu_ids: [0]
All
Both
max_batch_size
Max number of frames/objects to be inferred together in a batch
uint32, ≥0
max_batch_size: 1
All
Both
backend
Inference backend settings
BackendParams
backend {
trt_is { ... }
}
see details in BackendParams
All
Both
preprocess
Network preprocessing setting for color conversion scale and normalization
PreProcessParams
preprocess {
normalize { … }
}
see details in PreProcessParams
All
Both
postprocess
Inference output tensor parsing methods such as detection, classification, segmentation and others
PostProcessParams
postprocess {
detection {...}
}
see details in PostProcessParams
All
Both
custom_lib
Specify custom lib path for custom parsing functions and preloads, optional
CustomLib
custom_lib {
path: "./libcustom_parsing.so"
}
All
Both
extra
extra controls for inference config.
ExtraControl
extra {
output_buffer_pool_size: 2
}
see details in ExtraControl
All
Both
lstm
LSTM control parameters, limited on batch-size 1 and single stream
LstmParams
[optional]
lstm {
loops {
input: “init_lstm_c”
output: “output/lstm_c”
init_const { value: 0 }
}
}
See details in LstmParams
All
Both
 
message BackendParams definition details
Name
Description
Type and Range
Example
Notes
Detector or Classifier / Applicable to GIEs (Primary/Secondary)
inputs
Backend input layer settings, optional
InputLayer arrays
see details in InputLayer
All
Both
outputs
Backend output layer settings, optional
OutputLayer arrays
see details in OutputLayer
All
Both
trt_is
backend of Triton Inference Server settings
TrtISParams
see details in TrtISParams
All
Both
 
message InputLayer definition details
Name
Description
Type and Range
Example
Notes
Detector or Classifier / Applicable to GIEs (Primary/Secondary)
name
input tensor name
string
name: “input_0”
All
Both
dims
Input tensor shape, optional. Only required if the backend cannot deduce fixed input shapes
int32 array,
> 0
dims: [299, 299, 3]
All
Both
data_type
enum TensorDataType with types:
TENSOR_DT_NONE,
TENSOR_DT_FP32,
TENSOR_DT_FP16,
TENSOR_DT_INT8,
TENSOR_DT_INT16,
TENSOR_DT_INT32,
TENSOR_DT_UINT8,
TENSOR_DT_UINT16,
TENSOR_DT_UINT32
Default TENSOR_DT_NONE, usually can be deduced from Triton model config.pbtxt
TensorDataType
data_type: TENSOR_DT_FP32
All
Both
 
message OutputLayer definition details
Name
Description
Type and Range
Example
Notes
Detector or Classifier / Applicable to GIEs (Primary/Secondary)
name
output tensor name
string
name: “detection_boxes”
All
Both
 
message TrtISParams definition details
Name
Description
Type and Range
Example
Notes
Detector or Classifier / Applicable to GIEs (Primary/Secondary)
model_name
Triton inference model name
string
model_name: “ssd_inception_graphdef”
All
Both
version
Triton model version number.
-1, latest version number.
>0, reserved for specific version number in future version
int64
version: -1
All
Both
model_repo
Triton model repository settings.
Note: all model_repo settings must be the same in a single process
TrtISParams::ModelRepo
model_repo {
root: “../trtis_model_repo”
log_level: 2
}
see more details in ModelRepo
All
Both
 
message TrtISParams::ModelRepo definition details
Name
Description
Type and Range
Example
Notes
Detector or Classifier / Applicable to GIEs (Primary/Secondary)
root
Triton inference model repository directory path 
string
root: “../trtis_model_repo”
All
Both
log_level
Triton log output levels
 
uint32;
0, ERROR;
1, WARNING;
2, INFO;
>=3, VERBOSE Level
log_level: 1
All
Both
strict_model_config
Enable Triton strict model configuration, see details in Triton Generated Model Configuration. Suggest setting value true
bool
strict_model_config: true
All
Both
tf_gpu_memory_fraction
TensorFlow GPU memory fraction per process. Valid for Tensorflow models only.
Default 0 means no GPU memory limitation. Suggest tuning to a proper value (e.g. in range of [0.2, 0.6]) in case Tensorflow uses up whole GPU memory
float,
Range (0, 1.0]
tf_gpu_memory_fraction: 0.6
All
Both
tf_disable_soft_placement
Disable TensorFlow soft placement of operators. It’s enabled by default.
bool
tf_disable_soft_placement: false
All
Both
 
message PreProcessParams definition details
Name
Description
Type and Range
Example
Notes
Detector or Classifier / Applicable to GIEs (Primary/Secondary)
network_format
enum MediaFormat with formats:
MEDIA_FORMAT_NONE
IMAGE_FORMAT_RGB
IMAGE_FORMAT_BGR
IMAGE_FORMAT_GRAY
Uses IMAGE_FORMAT_RGB by default.
MediaFormat
network_format: IMAGE_FORMAT_RGB
All
Both
tensor_order
enum TensorOrder with order types:
TENSOR_ORDER_NONE,
TENSOR_ORDER_LINEAR,
TENSOR_ORDER_NHWC.
It can deduce the value from backend layers info if set to TENSOR_ORDER_NONE
TensorOrder
tensor_order: TENSOR_ORDER_NONE
All
Both
tensor_name
Specify the tensor name for the preprocessing buffer.
This is for the case when there are multiple input tensors in a single network.
string;
Optional
tensor_name: “input_0”
All
Both
frame_scaling_hw
Compute hardware to use for scaling frames / object crops to network resolution
enum FrameScalingHW
FRAME_SCALING_HW_DEFAULT: Platform default – GPU (dGPU), VIC (Jetson)
FRAME_SCALING_HW_GPU
FRAME_SCALING_HW_VIC (Jetson only)
frame_scaling_hw: FRAME_SCALING_HW_GPU
All
Both
frame_scaling_filter
The filter to use for scaling frames / object crops to network resolution
int32, refer to enum NvBufSurfTransform_Inter in nvbufsurftransform.h for valid values
frame_scaling_filter: 1
All
Both
maintain_aspect_ratio
Indicates whether to maintain aspect ratio while scaling input.
int32;
0 or 1
maintain_aspect_ratio: 0
All
Both
normalize
Network input tensor normalization settings for scale-factors, offsets and mean-subtraction
PreProcessParams::ScaleNormalize
normalize {
scale_factor: 1.0
channel_offsets: [0, 0, 0]
}
see details in PreProcessParams::ScaleNormalize
 
 
message PreProcessParams::ScaleNormalize definition details
Name
Description
Type and Range
Example
Notes
Detector or Classifier / Applicable to GIEs (Primary/Secondary)
scale_factor
Pixel normalization factor
float
scale_factor: 0.0078
All
Both
channel_offsets
Array of mean values of color components to be subtracted from each pixel. Array length must equal the number of color components in the frame. The plugin multiplies mean values by scale_factor
float array,
Optional
channel_offsets: [77.5, 21.2, 11.8]
All
Both
mean_file
Pathname of mean data file (PPM format)
string;
Optional
mean_file: “./model_meanfile.ppm”
All
Both
 
message PostProcessParams definition details
Name
Description
Type and Range
Example
Notes
Detector or Classifier / Applicable to GIEs (Primary/Secondary)
labelfile_path
Pathname of a text file containing the labels for the model
string
labelfile_path: “/home/ubuntu/model_labels.txt”
All
Both
oneof process_type
Indicates one of the postprocessing types
detection;
classification;
segmentation;
other;
None
N/A
All
Both
detection
Specify detection parameters for the network.
It must be oneof process_type
DetectionParams
detection {
num_detected_classes: 4
simple_cluster {
threshold: 0.2
}
}
see details in DetectionParams
Detector
Both
classification
Specify classification parameters for the network
It is oneof process_type
ClassificationParams
classification {
threshold: 0.6
}
see details in ClassificationParams
Classifier
Both
segmentation
Specify segmentation parameters for the network
It is oneof process_type
SegmentationParams
segmentation {
threshold: 0.2
}
Segmentation
Both
other
Specify other network parameters.
This is for user-defined networks and usually coexists with output_control.output_tensor_meta: true. Tensor output data is attached to the GstBuffer and can be parsed in the application. Users can increase extra.output_buffer_pool_size if they need to hold the metadata longer.
It is oneof process_type
OtherNetworkParams
other {}
see details in OtherNetworkParams
Others
Both
trtis_classification
Specify Triton classification parameters for the network
It is oneof process_type
TrtIsClassifyParams
trtis_classification {
topk: 1
}
see details in TrtIsClassifyParams
Classifier
Both
 
message DetectionParams definition details
Name
Description
Type and Range
Example
Notes
Detector or Classifier / Applicable to GIEs (Primary/Secondary)
num_detected_classes
Define number of classes detected by the network
int32,
> 0
num_detected_classes:4
Detector
Both
per_class_params
Map of specific detection parameters per class. Key-value follows <class_id: per_class_params> order. 
map<int32, PerClassParams>;
Optional
per_class_params {
{ key: 1,
value { pre_threshold : 0.4}
},
{ key: 2,
value { pre_threshold : 0.5}
}
}
see details for PerClassParams
Detector
Both
custom_parse_bbox_func
Name of the custom bounding box parsing function. If not specified, Gst-nvinferserver uses the internal function for the resnet model provided by the SDK.
If specified, also need to set custom_lib to load custom library.
string;
custom_parse_bbox_func: "NvDsInferParseCustomTfSSD"
Detector
Both
oneof clustering_policy
Indicates one of the clustering policies from
nms;
dbscan;
group_rectangle;
simple_cluster;
None
N/A
Detector
Both
nms
Indicates clustering of bounding boxes for detected objects by the Non-Maximum-Suppression method.
It is oneof clustering_policy
Nms
nms {
confidence_threshold: 0.3
iou_threshold: 0.4
}
see details in Nms
Detector
Both
dbscan
Indicates clustering bounding boxes by DBSCAN method for detected objects.
It is oneof clustering_policy
DbScan
dbscan {
pre_threshold: 0.3
eps: 0.7
min_boxes: 3
}
see details in DbScan
Detector
Both
group_rectangle
Indicates clustering bounding boxes by groupRectangles() function for grouping detected objects
It is oneof clustering_policy
GroupRectangle
group_rectangle {
confidence_threshold: 0.2
group_threshold: 2
eps: 0.2
}
Detector
Both
simple_cluster
Indicates a simple clustering method that rejects outlier boxes by threshold
SimpleCluster
simple_cluster {
threshold: 0.2
}
Detector
Both
 
message DetectionParams::PerClassParams definition details
Name
Description
Type and Range
Example
Notes
Detector or Classifier / Applicable to GIEs (Primary/Secondary)
pre_threshold
Define confidence threshold per class
float
pre_threshold:0.3
Detector
Both
 
message DetectionParams::Nms definition details
Name
Description
Type and Range
Example
Notes
Detector or Classifier / Applicable to GIEs (Primary/Secondary)
confidence_threshold
Detection score lesser than this threshold would be rejected
float
confidence_threshold:0.5
Detector
Both
iou_threshold
Maximum IOU score between two proposals after which the proposal with the lower confidence will be rejected.
float
iou_threshold: 0.3
Detector
Both
topk
Specify top k detection results to keep after nms
int32, >= 0
topk: 2;
value 0, means keep all.
Detector
Both
 
message DetectionParams::DbScan definition details
Name
Description
Type and Range
Example
Notes
Detector or Classifier / Applicable to GIEs (Primary/Secondary)
pre_threshold
Detection score lesser than this threshold would be rejected before DBSCAN clustering
float
pre_threshold:0.2
Detector
Both
eps
DBSCAN epsilon to control merging of overlapping boxes.
float
eps: 0.7
Detector
Both
min_boxes
Minimum boxes in DBSCAN cluster to be considered an object
int32, > 0
min_boxes: 3;
Detector
Both
min_score
Minimum score in DBSCAN cluster for it to be considered as an object
float
min_score: 0.7
Default value is 0
Detector
Both
 
message DetectionParams::GroupRectangle definition details
Name
Description
Type and Range
Example
Notes
Detector or Classifier / Applicable to GIEs (Primary/Secondary)
confidence_threshold
Detection score lesser than this threshold would be rejected
float
confidence_threshold:0.2
Detector
Both
group_threshold
Threshold value for rectangle merging for OpenCV grouprectangles() function
int32; >= 0
group_threshold: 1
Detector
Both
eps
Epsilon to control merging of overlapping boxes
float
eps: 0.2
Detector
Both
 
message DetectionParams::SimpleCluster definition details
Name
Description
Type and Range
Example
Notes
Detector or Classifier / Applicable to GIEs (Primary/Secondary)
threshold
Detection score lesser than this threshold would be rejected
float
threshold: 0.6
Detector
Both
message ClassificationParams definition details
Name
Description
Type and Range
Example
Notes
Detector or Classifier / Applicable to GIEs (Primary/Secondary)
threshold
Classification score lesser than this threshold would be rejected
float
threshold: 0.5
Classifier
Both
custom_parse_classifier_func
Name of the custom classifier output parsing function.
If not specified, Gst-nvinferserver uses the internal parsing function with NCHW tensor order for softmax layers. Users can reshape other output tensor orders to NCHW in the Triton config.pbtxt to run the internal parsing.
If specified, custom_lib must also be set to load the custom library.
string
custom_parse_classifier_func: “parse_bbox_softmax”
Classifier
Both
 
message SegmentationParams definition details
Name
Description
Type and Range
Example
Notes
Detector or Classifier / Applicable to GIEs (Primary/Secondary)
threshold
Segmentation score lesser than this threshold would be rejected
float
threshold: 0.5
Segmentation
Both
 
message OtherNetworkParams definition details
Name
Description
Type and Range
Example
Notes
Detector or Classifier / Applicable to GIEs (Primary/Secondary)
type_name
Specify a user-defined network name
string;
Optional
type_name: “face”
Others
Both
 
message TrtIsClassifyParams definition details
Name
Description
Type and Range
Example
Notes
Detector or Classifier/ Applicable to GIEs (Primary/Secondary)
topk
Specify the top k elements to keep from Triton’s native classification
uint32; >=0
topk : 1
Value 0 or empty would keep the top 1 result.
Classifier
Both
threshold
Classification score lesser than this threshold would be rejected
float
threshold: 0.5
Classifier
Both
 
message CustomLib definition details
Name
Description
Type and Range
Example
Notes
Detector or Classifier / Applicable to GIEs (Primary/Secondary)
path
Pathname that points to a custom library for preload
string
path: “/home/ubuntu/lib_custom_impl.so”
All
Both
 
message ExtraControl definition details
Name
Description
Type and Range
Example
Notes
Detector or Classifier / Applicable to GIEs (Primary/Secondary)
copy_input_to_host_buffers
Enable copying input tensor data to host buffers.
If enabled, the input tensors are attached as part of the NvDsInferTensorMeta in the GstBuffer along with the output tensors
bool
copy_input_to_host_buffers: false
All
Both
output_buffer_pool_size
Specify the buffer pool size for each output tensor.
When infer_config.postprocess.other is specified or output_control.output_tensor_meta is enabled, the output tensors are attached as NvDsInferTensorMeta to the GstBuffer
int32;
Range [2, 10]
output_buffer_pool_size: 4
All
Both
 
message LstmParams definition details
Note: LstmParams structures might be changed in future versions
Name
Description
Type and Range
Example
Notes
Detector or Classifier / Applicable to GIEs (Primary/Secondary)
loops
Specify LSTM loops between input and output tensors.
LstmLoop
[repeated]
loops [ {
input: “init_state”
output: “out_state”
} ]
See details in LstmParams::LstmLoop
All
Both
 
message LstmParams::LstmLoop definition details
Note: input and output tensors must have same datatype/dimensions, FP16 is not supported
LstmParams::LstmLoop structures might be changed in future versions
Name
Description
Type and Range
Example
Notes
Detector or Classifier / Applicable to GIEs (Primary/Secondary)
input
Specify input tensor name of the current loop.
string
input: “init_state”
All
Both
output
Specify the output tensor name of the current loop. Tensor data will feed back to the input tensor
string
output: “output_state”
All
Both
init_const
Specify the constant values for the input in first frame
InitConst
value: float
init_const { value: 0 }
All
Both

Gst Properties

The values set through Gst properties override the values of properties in the configuration file. The application does this for certain properties that it needs to set programmatically. If a property is set on the plugin, its value replaces the corresponding value in the configuration file.
The following table describes the Gst-nvinferserver plugin’s Gst properties.
Gst-nvinferserver plugin, Gst properties
Property
Meaning
Type and Range
Example
Notes
config-file-path
Absolute pathname of configuration file for the Gst-nvinferserver element
String
config-file-path=config_infer_primary.txt
process-mode
Infer Processing Mode
(0):None, (1)FullFrame, (2)ClipObject.
If set, it could replace input_control.process_mode
 
Integer, 0, 1 or 2
process-mode=1
unique-id
Unique ID identifying metadata generated by this GIE.
If set, it could replace infer_config.unique_id
Integer,
0 to 4,294,967,295
unique-id=1
infer-on-gie-id
See input_control.operate_on_gie_id in the configuration file table
Integer,
0 to 4,294,967,295
infer-on-gie-id=1
operate-on-class-ids
See input_control.operate_on_class_ids in the configuration file table
An array of colon-separated integers (class-ids)
operate-on-class-ids=1:2:4
batch-size
Number of frames/objects to be inferred together in a batch.
If set, it overrides infer_config.max_batch_size
Integer,
1 – 4,294,967,295
batch-size=4
interval
Number of consecutive batches to be skipped for inference.
If set, it overrides input_control.interval
Integer, 0 to 32
interval=0
raw-output-generated-callback
Pointer to the raw output generated callback function
Pointer
Cannot be set through gst-launch
raw-output-generated-userdata
Pointer to user data to be supplied with raw-output-generated-callback
Pointer
Cannot be set through gst-launch

Triton Ensemble Models

The Gst-nvinferserver plugin supports Triton ensemble models, enabling custom preprocessing, backends, and postprocessing through Triton custom backends.
A Triton ensemble model represents a pipeline of one or more models and the connections between their input and output tensors, such as “data preprocessing -> inference -> data postprocessing”. See the Triton Inference Server documentation for more details.
To manage memory efficiently and keep the interface clean, the Gst-nvinferserver plugin’s default preprocessing cannot be disabled. Color conversion, datatype conversion, input scaling, and object cropping continue to run natively in nvds_infer_server. For example, if native normalization is not needed, set scale_factor to 1.0:
infer_config { preprocess {
network_format: IMAGE_FORMAT_RGB
tensor_order: TENSOR_ORDER_LINEAR
normalize { scale_factor: 1.0 } } }
The low-level nvds_infer_server library can deliver the specified media format (RGB/BGR/GRAY) in any supported tensor order and datatype as a CUDA GPU buffer input to the Triton backend. The user’s custom backend must support GPU memory for this input. The Triton custom-backend sample identity works with the Gst-nvinferserver plugin.
Note:
The custom backend API must be built against the same Triton codebase version (v1.12.0).
For Jetson, download v1.12.0-jetpack4.4dp.tgz.
For details on how to implement a Triton custom backend, refer to the Triton Inference Server documentation.
For a Triton model’s output, nvds_infer_server supports both TRTSERVER_MEMORY_GPU and TRTSERVER_MEMORY_CPU buffer allocations, according to the Triton output request. This also applies to an ensemble model’s final output tensors.
Finally, inference data can be parsed by the default detection, classification, or segmentation postprocessing. Alternatively, the user can implement postprocessing in a custom backend and deliver the final output to the Gst-nvinferserver plugin for further processing. The raw output tensor data can also optionally be attached to the metadata for downstream components or the application to parse.

Tensor Metadata

The Gst-nvinferserver plugin can attach raw output tensor data generated by the inference backend as metadata. It is added as an NvDsInferTensorMeta in the frame_user_meta_list member of NvDsFrameMeta for primary (full frame) mode, or in the obj_user_meta_list member of NvDsObjectMeta for secondary (object) mode. It uses the same metadata structure as the Gst-nvinfer plugin.
Note:
The Gst-nvinferserver plugin does not currently attach the device buffer pointer NvDsInferTensorMeta::out_buf_ptrs_dev.
To read or parse the raw tensor data of output layers:
1. Enable
output_control { output_tensor_meta : true }
If native postprocessing needs to be disabled, also update:
infer_config { postprocess { other {} } }
in the configuration file for the Gst-nvinferserver plugin.
2. When operating as primary GIE, NvDsInferTensorMeta is attached to each frame’s (each NvDsFrameMeta object’s) frame_user_meta_list. When operating as secondary GIE, NvDsInferTensorMeta is attached to each NvDsObjectMeta object’s obj_user_meta_list.
Metadata attached by Gst-nvinferserver can be accessed in a GStreamer pad probe attached downstream from the Gst-nvinferserver instance.
3. The NvDsInferTensorMeta object’s metadata type is set to NVDSINFER_TENSOR_OUTPUT_META. To get this metadata you must iterate over the NvDsUserMeta user metadata objects in the list referenced by frame_user_meta_list or obj_user_meta_list.
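As a minimal sketch (not taken from the sample application), a downstream pad probe that locates this metadata could look like the following. The probe function name is illustrative; the metadata types and accessors are the usual DeepStream ones, assumed here to come from gstnvdsmeta.h and gstnvdsinfer.h:
#include <gst/gst.h>
#include "gstnvdsmeta.h"
#include "gstnvdsinfer.h"

/* Illustrative pad probe: walks the batch meta and reports each attached
   NvDsInferTensorMeta. For secondary (object) mode, iterate obj_meta_list
   and use obj_user_meta_list instead of frame_user_meta_list. */
static GstPadProbeReturn
tensor_meta_probe (GstPad *pad, GstPadProbeInfo *info, gpointer user_data)
{
    GstBuffer *buf = GST_PAD_PROBE_INFO_BUFFER (info);
    NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta (buf);
    if (!batch_meta)
        return GST_PAD_PROBE_OK;

    for (NvDsMetaList *l_frame = batch_meta->frame_meta_list; l_frame; l_frame = l_frame->next) {
        NvDsFrameMeta *frame_meta = (NvDsFrameMeta *) l_frame->data;
        for (NvDsMetaList *l_user = frame_meta->frame_user_meta_list; l_user; l_user = l_user->next) {
            NvDsUserMeta *user_meta = (NvDsUserMeta *) l_user->data;
            if (user_meta->base_meta.meta_type != NVDSINFER_TENSOR_OUTPUT_META)
                continue;
            NvDsInferTensorMeta *tensor_meta = (NvDsInferTensorMeta *) user_meta->user_meta_data;
            /* num_output_layers / output_layers_info describe the attached raw tensors. */
            g_print ("frame %d: %u output tensor(s) attached\n",
                     frame_meta->frame_num, tensor_meta->num_output_layers);
        }
    }
    return GST_PAD_PROBE_OK;
}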
For more information about Gst-infer tensor metadata usage, see the source code in sources/apps/sample_apps/deepstream_infer_tensor_meta-test.cpp, provided in the DeepStream SDK samples.

Segmentation Metadata

The Gst-nvinferserver plugin attaches the output of the segmentation model as user meta in an instance of NvDsInferSegmentationMeta with meta_type set to NVDSINFER_SEGMENTATION_META. The user meta is added to the frame_user_meta_list member of NvDsFrameMeta for primary (full frame) mode, or the obj_user_meta_list member of NvDsObjectMeta for secondary (object) mode.
For guidance on how to access user metadata, see User/Custom Metadata Addition Inside NvDsBatchMeta and Tensor Metadata, above.
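As an illustrative sketch only (the field names follow the NvDsInferSegmentationMeta definition in the SDK headers; verify them against your DeepStream version), the segmentation output can be read from a frame’s user meta list like this:
/* Illustrative helper: prints the segmentation map dimensions for one frame.
   Call it from a pad probe like the one shown under Tensor Metadata. */
static void
print_segmentation_meta (NvDsFrameMeta *frame_meta)
{
    for (NvDsMetaList *l_user = frame_meta->frame_user_meta_list; l_user; l_user = l_user->next) {
        NvDsUserMeta *user_meta = (NvDsUserMeta *) l_user->data;
        if (user_meta->base_meta.meta_type != NVDSINFER_SEGMENTATION_META)
            continue;
        NvDsInferSegmentationMeta *seg = (NvDsInferSegmentationMeta *) user_meta->user_meta_data;
        /* class_map holds a width x height map of per-pixel class indices;
           classes is the number of classes output by the model. */
        g_print ("segmentation map %ux%u, %u classes\n", seg->width, seg->height, seg->classes);
    }
}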

Gst-nvtracker

This plugin tracks detected objects and gives each new object a unique ID.
The plugin adapts a low-level tracker library to the pipeline. It supports any low-level library that implements the low-level API, including the three reference implementations, the NvDCF, KLT, and IOU trackers.
As part of this API, the plugin queries the low-level library for capabilities and requirements concerning input format and memory type. It then converts input buffers into the format requested by the low-level library. For example, the KLT tracker uses Luma-only format; NvDCF uses NV12 or RGBA; and IOU requires no buffer at all.
The low-level capabilities also include support for batch processing across multiple input streams. Batch processing is typically more efficient than processing each stream independently. If a low-level library supports batch processing, that is the preferred mode of operation. However, this preference can be overridden with the enable-batch-process configuration option if the low-level library supports both batch and per-stream modes.
The low-level capabilities also include support for passing past-frame data, i.e., object tracking data generated in past frames but not yet reported as output. This can happen when the low-level tracker stores tracking data from past frames internally (for example, because of low confidence) and later decides to report it (for example, because the confidence has increased). These past-frame data are reported as user meta. This behavior can be enabled with the enable-past-frame configuration option.
The plugin accepts NV12/RGBA data from the upstream component and scales (converts) the input buffer to a buffer in the format required by the low-level library, with tracker width and height. (Tracker width and height must be specified in the configuration file’s [tracker] section.)
The low-level tracker library is selected via the ll-lib-file configuration option in the tracker configuration section. The selected low-level library may also require its own configuration file, which can be specified via the ll-config-file option.
The three reference low-level libraries support different algorithms:
The KLT tracker uses a CPU-based implementation of the Kanade Lucas Tomasi (KLT) tracker algorithm. This library requires no configuration file.
The Intersection-over-Union (IOU) tracker uses the intersection of the detector’s bounding boxes across frames to determine the object’s unique ID. This library takes an optional configuration file.
The Nv-adapted Discriminative Correlation Filter (NvDCF) tracker uses a correlation filter-based online discriminative learning algorithm, coupled with a Hungarian algorithm for data association in multi-object tracking. This library accepts an optional configuration file.

Inputs and Outputs

This section summarizes the inputs, outputs, and communication facilities of the Gst-nvtracker plugin.
Inputs
Gst Buffer (batched)
NvDsBatchMeta
Formats supported are NV12 and RGBA.
Control parameters
tracker-width
tracker-height
gpu-id (for dGPU only)
ll-lib-file
ll-config-file
enable-batch-process
enable-past-frame
tracking-surface-type
display-tracking-id
Output
Gst Buffer (provided as an input)
NvDsBatchMeta (with addition of tracked object coordinates, tracker confidence and object IDs in NvDsObjectMeta)
Note:
If the tracker algorithm does not generate a confidence value, the tracker confidence is set to -0.1 for tracked objects.
For the KLT and IOU trackers, tracker_confidence is set to -0.1 because these algorithms do not generate confidence values for tracked objects. The NvDCF tracker generates confidence for tracked objects and sets it in the tracker_confidence field of the NvDsObjectMeta structure.

Features

The following table summarizes the features of the plugin.
Features of the Gst-nvtracker plugin
Feature
Description
Release
Configurable tracker width/height
Frames are internally scaled to specified resolution for tracking
DS 2.0
Multi-stream CPU/GPU tracker
Supports tracking on batched buffers consisting of frames from different sources
DS 2.0
NV12 Input
DS 2.0
RGBA Input
DS 3.0
Allows low FPS tracking
IOU tracker
DS 3.0
Configurable GPU device
User can select GPU for internal scaling/color format conversions and tracking
DS 2.0
Dynamic addition/deletion of sources at runtime
Supports tracking on new sources added at runtime and cleanup of resources when sources are removed
DS 3.0
Support for user’s choice of low-level library
Dynamically loads user selected low-level library
DS 4.0
Support for batch processing
Supports sending frames from multiple input streams to the low-level library as a batch if the low-level library advertises capability to handle that
DS 4.0
Support for multiple buffer formats as input to low-level library
Converts input buffer to formats requested by the low-level library, for up to 4 formats per frame
DS 4.0
Support for reporting past-frame data
Supports reporting past-frame data if the low-level library supports the capability
DS 5.0
Support for enabling tracking-id display
Supports enabling or disabling display of tracking-id
DS 5.0

Gst Properties

The following table describes the Gst properties of the Gst-nvtracker plugin.
Gst-nvtracker plugin, Gst Properties
Property
Meaning
Type and Range
Example
Notes
tracker-width
Frame width at which the tracker is to operate, in pixels.
Integer,
0 to 4,294,967,295
tracker-width=640
tracker-height
Frame height at which the tracker is to operate, in pixels.
Integer,
0 to 4,294,967,295
tracker-height=368
ll-lib-file
Pathname of the low-level tracker library to be loaded by Gst-nvtracker.
String
ll-lib-file=/opt/nvidia/deepstream/libnvds_nvdcf.so
ll-config-file
Configuration file for the low-level library if needed.
Path to configuration file
ll-config-file=/opt/nvidia/deepstream/tracker_config.yml
gpu-id
ID of the GPU on which device/unified memory is to be allocated, and with which buffer copy/scaling is to be done. (dGPU only.)
Integer,
0 to 4,294,967,295
gpu-id=1
enable-batch-process
Enables/disables batch processing mode. Only effective if the low-level library supports both batch and per-stream processing. (Optional.)
Boolean
enable-batch-process=1
enable-past-frame
Enables/disables reporting past-frame data mode. Only effective if the low-level library supports it.
Boolean
enable-past-frame=1
tracking-surface-type
Set surface stream type for tracking. (default value is 0)
Integer, ≥0
tracking-surface-type=0
display-tracking-id
Enables tracking id display on OSD.
Boolean
display-tracking-id=1
iou-threshold
Intersection over union threshold for considering two bounding boxes for association
float, [0..1]
iou-threshold=0.6

Custom Low-Level Library

To write a custom low-level tracker library, implement the API defined in sources/includes/nvdstracker.h. Parts of the API refer to sources/includes/nvbufsurface.h.
The names of API functions and data structures are prefixed with NvMOT, which stands for NVIDIA Multi-Object Tracker.
This is the general flow of the API from a low-level library perspective:
1. The first required function is:
NvMOTStatus NvMOT_Query(
uint16_t customConfigFilePathSize,
char* pCustomConfigFilePath,
NvMOTQuery *pQuery
);
The plugin uses this function to query the low-level library’s capabilities and requirements before it starts any processing sessions (contexts) with the library. Queried properties include the input frame memory format (e.g., RGBA or NV12), memory type (e.g., NVIDIA® CUDA® device or CPU mapped NVMM), and support for batch processing.
The plugin performs this query once, and its results apply to all contexts established with the low-level library. If a low-level library configuration file is specified, it is provided in the query for the library to consult.
The query reply structure NvMOTQuery contains the following fields:
NvMOTCompute computeConfig: Reports compute targets supported by the library. The plugin currently only echoes the reported value when initiating a context.
uint8_t numTransforms: The number of color formats required by the low-level library. The valid range for this field is 0 to NVMOT_MAX_TRANSFORMS. Set this to 0 if the library does not require any visual data. Note that 0 does not mean that untransformed data will be passed to the library.
NvBufSurfaceColorFormat colorFormats[NVMOT_MAX_TRANSFORMS]: The list of color formats required by the low-level library. Only the first numTransforms entries are valid.
NvBufSurfaceMemType memType: Memory type for the transform buffers. The plugin allocates buffers of this type to store color and scale-converted frames, and the buffers are passed to the low-level library for each frame. Note that support is currently limited to the following types:
dGPU: NVBUF_MEM_CUDA_PINNED
NVBUF_MEM_CUDA_UNIFIED
Jetson: NVBUF_MEM_SURFACE_ARRAY
bool supportBatchProcessing: True if the low-level library supports batch processing across multiple streams; otherwise false.
bool supportPastFrame: True if the low-level library supports outputting past-frame data; otherwise false.
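As a rough sketch of how a low-level library might answer this query, a GPU-based tracker requesting RGBA input could report the following; the exact enum value names used below are assumptions and should be verified against nvdstracker.h and nvbufsurface.h:
#include <stdbool.h>
#include "nvdstracker.h"   /* NvMOTQuery, NvMOTStatus, NVMOT_MAX_TRANSFORMS */

NvMOTStatus NvMOT_Query(uint16_t customConfigFilePathSize,
                        char *pCustomConfigFilePath,
                        NvMOTQuery *pQuery)
{
    /* The low-level config file path (if any) may be parsed here; it is
       only valid during this call. */
    (void) customConfigFilePathSize;
    (void) pCustomConfigFilePath;

    pQuery->computeConfig = NVMOTCOMP_GPU;              /* compute target supported by this library */
    pQuery->numTransforms = 1;                          /* one color format is required */
    pQuery->colorFormats[0] = NVBUF_COLOR_FORMAT_RGBA;  /* from nvbufsurface.h */
    pQuery->memType = NVBUF_MEM_CUDA_UNIFIED;           /* dGPU; use NVBUF_MEM_SURFACE_ARRAY on Jetson */
    pQuery->supportBatchProcessing = true;
    pQuery->supportPastFrame = false;
    return NvMOTStatus_OK;
}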
2. After the query, and before any frames arrive, the plugin must initialize a context with the low-level library by calling:
NvMOTStatus NvMOT_Init(
NvMOTConfig *pConfigIn,
NvMOTContextHandle *pContextHandle,
NvMOTConfigResponse *pConfigResponse
);
The context handle is opaque outside the low-level library. In batch processing mode, the plugin requests a single context for all input streams. In per-stream processing mode, the plugin makes this call for each input stream so that each stream has its own context.
This call includes a configuration request for the context. The low-level library has an opportunity to:
Review the configuration and create a context only if the request is accepted. If any part of the configuration request is rejected, no context is created, and the return status must be set to NvMOTStatus_Error. The pConfigResponse field can optionally contain status for specific configuration items.
Pre-allocate resources based on the configuration.
Note:
In the NvMOTMiscConfig structure, the logMsg field is currently unsupported and uninitialized.
The customConfigFilePath pointer is only valid during the call.
3. Once a context is initialized, the plugin sends frame data along with detected object bounding boxes to the low-level library each time it receives such data from upstream. It always presents the data as a batch of frames, although the batch can contain only a single frame in per-stream processing contexts. Note that depending on the frame arrival timings to the tracker plugin, the composition of frame batches could either be full batch (that contains a frame from all streams) or partial batch (that contains a frame from only a subset of the streams). In either case, each batch is guaranteed to contain at most one frame from each stream.
The function call for this processing is:
NvMOTStatus NvMOT_Process(NvMOTContextHandle contextHandle,
NvMOTProcessParams *pParams,
NvMOTTrackedObjBatch *pTrackedObjectsBatch
);
Where:
pParams is a pointer to the input batch of frames to process. The structure contains a list of one or more frames, with at most one frame from each stream. No two frame entries have the same streamID. Each entry of frame data contains a list of one or more buffers in the color formats required by the low-level library, as well as a list of object descriptors for the frame. Most libraries require at most one color format.
pTrackedObjectsBatch is a pointer to the output batch of object descriptors. It is pre-populated with a value for numFilled, the number of frames included in the input parameters.
If a frame has no output object descriptors, it is still counted in numFilled and is represented with an empty list entry (NvMOTTrackedObjList). An empty list entry has the correct streamID set and numFilled set to 0.
Note:
The output object descriptor NvMOTTrackedObj contains a pointer to the associated input object, associatedObjectIn. You must set this to the associated input object only for the frame where the input object is passed in. For example:
Frame 0: NvMOTObjToTrack X is passed in. The tracker assigns it ID 1, and the output object associatedObjectIn points to X.
Frame 1: Inference is skipped, so there is no input object. The tracker finds object 1, and the output object associatedObjectIn points to NULL.
Frame 2: NvMOTObjToTrack Y is passed in. The tracker identifies it as object 1. The output object 1 has associatedObjectIn pointing to Y.
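For illustration only, a do-nothing NvMOT_Process that returns an empty object list for every input frame might have the following shape. Only streamID and numFilled are described above, so the other member names used here (numFrames, frameList, list) are assumptions to verify against nvdstracker.h:
NvMOTStatus NvMOT_Process(NvMOTContextHandle contextHandle,
                          NvMOTProcessParams *pParams,
                          NvMOTTrackedObjBatch *pTrackedObjectsBatch)
{
    /* Sketch only: numFilled is pre-populated by the plugin with the number
       of input frames; member names other than streamID/numFilled are assumed. */
    for (uint32_t i = 0; i < pTrackedObjectsBatch->numFilled; ++i) {
        NvMOTTrackedObjList *objList = &pTrackedObjectsBatch->list[i];
        objList->streamID = pParams->frameList[i].streamID;  /* keep the stream association */
        objList->numFilled = 0;                              /* empty result for this frame */
        /* A real tracker runs data association here, fills objList->list with
           NvMOTTrackedObj entries, and sets associatedObjectIn only for objects
           passed in with this frame (see the note above). */
    }
    (void) contextHandle;
    return NvMOTStatus_OK;
}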
4. Depending on the capability of the low-level tracker, some tracked-object data generated in past frames may be stored internally without being reported, for example because of low confidence at the time. If the tracker becomes more confident in later frames and is ready to report that data, it can be retrieved from the tracker plugin using the following function call. Past-frame data is output to batch_user_meta_list in NvDsBatchMeta:
NvMOTStatus NvMOT_ProcessPast(NvMOTContextHandle contextHandle,
NvMOTProcessParams *pParams,
NvDsPastFrameObjBatch *pPastFrameObjBatch
);
Where:
pParams is a pointer to the input batch of frames to process. This structure is needed to check the list of streamID in the batch.
pPastFrameObjBatch is a pointer to the output batch of object descriptors generated in the past frames. The data structure NvDsPastFrameObjBatch is defined in include/nvds_tracker_meta.h. It may include a set of tracking data for each stream in the input. For each object, there could be multiple past-frame data in case the tracking data is stored for multiple frames for the object.
5. If a video stream source is removed on the fly, the plugin calls the following function so that the low-level tracker library can remove it as well. Note that this API is optional and valid only when batch processing mode is enabled; it is executed only if the low-level tracker library provides an implementation. When called, the low-level tracker library can release any per-stream resources it may have allocated:
void NvMOT_RemoveStreams(NvMOTContextHandle contextHandle, NvMOTStreamId streamIdMask);
6. When all processing is complete, the plugin calls this function to clean up the context:
void NvMOT_DeInit(NvMOTContextHandle contextHandle);

Low-Level Tracker Library Comparisons and Tradeoffs

DeepStream 5.0 provides three low-level tracker libraries with different resource requirements and performance characteristics, in terms of accuracy, robustness, and efficiency, allowing you to choose the best tracker for your use case. See the following table for a comparison.
Tracker library comparison
Tracker
Computational Load
Pros
Cons
Best Use Cases
GPU
CPU
IOU
X
Very Low
Light weight
No visual features for matching, so prone to frequent tracker ID switches and failures.
Not suitable for fast-moving scenes.
Objects are sparsely located, with distinct sizes.
The detector is expected to run every frame or very frequently (e.g., every alternate frame).
KLT
X
High
Works reasonably well for simple scenes
High CPU utilization.
Susceptible to change in the visual appearance due to noise and perturbations, such as shadow, non-rigid deformation, out-of-plane rotation, and partial occlusion.
Cannot work on objects with low textures.
Objects with strong textures and simpler background.
Ideal for high CPU resource availability.
NvDCF
Medium
Low
Highly robust against partial occlusions, shadow, and other transient visual changes.
 
Less frequent ID switches.
Slower than KLT and IOU due to increased computational complexity.
Reduces the total number of streams processed.
Multi-object, complex scenes with partial occlusion.

NvDCF Low-Level Tracker

NvDCF is a reference implementation of the custom low-level tracker library that supports multi-stream, multi-object tracking in a batch mode using a discriminative correlation filter (DCF) based approach for visual object tracking and a data association algorithm (such as Hungarian algorithm) based on visual and contextual data.
NvDCF allocates memory during initialization based on:
The number of streams to be processed
The maximum number of objects to be tracked per stream (denoted as maxTargetsPerStream in a configuration file for the NvDCF low-level library, tracker_config.yml)
Once the number of objects being tracked reaches the configured maximum value, any new objects are discarded until resources for some existing tracked objects are released. Note that the number of objects being tracked includes objects that are tracked in shadow mode (described below). Therefore, NVIDIA recommends making maxTargetsPerStream large enough to accommodate the maximum number of objects of interest that may appear in a frame, as well as objects that may have been tracked from past frames in shadow mode. To allow NvDCF to store and report objects tracked in shadow mode in past frames (i.e., past-frame data), set useBufferedOutput: 1 in the low-level config (e.g., tracker_config.yml) and enable-past-frame=1 and enable-batch-process=1 under [tracker] in the deepstream-app config file, since past-frame data is supported only in batch processing mode in this release. Also note that GPU memory usage by NvDCF is linearly proportional to the total number of objects being tracked, which is (number of video streams) × (maxTargetsPerStream).
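Collecting the settings mentioned above in one place, the relevant entries look roughly like this (the keys are taken from the description above; the values are only examples):
In the NvDCF low-level config (e.g., tracker_config.yml):
useBufferedOutput: 1
maxTargetsPerStream: 30
In the deepstream-app config file:
[tracker]
enable-batch-process=1
enable-past-frame=1
ll-lib-file=/opt/nvidia/deepstream/libnvds_nvdcf.so
ll-config-file=tracker_config.yml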
DCF-based trackers typically apply an exponential moving average for temporal consistency when the optimal correlation filter is created and updated over consecutive frames. The learning rate for this moving average can be configured as filterLr. The standard deviation for Gaussian for desired response when creating an optimal DCF filter can also be configured as gaussianSigma.
DCF-based trackers also define a search region around the detected target location large enough for the same target to be detected in the search region in the next frame. The SearchRegionPaddingScale property determines the size of the search region as a multiple of the target’s bounding box size. For example, with SearchRegionPaddingScale:3, the size of the search region would be:
Where w and h are the width and height of the target’s bounding box.
Once the search region is defined for each target, the image patches for the search regions are cropped and scaled to a predefined feature image size, then the visual features are extracted. The featureImgSizeLevel property defines the size of the feature image. A lower value of featureImgSizeLevel causes NvDCF to use a smaller feature size, increasing GPU performance at the cost of accuracy and robustness.
Consider the relationship between featureImgSizeLevel and SearchRegionPaddingScale when configuring the parameters. If SearchRegionPaddingScale is increased while featureImgSizeLevel is fixed, the number of pixels corresponding to the target in the feature images is effectively decreased.
The minDetectorConfidence property sets the confidence level below which object detection results are filtered out.
To achieve robust tracking, NvDCF employs two approaches to handling false alarms from PGIE detectors: late activation for handling false positives and shadow tracking for handling false negatives. Whenever a new object is detected, a new tracker is instantiated in temporary mode. It must be activated to be considered a valid target. Before it is activated, it undergoes a probationary period, defined by probationAge, in temporary mode. If the object is not detected for more than earlyTerminationAge consecutive frames during this period, the tracker is terminated early.
Once the tracker for an object is activated, it is put into inactive mode only when (1) no matching detector input is found during data association, or (2) the tracker confidence falls below the threshold defined by minTrackerConfidence. The per-object tracker is put into active mode again when a matching detector input is found. The length of the period during which a per-object tracker is in inactive mode is called the shadow tracking age; if it reaches the threshold defined by maxShadowTrackingAge, the tracker is terminated. Even when an object is being tracked in inactive mode, if the tracker confidence value is higher than minTrackingConfidenceDuringInactive, the tracker still puts its output into the metadata. Note that if the value of minTrackingConfidenceDuringInactive is set too low, some lingering bounding boxes may occasionally be observed after the objects disappear from the scene. If the bounding box of an object being tracked goes partially out of the image frame and its visibility falls below the predefined threshold minVisibiilty4Tracking, the tracker is also terminated.
NvDCF can generate unique IDs to some extent. If enabled by setting `useUniqueID: 1`, NvDCF generates a 32-bit random number at initialization and uses it as the upper 32 bits of the target ID, which is of type uint64_t. The randomly generated upper 32 bits allow the target ID to start incrementing from a random position in the possible ID space. The lower 32 bits of the target ID start from 0. If disabled (the default), target IDs simply increment from 0.
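To make the ID layout concrete, the scheme amounts to something like the following (an illustrative sketch, not the library’s actual code):
#include <stdint.h>
#include <stdlib.h>

/* Illustrative only: with useUniqueID: 1, the upper 32 bits are drawn once at
   initialization, while the lower 32 bits count up from 0 per new target. */
static uint32_t id_upper32;
static uint32_t id_lower32;

static void init_target_id_space(void) { id_upper32 = (uint32_t) rand(); }

static uint64_t next_target_id(void)
{
    return ((uint64_t) id_upper32 << 32) | (uint64_t) id_lower32++;
}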
NvDCF employs two types of state estimators: MovingAvgEstimator (MAE) and Kalman Filter (KF). Both KF and MAE have 7 states defined, {x, y, a, h, dx, dy, dh}, where x and y indicate the coordinates of the top-left corner of a target bbox, while a and h are the aspect ratio and the height of the bbox, respectively. dx, dy, and dh are the velocities of the corresponding parameters. KF employs a constant velocity model for generic use. The measurement vector is defined as {x, y, a, h}. The process noise variance for {x, y}, {a, h}, and {dx, dy, dh} can be configured by `kfProcessNoiseVar4Loc`, `kfProcessNoiseVar4Scale`, and `kfProcessNoiseVar4Vel`, respectively. Note that from the state estimator’s point of view, there can be two different measurements: the bbox from the detector (i.e., PGIE) and the bbox from the tracker. This is because NvDCF is capable of localizing targets using its learned filter. The measurement noise variance for these two types of measurements can be configured by `kfMeasurementNoiseVar4Det` and `kfMeasurementNoiseVar4Trk`. MAE is much simpler than KF and therefore more efficient to compute. The learning rate for the moving average of the defined states can be configured by `trackExponentialSmoothingLr_loc`, `trackExponentialSmoothingLr_scale`, and `trackExponentialSmoothingLr_velocity`.
To enhance robustness, NvDCF can add the cross-correlation score with nearby objects as an additional regularization term, which is referred to as instance awareness. If enabled with `useInstanceAwareness`, the number of nearby instances and the regularization weight for each instance are determined by the config params `maxInstanceNum_ia` and `lambda_ia`, respectively.
The following table summarizes the configuration parameters for an NvDCF low-level tracker.
NvDCF low-level tracker, configuration properties
Property
Meaning
Type and Range
Example
Notes
useUniqueID
Enable unique ID generation scheme
Boolean
useUniqueID: 1
maxTargetsPerStream
Max number of targets to track per stream
Integer,
0 to 65535
maxTargetsPerStream: 30
useColorNames
Use ColorNames feature
Boolean
useColorNames: 1
useHog
Use Histogram-of-Oriented-Gradient (HOG) feature
Boolean
useHog: 1
useHighPrecisionFeature
Use high-precision numerical computation in feature extraction
Boolean
useHighPrecisionFeature: 1
filterLr
Learning rate for DCF filter in exponential moving average
Float,
0.0 to 1.0
filterLr: 0.11
filterChannelWeightsLr
Learning rate for weights for different feature channels in DCF
Float,
0.0 to 1.0
filterChannelWeightsLr: 0.22
gaussianSigma
Standard deviation for Gaussian for desired response
Float,
>0.0
gaussianSigma: 0.75
featureImgSizeLevel
Size of a feature image
Integer,
1 to 5
featureImgSizeLevel: 1
SearchRegionPaddingScale
Search region size
Integer,
1 to 3
SearchRegionPaddingScale: 3
minDetectorConfidence
Minimum detector confidence for a valid object
Float,
-inf to inf
minDetectorConfidence: 0.0
minTrackerConfidence
Minimum tracker confidence for a valid target
Float,
0.0 to 1.0
minTrackerConfidence: 0.6
minTargetBboxSize
Minimum bbox size for a valid target [pixel]
Int, ≥0
minTargetBboxSize: 10
minDetectorBboxVisibilityTobeTracked
Minimum detector bbox visibility for a valid candidate
Float,
0.0 to 1.0
minDetectorBboxVisibilityTobeTracked: 0
minVisibiilty4Tracking
Minimum visibility of target bounding box to be considered valid
Float,
0.0 to 1.0
minVisibiilty4Tracking: 0.1
targetDuplicateRunInterval
Interval in which duplicate target removal is carried out [frame]
Int, -inf to inf
targetDuplicateRunInterval: 5
minIou4TargetDuplicate
Min IOU for two bboxes to be considered as duplicates
Float,
0.0 to 1.0
minIou4TargetDuplicate: 0.9
useGlobalMatching
Enable Hungarian method for data association
Boolean
useGlobalMatching: 1
maxShadowTrackingAge
Maximum length of shadow tracking
Integer, ≥0
maxShadowTrackingAge: 9
probationAge
Length of probationary period
Integer, ≥0
probationAge: 12
earlyTerminationAge
Early termination age
Integer, ≥0
earlyTerminationAge: 2
minMatchingScore4Overall
Min total score for valid matching
Float,
0.0 to 1.0
minMatchingScore4Overall: 0
minMatchingScore4SizeSimilarity
Min bbox size similarity score for valid matching
Float,
0.0 to 1.0
minMatchingScore4SizeSimilarity: 0.5
minMatchingScore4Iou
Min IOU score for valid matching
Float,
0.0 to 1.0
minMatchingScore4Iou: 0.1
minMatchingScore4VisualSimilarity
Min visual similarity score for valid matching
Float,
0.0 to 1.0
minMatchingScore4VisualSimilarity: 0.2
minTrackingConfidenceDuringInactive
Min tracking confidence during INACTIVE period
Float,
0.0 to 1.0
minTrackingConfidenceDuringInactive: 0.9
matchingScoreWeight4VisualSimilarity
Weight for visual similarity term in matching cost function
Float,
0.0 to 1.0
matchingScoreWeight4VisualSimilarity: 0.8
matchingScoreWeight4SizeSimilarity
Weight for size similarity term in matching cost function
Float,
0.0 to 1.0
matchingScoreWeight4SizeSimilarity: 0
matchingScoreWeight4Iou
Weight for IOU term in matching cost function
Float,
0.0 to 1.0
matchingScoreWeight4Iou: 0.1
matchingScoreWeight4Age
Weight for tracking age term in matching cost function
Float,
0.0 to 1.0
matchingScoreWeight4Age: 0.1
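Putting a few of these together, a minimal tracker_config.yml using the example values from the table might read as follows (flat YAML keys, assuming the usual tracker_config.yml layout; tune the values per use case):
useUniqueID: 1
maxTargetsPerStream: 30
useColorNames: 1
useHog: 1
filterLr: 0.11
gaussianSigma: 0.75
featureImgSizeLevel: 1
SearchRegionPaddingScale: 3
minDetectorConfidence: 0.0
minTrackerConfidence: 0.6
useGlobalMatching: 1
probationAge: 12
earlyTerminationAge: 2
maxShadowTrackingAge: 9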