Gst-nvinferserver

The Gst-nvinferserver plugin performs inferencing on input data using NVIDIA® Triton Inference Server (previously called TensorRT Inference Server), Release 2.40.0 (NGC Container 23.11) for Jetson and Release 2.37.0 (NGC Container 23.08) for dGPU on x86. For details, refer to the Triton README: https://github.com/triton-inference-server/server/blob/r23.08/README.md. The plugin accepts batched NV12/RGBA buffers from upstream. The NvDsBatchMeta structure must already be attached to the Gst Buffers.

The low-level library (libnvds_infer_server) operates on NV12 or RGBA buffers. The Gst-nvinferserver plugin passes the input batched buffers to the low-level library and waits for the results to be available. Meanwhile, it keeps queuing input buffers to the low-level library as they are received. Once the results are available from the low-level library, the plugin translates and attaches the results back into the Gst Buffer for downstream plugins.

The low-level library preprocesses the transformed frames (performs color conversion and scaling, normalization and mean subtraction) and produces final FP32/FP16/INT8/UINT8/INT16/UINT16/INT32/UINT32 RGB/BGR/GRAY planar/packed data, which is passed to Triton for inferencing. The output type generated by the low-level library depends on the network type.

The pre-processing function is:

y = netscalefactor * (x - mean)

Where:

  • x is the input pixel value. It is a uint8 value with range [0, 255].

  • mean is the corresponding mean value, read either from the mean file or as offsets[c], where c is the channel to which the input pixel belongs, and offsets is the array specified in the configuration file. It is a float.

  • netscalefactor is the pixel scaling factor specified in the configuration file. It is a float.

  • y is the corresponding output pixel value. It can be of type float / half / int8 / uint8 / int16 / uint16 / int32 / uint32.

    Take a specific example of uint8 to int8 conversion: set netscalefactor = 1.0 and mean = [128, 128, 128]. The function then becomes (a matching configuration sketch follows this list):

    y = (1.0) * (x - 128)
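
As a concrete configuration sketch, the uint8-to-int8 example above maps onto the normalize block of the PreProcessParams message described later in this section (the values here are illustrative only):

infer_config {
  preprocess {
    network_format: IMAGE_FORMAT_RGB
    normalize {
      scale_factor: 1.0
      channel_offsets: [128, 128, 128]
    }
  }
}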

Gst-nvinferserver currently works on the following type of networks:

  • Multi-class object detection

  • Multi-label classification

  • Segmentation

The Gst-nvinferserver plugin can work in three process modes:

  • Primary mode: Operates on full frames.

  • Secondary mode: Operates on objects added in the metadata by upstream components.

    When the plugin is operating as a secondary classifier in async mode along with the tracker, it tries to improve performance by avoiding re-inferencing on the same objects in every frame. It does this by caching the classification output in a map with the object’s unique ID as the key. The object is inferred upon only when it is first seen in a frame (based on its object ID) or when the size (bounding box area) of the object increases by 20% or more. This optimization is possible only when the tracker is added as an upstream element.

  • Preprocessed Tensor Input mode: Operates on tensors attached by upstream components.

    When operating in preprocessed tensor input mode, the pre-processing inside Gst-nvinferserver is completely skipped. The plugin looks for GstNvDsPreProcessBatchMeta attached to the input buffer and passes the tensor as-is to the Triton Inference Server without any modifications. This mode currently supports processing on full frames and ROIs. The GstNvDsPreProcessBatchMeta is attached by the Gst-nvdspreprocess plugin. This mode is enabled by adding the input_tensor_from_meta configuration message to the InferenceConfig message; a minimal configuration sketch follows this list.
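
As an illustrative sketch (not a complete configuration), enabling the preprocessed tensor input mode only requires the input_tensor_from_meta message inside InferenceConfig; the field shown is described in the configuration tables below:

infer_config {
  input_tensor_from_meta {
    is_first_dim_batch: true
  }
}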

Detailed documentation of the Triton Inference Server is available at: https://github.com/triton-inference-server/server/blob/r23.08/README.md

The plugin supports Triton features along with multiple deep-learning frameworks such as TensorRT, TensorFlow (GraphDef / SavedModel), ONNX and PyTorch on Tesla platforms. On Jetson, it supports TensorRT and TensorFlow (GraphDef / SavedModel). TensorFlow and ONNX can be configured with TensorRT acceleration; for details, see Framework-Specific Optimization. The plugin requires a configurable model repository root directory path where all the models must reside. All the plugin instances in a single process must share the same model root; for details, see Model Repository. Each model also needs a specific config.pbtxt file in its subdirectory; for details, see Model Configuration. The plugin supports Triton ensemble mode to enable users to perform preprocessing or postprocessing with a Triton custom backend. The plugin also supports an interface for custom functions for parsing the outputs of object detectors and classifiers, and for initialization of non-image input layers in cases where there is more than one input layer. Refer to sources/includes/nvdsinfer_custom_impl.h for the custom method implementations for custom models.


Downstream components receive a Gst Buffer with unmodified contents plus the metadata created from the inference output of the Gst-nvinferserver plugin. The plugin can be used for cascaded inferencing: it can perform primary inferencing directly on input data, then perform secondary inferencing on the results of primary inferencing, and so on. This is similar to Gst-nvinfer; see Gst-nvinfer for more details.

Inputs and Outputs

This section summarizes the inputs, outputs, and communication facilities of the Gst-nvinferserver plugin.

  • Inputs

    • Gst Buffer

    • NvDsBatchMeta (attaching NvDsFrameMeta)

    • Model repository directory path (model_repo.root)

    • gRPC endpoint URL (grpc.url)

    • Runtime model file with config.pbtxt file in model repository

  • Control parameters

    • Gst-nvinferserver gets control parameters from a configuration file. You can specify this by setting the property config-file-path. For details, see Gst-nvinferserver Configuration File Specifications. Other control parameters that can be set through GObject properties are:

      • Batch size

      • Process mode

      • Unique id

      • Inference on GIE id and operate on class ids [secondary mode only]

      • Inference interval

      • Raw output generated callback function

    • The parameters set through the GObject properties override the parameters in the Gst-nvinferserver configuration file.

  • Outputs

    • Gst Buffer

    • Depending on network type and configured parameters, one or more of:

      • NvDsObjectMeta

      • NvDsClassifierMeta

      • NvDsInferSegmentationMeta

      • NvDsInferTensorMeta

Gst-nvinferserver Configuration File Specifications

The Gst-nvinferserver configuration file uses the protobuf text format (prototxt) described at https://developers.google.com/protocol-buffers.

The protobuf message structures of this configuration file are defined by nvdsinferserver_plugin.proto and nvdsinferserver_config.proto. Following protobuf defaults, all basic data-type fields default to 0 or false, and maps, arrays and oneof fields default to empty. See below for details of each message definition; a minimal combined example follows the list.

  • The message PluginControl in nvdsinferserver_plugin.proto is the entry point for this config-file.

  • The message InferenceConfig configures the low-level settings for libnvds_infer_server.

  • The message PluginControl::InputControl configures the input buffers, objects filtering policy for model inference.

  • The message PluginControl::OutputControl configures inference output policy for detections and raw tensor metadata.

  • The message BackendParams configures backend input/output layers and Triton settings in InferenceConfig.

  • The message PreProcessParams configures network preprocessing information in InferenceConfig.

  • The message InputTensorFromMeta enables the preprocessed tensor input mode and configures the input tensor information in InferenceConfig.

  • The message PostProcessParams configures the output tensor parsing methods such as detection, classification, semantic segmentation and others in InferenceConfig.

  • There are also other messages (e.g. CustomLib, ExtraControl) and enum types (e.g. MediaFormat, TensorOrder, …) defined in the proto file for miscellaneous settings for InferenceConfig and PluginControl.
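
To make the overall layout concrete, the following minimal sketch combines the top-level messages; the model name, paths and values are placeholders and the individual fields are described in the tables below:

infer_config {
  unique_id: 1
  max_batch_size: 4
  backend {
    triton {
      model_name: "my_model"
      version: -1
      model_repo { root: "../triton_model_repo" log_level: 1 }
    }
  }
  preprocess {
    network_format: IMAGE_FORMAT_RGB
    normalize { scale_factor: 1.0 }
  }
  postprocess {
    detection { num_detected_classes: 4 simple_cluster { threshold: 0.2 } }
  }
}
input_control {
  process_mode: PROCESS_MODE_FULL_FRAME
  interval: 0
}
output_control {
  output_tensor_meta: false
}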

Features

The following table summarizes the features of the plugin.

Gst-nvinferserver plugin features

Feature | dGPU | Jetson | Release
Gst-nvinferserver Running on Host | No | Yes | DS 5.0
Running on Docker Image | Yes | No | DS 5.0
DS Preprocessing: Network input format: RGB/BGR/Gray | Yes | Yes | DS 5.0
DS Preprocessing: Network input data types FP32/FP16/UINT8/INT8/UINT16/INT16/UINT32/INT32 | Yes | Yes | DS 5.0
DS Preprocessing: Network input tensor orders NCHW / NHWC | Yes | Yes | DS 5.0
Mem: CUDA (GPU) buffer sharing for input tensors | Yes | Yes | DS 5.0
Mem: CUDA memory (GPU / CPU-pinned) for output tensors | Yes | Yes | DS 5.0
Backend: TensorRT runtime (plan engine file) | Yes | Yes | DS 5.0
Backend: TensorFlow runtime CPU/GPU (graphdef/savedmodel) | Yes | Yes | DS 5.0
Backend: TensorFlow runtime with TF-TRT acceleration | Yes | Yes | DS 5.0
Backend: ONNX runtime | Yes | Yes (DS 6.0) | DS 5.0
Backend: ONNX runtime with ONNX-TRT acceleration | Yes | Yes (DS 6.0) | DS 5.0
Backend: PyTorch runtime | Yes | No | DS 5.0
Backend: Caffe2 runtime [deprecated, will be removed in a future version] | Yes (very few tests) | No | DS 5.0
Postprocessing: DS Detection / Classification / Segmentation | Yes | Yes | DS 5.0
Postprocessing: DS Detection cluster methods: NMS / GroupRectangle / DBSCAN / None | Yes | Yes | DS 5.0
Postprocessing: custom parsing (NvDsInferParseCustomTfSSD) | Yes | Yes | DS 5.0
Postprocessing: Triton native classification | Yes | Yes | DS 5.0
Triton ensemble mode (Triton preproc/postproc) with specified media format (RGB/BGR/Gray) and CUDA GPU buffers as inputs | Yes | Yes | DS 5.0
Postprocessing: Attach Triton raw tensor output in NvDsInferTensorMeta for downstream or application postprocessing | Yes | Yes | DS 5.0
deepstream-app: pipeline works with PGIE / SGIE / nvtracker | Yes | Yes | DS 5.0
Sample apps: deepstream-segmentation-test / deepstream-infer-tensor-meta-test | Yes | Yes | DS 5.0
Basic LSTM features on single batch and single stream (beta; config file might change in a future version) | Yes | Yes | DS 5.0
gRPC: Triton server running as an independent process; plugin communicates through gRPC | Yes | Yes | DS 6.0
Custom process interface IInferCustomProcessor for extra multi-input tensor preprocessing, multi-stream LSTM loop processing, and custom output data postprocessing (parsing and metadata attaching). Basic single-stream LSTM can be replaced by the custom loop process | Yes | Yes | DS 6.0
gRPC: CUDA buffer sharing with local Triton server for input tensors | Yes | No | DS 6.2
Postprocessing: Clip object bounding boxes to fit within the ROI | Yes | Yes | DS 6.3

Gst-nvinferserver plugin message PluginControl definition details

Name

Description

Type and Range

Example Notes

Network Types /
Applicable to GIEs
(Primary/Secondary)

infer_config

Low-level libnvds_infer_server inference configuration settings

InferenceConfig

infer_config { … } see details in InferenceConfig

All

Both

input_control

Control plugin input buffers, objects filtering policy for inference

PluginControl ::InputControl

input_control{ process_mode: PROCESS_MODE_FULL_FRAME } see details in InputControl

All

Both

output_control

Control plugin output metadata filtering policy after inference

PluginControl ::OutputControl

output_control { … } see details in OutputControl

All

Both

Gst-nvinferserver plugin message PluginControl-InputControl definition details

Name

Description

Type and Range

Example Notes

Network Types /
Applicable to GIEs
(Primary/Secondary)

process_mode

Processing mode, selected from PluginControl::ProcessMode. In deepstream-app, PGIEs use PROCESS_MODE_FULL_FRAME by default and SGIEs use PROCESS_MODE_CLIP_OBJECTS by default

enum PluginControl::ProcessMode

process_mode: PROCESS_MODE_FULL_FRAME

All

Both

operate_on_gie_id

Unique ID of the GIE on whose metadata (bounding boxes) this GIE is to operate on

int32. >=0: valid gie-id; -1: disable gie-id check, inference on all GIE IDs

operate_on_gie_id: 1

All

Secondary

operate_on_class_ids

Class IDs of the parent GIE on which this GIE is to operate on

Comma delimited int32 array

operate_on_class_ids: [1, 2] Operates on objects with class IDs 1, 2 generated by parent GIE

All

Secondary

interval

Specifies the number of consecutive batches to be skipped for inference. Default is 0

uint32

interval: 1

All

Primary

async_mode

Enables inference on detected objects and asynchronous metadata attachment. Works only when tracker IDs are attached. Pushes the buffer downstream without waiting for inference results and attaches the metadata after inference completes

bool

async_mode: false

Classifier

Secondary

object_control

input object filter settings

PluginControl::InputObjectControl

object_control { bbox_filter { min_width: 64 min_height: 64 } } see details in InputObjectControl

All

Secondary

secondary_reinfer_interval

Re-inference interval for objects, in frames

uint32

secondary_reinfer_interval: 90

All

Secondary
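
As an illustrative sketch (all values are placeholders), a secondary classifier operating on objects produced by a primary detector could combine the InputControl fields above as follows:

input_control {
  process_mode: PROCESS_MODE_CLIP_OBJECTS
  operate_on_gie_id: 1
  operate_on_class_ids: [0]
  async_mode: true
  secondary_reinfer_interval: 90
  object_control {
    bbox_filter { min_width: 64 min_height: 64 }
  }
}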

Gst-nvinferserver plugin message PluginControl-OutputControl definition details

Name

Description

Type and Range

Example Notes

Network Types /
Applicable to GIEs
(Primary/Secondary)

output_tensor_meta

Enables attaching inference output tensor metadata; the tensor buffer pointer is valid for host memory only

bool

output_tensor_meta: false

All
Both

detect_control

Specifies detection output filter policy

PluginControl::OutputDetectionControl

detect_control { default_filter { bbox_filter { min_width: 32 min_height: 32 } } } see details in OutputDetectionControl

Detector
Both

classifier_type

The classifier type to be added in the NvDsClassifierMeta in case of classification networks

string

classifier_type: multi_class_classification

Classifier
Both
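
A small illustrative OutputControl fragment (placeholder values) combining these fields:

output_control {
  output_tensor_meta: false
  detect_control {
    default_filter { bbox_filter { min_width: 32 min_height: 32 } }
  }
}
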
Gst-nvinferserver plugin message PluginControl-InputObjectControl definition details

Name

Description

Type and Range

Example Notes

Network Types /
Applicable to GIEs
(Primary/Secondary)

bbox_filter

Bounding box filter

PluginControl::BBoxFilter

bbox_filter { min_width: 32 min_height: 32 } see details in BBoxFilter

All

Secondary

Gst-nvinferserver plugin message PluginControl-BBoxFilter definition details for Input and Output controls

Name

Description

Type and Range

Example Notes

Network Types /
Applicable to GIEs (Primary/Secondary)

min_width

Bounding box minimum width

uint32

min_width: 64

All

Both

min_height

Bounding box minimum height

uint32

min_height: 64

All

Both

max_width

Bounding box maximum width. Default is 0, meaning max_width is ignored

uint32

max_width: 640

All

Both

max_height

Bounding box maximum height. Default is 0, meaning max_height is ignored

uint32

max_height: 640

All

Both

Gst-nvinferserver plugin message PluginControl-OutputDetectionControl definition details

Name

Description

Type and Range

Example Notes

Network Types /
Applicable to GIEs (Primary/Secondary)

default_filter

default detection filter for output controls

PluginControl::DetectClassFilter

default_filter { bbox_filter { min_width: 32 min_height: 32 } } see details in DetectClassFilter

All

Both

specific_class_filters

specifies detection filters per class to replace default filter

map<uint32, DetectClassFilter>

specific_class_filters: [ { key: 1, value {…} }, { key: 2, value {…} } ]

All

Both

Gst-nvinferserver plugin message PluginControl-DetectClassFilter definition details

Name

Description

Type and Range

Example Notes

Network Types /
Applicable to GIEs (Primary/Secondary)

bbox_filter

detection bounding box filter

PluginControl::BBoxFilter

bbox_filter { min_width: 64 min_height: 64 }

Detection
Both

roi_top_offset

Offset of the RoI from the top of the frame. Only objects within the RoI are output.

uint32

roi_top_offset: 128

Detection
Both

roi_bottom_offset

Offset of the RoI from the bottom of the frame. Only objects within the RoI are output.

uint32

roi_bottom_offset:

Detection
Both

border_color

specify border color for detection bounding boxes

PluginControl::Color

border_color { r: 1.0 g: 0.0 b: 0.0 a: 1.0 }

Detection
Both

bg_color

specify background color for detection bounding boxes

PluginControl::Color

bg_color { r: 0.0 g: 1.0 b: 0.0 a: 0.5 }

Detection
Both
Gst-nvinferserver plugin message PluginControl-Color definition details

Name

Description

Type and Range

Example Notes

Network Types /
Applicable to GIEs (Primary/Secondary)

r

Red color value

float Range[0.0, 1.0]

r: 0.5

All
Both

g

Green color value

float. Range[0.0, 1.0]

g: 0.5

All
Both

b

Blue color value

float. Range[0.0, 1.0]

b: 0.3

All
Both

a

Alpha blending value

float. Range[0.0, 1.0]

a: 1.0

All
Both

The message InferenceConfig defines all the low-level structure fields in nvdsinferserver_config.proto. It has major settings for inference backend, network preprocessing and postprocessing.


Gst-nvinferserver message InferenceConfig definition details

Name

Description

Type and Range

Example Notes

Network Types /
Applicable to GIEs (Primary/Secondary)

unique_id

Unique ID identifying metadata generated by this GIE

uint32, ≥0

unique_id: 1

All
Both

gpu_ids

Device IDs of GPU to use for pre-processing/inference (single GPU support only)

int32 array, ≥0

gpu_ids: [0]

All
Both

max_batch_size

Max number of frames/objects to be inferred together in a batch

uint32, ≥0

max_batch_size: 1

All
Both

backend

Inference backend settings

BackendParams

backend { triton { ... } } see details in BackendParams

All
Both

preprocessing

One of preprocess or input_tensor_from_meta. Use preprocess in frame or object processing mode, and input_tensor_from_meta in preprocessed tensor input mode

preprocess or input_tensor_from_meta

N/A. Refer to preprocess and input_tensor_from_meta below

All
Both

preprocess

Network preprocessing settings for color conversion, scale and normalization applicable when using frame or object processing mode

PreProcessParams

preprocess { normalize { } } see details in PreProcessParams

All
Both

input_tensor_from_meta

Configuration for the input tensor applicable when using preprocessed tensor as input

InputTensorFromMeta

input_tensor_from_meta { is_first_dim_batch : true } see details in InputTensorFromMeta

All
Preprocessed tensor input mode

postprocess

Inference output tensor parsing methods such as detection, classification, semantic segmentation and others

PostProcessParams

postprocess { detection {...} } see details in PostProcessParams

All
Both

custom_lib

Specify custom lib path for custom parsing functions and preloads, optional

CustomLib

custom_lib { path: "./libcustom_parsing.so" }

All
Both

extra

extra controls for inference config.

ExtraControl

extra { output_buffer_pool_size: 2 } see details in ExtraControl

All
Both

lstm

LSTM control parameters, limited to batch size 1 and a single stream

LstmParams [optional]

lstm { loops { input: "init_lstm_c" output: "output/lstm_c" init_const { value: 0 } } } See details in LstmParams

All
Both

clip_object_outside_roi

Clip the object bounding boxes to fit within the specified ROI boundary.

bool

clip_object_outside_roi: false

Detector
Both
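
Bringing several of these fields together, an illustrative infer_config fragment (model name, library path and values are placeholders) might look like:

infer_config {
  unique_id: 5
  gpu_ids: [0]
  max_batch_size: 16
  backend {
    triton { model_name: "my_detector" version: -1 }
  }
  preprocess {
    network_format: IMAGE_FORMAT_RGB
    normalize { scale_factor: 0.0039 }
  }
  postprocess {
    detection { num_detected_classes: 4 nms { confidence_threshold: 0.3 iou_threshold: 0.4 } }
  }
  custom_lib { path: "/path/to/libcustom_parsing.so" }
  extra { output_buffer_pool_size: 4 }
}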

Gst-nvinferserver message BackendParams definition details

Name

Description

Type and Range

Example Notes

Network Types/Applicable to GIEs (Primary/Secondary)

inputs

Backend input layer settings, optional

InputLayer arrays

see details in InputLayer

All/Both

outputs

Backend output layer settings, optional

OutputLayer arrays

see details in OutputLayer

All/Both

triton

backend of Triton Inference Server settings

TritonParams

see details in TritonParams

All/Both

output_mem_type

Triton native output tensor memory type

MemoryType select from [MEMORY_TYPE_DEFAULT, MEMORY_TYPE_CPU, MEMORY_TYPE_GPU]

output_mem_type: MEMORY_TYPE_CPU

All/Both

Gst-nvinferserver message InputLayer definition details

Name

Description

Type and Range

Example Notes

Network Types /
Applicable to GIEs (Primary/Secondary)

name

input tensor name

string

name: “input_0”

All

Both

dims

input tensor shape, optional. Only required if backend cannot figure out fixed input shapes

int32 array, > 0

dims: [299, 299, 3]

All

Both

data_type

Tensor data type, selected from enum TensorDataType: TENSOR_DT_NONE, TENSOR_DT_FP32, TENSOR_DT_FP16, TENSOR_DT_INT8, TENSOR_DT_INT16, TENSOR_DT_INT32, TENSOR_DT_UINT8, TENSOR_DT_UINT16, TENSOR_DT_UINT32. Default is TENSOR_DT_NONE; the value can usually be deduced from the Triton model config.pbtxt

TensorDataType

data_type: TENSOR_DT_FP32

All

Both

Gst-nvinferserver message OutputLayer definition details

Name

Description

Type and Range

Example Notes

Network Types /
Applicable to GIEs (Primary/Secondary)

name

output tensor name

string

name: “detection_boxes”

All
Both

max_buffer_bytes

output tensor reserved buffer bytes

uint64

max_buffer_bytes: 2048

All
Both
Gst-nvinferserver message TritonParams definition details

Name

Description

Type and Range

Example Notes

Network Types /
Applicable to GIEs (Primary/Secondary)

model_name

Triton inference model name

string

model_name: “ssd_inception_graphdef”

All
Both

version

Triton model version number. -1: latest version. >0: reserved for a specific version number in a future release

int64

version: -1

All
Both

model_repo

Triton model repository settings. Note: all model_repo settings must be the same within a single process

TritonParams::TritonModelRepo

model_repo { root: "../triton_model_repo" log_level: 2 } Refer to the details in TritonModelRepo

All
Both

grpc

Triton gRPC server settings.

TritonParams::TritonGrpcParams

grpc { url: "localhost:8001" enable_cuda_buffer_sharing: false } Refer to the details in TritonGrpcParams

All
Both
Gst-nvinferserver message TritonParams-TritonModelRepo definition details

Name

Description

Type and Range

Example Notes

Network Types /
Applicable to GIEs (Primary/Secondary)

root

Triton inference model repository directory path

string

root: “../triton_model_repo”

All
Both

log_level

Triton log output levels

uint32; 0: ERROR, 1: WARNING, 2: INFO, >=3: VERBOSE level

log_level: 1

All
Both

strict_model_config

Enables Triton strict model configuration; see details in Triton Generated Model Configuration. Setting this to true is suggested

bool

strict_model_config: true

All
Both

tf_gpu_memory_fraction

TensorFlow GPU memory fraction per process. Valid for TensorFlow models only. The default value 0 means no GPU memory limitation. Tune it to a proper value (e.g. in the range [0.2, 0.6]) in case TensorFlow uses up the whole GPU memory

float, Range (0, 1.0]

tf_gpu_memory_fraction: 0.6

All
Both

tf_disable_soft_placement

Disables TensorFlow soft placement of operators. Soft placement is enabled by default.

bool

tf_disable_soft_placement: false

All
Both

min_compute_capacity

Specifies the minimum GPU compute capability. The default value is 6.0 on x86 and 5.0 on Jetson.

min_compute_capacity: 6.0

All
Both

backend_dir

Specifies the Triton backend directory which stores the TensorFlow/ONNX/PyTorch and custom backends. The default value is /opt/tritonserver/backends on x86 and /opt/nvidia/deepstream/deepstream-x.x/lib/triton_backends on Jetson.

string

backend_dir: /opt/tritonserver/backends/

All
Both

cuda_device_memory

Specifies a list of CudaDeviceMem blocks with pre-allocated memory pools. Triton's default value is used if the list is empty.

message list

cuda_device_memory [ { device: 0 memory_pool_byte_size: 2000000000 } ]

All
Both

CudaDeviceMem::device

Specify device ID

uint32; >= 0

device: 0

All
Both

CudaDeviceMem::memory_pool_byte_size

Indicates the pre-allocated memory pool byte size on the corresponding device for the Triton runtime

uint64; >= 0

memory_pool_byte_size: 8000000000

All
Both

pinned_memory_pool_byte_size

Indicates the pre-allocated pinned memory on the host for the Triton runtime. Triton's default value (around 256MB) is used if not set.

uint64; >= 0

pinned_memory_pool_byte_size: 128000000

All
Both

backend_configs

A list of BackendConfig blocks for Tritonserver backend config settings

message list

backend_configs [ { backend: "tensorflow" setting: "allow-soft-placement" value: "true" } ]

All
Both

BackendConfig::backend

Specify backend name

string

backend: "tensorflow"

All
Both

BackendConfig::setting

Specify backend setting name

string

setting: “allow-soft-placement”

All
Both

BackendConfig::value

Specify backend setting values

string

value: “true”

All
Both
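
For example, an illustrative model_repo block (paths and sizes are placeholders) that pre-allocates CUDA and pinned memory pools and passes a backend setting:

model_repo {
  root: "../triton_model_repo"
  log_level: 1
  strict_model_config: true
  cuda_device_memory [ { device: 0 memory_pool_byte_size: 2000000000 } ]
  pinned_memory_pool_byte_size: 128000000
  backend_configs [ { backend: "tensorflow" setting: "allow-soft-placement" value: "true" } ]
}
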
Gst-nvinferserver message TritonParams-TritonGrpcParams definition details

Name

Description

Type and Range

Example Notes

Network Types /
Applicable to GIEs (Primary/Secondary)

url

Triton server name and port

string

url: “localhost:8001”

All
Both

enable_cuda_buffer_sharing

Enables sharing of CUDA buffers with the local Triton server for input tensors. If enabled, the input CUDA buffers are shared with the Triton server to improve performance. This feature should be enabled only when the Triton server is on the same machine. It is applicable to the x86 dGPU platform and is not supported on Jetson devices. Disabled by default, i.e. CUDA buffers are copied to system memory while creating the inference request.

Bool

enable_cuda_buffer_sharing: true

All
Both
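
An illustrative backend fragment for gRPC mode (the server address and model name are placeholders) combining these fields:

infer_config {
  backend {
    triton {
      model_name: "my_model"
      version: -1
      grpc {
        url: "localhost:8001"
        enable_cuda_buffer_sharing: true
      }
    }
  }
}
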
Gst-nvinferserver message PreProcessParams definition details

Name

Description

Type and Range

Example Notes

Network Types /
Applicable to GIEs
(Primary/Secondary)

network_format

enum MediaFormat with formats: MEDIA_FORMAT_NONE, IMAGE_FORMAT_RGB, IMAGE_FORMAT_BGR, IMAGE_FORMAT_GRAY. Uses IMAGE_FORMAT_RGB by default.

MediaFormat

network_format: IMAGE_FORMAT_RGB

All

Both

tensor_order

enum TensorOrder with order types: TENSOR_ORDER_NONE, TENSOR_ORDER_LINEAR (this includes NCHW, CHW, DCHW, … orders), TENSOR_ORDER_NHWC. The value can be deduced from the backend layer info if set to TENSOR_ORDER_NONE

TensorOrder

tensor_order: TENSOR_ORDER_NONE

All

Both

tensor_name

Specifies the tensor name for the preprocessing buffer. This is used when there are multiple input tensors in a single network.

string;
Optional

tensor_name: “input_0”

All

Both

frame_scaling_hw

Compute hardware to use for scaling frames / object crops to network resolution

enum FrameScalingHW: FRAME_SCALING_HW_DEFAULT (platform default: GPU on dGPU, VIC on Jetson), FRAME_SCALING_HW_GPU, FRAME_SCALING_HW_VIC (Jetson only)

frame_scaling_hw: FRAME_SCALING_HW_GPU

All

Both

frame_scaling_filter

The filter to use for scaling frames / object crops to network resolution

int32, refer to enum NvBufSurfTransform_Inter in nvbufsurftransform.h for valid values

frame_scaling_filter: 1

All

Both

maintain_aspect_ratio

Indicates whether to maintain aspect ratio while scaling input.

int32; 0 or 1

maintain_aspect_ratio: 0

All

Both

symmetric_padding

Indicates whether to pad image symmetrically while scaling input. DeepStream pads the images asymmetrically by default.

int32; 0 or 1

symmetric_padding: 0

All

Both

normalize

Network input tensor normalization settings for scale-factors, offsets and mean-subtraction

PreProcessParams::ScaleNormalize

normalize { scale_factor: 1.0 channel_offsets: [0, 0, 0] } see details in PreProcessParams::ScaleNormalize
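
A fuller illustrative preprocess block (all values are placeholders) combining the fields above:

preprocess {
  network_format: IMAGE_FORMAT_RGB
  tensor_order: TENSOR_ORDER_LINEAR
  maintain_aspect_ratio: 1
  symmetric_padding: 1
  frame_scaling_filter: 1
  normalize {
    scale_factor: 0.0078
    channel_offsets: [127.5, 127.5, 127.5]
  }
}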

Gst-nvinferserver message PreProcessParams-ScaleNormalize definition details

Name

Description

Type and Range

Example Notes

Network Types /
Applicable to GIEs (Primary/Secondary)

scale_factor

Pixel normalization factor

float

scale_factor: 0.0078

All
Both

channel_offsets

Array of mean values of color components to be subtracted from each pixel. Array length must equal the number of color components in the frame. The plugin multiplies mean values by scale_factor.

float array, Optional

channel_offsets: [77.5, 21.2, 11.8]

All
Both

mean_file

Pathname of mean data file (PPM format)

string; Optional

mean_file: “./model_meanfile.ppm”

All
Both
Gst-nvinferserver message InputTensorFromMeta definition details

Name

Description

Type and Range

Example Notes

Network Types /
Applicable to GIEs
(Primary/Secondary)

is_first_dim_batch

Boolean indicating whether the preprocessed input tensor has the batch size as its first dimension. Set to true for batched input, false otherwise.

Boolean

is_first_dim_batch: true

All

Preprocessed tensor input mode

Gst-nvinferserver message PostProcessParams definition details

Name

Description

Type and Range

Example Notes

Network Types /
Applicable to GIEs (Primary/Secondary)

labelfile_path

Pathname of a text file containing the labels for the model

string

labelfile_path: "/home/ubuntu/model_labels.txt"

All

Both

oneof process_type

Indicates one of the postprocessing types: detection, classification, segmentation, other, triton_classification

None

N/A

All

Both

detection

Specify detection parameters for the network. It is one of process_type

DetectionParams

detection { num_detected_classes: 4 simple_cluster { threshold: 0.2 } } see details in DetectionParams

Detector

Both

classification

Specify classification parameters for the network. It is one of process_type

ClassificationParams

classification { threshold: 0.6 } see details in ClassificationParams

Classifier

Both

segmentation

Specify semantic segmentation parameters for the network. It is one of process_type

SegmentationParams

segmentation { threshold: 0.2 num_segmentation_classes: 2 }

Segmentation

Both

other

Specify other network parameters. This is for user-defined networks and usually coexists with output_control.output_tensor_meta: true. Tensor output data is attached to the GstBuffer and can be parsed in the application. Users can increase extra.output_buffer_pool_size if metadata needs to be held longer. It is one of process_type

OtherNetworkParams

other {} see details in OtherNetworkParams

Others

Both

triton_classification

Specify Triton classification parameters for the network. It is one of process_type

TritonClassifyParams

triton_classification { topk: 1 } see details in TritonClassifyParams

Classifier

Both

Gst-nvinferserver message DetectionParams definition details

Name

Description

Type and Range

Example Notes

Network Types /
Applicable to GIEs (Primary/Secondary)

num_detected_classes

Define number of classes detected by the network

int32, > 0

num_detected_classes:4

Detector

Both

per_class_params

Map of specific detection parameters per class. Key-value follows <class_id: per_class_params> order.

map<int32, PerClassParams>; Optional

per_class_params [ { key: 1, value { pre_threshold: 0.4 } }, { key: 2, value { pre_threshold: 0.5 } } ] see details for PerClassParams

Detector

Both

custom_parse_bbox_func

Name of the custom bounding box parsing function. If not specified, Gst-nvinferserver uses the internal function for the resnet model provided by the SDK. If specified, custom_lib must also be set to load the custom library.

string;

custom_parse_bbox_func: “NvDsInferParseCustomTfSSD”

Detector

Both

oneof clustering_policy

Indicates one of the clustering policies: nms, dbscan, group_rectangle, simple_cluster

None

N/A

Detector

Both

nms

Clusters bounding boxes of detected objects using the Non-Maximum-Suppression method. It is one of clustering_policy

Nms

nms { confidence_threshold: 0.3 iou_threshold: 0.4 } see details in Nms

Detector

Both

dbscan

Clusters bounding boxes of detected objects using the DBSCAN method. It is one of clustering_policy

DbScan

dbscan { pre_threshold: 0.3 eps: 0.7 min_boxes: 3 } see details in DbScan

Detector

Both

group_rectangle

Clusters bounding boxes of detected objects using the OpenCV groupRectangles() function. It is one of clustering_policy

GroupRectangle

group_rectangle { confidence_threshold: 0.2 group_threshold: 2 eps: 0.2 }

Detector

Both

simple_cluster

Indicates a simple clustering method that rejects outlier boxes by threshold

SimpleCluster

simple_cluster { threshold: 0.2 }

Detector

Both
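
For instance, an illustrative detection block (class count, paths and thresholds are placeholders) using NMS clustering with a per-class threshold override:

postprocess {
  labelfile_path: "/path/to/labels.txt"
  detection {
    num_detected_classes: 4
    per_class_params [ { key: 1, value { pre_threshold: 0.4 } } ]
    nms {
      confidence_threshold: 0.3
      iou_threshold: 0.4
      topk: 20
    }
  }
}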


Gst-nvinferserver message DetectionParams-PerClassParams definition details

Name

Description

Type and Range

Example Notes

Network Types /
Applicable to GIEs (Primary/Secondary)

pre_threshold

Define confidence threshold per class

float

pre_threshold:0.3

Detector

Both


Gst-nvinferserver message DetectionParams-Nms definition details

Name

Description

Type and Range

Example Notes

Network Types /
Applicable to GIEs (Primary/Secondary)

confidence_threshold

Detection score lesser than this threshold would be rejected

float

confidence_threshold:0.5

Detector

Both

iou_threshold

Maximum IOU score between two proposals after which the proposal with the lower confidence will be rejected.

float

iou_threshold: 0.3

Detector

Both

topk

Specify top k detection results to keep after nms

int32, >= 0

topk: 2; value 0, means keep all.

Detector

Both


Gst-nvinferserver message DetectionParams-DbScan definition details

Name

Description

Type and Range

Example Notes

Network Types /
Applicable to GIEs (Primary/Secondary)

pre_threshold

Detection score lesser than this threshold would be rejected before DBSCAN clustering

float

pre_threshold:0.2

Detector

Both

eps

DBSCAN epsilon to control merging of overlapping boxes.

float

eps: 0.7

Detector

Both

min_boxes

Minimum boxes in DBSCAN cluster to be considered an object

int32, > 0

min_boxes: 3;

Detector

Both

min_score

Minimum score in DBSCAN cluster for it to be considered as an object

float

min_score: 0.7

Default value is 0

Detector

Both


Gst-nvinferserver message DetectionParams-GroupRectangle definition details

Name

Description

Type and Range

Example Notes

Network Types /
Applicable to GIEs (Primary/Secondary)

confidence_threshold

Detection score lesser than this threshold would be rejected

float

confidence_threshold:0.2

Detector

Both

group_threshold

Threshold value for rectangle merging for the OpenCV groupRectangles() function

int32; >= 0

group_threshold: 1

Detector

Both

eps

Epsilon to control merging of overlapping boxes

float

eps: 0.2

Detector

Both


Gst-nvinferserver message DetectionParams-SimpleCluster definition details

Name

Description

Type and Range

Example Notes

Network Types /
Applicable to GIEs (Primary/Secondary)

threshold

Detection score lesser than this threshold would be rejected

float

threshold: 0.6

Detector

Both


Gst-nvinferserver message ClassificationParams definition details

Name

Description

Type and Range

Example Notes

Network Types /
Applicable to GIEs (Primary/Secondary)

threshold

Classification score lesser than this threshold would be rejected

float

threshold: 0.5

Classifier

Both

custom_parse_classifier_func

Name of the custom classifier output parsing function. If not specified, Gst-nvinferserver uses the internal parsing function with NCHW tensor order for softmax layers. Users can reshape other output tensor orders to NCHW in the Triton config.pbtxt to run the internal parsing. If specified, custom_lib must also be set to load the custom library.

string

custom_parse_classifier_func: "parse_bbox_softmax"

Classifier

Both


Gst-nvinferserver message SegmentationParams definition details

Name

Description

Type and Range

Example Notes

Network Types /
Applicable to GIEs (Primary/Secondary)

threshold

Segmentation score lesser than this threshold would be rejected

float

threshold: 0.5

Segmentation

Both

num_segmentation_classes

Number of output classes for the segmentation network

int32, >0

num_segmentation_classes: 2

Segmentation

Both

custom_parse_segmentation_func

Name of the custom segmentation output parsing function. If not specified, Gst-nvinferserver uses the internal function for the UNet model provided by the SDK. If specified, users also need to set custom_lib to load custom library.

string

custom_parse_segmentation_func: “NvDsInferParseCustomPeopleSemSegNet”

Segmentation

Both


Gst-nvinferserver message OtherNetworkParams definition details

Name

Description

Type and Range

Example Notes

Network Types /
Applicable to GIEs (Primary/Secondary)

type_name

Specify a user-defined network name

string; Optional

type_name: “face”

Others

Both


Gst-nvinferserver message TritonClassifyParams definition details

Name

Description

Type and Range

Example Notes

Network Types /
Applicable to GIEs (Primary/Secondary)

topk

Specify the top k elements to keep from Triton's native classification

uint32; >=0

topk: 1. Value 0 or empty keeps the top-1 result.

Classifier

Both

threshold

Classification score lesser than this threshold would be rejected

float

threshold: 0.5

Classifier

Both


Gst-nvinferserver message CustomLib definition details

Name

Description

Type and Range

Example Notes

Network Types /
Applicable to GIEs (Primary/Secondary)

path

Pathname that points to a custom library for preload

string

path: “/home/ubuntu/lib_custom_impl.so”

All

Both


Gst-nvinferserver message ExtraControl definition details

Name

Description

Type and Range

Example Notes

Network Types /
Applicable to GIEs
(Primary/Secondary)

copy_input_to_host_buffers

Enables copying of input tensor data to host buffers. If enabled, input tensors are attached as NvDsInferTensorMeta into the GstBuffer together with the output tensors.

bool

copy_input_to_host_buffers: false

All

Both

output_buffer_pool_size

Specify the buffer pool size for each output tensor. When infer_config.postprocess.other is specified or output_control.output_tensor_meta is enabled, the output tensor would be attached as NvDsInferTensorMeta into GstBuffer

int32; Range [2, 10]

output_buffer_pool_size: 4

All

Both

custom_process_funcion

Custom function to create a user-defined processor implementing IInferCustomProcessor. The function symbol is loaded from the library specified by infer_config.custom_lib

String

custom_process_funcion: "CreateInferServerCustomProcess"

All

Both


Note

LstmParams structures may be changed in future versions


Gst-nvinferserver message LstmParams definition details

Name

Description

Type and Range

Example Notes

Network Types /
Applicable to GIEs (Primary/Secondary)

loops

Specify LSTM loops between input and output tensors.

LstmLoop [repeated]

loops [ { input: "init_state" output: "out_state" } ] See details in LstmParams::LstmLoop

All

Both


Note

  • Input and output tensors must have the same datatype and dimensions; FP16 is not supported

  • LstmParams::LstmLoop structures might be changed in future versions

Gst-nvinferserver message LstmParams-LstmLoop definition details

Name

Description

Type and Range

Example Notes

Network Types /
Applicable to GIEs (Primary/Secondary)

input

Specify input tensor name of the current loop.

string

input: "init_state"

All

Both

output

Specify the output tensor name of the current loop. The tensor data is fed back to the input tensor

string

output: "output_state"

All

Both

init_const

Specify the constant value for the input in the first frame

InitConst; value: float

init_const { value: 0 }

All

Both


Gst Properties

The values set through Gst properties override the values of properties in the configuration file. The application does this for certain properties that it needs to set programmatically; if a property is set on the plugin, its value replaces the corresponding value from the configuration file. The following table describes the Gst-nvinferserver plugin's Gst properties.


Gst-nvinferserver plugin Gst properties

Property

Meaning

Type and Range

Example Notes

config-file-path

Absolute pathname of configuration file for the Gst-nvinferserver element

String

config-file-path=config_infer_primary.txt

process-mode

Inference processing mode: (0) None, (1) FullFrame, (2) ClipObject. If set, it overrides input_control.process_mode

Integer, 0, 1 or 2

process-mode=1

unique-id

Unique ID identifying metadata generated by this GIE. If set, it overrides infer_config.unique_id

Integer, 0 to 4,294,967,295

unique-id=1

infer-on-gie-id

See input_control.operate_on_gie_id in the configuration file table

Integer, 0 to 4,294,967,295

infer-on-gie-id=1

operate-on-class-ids

See input_control.operate_on_class_ids in the configuration file table

An array of colon- separated integers (class-ids)

operate-on-class-ids=1:2:4

batch-size

Number of frames/objects to be inferred together in a batch. If set, it overrides infer_config.max_batch_size

Integer, 1 – 4,294,967,295

batch-size=4

interval

Number of consecutive batches to be skipped for inference. If set, it overrides input_control.interval

Integer, 0 to 32

interval=0

raw-output-generated-callback

Pointer to the raw output generated callback function

Pointer

Cannot be set through gst-launch

raw-output-generated-userdata

Pointer to user data to be supplied with raw-output-generated-callback

Pointer

Cannot be set through gst-launch


DeepStream Triton samples

DeepStream Triton samples are located in the folder samples/configs/deepstream-app-triton. In terms of Triton model specification, all related models and Triton config files (config.pbtxt) must be gathered into the same root directory, which is samples/triton_model_repo. Follow the instructions in samples/configs/deepstream-app-triton/README to run the samples.

DeepStream Triton gRPC support

In addition to the native Triton server, gst-nvinferserver supports the Triton Inference Server running as an independent process. Communication with the server happens through gRPC. Config files to run the application in gRPC mode are located at samples/configs/deepstream-app-triton-grpc. Follow the instructions in samples/configs/deepstream-app-triton-grpc/README to run the samples.

Triton Ensemble Models

The Gst-nvinferserver plugin supports Triton ensemble models for further custom preprocessing, backend and postprocessing through Triton custom backends. A Triton ensemble model represents a pipeline of one or more models and the connections of input and output tensors between those models, such as "data preprocessing -> inference -> data postprocessing". See more details at https://github.com/triton-inference-server/server/blob/r23.08/docs/user_guide/architecture.md#ensemble-models. To manage memory efficiently and keep a clean interface, the Gst-nvinferserver plugin's default preprocessing cannot be disabled. Color conversion, datatype conversion, input scaling and object cropping continue to run natively in nvds_infer_server. For example, if native normalization is not needed, update scale_factor to 1.0:

infer_config {
  preprocess {
    network_format: IMAGE_FORMAT_RGB
    tensor_order: TENSOR_ORDER_LINEAR
    normalize { scale_factor: 1.0 }
  }
}

The low-level nvds_infer_server library can deliver the specified media format (RGB/BGR/Gray) in any supported tensor order and datatype as a CUDA GPU buffer input to the Triton backend. The user's custom backend must support GPU memory for this input. The Triton custom-backend sample identity works with the Gst-nvinferserver plugin.

Note

The custom backend API must be built against the same Triton codebase version (23.08). Read more details in the Triton server release notes: https://github.com/triton-inference-server/server/releases/tag/v2.37.0

To learn how to implement a Triton custom backend, refer to https://github.com/triton-inference-server/backend/tree/r23.08#how-can-i-develop-my-own-triton-backend. For a Triton model's output, TRTSERVER_MEMORY_GPU and TRTSERVER_MEMORY_CPU buffer allocation are supported in nvds_infer_server according to the Triton output request. This also works for an ensemble model's final output tensors. Finally, inference data can be parsed by default for detection, classification, or semantic segmentation. Alternatively, users can implement a custom backend for postprocessing and then deliver the final output to the Gst-nvinferserver plugin for further processing. Users can also optionally attach raw tensor output data into metadata for downstream plugins or the application to parse.

Custom Process interface IInferCustomProcessor for Extra Input, LSTM Loop, Output Data Postprocess

The Gst-nvinferserver plugin supports custom preprocessing of extra (multiple) input tensors, custom loop processing of input/output tensors (LSTM based) with multiple streams, and custom parsing of output tensor data and attaching it into NvDsBatchMeta. This custom function is loaded through the config file:

infer_config {
  extra {
    custom_process_funcion: "CreateInferServerCustomProcess"
  }
  custom_lib {
    path: "/path/to/libnvdsinferserver_custom_process.so"
  }
}
  • For extra input tensor preprocessing: If the model requires more tensor inputs than the primary image input, users can derive from the interface IInferCustomProcessor and implement extraInputProcess() to process the extra input tensors. This function is for extra input processing only. The parameter IOptions* options carries all the information from GstBuffer, NvDsBatchMeta, NvDsFrameMeta, NvDsObjectMeta and so on. Users can leverage all of this information to fill the extra input tensors. All of the input tensor memory is allocated by the nvdsinferserver low-level library.

  • For multi-stream custom loop processing: If the model is LSTM based and the next frame's inputs are generated from the previous frame's output data, users can derive from the interface IInferCustomProcessor and implement extraInputProcess() and inferenceDone() for loop processing. extraInputProcess() can initialize the first input tensor states. inferenceDone() can then get the output data, do post processing and store the result into the context. When the next extraInputProcess() call arrives, it can check the stored results and feed them back into the tensor states. When the user overrides bool requireInferLoop() const { return true; }, the nvdsinferserver low-level library keeps extraInputProcess() and inferenceDone() running in sequence along with their nvds_stream_ids, which can be obtained from options->getValueArray(OPTION_NVDS_SREAM_IDS, streamIds). See examples and details in sources/objectDetector_FasterRCNN/nvdsinfer_custom_impl_fasterRCNN/nvdsinferserver_custom_process.cpp; the function NvInferServerCustomProcess::feedbackStreamInput shows how to feed output back into the next input loop.

  • For output tensor postprocessing (parsing and metadata attaching): If users want to do custom parsing of output tensors into user metadata and attach it into GstBuffer, NvDsBatchMeta, NvDsFrameMeta or NvDsObjectMeta, they can implement inferenceDone(outputs, inOptions) to parse all output tensors in outputs, get the GstBuffer, NvDsBatchMeta and other DeepStream information from inOptions, and then attach the parsed user metadata into the NvDs metadata. This function supports parsing and attaching for multiple streams. See the example in sources/objectDetector_FasterRCNN/nvdsinfer_custom_impl_fasterRCNN/nvdsinferserver_custom_process.cpp: NvInferServerCustomProcess::inferenceDone() shows how to parse and attach output metadata.

Note

If a specific memory type (e.g. CPU) is needed for the output tensors in inferenceDone(), update the config file:

infer_config {
  backend {
    output_mem_type: MEMORY_TYPE_CPU
  }
}

The interface IInferCustomProcessor is defined in sources/includes/nvdsinferserver/infer_custom_process.h.

class IInferCustomProcessor {
     virtual void supportInputMemType(InferMemType& type); // return supported memory type for `extraInputs`
     virtual bool requireInferLoop() const;                // return true to enable LSTM-style loop processing
     // fill extra (non-image) input tensors before inference
     virtual NvDsInferStatus extraInputProcess(const vector<IBatchBuffer*>& primaryInputs, vector<IBatchBuffer*>& extraInputs, const IOptions* options) = 0;
     // parse output tensors and attach user metadata after inference completes
     virtual NvDsInferStatus inferenceDone(const IBatchArray* outputs, const IOptions* inOptions) = 0;
     virtual void notifyError(NvDsInferStatus status) = 0; // called when an inference error occurs
};
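
The sketch below is a minimal illustrative skeleton of a custom processor, not a complete implementation. It assumes the interface above is provided by sources/includes/nvdsinferserver/infer_custom_process.h, that the types live in the nvdsinferserver namespace, and that the exported factory symbol matches the custom_process_funcion entry configured earlier (CreateInferServerCustomProcess); the factory signature follows the FasterRCNN sample and should be verified against nvdsinferserver_custom_process.cpp.

#include "nvdsinferserver/infer_custom_process.h"
#include <cstdint>
#include <vector>

using namespace nvdsinferserver; // assumption: interface types live in this namespace

class SampleCustomProcessor : public IInferCustomProcessor {
public:
    // Report which memory type this processor expects for the extra input buffers.
    // (kCpu is assumed to be a valid InferMemType enumerator.)
    void supportInputMemType(InferMemType& type) override { type = InferMemType::kCpu; }

    // Return true only for LSTM-style loop processing.
    bool requireInferLoop() const override { return false; }

    // Fill any extra (non-image) input tensors before inference is submitted.
    NvDsInferStatus extraInputProcess(const std::vector<IBatchBuffer*>& primaryInputs,
                                      std::vector<IBatchBuffer*>& extraInputs,
                                      const IOptions* options) override {
        // e.g. write initial states or auxiliary data into extraInputs here
        return NVDSINFER_SUCCESS;
    }

    // Parse output tensors and attach user metadata once inference completes.
    NvDsInferStatus inferenceDone(const IBatchArray* outputs, const IOptions* inOptions) override {
        // e.g. iterate outputs and attach results into the NvDs metadata obtained from inOptions
        return NVDSINFER_SUCCESS;
    }

    void notifyError(NvDsInferStatus status) override {}
};

// Factory symbol loaded through infer_config.custom_lib / extra.custom_process_funcion
// (signature assumed from the FasterRCNN sample).
extern "C" IInferCustomProcessor* CreateInferServerCustomProcess(const char* config, uint32_t configLen) {
    return new SampleCustomProcessor();
}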

Tensor Metadata Output for Downstream Plugins

The Gst-nvinferserver plugin can attach raw output tensor data generated by the inference backend as metadata. It is added as an NvDsInferTensorMeta in the frame_user_meta_list member of NvDsFrameMeta for primary (full frame) mode, or in the obj_user_meta_list member of NvDsObjectMeta for secondary (object) mode. It uses same metadata structure with Gst-nvinferserver plugin.

Note

The Gst-nvinferserver plugin does not attach the device buffer pointer NvDsInferTensorMeta::out_buf_ptrs_dev at this moment.

To read or parse inference raw tensor data of output layers:

  1. Enable the following fields in the configuration file for the Gst-nvinferserver plugin:

    output_control { output_tensor_meta : true }
    

    If native postprocessing needs to be disabled, update:

    infer_config { postprocess { other {} } }
    
  2. When operating as primary GIE, NvDsInferTensorMeta is attached to each frame’s (each NvDsFrameMeta object’s) frame_user_meta_list. When operating as secondary GIE, NvDsInferTensorMeta is attached to each NvDsObjectMeta object’s obj_user_meta_list.

    Metadata attached by Gst-nvinferserver can be accessed in a GStreamer pad probe attached downstream from the Gst-nvinferserver instance.

  3. The NvDsInferTensorMeta object’s metadata type is set to NVDSINFER_TENSOR_OUTPUT_META. To get this metadata you must iterate over the NvDsUserMeta user metadata objects in the list referenced by frame_user_meta_list or obj_user_meta_list.

For more information about Gst-infer tensor metadata usage, see the source code in sources/apps/sample_apps/deepstream_infer_tensor_meta-test.cpp, provided in the DeepStream SDK samples.
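
As an illustrative sketch (not code taken from the sample), a pad probe attached downstream of Gst-nvinferserver could locate the tensor metadata as follows; it assumes the standard DeepStream metadata headers are on the include path and that the probe is installed on a pad carrying batched buffers:

#include <gst/gst.h>
#include "gstnvdsmeta.h"   // NvDsBatchMeta, NvDsFrameMeta, NvDsUserMeta
#include "gstnvdsinfer.h"  // NvDsInferTensorMeta, NVDSINFER_TENSOR_OUTPUT_META

// Iterate frame-level user metadata and pick out raw tensor output attached by Gst-nvinferserver.
static GstPadProbeReturn
tensor_meta_probe(GstPad* pad, GstPadProbeInfo* info, gpointer user_data)
{
    GstBuffer* buf = GST_PAD_PROBE_INFO_BUFFER(info);
    NvDsBatchMeta* batch_meta = gst_buffer_get_nvds_batch_meta(buf);
    if (!batch_meta)
        return GST_PAD_PROBE_OK;

    for (NvDsMetaList* l_frame = batch_meta->frame_meta_list; l_frame; l_frame = l_frame->next) {
        NvDsFrameMeta* frame_meta = (NvDsFrameMeta*)l_frame->data;
        // For secondary (object) mode, iterate frame_meta->obj_meta_list and each
        // object's obj_user_meta_list instead of frame_user_meta_list.
        for (NvDsMetaList* l_user = frame_meta->frame_user_meta_list; l_user; l_user = l_user->next) {
            NvDsUserMeta* user_meta = (NvDsUserMeta*)l_user->data;
            if (user_meta->base_meta.meta_type != NVDSINFER_TENSOR_OUTPUT_META)
                continue;
            NvDsInferTensorMeta* tensor_meta = (NvDsInferTensorMeta*)user_meta->user_meta_data;
            for (unsigned int i = 0; i < tensor_meta->num_output_layers; ++i) {
                NvDsInferLayerInfo* layer = &tensor_meta->output_layers_info[i];
                g_print("output layer %u: %s\n", i, layer->layerName);
                // Host-side raw data (when available) is in tensor_meta->out_buf_ptrs_host[i].
            }
        }
    }
    return GST_PAD_PROBE_OK;
}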

Segmentation Metadata

The Gst-nvinferserver plugin attaches the output of the semantic segmentation model as user metadata in an instance of NvDsInferSegmentationMeta with meta_type set to NVDSINFER_SEGMENTATION_META. The user metadata is added to the frame_user_meta_list member of NvDsFrameMeta for primary (full frame) mode, or the obj_user_meta_list member of NvDsObjectMeta for secondary (object) mode. For guidance on how to access user metadata, see User/Custom Metadata Addition Inside NvDsBatchMeta and Tensor Metadata, above.