Gst-nvinferaudio

The Gst-nvinferaudio plugin performs inference on input data using NVIDIA® TensorRT™. The plugin accepts batched audio buffers from upstream. The NvDsBatchMeta structure must already be attached to the Gst Buffers. The low-level library (libnvds_infer) operates on transformed audio data with CHW dimensions. The plugin applies a transform (log mel spectrogram) to each input frame, as selected by the audio-transform property, and passes the transformed audio data to the TensorRT engine for inferencing. The output type generated by the low-level library depends on the network type.

Gst-nvinferaudio currently works on the following types of networks:

  • Encoder Decoder RNN Architecture

  • CNN

Detailed documentation of the TensorRT interface is available at: https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html


Downstream components receive a Gst Buffer with unmodified contents plus the metadata created from the inference output of the Gst-nvinferaudio plugin.

Inputs and Outputs

This section summarizes the inputs, outputs, and communication facilities of the Gst-nvinferaudio plugin.


  • Control parameters

Gst-nvinferaudio gets control parameters from a configuration file. Specify the file by setting the config-file-path property. For details, see Gst-nvinferaudio File Configuration Specifications. Other control parameters that can be set through GObject properties are:

  • Batch size

  • Inference interval

  • Attach inference tensor outputs as buffer metadata

  • Attach instance mask output in object metadata

The parameters set through the GObject properties override the parameters in the Gst-nvinferaudio configuration file.

  • Outputs

    • Gst Buffer

    • NvDsFrameMeta

    • NvDsClassifierMeta
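The override rule for control parameters stated above (GObject properties take precedence over the configuration file) can be illustrated with a minimal Python sketch. The function name `effective_settings` and the convention that `None` means "property not set" are assumptions made for this illustration; they are not part of the plugin.

```python
def effective_settings(config_file_values, gobject_properties):
    """Merge settings so that explicitly set GObject properties win.

    Both arguments are plain dicts keyed by parameter name. A value of
    None in gobject_properties is treated as "not set" (an assumption
    of this sketch, not plugin behavior).
    """
    merged = dict(config_file_values)
    for key, value in gobject_properties.items():
        if value is not None:
            merged[key] = value  # the property overrides the file value
    return merged


config_file_values = {"batch-size": 30, "output-tensor-meta": 1}
gobject_properties = {"batch-size": 4, "interval": 0, "output-tensor-meta": None}

settings = effective_settings(config_file_values, gobject_properties)
print(settings)
```

Here `batch-size` comes from the property, `output-tensor-meta` falls back to the file value, and `interval` is taken from the property because the file does not define it.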

Features

The following table summarizes the features of the plugin.

Gst-nvinferaudio plugin features

| Feature | Description | Release |
|---|---|---|
| nvinferaudio plugin for audio inference | Introducing nvinferaudio | DeepStream 5.1.0 |

Gst-nvinferaudio File Configuration Specifications

The Gst-nvinferaudio configuration file uses a “Key File” format described in https://specifications.freedesktop.org/desktop-entry-spec/latest. The [property] group configures the general behavior of the plugin. It is the only mandatory group. The [class-attrs-all] group configures detection parameters for all classes. The [class-attrs-<class-id>] group configures detection parameters for a class specified by <class-id>. For example, the [class-attrs-23] group configures detection parameters for class ID 23. This type of group has the same keys as [class-attrs-all]. The following two tables respectively describe the keys supported for [property] groups and [class-attrs-…] groups.
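As a rough illustration of the group layout described above, the following Python sketch reads a small configuration text with the standard-library `configparser` module and checks that the mandatory [property] group is present. Using `configparser` is an assumption of convenience: it handles this simple subset of the key file format but is not the parser the plugin uses. The keys and values are taken from the examples in this document, except the threshold of 0.7 for class 23, which is a made-up value.

```python
import configparser

CONFIG_TEXT = """
[property]
batch-size=30
network-type=1
model-engine-file=/home/ubuntu/model.engine
labelfile-path=/home/ubuntu/model_labels.txt
classifier-threshold=0.4

[class-attrs-all]
threshold=0.5

[class-attrs-23]
threshold=0.7
"""


def load_config(text):
    """Parse key-file style text and require the [property] group."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names exactly as written
    parser.read_string(text)
    if not parser.has_section("property"):
        raise ValueError("the [property] group is mandatory")
    return parser


config = load_config(CONFIG_TEXT)
print(config["property"]["batch-size"])
print([name for name in config.sections() if name.startswith("class-attrs-")])
```

All values are returned as strings, so a caller converts them (for example with `int()` or `float()`) according to the type column of the key tables.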

Gst-nvinferaudio Property Group Supported Keys

| Property | Meaning | Type and Range | Example / Notes | Network Types / Applicable to GIEs (Primary/Secondary) |
|---|---|---|---|---|
| num-detected-classes | Number of classes detected by the network | Integer, >0 | num-detected-classes=91 | Detector / Both |
| net-scale-factor | Pixel normalization factor | Float, >0.0 | net-scale-factor=0.031 | All / Both |
| model-file | Pathname of the caffemodel file. Not required if model-engine-file is used | String | model-file=/home/ubuntu/model.caffemodel | All / Both |
| proto-file | Pathname of the prototxt file. Not required if model-engine-file is used | String | proto-file=/home/ubuntu/model.prototxt | All / Both |
| int8-calib-file | Pathname of the INT8 calibration file for dynamic range adjustment with an FP32 model | String | int8-calib-file=/home/ubuntu/int8_calib | All / Both |
| batch-size | Number of frames or objects to be inferred together in a batch | Integer, >0 | batch-size=30 | All / Both |
| model-engine-file | Pathname of the serialized model engine file | String | model-engine-file=/home/ubuntu/model.engine | All / Both |
| onnx-file | Pathname of the ONNX model file | String | onnx-file=/home/ubuntu/model.onnx | All / Both |
| labelfile-path | Pathname of a text file containing the labels for the model | String | labelfile-path=/home/ubuntu/model_labels.txt | Detector & classifier / Both |
| output-blob-names | Array of output layer names | Semicolon delimited string array | output-blob-names=coverage;bbox (detector), output-blob-names=coverage_attrib1;coverage_attrib2 (multi-label classifiers) | All / Both |
| custom-lib-path | Absolute pathname of a library containing custom method implementations for custom models | String | custom-lib-path=/home/ubuntu/libresnet_custom_impl.so | All / Both |
| classifier-threshold | Minimum threshold label probability. The GIE outputs the label having the highest probability if it is greater than this threshold | Float, ≥0 | classifier-threshold=0.4 | Classifier / Both |
| output-tensor-meta | Gst-nvinfer attaches raw tensor output as Gst Buffer metadata. | Boolean | output-tensor-meta=1 | All / Both |
| network-type | Type of network | Integer. 0: Detector, 1: Classifier, 2: Segmentation, 3: Instance Segmentation | network-type=1 | All / Both |
| parse-classifier-func-name | Name of the custom classifier output parsing function. If not specified, Gst-nvinfer uses the internal parsing function for softmax layers. | String | parse-classifier-func-name=parse_bbox_softmax | Classifier / Both |
| custom-network-config | Pathname of the configuration file for custom networks available in the custom interface for creating CUDA engines. | String | custom-network-config=/home/ubuntu/network.config | All / Both |
| workspace-size | Workspace size to be used by the engine, in MB | Integer, >0 | workspace-size=45 | All / Both |
| force-implicit-batch-dim | When a network supports both implicit batch dimension and full dimension, force the implicit batch dimension mode. | Boolean | force-implicit-batch-dim=1 | All / Both |
| infer-dims | Binding dimensions to set on the image input layer. | channel; | infer-dims=3;224;224 | All / Both |
| uff-input-order | UFF input layer order | Integer. 0: NCHW, 1: NHWC, 2: NC | uff-input-order=1 | All / Both |
| engine-create-func-name | Name of the custom TensorRT CudaEngine creation function. Refer to the “Custom Model Implementation Interface” section for details | String | engine-create-func-name=NvDsInferYoloCudaEngineGet | All / Both |
| output-io-formats | Specifies the data type and order for bound output layers. For layers not specified, defaults to FP32 and CHW | Semicolon-separated list of the format <output-layer1-name>:<data-type>:<order>;<output-layer2-name>:<data-type>:<order>. data-type should be one of [fp32, fp16, int32, int8]. order should be one of [chw, chw2, chw4, hwc8, chw16, chw32] | output-io-formats=conv2d_bbox:fp32:chw;conv2d_cov/Sigmoid:fp32:chw | All / Both |
| layer-device-precision | Specifies the device type and precision for any layer in the network | Semicolon-separated list of the format <layer1-name>:<precision>:<device-type>;<layer2-name>:<precision>:<device-type>;. precision should be one of [fp32, fp16, int8]. device-type should be one of [gpu, dla] | layer-device-precision=output_cov/Sigmoid:fp32:gpu;output_bbox/BiasAdd:fp32:gpu; | All / Both |
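The output-io-formats value format listed in the table above (`<layer-name>:<data-type>:<order>` entries separated by semicolons) can be checked before it is written to a configuration file. The following is a minimal sketch of such a check; the function name `parse_output_io_formats` is hypothetical, and the validation is only an approximation of whatever the plugin itself does. The allowed values come from the table.

```python
DATA_TYPES = {"fp32", "fp16", "int32", "int8"}
ORDERS = {"chw", "chw2", "chw4", "hwc8", "chw16", "chw32"}


def parse_output_io_formats(value):
    """Split an output-io-formats string into {layer: (data_type, order)}."""
    formats = {}
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing semicolon
        # Split from the right so any extra ':' stays in the layer name.
        parts = entry.rsplit(":", 2)
        if len(parts) != 3:
            raise ValueError(f"expected <layer>:<data-type>:<order>, got {entry!r}")
        layer, data_type, order = parts
        if data_type not in DATA_TYPES:
            raise ValueError(f"unsupported data type {data_type!r} for {layer!r}")
        if order not in ORDERS:
            raise ValueError(f"unsupported order {order!r} for {layer!r}")
        formats[layer] = (data_type, order)
    return formats


parsed = parse_output_io_formats(
    "conv2d_bbox:fp32:chw;conv2d_cov/Sigmoid:fp32:chw"
)
print(parsed)
```

The same approach would apply to layer-device-precision by swapping in the precision and device-type value sets from its table row.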

Gst-nvinferaudio Class-attributes Group Supported Keys

| Name | Description | Type and Range | Example / Notes | Network Types / Applicable to GIEs (Primary/Secondary) |
|---|---|---|---|---|
| threshold | Detection threshold | Float, ≥0 | threshold=0.5 | Object detector / Both |
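To show how the [class-attrs-all] and [class-attrs-<class-id>] groups relate, the sketch below looks up the detection threshold for a class. It assumes that a class-specific group takes precedence over [class-attrs-all] when both define the key; this document does not state the precedence explicitly, so treat that as an assumption. The function name `detection_threshold` and the value 0.7 are illustrative only.

```python
def detection_threshold(groups, class_id):
    """Return the threshold for class_id, or None if no group defines it.

    groups maps group names to dicts of key/value strings, as they would
    appear after parsing the configuration file.
    """
    # Assumed lookup order: class-specific group first, then the shared group.
    for group_name in (f"class-attrs-{class_id}", "class-attrs-all"):
        group = groups.get(group_name, {})
        if "threshold" in group:
            return float(group["threshold"])
    return None


groups = {
    "class-attrs-all": {"threshold": "0.5"},
    "class-attrs-23": {"threshold": "0.7"},
}
print(detection_threshold(groups, 23))
print(detection_threshold(groups, 5))
```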

Gst Properties

The values set through Gst properties override the values of properties in the configuration file. The application does this for certain properties that it needs to set programmatically. The following table describes the Gst-nvinferaudio plugin’s Gst properties.

Gst-nvinferaudio Gst Properties

| Property | Meaning | Type and Range | Example / Notes |
|---|---|---|---|
| config-file-path | Absolute pathname of configuration file for the Gst-nvinfer element | String | config-file-path=config_infer_primary.txt |
| unique-id | Unique ID identifying metadata generated by this GIE | Integer, 0 to 4,294,967,295 | unique-id=1 |
| infer-on-gie-id | See operate-on-gie-id in the configuration file table | Integer, 0 to 4,294,967,295 | infer-on-gie-id=1 |
| operate-on-class-ids | See operate-on-class-ids in the configuration file table | An array of colon-separated integers (class-ids) | operate-on-class-ids=1:2:4 |
| filter-out-class-ids | See filter-out-class-ids in the configuration file table | Semicolon delimited integer array | filter-out-class-ids=1;2 |
| model-engine-file | Absolute pathname of the pre-generated serialized engine file for the model | String | model-engine-file=model_b1_fp32.engine |
| batch-size | Number of audio frames to be inferred together in a batch | Integer, 1 to 4,294,967,295 | batch-size=4 |
| interval | Number of consecutive batches to be skipped for inference | Integer, 0 to 32 | interval=0 |
| gpu-id | Device ID of GPU to use for pre-processing/inference (dGPU only) | Integer, 0 to 4,294,967,295 | gpu-id=1 |
| raw-output-file-write | Pathname of raw inference output file | Boolean | raw-output-file-write=1 |
| raw-output-generated-callback | Pointer to the raw output generated callback function | Pointer | Cannot be set through gst-launch |
| raw-output-generated-userdata | Pointer to user data to be supplied with raw-output-generated-callback | Pointer | Cannot be set through gst-launch |
| output-tensor-meta | Indicates whether to attach tensor outputs as meta on GstBuffer. | Boolean | output-tensor-meta=0 |
| output-instance-mask | Gst-nvinfer attaches instance mask output in object metadata. | Boolean | output-instance-mask=1 |
| audio-transform | Transform name and parameters | Boxed pointer of type “GstStructure” | audio-transform=melsdb,fft_length=2560,hop_size=692,dsp_window=hann,num_mels=128,sample_rate=44100,p2db_ref=(float)1.0,p2db_min_power=(float)0.0,p2db_top_db=(float)80.0 |
| audio-framesize | Frame size to use for transform | Unsigned Integer. Range: 0 to 4294967295 | audio-framesize=441000 |
| audio-hopsize | Hop size to use for transform | Unsigned Integer. Range: 0 to 4294967295 | audio-hopsize=110250 |
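The audio-transform example in the table above is a comma-separated structure string, which is easy to mistype by hand. The following sketch assembles the same string from keyword arguments and converts the example audio-framesize and audio-hopsize values to seconds. The helper name `build_audio_transform` is hypothetical, and the conversion assumes that both sizes are expressed in samples at the configured sample_rate, which this document does not state explicitly.

```python
def build_audio_transform(name, **params):
    """Build an audio-transform value such as 'melsdb,fft_length=2560,...'.

    Float values get the '(float)' type prefix used in the example;
    other values are written as-is.
    """
    parts = [name]
    for key, value in params.items():
        if isinstance(value, float):
            parts.append(f"{key}=(float){value}")
        else:
            parts.append(f"{key}={value}")
    return ",".join(parts)


transform = build_audio_transform(
    "melsdb",
    fft_length=2560,
    hop_size=692,
    dsp_window="hann",
    num_mels=128,
    sample_rate=44100,
    p2db_ref=1.0,
    p2db_min_power=0.0,
    p2db_top_db=80.0,
)
print(transform)

# Assumption: audio-framesize and audio-hopsize are counted in samples.
sample_rate = 44100
frame_seconds = 441000 / sample_rate
hop_seconds = 110250 / sample_rate
print(frame_seconds, hop_seconds)
```

Under that assumption, the example values correspond to a 10-second frame advanced in 2.5-second steps.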

Gst-nvinferaudio audio-transform parameters structure

| Audio Transform parameters | Description |
|---|---|
| audio-transform | Options: melsdb |
| fft_length | FFT length (unsigned int) |
| hop_size | Hop size (unsigned int) |
| num_frequencies | Number of frequency bins for specified FFT length (unsigned int) |
| dsp_window | DSP window type (char*). Options: none, hann, hamming |
| num_mels | Number of mel bins (unsigned int) |
| sample_rate | Sample rate (unsigned int) |
| p2db_ref | DSP parameters for power_to_db: reference (float) |
| p2db_min_power | DSP parameters for power_to_db: min power (float) |
| p2db_top_db | DSP parameters for power_to_db: top dB (float) |
| attach-sys-ts | Support pending; to be added in a future release. |
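To give a feel for what the p2db_* and num_frequencies parameters control, the sketch below implements a common power-to-decibel conversion and the usual bin count of a one-sided real FFT. Both are assumptions: this document does not give the plugin's exact formulas, so the code follows widely used conventions (10·log10(power/ref), clipped to top_db below the peak, and fft_length/2 + 1 bins). The function names are illustrative only.

```python
import math


def num_frequencies(fft_length):
    """Bin count of a one-sided real FFT (assumed convention)."""
    return fft_length // 2 + 1


def power_to_db(powers, ref=1.0, min_power=0.0, top_db=80.0):
    """Convert power values to decibels relative to ref.

    Values below min_power are raised to min_power first. A power of
    zero maps to -inf before clipping, so it ends up at the floor of
    (maximum dB value - top_db).
    """
    def to_db(power):
        power = max(power, min_power)
        if power <= 0.0:
            return float("-inf")
        return 10.0 * math.log10(power / ref)

    db_values = [to_db(p) for p in powers]
    floor = max(db_values) - top_db
    return [max(value, floor) for value in db_values]


print(num_frequencies(2560))
print(power_to_db([1.0, 0.1, 0.0], ref=1.0, min_power=0.0, top_db=80.0))
```

With the example settings from this document (ref 1.0, min power 0.0, top dB 80.0), a silent bin is clipped to 80 dB below the loudest bin instead of producing an infinite value.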