DriveWorks SDK Reference 3.5.78 Release For Test and Development only
DNN
Note
SW Release Applicability: This module is available in both NVIDIA DriveWorks and NVIDIA DRIVE Software releases.

The DNN module runs inference with deep neural networks that were generated with the NVIDIA® TensorRT™ optimization tool.

### Initialization with TensorRT

There are two ways to initialize the DNN module with TensorRT:

• Use the following function to provide the path to a serialized TensorRT model file generated with the TensorRT_optimization tool:
dwDNNHandle_t *network,
const char *modelFilename,
const dwDNNPluginConfiguration *pluginConfiguration,
• Use the following function to provide a pointer to the memory block where the serialized TensorRT model is stored:
dwDNNHandle_t *network,
const char *modelContent,
uint32_t modelContentSize,
const dwDNNPluginConfiguration *pluginConfiguration,

TensorRT networks can contain custom layers. In DriveWorks, each custom layer requires a certain set of functions to be defined so that it can be loaded and executed.

The definitions of these functions must be provided in the form of a shared library. For more information on the functions to implement, see dw/dnn/plugin/DNNPlugin.h. For an example plugin, see sample_dnn_plugin.

### Inference

The dwDNN module offers two functions for running inference.

DNN models usually have one input and one output. For these kinds of models, the following function can be used for simplicity:

This function expects a pointer to linear device memory that receives the inference output, a pointer to linear device memory that holds the DNN input, and the dwDNN handle of the network to run. Note that the output buffer must be pre-allocated with the correct dimensions, as defined by the neural network model.

The DNN input is expected to have NxCxHxW layout, where N is the batch size, C the number of channels, H the height, and W the width.
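As a rough illustration of the NxCxHxW layout, the sketch below computes how many float elements a buffer of given dimensions holds and where a single element sits in linear memory. The type NchwDims and both helper functions are hypothetical names introduced for this example and are not part of the DriveWorks API; the sketch assumes a densely packed buffer without padding.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type (not a DriveWorks type): dimensions of a
 * densely packed tensor stored in NxCxHxW order. */
typedef struct {
    uint32_t n; /* batch size */
    uint32_t c; /* channels   */
    uint32_t h; /* height     */
    uint32_t w; /* width      */
} NchwDims;

/* Number of float elements the buffer must hold. */
static size_t nchw_element_count(NchwDims d)
{
    return (size_t)d.n * d.c * d.h * d.w;
}

/* Linear offset of element (n, c, h, w).
 * Width varies fastest, then height, then channel, then batch. */
static size_t nchw_offset(NchwDims d, uint32_t n, uint32_t c,
                          uint32_t h, uint32_t w)
{
    return (((size_t)n * d.c + c) * d.h + h) * d.w + w;
}
```

Multiplying nchw_element_count() by sizeof(float) gives the byte size to request when allocating such a buffer, under the same no-padding assumption.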

The dwDNN module also provides a more generic function for running networks with multiple inputs and/or multiple outputs:

dwStatus dwDNN_infer(float32_t **d_output, float32_t **d_input, dwDNNHandle_t network);

This function expects an array of pointers to linear device memory blocks where the outputs of inference are stored, an array of pointers to linear device memory blocks where the inputs of inference are stored, and the dwDNN handle of the network to run.

To make sure that the inputs and outputs are given in the correct order, place the input and output data in their arrays at the indices that correspond to the blob names defined in the network description. The following functions return these indices:

dwStatus dwDNN_getInputIndex(uint32_t *blobIndex,
                             const char *blobName,
                             dwDNNHandle_t network);
dwStatus dwDNN_getOutputIndex(uint32_t *blobIndex,
                              const char *blobName,
                              dwDNNHandle_t network);
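The ordering idea can be sketched without the SDK as follows. This is only an illustration: lookup_blob_index() is a hypothetical stand-in that plays the role of the index queries above by searching a plain list of names, and order_buffers_by_name() is likewise a made-up helper. In a real application the index would come from dwDNN_getInputIndex() or dwDNN_getOutputIndex(), and the buffers would be device pointers.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a name-to-index query; NOT a DriveWorks call.
 * Returns 0 and writes the index on success, -1 if the name is unknown. */
static int lookup_blob_index(uint32_t *blobIndex, const char *blobName,
                             const char *const *networkBlobNames,
                             uint32_t blobCount)
{
    for (uint32_t i = 0; i < blobCount; ++i) {
        if (strcmp(networkBlobNames[i], blobName) == 0) {
            *blobIndex = i;
            return 0;
        }
    }
    return -1;
}

/* Stores each named buffer in `ordered` at the index reported for its name,
 * so that `ordered` matches the order the network expects. */
static int order_buffers_by_name(float **ordered,
                                 const char *const *names,
                                 float *const *buffers, uint32_t count,
                                 const char *const *networkBlobNames,
                                 uint32_t blobCount)
{
    for (uint32_t i = 0; i < count; ++i) {
        uint32_t index = 0;
        if (lookup_blob_index(&index, names[i],
                              networkBlobNames, blobCount) != 0) {
            return -1; /* unknown blob name */
        }
        ordered[index] = buffers[i];
    }
    return 0;
}
```

Resolving indices by name once, right after initialization, keeps the application independent of the order in which the blobs happen to appear in the model.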

Furthermore, the following functions return the number of required inputs and outputs:

In addition, dimensions of inputs and outputs are available via:

uint32_t blobIndex,
dwDNNHandle_t network);
uint32_t blobIndex,
dwDNNHandle_t network);

Inference is performed in parallel with the host, making it possible to do useful work while the DNN results are being calculated. The caller must wait for the inference to finish before reading the results.

By default, the inference job is launched on the default CUDA stream. The simplest way to wait for the inference to finish is thus to call cudaDeviceSynchronize(), which waits for all pending CUDA computations to finish. For more fine-grained control, the user can create a cudaStream_t using the CUDA Runtime API and pass it to the DNN with:

dwStatus dwDNN_setCUDAStream(cudaStream_t stream, dwDNNHandle_t network);

After the CUDA stream is assigned to the DNN, all subsequent infer() operations are performed on the given CUDA stream. The user can then use CUDA Runtime API methods such as

cudaError_t cudaStreamSynchronize ( cudaStream_t stream );

or

cudaError_t cudaStreamWaitEvent ( cudaStream_t stream, cudaEvent_t event, unsigned int flags );

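to wait for the inference results. cudaStreamSynchronize() blocks the host until all work on the stream has finished, whereas cudaStreamWaitEvent() makes a stream wait for an event without blocking the host. For more information about CUDA streams, refer to the CUDA Runtime documentation.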

Each DNN usually requires a specific pre-processing configuration, so it can be necessary to provide this information together with the DNN.

DNN Metadata contains pre-processing information relevant to the loaded network. It is optional: the user can provide it together with the network by placing a JSON file in the same folder as the network, named like the network file with an additional “.json” extension.

For example, if the network is located at “/home/dwUser/dwApp/data/myDetector.dnn”, the DNN module looks for “/home/dwUser/dwApp/data/myDetector.dnn.json” and loads the DNN Metadata from it.
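The naming rule can be mirrored in application code, for example to check ahead of time which metadata file belongs to a model. The helper below is a minimal sketch with a hypothetical name, dnn_metadata_path(); it only builds the string and does not reflect how the DNN module locates the file internally.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of DriveWorks): writes "<modelPath>.json"
 * into `out`. Returns 0 on success, -1 if `out` is too small or
 * formatting fails. */
static int dnn_metadata_path(char *out, size_t outSize, const char *modelPath)
{
    int written = snprintf(out, outSize, "%s.json", modelPath);
    return (written >= 0 && (size_t)written < outSize) ? 0 : -1;
}
```

The extension is appended to the full file name rather than replacing “.dnn”, which matches the example path above.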

The JSON file must have the following format:

{
    "dataConditionerParams" : {
        "meanValue" : [0.0, 0.0, 0.0],
        "splitPlanes" : true,
        "pixelScaleCoefficient" : 1.0,
        "ignoreAspectRatio" : false,
        "doPerPlaneMeanNormalization" : false
    },
    "tonemapType" : "none",
    "__comment" : "tonemapType can be one of {none, agtm}"
}

If this JSON file is not present in the same folder as the network, DNN Metadata is filled with default values. The default parameters are:

{
    "dataConditionerParams" : {
        "meanValue" : [0.0, 0.0, 0.0],
        "splitPlanes" : true,
        "pixelScaleCoefficient" : 1.0,
        "ignoreAspectRatio" : false,
        "doPerPlaneMeanNormalization" : false
    },
    "tonemapType" : "none",
    "__comment" : "tonemapType can be one of {none, agtm}"
}

Note that whether DNN Metadata is used is decided at the application level. The Metadata can be acquired by calling: