The DNN module implements functionality to run inference with deep neural networks that have been generated with the NVIDIA® TensorRT™ optimization tool.
There are two ways of initializing the DNN module with TensorRT: from a model file on disk or from a model that has already been loaded into memory.
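For reference, a minimal initialization sketch for the file-based path is shown below. It assumes the dwDNN_initializeTensorRTFromFile() entry point declared in dw/dnn/DNN.h (a dwDNN_initializeTensorRTFromMemory() variant covers the in-memory case); the exact argument list varies between DriveWorks releases, so treat it as illustrative rather than definitive:

```cpp
#include <dw/dnn/DNN.h>

// Sketch: load a serialized TensorRT model from a file. 'context' is an
// already initialized dwContextHandle_t. Some releases additionally take a
// dwProcessorType argument; consult dw/dnn/DNN.h of your release.
dwDNNHandle_t dnn = DW_NULL_HANDLE;
dwStatus status = dwDNN_initializeTensorRTFromFile(&dnn,
                                                   "myDetector.dnn", // serialized TensorRT model
                                                   nullptr,          // no custom-layer plugins
                                                   context);
if (status != DW_SUCCESS)
{
    // handle the error, e.g. log dwGetStatusName(status)
}
```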
With TensorRT networks, it is possible to have custom layers. In DriveWorks, these custom layers require a certain set of functions to be defined in order to be loaded and executed. The definitions of these functions must be provided in the form of a shared library. For more information on the functions to be implemented, please see dw/dnn/plugin/DNNPlugin.h. For an example of a plugin, please see sample_dnn_plugin.
The dwDNN module offers two functions for running inference.
DNN models usually have one input and one output. For these kinds of models, the following function can be used for simplicity:
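For illustration, a sketch of the single input/output case is shown below. It assumes the dwDNN_inferSIO() function ("single input, single output"); the buffers and batch size are placeholders:

```cpp
// Pre-allocated linear CUDA device buffers whose sizes match the network's
// input and output blob dimensions (allocated e.g. with cudaMalloc()).
float32_t* d_input  = nullptr;
float32_t* d_output = nullptr;

// Run inference for a batch of one.
dwStatus status = dwDNN_inferSIO(d_output, d_input, 1U /*batch size*/, dnn);
```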
This function expects a pointer to linear device memory where the output of the inference is to be stored, a pointer to linear device memory where the input to the DNN is stored, and the corresponding dwDNN handle that contains the network to run. Please note that the output buffer must be pre-allocated with the correct dimensions based on the neural network model.
The input to the DNN is expected to have an NxCxHxW layout, where N stands for batches, C for channels, H for height, and W for width.
In addition, the dwDNN module provides a more generic function with which it is possible to run networks with multiple inputs and/or multiple outputs:
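A sketch of this generic case is shown below, assuming the dwDNN_inferRaw() function; the placeholder buffers and their names are purely illustrative:

```cpp
// Pre-allocated linear device buffers (names are illustrative).
float32_t* d_outputCoverage = nullptr;
float32_t* d_outputBoxes    = nullptr;
float32_t* d_inputImage     = nullptr;

// Arrays of device pointers, ordered by blob index (see the index queries below).
float32_t* d_outputs[2]      = {d_outputCoverage, d_outputBoxes};
const float32_t* d_inputs[1] = {d_inputImage};

dwStatus status = dwDNN_inferRaw(d_outputs, d_inputs, 1U /*batch size*/, dnn);
```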
This function expects an array of pointers to linear device memory blocks where the outputs of the inference are to be stored, an array of pointers to linear device memory blocks where the inputs of the inference are stored, and the corresponding dwDNN handle that contains the network to run.
To be sure that the inputs and outputs are given in the correct order, it is recommended to place the input and output data in their corresponding arrays at the indices based on the names of the blobs as defined in the network description. The following functions return these indices:
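For example, assuming dwDNN_getInputIndex() and dwDNN_getOutputIndex(); the blob names come from the network description of the model in use and are only placeholders here:

```cpp
uint32_t inputIndex  = 0U;
uint32_t outputIndex = 0U;

// "data" and "coverage" are example blob names from a hypothetical detector.
dwDNN_getInputIndex(&inputIndex, "data", dnn);
dwDNN_getOutputIndex(&outputIndex, "coverage", dnn);
```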
Furthermore, the following functions return the number of required inputs and outputs:
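For example, assuming dwDNN_getInputBlobCount() and dwDNN_getOutputBlobCount():

```cpp
uint32_t inputCount  = 0U;
uint32_t outputCount = 0U;
dwDNN_getInputBlobCount(&inputCount, dnn);
dwDNN_getOutputBlobCount(&outputCount, dnn);
```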
In addition, the dimensions of the inputs and outputs are available via:
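For example, assuming dwDNN_getInputSize() and dwDNN_getOutputSize(), which fill a dwBlobSize structure for the blob at the given index:

```cpp
dwBlobSize inputSize{};
dwBlobSize outputSize{};
dwDNN_getInputSize(&inputSize, 0U /*input blob index*/, dnn);
dwDNN_getOutputSize(&outputSize, 0U /*output blob index*/, dnn);

// dwBlobSize carries the NxCxHxW dimensions of the blob.
```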
Inference is performed in parallel with the host, making it possible to do useful work while the DNN results are being calculated. The caller must wait for the inference to finish before reading the results.
By default, the inference job is launched on the default CUDA stream. The simplest way to wait for the inference to finish is therefore to call cudaDeviceSynchronize(), which waits for all pending CUDA computations to finish. For more fine-grained control, the user can create a cudaStream_t using the CUDA Runtime API and pass it to the DNN with:
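For example (a sketch assuming dwDNN_setCUDAStream()):

```cpp
#include <cuda_runtime.h>

cudaStream_t stream = nullptr;
cudaStreamCreate(&stream);

// All subsequent inference calls on this DNN are enqueued on 'stream'.
dwDNN_setCUDAStream(stream, dnn);
```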
After the CUDA stream is assigned to the DNN, all following infer() operations are performed on the given CUDA stream. The user can then use CUDA Runtime API functions such as cudaStreamSynchronize() or cudaEventSynchronize() to wait for the inference results. For more information about CUDA streams, refer to the CUDA Runtime documentation.
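Putting these pieces together, a minimal asynchronous flow might look like the following sketch, reusing the dwDNN_inferSIO() call and placeholder buffers assumed above:

```cpp
// Enqueue inference on the stream previously assigned to the DNN ...
dwDNN_inferSIO(d_output, d_input, 1U /*batch size*/, dnn);

// ... do other useful CPU work here ...

// ... then block until all work on 'stream', including the inference, is done
// before reading d_output.
cudaStreamSynchronize(stream);
```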
Each DNN usually requires a specific pre-processing configuration, and it might, therefore, be necessary to include this information together with the DNN.
DNN Metadata contains pre-processing information relevant to the loaded network. Providing it is not a requirement, but the user can supply it together with the network by placing a JSON file in the same folder as the network, named after the network file with an additional “.json” extension.
For example, if the network is located at “/home/dwUser/dwApp/data/myDetector.dnn”, the DNN module will look for “/home/dwUser/dwApp/data/myDetector.dnn.json” to load the DNN Metadata from.
The JSON file must have the following format:
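A hedged sketch of the expected structure is shown below; the keys mirror the data conditioner parameters, and both the key names and the values are illustrative for a hypothetical network and may differ between DriveWorks releases:

```json
{
    "dataConditionerParams" : {
        "meanValue" : [127.5, 127.5, 127.5],
        "splitPlanes" : true,
        "pixelScaleCoefficient" : 0.0039215686,
        "ignoreAspectRatio" : false,
        "doPerPlaneMeanNormalization" : false
    }
}
```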
If the JSON file in question is not present in the same folder as the network, the DNN Metadata is filled with default values. The default parameters would look like this:
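Under the same assumptions about the key names, the defaults would amount to zero mean, unit scale, and no special aspect-ratio or per-plane handling, for example:

```json
{
    "dataConditionerParams" : {
        "meanValue" : [0.0, 0.0, 0.0],
        "splitPlanes" : true,
        "pixelScaleCoefficient" : 1.0,
        "ignoreAspectRatio" : false,
        "doPerPlaneMeanNormalization" : false
    }
}
```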
Note that whether the DNN Metadata is used is a decision made at the application level. The metadata can be acquired by calling:
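For example, assuming dwDNN_getMetadata() and a dwDNNMetadata structure that embeds the data conditioner parameters:

```cpp
dwDNNMetadata metadata{};
dwDNN_getMetadata(&metadata, dnn);

// The pre-processing settings (e.g. the data conditioner parameters) can then
// be forwarded to the application's pre-processing stage.
```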