DriveWorks SDK Reference
4.0.0 Release
For Test and Development only

doc/tutorials/dnnPlugins/dwx_DNNPlugins.md
# Copyright (c) 2019-2020 NVIDIA CORPORATION. All rights reserved.

@page dwx_dnn_plugins DNN Plugins
@tableofcontents

@section plugin_overview DNN Plugins Overview

The DNN Plugins module enables DNN models composed of layers that are not natively supported by TensorRT to still benefit from the efficiency of TensorRT.

Integrating such models into DriveWorks can be divided into the following steps:

-# @ref plugin_implementation "Implement custom layers as dwDNNPlugin and compile as a dynamic library"
-# @ref model_generation "Generate TensorRT model with plugins using tensorRT_optimization tool"
-# @ref model_runtime "Load the model and the corresponding plugins at initialization"

@section plugin_implementation Implementing Custom DNN Layers as Plugins

@note Dynamic libraries implementing DNN plugins must be built against the same TensorRT, cuDNN, and CUDA versions as DriveWorks, where applicable.

A DNN plugin in DriveWorks requires implementing a set of pre-defined functions. The declarations of these functions are located in `dw/dnn/plugins/DNNPlugin.h`.

All of these functions share a `_dwDNNPluginHandle_t` parameter. This is a custom variable that is constructed at initialization and can be used or modified throughout the lifetime of the plugin.

In this tutorial, we are going to implement and integrate a pooling layer (PoolPlugin) for an MNIST UFF model. See \ref dwx_dnn_plugin_sample for the corresponding sample.

In the PoolPlugin example, we implement a class that stores the members that must be available throughout the lifetime of the network model and provides the methods required by dwDNNPlugin.
One of these members is the cuDNN handle, which we use to perform the inference.
This class is the custom variable `_dwDNNPluginHandle_t` that is passed to each function.

```{.cpp}
class PoolPlugin
{
public:
    PoolPlugin() = default;
    PoolPlugin(const PoolPlugin&) = default;
    virtual ~PoolPlugin() = default;

    int initialize();

    void terminate();

    size_t getWorkspaceSize(int) const;

    void deserializeFromFieldCollections(const char8_t* name,
                                         const dwDNNPluginFieldCollection& fieldCollection);

    void deserializeFromBuffer(const char8_t* name, const void* data, size_t length);

    void deserializeFromWeights(const dwDNNPluginWeights* weights, int32_t numWeights);

    int32_t getNbOutputs() const;

    dwBlobSize getOutputDimensions(int32_t index, const dwBlobSize* inputs, int32_t numInputDims);

    bool supportsFormatCombination(int32_t index, const dwDNNPluginTensorDesc* inOut,
                                   int32_t numInputs, int32_t numOutputs) const;

    void configurePlugin(const dwDNNPluginTensorDesc* inputDescs, int32_t numInputs,
                         const dwDNNPluginTensorDesc* outputDescs, int32_t numOutputs);

    int enqueue(int batchSize, const void* const* inputs, void** outputs, void*, cudaStream_t stream);

    size_t getSerializationSize();

    void serialize(void* buffer);

    dwPrecision getOutputPrecision(int32_t index, const dwPrecision* inputPrecisions, int32_t numInputs) const;

    static const char* getPluginType()
    {
        return "MaxPool";
    }

    static const char* getPluginVersion()
    {
        return "2";
    }

    void setPluginNamespace(const char* libNamespace);

    const char* getPluginNamespace() const;

    dwDNNPluginFieldCollection getFieldCollection();

private:
    size_t type2size(dwPrecision precision);

    template <typename T>
    void write(char*& buffer, const T& val);

    template <typename T>
    T read(const char* data, size_t totalLength, const char*& itr);

    dwBlobSize m_inputBlobSize;
    dwBlobSize m_outputBlobSize;
    std::shared_ptr<int8_t[]> m_tensorInputIntermediateHostINT8;
    std::shared_ptr<float32_t[]> m_tensorInputIntermediateHostFP32;
    float32_t* m_tensorInputIntermediateDevice = nullptr;
    std::shared_ptr<float32_t[]> m_tensorOutputIntermediateHostFP32;
    std::shared_ptr<int8_t[]> m_tensorOutputIntermediateHostINT8;
    float32_t* m_tensorOutputIntermediateDevice = nullptr;

    float32_t m_inHostScale{-1.0f};
    float32_t m_outHostScale{-1.0f};

    PoolingParams m_poolingParams;

    dwPrecision m_precision{DW_PRECISION_FP32};
    std::map<dwPrecision, cudnnDataType_t> m_typeMap = {{DW_PRECISION_FP32, CUDNN_DATA_FLOAT},
                                                        {DW_PRECISION_FP16, CUDNN_DATA_HALF},
                                                        {DW_PRECISION_INT8, CUDNN_DATA_INT8}};

    cudnnHandle_t m_cudnn;
    cudnnTensorDescriptor_t m_srcDescriptor, m_dstDescriptor;
    cudnnPoolingDescriptor_t m_poolingDescriptor;

    std::string m_namespace;
    dwDNNPluginFieldCollection m_fieldCollection{};
};
```
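
Among these methods, `getOutputDimensions` must report the shape that the pooling layer produces. A minimal, self-contained sketch of that computation (the simplified structs below are stand-ins for `dwBlobSize` and the plugin's `PoolingParams`, and the floor-division formula assumes cuDNN's default rounding):

```{.cpp}
#include <cassert>

// Simplified stand-ins for dwBlobSize and PoolingParams (illustration only).
struct BlobSize { int batchsize, channels, height, width; };
struct Pooling  { int kernelH, kernelW, padY, padX, strideY, strideX; };

// Output shape of a 2D pooling layer. Batch and channel counts are unchanged;
// each spatial axis follows floor((in + 2*pad - kernel) / stride) + 1.
BlobSize poolOutputDims(const BlobSize& in, const Pooling& p)
{
    BlobSize out = in;
    out.height = (in.height + 2 * p.padY - p.kernelH) / p.strideY + 1;
    out.width  = (in.width  + 2 * p.padX - p.kernelW) / p.strideX + 1;
    return out;
}
```

For the MNIST model's 2x2 max pooling with stride 2 on a 1x28x28 input, this yields a 1x14x14 output.
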
## Creation

A plugin is created via the following function:

```{.cpp}
dwStatus _dwDNNPlugin_create(_dwDNNPluginHandle_t* handle);
```

In this example, we construct a PoolPlugin and store it in the `_dwDNNPluginHandle_t`:

```{.cpp}
// Create DNN plugin
dwStatus _dwDNNPlugin_create(_dwDNNPluginHandle_t* handle)
{
    std::unique_ptr<PoolPlugin> plugin(new PoolPlugin());
    *handle = reinterpret_cast<_dwDNNPluginHandle_t>(plugin.release());
    return DW_SUCCESS;
}
```

## Initialization

The actual initialization of the member variables of PoolPlugin is triggered by the following function:

```{.cpp}
dwStatus _dwDNNPlugin_initialize(_dwDNNPluginHandle_t handle);
```

As shown above, the same plugin handle is passed to the function, and the initialize method of this
handle can be called as follows:

```{.cpp}
dwStatus _dwDNNPlugin_initialize(_dwDNNPluginHandle_t handle)
{
    auto plugin = reinterpret_cast<PoolPlugin*>(handle);
    plugin->initialize();
    return DW_SUCCESS;
}
```

In the PoolPlugin example, we initialize cuDNN, the pooling descriptor, and the source and destination tensor descriptors:

```{.cpp}
int initialize()
{
    // Initialize cuDNN
    CHECK_CUDA_ERROR(cudnnCreate(&m_cudnn));
    // Create the cuDNN tensor descriptors for the source and destination tensors
    CHECK_CUDA_ERROR(cudnnCreateTensorDescriptor(&m_srcDescriptor));
    CHECK_CUDA_ERROR(cudnnCreateTensorDescriptor(&m_dstDescriptor));
    CHECK_CUDA_ERROR(cudnnCreatePoolingDescriptor(&m_poolingDescriptor));
    CHECK_CUDA_ERROR(cudnnSetPooling2dDescriptor(m_poolingDescriptor,
                                                 m_poolingParams.poolingMode, CUDNN_NOT_PROPAGATE_NAN,
                                                 m_poolingParams.kernelHeight,
                                                 m_poolingParams.kernelWidth,
                                                 m_poolingParams.paddingY, m_poolingParams.paddingX,
                                                 m_poolingParams.strideY, m_poolingParams.strideX));

    return 0;
}
```

TensorRT requires plugins to provide serialization and deserialization functions. The following APIs
must be implemented to meet this requirement:

```{.cpp}
// Return the size of the buffer required to serialize the plugin.
dwStatus _dwDNNPlugin_getSerializationSize(size_t* serializationSize, _dwDNNPluginHandle_t handle);

// Serialize the plugin to a buffer.
dwStatus _dwDNNPlugin_serialize(void* buffer, _dwDNNPluginHandle_t handle);

// Deserialize the plugin from a buffer.
dwStatus _dwDNNPlugin_deserializeFromBuffer(const char8_t* name, const void* buffer, size_t len,
                                            _dwDNNPluginHandle_t handle);

// Deserialize the plugin from a field collection.
dwStatus _dwDNNPlugin_deserializeFromFieldCollection(const char8_t* name, const dwDNNPluginFieldCollection* fieldCollection,
                                                     _dwDNNPluginHandle_t handle);

// Deserialize the plugin from weights.
dwStatus _dwDNNPlugin_deserializeFromWeights(const dwDNNPluginWeights* weights, int32_t numWeights,
                                             _dwDNNPluginHandle_t handle);
```

An example of serializing to and deserializing from a buffer:

```{.cpp}
size_t getSerializationSize()
{
    size_t serializationSize = 0U;
    serializationSize += sizeof(m_poolingParams);
    serializationSize += sizeof(m_inputBlobSize);
    serializationSize += sizeof(m_outputBlobSize);
    serializationSize += sizeof(m_precision);
    if (m_precision == DW_PRECISION_INT8)
    {
        // Input and output scales
        serializationSize += sizeof(float32_t) * 2U;
    }
    return serializationSize;
}

void serialize(void* buffer)
{
    char* d = static_cast<char*>(buffer);

    write(d, m_poolingParams);
    write(d, m_inputBlobSize);
    write(d, m_outputBlobSize);
    write(d, m_precision);
    if (m_precision == DW_PRECISION_INT8)
    {
        write(d, m_inHostScale);
        write(d, m_outHostScale);
    }
}

void deserializeFromBuffer(const char8_t* name, const void* data, size_t length)
{
    const char* dataBegin = reinterpret_cast<const char*>(data);
    const char* dataItr   = dataBegin;
    m_poolingParams  = read<PoolingParams>(dataBegin, length, dataItr);
    m_inputBlobSize  = read<dwBlobSize>(dataBegin, length, dataItr);
    m_outputBlobSize = read<dwBlobSize>(dataBegin, length, dataItr);
    m_precision      = read<dwPrecision>(dataBegin, length, dataItr);
    if (m_precision == DW_PRECISION_INT8)
    {
        m_inHostScale  = read<float32_t>(dataBegin, length, dataItr);
        m_outHostScale = read<float32_t>(dataBegin, length, dataItr);
    }
}
```
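
The `write` and `read` helpers used above are private members of PoolPlugin, not part of the DriveWorks API. A minimal implementation for trivially copyable types might look like this (the bounds check in `read` is one possible safety measure, not mandated by the SDK):

```{.cpp}
#include <cstddef>
#include <cstring>
#include <stdexcept>

// Copy a trivially copyable value into the buffer and advance the write pointer.
template <typename T>
void write(char*& buffer, const T& val)
{
    std::memcpy(buffer, &val, sizeof(T));
    buffer += sizeof(T);
}

// Read a value from the buffer, advancing the iterator and checking bounds.
template <typename T>
T read(const char* data, std::size_t totalLength, const char*& itr)
{
    if (static_cast<std::size_t>(itr - data) + sizeof(T) > totalLength)
    {
        throw std::runtime_error("deserialization would read past the end of the buffer");
    }
    T val;
    std::memcpy(&val, itr, sizeof(T));
    itr += sizeof(T);
    return val;
}
```
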

## Destroying

The created plugin is destroyed via the following function:

```{.cpp}
dwStatus _dwDNNPlugin_destroy(_dwDNNPluginHandle_t handle);
```
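
A possible implementation mirrors `_dwDNNPlugin_create`: cast the handle back, release the plugin's resources, and free the object. The sketch below is made self-contained for illustration, so `_dwDNNPluginHandle_t`, `dwStatus`, and `PoolPlugin` are replaced with minimal stand-ins; in a real plugin these come from `dw/dnn/plugins/DNNPlugin.h` and the class above:

```{.cpp}
#include <cassert>

// Minimal stand-ins for the DriveWorks types (illustration only).
typedef void* _dwDNNPluginHandle_t;
enum dwStatus { DW_SUCCESS };

struct PoolPlugin
{
    void terminate() {} // would release the cuDNN descriptors and handle
};

// Destroy the plugin: tear down its resources, then free the object
// that _dwDNNPlugin_create allocated.
dwStatus _dwDNNPlugin_destroy(_dwDNNPluginHandle_t handle)
{
    auto plugin = reinterpret_cast<PoolPlugin*>(handle);
    plugin->terminate();
    delete plugin;
    return DW_SUCCESS;
}
```
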

For more information on how to implement a plugin, please see \ref dwx_dnn_plugin_sample.

## Plugin Lifecycle

![Plugin Lifecycle](dnn_plugin_lifecycle.png)

@section model_generation Generating TensorRT Model with Plugins

\anchor plugin_json
In the PoolPlugin example, we have only one custom layer that requires a plugin.
The path of the plugin is `libdnn_pool_plugin.so`, assuming that it is located in the same folder as @ref dwx_tensorRT_tool.
Since this is a UFF model, the layer names do not need to be specified.
Therefore, we can create a `plugin.json` file as follows:

```
{
    "libdnn_pool_plugin.so" : [""]
}
```
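
Had this been a Caffe model, the array would instead list the names of the custom layers that the library implements, since layer names are required for Caffe models. A hypothetical example (the layer name `pool1` is an assumption for illustration):

```
{
    "libdnn_pool_plugin.so" : ["pool1"]
}
```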

Finally, we can generate the model by running:

```
./tensorRT_optimization --modelType=uff --uffFile=mnist_custom_pool.uff --inputBlobs=in --outputBlobs=out --inputDims=1x28x28 --pluginConfig=plugin.json --out=mnist_with_plugin.dnn
```

@section model_runtime Loading TensorRT Model with Plugins

Loading a TensorRT model with plugins generated via @ref dwx_tensorRT_tool requires the plugin configuration to be defined at initialization of `::dwDNNHandle_t`.

The plugin configuration structures look like this:
```{.cpp}
typedef struct
{
    const dwDNNCustomLayer* customLayers; /**< Array of custom layers. */
    size_t numCustomLayers;               /**< Number of custom layers. */
} dwDNNPluginConfiguration;

typedef struct
{
    const char* pluginLibraryPath; /**< Path to a plugin shared object. The path must be either an absolute path or a path relative to the DriveWorks lib folder. */
    const char* layerName;         /**< Name of the custom layer. Required for Caffe models. */
} dwDNNCustomLayer;
```

Just like the JSON file used at model generation (see \ref plugin_json "plugin json"), for each layer, the layer name and the path to the plugin that implements the layer must be provided.

In the PoolPlugin example:
```{.cpp}
dwDNNPluginConfiguration pluginConf{};
pluginConf.numCustomLayers = 1;
dwDNNCustomLayer customLayer{};

// Assuming libdnn_pool_plugin.so is in the same folder as libdriveworks.so
std::string pluginPath = "libdnn_pool_plugin.so";
customLayer.pluginLibraryPath = pluginPath.c_str();
pluginConf.customLayers = &customLayer;

// Initialize DNN from a TensorRT file with plugin configuration
dwDNNHandle_t dnn;
CHECK_DW_ERROR(dwDNN_initializeTensorRTFromFile(&dnn, "mnist_with_plugin.dnn", &pluginConf, DW_PROCESSOR_TYPE_GPU, sdk));
```

See \ref dwx_dnn_plugin_sample for more details.