DriveWorks SDK Reference
3.0.4260 Release
For Test and Development only

# Copyright (c) 2019-2020 NVIDIA CORPORATION. All rights reserved.

@page dwx_dnn_plugins DNN Plugins
@tableofcontents
@note SW Release Applicability: This tutorial is applicable to modules in both **NVIDIA DriveWorks** and **NVIDIA DRIVE Software** releases.

@section plugin_overview DNN Plugins Overview

The DNN Plugins module enables DNN models that contain layers not natively supported by TensorRT to still benefit from the efficiency of TensorRT.

Integrating such a model into DriveWorks can be divided into the following steps:

-# @ref plugin_implementation "Implement custom layers as dwDNNPlugin and compile them as a dynamic library"
-# @ref model_generation "Generate a TensorRT model with plugins using the tensorRT_optimization tool"
-# @ref model_runtime "Load the model and the corresponding plugins at initialization"

@section plugin_implementation Implementing Custom DNN Layers as Plugins

@note Dynamic libraries implementing DNN layers must be built against the same TensorRT, cuDNN, and CUDA versions as DriveWorks, where applicable.

A DNN plugin in DriveWorks requires implementing a set of pre-defined functions. The declarations of these functions are located in `dw/dnn/plugins/DNNPlugin.h`.

These functions have `_dwDNNPluginHandle_t` in common. This is a custom object that is constructed at initialization and can be used or modified throughout the lifetime of the plugin.
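This opaque-handle pattern can be sketched in plain C++. The sketch below is illustrative only: `PluginHandle` and `MyPluginState` are hypothetical stand-ins for `_dwDNNPluginHandle_t` and the plugin's internal state, not DriveWorks types.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the opaque _dwDNNPluginHandle_t type.
typedef void* PluginHandle;

// Hypothetical per-plugin state constructed at initialization.
struct MyPluginState
{
    int32_t numOutputChannels;
};

// Initialization stores the state behind the opaque handle...
void pluginInitialize(PluginHandle* handle, int32_t numOutputChannels)
{
    *handle = new MyPluginState{numOutputChannels};
}

// ...every later entry point casts the handle back to access or modify it...
int32_t pluginGetNumOutputChannels(PluginHandle handle)
{
    return static_cast<MyPluginState*>(handle)->numOutputChannels;
}

// ...and release destroys the state.
void pluginRelease(PluginHandle handle)
{
    delete static_cast<MyPluginState*>(handle);
}
```

The concrete DriveWorks entry points shown later in this tutorial follow exactly this pattern, with an `FCPlugin` instance as the state behind the handle.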

In this tutorial, we implement and integrate a fully-connected layer (FCPlugin) for an MNIST network model. See \ref dwx_dnn_plugin_sample for the corresponding sample.

\anchor ip2_description The fully-connected layer in the MNIST network model looks like this:

```
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1.0
  }
  param {
    lr_mult: 2.0
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
```

In the FCPlugin example, we implement a class that stores the members that must be available throughout the lifetime of the network model and provides the methods required by dwDNNPlugin. Two of these members are the cuBLAS and cuDNN handles, which we use to perform the inference.
An instance of this class is the custom object that is passed to each function as the plugin handle.

```{.cpp}
class FCPlugin
{
public:
    FCPlugin(const dwDNNPluginWeights *weights, int32_t numWeights,
             int32_t numOutputChannels);

    FCPlugin(const void* data, size_t length);

    ~FCPlugin();

    void initialize();
    void terminate();

    int32_t getNbOutputs() const;

    dwBlobSize getOutputDimensions(int32_t index, const dwBlobSize* inputs, int32_t numInputDims);

    bool supportsFormat(dwPrecision precision, dwDNNPluginLayout layout) const;

    void configureWithFormat(dwPrecision precision, dwDNNPluginLayout layout);

    size_t getWorkspaceSize(int32_t) const;

    int32_t enqueue(int32_t batchSize, const void* const* inputs, void** outputs, void*, cudaStream_t stream);

    size_t getSerializationSize();

    void serialize(void* buffer);

private:
    size_t type2size(dwPrecision precision);

    template<typename T> void write(char*& buffer, const T& val);

    template<typename T> void read(const char*& buffer, T& val);

    void* copyToDevice(const void* data, size_t count);

    void convertAndCopyToDevice(void*& deviceWeights, const dwDNNPluginWeights& weights);

    void convertAndCopyToBuffer(char*& buffer, const dwDNNPluginWeights& weights);

    void deserializeToDevice(const char*& hostBuffer, void*& deviceWeights, size_t size);

    // Number of input and output channels of the FC layer
    int32_t m_numOutputChannels, m_numInputChannels;
    // Kernel and bias weights of the FC layer
    dwDNNPluginWeights m_kernelWeights, m_biasWeights;
    // Precision of the network model
    dwPrecision m_precision{DW_PRECISION_FP32};

    // Actual storage on the device for kernel and bias weights
    void* m_deviceKernel{nullptr};
    void* m_deviceBias{nullptr};

    // cuDNN and cuBLAS handles that are needed for the execution of the layer
    cudnnHandle_t m_cudnn;
    cublasHandle_t m_cublas;
    cudnnTensorDescriptor_t m_srcDescriptor, m_dstDescriptor;
};
```

## Initialization

A plugin can be initialized either directly from a `dwDNNPluginWeights` structure:
```{.cpp}
dwStatus _dwDNNPlugin_initializeFromWeights(_dwDNNPluginHandle_t* handle, const char* layerName,
                                            const dwDNNPluginWeights* weights, int32_t numWeights);
```

or from serialized data:
```{.cpp}
dwStatus _dwDNNPlugin_initialize(_dwDNNPluginHandle_t* handle, const char* layerName,
                                 const void* data, size_t length);
```

Both of these functions define how the plugin is initialized from the given input data.

In the FCPlugin example, we implement the backends of both initialization functions as constructors of the class, and implement both of the functions mentioned above to create an instance of the FCPlugin class. Additionally, we define helper methods to deserialize the layer from binary data.

The first constructor initializes the plugin from `dwDNNPluginWeights`:
```{.cpp}
// Construction from dwDNNPluginWeights
FCPlugin::FCPlugin(const dwDNNPluginWeights *weights, int32_t numWeights, int32_t numOutputChannels)
    : m_numOutputChannels(numOutputChannels)
{
    m_kernelWeights = weights[0];
    m_biasWeights   = weights[1];
    m_kernelWeights.values = malloc(m_kernelWeights.count * type2size(m_kernelWeights.precision));
    memcpy(const_cast<void*>(m_kernelWeights.values), weights[0].values,
           m_kernelWeights.count * type2size(m_kernelWeights.precision));
    m_biasWeights.values = malloc(m_biasWeights.count * type2size(m_biasWeights.precision));
    memcpy(const_cast<void*>(m_biasWeights.values), weights[1].values,
           m_biasWeights.count * type2size(m_biasWeights.precision));

    m_numInputChannels = int(weights[0].count / numOutputChannels);

    initialize();
}
```
This constructor allocates the kernel and bias weights and copies them from the given input. `numOutputChannels` is a configurable parameter of the layer, defined in \ref ip2_description "the model's description".
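The `type2size` helper used above is not shown elsewhere in this tutorial; it simply maps a precision to the element size in bytes. A minimal sketch, with a stand-in enum instead of `dwPrecision` (only FP16 and FP32 occur in this plugin):

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the dwPrecision values used by this plugin.
enum Precision
{
    PRECISION_FP16, // 2 bytes per element (__half)
    PRECISION_FP32  // 4 bytes per element (float32_t)
};

// Map a precision to its element size in bytes.
size_t type2size(Precision precision)
{
    return precision == PRECISION_FP32 ? 4 : 2;
}
```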

The second constructor initializes the plugin from binary data:
```{.cpp}
// Construction from data
FCPlugin::FCPlugin(const void* data, size_t length)
{
    const char* d = static_cast<const char*>(data);
    read(d, m_numInputChannels);
    read(d, m_numOutputChannels);

    m_kernelWeights.count  = m_numInputChannels * m_numOutputChannels;
    m_kernelWeights.values = nullptr;

    read(d, m_biasWeights.count);
    m_biasWeights.values = nullptr;

    read(d, m_precision);

    deserializeToDevice(d, m_deviceKernel, m_kernelWeights.count * type2size(m_precision));
    deserializeToDevice(d, m_deviceBias, m_biasWeights.count * type2size(m_precision));
    initialize();
}
```
The deserialization depends on how the data was \ref serialization "serialized". This constructor first deserializes the dimensions, then allocates member variables accordingly and deserializes the weights.
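The `read` and `write` helpers declared in the class are not shown in this tutorial; a minimal `memcpy`-based sketch matching the declared signatures could look like this (each call copies a plain-old-data value and advances the buffer pointer):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Read a POD value from the buffer and advance the read pointer.
template <typename T>
void read(const char*& buffer, T& val)
{
    std::memcpy(&val, buffer, sizeof(T));
    buffer += sizeof(T);
}

// Write a POD value to the buffer and advance the write pointer.
template <typename T>
void write(char*& buffer, const T& val)
{
    std::memcpy(buffer, &val, sizeof(T));
    buffer += sizeof(T);
}
```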

The `initialize` method is used by both constructors to create the cuDNN and cuBLAS handles, and to perform the operations that are common to both constructors.
```{.cpp}
// Initialize members common to both constructors
void FCPlugin::initialize()
{
    // Initialize cuDNN and cuBLAS
    CHECK_CUDA_ERROR(cudnnCreate(&m_cudnn));
    CHECK_CUDA_ERROR(cublasCreate(&m_cublas));
    // Create the cuDNN tensor descriptors needed for bias addition
    CHECK_CUDA_ERROR(cudnnCreateTensorDescriptor(&m_srcDescriptor));
    CHECK_CUDA_ERROR(cudnnCreateTensorDescriptor(&m_dstDescriptor));
    if (m_kernelWeights.values)
        convertAndCopyToDevice(m_deviceKernel, m_kernelWeights);
    if (m_biasWeights.values)
        convertAndCopyToDevice(m_deviceBias, m_biasWeights);
}
```

Deserialization, in this case, is performed by simply copying the binary data from host to device.
```{.cpp}
// Deserialization
void FCPlugin::deserializeToDevice(const char*& hostBuffer, void*& deviceWeights, size_t size)
{
    deviceWeights = copyToDevice(hostBuffer, size);
    hostBuffer += size;
}
```

The following function allocates device memory and copies the given data to it.
```{.cpp}
// Allocate device memory and copy the given data to it
void* FCPlugin::copyToDevice(const void* data, size_t count)
{
    void* deviceData;
    CHECK_CUDA_ERROR(cudaMalloc(&deviceData, count));
    CHECK_CUDA_ERROR(cudaMemcpy(deviceData, data, count, cudaMemcpyHostToDevice));
    return deviceData;
}
```

The following function performs precision conversion, if necessary, before copying data to the device.
```{.cpp}
// Convert weights based on the loaded precision and copy them to the device
void FCPlugin::convertAndCopyToDevice(void*& deviceWeights, const dwDNNPluginWeights& weights)
{
    if (weights.precision != m_precision)
    {
        // Weights are converted in host memory first if the precision does not match
        size_t size  = weights.count * type2size(m_precision);
        void* buffer = malloc(size);
        for (int64_t v = 0; v < weights.count; ++v)
        {
            if (m_precision == DW_PRECISION_FP32)
                static_cast<float32_t*>(buffer)[v] = __half2float(static_cast<const __half*>(weights.values)[v]);
            else
                static_cast<__half*>(buffer)[v] = __float2half(static_cast<const float32_t*>(weights.values)[v]);
        }
        deviceWeights = copyToDevice(buffer, size);
        free(buffer);
    }
    else
    {
        deviceWeights = copyToDevice(weights.values, weights.count * type2size(m_precision));
    }
}
```

The initialization functions required by dwDNNPlugin can now be implemented as follows:
```{.cpp}
dwStatus _dwDNNPlugin_initializeFromWeights(_dwDNNPluginHandle_t* handle, const char*,
                                            const dwDNNPluginWeights* weights, int32_t numWeights)
{
    std::unique_ptr<FCPlugin> fcPlugin(new FCPlugin(weights, numWeights, 10));
    *handle = reinterpret_cast<_dwDNNPluginHandle_t>(fcPlugin.release());
    return DW_SUCCESS;
}

dwStatus _dwDNNPlugin_initialize(_dwDNNPluginHandle_t* handle, const char*,
                                 const void* data, size_t length)
{
    std::unique_ptr<FCPlugin> fcPlugin(new FCPlugin(data, length));
    *handle = reinterpret_cast<_dwDNNPluginHandle_t>(fcPlugin.release());
    return DW_SUCCESS;
}
```

## Releasing

The release function must destroy everything that was created by the plugin at initialization.
```{.cpp}
dwStatus _dwDNNPlugin_release(_dwDNNPluginHandle_t handle);
```

In the FCPlugin example, the destructor releases everything:

```{.cpp}
FCPlugin::~FCPlugin()
{
    if (m_kernelWeights.values)
    {
        free(const_cast<void*>(m_kernelWeights.values));
        m_kernelWeights.values = nullptr;
    }
    if (m_biasWeights.values)
    {
        free(const_cast<void*>(m_biasWeights.values));
        m_biasWeights.values = nullptr;
    }

    terminate();
}

void FCPlugin::terminate()
{
    CHECK_CUDA_ERROR(cublasDestroy(m_cublas));
    CHECK_CUDA_ERROR(cudnnDestroy(m_cudnn));
    if (m_deviceKernel)
    {
        cudaFree(m_deviceKernel);
        m_deviceKernel = nullptr;
    }
    if (m_deviceBias)
    {
        cudaFree(m_deviceBias);
        m_deviceBias = nullptr;
    }
}

dwStatus _dwDNNPlugin_release(_dwDNNPluginHandle_t handle)
{
    FCPlugin* plugin = reinterpret_cast<FCPlugin*>(handle);
    delete plugin;
    return DW_SUCCESS;
}
```

## \anchor serialization Serialization

A plugin must provide functions to enable serialization to a buffer:

```{.cpp}
dwStatus _dwDNNPlugin_getSerializationSize(size_t *serializationSize, _dwDNNPluginHandle_t handle);

dwStatus _dwDNNPlugin_serialize(void* buffer, _dwDNNPluginHandle_t handle);
```

In the FCPlugin example, we store the number of input channels, the number of output channels, the number of bias weights and the precision. In addition,
we store the kernel and bias weights. Therefore:

```{.cpp}
size_t FCPlugin::getSerializationSize()
{
    return sizeof(m_numInputChannels) + sizeof(m_numOutputChannels) + sizeof(m_biasWeights.count) + sizeof(m_precision) +
           (m_kernelWeights.count + m_biasWeights.count) * type2size(m_precision);
}

void FCPlugin::serialize(void* buffer)
{
    char* d = static_cast<char*>(buffer);

    write(d, m_numInputChannels);
    write(d, m_numOutputChannels);
    write(d, m_biasWeights.count);
    write(d, m_precision);
    convertAndCopyToBuffer(d, m_kernelWeights);
    convertAndCopyToBuffer(d, m_biasWeights);
}

void FCPlugin::convertAndCopyToBuffer(char*& buffer, const dwDNNPluginWeights& weights)
{
    if (weights.precision != m_precision)
    {
        for (int64_t v = 0; v < weights.count; ++v)
        {
            if (m_precision == DW_PRECISION_FP32)
                reinterpret_cast<float32_t*>(buffer)[v] = __half2float(static_cast<const __half*>(weights.values)[v]);
            else
                reinterpret_cast<__half*>(buffer)[v] = __float2half(static_cast<const float32_t*>(weights.values)[v]);
        }
    }
    else
    {
        memcpy(buffer, weights.values, weights.count * type2size(m_precision));
    }
    buffer += weights.count * type2size(m_precision);
}
```
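As a sanity check of the layout above: the serialized blob is four header fields followed by the converted kernel and bias weights. A plain-C++ sketch of the size computation (the channel and bias counts in the usage note below are illustrative, and the precision enum is a stand-in assumed to serialize as 4 bytes):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for dwPrecision; assumed to occupy 4 bytes, as an int32_t-backed enum.
enum class Precision : int32_t { FP16, FP32 };

// Mirror of FCPlugin::getSerializationSize(): header fields plus weight payload.
size_t fcSerializationSize(int32_t numInputChannels, int32_t numOutputChannels,
                           int64_t biasCount, size_t elementSize)
{
    int64_t kernelCount = static_cast<int64_t>(numInputChannels) * numOutputChannels;
    return sizeof(numInputChannels) + sizeof(numOutputChannels) +
           sizeof(biasCount) + sizeof(Precision) +
           (kernelCount + biasCount) * elementSize;
}
```

For example, a hypothetical 500-input, 10-output FP32 layer with 10 bias values serializes to 20 header bytes plus 20040 weight bytes.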

The interface functions can be defined as:

```{.cpp}
dwStatus _dwDNNPlugin_getSerializationSize(size_t *serializationSize, _dwDNNPluginHandle_t handle)
{
    FCPlugin* plugin = reinterpret_cast<FCPlugin*>(handle);
    *serializationSize = plugin->getSerializationSize();
    return DW_SUCCESS;
}

dwStatus _dwDNNPlugin_serialize(void* buffer, _dwDNNPluginHandle_t handle)
{
    FCPlugin* plugin = reinterpret_cast<FCPlugin*>(handle);
    plugin->serialize(buffer);
    return DW_SUCCESS;
}
```

## Getter Functions

The plugin is required to implement several getter functions.

It must provide a function that returns how many outputs it has:

```{.cpp}
dwStatus _dwDNNPlugin_getNumOutputs(int32_t* numOutputs, _dwDNNPluginHandle_t handle);
```

In addition, it must provide a function that returns the dimensions of each output based on the input dimensions:

```{.cpp}
dwStatus _dwDNNPlugin_getOutputDimensions(dwBlobSize *outputDimensions,
                                          int32_t outputIndex, const dwBlobSize* inputDimensions,
                                          int32_t numInputs, _dwDNNPluginHandle_t handle);
```

Finally, it must provide a function that returns the temporary workspace size required by this layer:

```{.cpp}
dwStatus _dwDNNPlugin_getWorkspaceSize(size_t* workspaceSize, int32_t maxBatchSize,
                                       _dwDNNPluginHandle_t handle);
```

In the FCPlugin example, the layer has exactly one output:

```{.cpp}
int32_t FCPlugin::getNbOutputs() const
{
    return 1;
}
```

The dimensions of the output are expected to be NxCx1x1, where N is the batch size of the input and C is the number of output channels.
```{.cpp}
dwBlobSize FCPlugin::getOutputDimensions(int32_t index, const dwBlobSize* inputs, int32_t numInputDims)
{
    return dwBlobSize{inputs[0].batchsize, m_numOutputChannels, 1, 1};
}
```

FCPlugin does not need any temporary workspace during the generation of the model.
```{.cpp}
size_t FCPlugin::getWorkspaceSize(int32_t) const
{
    return 0;
}
```

Finally, the interface functions can be defined as:
```{.cpp}
dwStatus _dwDNNPlugin_getNumOutputs(int32_t* numOutputs, _dwDNNPluginHandle_t handle)
{
    FCPlugin* plugin = reinterpret_cast<FCPlugin*>(handle);
    *numOutputs = plugin->getNbOutputs();
    return DW_SUCCESS;
}

dwStatus _dwDNNPlugin_getOutputDimensions(dwBlobSize *outputDimensions,
                                          int32_t outputIndex, const dwBlobSize* inputDimensions,
                                          int32_t numInputs, _dwDNNPluginHandle_t handle)
{
    FCPlugin* plugin = reinterpret_cast<FCPlugin*>(handle);
    *outputDimensions = plugin->getOutputDimensions(outputIndex, inputDimensions, numInputs);
    return DW_SUCCESS;
}

dwStatus _dwDNNPlugin_getWorkspaceSize(size_t* workspaceSize, int32_t maxBatchSize,
                                       _dwDNNPluginHandle_t handle)
{
    FCPlugin* plugin = reinterpret_cast<FCPlugin*>(handle);
    *workspaceSize = plugin->getWorkspaceSize(maxBatchSize);
    return DW_SUCCESS;
}
```

## Formats

A plugin must implement a function that indicates whether a given format is supported:

```{.cpp}
dwStatus _dwDNNPlugin_supportsFormat(bool* res, dwPrecision precision,
                                     dwDNNPluginLayout pluginLayout, _dwDNNPluginHandle_t handle);
```

Furthermore, it must allow configuring the format:

```{.cpp}
dwStatus _dwDNNPlugin_configureWithFormat(const dwBlobSize* inputDimensions,
                                          int32_t numInputs, const dwBlobSize* outputDimensions,
                                          int32_t numOutputs, dwPrecision precision,
                                          dwDNNPluginLayout layout, int32_t maxBatchSize,
                                          _dwDNNPluginHandle_t handle);
```

In the FCPlugin example, we support single and half precision and only the planar (NCHW) layout:

```{.cpp}
bool FCPlugin::supportsFormat(dwPrecision precision, dwDNNPluginLayout layout) const
{
    return (precision == DW_PRECISION_FP32 || precision == DW_PRECISION_FP16) && layout == DW_DNN_PLUGIN_LAYOUT_NCHW;
}
```

Since only one layout is supported, precision is the only configurable parameter:

```{.cpp}
void FCPlugin::configureWithFormat(dwPrecision precision, dwDNNPluginLayout layout)
{
    m_precision = precision;
}
```

Hence, the implementations of the interface functions look like this:
```{.cpp}
dwStatus _dwDNNPlugin_supportsFormat(bool* res, dwPrecision precision,
                                     dwDNNPluginLayout pluginLayout, _dwDNNPluginHandle_t handle)
{
    FCPlugin* plugin = reinterpret_cast<FCPlugin*>(handle);
    *res = plugin->supportsFormat(precision, pluginLayout);
    return DW_SUCCESS;
}

dwStatus _dwDNNPlugin_configureWithFormat(const dwBlobSize* inputDimensions,
                                          int32_t numInputs, const dwBlobSize* outputDimensions,
                                          int32_t numOutputs, dwPrecision precision,
                                          dwDNNPluginLayout layout, int32_t maxBatchSize,
                                          _dwDNNPluginHandle_t handle)
{
    FCPlugin* plugin = reinterpret_cast<FCPlugin*>(handle);
    plugin->configureWithFormat(precision, layout);
    return DW_SUCCESS;
}
```

## Inference

A plugin must implement a function that performs the inference:

```{.cpp}
dwStatus _dwDNNPlugin_enqueue(int32_t batchSize, const void* const* inputs, void** outputs,
                              void* workspace, cudaStream_t stream, _dwDNNPluginHandle_t handle);
```

In the FCPlugin example, we use cuBLAS and cuDNN to execute the fully-connected layer:
cuBLAS performs the matrix multiplication and cuDNN adds the bias.

```{.cpp}
int32_t FCPlugin::enqueue(int32_t batchSize, const void* const* inputs, void** outputs, void*, cudaStream_t stream)
{
    float32_t onef{1.0f}, zerof{0.0f};
    __half oneh = __float2half(1.0f), zeroh = __float2half(0.0f);

    cublasSetStream(m_cublas, stream);
    cudnnSetStream(m_cudnn, stream);

    if (m_precision == DW_PRECISION_FP32)
    {
        CHECK_CUDA_ERROR(cublasSgemm(m_cublas, CUBLAS_OP_T, CUBLAS_OP_N, m_numOutputChannels, batchSize, m_numInputChannels, &onef,
                                     reinterpret_cast<const float32_t*>(m_deviceKernel), m_numInputChannels,
                                     reinterpret_cast<const float32_t*>(inputs[0]), m_numInputChannels, &zerof,
                                     reinterpret_cast<float32_t*>(outputs[0]), m_numOutputChannels));
    }
    else
    {
        CHECK_CUDA_ERROR(cublasHgemm(m_cublas, CUBLAS_OP_T, CUBLAS_OP_N, m_numOutputChannels, batchSize, m_numInputChannels, &oneh,
                                     reinterpret_cast<const __half*>(m_deviceKernel), m_numInputChannels,
                                     reinterpret_cast<const __half*>(inputs[0]), m_numInputChannels, &zeroh,
                                     reinterpret_cast<__half*>(outputs[0]), m_numOutputChannels));
    }

    if (m_biasWeights.count)
    {
        cudnnDataType_t cudnnDT = m_precision == DW_PRECISION_FP32 ? CUDNN_DATA_FLOAT : CUDNN_DATA_HALF;
        CHECK_CUDA_ERROR(cudnnSetTensor4dDescriptor(m_srcDescriptor, CUDNN_TENSOR_NCHW, cudnnDT, 1, m_numOutputChannels, 1, 1));
        CHECK_CUDA_ERROR(cudnnSetTensor4dDescriptor(m_dstDescriptor, CUDNN_TENSOR_NCHW, cudnnDT, batchSize, m_numOutputChannels, 1, 1));
        CHECK_CUDA_ERROR(cudnnAddTensor(m_cudnn, &onef, m_srcDescriptor, m_deviceBias, &onef, m_dstDescriptor, outputs[0]));
    }

    return 0;
}
```

Finally, the implementation of the interface function is:

```{.cpp}
dwStatus _dwDNNPlugin_enqueue(int32_t batchSize, const void* const* inputs, void** outputs,
                              void* workspace, cudaStream_t stream, _dwDNNPluginHandle_t handle)
{
    FCPlugin* plugin = reinterpret_cast<FCPlugin*>(handle);
    plugin->enqueue(batchSize, inputs, outputs, workspace, stream);
    return DW_SUCCESS;
}
```

## Building the Plugin as a Dynamic Library

The plugin code must be compiled into a dynamic library for each processor architecture that is to be supported.

In the FCPlugin example, we add the plugin to the samples to simplify the building process.
First, we create a folder called `fcplugin` under `samples` and create a `CMakeLists.txt` in this folder:

```
project(fc_plugin C CXX)

#-------------------------------------------------------------------------------
# Project files
#-------------------------------------------------------------------------------
set(SOURCES
    FCPlugin.cpp
)

set(LIBRARIES
    dw_samples_framework
    ${DriveWorks_LIBRARIES}
    cudnn
)

#-------------------------------------------------------------------------------
# Final target
#-------------------------------------------------------------------------------
cuda_add_library(${PROJECT_NAME} SHARED ${SOURCES})
CUDA_ADD_CUBLAS_TO_TARGET(${PROJECT_NAME})
target_link_libraries(${PROJECT_NAME} PRIVATE ${LIBRARIES})

#-------------------------------------------------------------------------------
# Install target
#-------------------------------------------------------------------------------
sdk_add_sample(${PROJECT_NAME})
```

Append `fcplugin` to the `SAMPLES` list in the `CMakeLists.txt` under `samples`:
```
set(SAMPLES framework;egomotion;sensors;features;rig;visualization;sfm;dnn;laneDetection;colorcorrection;rectifier;ipc;hello_world;image;stereo;freespaceperception;drivenet;maps;template;icp;lidar_accumulator;cameraBlindness;calibration;vehicleio;dataspeedBridge;waitcondition;pointcloudprocessor;fcplugin)
```

Finally, run `cmake` and your preferred build system to compile the library. This generates `libfc_plugin.so`.

@section model_generation Generating a TensorRT Model with Plugins

In order to execute a DNN model with plugins in DriveWorks, it must, like any other DNN model, first be converted to a TensorRT model.
This is achieved by running @ref dwx_tensorRT_tool with an extra parameter that configures the plugins.

The plugin configuration is provided in a JSON file. This file is expected to map each plugin, given as a path to its dynamic library, to the custom layers it implements:
```
{
    "__comment" : "Please list here all the plugin paths (absolute path or relative to tensorRT_optimization tool) and the layers they apply to.",
    "plugin1_path" : ["layer1", "layer2", "layer3"],
    "plugin2_path" : ["layer4", "layer5"],
    "plugin3_path" : ["layer6"]
}
```

\anchor plugin_json In the FCPlugin example, we have only one custom layer that requires a plugin. The name of the layer is `ip2`, and the path of the plugin is `libfc_plugin.so`, assuming that it is located in the same folder as @ref dwx_tensorRT_tool . Therefore, we can create a `plugin.json` file as follows:

```
{
    "libfc_plugin.so" : ["ip2"]
}
```

Finally, we can generate the model with:
```
./tensorRT_optimization --modelType=caffe --prototxt=mnist.prototxt --caffemodel=mnist.caffemodel --outputBlobs=prob --pluginConfig=plugin.json --out=mnist_with_plugin.dnn
```

@section model_runtime Loading a TensorRT Model with Plugins

Loading a TensorRT model with plugins generated via @ref dwx_tensorRT_tool requires the plugin configuration to be defined at the initialization of `::dwDNNHandle_t`.

The plugin configuration structures look like this:
```{.cpp}
typedef struct
{
    const dwDNNCustomLayer* customLayers; /**< Array of custom layers. */
    size_t numCustomLayers;               /**< Number of custom layers. */
} dwDNNPluginConfiguration;

typedef struct
{
    const char* pluginLibraryPath; /**< Path to a plugin shared object. The path must be either an absolute path or a path relative to the DW lib folder. */
    const char* layerName;         /**< Name of the custom layer. */
} dwDNNCustomLayer;
```

Just like in the JSON file used at model generation (see \ref plugin_json "plugin json"), the layer name and the path to the plugin implementing that layer must be provided for each custom layer.

In the FCPlugin example:
```{.cpp}
dwDNNPluginConfiguration pluginConf{};
pluginConf.numCustomLayers = 1;
dwDNNCustomLayer customLayer{};
customLayer.layerName = "ip2";

// Assuming libfc_plugin.so is in the same folder as libdriveworks.so
std::string pluginPath = "libfc_plugin.so";
customLayer.pluginLibraryPath = pluginPath.c_str();
pluginConf.customLayers = &customLayer;

// Initialize DNN from a TensorRT file with plugin configuration
dwDNNHandle_t dnn;
CHECK_DW_ERROR(dwDNN_initializeTensorRTFromFile(&dnn, "mnist_with_plugin.dnn", &pluginConf, DW_PROCESSOR_TYPE_GPU, sdk));
```

See \ref dwx_dnn_plugin_sample for more details.