DriveWorks SDK Reference
3.0.4260 Release
For Test and Development only

# Copyright (c) 2019-2020 NVIDIA CORPORATION. All rights reserved.

@page dwx_dnn_plugins DNN Plugins
@tableofcontents
@note SW Release Applicability: This tutorial is applicable to modules in both **NVIDIA DriveWorks** and **NVIDIA DRIVE Software** releases.

@section plugin_overview DNN Plugins Overview

The DNN Plugins module enables DNN models that contain layers not natively supported by TensorRT to still benefit from the efficiency of TensorRT.

Integrating such a model into DriveWorks can be divided into the following steps:

-# @ref plugin_implementation "Implement custom layers as dwDNNPlugin and compile them as a dynamic library"
-# @ref model_generation "Generate a TensorRT model with plugins using the tensorRT_optimization tool"
-# @ref model_runtime "Load the model and the corresponding plugins at initialization"

@section plugin_implementation Implementing Custom DNN Layers as Plugins

@note Dynamic libraries implementing DNN layers must be built against the same TensorRT, cuDNN, and CUDA versions as DriveWorks, where applicable.

A DNN plugin in DriveWorks requires implementing a set of pre-defined functions. The declarations of these functions are located in `dw/dnn/plugins/DNNPlugin.h`.

These functions have `_dwDNNPluginHandle_t` in common. This is a custom object that is constructed at initialization and can be used or modified throughout the lifetime of the plugin.
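This opaque-handle pattern can be sketched in plain C++. The sketch below is illustrative only: `PluginHandle` and `MyPluginState` are hypothetical stand-ins for `_dwDNNPluginHandle_t` and the plugin's internal state, not DriveWorks types.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the opaque _dwDNNPluginHandle_t type.
typedef void* PluginHandle;

// Hypothetical per-plugin state constructed at initialization.
struct MyPluginState
{
    int32_t numOutputChannels;
};

// Initialization stores the state behind the opaque handle...
void pluginInitialize(PluginHandle* handle, int32_t numOutputChannels)
{
    *handle = new MyPluginState{numOutputChannels};
}

// ...every later entry point casts the handle back to access or modify it...
int32_t pluginGetNumOutputChannels(PluginHandle handle)
{
    return static_cast<MyPluginState*>(handle)->numOutputChannels;
}

// ...and release destroys the state.
void pluginRelease(PluginHandle handle)
{
    delete static_cast<MyPluginState*>(handle);
}
```

The concrete DriveWorks entry points shown later in this tutorial follow exactly this pattern, with an `FCPlugin` instance as the state behind the handle.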

In this tutorial, we implement and integrate a fully-connected layer (FCPlugin) for an MNIST network model. See \ref dwx_dnn_plugin_sample for the corresponding sample.

\anchor ip2_description The fully-connected layer in the MNIST network model looks like this:

```
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1.0
  }
  param {
    lr_mult: 2.0
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
```

In the FCPlugin example, we implement a class that stores the members that must be available throughout the lifetime of the network model and provides the methods required by dwDNNPlugin. Two of these members are the cuBLAS and cuDNN handles, which we use to perform the inference.
An instance of this class is the custom object that is passed to each function as the plugin handle.

```{.cpp}
class FCPlugin
{
public:
    FCPlugin(const dwDNNPluginWeights *weights, int32_t numWeights,
             int32_t numOutputChannels);

    FCPlugin(const void* data, size_t length);

    ~FCPlugin();

    void initialize();
    void terminate();

    int32_t getNbOutputs() const;

    dwBlobSize getOutputDimensions(int32_t index, const dwBlobSize* inputs, int32_t numInputDims);

    bool supportsFormat(dwPrecision precision, dwDNNPluginLayout layout) const;

    void configureWithFormat(dwPrecision precision, dwDNNPluginLayout layout);

    size_t getWorkspaceSize(int32_t) const;

    int32_t enqueue(int32_t batchSize, const void* const* inputs, void** outputs, void*, cudaStream_t stream);

    size_t getSerializationSize();

    void serialize(void* buffer);

private:
    size_t type2size(dwPrecision precision);

    template<typename T> void write(char*& buffer, const T& val);

    template<typename T> void read(const char*& buffer, T& val);

    void* copyToDevice(const void* data, size_t count);

    void convertAndCopyToDevice(void*& deviceWeights, const dwDNNPluginWeights& weights);

    void convertAndCopyToBuffer(char*& buffer, const dwDNNPluginWeights& weights);

    void deserializeToDevice(const char*& hostBuffer, void*& deviceWeights, size_t size);

    // Number of input and output channels of the FC layer
    int32_t m_numOutputChannels, m_numInputChannels;
    // Kernel and bias weights of the FC layer
    dwDNNPluginWeights m_kernelWeights, m_biasWeights;
    // Precision of the network model
    dwPrecision m_precision{DW_PRECISION_FP32};

    // Actual storage on the device for kernel and bias weights
    void* m_deviceKernel{nullptr};
    void* m_deviceBias{nullptr};

    // cuDNN and cuBLAS handles that are needed for the execution of the layer
    cudnnHandle_t m_cudnn;
    cublasHandle_t m_cublas;
    cudnnTensorDescriptor_t m_srcDescriptor, m_dstDescriptor;
};
```

## Initialization

A plugin can be initialized either directly from a `dwDNNPluginWeights` structure:
```{.cpp}
dwStatus _dwDNNPlugin_initializeFromWeights(_dwDNNPluginHandle_t* handle, const char* layerName,
                                            const dwDNNPluginWeights* weights, int32_t numWeights);
```

or from serialized data:
```{.cpp}
dwStatus _dwDNNPlugin_initialize(_dwDNNPluginHandle_t* handle, const char* layerName,
                                 const void* data, size_t length);
```

Both of these functions define how the plugin is initialized from the given input data.

In the FCPlugin example, we implement the backends of both initialization functions as constructors of the class, and implement both of the functions mentioned above to create an instance of the FCPlugin class. Additionally, we define helper methods to deserialize the layer from binary data.

The first constructor initializes the plugin from `dwDNNPluginWeights`:
```{.cpp}
// Construction from dwDNNPluginWeights
FCPlugin::FCPlugin(const dwDNNPluginWeights *weights, int32_t numWeights, int32_t numOutputChannels)
    : m_numOutputChannels(numOutputChannels)
{
    m_kernelWeights = weights[0];
    m_biasWeights   = weights[1];
    m_kernelWeights.values = malloc(m_kernelWeights.count * type2size(m_kernelWeights.precision));
    memcpy(const_cast<void*>(m_kernelWeights.values), weights[0].values,
           m_kernelWeights.count * type2size(m_kernelWeights.precision));
    m_biasWeights.values = malloc(m_biasWeights.count * type2size(m_biasWeights.precision));
    memcpy(const_cast<void*>(m_biasWeights.values), weights[1].values,
           m_biasWeights.count * type2size(m_biasWeights.precision));

    m_numInputChannels = int(weights[0].count / numOutputChannels);

    initialize();
}
```
This constructor allocates the kernel and bias weights and copies them from the given input. `numOutputChannels` is a configurable parameter of the layer, defined in \ref ip2_description "the model's description".
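The `type2size` helper used above is not shown elsewhere in this tutorial; it simply maps a precision to the element size in bytes. A minimal sketch, with a stand-in enum instead of `dwPrecision` (only FP16 and FP32 occur in this plugin):

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the dwPrecision values used by this plugin.
enum Precision
{
    PRECISION_FP16, // 2 bytes per element (__half)
    PRECISION_FP32  // 4 bytes per element (float32_t)
};

// Map a precision to its element size in bytes.
size_t type2size(Precision precision)
{
    return precision == PRECISION_FP32 ? 4 : 2;
}
```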

The second constructor initializes the plugin from binary data:
```{.cpp}
// Construction from data
FCPlugin::FCPlugin(const void* data, size_t length)
{
    const char* d = static_cast<const char*>(data);
    read(d, m_numInputChannels);
    read(d, m_numOutputChannels);

    m_kernelWeights.count  = m_numInputChannels * m_numOutputChannels;
    m_kernelWeights.values = nullptr;

    read(d, m_biasWeights.count);
    m_biasWeights.values = nullptr;

    read(d, m_precision);

    deserializeToDevice(d, m_deviceKernel, m_kernelWeights.count * type2size(m_precision));
    deserializeToDevice(d, m_deviceBias, m_biasWeights.count * type2size(m_precision));
    initialize();
}
```
The deserialization depends on how the data was \ref serialization "serialized". This constructor first deserializes the dimensions, then allocates member variables accordingly and deserializes the weights.
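The `read` and `write` helpers declared in the class are not shown in this tutorial; a minimal `memcpy`-based sketch matching the declared signatures could look like this (each call copies a plain-old-data value and advances the buffer pointer):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Read a POD value from the buffer and advance the read pointer.
template <typename T>
void read(const char*& buffer, T& val)
{
    std::memcpy(&val, buffer, sizeof(T));
    buffer += sizeof(T);
}

// Write a POD value to the buffer and advance the write pointer.
template <typename T>
void write(char*& buffer, const T& val)
{
    std::memcpy(buffer, &val, sizeof(T));
    buffer += sizeof(T);
}
```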

The `initialize` method is used by both constructors to create the cuDNN and cuBLAS handles, and to perform the operations that are common to both constructors.
```{.cpp}
// Initialize members common to both constructors
void FCPlugin::initialize()
{
    // Initialize cuDNN and cuBLAS
    CHECK_CUDA_ERROR(cudnnCreate(&m_cudnn));
    CHECK_CUDA_ERROR(cublasCreate(&m_cublas));
    // Create the cuDNN tensor descriptors needed for bias addition
    CHECK_CUDA_ERROR(cudnnCreateTensorDescriptor(&m_srcDescriptor));
    CHECK_CUDA_ERROR(cudnnCreateTensorDescriptor(&m_dstDescriptor));
    if (m_kernelWeights.values)
        convertAndCopyToDevice(m_deviceKernel, m_kernelWeights);
    if (m_biasWeights.values)
        convertAndCopyToDevice(m_deviceBias, m_biasWeights);
}
```

Deserialization, in this case, is performed by simply copying the binary data from host to device.
```{.cpp}
// Deserialization
void FCPlugin::deserializeToDevice(const char*& hostBuffer, void*& deviceWeights, size_t size)
{
    deviceWeights = copyToDevice(hostBuffer, size);
    hostBuffer += size;
}
```

The following function allocates device memory and copies the given data to it.
```{.cpp}
// Allocate device memory and copy the given data to it
void* FCPlugin::copyToDevice(const void* data, size_t count)
{
    void* deviceData;
    CHECK_CUDA_ERROR(cudaMalloc(&deviceData, count));
    CHECK_CUDA_ERROR(cudaMemcpy(deviceData, data, count, cudaMemcpyHostToDevice));
    return deviceData;
}
```

The following function performs precision conversion, if necessary, before copying data to the device.
```{.cpp}
// Convert weights based on the loaded precision and copy them to the device
void FCPlugin::convertAndCopyToDevice(void*& deviceWeights, const dwDNNPluginWeights& weights)
{
    if (weights.precision != m_precision)
    {
        // Weights are converted in host memory first if the precision does not match
        size_t size  = weights.count * type2size(m_precision);
        void* buffer = malloc(size);
        for (int64_t v = 0; v < weights.count; ++v)
        {
            if (m_precision == DW_PRECISION_FP32)
                static_cast<float32_t*>(buffer)[v] = __half2float(static_cast<const __half*>(weights.values)[v]);
            else
                static_cast<__half*>(buffer)[v] = __float2half(static_cast<const float32_t*>(weights.values)[v]);
        }
        deviceWeights = copyToDevice(buffer, size);
        free(buffer);
    }
    else
    {
        deviceWeights = copyToDevice(weights.values, weights.count * type2size(m_precision));
    }
}
```

The initialization functions required by dwDNNPlugin can now be implemented as follows:
```{.cpp}
dwStatus _dwDNNPlugin_initializeFromWeights(_dwDNNPluginHandle_t* handle, const char*,
                                            const dwDNNPluginWeights* weights, int32_t numWeights)
{
    std::unique_ptr<FCPlugin> fcPlugin(new FCPlugin(weights, numWeights, 10));
    *handle = reinterpret_cast<_dwDNNPluginHandle_t>(fcPlugin.release());
    return DW_SUCCESS;
}

dwStatus _dwDNNPlugin_initialize(_dwDNNPluginHandle_t* handle, const char*,
                                 const void* data, size_t length)
{
    std::unique_ptr<FCPlugin> fcPlugin(new FCPlugin(data, length));
    *handle = reinterpret_cast<_dwDNNPluginHandle_t>(fcPlugin.release());
    return DW_SUCCESS;
}
```

## Releasing

The release function must destroy everything that was created by the plugin at initialization.
```{.cpp}
dwStatus _dwDNNPlugin_release(_dwDNNPluginHandle_t handle);
```

In the FCPlugin example, the destructor releases everything:

```{.cpp}
FCPlugin::~FCPlugin()
{
    if (m_kernelWeights.values)
    {
        free(const_cast<void*>(m_kernelWeights.values));
        m_kernelWeights.values = nullptr;
    }
    if (m_biasWeights.values)
    {
        free(const_cast<void*>(m_biasWeights.values));
        m_biasWeights.values = nullptr;
    }

    terminate();
}

void FCPlugin::terminate()
{
    CHECK_CUDA_ERROR(cublasDestroy(m_cublas));
    CHECK_CUDA_ERROR(cudnnDestroy(m_cudnn));
    if (m_deviceKernel)
    {
        cudaFree(m_deviceKernel);
        m_deviceKernel = nullptr;
    }
    if (m_deviceBias)
    {
        cudaFree(m_deviceBias);
        m_deviceBias = nullptr;
    }
}

dwStatus _dwDNNPlugin_release(_dwDNNPluginHandle_t handle)
{
    FCPlugin* plugin = reinterpret_cast<FCPlugin*>(handle);
    delete plugin;
    return DW_SUCCESS;
}
```

## \anchor serialization Serialization

A plugin must provide functions to enable serialization to a buffer:

```{.cpp}
dwStatus _dwDNNPlugin_getSerializationSize(size_t *serializationSize, _dwDNNPluginHandle_t handle);

dwStatus _dwDNNPlugin_serialize(void* buffer, _dwDNNPluginHandle_t handle);
```

In the FCPlugin example, we store the number of input channels, the number of output channels, the number of bias weights and the precision. In addition,
we store the kernel and bias weights. Therefore:

```{.cpp}
size_t FCPlugin::getSerializationSize()
{
    return sizeof(m_numInputChannels) + sizeof(m_numOutputChannels) + sizeof(m_biasWeights.count) + sizeof(m_precision) +
           (m_kernelWeights.count + m_biasWeights.count) * type2size(m_precision);
}

void FCPlugin::serialize(void* buffer)
{
    char* d = static_cast<char*>(buffer);

    write(d, m_numInputChannels);
    write(d, m_numOutputChannels);
    write(d, m_biasWeights.count);
    write(d, m_precision);
    convertAndCopyToBuffer(d, m_kernelWeights);
    convertAndCopyToBuffer(d, m_biasWeights);
}

void FCPlugin::convertAndCopyToBuffer(char*& buffer, const dwDNNPluginWeights& weights)
{
    if (weights.precision != m_precision)
    {
        for (int64_t v = 0; v < weights.count; ++v)
        {
            if (m_precision == DW_PRECISION_FP32)
                reinterpret_cast<float32_t*>(buffer)[v] = __half2float(static_cast<const __half*>(weights.values)[v]);
            else
                reinterpret_cast<__half*>(buffer)[v] = __float2half(static_cast<const float32_t*>(weights.values)[v]);
        }
    }
    else
    {
        memcpy(buffer, weights.values, weights.count * type2size(m_precision));
    }
    buffer += weights.count * type2size(m_precision);
}
```
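As a sanity check of the layout above: the serialized blob is four header fields followed by the converted kernel and bias weights. A plain-C++ sketch of the size computation (the channel and bias counts in the usage note below are illustrative, and the precision enum is a stand-in assumed to serialize as 4 bytes):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for dwPrecision; assumed to occupy 4 bytes, as an int32_t-backed enum.
enum class Precision : int32_t { FP16, FP32 };

// Mirror of FCPlugin::getSerializationSize(): header fields plus weight payload.
size_t fcSerializationSize(int32_t numInputChannels, int32_t numOutputChannels,
                           int64_t biasCount, size_t elementSize)
{
    int64_t kernelCount = static_cast<int64_t>(numInputChannels) * numOutputChannels;
    return sizeof(numInputChannels) + sizeof(numOutputChannels) +
           sizeof(biasCount) + sizeof(Precision) +
           (kernelCount + biasCount) * elementSize;
}
```

For example, a hypothetical 500-input, 10-output FP32 layer with 10 bias values serializes to 20 header bytes plus 20040 weight bytes.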

The interface functions can be defined as:

```{.cpp}
dwStatus _dwDNNPlugin_getSerializationSize(size_t *serializationSize, _dwDNNPluginHandle_t handle)
{
    FCPlugin* plugin = reinterpret_cast<FCPlugin*>(handle);
    *serializationSize = plugin->getSerializationSize();
    return DW_SUCCESS;
}

dwStatus _dwDNNPlugin_serialize(void* buffer, _dwDNNPluginHandle_t handle)
{
    FCPlugin* plugin = reinterpret_cast<FCPlugin*>(handle);
    plugin->serialize(buffer);
    return DW_SUCCESS;
}
```

## Getter Functions

The plugin is required to implement several getter functions.

It must provide a function that returns how many outputs it has:

```{.cpp}
dwStatus _dwDNNPlugin_getNumOutputs(int32_t* numOutputs, _dwDNNPluginHandle_t handle);
```

In addition, it must provide a function that returns the dimensions of each output based on the input dimensions:

```{.cpp}
dwStatus _dwDNNPlugin_getOutputDimensions(dwBlobSize *outputDimensions,
                                          int32_t outputIndex, const dwBlobSize* inputDimensions,
                                          int32_t numInputs, _dwDNNPluginHandle_t handle);
```

Finally, it must provide a function that returns the temporary workspace size required by this layer:

```{.cpp}
dwStatus _dwDNNPlugin_getWorkspaceSize(size_t* workspaceSize, int32_t maxBatchSize,
                                       _dwDNNPluginHandle_t handle);
```

In the FCPlugin example, the layer has exactly one output:

```{.cpp}
int32_t FCPlugin::getNbOutputs() const
{
    return 1;
}
```

The dimensions of the output are expected to be NxCx1x1, where N is the batch size of the input and C is the number of output channels.
```{.cpp}
dwBlobSize FCPlugin::getOutputDimensions(int32_t index, const dwBlobSize* inputs, int32_t numInputDims)
{
    return dwBlobSize{inputs[0].batchsize, m_numOutputChannels, 1, 1};
}
```

FCPlugin does not need any temporary workspace during the generation of the model.
```{.cpp}
size_t FCPlugin::getWorkspaceSize(int32_t) const
{
    return 0;
}
```

Finally, the interface functions can be defined as:
```{.cpp}
dwStatus _dwDNNPlugin_getNumOutputs(int32_t* numOutputs, _dwDNNPluginHandle_t handle)
{
    FCPlugin* plugin = reinterpret_cast<FCPlugin*>(handle);
    *numOutputs = plugin->getNbOutputs();
    return DW_SUCCESS;
}

dwStatus _dwDNNPlugin_getOutputDimensions(dwBlobSize *outputDimensions,
                                          int32_t outputIndex, const dwBlobSize* inputDimensions,
                                          int32_t numInputs, _dwDNNPluginHandle_t handle)
{
    FCPlugin* plugin = reinterpret_cast<FCPlugin*>(handle);
    *outputDimensions = plugin->getOutputDimensions(outputIndex, inputDimensions, numInputs);
    return DW_SUCCESS;
}

dwStatus _dwDNNPlugin_getWorkspaceSize(size_t* workspaceSize, int32_t maxBatchSize,
                                       _dwDNNPluginHandle_t handle)
{
    FCPlugin* plugin = reinterpret_cast<FCPlugin*>(handle);
    *workspaceSize = plugin->getWorkspaceSize(maxBatchSize);
    return DW_SUCCESS;
}
```

## Formats

A plugin must implement a function that indicates whether a given format is supported:

```{.cpp}
dwStatus _dwDNNPlugin_supportsFormat(bool* res, dwPrecision precision,
                                     dwDNNPluginLayout pluginLayout, _dwDNNPluginHandle_t handle);
```

Furthermore, it must allow configuring the format:

```{.cpp}
dwStatus _dwDNNPlugin_configureWithFormat(const dwBlobSize* inputDimensions,
                                          int32_t numInputs, const dwBlobSize* outputDimensions,
                                          int32_t numOutputs, dwPrecision precision,
                                          dwDNNPluginLayout layout, int32_t maxBatchSize,
                                          _dwDNNPluginHandle_t handle);
```

In the FCPlugin example, we support single and half precision and only the planar (NCHW) layout:

```{.cpp}
bool FCPlugin::supportsFormat(dwPrecision precision, dwDNNPluginLayout layout) const
{
    return (precision == DW_PRECISION_FP32 || precision == DW_PRECISION_FP16) && layout == DW_DNN_PLUGIN_LAYOUT_NCHW;
}
```

Since only one layout is supported, precision is the only configurable parameter:

```{.cpp}
void FCPlugin::configureWithFormat(dwPrecision precision, dwDNNPluginLayout layout)
{
    m_precision = precision;
}
```

Hence, the implementations of the interface functions look like this:
```{.cpp}
dwStatus _dwDNNPlugin_supportsFormat(bool* res, dwPrecision precision,
                                     dwDNNPluginLayout pluginLayout, _dwDNNPluginHandle_t handle)
{
    FCPlugin* plugin = reinterpret_cast<FCPlugin*>(handle);
    *res = plugin->supportsFormat(precision, pluginLayout);
    return DW_SUCCESS;
}

dwStatus _dwDNNPlugin_configureWithFormat(const dwBlobSize* inputDimensions,
                                          int32_t numInputs, const dwBlobSize* outputDimensions,
                                          int32_t numOutputs, dwPrecision precision,
                                          dwDNNPluginLayout layout, int32_t maxBatchSize,
                                          _dwDNNPluginHandle_t handle)
{
    FCPlugin* plugin = reinterpret_cast<FCPlugin*>(handle);
    plugin->configureWithFormat(precision, layout);
    return DW_SUCCESS;
}
```

## Inference

A plugin must implement a function that performs the inference:

```{.cpp}
dwStatus _dwDNNPlugin_enqueue(int32_t batchSize, const void* const* inputs, void** outputs,
                              void* workspace, cudaStream_t stream, _dwDNNPluginHandle_t handle);
```

In the FCPlugin example, we use cuBLAS and cuDNN to execute the fully-connected layer:
cuBLAS performs the matrix multiplication and cuDNN adds the bias.

```{.cpp}
int32_t FCPlugin::enqueue(int32_t batchSize, const void* const* inputs, void** outputs, void*, cudaStream_t stream)
{
    float32_t onef{1.0f}, zerof{0.0f};
    __half oneh = __float2half(1.0f), zeroh = __float2half(0.0f);

    cublasSetStream(m_cublas, stream);
    cudnnSetStream(m_cudnn, stream);

    if (m_precision == DW_PRECISION_FP32)
    {
        CHECK_CUDA_ERROR(cublasSgemm(m_cublas, CUBLAS_OP_T, CUBLAS_OP_N, m_numOutputChannels, batchSize, m_numInputChannels, &onef,
                                     reinterpret_cast<const float32_t*>(m_deviceKernel), m_numInputChannels,
                                     reinterpret_cast<const float32_t*>(inputs[0]), m_numInputChannels, &zerof,
                                     reinterpret_cast<float32_t*>(outputs[0]), m_numOutputChannels));
    }
    else
    {
        CHECK_CUDA_ERROR(cublasHgemm(m_cublas, CUBLAS_OP_T, CUBLAS_OP_N, m_numOutputChannels, batchSize, m_numInputChannels, &oneh,
                                     reinterpret_cast<const __half*>(m_deviceKernel), m_numInputChannels,
                                     reinterpret_cast<const __half*>(inputs[0]), m_numInputChannels, &zeroh,
                                     reinterpret_cast<__half*>(outputs[0]), m_numOutputChannels));
    }

    if (m_biasWeights.count)
    {
        cudnnDataType_t cudnnDT = m_precision == DW_PRECISION_FP32 ? CUDNN_DATA_FLOAT : CUDNN_DATA_HALF;
        CHECK_CUDA_ERROR(cudnnSetTensor4dDescriptor(m_srcDescriptor, CUDNN_TENSOR_NCHW, cudnnDT, 1, m_numOutputChannels, 1, 1));
        CHECK_CUDA_ERROR(cudnnSetTensor4dDescriptor(m_dstDescriptor, CUDNN_TENSOR_NCHW, cudnnDT, batchSize, m_numOutputChannels, 1, 1));
        CHECK_CUDA_ERROR(cudnnAddTensor(m_cudnn, &onef, m_srcDescriptor, m_deviceBias, &onef, m_dstDescriptor, outputs[0]));
    }

    return 0;
}
```

Finally, the implementation of the interface function is:

```{.cpp}
dwStatus _dwDNNPlugin_enqueue(int32_t batchSize, const void* const* inputs, void** outputs,
                              void* workspace, cudaStream_t stream, _dwDNNPluginHandle_t handle)
{
    FCPlugin* plugin = reinterpret_cast<FCPlugin*>(handle);
    plugin->enqueue(batchSize, inputs, outputs, workspace, stream);
    return DW_SUCCESS;
}
```

## Building the Plugin as a Dynamic Library

The plugin code must be compiled into a dynamic library for each processor architecture that is to be supported.

In the FCPlugin example, we add the plugin to the samples to simplify the building process.
First, we create a folder called `fcplugin` under `samples` and create a `CMakeLists.txt` in this folder:

```
project(fc_plugin C CXX)

#-------------------------------------------------------------------------------
# Project files
#-------------------------------------------------------------------------------
set(SOURCES
    FCPlugin.cpp
)

set(LIBRARIES
    dw_samples_framework
    ${DriveWorks_LIBRARIES}
    cudnn
)

#-------------------------------------------------------------------------------
# Final target
#-------------------------------------------------------------------------------
cuda_add_library(${PROJECT_NAME} SHARED ${SOURCES})
CUDA_ADD_CUBLAS_TO_TARGET(${PROJECT_NAME})
target_link_libraries(${PROJECT_NAME} PRIVATE ${LIBRARIES})

#-------------------------------------------------------------------------------
# Install target
#-------------------------------------------------------------------------------
sdk_add_sample(${PROJECT_NAME})
```

Append `fcplugin` to the `SAMPLES` list in the `CMakeLists.txt` under `samples`:
```
set(SAMPLES framework;egomotion;sensors;features;rig;visualization;sfm;dnn;laneDetection;colorcorrection;rectifier;ipc;hello_world;image;stereo;freespaceperception;drivenet;maps;template;icp;lidar_accumulator;cameraBlindness;calibration;vehicleio;dataspeedBridge;waitcondition;pointcloudprocessor;fcplugin)
```

Finally, run `cmake` and your preferred build system to compile the library. This generates `libfc_plugin.so`.

@section model_generation Generating a TensorRT Model with Plugins

In order to execute a DNN model with plugins in DriveWorks, it must, like any other DNN model, first be converted to a TensorRT model.
This is achieved by running @ref dwx_tensorRT_tool with an extra parameter that configures the plugins.

The plugin configuration is provided in a JSON file. This file is expected to map each plugin, given as a path to its dynamic library, to the custom layers it implements:
```
{
    "__comment" : "Please list here all the plugin paths (absolute path or relative to tensorRT_optimization tool) and the layers they apply to.",
    "plugin1_path" : ["layer1", "layer2", "layer3"],
    "plugin2_path" : ["layer4", "layer5"],
    "plugin3_path" : ["layer6"]
}
```

\anchor plugin_json In the FCPlugin example, we have only one custom layer that requires a plugin. The name of the layer is `ip2`, and the path of the plugin is `libfc_plugin.so`, assuming that it is located in the same folder as @ref dwx_tensorRT_tool . Therefore, we can create a `plugin.json` file as follows:

```
{
    "libfc_plugin.so" : ["ip2"]
}
```

Finally, we can generate the model with:
```
./tensorRT_optimization --modelType=caffe --prototxt=mnist.prototxt --caffemodel=mnist.caffemodel --outputBlobs=prob --pluginConfig=plugin.json --out=mnist_with_plugin.dnn
```

@section model_runtime Loading a TensorRT Model with Plugins

Loading a TensorRT model with plugins generated via @ref dwx_tensorRT_tool requires the plugin configuration to be defined at the initialization of `::dwDNNHandle_t`.

The plugin configuration structures look like this:
```{.cpp}
typedef struct
{
    const dwDNNCustomLayer* customLayers; /**< Array of custom layers. */
    size_t numCustomLayers;               /**< Number of custom layers. */
} dwDNNPluginConfiguration;

typedef struct
{
    const char* pluginLibraryPath; /**< Path to a plugin shared object. The path must be either an absolute path or a path relative to the DW lib folder. */
    const char* layerName;         /**< Name of the custom layer. */
} dwDNNCustomLayer;
```

Just like in the JSON file used at model generation (see \ref plugin_json "plugin json"), the layer name and the path to the plugin implementing that layer must be provided for each custom layer.

In the FCPlugin example:
```{.cpp}
dwDNNPluginConfiguration pluginConf{};
pluginConf.numCustomLayers = 1;
dwDNNCustomLayer customLayer{};
customLayer.layerName = "ip2";

// Assuming libfc_plugin.so is in the same folder as libdriveworks.so
std::string pluginPath = "libfc_plugin.so";
customLayer.pluginLibraryPath = pluginPath.c_str();
pluginConf.customLayers = &customLayer;

// Initialize DNN from a TensorRT file with plugin configuration
dwDNNHandle_t dnn;
CHECK_DW_ERROR(dwDNN_initializeTensorRTFromFile(&dnn, "mnist_with_plugin.dnn", &pluginConf, DW_PROCESSOR_TYPE_GPU, sdk));
```

See \ref dwx_dnn_plugin_sample for more details.