DriveWorks SDK Reference
4.0.0 Release
For Test and Development only

doc/tutorials/dnnPlugins/dwx_DNNPlugins.md
# Copyright (c) 2019-2020 NVIDIA CORPORATION. All rights reserved.

@page dwx_dnn_plugins DNN Plugins
@tableofcontents

@section plugin_overview DNN Plugins Overview

The DNN Plugins module enables DNN models composed of layers that are not natively supported by TensorRT to still benefit from the efficiency of TensorRT.

Integrating such models into DriveWorks can be divided into the following steps:

-# @ref plugin_implementation "Implement custom layers as dwDNNPlugin and compile as a dynamic library"
-# @ref model_generation "Generate TensorRT model with plugins using tensorRT_optimization tool"
-# @ref model_runtime "Load the model and the corresponding plugins at initialization"

@section plugin_implementation Implementing Custom DNN Layers as Plugins

@note Dynamic libraries implementing DNN plugins must be built against the same TensorRT, cuDNN, and CUDA versions as DriveWorks, where applicable.

A DNN plugin in DriveWorks requires implementing a set of pre-defined functions. The declarations of these functions are located in `dw/dnn/plugins/DNNPlugin.h`.

All of these functions share a `_dwDNNPluginHandle_t` parameter. This is a custom variable that is constructed at initialization and can be used or modified throughout the lifetime of the plugin.

In this tutorial, we are going to implement and integrate a pooling layer (PoolPlugin) for an MNIST UFF model. See \ref dwx_dnn_plugin_sample for the corresponding sample.

In the PoolPlugin example, we implement a class that stores the members that must be available throughout the lifetime of the network model and provides the methods required by dwDNNPlugin.
One of these members is the cuDNN handle, which we use to perform the inference.
This class is the custom variable `_dwDNNPluginHandle_t` that is passed to each function.

```{.cpp}
class PoolPlugin
{
public:
    PoolPlugin() = default;
    PoolPlugin(const PoolPlugin&) = default;
    virtual ~PoolPlugin() = default;

    int initialize();

    void terminate();

    size_t getWorkspaceSize(int) const;

    void deserializeFromFieldCollections(const char8_t* name,
                                         const dwDNNPluginFieldCollection& fieldCollection);

    void deserializeFromBuffer(const char8_t* name, const void* data, size_t length);

    void deserializeFromWeights(const dwDNNPluginWeights* weights, int32_t numWeights);

    int32_t getNbOutputs() const;

    dwBlobSize getOutputDimensions(int32_t index, const dwBlobSize* inputs, int32_t numInputDims);

    bool supportsFormatCombination(int32_t index, const dwDNNPluginTensorDesc* inOut,
                                   int32_t numInputs, int32_t numOutputs) const;

    void configurePlugin(const dwDNNPluginTensorDesc* inputDescs, int32_t numInputs,
                         const dwDNNPluginTensorDesc* outputDescs, int32_t numOutputs);

    int enqueue(int batchSize, const void* const* inputs, void** outputs, void*, cudaStream_t stream);

    size_t getSerializationSize();

    void serialize(void* buffer);

    dwPrecision getOutputPrecision(int32_t index, const dwPrecision* inputPrecisions, int32_t numInputs) const;

    static const char* getPluginType()
    {
        return "MaxPool";
    }

    static const char* getPluginVersion()
    {
        return "2";
    }

    void setPluginNamespace(const char* libNamespace);

    const char* getPluginNamespace() const;

    dwDNNPluginFieldCollection getFieldCollection();

private:
    size_t type2size(dwPrecision precision);

    template <typename T>
    void write(char*& buffer, const T& val);

    template <typename T>
    T read(const char* data, size_t totalLength, const char*& itr);

    dwBlobSize m_inputBlobSize;
    dwBlobSize m_outputBlobSize;
    std::shared_ptr<int8_t[]> m_tensorInputIntermediateHostINT8;
    std::shared_ptr<float32_t[]> m_tensorInputIntermediateHostFP32;
    float32_t* m_tensorInputIntermediateDevice = nullptr;
    std::shared_ptr<float32_t[]> m_tensorOutputIntermediateHostFP32;
    std::shared_ptr<int8_t[]> m_tensorOutputIntermediateHostINT8;
    float32_t* m_tensorOutputIntermediateDevice = nullptr;

    float32_t m_inHostScale{-1.0f};
    float32_t m_outHostScale{-1.0f};

    PoolingParams m_poolingParams;

    dwPrecision m_precision{DW_PRECISION_FP32};
    std::map<dwPrecision, cudnnDataType_t> m_typeMap = {{DW_PRECISION_FP32, CUDNN_DATA_FLOAT},
                                                        {DW_PRECISION_FP16, CUDNN_DATA_HALF},
                                                        {DW_PRECISION_INT8, CUDNN_DATA_INT8}};

    cudnnHandle_t m_cudnn;
    cudnnTensorDescriptor_t m_srcDescriptor, m_dstDescriptor;
    cudnnPoolingDescriptor_t m_poolingDescriptor;

    std::string m_namespace;
    dwDNNPluginFieldCollection m_fieldCollection{};
};
```
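
Among these methods, `getOutputDimensions` must report the shape that the pooling layer produces. A minimal, self-contained sketch of that computation (the simplified structs below are stand-ins for `dwBlobSize` and the plugin's `PoolingParams`, and the floor-division formula assumes cuDNN's default rounding):

```{.cpp}
#include <cassert>

// Simplified stand-ins for dwBlobSize and PoolingParams (illustration only).
struct BlobSize { int batchsize, channels, height, width; };
struct Pooling  { int kernelH, kernelW, padY, padX, strideY, strideX; };

// Output shape of a 2D pooling layer. Batch and channel counts are unchanged;
// each spatial axis follows floor((in + 2*pad - kernel) / stride) + 1.
BlobSize poolOutputDims(const BlobSize& in, const Pooling& p)
{
    BlobSize out = in;
    out.height = (in.height + 2 * p.padY - p.kernelH) / p.strideY + 1;
    out.width  = (in.width  + 2 * p.padX - p.kernelW) / p.strideX + 1;
    return out;
}
```

For the MNIST model's 2x2 max pooling with stride 2 on a 1x28x28 input, this yields a 1x14x14 output.
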
## Creation

A plugin is created via the following function:

```{.cpp}
dwStatus _dwDNNPlugin_create(_dwDNNPluginHandle_t* handle);
```

In this example, we construct a PoolPlugin and store it in the `_dwDNNPluginHandle_t`:

```{.cpp}
// Create DNN plugin
dwStatus _dwDNNPlugin_create(_dwDNNPluginHandle_t* handle)
{
    std::unique_ptr<PoolPlugin> plugin(new PoolPlugin());
    *handle = reinterpret_cast<_dwDNNPluginHandle_t>(plugin.release());
    return DW_SUCCESS;
}
```

## Initialization

The actual initialization of the member variables of PoolPlugin is triggered by the following function:

```{.cpp}
dwStatus _dwDNNPlugin_initialize(_dwDNNPluginHandle_t handle);
```

As shown above, the same plugin handle is passed to the function, and the initialize method of this
handle can be called as follows:

```{.cpp}
dwStatus _dwDNNPlugin_initialize(_dwDNNPluginHandle_t handle)
{
    auto plugin = reinterpret_cast<PoolPlugin*>(handle);
    plugin->initialize();
    return DW_SUCCESS;
}
```

In the PoolPlugin example, we initialize cuDNN, the pooling descriptor, and the source and destination tensor descriptors:

```{.cpp}
int initialize()
{
    // Initialize cuDNN
    CHECK_CUDA_ERROR(cudnnCreate(&m_cudnn));
    // Create the cuDNN tensor descriptors for the source and destination tensors
    CHECK_CUDA_ERROR(cudnnCreateTensorDescriptor(&m_srcDescriptor));
    CHECK_CUDA_ERROR(cudnnCreateTensorDescriptor(&m_dstDescriptor));
    CHECK_CUDA_ERROR(cudnnCreatePoolingDescriptor(&m_poolingDescriptor));
    CHECK_CUDA_ERROR(cudnnSetPooling2dDescriptor(m_poolingDescriptor,
                                                 m_poolingParams.poolingMode, CUDNN_NOT_PROPAGATE_NAN,
                                                 m_poolingParams.kernelHeight,
                                                 m_poolingParams.kernelWidth,
                                                 m_poolingParams.paddingY, m_poolingParams.paddingX,
                                                 m_poolingParams.strideY, m_poolingParams.strideX));

    return 0;
}
```

TensorRT requires plugins to provide serialization and deserialization functions. The following APIs
must be implemented to meet this requirement:

```{.cpp}
// Return the size of the buffer required to serialize the plugin.
dwStatus _dwDNNPlugin_getSerializationSize(size_t* serializationSize, _dwDNNPluginHandle_t handle);

// Serialize the plugin to a buffer.
dwStatus _dwDNNPlugin_serialize(void* buffer, _dwDNNPluginHandle_t handle);

// Deserialize the plugin from a buffer.
dwStatus _dwDNNPlugin_deserializeFromBuffer(const char8_t* name, const void* buffer, size_t len,
                                            _dwDNNPluginHandle_t handle);

// Deserialize the plugin from a field collection.
dwStatus _dwDNNPlugin_deserializeFromFieldCollection(const char8_t* name, const dwDNNPluginFieldCollection* fieldCollection,
                                                     _dwDNNPluginHandle_t handle);

// Deserialize the plugin from weights.
dwStatus _dwDNNPlugin_deserializeFromWeights(const dwDNNPluginWeights* weights, int32_t numWeights,
                                             _dwDNNPluginHandle_t handle);
```

An example of serializing to and deserializing from a buffer:

```{.cpp}
size_t getSerializationSize()
{
    size_t serializationSize = 0U;
    serializationSize += sizeof(m_poolingParams);
    serializationSize += sizeof(m_inputBlobSize);
    serializationSize += sizeof(m_outputBlobSize);
    serializationSize += sizeof(m_precision);
    if (m_precision == DW_PRECISION_INT8)
    {
        // Input and output scales
        serializationSize += sizeof(float32_t) * 2U;
    }
    return serializationSize;
}

void serialize(void* buffer)
{
    char* d = static_cast<char*>(buffer);

    write(d, m_poolingParams);
    write(d, m_inputBlobSize);
    write(d, m_outputBlobSize);
    write(d, m_precision);
    if (m_precision == DW_PRECISION_INT8)
    {
        write(d, m_inHostScale);
        write(d, m_outHostScale);
    }
}

void deserializeFromBuffer(const char8_t* name, const void* data, size_t length)
{
    const char* dataBegin = reinterpret_cast<const char*>(data);
    const char* dataItr   = dataBegin;
    m_poolingParams  = read<PoolingParams>(dataBegin, length, dataItr);
    m_inputBlobSize  = read<dwBlobSize>(dataBegin, length, dataItr);
    m_outputBlobSize = read<dwBlobSize>(dataBegin, length, dataItr);
    m_precision      = read<dwPrecision>(dataBegin, length, dataItr);
    if (m_precision == DW_PRECISION_INT8)
    {
        m_inHostScale  = read<float32_t>(dataBegin, length, dataItr);
        m_outHostScale = read<float32_t>(dataBegin, length, dataItr);
    }
}
```
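
The `write` and `read` helpers used above are private members of PoolPlugin, not part of the DriveWorks API. A minimal implementation for trivially copyable types might look like this (the bounds check in `read` is one possible safety measure, not mandated by the SDK):

```{.cpp}
#include <cstddef>
#include <cstring>
#include <stdexcept>

// Copy a trivially copyable value into the buffer and advance the write pointer.
template <typename T>
void write(char*& buffer, const T& val)
{
    std::memcpy(buffer, &val, sizeof(T));
    buffer += sizeof(T);
}

// Read a value from the buffer, advancing the iterator and checking bounds.
template <typename T>
T read(const char* data, std::size_t totalLength, const char*& itr)
{
    if (static_cast<std::size_t>(itr - data) + sizeof(T) > totalLength)
    {
        throw std::runtime_error("deserialization would read past the end of the buffer");
    }
    T val;
    std::memcpy(&val, itr, sizeof(T));
    itr += sizeof(T);
    return val;
}
```
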

## Destroying

The created plugin is destroyed via the following function:

```{.cpp}
dwStatus _dwDNNPlugin_destroy(_dwDNNPluginHandle_t handle);
```
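
A possible implementation mirrors `_dwDNNPlugin_create`: cast the handle back, release the plugin's resources, and free the object. The sketch below is made self-contained for illustration, so `_dwDNNPluginHandle_t`, `dwStatus`, and `PoolPlugin` are replaced with minimal stand-ins; in a real plugin these come from `dw/dnn/plugins/DNNPlugin.h` and the class above:

```{.cpp}
#include <cassert>

// Minimal stand-ins for the DriveWorks types (illustration only).
typedef void* _dwDNNPluginHandle_t;
enum dwStatus { DW_SUCCESS };

struct PoolPlugin
{
    void terminate() {} // would release the cuDNN descriptors and handle
};

// Destroy the plugin: tear down its resources, then free the object
// that _dwDNNPlugin_create allocated.
dwStatus _dwDNNPlugin_destroy(_dwDNNPluginHandle_t handle)
{
    auto plugin = reinterpret_cast<PoolPlugin*>(handle);
    plugin->terminate();
    delete plugin;
    return DW_SUCCESS;
}
```
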

For more information on how to implement a plugin, please see \ref dwx_dnn_plugin_sample.

## Plugin Lifecycle

![Plugin Lifecycle](dnn_plugin_lifecycle.png)

@section model_generation Generating TensorRT Model with Plugins

\anchor plugin_json
In the PoolPlugin example, we have only one custom layer that requires a plugin.
The path of the plugin is `libdnn_pool_plugin.so`, assuming that it is located in the same folder as @ref dwx_tensorRT_tool.
Since this is a UFF model, the layer names do not need to be specified.
Therefore, we can create a `plugin.json` file as follows:

```
{
    "libdnn_pool_plugin.so" : [""]
}
```
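
Had this been a Caffe model, the array would instead list the names of the custom layers that the library implements, since layer names are required for Caffe models. A hypothetical example (the layer name `pool1` is an assumption for illustration):

```
{
    "libdnn_pool_plugin.so" : ["pool1"]
}
```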

Finally, we can generate the model by running:

```
./tensorRT_optimization --modelType=uff --uffFile=mnist_custom_pool.uff --inputBlobs=in --outputBlobs=out --inputDims=1x28x28 --pluginConfig=plugin.json --out=mnist_with_plugin.dnn
```

@section model_runtime Loading TensorRT Model with Plugins

Loading a TensorRT model with plugins generated via @ref dwx_tensorRT_tool requires the plugin configuration to be defined at initialization of `::dwDNNHandle_t`.

The plugin configuration structures look like this:
```{.cpp}
typedef struct
{
    const dwDNNCustomLayer* customLayers; /**< Array of custom layers. */
    size_t numCustomLayers;               /**< Number of custom layers. */
} dwDNNPluginConfiguration;

typedef struct
{
    const char* pluginLibraryPath; /**< Path to a plugin shared object. The path must be either an absolute path or a path relative to the DriveWorks lib folder. */
    const char* layerName;         /**< Name of the custom layer. Required for Caffe models. */
} dwDNNCustomLayer;
```

Just like the JSON file used at model generation (see \ref plugin_json "plugin json"), for each layer, the layer name and the path to the plugin that implements the layer must be provided.

In the PoolPlugin example:
```{.cpp}
dwDNNPluginConfiguration pluginConf{};
pluginConf.numCustomLayers = 1;
dwDNNCustomLayer customLayer{};

// Assuming libdnn_pool_plugin.so is in the same folder as libdriveworks.so
std::string pluginPath = "libdnn_pool_plugin.so";
customLayer.pluginLibraryPath = pluginPath.c_str();
pluginConf.customLayers = &customLayer;

// Initialize DNN from a TensorRT file with plugin configuration
dwDNNHandle_t dnn;
CHECK_DW_ERROR(dwDNN_initializeTensorRTFromFile(&dnn, "mnist_with_plugin.dnn", &pluginConf, DW_PROCESSOR_TYPE_GPU, sdk));
```

See \ref dwx_dnn_plugin_sample for more details.