Extending TensorRT with Custom Layers#
NVIDIA TensorRT supports many layers, and its functionality is continually extended; however, there can be cases in which the layers supported do not cater to a model’s specific needs. In such cases, TensorRT can be extended by implementing custom layers, often called plugins.
TensorRT contains standard plugins that can be loaded into your application. For a list of open-source plugins, refer to GitHub: TensorRT plugins.
To use standard TensorRT plugins in your application, the libnvinfer_plugin.so (nvinfer_plugin.dll on Windows) library must be loaded, and all plugins must be registered by calling initLibNvInferPlugins in your application code. For more information about these plugins, refer to the NvInferPlugin.h file.
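As a minimal sketch (assuming gLogger is an ILogger implementation already defined in your application), registering the standard plugins could look like this:

#include "NvInferPlugin.h"

// Register all standard TensorRT plugins under the default namespace ("").
bool const pluginsRegistered = initLibNvInferPlugins(static_cast<void*>(&gLogger), "");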
You can write and add your own if these plugins do not meet your needs.
Adding Custom Layers Using the C++ API#
There are four steps to ensure that TensorRT properly recognizes your plugin:
Implement a plugin class derived from one of TensorRT's plugin base classes. Currently, the only recommended one is IPluginV3.
Implement a plugin creator class tied to your plugin class by deriving from one of TensorRT's plugin creator base classes. Currently, the only recommended one is IPluginCreatorV3One.
Register an instance of the plugin creator class with TensorRT's plugin registry.
Add an instance of the plugin class to a TensorRT network by directly using TensorRT's network APIs or by loading an ONNX model using the TensorRT ONNX parser APIs.
The following sections explore each of these steps in detail.
Implementing a Plugin Class#
You can implement a custom layer by deriving from one of TensorRT's plugin base classes. Starting in TensorRT 10.0, the only recommended plugin interface is IPluginV3, as the others are deprecated. Therefore, this section mostly describes plugin implementation using IPluginV3. Refer to the Migrating V2 Plugins to IPluginV3 section for how plugins implementing V2 plugin interfaces can be migrated to IPluginV3.
IPluginV3 is a wrapper for a set of capability interfaces that define three capabilities: core, build, and runtime.
Core capability: Refers to plugin attributes and behaviors common to both the build and runtime phases of a plugin’s lifetime.
Build capability: Refers to plugin attributes and behaviors that the plugin must exhibit for the TensorRT builder.
Runtime capability: Refers to plugin attributes and behaviors that the plugin must exhibit for it to be executable, either during auto-tuning in the TensorRT build phase or inference in the TensorRT runtime phase.
IPluginV3OneCore (C++, Python), IPluginV3OneBuild (C++, Python), and IPluginV3OneRuntime (C++, Python) are the base classes that an IPluginV3 plugin must implement to provide the core, build, and runtime capabilities, respectively. If I/O aliasing is required, IPluginV3OneBuildV2 (C++, Python) can be used as the build capability; it contains a superset of the functionality in IPluginV3OneBuild.
Implementing a Plugin Creator Class#
To use a plugin in a network, you must first register it with TensorRT's PluginRegistry (C++, Python). Rather than registering the plugin directly, you register an instance of a factory class for the plugin, derived from a child class of IPluginCreatorInterface (C++, Python). The plugin creator class also provides other information about the plugin: its name, version, and plugin field parameters.
IPluginCreatorV3One is the factory class for IPluginV3. Plugins are created through IPluginCreatorV3One::createPlugin(), which has the following signature:
IPluginV3* createPlugin(AsciiChar const *name, PluginFieldCollection const *fc, TensorRTPhase phase)
create_plugin(self: trt.IPluginCreatorV3, name: str, field_collection: trt.PluginFieldCollection, phase: trt.TensorRTPhase) -> trt.IPluginV3
IPluginCreatorV3One::createPlugin() may be called to create a plugin instance in either the build phase or the runtime phase of TensorRT, which is communicated by the phase argument of type TensorRTPhase (C++, Python).
The returned IPluginV3 object must have a valid core capability in both phases.
In the build phase, the returned IPluginV3 object must have both a build and runtime capability.
In the runtime phase, the returned IPluginV3 object must have a runtime capability. A build capability is not required and is ignored.
Registering a Plugin Creator with the Plugin Registry#
There are two ways that you can register plugin creators with the registry:
Statically register by calling REGISTER_TENSORRT_PLUGIN. REGISTER_TENSORRT_PLUGIN always registers the creator under the default namespace ("").
Dynamically register by creating an entry point similar to initLibNvInferPlugins and calling registerCreator on the plugin registry. This is preferred over static registration as it allows plugins to be registered under a unique namespace. This ensures no name collisions during build time across different plugin libraries.
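A minimal sketch of such a registration entry point, assuming a hypothetical MyPluginCreator class that implements IPluginCreatorV3One:

extern "C" bool initMyPluginLib(void* logger, char const* libNamespace)
{
    // Keep a single static creator instance alive for the lifetime of the registry entry.
    static MyPluginCreator myPluginCreator{};
    // Register under the caller-supplied namespace to avoid name collisions.
    return getPluginRegistry()->registerCreator(myPluginCreator, libNamespace);
}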
During serialization, the TensorRT engine internally stores the plugin name, plugin version, and namespace (if it exists) for all plugins, along with any plugin fields in the PluginFieldCollection returned by IPluginV3OneRuntime::getFieldsToSerialize(). During deserialization, TensorRT looks up a plugin creator with the same plugin name, version, and namespace in the plugin registry and invokes IPluginCreatorV3One::createPlugin() on it; the PluginFieldCollection that was serialized is passed back as the fc argument.
Adding a Plugin Instance to a TensorRT Network#
You can add a plugin to the TensorRT network using addPluginV3(), which creates a network layer with the given plugin.
For example, you can add a plugin layer to your network as follows:
// Look up the plugin in the registry
// Cast to appropriate child class of IPluginCreatorInterface
auto creator = static_cast<IPluginCreatorV3One*>(getPluginRegistry()->getCreator(pluginName, pluginVersion, pluginNamespace));
PluginFieldCollection const* pluginFC = creator->getFieldNames();
// Populate the field parameters for the plugin layer
// (parseAndFillFields is a user-defined helper that fills a PluginFieldCollection)
PluginFieldCollection const* pluginData = parseAndFillFields(pluginFC, layerFields);
// Create the plugin object using the layerName and the plugin metadata for use by the TensorRT builder
IPluginV3 *pluginObj = creator->createPlugin(layerName, pluginData, TensorRTPhase::kBUILD);
// Add the plugin to the TensorRT network
auto layer = network.addPluginV3(inputs.data(), int(inputs.size()), shapeInputs.data(), int(shapeInputs.size()), pluginObj);
// ... (build the rest of the network and serialize the engine)
// Delete the plugin object
delete pluginObj;
// ... (free allocated pluginData)
The createPlugin method described previously creates a new plugin object on the heap and returns a pointer. As shown previously, ensure you delete pluginObj to avoid a memory leak.
When the engine is deleted, the engine destroys any clones of the plugin object created during the build. You are responsible for ensuring the plugin object you created is freed after it is added to the network.
Note
Do not serialize all plugin parameters, only those required for the plugin to function correctly at runtime. Build-time parameters can be omitted.
If you are an automotive safety user, you must call getSafePluginRegistry() instead of getPluginRegistry(). You must also use the REGISTER_SAFE_TENSORRT_PLUGIN macro instead of REGISTER_TENSORRT_PLUGIN.
Example: Adding a Custom Layer with Dynamic Shapes Using C++#
Imagine that a custom layer is needed for a padding-like operation where each image in an input batch must be reshaped to 32 x 32. The input tensor X would be of shape (B, C, H, W), and the output Y would be of shape (B, C, 32, 32). To accomplish this, a TensorRT plugin can be written using the IPluginV3 interface; let us call it PadPlugin.
Since an IPluginV3 plugin must possess multiple capabilities, each defined by a separate interface, you could implement a plugin using either composition or multiple inheritance. However, a multiple inheritance approach is easier for most use cases, particularly when coupling the build and runtime capabilities in a single class is tolerable.
Using multiple inheritance, PadPlugin can be implemented as follows:
class PadPlugin : public IPluginV3, public IPluginV3OneCore, public IPluginV3OneBuild, public IPluginV3OneRuntime
{
    // ... override inherited virtual methods
};
The override of IPluginV3::getCapabilityInterface must return pointers to the individual capability interfaces. For each PluginCapabilityType, it is imperative to cast through the corresponding capability interface to remove ambiguity for the compiler.
IPluginCapability* PadPlugin::getCapabilityInterface(PluginCapabilityType type) noexcept override
{
// All plugin interface methods are noexcept and care should be
// taken not to throw exceptions across the API boundary. It is
// recommended to catch any exceptions and return a value that
// appropriately represents the error status.
try
{
if (type == PluginCapabilityType::kBUILD)
{
return static_cast<IPluginV3OneBuild*>(this);
}
if (type == PluginCapabilityType::kRUNTIME)
{
return static_cast<IPluginV3OneRuntime*>(this);
}
ASSERT(type == PluginCapabilityType::kCORE);
return static_cast<IPluginV3OneCore*>(this);
}
catch(...)
{
// log error
}
return nullptr;
}
The methods that are of importance in this particular example are:
INetworkDefinition::addPluginV3
IPluginV3OneBuild::getNbOutputs
IPluginV3OneBuild::getOutputDataTypes
IPluginV3OneBuild::getOutputShapes
IPluginV3OneBuild::supportsFormatCombination
IPluginV3OneBuild::configurePlugin
IPluginV3OneRuntime::onShapeChange
IPluginV3OneRuntime::enqueue
INetworkDefinition::addPluginV3 (C++, Python) can add the plugin to the network.
std::vector<ITensor*> inputs{X};
auto pluginLayer = network->addPluginV3(inputs.data(), inputs.size(), nullptr, 0, *plugin);
You can communicate that there is a single plugin output by overriding IPluginV3OneBuild::getNbOutputs.
int32_t PadPlugin::getNbOutputs() const noexcept override
{
return 1;
}
The output will have the same data type as the input, which can be communicated in the override of IPluginV3OneBuild::getOutputDataTypes.
int32_t PadPlugin::getOutputDataTypes(
DataType* outputTypes, int32_t nbOutputs, DataType const* inputTypes, int32_t nbInputs) const noexcept override
{
outputTypes[0] = inputTypes[0];
return 0;
}
The override for getOutputShapes returns symbolic expressions for the output dimensions in terms of the input dimensions, except in the case of data-dependent output shapes, which are covered later in Example: Adding a Custom Layer with Data-Dependent and Shape Input-Dependent Shapes Using C++. In the current example, the first two dimensions of the output equal the first two dimensions of the input, respectively, and the last two dimensions are constants, each equal to 32. The IExprBuilder passed into getOutputShapes can be used to define constant symbolic expressions.
int32_t PadPlugin::getOutputShapes(DimsExprs const* inputs, int32_t nbInputs, DimsExprs const* shapeInputs, int32_t nbShapeInputs, DimsExprs* outputs, int32_t nbOutputs, IExprBuilder& exprBuilder) noexcept
{
outputs[0].nbDims = 4;
// first two output dims are equal to the first two input dims
outputs[0].d[0] = inputs[0].d[0];
outputs[0].d[1] = inputs[0].d[1];
// The last two output dims are equal to 32
outputs[0].d[2] = exprBuilder.constant(32);
outputs[0].d[3] = exprBuilder.constant(32);
return 0;
}
TensorRT uses supportsFormatCombination to ask whether the plugin accepts a given type and format combination for a connection at a given position pos, given the formats/types chosen for lesser-indexed connections. The interface indexes the inputs/outputs uniformly as connections, starting at 0 for the first input, then the rest of the inputs in order, followed by the outputs. In the example, the input is connection 0, and the output is connection 1.
For the sake of simplicity, the example supports only linear formats and FP32 types.
bool PadPlugin::supportsFormatCombination(
int32_t pos, DynamicPluginTensorDesc const* inOut, int32_t nbInputs, int32_t nbOutputs) noexcept override
{
assert(0 <= pos && pos < 2);
return inOut[pos].desc.format == PluginFormat::kLINEAR && inOut[pos].desc.type == DataType::kFLOAT;
}
TensorRT invokes two methods to allow the plugin to make any configuration choices before enqueue(), both during auto-tuning (in the engine build phase) and when the engine is executed (in the runtime phase).
IPluginV3OneBuild::configurePlugin: Called when a plugin is being prepared for profiling (auto-tuning) but not for any specific input size. The min, max, and opt values of the DynamicPluginTensorDesc correspond to the bounds on the tensor shape and its shape for auto-tuning. The desc.dims field corresponds to the dimensions of the plugin specified at network creation, including any wildcards (-1) for dynamic dimensions.
IPluginV3OneRuntime::onShapeChange: Called during both the build phase and runtime phase before enqueue() to communicate the input and output shapes for the subsequent enqueue(). The output PluginTensorDesc will contain wildcards (-1) for any data-dependent dimensions specified through getOutputShapes().
This plugin does not need configurePlugin and onShapeChange to do anything, so they are no-ops:
int32_t PadPlugin::configurePlugin(DynamicPluginTensorDesc const* in, int32_t nbInputs, DynamicPluginTensorDesc const* out, int32_t nbOutputs) noexcept override
{
return 0;
}
int32_t PadPlugin::onShapeChange(PluginTensorDesc const* in, int32_t nbInputs, PluginTensorDesc const* out, int32_t nbOutputs) noexcept override
{
return 0;
}
Finally, the override PadPlugin::enqueue has to do the work. Since shapes are dynamic, enqueue is handed a PluginTensorDesc that describes each input and output's dimensions, type, and format.
int32_t enqueue(PluginTensorDesc const* inputDesc, PluginTensorDesc const* outputDesc, void const* const* inputs,
void* const* outputs, void* workspace, cudaStream_t stream) noexcept override
{
// populate outputs and return status code
}
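As a sketch of what that work could look like, assuming a hypothetical padKernel CUDA kernel that zero-pads each HxW slice of the input into a 32 x 32 slice of the output:

int32_t PadPlugin::enqueue(PluginTensorDesc const* inputDesc, PluginTensorDesc const* outputDesc,
    void const* const* inputs, void* const* outputs, void* workspace, cudaStream_t stream) noexcept
{
    int32_t const B = inputDesc[0].dims.d[0];
    int32_t const C = inputDesc[0].dims.d[1];
    int32_t const H = inputDesc[0].dims.d[2];
    int32_t const W = inputDesc[0].dims.d[3];
    // padKernel is a hypothetical kernel; the launch configuration is illustrative only.
    padKernel<<<B * C, 256, 0, stream>>>(
        static_cast<float const*>(inputs[0]), static_cast<float*>(outputs[0]), B, C, H, W);
    return cudaPeekAtLastError() == cudaSuccess ? 0 : -1;
}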
Example: Adding a Custom Layer with Data-Dependent and Shape Input-Dependent Shapes Using C++#
This section shows an example of a plugin with data-dependent and shape-input-dependent shapes. Note that data-dependent output shapes and adding shape inputs to a plugin are new features not present in V2 plugins.
Data-dependent Shapes (DDS): The shape of a plugin output could depend on the values of the input tensors.
Shape inputs: A plugin could accept shape tensor inputs in addition to device tensor inputs. Shape inputs are only visible to the plugin as arguments to IPluginV3OneBuild::getOutputShapes(); therefore, their sole purpose is to aid the plugin in performing output shape calculations.
For example, BarPlugin is a plugin with one device input X, one shape input S, and an output Y, where:
The first dimension of Y depends on the value of S.
The second dimension of Y is static.
The third dimension of Y depends on the shape of X.
The fourth dimension of Y is data-dependent.
Similar to PadPlugin in the prior example, BarPlugin uses multiple inheritance.
To add the plugin to the network, INetworkDefinition::addPluginV3 (C++, Python) can be used similarly. After the device tensor inputs, addPluginV3 takes two additional arguments to specify the shape tensor inputs.
std::vector<ITensor*> inputs{X};
std::vector<ITensor*> shapeInputs{S};
auto pluginLayer = network->addPluginV3(inputs.data(), inputs.size(), shapeInputs.data(), shapeInputs.size(), *plugin);
Note
The TensorRT ONNX parser provides an inbuilt feature to pass shape inputs to custom ops supported by IPluginV3-based plugins. The indices of the inputs to be interpreted as shape inputs must be indicated by a node attribute named tensorrt_plugin_shape_input_indices containing a list of integers. For example, if the custom op has four inputs and the second and fourth inputs should be passed as shape inputs to the plugin, add a node attribute named tensorrt_plugin_shape_input_indices of type onnx.AttributeProto.ints containing the value [1, 3].
In the override for getOutputShapes, plugins must declare both the position and the bounds of each data-dependent dimension of each output tensor. The bounds can be expressed using a special output called a size tensor.
A size tensor is a scalar of either INT32 or INT64 data type, expressed through a value for auto-tuning and an upper bound; these values can either be constants or computed in terms of device input shapes or shape input values using IExprBuilder.
In this case, there is a singular data-dependent dimension, which we can represent using one size tensor. Note that any size tensor needed to express a data-dependent dimension counts as an output of the plugin; therefore, the plugin will have two outputs in total.
int32_t getNbOutputs() const noexcept override
{
return 2;
}
Assume output Y has the same type as the device input X and that the data-dependent dimension size fits in INT32 (that is, the size tensor has type DataType::kINT32). Then BarPlugin expresses the output data types like this:
int32_t getOutputDataTypes(
DataType* outputTypes, int32_t nbOutputs, DataType const* inputTypes, int32_t nbInputs) const noexcept override
{
outputTypes[0] = inputTypes[0];
outputTypes[1] = DataType::kINT32;
return 0;
}
The method getOutputShapes can build symbolic output shape expressions using the IExprBuilder passed to it. In what follows, note that size tensors must be explicitly declared as 0-D.
int32_t BarPlugin::getOutputShapes(DimsExprs const* inputs, int32_t nbInputs, DimsExprs const* shapeInputs, int32_t nbShapeInputs, DimsExprs* outputs, int32_t nbOutputs, IExprBuilder& exprBuilder) noexcept
{
outputs[0].nbDims = 4;
// The first output dimension depends on the value of S.
// The value of S is encoded as fictitious dimensions.
outputs[0].d[0] = shapeInputs[0].d[0];
// The third output dimension depends on the shape of X
outputs[0].d[2] = inputs[0].d[0];
// The second output dimension is static
outputs[0].d[1] = exprBuilder.constant(3);
auto upperBound = exprBuilder.operation(DimensionOperation::kPROD, *inputs[0].d[2], *inputs[0].d[3]);
auto optValue = exprBuilder.operation(DimensionOperation::kFLOOR_DIV, *upperBound, *exprBuilder.constant(2));
// output at index 1 is a size tensor
outputs[1].nbDims = 0; // size tensors must be declared as 0-D
auto sizeTensor = exprBuilder.declareSizeTensor(1, *optValue, *upperBound);
// The fourth output dimension is data-dependent
outputs[0].d[3] = sizeTensor;
return 0;
}
The override of supportsFormatCombination imposes the following conditions:
The device input X must have DataType::kFLOAT or DataType::kHALF.
The output Y must have the same type as X.
The size tensor output has type DataType::kINT32.
Note
Shape inputs passed to the plugin through addPluginV3 (C++, Python) only appear as arguments to getOutputShapes() and are not counted or included among plugin inputs in any other plugin interface method.
bool BarPlugin::supportsFormatCombination(
int32_t pos, DynamicPluginTensorDesc const* inOut, int32_t nbInputs, int32_t nbOutputs) noexcept override
{
assert(0 <= pos && pos < 3);
auto const* in = inOut;
auto const* out = inOut + nbInputs;
bool typeOk{false};
switch (pos)
{
case 0: typeOk = in[0].desc.type == DataType::kFLOAT || in[0].desc.type == DataType::kHALF; break;
case 1: typeOk = out[0].desc.type == in[0].desc.type; break;
case 2: typeOk = out[1].desc.type == DataType::kINT32; break;
}
return inOut[pos].desc.format == PluginFormat::kLINEAR && typeOk;
}
The local variables in and out here allow inspecting inOut by input or output number instead of connection number.
Important
The override may inspect the format/type for a connection with an index less than pos but must never inspect the format/type for a connection with an index greater than pos. The example uses case 1 to check connection 1 against connection 0, and not case 0 to check connection 0 against connection 1.
configurePlugin and onShapeChange would be no-ops here, too; one thing to note is that in onShapeChange, the output's PluginTensorDesc will contain a wildcard (-1) for the data-dependent dimension.
Implementing enqueue with data-dependent output shapes differs greatly from the static or dynamic shape cases. As with any other output, for an output with a data-dependent dimension, the output buffer passed to enqueue is guaranteed to be large enough to hold the corresponding output tensor (based on the upper bound specified through getOutputShapes).
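One additional responsibility, sketched below under the assumption of a hypothetical computeOutputAndWriteSize helper: during enqueue, the plugin must write the actual extent of the data-dependent dimension to the size tensor output (outputs[1] here) while writing no more than the declared upper bound of elements to outputs[0].

int32_t BarPlugin::enqueue(PluginTensorDesc const* inputDesc, PluginTensorDesc const* outputDesc,
    void const* const* inputs, void* const* outputs, void* workspace, cudaStream_t stream) noexcept
{
    // Hypothetical helper: launches kernels that fill Y in outputs[0] and write the actual
    // size of the data-dependent dimension (an int32_t scalar) to outputs[1].
    return computeOutputAndWriteSize(inputDesc, outputDesc, inputs, outputs, stream);
}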
Example: Adding a Custom Layer with INT8 I/O Support Using C++#
PoolPlugin is a plugin demonstrating how to add INT8 I/O for a custom pooling layer using IPluginV3. PoolPlugin multiply inherits from IPluginV3, IPluginV3OneCore, IPluginV3OneBuild, and IPluginV3OneRuntime, similar to the PadPlugin and BarPlugin examples above.
The main methods that affect INT8 I/O are:
supportsFormatCombination
configurePlugin
The override for supportsFormatCombination must indicate which INT8 I/O combinations are allowed. This interface is similar to Example: Adding a Custom Layer with Dynamic Shapes Using C++. In this example, the supported I/O tensor format is linear CHW with FP32, FP16, BF16, FP8, or INT8 data type, but the I/O tensors must all have the same data type.
bool PoolPlugin::supportsFormatCombination(
int32_t pos, DynamicPluginTensorDesc const* inOut, int32_t nbInputs, int32_t nbOutputs) noexcept override
{
assert(nbInputs == 1 && nbOutputs == 1 && pos < nbInputs + nbOutputs);
bool condition = inOut[pos].desc.format == PluginFormat::kLINEAR;
condition &= (inOut[pos].desc.type == DataType::kFLOAT ||
inOut[pos].desc.type == DataType::kHALF ||
inOut[pos].desc.type == DataType::kBF16 ||
inOut[pos].desc.type == DataType::kFP8 ||
inOut[pos].desc.type == DataType::kINT8);
condition &= inOut[pos].desc.type == inOut[0].desc.type;
return condition;
}
Important
If INT8 calibration must be used with a network with INT8 I/O plugins, the plugin must support FP32 I/O, as TensorRT uses FP32 to calibrate the graph.
If the FP32 I/O variant is not supported or INT8 calibration is not used, all required INT8 I/O tensor scales must be set explicitly.
Calibration cannot determine the dynamic range of a plugin’s internal tensors. Plugins that operate on quantized data must calculate their dynamic range for internal tensors.
A plugin can be designed to accept FP8 and INT8 I/O types, although note that in TensorRT 9.0, the builder does not allow networks that mix INT8 and FP8.
Information communicated by TensorRT through configurePlugin or onShapeChange can be used to obtain information about the pooling parameters and the input and output scales. These can be stored as member variables, serialized, and then deserialized to be used during inference.
int32_t PoolPlugin::configurePlugin(DynamicPluginTensorDesc const* in, int32_t nbInputs, DynamicPluginTensorDesc const* out, int32_t nbOutputs) noexcept override
{
    ...
    mPoolingParams.mC = in[0].desc.d[1];
    mPoolingParams.mH = in[0].desc.d[2];
    mPoolingParams.mW = in[0].desc.d[3];
    mPoolingParams.mP = out[0].desc.d[2];
    mPoolingParams.mQ = out[0].desc.d[3];
    mInHostScale = in[0].desc.scale >= 0.0F ? in[0].desc.scale : -1.0F;
    mOutHostScale = out[0].desc.scale >= 0.0F ? out[0].desc.scale : -1.0F;
    return 0;
}
Per-tensor INT8 I/O scales can be obtained from PluginTensorDesc::scale.
Adding Custom Layers using the Python API (TensorRT >= 10.6)#
For most use cases, defining Python plugins with a decorator-based approach is recommended (available starting in TensorRT 10.6). Refer to Writing Custom Operators with TensorRT Python Plugins in the TensorRT Python API documentation for a guide describing different use cases and best practices. Note that embedding Python-defined plugins into TensorRT engines, such that the engine is independent of Python and the plugin source itself, is only possible with this approach.
Adding Custom Layers using the Python API (Advanced/TensorRT <= 10.5)#
Plugins can also be implemented in Python with a class-based approach (this is the only supported approach for TensorRT <= 10.5). In contrast to decorator-based Python plugins (described in the preceding section), class-based plugins offer the following:
Statefulness: class-based plugins have state (for example, configured/non-configured) and allow more granular querying by TensorRT for different plugin properties and behaviors.
Shape tensor input support.
Fine-grained control over the plugin instances TensorRT creates during engine deserialization; this requires custom plugin creator definitions, which are only available with the class-based approach.
Manual serialization and deserialization of plugin attributes.
Ability to pre-request a device memory scratch space (workspace in addition to input/output buffers) to avoid execution-time device memory allocations.
These often come at the expense of increased implementation complexity and code bloat, which can lead to more bugs. Therefore, a tradeoff analysis is recommended before considering class-based plugin implementations in Python.
Implementing a class-based plugin in Python is similar to C++ in that implementation of IPluginV3 and IPluginCreatorV3One is necessary. Furthermore, interface methods in Python have mostly similar APIs to their C++ counterparts; most differences are minor and self-explanatory.
The following list includes a few selected changes. Subsequent subsections describe the differences involved in more detail.
The following plugin APIs have been omitted in favor of reading/writing to an appropriately named attribute.
Class | Method | Replaced with Attribute
---|---|---
IPluginV3OneCore | getPluginName | plugin_name
IPluginV3OneCore | getPluginVersion | plugin_version
IPluginV3OneCore | getPluginNamespace | plugin_namespace
IPluginV3OneBuild | getNbOutputs | num_outputs
IPluginV3OneBuild | getTimingCacheID | timing_cache_id
IPluginV3OneBuild | getMetadataString | metadata_string
IPluginV3OneBuild | getFormatCombinationLimit | format_combination_limit
IPluginCreatorV3One | getPluginName | name
IPluginCreatorV3One | getPluginVersion | plugin_version
IPluginCreatorV3One | getPluginNamespace | plugin_namespace
IPluginCreatorV3One | getFieldNames | field_names
Some methods have default implementations; these can be left unimplemented, and the default behaviors outlined below will take effect:
class trt.IPluginV3:
    def destroy(self):
        pass

class trt.IPluginV3OneBuild:
    def get_valid_tactics(self):
        return []

    def get_workspace_size(self, input_desc, output_desc):
        return 0
Methods that must return integer status codes in IPluginV3OneBuild and IPluginV3OneRuntime should raise exceptions in Python instead. For example, the C++ method
int32_t configurePlugin(DynamicPluginTensorDesc const* in, int32_t nbInputs, DynamicPluginTensorDesc const* out, int32_t nbOutputs)
has the Python signature
configure_plugin(self: trt.IPluginV3OneBuild, in: List[trt.DynamicPluginTensorDesc], out: List[trt.DynamicPluginTensorDesc]) -> None
and you can, for example, raise a ValueError during enqueue if an input has an illegal value.
The Python API IPluginV3.destroy() has no direct equivalent in the C++ API. Python plugins are expected to perform any functionality that would be performed in an IPluginV3 C++ destructor within the IPluginV3.destroy() method.
For full examples demonstrating Python plugins, refer to the python_plugin sample.
Registration of a Python Plugin#
Python plugins must be registered dynamically through the IPluginRegistry.register_creator() API. There is no analog to the REGISTER_TENSORRT_PLUGIN macro available for static registration.
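A minimal sketch, assuming MyPluginCreator is a class you have written that implements trt.IPluginCreatorV3One:

import tensorrt as trt

# Instantiate the creator and keep a reference to it for the lifetime of the registry entry.
my_creator = MyPluginCreator()
trt.get_plugin_registry().register_creator(my_creator, "my_namespace")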
Building and Running TensorRT Engines Containing Python Plugins#
It is possible to build TensorRT engines using Python-based plugins. However, running such engines outside of Python is currently impossible since the plugin must be available in the scope where the engine is being deserialized. For example, you cannot use a tool like trtexec directly.
Implementing enqueue of a Python Plugin#
The APIs for IPluginV3OneRuntime::enqueue() in C++ and Python are as follows:
int32_t enqueue(PluginTensorDesc const *inputDesc, PluginTensorDesc const *outputDesc, void const *const *inputs, void *const *outputs, void *workspace, cudaStream_t stream)
enqueue(self: trt.IPluginV3OneRuntime, input_desc: List[trt.PluginTensorDesc], output_desc: List[trt.PluginTensorDesc], inputs: List[int], outputs: List[int], workspace: int, stream: int) -> None
Here, inputs, outputs, and workspace are passed in as intptr_t casts of the respective device pointers. Similarly, stream is an intptr_t cast of a pointer to the CUDA stream handle. There is flexibility within Python on how to read from and write to these buffers, depending on the particular use case. For example, with CUDA Python, this is quite simple since cuda.cuLaunchKernel accepts ints representing the pointers wrapped in NumPy arrays:
d_input = np.array([inputs[0]], dtype=np.uint64)
d_output = np.array([outputs[0]], dtype=np.uint64)
stream_ptr = np.array([stream], dtype=np.uint64)
args = [d_input, d_output]
kernel_args = np.array([arg.ctypes.data for arg in args], dtype=np.uint64)
# ...
checkCudaErrors(cuda.cuLaunchKernel(_float_kernel,
num_blocks, 1, 1,
block_size, 1, 1,
0,
stream_ptr,
kernel_args, 0))
Translating Device Buffers/CUDA Stream Pointers in enqueue to Other Frameworks#
Constructing CuPy arrays on top of device buffers is possible using CuPy’s UnownedMemory class.
def enqueue(self, input_desc, output_desc, inputs, outputs, workspace, stream):
    ...
    inp_dtype = trt.nptype(input_desc[0].type)

    inp_mem = cp.cuda.UnownedMemory(
        inputs[0], volume(input_desc[0].dims) * cp.dtype(inp_dtype).itemsize, self
    )
    out_mem = cp.cuda.UnownedMemory(
        outputs[0],
        volume(output_desc[0].dims) * cp.dtype(inp_dtype).itemsize,
        self,
    )

    inp_ptr = cp.cuda.MemoryPointer(inp_mem, 0)
    out_ptr = cp.cuda.MemoryPointer(out_mem, 0)

    inp = cp.ndarray((volume(input_desc[0].dims)), dtype=inp_dtype, memptr=inp_ptr)
    out = cp.ndarray((volume(output_desc[0].dims)), dtype=inp_dtype, memptr=out_ptr)
If needed, torch.as_tensor() can then be used to construct a Torch array:
inp_d = cp.ndarray(tuple(input_desc[0].dims), dtype=inp_dtype, memptr=inp_ptr)
inp_t = torch.as_tensor(inp_d, device='cuda')
Similarly, CuPy stream handles can be constructed from the passed-in stream pointer through CuPy’s ExternalStream class.
cuda_stream = cp.cuda.ExternalStream(stream)
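As a usage sketch (reusing inp and out from the snippet above), CuPy work can then be issued on the TensorRT-provided stream by making it current:

with cuda_stream:
    # Placeholder computation; anything launched here runs on the plugin's stream.
    cp.copyto(out, inp)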
Automatic Downcasting#
TensorRT Python bindings will do automatic downcasting for custom types written in Python that implement interfaces like IPluginCreatorV3One or IPluginResource. For example, take the following method from IPluginRegistry:
get_creator(self: trt.IPluginRegistry, name: string, version: string, namespace: string = "") -> trt.IPluginCreatorInterface
The return type is indicated as IPluginCreatorInterface. However, in practice, if you were to write a class MyPluginCreator implementing IPluginCreatorV3One (which in turn implements IPluginCreatorInterface), the get_creator method will return an automatically downcasted MyPluginCreator.
This extends to trt.IPluginRegistry.all_creators, which is a List[trt.IPluginCreatorInterface]. If you had registered a plugin creator of type MyPluginCreator and another of type MyOtherPluginCreator, both plugin creators will be present as those respective types in the list.
Example: Adding a Custom Layer to a TensorRT Network Using Python#
Using plugin nodes, custom layers can be added to any TensorRT network in Python. The Python API has a function called add_plugin_v3 that enables adding a plugin node to a network. The following example illustrates this. It creates a simple TensorRT network and adds a hypothetical plugin node by looking it up in the TensorRT plugin registry.
import tensorrt as trt
import numpy as np
TRT_LOGGER = trt.Logger()
trt.init_libnvinfer_plugins(TRT_LOGGER, '')
def get_trt_plugin(plugin_name, plugin_version, plugin_namespace):
    plugin = None
    plugin_creator = trt.get_plugin_registry().get_creator(plugin_name, plugin_version, plugin_namespace)
    # trt will automatically downcast to IPluginCreator or IPluginCreatorInterface
    # Can inspect plugin_creator.interface_info to make sure
    if plugin_creator is not None:
        lrelu_slope_field = trt.PluginField("epsilon", np.array([0.00000001], dtype=np.float32), trt.PluginFieldType.FLOAT32)
        field_collection = trt.PluginFieldCollection([lrelu_slope_field])
        plugin = plugin_creator.create_plugin(name=plugin_name, field_collection=field_collection, phase=trt.TensorRTPhase.BUILD)
    return plugin

def main():
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network()
    config = builder.create_builder_config()
    config.max_workspace_size = 2**20
    input_layer = network.add_input(name="input_layer", dtype=trt.float32, shape=(1, 1))
    plugin = network.add_plugin_v3(inputs=[input_layer], shape_inputs=[], plugin=get_trt_plugin("MY_PLUGIN", "1", ""))
    plugin.get_output(0).name = "outputs"
    network.mark_output(plugin.get_output(0))
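The example stops after marking the network output. As a sketch, the engine could then be built and saved as follows (the file name is arbitrary):

    serialized_engine = builder.build_serialized_network(network, config)
    with open("plugin_sample.engine", "wb") as f:
        f.write(serialized_engine)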
Enabling Timing Caching and Using Custom Tactics#
IPluginV3 provides more control over the profiling of custom layers than was available with V2 plugins and earlier. One such feature is timing caching: if a TensorRT network contains multiple instances of the same plugin, identically configured (for example, same plugin attribute values) and handling identical input/output shapes and types, then it makes sense to time (measure the latency of) only one instance, cache the latency, and skip timing the rest of the instances. This can enable large savings in engine build time.
Timing caching for IPluginV3 plugins is an opt-in feature; to opt in, the plugin must advertise a non-null timing cache ID.
char const* FooPlugin::getTimingCacheID() noexcept override
{
    // return nullptr to disable timing caching (default behavior)
    // return non-null string to enable timing caching
}
class FooPlugin(trt.IPluginV3, trt.IPluginV3OneBuild, ...):
    def __init__(self):
        # set to None to disable timing caching
        self.timing_cache_id = value
Note the following regarding the timing cache ID:
The user-provided timing cache ID should be considered a suffix to a larger cache ID; TensorRT automatically forms a prefix by considering the plugin’s input/output shape and format information. Usually, the user-provided timing cache ID could consist of plugin attributes and their values.
It must reflect the plugin’s creation state and not evolve after creation.
For V2 plugins, TensorRT only times the plugin for the (possibly multiple) type/format combinations it claims to support. With IPluginV3, plugins can also ensure custom tactics are timed, and TensorRT uses the fastest tactic. For example, the plugin may have one of two kernels to compute the output, and it may not be possible to predict which one would be fastest on a specific platform and for specific input/output shapes and formats. It is possible to ask TensorRT to time the plugin for each tactic for each format combination, figure out the fastest such configuration, and use that during inference.
Note
TensorRT may choose not to time the plugin if it only supports one type/format combination and either does not use custom tactics or only advertises one.
For IPluginV3OneBuild, TensorRT times a maximum of getFormatCombinationLimit() type/format combinations for each tactic; override this method to increase or decrease this limit depending on need.
To get started, advertise the custom tactics to TensorRT:
int32_t FooPlugin::getNbTactics() noexcept override
{
    return 2; // return 0 to disable custom tactics (default behavior)
}

int32_t FooPlugin::getValidTactics(int32_t* tactics, int32_t nbTactics) noexcept override
{
    tactics[0] = 1;
    tactics[1] = 2;
    return 0;
}
def get_valid_tactics(self):
    return [1, 2]  # return an empty list to disable custom tactics (default behavior)
Any strictly positive integer can be used as a custom tactic value (TensorRT reserves 0 as the default tactic).
When the plugin is timed, configurePlugin() is guaranteed to be called with the current input/output format combination before getValidTactics() is called. Therefore, it is possible to advertise a different set of tactics per input/output format combination. For example, for a plugin that supports FP32 and FP16, tactic 1 may be restricted to FP16 only, while tactics 1 and 2 are supported for FP32.
During the engine build, when auto-tuning the plugin, TensorRT will communicate the tactic for the subsequent enqueue() by invoking IPluginV3OneRuntime::setTactic (C++, Python). When an engine is deserialized, TensorRT will invoke setTactic once the plugin has been created to communicate the best tactic chosen for the plugin. Even if custom tactics are not used, setTactic will be called with the default tactic value 0.
Using Custom Layers When Importing a Model with a Parser#
The ONNX parser automatically attempts to import unrecognized nodes as plugins. If a plugin with the same op_type
as the node is found in the plugin registry, the parser forwards the node’s attributes to the plugin creator as plugin field parameters to create the plugin. By default, the parser uses "1"
as the plugin version and """
as the plugin namespace. This behavior can be overridden by setting a plugin_version and plugin_namespace string attribute in the corresponding ONNX node.
Sometimes you may want to modify an ONNX graph before importing it into TensorRT, for example, to replace a set of ops with a plugin node. To accomplish this, you can use the ONNX GraphSurgeon utility. For details on how to use ONNX-GraphSurgeon to replace a subgraph, refer to this example.
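A minimal sketch of such an edit with ONNX-GraphSurgeon, here simply overriding the plugin version and namespace the parser will look up for a hypothetical custom op named MyCustomOp:

import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("model.onnx"))
for node in graph.nodes:
    if node.op == "MyCustomOp":
        # Tell the TensorRT ONNX parser which plugin version/namespace to use for this op.
        node.attrs["plugin_version"] = "2"
        node.attrs["plugin_namespace"] = "my_namespace"
onnx.save(gs.export_onnx(graph), "model_with_plugin_attrs.onnx")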
For more examples, refer to the onnx_packnet sample.
Plugin API Description#
All new plugins should derive from both the IPluginCreatorV3One and IPluginV3 classes. In addition, new plugins should also be registered in the plugin registry, either dynamically by using IPluginRegistry::registerCreator() or statically using the REGISTER_TENSORRT_PLUGIN(...) macro. Custom plugin libraries can also consider implementing an init function equivalent to initLibNvInferPlugins() to perform bulk registration.
Note
Automotive safety users must use the REGISTER_SAFE_TENSORRT_PLUGIN(...) macro instead of REGISTER_TENSORRT_PLUGIN(...).
IPluginV3 API Description#
The following section describes the functions of IPluginV3 and, by extension, IPluginV3OneCore, IPluginV3OneBuild or IPluginV3OneBuildV2, and IPluginV3OneRuntime.
Since an IPluginV3 object consists of different capabilities, IPluginV3::getCapabilityInterface may be called at any time during its lifetime. An IPluginV3 object added for the build phase must return a valid capability interface for all capability types: core, build, and runtime. The build capability may be omitted for objects added for the runtime phase.
There are a few methods used to request identifying information about the plugin. They may also be called during any stage of the plugin’s lifetime.
IPluginV3OneCore::getPluginName: Used to query for the plugin's name.
IPluginV3OneCore::getPluginVersion: Used to query for the plugin's version.
IPluginV3OneCore::getPluginNamespace: Used to query for the plugin's namespace.
IPluginV3OneBuild::getMetadataString: Used to query for a string representation of any metadata associated with the plugin, such as the values of its attributes.
To connect a plugin layer to neighboring layers and set up input and output data structures, the builder checks for the number of outputs and their shapes by calling the following plugin methods:
IPluginV3OneBuild::getNbOutputs: Used to specify the number of output tensors.
IPluginV3OneBuild::getOutputShapes: This function specifies the output shapes as a function of the input shapes or constants. The exception is data-dependent shapes, which are instead specified with an upper bound and an optimal tuning value.
IPluginV3OneBuild::supportsFormatCombination: Used to check if a plugin supports a given data type and format combination.
IPluginV3OneBuild::getOutputDataTypes: This function retrieves the data types of the output tensors. The returned data types must be in a format supported by the plugin.
If the IPluginV3OneBuildV2 build capability is used, the plugin can also communicate to TensorRT that certain input-output pairs are aliased (share the same data buffer). TensorRT will query IPluginV3OneBuildV2::getAliasedInput to determine any such aliasing behavior. To use this feature, PreviewFeature::kALIASED_PLUGIN_IO_10_03 must be enabled.
Plugin layers can support the following data formats:
LINEAR single-precision (FP32), half-precision (FP16), brain floating-point (BF16), 8-bit floating-point E4M3 (FP8), integer (INT8), and integer (INT32) tensors
CHW32 single-precision (FP32) and integer (INT8) tensors
CHW2, HWC8, HWC16, and DHWC8 half-precision (FP16) tensors
CHW4 half-precision (FP16) and integer (INT8) tensors
HWC8, HWC4, NDHWC8, and NC2HW brain floating-point (BF16) tensors
These formats are enumerated by PluginFormat.
Plugins that do not compute all data in place and need memory space in addition to the input and output tensors can specify the additional memory requirements with the IPluginV3OneBuild::getWorkspaceSize method, which the builder calls to determine and preallocate scratch space.
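For example, reusing the earlier PadPlugin, a sketch of an override that requests one float of scratch space per element of the largest possible first input (volume is a hypothetical helper that multiplies the dimensions):

size_t PadPlugin::getWorkspaceSize(DynamicPluginTensorDesc const* inputs, int32_t nbInputs,
    DynamicPluginTensorDesc const* outputs, int32_t nbOutputs) const noexcept override
{
    // Size the scratch buffer for the worst case using the max dimensions of input 0.
    return sizeof(float) * volume(inputs[0].max);
}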
The layer is configured, executed, and destroyed at build time to discover optimal configurations. After selecting the optimal configuration for a plugin, the chosen tactic and concrete shape/format information (except for data-dependent dimensions) are communicated to the plugin during inference. It is executed as many times as needed for the lifetime of the inference application and finally destroyed when the engine is destroyed.
The builder controls these steps and runtime using the following plugin methods. Methods also called during inference are indicated by (*); all others are only called by the builder.
IPluginV3OneRuntime::attachToContext*: This function requests that a plugin clone be attached to an ExecutionContext, allowing the plugin to access any context-specific resources.
IPluginV3OneBuild::getTimingCacheId: This function queries for any timing cache ID that TensorRT may use. If provided, it enables timing caching (it is disabled by default).
IPluginV3OneBuild::getNbTactics: Used to query for the number of custom tactics the plugin chooses to use.
IPluginV3OneBuild::getValidTactics: This function queries for any custom tactics the plugin may use. The plugin will be profiled for each tactic up to a maximum indicated by IPluginV3OneBuild::getFormatCombinationLimit().
IPluginV3OneBuild::getFormatCombinationLimit: This function queries the maximum number of format combinations that may be timed for each tactic (for the default tactic 0 if no custom tactics are advertised).
IPluginV3OneRuntime::setTactic*: Communicates the tactic to be used during the subsequent enqueue(). If no custom tactics were advertised, this will always be 0.
IPluginV3OneBuild::configurePlugin: Communicates the number of inputs and outputs and their shapes, data types, and formats. The min, opt, and max of each input or output's DynamicPluginTensorDesc correspond to the kMIN, kOPT, and kMAX values of the optimization profile that the plugin is currently being profiled for. The desc.dims field corresponds to the dimensions of the plugin inputs specified at network creation. Wildcard dimensions may exist during this phase in the desc.dims field. At this point, the plugin may set up its internal state and select the most appropriate algorithm and data structures for the given configuration.
IPluginV3OneRuntime::onShapeChange*: Communicates the number of inputs and outputs and their shapes, data types, and formats. The dimensions are concrete, except if data-dependent dimensions exist, which are indicated by wildcards.
IPluginV3OneRuntime::enqueue*: Encapsulates the actual algorithm and kernel calls of the plugin and provides pointers to input, output, and scratch space, as well as the CUDA stream to be used for kernel execution.
IPluginV3::clone: This is called every time a new builder, network, or engine is created that includes this plugin layer. It must return a new plugin object with the correct parameters.
After the builder completes profiling, but before the engine is serialized, IPluginV3OneRuntime::getFieldsToSerialize is called to query for any plugin fields that must be serialized into the engine. These are expected to be data that the plugin needs to function properly during the inference stage once the engine has been deserialized.
IPluginCreatorV3One API Description#
The following methods in the IPluginCreatorV3One class are used to find and create the appropriate plugin from the plugin registry:
getPluginName: This returns the plugin name and should match the return value of IPluginV3OneCore::getPluginName.
getPluginVersion: Returns the plugin version. For all internal TensorRT plugins, this defaults to 1.
getPluginNamespace: Returns the plugin namespace. The default can be "".
getFieldNames: To successfully create a plugin, you must know all the plugin's field parameters. This method returns the PluginFieldCollection struct with the PluginField entries populated to reflect the field name and PluginFieldType (the data should point to nullptr).
createPlugin: This method creates a plugin, passing a PluginFieldCollection and a TensorRTPhase argument.
During engine deserialization, TensorRT calls this method with the TensorRTPhase argument set to TensorRTPhase::kRUNTIME and the PluginFieldCollection populated with the same PluginFields as in the one returned by IPluginV3OneRuntime::getFieldsToSerialize(). In this case, TensorRT takes ownership of plugin objects returned by createPlugin.
You may also invoke createPlugin to produce plugin objects to add to a TensorRT network. In this case, setting the phase argument to TensorRTPhase::kBUILD is recommended. The data passed with the PluginFieldCollection should be allocated by the caller and freed before the program exits. Ownership of the plugin object returned by createPlugin passes to the caller, who must destroy it.
Migrating V2 Plugins to IPluginV3#
IPluginV2 and IPluginV2Ext have been deprecated since TensorRT 8.5, and IPluginV2IOExt and IPluginV2DynamicExt are deprecated in TensorRT 10.0. Therefore, new plugins should target IPluginV3, and old ones should be refactored.
Keep in mind the following key points when migrating an IPluginV2DynamicExt plugin to IPluginV3:
The plugin creator associated with the plugin must be migrated to IPluginCreatorV3One, the factory class for IPluginV3 (IPluginCreator is the factory class for IPluginV2 derivatives). This simply consists of migrating IPluginCreator::deserializePlugin. For more information, refer to the Plugin Serialization and Deserialization section.
There is no equivalent to IPluginV2::initialize(), IPluginV2::terminate(), and IPluginV2::destroy() in IPluginV3. For more information, refer to the Plugin Initialization and Termination section.
There is no equivalent to IPluginV2Ext::detachFromContext() in IPluginV3. For more information, refer to the Accessing Context-Specific Resources Provided by TensorRT section.
IPluginV3OneRuntime::attachToContext() is markedly different from IPluginV2Ext::attachToContext() regarding arguments and behavior. For more information, refer to the Accessing Context-Specific Resources Provided by TensorRT section.
In IPluginV3, plugin serialization is through a PluginFieldCollection that gets passed to TensorRT by IPluginV3OneRuntime::getFieldsToSerialize(), and deserialization is through the same PluginFieldCollection that gets passed back by TensorRT to IPluginCreatorV3One::createPlugin(...). For more information, refer to the Plugin Serialization and Deserialization section.
The IPluginV3 equivalents of void-return methods in IPluginV2DynamicExt expect an integer status code as a return value (for example, configurePlugin).
supportsFormatCombination and getWorkspaceSize get dynamic tensor descriptors (DynamicPluginTensorDesc) instead of static descriptors (PluginTensorDesc).
IPluginV2DynamicExt::getOutputDimensions() becomes IPluginV3OneBuild::getOutputShapes() and changes to an output-parameter signature instead of a return value. It also shifts from per-output-index querying to one-shot querying. A similar transition applies from IPluginV2Ext::getOutputDataType to IPluginV3OneBuild::getOutputDataTypes.
Plugin Initialization and Termination#
IPluginV2 provided several APIs for plugin initialization and termination: namely, IPluginV2::initialize(), IPluginV2::terminate(), and IPluginV2::destroy(). In IPluginV3, plugins are expected to be constructed in an initialized state; if your V2 plugin had any lazy initialization in initialize, it can be deferred to onShapeChange or configurePlugin. Any resource release or termination logic in IPluginV2::terminate() or IPluginV2::destroy() can be moved to the class destructor. The exception is in the Python API, where IPluginV3.destroy() is provided as an alternative to a C++-like destructor.
Accessing Context-Specific Resources Provided by TensorRT#
IPluginV2Ext::attachToContext() provided plugins access to context-specific resources, namely the GPU allocator and cuDNN and cuBLAS handles. IPluginV3OneRuntime::attachToContext() is meant to provide a similar service to plugins, but it instead provides an IPluginResourceContext, which in turn exposes resources that plugins may request.
In a departure from IPluginV2Ext::attachToContext(), cuDNN and cuBLAS handles are no longer provided by IPluginResourceContext; any plugins that depended on them should migrate to initializing their own cuDNN and cuBLAS resources. If sharing cuDNN/cuBLAS resources among plugins is preferred, you can utilize the functionality provided by IPluginResource and the plugin registry's key-value store to accomplish this. For more information, refer to the Sharing Custom Resources Among Plugins section.
IPluginV3OneRuntime::attachToContext(...) is a clone-and-attach operation. It is asked to clone the entire IPluginV3 object, not just the runtime capability. Therefore, if implemented as a separate class, the runtime capability object may need to hold a reference to the IPluginV3 object of which it is a part.
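A minimal sketch of this clone-and-attach pattern, assuming a hypothetical MyPlugin whose clone keeps the context-provided GPU allocator in an assumed mAllocator member:

IPluginV3* MyPlugin::attachToContext(IPluginResourceContext* context) noexcept override
{
    // Clone the whole plugin, then let the clone hold any context-specific resources.
    auto* cloned = static_cast<MyPlugin*>(this->clone());
    if (cloned != nullptr)
    {
        cloned->mAllocator = context->getGpuAllocator();
    }
    return cloned;
}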
Any context-specific resource obtained through IPluginResourceContext may be used until the plugin is destroyed. Therefore, any termination logic implemented in IPluginV2Ext::detachFromContext() may be moved to the plugin destructor.
Plugin Serialization and Deserialization#
For V2 plugins, serialization and deserialization were determined by the implementation of IPluginV2::serialize, IPluginV2::getSerializationSize, and IPluginCreator::deserializePlugin; IPluginV3OneRuntime::getFieldsToSerialize and IPluginCreatorV3One::createPlugin have replaced these. Note that the workflow has shifted from writing to/reading from a raw buffer to constructing and parsing a PluginFieldCollection.
TensorRT handles the serialization of types defined in PluginFieldType. Custom types can be serialized as PluginFieldType::kUNKNOWN. For example:
struct DummyStruct
{
int32_t a;
float b;
};
DummyPlugin()
{
// std::vector<nvinfer1::PluginField> mDataToSerialize;
// int32_t mIntValue;
// std::vector<float> mFloatVector;
// DummyStruct mDummyStruct;
mDataToSerialize.clear();
mDataToSerialize.emplace_back(PluginField("intScalar", &mIntValue, PluginFieldType::kINT32, 1));
mDataToSerialize.emplace_back(PluginField("floatVector", mFloatVector.data(), PluginFieldType::kFLOAT32, mFloatVector.size()));
mDataToSerialize.emplace_back(PluginField("dummyStruct", &mDummyStruct, PluginFieldType::kUNKNOWN, sizeof(DummyStruct)));
mFCToSerialize.nbFields = mDataToSerialize.size();
mFCToSerialize.fields = mDataToSerialize.data();
}
nvinfer1::PluginFieldCollection const* DummyPlugin::getFieldsToSerialize() noexcept override
{
return &mFCToSerialize;
}
Migrating Older V2 Plugins to IPluginV3#
If migrating from IPluginV2 or IPluginV2Ext to IPluginV3, it is easier to migrate first to IPluginV2DynamicExt and then follow the guidelines above to migrate to IPluginV3. The new features in IPluginV2DynamicExt are as follows:
virtual DimsExprs getOutputDimensions(int outputIndex, const DimsExprs* inputs, int nbInputs, IExprBuilder& exprBuilder) = 0;
virtual bool supportsFormatCombination(int pos, const PluginTensorDesc* inOut, int nbInputs, int nbOutputs) = 0;
virtual void configurePlugin(const DynamicPluginTensorDesc* in, int nbInputs, const DynamicPluginTensorDesc* out, int nbOutputs) = 0;
virtual size_t getWorkspaceSize(const PluginTensorDesc* inputs, int nbInputs, const PluginTensorDesc* outputs, int nbOutputs) const = 0;
virtual int enqueue(const PluginTensorDesc* inputDesc, const PluginTensorDesc* outputDesc, const void* const* inputs, void* const* outputs, void* workspace, cudaStream_t stream) = 0;
Guidelines for migration to IPluginV2DynamicExt are:
getOutputDimensions implements the expression for output tensor dimensions given the inputs.
supportsFormatCombination checks if the plugin supports the format and datatype for the specified I/O.
configurePlugin mimics the behavior of the equivalent configurePlugin in IPluginV2Ext but accepts tensor descriptors.
getWorkspaceSize and enqueue mimic the behavior of the equivalent APIs in IPluginV2Ext but accept tensor descriptors.
Coding Guidelines for Plugins#
Memory Allocation
Memory allocated in the plugin must be freed to ensure no memory leak. If resources are acquired in the plugin constructor or at a later stage, like onShapeChange, they must be released, possibly in the plugin class destructor.
Another option is to request any additional workspace memory required through getWorkspaceSize, which will be available during enqueue.
Add Checks to Ensure Proper Configuration and Validate Inputs
A common source for unexpected plugin behavior is improper configuration (for example, invalid plugin attributes) and invalid inputs. As such, it is good practice to add checks/assertions during the initial plugin development for cases where the plugin is not expected to work. The following are places where checks could be added:
createPlugin: Plugin attribute checks
configurePlugin or onShapeChange: Input dimension checks
enqueue: Input value checks
Return Null at Errors for Methods That Create a New Plugin Object
Methods like createPlugin, clone, and attachToContext may be expected to create and return new plugin objects. In these methods, ensure a null object (nullptr in C++) is returned in case of any error or failed check. This ensures that non-null plugin objects are not returned when the plugin is configured incorrectly.
Avoid Device Memory Allocations in clone()
Since the builder calls clone multiple times, device memory allocations could be significantly expensive. One option is to do persistent memory allocations in the constructor, copy to the device when the plugin is ready (for example, in configurePlugin), and release during destruction.
Serializing Arbitrary Pieces of Data and Custom Types
Plugin authors can utilize PluginField of PluginFieldType::kUNKNOWN to indicate arbitrary pieces of data to be serialized. In this case, the length of the respective PluginField should be the number of bytes corresponding to the buffer pointed to by data. The serialization of non-primitive types can be achieved in this way.