Plugin API Description#
New plugins consist of a plugin class deriving from IPluginV3 and a creator class deriving from IPluginCreatorV3One. In addition, new plugins should also be registered in the plugin registry, either dynamically by using IPluginRegistry::registerCreator() or statically using the REGISTER_TENSORRT_PLUGIN(...) macro. Custom plugin libraries can also consider implementing an init function equivalent to initLibNvInferPlugins() to perform bulk registration.
Note
Automotive safety users must use the REGISTER_SAFE_TENSORRT_PLUGIN(...) macro instead of REGISTER_TENSORRT_PLUGIN(...). Refer to the NVIDIA TensorRT Safety Production Guide for DriveOS for any safety-related activities.
IPluginV3 API Description#
The following section describes the functions of IPluginV3 and, by extension, IPluginV3OneCore, IPluginV3OneBuild or IPluginV3OneBuildV2, and IPluginV3OneRuntime.
Since an IPluginV3 object consists of different capabilities, IPluginV3::getCapabilityInterface can be called anytime during its lifetime. An IPluginV3 object added for the build phase must return a valid capability interface for all capability types: core, build, and runtime. The build capability can be omitted for objects added for the runtime phase.
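As a sketch of how this capability dispatch commonly looks, the following uses simplified stand-in types (the enum values mirror nvinfer1's PluginCapabilityType, but the class layout and member names here are hypothetical, not the real interfaces):

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-ins for the nvinfer1 capability types, for illustration only.
enum class PluginCapabilityType : int32_t { kCORE, kBUILD, kRUNTIME };

struct IPluginCapability {};
struct CoreCapability : IPluginCapability {};
struct BuildCapability : IPluginCapability {};
struct RuntimeCapability : IPluginCapability {};

// A build-phase plugin must answer for all three capability types; a
// runtime-phase plugin may return nullptr for kBUILD.
struct ToyPlugin
{
    CoreCapability core;
    BuildCapability build;
    RuntimeCapability runtime;
    bool buildPhase{true}; // hypothetical flag distinguishing the two phases

    IPluginCapability* getCapabilityInterface(PluginCapabilityType type) noexcept
    {
        switch (type)
        {
        case PluginCapabilityType::kCORE: return &core;
        case PluginCapabilityType::kBUILD: return buildPhase ? &build : nullptr;
        case PluginCapabilityType::kRUNTIME: return &runtime;
        }
        return nullptr;
    }
};
```

In practice, a plugin class usually derives from all three "One" interfaces and returns a cast of `this` for each capability; separate member objects are used here only to keep the sketch self-contained.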
There are a few methods used to request identifying information about the plugin. They can also be called during any stage of the plugin’s lifetime.
- IPluginV3OneCore::getPluginName: Used to query for the plugin's name.
- IPluginV3OneCore::getPluginVersion: Used to query for the plugin's version.
- IPluginV3OneCore::getPluginNamespace: Used to query for the plugin's namespace.
- IPluginV3OneBuild::getMetadataString: Used to query for a string representation of any metadata associated with the plugin, such as the values of its attributes.
To connect a plugin layer to neighboring layers and set up input and output data structures, the builder checks for the number of outputs and their shapes by calling the following plugin methods:
- IPluginV3OneBuild::getNbOutputs: Used to specify the number of output tensors.
- IPluginV3OneBuild::getOutputShapes: Used to specify the output shapes as a function of the input shapes or constants. The exception is data-dependent shapes, for which an upper bound and an optimal tuning value are specified instead.
- IPluginV3OneBuild::supportsFormatCombination: Used to check whether a plugin supports a given data type and format combination.
- IPluginV3OneBuild::getOutputDataType: Used to query the data types of the output tensors. The returned data types must be in a format supported by the plugin.
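For example, supportsFormatCombination is typically written as a predicate over the combined [inputs..., outputs...] descriptor array. The following is a minimal sketch using simplified stand-in types rather than the real nvinfer1 declarations; the one-input, one-output plugin and its supported formats are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-ins for nvinfer1::DataType, TensorFormat, and
// DynamicPluginTensorDesc, for illustration only.
enum class DataType : int32_t { kFLOAT, kHALF, kINT8 };
enum class TensorFormat : int32_t { kLINEAR, kCHW32 };

struct PluginTensorDesc { DataType type; TensorFormat format; };
struct DynamicPluginTensorDesc { PluginTensorDesc desc; };

// pos indexes into the combined [inputs..., outputs...] array. This
// hypothetical plugin accepts linear FP32 or FP16 and requires the
// output to have the same type as input 0.
bool supportsFormatCombination(int32_t pos, DynamicPluginTensorDesc const* inOut,
                               int32_t nbInputs, int32_t /*nbOutputs*/)
{
    PluginTensorDesc const& d = inOut[pos].desc;
    bool const typeOk = (d.type == DataType::kFLOAT || d.type == DataType::kHALF);
    bool const formatOk = (d.format == TensorFormat::kLINEAR);
    if (pos < nbInputs)
    {
        return typeOk && formatOk;
    }
    // Output position: additionally require the same type as input 0.
    return typeOk && formatOk && d.type == inOut[0].desc.type;
}
```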
If the IPluginV3OneBuildV2 build capability is used, the plugin can also communicate to TensorRT that certain input-output pairs are aliased (share the same data buffer). TensorRT will query IPluginV3OneBuildV2::getAliasedInput to determine any such aliasing behavior. To use this feature, PreviewFeature::kALIASED_PLUGIN_IO_10_03 must be enabled.
Plugin layers can support the following data formats:
- LINEAR: single-precision (FP32), half-precision (FP16), brain floating-point (BF16), 8-bit floating-point E4M3 (FP8), integer (INT8), and integer (INT32) tensors
- CHW32: single-precision (FP32) and integer (INT8) tensors
- CHW2, HWC8, HWC16, and DHWC8: half-precision (FP16) tensors
- CHW4: half-precision (FP16) and integer (INT8) tensors
- HWC8, HWC4, NDHWC8, NC2HW: brain floating-point (BF16) tensors
These formats are enumerated by PluginFormat.
Plugins that do not compute all data in place and need memory space in addition to input and output tensors can specify the additional memory requirements with the IPluginV3OneBuild::getWorkspaceSize method, which the builder calls to determine and preallocate scratch space.
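A minimal sketch of such a size computation follows, using a stand-in Dims type; the single FP32 staging buffer the size of the first input is a hypothetical requirement:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified stand-in for nvinfer1::Dims, for illustration only.
struct Dims
{
    int32_t nbDims;
    int64_t d[8];
};

// A typical getWorkspaceSize: report the scratch bytes needed in addition
// to the input and output tensors. Here, a hypothetical plugin needs one
// FP32 staging buffer with the same volume as its first input.
size_t getWorkspaceSize(Dims const& input0)
{
    int64_t volume = 1;
    for (int32_t i = 0; i < input0.nbDims; ++i)
    {
        volume *= input0.d[i];
    }
    return static_cast<size_t>(volume) * sizeof(float);
}
```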
The layer is configured, executed, and destroyed at build time to discover optimal configurations. After selecting the optimal configuration for a plugin, the chosen tactic and concrete shape/format information (except for data-dependent dimensions) are communicated to the plugin during inference. It is executed as many times as needed for the lifetime of the inference application and finally destroyed when the engine is destroyed.
The builder controls these steps and runtime using the following plugin methods. Methods also called during inference are marked with (*); all others are called only by the builder.
- IPluginV3OneRuntime::attachToContext (*): Requests that a plugin clone be attached to an ExecutionContext, allowing the plugin to access any context-specific resources.
- IPluginV3OneBuild::getTimingCacheId: Queries for any timing cache ID that TensorRT can use. If provided, it enables timing caching (it is disabled by default).
- IPluginV3OneBuild::getNbTactics: Used to query for the number of custom tactics the plugin chooses to use.
- IPluginV3OneBuild::getValidTactics: Queries for any custom tactics the plugin can use. The plugin will be profiled for each tactic up to a maximum indicated by IPluginV3OneBuild::getFormatCombinationLimit().
- IPluginV3OneBuild::getFormatCombinationLimit: Queries the maximum number of format combinations that can be timed for each tactic (for the default tactic if no custom tactics are advertised).
- IPluginV3OneRuntime::setTactic (*): Communicates the tactic to be used during the subsequent enqueue(). If no custom tactics were advertised, this will always be 0.
- IPluginV3OneBuild::configurePlugin: Communicates the number of inputs and outputs and their shapes, data types, and formats. The min, opt, and max of each input or output's DynamicPluginTensorDesc correspond to the kMIN, kOPT, and kMAX values of the optimization profile that the plugin is currently being profiled for. The desc.dims field corresponds to the dimensions of plugin inputs specified at network creation. Wildcard dimensions can exist during this phase in the desc.dims field. At this point, the plugin can set up its internal state and select the most appropriate algorithm and data structures for the given configuration.
- IPluginV3OneRuntime::onShapeChange (*): Communicates the number of inputs and outputs and their shapes, data types, and formats. The dimensions are concrete, except if data-dependent dimensions exist, which wildcards will indicate.
- IPluginV3OneRuntime::enqueue (*): Encapsulates the actual algorithm and kernel calls of the plugin and provides pointers to input, output, and scratch space, as well as the CUDA stream to be used for kernel execution.
- IPluginV3::clone: Called every time a new builder, network, or engine is created that includes this plugin layer. It must return a new plugin object with the correct parameters.
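The tactic-related methods can be sketched as follows, with simplified stand-in signatures (the real getValidTactics writes into a caller-provided array and returns a status code) and hypothetical kernel variants:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Sketch of custom tactics with simplified stand-in signatures. The two
// kernel variants and their tactic values are hypothetical.
struct ToyPlugin
{
    int32_t mTactic{0};

    int32_t getNbTactics() const { return 2; }

    // Tactic values advertised here are profiled by the builder; 0 is
    // what setTactic receives when no custom tactics are advertised.
    std::vector<int32_t> getValidTactics() const { return {1, 2}; }

    // Called before enqueue to communicate the builder's choice.
    int32_t setTactic(int32_t tactic)
    {
        mTactic = tactic;
        return 0; // status code: 0 for success
    }

    // enqueue would dispatch to the kernel matching the chosen tactic;
    // here it just reports which variant would run.
    char const* enqueue() const
    {
        return mTactic == 2 ? "vectorizedKernel" : "scalarKernel";
    }
};
```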
After the builder completes profiling, before the engine is serialized, IPluginV3OneRuntime::getFieldsToSerialize is called to query for any plugin fields that must be serialized into the engine. These are expected to be data that the plugin needs to function properly during the inference stage after the engine has been deserialized.
IPluginCreatorV3One API Description#
The following methods in the IPluginCreatorV3One class are used to find and create the appropriate plugin from the plugin registry:
- getPluginName: Returns the plugin name and should match the return value of IPluginV3OneCore::getPluginName.
- getPluginVersion: Returns the plugin version. For all internal TensorRT plugins, this defaults to 1.
- getPluginNamespace: Returns the plugin namespace. The default can be "".
- getFieldNames: To successfully create a plugin, you must know all the plugin's field parameters. This method returns the PluginFieldCollection struct with the PluginField entries populated to reflect the field name and PluginFieldType (the data should point to nullptr).
- createPlugin: Creates a plugin, passing a PluginFieldCollection and a TensorRTPhase argument.
During engine deserialization, TensorRT calls this method with the TensorRTPhase argument set to TensorRTPhase::kRUNTIME and the PluginFieldCollection populated with the same PluginField entries as the collection returned by IPluginV3OneRuntime::getFieldsToSerialize(). In this case, TensorRT takes ownership of the plugin objects returned by createPlugin.
You can also invoke createPlugin to produce plugin objects to add to a TensorRT network. In this case, setting the phase argument to TensorRTPhase::kBUILD is recommended. The caller allocates the data passed with the PluginFieldCollection and is responsible for freeing it. Ownership of the plugin object returned by createPlugin passes to the caller, who must eventually destroy it.
Migrating V2 Plugins to IPluginV3#
IPluginV2 and IPluginV2Ext have been deprecated since TensorRT 8.5, and IPluginV2IOExt and IPluginV2DynamicExt are deprecated in TensorRT 10.0. Therefore, new plugins should target IPluginV3, and old ones should be refactored.
Key migration points from IPluginV2DynamicExt to IPluginV3
Keep in mind the following key points when migrating an IPluginV2DynamicExt plugin to IPluginV3:
- The plugin creator associated with the plugin must be migrated to IPluginCreatorV3One, the factory class for IPluginV3 (IPluginCreator is the factory class for IPluginV2 derivatives). This simply consists of migrating IPluginCreator::deserializePlugin. For more information, refer to the Plugin Serialization and Deserialization section.
- There is no equivalent to IPluginV2::initialize(), IPluginV2::terminate(), and IPluginV2::destroy() in IPluginV3. For more information, refer to the Plugin Initialization and Termination section.
- There is no equivalent to IPluginV2Ext::detachFromContext() in IPluginV3. For more information, refer to the Accessing Context-Specific Resources Provided by TensorRT section.
- IPluginV3OneRuntime::attachToContext() is markedly different from IPluginV2Ext::attachToContext() regarding arguments and behavior. For more information, refer to the Accessing Context-Specific Resources Provided by TensorRT section.
- In IPluginV3, plugin serialization is through a PluginFieldCollection that is passed to TensorRT by IPluginV3OneRuntime::getFieldsToSerialize(), and deserialization is through the same PluginFieldCollection that is passed back by TensorRT to IPluginCreatorV3One::createPlugin(...). For more information, refer to the Plugin Serialization and Deserialization section.
- The IPluginV3 equivalents of void-return methods in IPluginV2DynamicExt expect an integer status code as a return value (such as configurePlugin).
- supportsFormatCombination and getWorkspaceSize get dynamic tensor descriptors (DynamicPluginTensorDesc) instead of static descriptors (PluginTensorDesc).
- IPluginV2DynamicExt::getOutputDimensions() becomes IPluginV3OneBuild::getOutputShapes() and changes to an output-parameter signature instead of a return value. It also shifts from per-output-index querying to one-shot querying. A similar transition applies from IPluginV2Ext::getOutputDataType to IPluginV3OneBuild::getOutputDataTypes.
Plugin Initialization and Termination
IPluginV2 provided several APIs for plugin initialization and termination: namely, IPluginV2::initialize(), IPluginV2::terminate(), and IPluginV2::destroy(). In IPluginV3, plugins are expected to be constructed in an initialized state; if your V2 plugin had any lazy initialization in initialize, it can be deferred to onShapeChange or configurePlugin. Any resource release or termination logic in IPluginV2::terminate() or IPluginV2::destroy() can be moved to the class destructor. The exception is in the Python API; IPluginV3.destroy() is provided as an alternative for a C++-like destructor.
Accessing Context-Specific Resources Provided by TensorRT
IPluginV2Ext::attachToContext() provided plugins access to context-specific resources, namely the GPU allocator and cuDNN and cuBLAS handles. IPluginV3OneRuntime::attachToContext() is meant to provide a similar service to plugins, but it instead provides an IPluginResourceContext, which in turn exposes resources that plugins can request.
In a departure from IPluginV2Ext::attachToContext(), cuDNN and cuBLAS handles are no longer provided by IPluginResourceContext; any plugins that depended on those should migrate to initialize their own cuDNN and cuBLAS resources. If sharing cuDNN/cuBLAS resources among plugins is preferred, you can utilize the functionality provided by IPluginResource and the plugin registry’s key-value store to accomplish this. For more information, refer to the Sharing Custom Resources Among Plugins section.
IPluginV3OneRuntime::attachToContext(...) is a clone-and-attach operation. It is asked to clone the entire IPluginV3 object, not just the runtime capability. Therefore, if implemented as a separate class, the runtime capability object may need to hold a reference to the IPluginV3 object of which it is a part.
Any context-specific resource obtained through IPluginResourceContext can be used until the plugin is destroyed. Therefore, any termination logic implemented in IPluginV2Ext::detachFromContext() can be moved to the plugin destructor.
Plugin Serialization and Deserialization
For V2 plugins, serialization and deserialization were determined by the implementation of IPluginV2::serialize, IPluginV2::getSerializationSize, and IPluginCreator::deserializePlugin; IPluginV3OneRuntime::getFieldsToSerialize and IPluginCreatorV3One::createPlugin have replaced these. Note that the workflow has shifted from writing to/reading from a raw buffer to constructing and parsing a PluginFieldCollection.
TensorRT handles the serialization of types defined in PluginFieldType. Custom types can be serialized as PluginFieldType::kUNKNOWN. For example:
struct DummyStruct
{
    int32_t a;
    float b;
};

DummyPlugin::DummyPlugin()
{
    // The following members are assumed to be declared in the DummyPlugin class:
    // std::vector<nvinfer1::PluginField> mDataToSerialize;
    // nvinfer1::PluginFieldCollection mFCToSerialize;
    // int32_t mIntValue;
    // std::vector<float> mFloatVector;
    // DummyStruct mDummyStruct;
    mDataToSerialize.clear();
    mDataToSerialize.emplace_back(PluginField("intScalar", &mIntValue, PluginFieldType::kINT32, 1));
    mDataToSerialize.emplace_back(PluginField("floatVector", mFloatVector.data(), PluginFieldType::kFLOAT32, mFloatVector.size()));
    mDataToSerialize.emplace_back(PluginField("dummyStruct", &mDummyStruct, PluginFieldType::kUNKNOWN, sizeof(DummyStruct)));
    mFCToSerialize.nbFields = mDataToSerialize.size();
    mFCToSerialize.fields = mDataToSerialize.data();
}

nvinfer1::PluginFieldCollection const* DummyPlugin::getFieldsToSerialize() noexcept
{
    return &mFCToSerialize;
}
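The deserialization counterpart, which would run inside IPluginCreatorV3One::createPlugin, walks the collection and restores each member by name. The following sketch uses simplified stand-in PluginField/PluginFieldCollection types mirroring nvinfer1 (so it is self-contained); restoreFromFields is a hypothetical helper, not a TensorRT API:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Simplified stand-ins for nvinfer1::PluginField and PluginFieldCollection,
// for illustration only.
enum class PluginFieldType : int32_t { kINT32, kFLOAT32, kUNKNOWN };
struct PluginField
{
    char const* name;
    void const* data;
    PluginFieldType type;
    int32_t length;
};
struct PluginFieldCollection
{
    int32_t nbFields;
    PluginField const* fields;
};

struct DummyStruct { int32_t a; float b; }; // same layout as the struct above

struct RestoredState
{
    int32_t intValue{};
    DummyStruct dummy{};
};

// Walk the collection and restore each serialized member by name.
RestoredState restoreFromFields(PluginFieldCollection const& fc)
{
    RestoredState s;
    for (int32_t i = 0; i < fc.nbFields; ++i)
    {
        PluginField const& f = fc.fields[i];
        if (std::string(f.name) == "intScalar" && f.type == PluginFieldType::kINT32)
        {
            s.intValue = *static_cast<int32_t const*>(f.data);
        }
        else if (std::string(f.name) == "dummyStruct" && f.type == PluginFieldType::kUNKNOWN)
        {
            // kUNKNOWN fields are opaque bytes; length is the byte count.
            std::memcpy(&s.dummy, f.data, sizeof(DummyStruct));
        }
    }
    return s;
}
```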
Migrating Older V2 Plugins to IPluginV3
If migrating from IPluginV2 or IPluginV2Ext to IPluginV3, it is easier to migrate first to IPluginV2DynamicExt and then follow the guidelines above to migrate to IPluginV3. The new features in IPluginV2DynamicExt are as follows:
virtual DimsExprs getOutputDimensions(int outputIndex, const DimsExprs* inputs, int nbInputs, IExprBuilder& exprBuilder) = 0;
virtual bool supportsFormatCombination(int pos, const PluginTensorDesc* inOut, int nbInputs, int nbOutputs) = 0;
virtual void configurePlugin(const DynamicPluginTensorDesc* in, int nbInputs, const DynamicPluginTensorDesc* out, int nbOutputs) = 0;
virtual size_t getWorkspaceSize(const PluginTensorDesc* inputs, int nbInputs, const PluginTensorDesc* outputs, int nbOutputs) const = 0;
virtual int enqueue(const PluginTensorDesc* inputDesc, const PluginTensorDesc* outputDesc, const void* const* inputs, void* const* outputs, void* workspace, cudaStream_t stream) = 0;
Guidelines for migration to IPluginV2DynamicExt are:
- getOutputDimensions implements the expression for output tensor dimensions given the inputs.
- supportsFormatCombination checks if the plugin supports the format and datatype for the specified I/O.
- configurePlugin mimics the behavior of the equivalent configurePlugin in IPluginV2Ext but accepts tensor descriptors.
- getWorkspaceSize and enqueue mimic the behavior of the equivalent APIs in IPluginV2Ext but accept tensor descriptors.
Coding Guidelines for Plugins#
Memory Allocation
Memory allocated in the plugin must be freed to ensure no memory leak. If resources are acquired in the plugin constructor or at a later stage, like onShapeChange, they must be released, possibly in the plugin class destructor.
Another option is to request any additional workspace memory required through getWorkspaceSize, which will be available during enqueue.
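When carving that single workspace pointer into sub-buffers inside enqueue, keeping each sub-buffer aligned is typical, and getWorkspaceSize must then report the total of the aligned sizes. A minimal sketch (the two scratch buffers and the 256-byte alignment are assumptions, not TensorRT requirements):

```cpp
#include <cassert>
#include <cstddef>

// Carve the single workspace allocation handed to enqueue() into aligned
// sub-buffers. The 256-byte alignment is a common CUDA-friendly choice.
constexpr size_t kAlignment = 256;

size_t alignUp(size_t x)
{
    return (x + kAlignment - 1) / kAlignment * kAlignment;
}

struct ScratchLayout
{
    size_t offsetA;     // byte offset of the first scratch buffer
    size_t offsetB;     // byte offset of the second scratch buffer
    size_t totalBytes;  // what getWorkspaceSize should report
};

// Two hypothetical scratch buffers of aBytes and bBytes. Inside enqueue,
// the sub-buffer pointers would be workspace + offsetA and workspace + offsetB.
ScratchLayout layoutScratch(size_t aBytes, size_t bBytes)
{
    ScratchLayout l{};
    l.offsetA = 0;
    l.offsetB = alignUp(aBytes);
    l.totalBytes = l.offsetB + alignUp(bBytes);
    return l;
}
```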
Add Checks to Ensure Proper Configuration and Validate Inputs
A common source for unexpected plugin behavior is improper configuration (such as invalid plugin attributes) and invalid inputs. As such, it is good practice to add checks/assertions during the initial plugin development for cases where the plugin is not expected to work. The following are places where checks could be added:
createPlugin: Plugin attributes checksconfigurePluginoronShapeChange: Input dimension checksenqueue: Input value checks
Return Null at Errors for Methods That Create a New Plugin Object
Methods like createPlugin, clone, and attachToContext are expected to create and return new plugin objects. In these methods, ensure a null object (nullptr in C++) is returned in case of any error or failed check. This ensures that a non-null plugin object is never returned from a misconfigured call.
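A minimal sketch of the pattern (ToyPlugin and its scale attribute are hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical plugin with a single "scale" attribute.
struct ToyPlugin
{
    float scale;
    explicit ToyPlugin(float s) : scale(s) {}
};

// Validate attributes before constructing; return nullptr on any failed
// check instead of returning a half-configured object.
ToyPlugin* createToyPlugin(float scale, int32_t nbFields)
{
    if (nbFields != 1 || scale <= 0.F)
    {
        return nullptr; // invalid configuration: signal failure to the caller
    }
    return new ToyPlugin(scale);
}
```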
Avoid Device Memory Allocations in clone()
Since the builder calls clone multiple times, device memory allocations in clone can be significantly expensive. One option is to do persistent memory allocations in the constructor, copy to the device when the plugin is ready (such as in configurePlugin), and release during destruction.
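One way to keep clone cheap is to share a ref-counted buffer among all clones. A sketch using std::shared_ptr, with a host std::vector standing in for the device allocation (the actual device copy would happen later, such as in configurePlugin):

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct ToyPlugin
{
    // Shared among all clones; allocated once in the first constructor.
    // A std::vector stands in for a device allocation in this sketch.
    std::shared_ptr<std::vector<float>> weights;

    explicit ToyPlugin(std::vector<float> w)
        : weights(std::make_shared<std::vector<float>>(std::move(w)))
    {
    }

    // clone() only bumps the reference count; no new allocation is made.
    // The buffer is released when the last clone is destroyed.
    ToyPlugin* clone() const
    {
        return new ToyPlugin(*this);
    }
};
```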
Serializing Arbitrary Pieces of Data and Custom Types
Plugin authors can utilize PluginField of PluginFieldType::kUNKNOWN to indicate arbitrary pieces of data to be serialized. In this case, the length of the respective PluginField should be the number of bytes corresponding to the buffer pointed to by data. The serialization of non-primitive types can be achieved in this way.