Plugin API Description#

A new plugin should consist of a plugin class derived from IPluginV3 and a corresponding creator class derived from IPluginCreatorV3One. In addition, new plugins should also be registered in the plugin registry, either dynamically using IPluginRegistry::registerCreator() or statically using the REGISTER_TENSORRT_PLUGIN(...) macro. Custom plugin libraries can also implement an init function equivalent to initLibNvInferPlugins() to perform bulk registration.

Note

Automotive safety users must use the REGISTER_SAFE_TENSORRT_PLUGIN(...) macro instead of REGISTER_TENSORRT_PLUGIN(...). Refer to the NVIDIA TensorRT Safety Production Guide for DriveOS for any safety-related activities.

IPluginV3 API Description#

The following section describes the functions of IPluginV3 and, by extension, IPluginV3OneCore, IPluginV3OneBuild or IPluginV3OneBuildV2, and IPluginV3OneRuntime.

Since an IPluginV3 object consists of different capabilities, IPluginV3::getCapabilityInterface can be called anytime during its lifetime. An IPluginV3 object added for the build phase must return a valid capability interface for all capability types: core, build, and runtime. The build capability can be omitted for objects added for the runtime phase.

There are a few methods used to request identifying information about the plugin. They can also be called during any stage of the plugin’s lifetime.

  • IPluginV3OneCore::getPluginName: Used to query for the plugin’s name

  • IPluginV3OneCore::getPluginVersion: Used to query for the plugin’s version

  • IPluginV3OneCore::getPluginNamespace: Used to query for the plugin’s namespace

  • IPluginV3OneBuild::getMetadataString: Used to query for a string representation of any metadata associated with the plugin, such as the values of its attributes.

To connect a plugin layer to neighboring layers and set up input and output data structures, the builder checks for the number of outputs and their shapes by calling the following plugin methods:

  • IPluginV3OneBuild::getNbOutputs: Used to specify the number of output tensors.

  • IPluginV3OneBuild::getOutputShapes: This function specifies the output shapes as a function of the input shapes or as constants. The exception is data-dependent shapes, which are instead expressed in terms of a specified upper bound and an optimal tuning value.

  • IPluginV3OneBuild::supportsFormatCombination: Used to check if a plugin supports a given data type and format combination.

  • IPluginV3OneBuild::getOutputDataTypes: This function specifies the data types of the output tensors. The returned data types must have a format supported by the plugin.
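To make this negotiation concrete, here is a minimal, self-contained sketch of typical supportsFormatCombination logic. It uses simplified stand-in types (MockDataType, MockFormat, MockTensorDesc) rather than the real NvInfer.h declarations, and the rule shown (all I/O LINEAR, FP32 or FP16, all tensors sharing the type of input 0) is just one common pattern, not a requirement:

```cpp
#include <cstdint>

// Simplified stand-ins for nvinfer1::DataType, TensorFormat, and the tensor
// descriptor -- NOT the real NvInfer.h declarations.
enum class MockDataType { kFLOAT, kHALF, kINT8 };
enum class MockFormat { kLINEAR, kCHW32 };

struct MockTensorDesc
{
    MockDataType type;
    MockFormat format;
};

// Typical supportsFormatCombination logic: position `pos` is supported if it
// is LINEAR, is FP32 or FP16, and matches the data type of input 0.
bool supportsFormatCombination(int32_t pos, MockTensorDesc const* inOut,
                               int32_t nbInputs, int32_t nbOutputs)
{
    if (pos < 0 || pos >= nbInputs + nbOutputs)
    {
        return false;
    }
    MockTensorDesc const& desc = inOut[pos];
    bool const typeOk = desc.type == MockDataType::kFLOAT || desc.type == MockDataType::kHALF;
    bool const linear = desc.format == MockFormat::kLINEAR;
    // All tensors must share the data type of input 0.
    bool const homogeneous = desc.type == inOut[0].type;
    return typeOk && linear && homogeneous;
}
```

Note that the real V3 method receives DynamicPluginTensorDesc entries, and a plugin should only inspect positions up to and including pos during this query.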

If the IPluginV3OneBuildV2 build capability is used, the plugin can also communicate to TensorRT that certain input-output pairs are aliased (share the same data buffer). TensorRT will query IPluginV3OneBuildV2::getAliasedInput to determine any such aliasing behavior. To use this feature, PreviewFeature::kALIASED_PLUGIN_IO_10_03 must be enabled.

Plugin layers can support the following data formats:

  • LINEAR single-precision (FP32), half-precision (FP16), brain floating-point (BF16), 8-bit floating-point E4M3 (FP8), integer (INT8), and integer (INT32) tensors.

  • CHW32 single-precision (FP32) and integer (INT8) tensors.

  • CHW2, HWC8, HWC16, and DHWC8 half-precision (FP16) tensors.

  • CHW4 half-precision (FP16) and integer (INT8) tensors.

  • HWC8, HWC4, NDHWC8, and NC2HW brain floating-point (BF16) tensors.

Refer to PluginFormat for an enumeration of the supported formats.

Plugins that do not compute all data in place and need memory space in addition to input and output tensors can specify the additional memory requirements with the IPluginV3OneBuild::getWorkspaceSize method, which the builder calls to determine and preallocate scratch space.
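As a hedged illustration of getWorkspaceSize, the sketch below computes scratch space for a hypothetical plugin that needs one aligned copy of its first input. MockDesc is a simplified stand-in for the real tensor descriptor, and the 256-byte alignment is an assumption chosen for the example, not a TensorRT requirement:

```cpp
#include <cstddef>
#include <cstdint>

// Simplified stand-in for a tensor descriptor; the real V3 method receives
// DynamicPluginTensorDesc from NvInfer.h.
struct MockDesc
{
    int64_t dims[4];
    int32_t nbDims;
    size_t elementSize;
};

static size_t volume(MockDesc const& d)
{
    size_t v = 1;
    for (int32_t i = 0; i < d.nbDims; ++i)
    {
        v *= static_cast<size_t>(d.dims[i]);
    }
    return v;
}

// Hypothetical plugin that needs one scratch copy of its first input,
// padded to a 256-byte boundary (a common alignment for device buffers).
size_t getWorkspaceSize(MockDesc const* inputs, int32_t nbInputs)
{
    if (nbInputs < 1)
    {
        return 0;
    }
    size_t const bytes = volume(inputs[0]) * inputs[0].elementSize;
    size_t const alignment = 256;
    return (bytes + alignment - 1) / alignment * alignment;
}
```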

The layer is configured, executed, and destroyed at build time to discover optimal configurations. After selecting the optimal configuration for a plugin, the chosen tactic and concrete shape/format information (except for data-dependent dimensions) are communicated to the plugin during inference. It is executed as many times as needed for the lifetime of the inference application and finally destroyed when the engine is destroyed.

The builder controls these steps and the runtime using the following plugin methods. Methods also called during inference are indicated by (*); all others are called only by the builder.

  • IPluginV3OneRuntime::attachToContext*: This function requests that a plugin clone be attached to an ExecutionContext, allowing the plugin to access any context-specific resources.

  • IPluginV3OneBuild::getTimingCacheId: This function queries for any timing cache ID that TensorRT can use. If provided, it enables timing caching (disabled by default).

  • IPluginV3OneBuild::getNbTactics: Used to query for the number of custom tactics the plugin chooses to use.

  • IPluginV3OneBuild::getValidTactics: This function queries for any custom tactics the plugin chooses to use. The plugin will be profiled for each tactic, for up to the number of format combinations indicated by IPluginV3OneBuild::getFormatCombinationLimit().

  • IPluginV3OneBuild::getFormatCombinationLimit: This function queries the maximum number of format combinations that can be timed for each tactic (for the default tactic 0, if no custom tactics are advertised).

  • IPluginV3OneRuntime::setTactic*: Communicates the tactic to be used during the subsequent enqueue(). If no custom tactics were advertised, this would always be 0.

  • IPluginV3OneBuild::configurePlugin: Communicates the number of inputs and outputs and their shapes, data types, and formats. The min, opt, and max of each input or output’s DynamicPluginTensorDesc correspond to the kMIN, kOPT, and kMAX values of the optimization profile that the plugin is currently profiled for. The desc.dims field corresponds to the dimensions of plugin inputs specified at network creation. Wildcard dimensions can exist during this phase in the desc.dims field. At this point, the plugin can set up its internal state and select the most appropriate algorithm and data structures for the given configuration.

  • IPluginV3OneRuntime::onShapeChange*: Communicates the number of inputs and outputs and their shapes, data types, and formats. The dimensions are concrete, except if data-dependent dimensions exist, which wildcards will indicate.

  • IPluginV3OneRuntime::enqueue*: Encapsulates the actual algorithm and kernel calls of the plugin and provides pointers to input, output, and scratch space, as well as the CUDA stream to be used for kernel execution.

  • IPluginV3::clone: This is called every time a new builder, network, or engine is created that includes this plugin layer. It must return a new plugin object with the correct parameters.
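The custom-tactic handshake described above (getNbTactics and getValidTactics at build time, setTactic before enqueue) can be sketched with simplified signatures. TacticDemoPlugin and its tactic values are hypothetical; the real methods live on the IPluginV3OneBuild and IPluginV3OneRuntime capabilities and return TensorRT status codes:

```cpp
#include <cstdint>

// Sketch of the custom-tactic handshake with simplified signatures.
// Tactic values are plugin-defined; 0 is reserved for the default tactic.
class TacticDemoPlugin
{
public:
    // Advertised during the build phase; TensorRT times each tactic and
    // picks the fastest.
    int32_t getNbTactics() const { return 2; }

    int32_t getValidTactics(int32_t* tactics, int32_t nbTactics) const
    {
        if (nbTactics < 2) { return -1; }  // error status
        tactics[0] = kTACTIC_VECTORIZED;
        tactics[1] = kTACTIC_SCALAR;
        return 0;
    }

    // Called by TensorRT before enqueue() with the winning tactic.
    int32_t setTactic(int32_t tactic)
    {
        mTactic = tactic;
        return 0;
    }

    // enqueue() dispatches on the stored tactic (kernel launches elided).
    char const* enqueue() const
    {
        return mTactic == kTACTIC_VECTORIZED ? "vectorized kernel" : "scalar kernel";
    }

    static constexpr int32_t kTACTIC_VECTORIZED = 1;
    static constexpr int32_t kTACTIC_SCALAR = 2;

private:
    int32_t mTactic{0};
};
```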

After the builder completes profiling, before the engine is serialized, IPluginV3OneRuntime::getFieldsToSerialize is called to query for any plugin fields that must be serialized into the engine. These are expected to be data that the plugin needs to function properly during the inference stage after the engine has been deserialized.

IPluginCreatorV3One API Description#

The following methods in the IPluginCreatorV3One class are used to find and create the appropriate plugin from the plugin registry:

  • getPluginName: This returns the plugin name and should match the return value of IPluginV3OneCore::getPluginName.

  • getPluginVersion: Returns the plugin version. For all internal TensorRT plugins, this defaults to 1.

  • getPluginNamespace: Returns the plugin namespace. The default can be "".

  • getFieldNames: To successfully create a plugin, you must know all the plugin’s field parameters. This method returns the PluginFieldCollection struct with the PluginField entries populated to reflect the field name and PluginFieldType (the data should point to nullptr).

  • createPlugin: This method creates a plugin, passing a PluginFieldCollection and a TensorRTPhase argument.

During engine deserialization, TensorRT calls createPlugin with the TensorRTPhase argument set to TensorRTPhase::kRUNTIME and the PluginFieldCollection populated with the same PluginFields as those returned by IPluginV3OneRuntime::getFieldsToSerialize(). In this case, TensorRT takes ownership of the plugin objects returned by createPlugin.

You can also invoke createPlugin to produce plugin objects to add to a TensorRT network. In this case, setting the phase argument to TensorRTPhase::kBUILD is recommended. The data passed with the PluginFieldCollection should be allocated by the caller and freed once the plugin no longer needs it. Ownership of the plugin object returned by createPlugin passes to the caller, who must eventually destroy it.

Migrating V2 Plugins to IPluginV3#

IPluginV2 and IPluginV2Ext have been deprecated since TensorRT 8.5, and IPluginV2IOExt and IPluginV2DynamicExt are deprecated in TensorRT 10.0. Therefore, new plugins should target IPluginV3, and old ones should be refactored.

Key migration points from IPluginV2DynamicExt to IPluginV3

Keep in mind the following key points when migrating an IPluginV2DynamicExt plugin to IPluginV3:

  • The plugin creator associated with the plugin must be migrated to IPluginCreatorV3One, the factory class for IPluginV3 (IPluginCreator is the factory class for IPluginV2 derivatives). This mainly consists of migrating the logic in IPluginCreator::deserializePlugin. For more information, refer to the Plugin Serialization and Deserialization section.

  • There is no equivalent to IPluginV2::initialize(), IPluginV2::terminate(), and IPluginV2::destroy() in IPluginV3. For more information, refer to the Plugin Initialization and Termination section.

  • There is no equivalent to IPluginV2Ext::detachFromContext() in IPluginV3. For more information, refer to the Accessing Context-Specific Resources Provided by TensorRT section.

  • IPluginV3OneRuntime::attachToContext() is markedly different from IPluginV2Ext::attachToContext() regarding arguments and behavior. For more information, refer to the Accessing Context-Specific Resources Provided by TensorRT section.

  • In IPluginV3, plugin serialization is through a PluginFieldCollection that gets passed to TensorRT by IPluginV3OneRuntime::getFieldsToSerialize() and deserialization is through the same PluginFieldCollection that gets passed back by TensorRT to IPluginCreatorV3One::createPlugin(...). For more information, refer to the Plugin Serialization and Deserialization section.

  • The IPluginV3 equivalents of void return methods in IPluginV2DynamicExt will expect an integer status code as a return value (such as configurePlugin).

  • supportsFormatCombination and getWorkspaceSize get dynamic tensor descriptors (DynamicPluginTensorDesc) instead of static descriptors (PluginTensorDesc).

  • IPluginV2DynamicExt::getOutputDimensions() becomes IPluginV3OneBuild::getOutputShapes() and changes to an output parameter signature instead of a return value. It also shifts from per-output index querying to one-shot querying. A similar transition applies from IPluginV2Ext::getOutputDataType to IPluginV3OneBuild::getOutputDataTypes.

Plugin Initialization and Termination

IPluginV2 provided several APIs for plugin initialization and termination: namely, IPluginV2::initialize(), IPluginV2::terminate(), and IPluginV2::destroy(). In IPluginV3, plugins are expected to be constructed in an initialized state; if your V2 plugin performed any lazy initialization in initialize(), it can be deferred to onShapeChange or configurePlugin. Any resource release or other termination logic in IPluginV2::terminate() or IPluginV2::destroy() can be moved to the class destructor. The exception is in the Python API, where IPluginV3.destroy() is provided as an alternative to a C++-like destructor.
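A minimal sketch of this migration, assuming a hypothetical plugin whose only resource is a scratch buffer: acquisition moves from initialize() to the constructor, and release from terminate()/destroy() to the destructor (handled here implicitly by unique_ptr):

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical resource standing in for whatever initialize() used to
// acquire; not a real TensorRT type.
struct ScratchBuffer
{
    explicit ScratchBuffer(size_t n) : data(n) {}
    std::vector<float> data;
};

class MigratedPlugin
{
public:
    // V2: resources acquired in initialize(). V3: acquired on construction,
    // so the plugin is always in a usable state.
    explicit MigratedPlugin(size_t scratchElems)
        : mScratch(std::make_unique<ScratchBuffer>(scratchElems))
    {
    }

    // V2: released in terminate()/destroy(). V3: the destructor releases
    // everything (here via unique_ptr's implicit cleanup).
    ~MigratedPlugin() = default;

    bool isReady() const { return mScratch != nullptr; }
    size_t scratchElems() const { return mScratch ? mScratch->data.size() : 0; }

private:
    std::unique_ptr<ScratchBuffer> mScratch;
};
```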

Accessing Context-Specific Resources Provided by TensorRT

IPluginV2Ext::attachToContext() provided plugins access to context-specific resources, namely the GPU allocator and cuDNN and cuBLAS handles. IPluginV3OneRuntime::attachToContext() is meant to provide a similar service to plugins, but it instead provides an IPluginResourceContext, which in turn exposes resources that plugins can request.

In a departure from IPluginV2Ext::attachToContext(), cuDNN and cuBLAS handles are no longer provided by IPluginResourceContext; any plugins that depended on those should migrate to initialize their own cuDNN and cuBLAS resources. If sharing cuDNN/cuBLAS resources among plugins is preferred, you can utilize the functionality provided by IPluginResource and the plugin registry’s key-value store to accomplish this. For more information, refer to the Sharing Custom Resources Among Plugins section.

IPluginV3OneRuntime::attachToContext(...) is a clone-and-attach operation. It is asked to clone the entire IPluginV3 object, not just the runtime capability. Therefore, if implemented as a separate class, the runtime capability object may need to hold a reference to the IPluginV3 object of which it is a part.

Any context-specific resource obtained through IPluginResourceContext can be used until the plugin is destroyed. Therefore, any termination logic implemented in IPluginV2Ext::detachFromContext() can be moved to the plugin destructor.

Plugin Serialization and Deserialization

For V2 plugins, serialization and deserialization were determined by the implementation of IPluginV2::serialize, IPluginV2::getSerializationSize, and IPluginCreator::deserializePlugin; IPluginV3OneRuntime::getFieldsToSerialize and IPluginCreatorV3One::createPlugin have replaced these. Note that the workflow has shifted from writing to/reading from a raw buffer to constructing and parsing a PluginFieldCollection.

TensorRT handles the serialization of types defined in PluginFieldType. Custom types can be serialized as PluginFieldType::kUNKNOWN. For example:

struct DummyStruct
{
    int32_t a;
    float b;
};

DummyPlugin::DummyPlugin()
{
    // std::vector<nvinfer1::PluginField> mDataToSerialize;
    // int32_t mIntValue;
    // std::vector<float> mFloatVector;
    // DummyStruct mDummyStruct;
    mDataToSerialize.clear();
    mDataToSerialize.emplace_back(PluginField("intScalar", &mIntValue, PluginFieldType::kINT32, 1));
    mDataToSerialize.emplace_back(PluginField("floatVector", mFloatVector.data(), PluginFieldType::kFLOAT32, mFloatVector.size()));
    mDataToSerialize.emplace_back(PluginField("dummyStruct", &mDummyStruct, PluginFieldType::kUNKNOWN, sizeof(DummyStruct)));
    mFCToSerialize.nbFields = mDataToSerialize.size();
    mFCToSerialize.fields = mDataToSerialize.data();
}

nvinfer1::PluginFieldCollection const* DummyPlugin::getFieldsToSerialize() noexcept
{
    return &mFCToSerialize;
}
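The deserialization counterpart, inside createPlugin, walks the collection and reconstructs the plugin state. The sketch below uses simplified stand-ins (MockField, MockFieldCollection) for the nvinfer1 types, and the parsing logic is illustrative rather than a required implementation:

```cpp
#include <cstdint>
#include <cstring>

// Simplified stand-ins for nvinfer1::PluginField / PluginFieldCollection.
enum class MockFieldType { kINT32, kFLOAT32, kUNKNOWN };

struct MockField
{
    char const* name;
    void const* data;
    MockFieldType type;
    int32_t length;  // for kUNKNOWN, the number of bytes
};

struct MockFieldCollection
{
    int32_t nbFields;
    MockField const* fields;
};

struct DummyStruct
{
    int32_t a;
    float b;
};

struct DummyParams
{
    int32_t intValue{};
    DummyStruct dummy{};
};

// Sketch of the parsing a createPlugin implementation performs when TensorRT
// hands back the collection produced by getFieldsToSerialize().
DummyParams parseFields(MockFieldCollection const& fc)
{
    DummyParams p;
    for (int32_t i = 0; i < fc.nbFields; ++i)
    {
        MockField const& f = fc.fields[i];
        if (std::strcmp(f.name, "intScalar") == 0 && f.type == MockFieldType::kINT32)
        {
            p.intValue = *static_cast<int32_t const*>(f.data);
        }
        else if (std::strcmp(f.name, "dummyStruct") == 0 && f.type == MockFieldType::kUNKNOWN)
        {
            // kUNKNOWN fields carry raw bytes; copy them back into the struct.
            std::memcpy(&p.dummy, f.data, sizeof(DummyStruct));
        }
    }
    return p;
}
```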

Migrating Older V2 Plugins to IPluginV3

If migrating from IPluginV2 or IPluginV2Ext to IPluginV3, it is easier to migrate first to IPluginV2DynamicExt and then follow the guidelines above to migrate to IPluginV3. The new features in IPluginV2DynamicExt are as follows:

virtual DimsExprs getOutputDimensions(int outputIndex, const DimsExprs* inputs, int nbInputs, IExprBuilder& exprBuilder) = 0;

virtual bool supportsFormatCombination(int pos, const PluginTensorDesc* inOut, int nbInputs, int nbOutputs) = 0;

virtual void configurePlugin(const DynamicPluginTensorDesc* in, int nbInputs, const DynamicPluginTensorDesc* out, int nbOutputs) = 0;

virtual size_t getWorkspaceSize(const PluginTensorDesc* inputs, int nbInputs, const PluginTensorDesc* outputs, int nbOutputs) const = 0;

virtual int enqueue(const PluginTensorDesc* inputDesc, const PluginTensorDesc* outputDesc, const void* const* inputs, void* const* outputs, void* workspace, cudaStream_t stream) = 0;

Guidelines for migration to IPluginV2DynamicExt are:

  • getOutputDimensions implements the expression for output tensor dimensions given the inputs.

  • supportsFormatCombination checks if the plugin supports the format and datatype for the specified I/O.

  • configurePlugin mimics the behavior of equivalent configurePlugin in IPluginV2Ext but accepts tensor descriptors.

  • getWorkspaceSize and enqueue mimic the behavior of equivalent APIs in IPluginV2Ext but accept tensor descriptors.

Coding Guidelines for Plugins#

Memory Allocation

Memory allocated in the plugin must be freed to ensure no memory leak. If resources are acquired in the plugin constructor or at a later stage, like onShapeChange, they must be released, possibly in the plugin class destructor.

Another option is to request any additional workspace memory required through getWorkspaceSize, which will be available during enqueue.

Add Checks to Ensure Proper Configuration and Validate Inputs

A common source for unexpected plugin behavior is improper configuration (such as invalid plugin attributes) and invalid inputs. As such, it is good practice to add checks/assertions during the initial plugin development for cases where the plugin is not expected to work. The following are places where checks could be added:

  • createPlugin: Plugin attributes checks

  • configurePlugin or onShapeChange: Input dimension checks

  • enqueue: Input value checks

Return Null at Errors for Methods That Create a New Plugin Object

Methods like createPlugin, clone, and attachToContext can be expected to create and return new plugin objects. In these methods, ensure that a null object (nullptr in C++) is returned in case of any error or failed check. This prevents an incorrectly configured plugin object from being returned and later used.
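A hedged sketch of this guideline, using a hypothetical SafePlugin class rather than a real TensorRT type: the clone() below never lets an exception or a failed check escape and returns nullptr instead.

```cpp
#include <cstdint>
#include <new>

// Hypothetical plugin class illustrating the null-on-error guideline.
class SafePlugin
{
public:
    explicit SafePlugin(int32_t channels) : mChannels(channels) {}

    SafePlugin* clone() const noexcept
    {
        try
        {
            if (mChannels <= 0)  // failed validity check -> null, not garbage
            {
                return nullptr;
            }
            return new SafePlugin(mChannels);
        }
        catch (...)  // e.g. std::bad_alloc
        {
            return nullptr;
        }
    }

    int32_t channels() const { return mChannels; }

private:
    int32_t mChannels;
};
```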

Avoid Device Memory Allocations in clone()

Since the builder calls clone multiple times, device memory allocations could be significantly expensive. One option is to make persistent memory allocations in the constructor, copy data to the device when the plugin is ready to be used (such as in configurePlugin), and release the memory during destruction.

Serializing Arbitrary Pieces of Data and Custom Types

Plugin authors can utilize PluginField of PluginFieldType::kUNKNOWN to indicate arbitrary pieces of data to be serialized. In this case, the length of the respective PluginField should be the number of bytes corresponding to the buffer pointed to by data. The serialization of non-primitive types can be achieved in this way.

Plugin Shared Libraries#

TensorRT contains built-in plugins that can be loaded statically into your application.

You can explicitly register custom plugins with TensorRT using the REGISTER_TENSORRT_PLUGIN and registerCreator interfaces (refer to Adding Custom Layers). However, you may want TensorRT to manage the registration of a plugin library and, in particular, serialize plugin libraries with the plan file so they are automatically loaded when the engine is created. This can be especially useful when you want to include the plugins in a version-compatible engine so that you do not need to manage them after building the engine. To take advantage of this, you can build shared libraries with specific entry points recognized by TensorRT.

Generating Plugin Shared Libraries#

To create a shared library for plugins, the library must have the following public symbols defined:

extern "C" void setLoggerFinder(ILoggerFinder* finder);
extern "C" IPluginCreator* const* getCreators(int32_t& nbCreators);

extern "C" above is only used to prevent name mangling, and the methods should be implemented in C++. Consult your compiler’s ABI documentation for more details.

setLoggerFinder() should store a global pointer to the ILoggerFinder in the library, for logging in the plugin code. getCreators() returns the list of plugin creators your library contains. An example of these entry points can be found in plugin/common/vfcCommon.h/cpp.
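Shaped after those entry points, here is a self-contained sketch using mock stand-ins for ILoggerFinder and the creator interface (the real entry points use nvinfer1 types from NvInfer.h):

```cpp
#include <cstdint>

// Mock stand-ins for nvinfer1::ILoggerFinder and the plugin creator
// interface -- NOT the real NvInfer.h declarations.
struct MockLoggerFinder {};
struct MockCreator {};

static MockLoggerFinder* gLoggerFinder = nullptr;
static MockCreator gCreatorA;
static MockCreator gCreatorB;

// Library-wide logger hook: called once after the shared library is loaded
// so plugin code can obtain a logger.
extern "C" void setLoggerFinder(MockLoggerFinder* finder)
{
    gLoggerFinder = finder;
}

// Enumerates every creator the library exposes; nbCreators is an output
// parameter set to the length of the returned array.
extern "C" MockCreator* const* getCreators(int32_t& nbCreators)
{
    static MockCreator* const creators[] = {&gCreatorA, &gCreatorB};
    nbCreators = 2;
    return creators;
}
```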

To serialize your plugin libraries with your engine plan, provide the plugin libraries paths to TensorRT using setPluginsToSerialize() in BuilderConfig.

You can also package plugins in the plan when building version-compatible engines. The packaged plugins will have the same lifetime as the engine and will be automatically registered/deregistered when running the engine.

Using Plugin Shared Libraries#

After building your shared libraries, you can configure the builder to serialize them with the engine. Next time you load the engine into TensorRT, the serialized plugin libraries will be loaded and registered automatically.

Note

IPluginRegistry::loadLibrary() (C++, Python) functionality now supports plugin shared libraries containing both V2 and V3 plugin creators through the getCreators() entry point. The getPluginCreators() entry point is valid, too, but is deprecated. TensorRT first checks if the getCreators() symbol is available, and if not, checks for getPluginCreators() as a fallback for backward compatibility. You can then query this to enumerate each plugin creator and register it manually using IPluginRegistry::registerCreator() (C++, Python).

Load the plugins for use with the builder before building the engine:

for (size_t i = 0; i < nbPluginLibs; ++i)
{
    builder->getPluginRegistry().loadLibrary(pluginLibs[i]);
}

for plugin_lib in plugin_libs:
    builder.get_plugin_registry().load_library(plugin_lib)

Next, decide if the plugins should be included with the engine or shipped externally. You can serialize the plugins with the plan as follows:

IBuilderConfig *config = builder->createBuilderConfig();
...
config->setPluginsToSerialize(pluginLibs, nbPluginLibs);

config = builder.create_builder_config()
...
config.plugins_to_serialize = plugin_libs

Alternatively, you can keep the plugins external to the engine. You will need to ship these libraries along with the engine when it is deployed and load them explicitly in the runtime before deserializing the engine:

// In this example, getExternalPluginLibs() is a user-implemented method that retrieves the list of libraries to use with the engine
std::vector<std::string> pluginLibs = getExternalPluginLibs();
for (auto const &pluginLib : pluginLibs)
{
    runtime->getPluginRegistry().loadLibrary(pluginLib.c_str());
}

# In this example, get_external_plugin_libs() is a user-implemented method that retrieves the list of libraries to use with the engine
plugin_libs = get_external_plugin_libs()
for plugin_lib in plugin_libs:
    runtime.get_plugin_registry().load_library(plugin_lib)