Plugin API Description#

A new plugin implementation should provide a plugin class derived from IPluginV3 and a creator class derived from IPluginCreatorV3One. In addition, new plugins should be registered in the plugin registry, either dynamically by using IPluginRegistry::registerCreator() or statically by using the REGISTER_TENSORRT_PLUGIN(...) macro. Custom plugin libraries can also implement an init function equivalent to initLibNvInferPlugins() to perform bulk registration.

Note

Automotive safety users must use the REGISTER_SAFE_TENSORRT_PLUGIN(...) macro instead of REGISTER_TENSORRT_PLUGIN(...). Refer to the NVIDIA TensorRT Safety Production Guide for DriveOS for any safety-related activities.

`IPluginV3` API Description#

The following section describes the functions of IPluginV3 and, by extension, IPluginV3OneCore, IPluginV3OneBuild or IPluginV3OneBuildV2, and IPluginV3OneRuntime.

Since an IPluginV3 object consists of different capabilities, IPluginV3::getCapabilityInterface can be called anytime during its lifetime. An IPluginV3 object added for the build phase must return a valid capability interface for all capability types: core, build, and runtime. The build capability can be omitted for objects added for the runtime phase.
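The capability dispatch described above can be sketched as follows. This is a minimal illustration and not the TensorRT headers: CapabilityType, Capability, CoreCapability, BuildCapability, RuntimeCapability, and SketchPlugin are hypothetical stand-ins for PluginCapabilityType, the capability base interface, and the IPluginV3One* interfaces, so that the example compiles on its own.

```cpp
#include <cassert>

// Hypothetical stand-ins for the TensorRT types named in the text.
enum class CapabilityType { kCORE, kBUILD, kRUNTIME };

struct Capability { virtual ~Capability() = default; };
struct CoreCapability : Capability {};
struct BuildCapability : Capability {};
struct RuntimeCapability : Capability {};

// One object implements all three capabilities, as a build-phase plugin must.
class SketchPlugin : public CoreCapability, public BuildCapability, public RuntimeCapability
{
public:
    explicit SketchPlugin(bool buildPhase) : mBuildPhase(buildPhase) {}

    // Returns the requested capability, or nullptr if it is not offered.
    Capability* getCapabilityInterface(CapabilityType type) noexcept
    {
        switch (type)
        {
        case CapabilityType::kCORE:
            return static_cast<CoreCapability*>(this);
        case CapabilityType::kBUILD:
            // The build capability can be omitted for runtime-phase objects.
            return mBuildPhase ? static_cast<BuildCapability*>(this) : nullptr;
        case CapabilityType::kRUNTIME:
            return static_cast<RuntimeCapability*>(this);
        }
        return nullptr;
    }

private:
    bool mBuildPhase;
};
```

The explicit static_cast to each capability base selects the matching sub-object before the pointer is converted to the common base type.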

There are a few methods used to request identifying information about the plugin. They can also be called during any stage of the plugin’s lifetime.

IPluginV3OneCore::getPluginName: Used to query for the plugin’s name
IPluginV3OneCore::getPluginVersion: Used to query for the plugin’s version
IPluginV3OneCore::getPluginNamespace: Used to query for the plugin’s namespace
IPluginV3OneBuild::getMetadataString: Used to query for a string representation of any metadata associated with the plugin, such as the values of its attributes.

To connect a plugin layer to neighboring layers and set up input and output data structures, the builder checks for the number of outputs and their shapes by calling the following plugin methods:

IPluginV3OneBuild::getNbOutputs: Used to specify the number of output tensors.
IPluginV3OneBuild::getOutputShapes: This function specifies the output shapes as a function of the input shapes or constants. The exception is data-dependent shapes with a specified upper bound and optimal tuning value.
IPluginV3OneBuild::supportsFormatCombination: Used to check if a plugin supports a given data type and format combination.
IPluginV3OneBuild::getOutputDataTypes: This function retrieves the data types of the output tensors. The returned data types must be in a format supported by the plugin.
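To illustrate how a format check of this kind is commonly structured, the sketch below accepts linear FP32 or FP16 for the first input and requires every later tensor to match it. The DataType, TensorFormat, and TensorDesc types here are simplified stand-ins introduced for the example (the real method receives DynamicPluginTensorDesc entries), and the supported combinations are an assumption, not a statement about any particular plugin.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-ins for the TensorRT descriptor types.
enum class DataType { kFLOAT, kHALF, kINT8 };
enum class TensorFormat { kLINEAR, kCHW32 };
struct TensorDesc
{
    DataType type;
    TensorFormat format;
};

// inOut holds nbInputs input descriptors followed by nbOutputs output
// descriptors; pos selects the entry being checked.
bool supportsFormatCombination(
    std::int32_t pos, TensorDesc const* inOut, std::int32_t nbInputs, std::int32_t nbOutputs) noexcept
{
    if (inOut == nullptr || pos < 0 || pos >= nbInputs + nbOutputs)
    {
        return false;
    }
    TensorDesc const& desc = inOut[pos];
    if (desc.format != TensorFormat::kLINEAR)
    {
        return false;
    }
    if (pos == 0)
    {
        // Assumed policy: the first input may be FP32 or FP16.
        return desc.type == DataType::kFLOAT || desc.type == DataType::kHALF;
    }
    // Every other tensor must use the same data type as the first input.
    return desc.type == inOut[0].type;
}
```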

If the IPluginV3OneBuildV2 build capability is used, the plugin can also communicate to TensorRT that certain input-output pairs are aliased (share the same data buffer). TensorRT will query IPluginV3OneBuildV2::getAliasedInput to determine any such aliasing behavior. To use this feature, PreviewFeature::kALIASED_PLUGIN_IO_10_03 must be enabled.

Plugin layers can support the following data formats:

LINEAR single-precision (FP32), half-precision (FP16), brain floating-point (BF16), 8-bit floating-point E4M3 (FP8), 8-bit integer (INT8), and 32-bit integer (INT32) tensors.
CHW32 single-precision (FP32) and integer (INT8) tensors.
CHW2, HWC8, HWC16, and DHWC8 half-precision (FP16) tensors.
CHW4 half-precision (FP16), and integer (INT8) tensors.
HWC8, HWC4, NDHWC8, NC2HW brain floating-point (BF16) tensors.

The formats are enumerated by PluginFormat.

Plugins that do not compute all data in place and need memory space in addition to input and output tensors can specify the additional memory requirements with the IPluginV3OneBuild::getWorkspaceSize method, which the builder calls to determine and preallocate scratch space.
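As a rough illustration of how a scratch-space request might be computed, the following sketch sums two aligned buffers. The buffer layout, the 256-byte alignment, and the names alignUp and scratchBytes are assumptions made for this example and are not part of the TensorRT API. In a real plugin, the element count would be derived from the maximum dimensions passed to getWorkspaceSize.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round size up to the next multiple of alignment (a power of two).
constexpr std::size_t alignUp(std::size_t size, std::size_t alignment)
{
    return (size + alignment - 1) & ~(alignment - 1);
}

// Hypothetical scratch layout: one FP32 buffer and one INT32 buffer,
// each holding maxElements entries and each padded to the alignment.
std::size_t scratchBytes(std::int64_t maxElements)
{
    constexpr std::size_t kAlignment = 256; // assumed alignment for this sketch
    std::size_t const n = static_cast<std::size_t>(maxElements);
    return alignUp(n * sizeof(float), kAlignment) + alignUp(n * sizeof(std::int32_t), kAlignment);
}
```

Sizing from the maximum shape keeps the request valid for every shape in the optimization profile, at the cost of over-allocating for smaller inputs.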

The layer is configured, executed, and destroyed at build time to discover optimal configurations. After the optimal configuration for a plugin is selected, the chosen tactic and concrete shape/format information (except for data-dependent dimensions) are communicated to the plugin during inference. The plugin is then executed as many times as needed over the lifetime of the inference application and is destroyed when the engine is destroyed.

The builder and the runtime control these steps using the following plugin methods. Methods that are also called during inference are marked with (*); all others are called only by the builder.

IPluginV3OneRuntime::attachToContext*: This function requests that a plugin clone be attached to an ExecutionContext, allowing the plugin to access any context-specific resources.
IPluginV3OneBuild::getTimingCacheID: This function queries for a timing cache ID that TensorRT can use. If provided, it enables timing caching for the plugin (it is disabled by default).
IPluginV3OneBuild::getNbTactics: Used to query for the number of custom tactics the plugin chooses to use.
IPluginV3OneBuild::getValidTactics: This function queries for any custom tactics the plugin can use. The plugin will be profiled for each tactic up to a maximum indicated by IPluginV3OneBuild::getFormatCombinationLimit().
IPluginV3OneBuild::getFormatCombinationLimit: This function queries the maximum number of format combinations that can be timed for each tactic, or for the default tactic (0) if no custom tactics are advertised.
IPluginV3OneRuntime::setTactic*: Communicates the tactic to be used during the subsequent enqueue(). If no custom tactics were advertised, this would always be 0.
IPluginV3OneBuild::configurePlugin: Communicates the number of inputs and outputs and their shapes, data types, and formats. The min, opt, and max of each input or output’s DynamicPluginTensorDesc correspond to the kMIN, kOPT, and kMAX values of the optimization profile that the plugin is currently profiled for. The desc.dims field corresponds to the dimensions of plugin inputs specified at network creation. Wildcard dimensions can exist during this phase in the desc.dims field. At this point, the plugin can set up its internal state and select the most appropriate algorithm and data structures for the given configuration.
IPluginV3OneRuntime::onShapeChange*: Communicates the number of inputs and outputs and their shapes, data types, and formats. The dimensions are concrete, except if data-dependent dimensions exist, which wildcards will indicate.
IPluginV3OneRuntime::enqueue*: Encapsulates the actual algorithm and kernel calls of the plugin and provides pointers to input, output, and scratch space, as well as the CUDA stream to be used for kernel execution.
IPluginV3::clone: This is called every time a new builder, network, or engine is created that includes this plugin layer. It must return a new plugin object with the correct parameters.
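The tactic-related methods in the list above fit together roughly as shown in this sketch. TacticSketch is a hypothetical class that only mirrors the shape of getNbTactics, getValidTactics, and setTactic; the two kernel variants and their nonzero tactic values are assumptions chosen so that they cannot be confused with the default tactic 0.

```cpp
#include <cassert>
#include <cstdint>

class TacticSketch
{
public:
    // Assumed kernel variants; 0 is left for the default tactic.
    enum Tactic : std::int32_t { kTILED = 1, kNAIVE = 2 };

    // Number of custom tactics the plugin advertises.
    std::int32_t getNbTactics() const noexcept { return 2; }

    // Fills the caller-provided array; returns 0 on success, -1 on error.
    std::int32_t getValidTactics(std::int32_t* tactics, std::int32_t nbTactics) const noexcept
    {
        if (tactics == nullptr || nbTactics != getNbTactics())
        {
            return -1;
        }
        tactics[0] = kTILED;
        tactics[1] = kNAIVE;
        return 0;
    }

    // Records the tactic to use for the next enqueue; rejects unknown values.
    std::int32_t setTactic(std::int32_t tactic) noexcept
    {
        if (tactic != kTILED && tactic != kNAIVE)
        {
            return -1;
        }
        mTactic = tactic;
        return 0;
    }

    std::int32_t currentTactic() const noexcept { return mTactic; }

private:
    std::int32_t mTactic{0};
};
```

In a real plugin, enqueue would branch on the stored tactic to launch the matching kernel variant.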

After the builder completes profiling, before the engine is serialized, IPluginV3OneRuntime::getFieldsToSerialize is called to query for any plugin fields that must be serialized into the engine. These are expected to be data that the plugin needs to function properly during the inference stage after the engine has been deserialized.

`IPluginCreatorV3One` API Description#

The following methods in the IPluginCreatorV3One class are used to find and create the appropriate plugin from the plugin registry:

getPluginName: This returns the plugin name and should match the return value of IPluginV3OneCore::getPluginName.
getPluginVersion: Returns the plugin version. For all internal TensorRT plugins, this defaults to 1.
getPluginNamespace: Returns the plugin namespace. The default can be "".
getFieldNames: To successfully create a plugin, you must know all the plugin’s field parameters. This method returns the PluginFieldCollection struct with the PluginField entries populated to reflect the field name and PluginFieldType (the data should point to nullptr).
createPlugin: This method creates a plugin, passing a PluginFieldCollection and a TensorRTPhase argument.

During engine deserialization, TensorRT calls this method with the TensorRTPhase argument set to TensorRTPhase::kRUNTIME and the PluginFieldCollection populated with the same PluginFields as in the one returned by IPluginV3OneRuntime::getFieldsToSerialize(). In this case, TensorRT takes ownership of plugin objects returned by createPlugin.

You can also invoke createPlugin to produce plugin objects to add to a TensorRT network. In this case, setting the phase argument to TensorRTPhase::kBUILD is recommended. The caller must allocate the data passed with the PluginFieldCollection and free it before the program exits. Ownership of the plugin object returned by createPlugin passes to the caller, who must destroy it.
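The round trip between getFieldsToSerialize and createPlugin can be sketched as follows. Field, FieldCollection, FieldType, and Phase are simplified stand-ins for PluginField, PluginFieldCollection, PluginFieldType, and TensorRTPhase, and ScalePlugin with its scale and axis attributes is a hypothetical plugin invented for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <memory>
#include <vector>

// Simplified stand-ins for the TensorRT field types.
enum class FieldType { kINT32, kFLOAT32 };
enum class Phase { kBUILD, kRUNTIME };
struct Field { char const* name; void const* data; FieldType type; std::int32_t length; };
struct FieldCollection { std::int32_t nbFields; Field const* fields; };

class ScalePlugin
{
public:
    ScalePlugin(float scale, std::int32_t axis) : mScale(scale), mAxis(axis) {}

    // The fields point at member storage so they stay valid after the call.
    FieldCollection const* getFieldsToSerialize()
    {
        mFields = {{"scale", &mScale, FieldType::kFLOAT32, 1}, {"axis", &mAxis, FieldType::kINT32, 1}};
        mCollection = {static_cast<std::int32_t>(mFields.size()), mFields.data()};
        return &mCollection;
    }

    float scale() const { return mScale; }
    std::int32_t axis() const { return mAxis; }

private:
    float mScale;
    std::int32_t mAxis;
    std::vector<Field> mFields;
    FieldCollection mCollection{0, nullptr};
};

// Rebuilds a plugin from its fields; returns nullptr if a field is missing or malformed.
std::unique_ptr<ScalePlugin> createPlugin(FieldCollection const* fc, Phase /*phase*/)
{
    if (fc == nullptr) { return nullptr; }
    bool hasScale = false;
    bool hasAxis = false;
    float scale = 0.0F;
    std::int32_t axis = 0;
    for (std::int32_t i = 0; i < fc->nbFields; ++i)
    {
        Field const& f = fc->fields[i];
        if (f.name == nullptr || f.data == nullptr || f.length != 1) { return nullptr; }
        if (std::strcmp(f.name, "scale") == 0 && f.type == FieldType::kFLOAT32)
        {
            scale = *static_cast<float const*>(f.data);
            hasScale = true;
        }
        else if (std::strcmp(f.name, "axis") == 0 && f.type == FieldType::kINT32)
        {
            axis = *static_cast<std::int32_t const*>(f.data);
            hasAxis = true;
        }
    }
    if (!hasScale || !hasAxis) { return nullptr; }
    return std::make_unique<ScalePlugin>(scale, axis);
}
```

The sketch returns a std::unique_ptr for brevity; the real createPlugin returns a raw plugin pointer whose ownership follows the rules described above.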

Migrating V2 Plugins to `IPluginV3`#

IPluginV2 and IPluginV2Ext have been deprecated since TensorRT 8.5, and IPluginV2IOExt and IPluginV2DynamicExt are deprecated in TensorRT 10.0. Therefore, new plugins should target IPluginV3, and old ones should be refactored.

Side-by-Side V2 ↔ V3 API Mapping#

The following tables map IPluginV2DynamicExt and IPluginCreator methods to their IPluginV3 / IPluginCreatorV3One equivalents, grouped by lifecycle phase. Use this as the at-a-glance reference when porting a plugin; the conceptual sections above describe the why for each change.

Core / Identity

`IPluginV2*` method	`IPluginV3*` equivalent	Notes
`IPluginV2::getPluginType()`	`IPluginV3OneCore::getPluginName()`	Renamed; same semantics.
`IPluginV2::getPluginVersion()`	`IPluginV3OneCore::getPluginVersion()`	Unchanged.
`IPluginV2::getPluginNamespace()`	`IPluginV3OneCore::getPluginNamespace()`	Unchanged.
N/A	`IPluginV3::getCapabilityInterface(PluginCapabilityType)`	New. Required dispatch entry point for core/build/runtime capabilities.
N/A	`IPluginV3OneBuild::getMetadataString()`	New (optional). Used by engine inspector and logs.

Build phase

`IPluginV2DynamicExt` method	`IPluginV3OneBuild` equivalent	Notes
`getNbOutputs()`	`getNbOutputs()`	Unchanged.
`getOutputDimensions(int, DimsExprs const*, int, IExprBuilder&)`	`getOutputShapes(DimsExprs const*, int, DimsExprs const*, int, DimsExprs*, int, IExprBuilder&)`	Per-index → one-shot; outputs are written through a parameter, and the method returns an `int32_t` status. Adds shape inputs and supports data-dependent shapes via `IExprBuilder::declareSizeTensor`.
`IPluginV2Ext::getOutputDataType(int, DataType const*, int)`	`getOutputDataTypes(DataType*, int, DataType const*, int)`	Per-index → one-shot; returns `int32_t` status.
`supportsFormatCombination(int, PluginTensorDesc const*, int, int)`	`supportsFormatCombination(int, DynamicPluginTensorDesc const*, int, int)`	Receives `DynamicPluginTensorDesc` (includes min/opt/max).
`configurePlugin(DynamicPluginTensorDesc const*, int, DynamicPluginTensorDesc const*, int)`	`configurePlugin(DynamicPluginTensorDesc const*, int, DynamicPluginTensorDesc const*, int)`	Parameter list unchanged; now returns `int32_t` status instead of `void`.
`getWorkspaceSize(PluginTensorDesc const*, int, PluginTensorDesc const*, int) const`	`getWorkspaceSize(DynamicPluginTensorDesc const*, int, DynamicPluginTensorDesc const*, int) const`	Switched to dynamic descriptors.
N/A	`getNbTactics()`, `getValidTactics(int32_t*, int32_t)`, `getFormatCombinationLimit()`	New (optional). Enable custom tactic profiling.
N/A	`getTimingCacheID(...)`	New (optional). Enables timing-cache reuse for the plugin.
N/A	`IPluginV3OneBuildV2::getAliasedInput(int)`	New (optional). Requires `PreviewFeature::kALIASED_PLUGIN_IO_10_03`.

Runtime phase

`IPluginV2DynamicExt` method	`IPluginV3OneRuntime` equivalent	Notes
`enqueue(PluginTensorDesc const*, PluginTensorDesc const*, void const* const*, void* const*, void*, cudaStream_t)`	`enqueue(PluginTensorDesc const*, PluginTensorDesc const*, void const* const*, void* const*, void*, cudaStream_t)`	Signature unchanged.
N/A	`onShapeChange(PluginTensorDesc const*, int, PluginTensorDesc const*, int)`	New. Called when concrete shapes change between `enqueue` invocations.
N/A	`setTactic(int32_t)`	New. Communicates the chosen tactic before `enqueue`.
`IPluginV2Ext::attachToContext(cudnnContext*, cublasContext*, IGpuAllocator*)`	`attachToContext(IPluginResourceContext*)`	Now a clone-and-attach operation that returns a new `IPluginV3*`. cuDNN/cuBLAS handles are no longer provided.
`IPluginV2Ext::detachFromContext()`	Removed	Move teardown logic to the destructor.

Serialization and lifetime

`IPluginV2*` method	`IPluginV3*` equivalent	Notes
`IPluginV2::getSerializationSize() const` + `IPluginV2::serialize(void*) const`	`IPluginV3OneRuntime::getFieldsToSerialize()`	Raw byte buffer → structured `PluginFieldCollection`.
`IPluginCreator::deserializePlugin(char const*, void const*, size_t)`	`IPluginCreatorV3One::createPlugin(char const*, PluginFieldCollection const*, TensorRTPhase)`	Unified create/deserialize; `phase == kRUNTIME` indicates deserialization.
`IPluginV2::clone() const`	`IPluginV3::clone()`	Non-const; returns `IPluginV3*`.
`IPluginV2::initialize()`	Removed	Plugin must be constructed in an initialized state. Defer lazy init to `configurePlugin` or `onShapeChange`.
`IPluginV2::terminate()`, `IPluginV2::destroy()`	Removed	Move teardown logic to the destructor.

Network attachment

10.x	11.x	Notes
`INetworkDefinition::addPluginV2(ITensor* const*, int, IPluginV2&)`	`INetworkDefinition::addPluginV3(ITensor* const*, int, ITensor* const*, int, IPluginV3&)`	Adds a separate shape-inputs argument list.
`IPluginV2Layer`	`IPluginV3Layer`	1:1 layer-class replacement.

Known Migration Issues#

Performance: Resolving V2 → V3 Regressions#

A plugin that was performance-tuned against IPluginV2DynamicExt may regress when first ported to IPluginV3 because the V3 lifecycle exposes new opportunities (and a few new costs) that the V2 path did not. The following checklist resolves the most common regressions.

Hoist allocations out of `enqueue`. IPluginV2 had explicit initialize() / terminate() hooks that authors often used as a one-time setup site. IPluginV3 removes these, so any setup that previously lived in initialize() should move to the constructor, configurePlugin, or onShapeChange, not into enqueue. Per-call allocations are a frequent source of measured regressions.
Advertise tactics where multiple kernels exist. Implement getNbTactics and getValidTactics so the builder can profile each kernel variant your plugin ships and pick the fastest. V2 had no equivalent and forced the plugin to choose at build time.
Enable timing-cache reuse. Implement getTimingCacheID so repeated builds of the same network reuse cached timings for the plugin. Without this, every build re-times every tactic.
Use `getWorkspaceSize` instead of internal allocations. Request scratch space through getWorkspaceSize so TensorRT pools the allocation across the network. Internal cudaMalloc in enqueue defeats this pooling.
Build with strongly-typed networks. Strong typing avoids autotuner fallback paths and exposes more fusion opportunities to the V3 plugin’s neighbors. See Known Migration Issues above.
Avoid device allocations in `clone`. clone is called frequently during the build phase; defer device-side allocations to configurePlugin and release them in the destructor. Refer to Coding Guidelines for Plugins below.
Profile build vs. runtime separately. Use trtexec --verbose --profilingVerbosity=detailed to confirm whether the regression is in the build phase (extra autotuning) or the inference phase (extra per-call work). They have different remediations.

Coding Guidelines for Plugins#

Memory Allocation

Memory allocated in the plugin must be freed to ensure no memory leak. If resources are acquired in the plugin constructor or at a later stage, like onShapeChange, they must be released, possibly in the plugin class destructor.

Another option is to request any additional workspace memory required through getWorkspaceSize, which will be available during enqueue.

Add Checks to Ensure Proper Configuration and Validate Inputs

A common source for unexpected plugin behavior is improper configuration (such as invalid plugin attributes) and invalid inputs. As such, it is good practice to add checks/assertions during the initial plugin development for cases where the plugin is not expected to work. The following are places where checks could be added:

createPlugin: Plugin attributes checks
configurePlugin or onShapeChange: Input dimension checks
enqueue: Input value checks

Return Null at Errors for Methods That Create a New Plugin Object

Methods like createPlugin, clone, and attachToContext can be expected to create and return new plugin objects. In these methods, ensure a null object (nullptr in C++) is returned in case of any error or failed check. This ensures that non-null plugin objects are not returned when configured incorrectly.
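One way to follow this guideline is to wrap object creation in a try/catch block and return nullptr on any failure, as in the sketch below. CloneSketch, its channels attribute, and the createSketch factory are hypothetical names used only for illustration; they stand in for a plugin class and its creator's createPlugin method.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

class CloneSketch
{
public:
    // The constructor validates its attribute and throws on bad input.
    explicit CloneSketch(std::int32_t channels) : mChannels(channels)
    {
        if (channels <= 0)
        {
            throw std::invalid_argument("channels must be positive");
        }
    }

    // Returns a new object with the same parameters, or nullptr on any error.
    CloneSketch* clone() noexcept
    {
        try
        {
            return new CloneSketch(mChannels);
        }
        catch (...)
        {
            return nullptr;
        }
    }

    std::int32_t channels() const noexcept { return mChannels; }

private:
    std::int32_t mChannels;
};

// Hypothetical factory in the role of createPlugin: a failed attribute
// check results in nullptr, never a partially configured object.
CloneSketch* createSketch(std::int32_t channels) noexcept
{
    try
    {
        return new CloneSketch(channels);
    }
    catch (...)
    {
        return nullptr;
    }
}
```

Catching every exception inside the noexcept methods keeps errors from propagating across the plugin boundary.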

Avoid Device Memory Allocations in clone()

Since the builder calls clone multiple times, device memory allocations made there can be expensive. One option is to do persistent memory allocations in the constructor, copy the data to the device when the plugin is ready (such as in configurePlugin), and release the memory during destruction.

Serializing Arbitrary Pieces of Data and Custom Types

Plugin authors can utilize PluginField of PluginFieldType::kUNKNOWN to indicate arbitrary pieces of data to be serialized. In this case, the length of the respective PluginField should be the number of bytes corresponding to the buffer pointed to by data. The serialization of non-primitive types can be achieved in this way.
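A minimal sketch of this approach is shown below, assuming a trivially copyable struct. Field and FieldType are simplified stand-ins for PluginField and PluginFieldType, and QuantParams, packQuantParams, and unpackQuantParams are hypothetical names. Note that copying raw struct bytes ties the serialized data to the layout produced by one compiler and platform.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Simplified stand-ins for the TensorRT field types.
enum class FieldType { kINT32, kUNKNOWN };
struct Field { char const* name; void const* data; FieldType type; std::int32_t length; };

// Hypothetical non-primitive attribute to serialize.
struct QuantParams
{
    float scale;
    std::int32_t zeroPoint;
};
static_assert(std::is_trivially_copyable<QuantParams>::value, "byte-wise copy requires a trivially copyable type");

// Describe the struct as an opaque buffer; length is its size in bytes.
// The returned field refers to params, which must outlive the field.
Field packQuantParams(QuantParams const& params)
{
    return Field{"quant_params", &params, FieldType::kUNKNOWN, static_cast<std::int32_t>(sizeof(QuantParams))};
}

// Recover the struct, rejecting fields with an unexpected type or size.
bool unpackQuantParams(Field const& field, QuantParams& out)
{
    if (field.type != FieldType::kUNKNOWN || field.data == nullptr
        || field.length != static_cast<std::int32_t>(sizeof(QuantParams)))
    {
        return false;
    }
    std::memcpy(&out, field.data, sizeof(QuantParams));
    return true;
}
```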

Plugin Shared Libraries#

TensorRT contains built-in plugins that can be loaded statically into your application.

You can explicitly register custom plugins with TensorRT using the REGISTER_TENSORRT_PLUGIN and registerCreator interfaces (refer to Adding Custom Layers). However, you may want TensorRT to manage the registration of a plugin library and, in particular, serialize plugin libraries with the plan file so they are automatically loaded when the engine is created. This can be especially useful when you want to include the plugins in a version-compatible engine so that you do not need to manage them after building the engine. To take advantage of this, you can build shared libraries with specific entry points recognized by TensorRT.

Generating Plugin Shared Libraries#

To create a shared library for plugins, the library must have the following public symbols defined:

extern "C" void setLoggerFinder(ILoggerFinder* finder);
extern "C" IPluginCreatorInterface* const* getCreators(int32_t& nbCreators);

extern "C" above is only used to prevent name mangling, and the methods should be implemented in C++. Consult your compiler’s ABI documentation for more details.

setLoggerFinder() should set a global pointer of ILoggerFinder in the library for logging in the plugin code. getCreators() returns a list of the plugin creators your library contains. An example of these entry points can be found in plugin/common/vfcCommon.h/cpp.
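The general shape of these two entry points can be sketched as follows. To keep the example compilable without the TensorRT headers, sketch::LoggerFinder and sketch::CreatorInterface are hypothetical stand-ins for ILoggerFinder and the creator interface type, and sketch::MyCreator represents a creator defined by your library. This is an illustration of the pattern, not the contents of vfcCommon.cpp.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch
{
struct LoggerFinder {}; // stand-in for ILoggerFinder
struct CreatorInterface { virtual ~CreatorInterface() = default; }; // stand-in for the creator interface
struct MyCreator : CreatorInterface {}; // a creator provided by this library
} // namespace sketch

namespace
{
// Library-wide logger finder, set once by the host application.
sketch::LoggerFinder* gLoggerFinder = nullptr;
} // namespace

// Store the finder so plugin code in this library can locate a logger later.
extern "C" void setLoggerFinder(sketch::LoggerFinder* finder)
{
    if (gLoggerFinder == nullptr && finder != nullptr)
    {
        gLoggerFinder = finder;
    }
}

// Report the creators contained in this library. The array has static
// storage duration, so the returned pointer stays valid after the call.
extern "C" sketch::CreatorInterface* const* getCreators(std::int32_t& nbCreators)
{
    static sketch::MyCreator creator;
    static sketch::CreatorInterface* const creators[] = {&creator};
    nbCreators = 1;
    return creators;
}
```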

To serialize your plugin libraries with your engine plan, provide the plugin library paths to TensorRT using setPluginsToSerialize() in IBuilderConfig.

You can also package plugins in the plan when building version-compatible engines. The packaged plugins will have the same lifetime as the engine and will be automatically registered/deregistered when running the engine.

Using Plugin Shared Libraries#

After building your shared libraries, you can configure the builder to serialize them with the engine. Next time you load the engine into TensorRT, the serialized plugin libraries will be loaded and registered automatically.

Note

IPluginRegistry::loadLibrary() (C++, Python) supports plugin shared libraries containing both V2 and V3 plugin creators through the getCreators() entry point. The getPluginCreators() entry point is also valid but is deprecated. TensorRT first checks whether the getCreators() symbol is available and, if not, checks for getPluginCreators() as a fallback for backward compatibility. You can also query the entry point directly to enumerate each plugin creator and register it manually using IPluginRegistry::registerCreator() (C++, Python).

Load the plugins for use with the builder before building the engine:

C++

for (size_t i = 0; i < nbPluginLibs; ++i)
{
    builder->getPluginRegistry().loadLibrary(pluginLibs[i]);
}

Python

for plugin_lib in plugin_libs:
    builder.get_plugin_registry().load_library(plugin_lib)

Next, decide if the plugins should be included with the engine or shipped externally. You can serialize the plugins with the plan as follows:

C++

IBuilderConfig *config = builder->createBuilderConfig();
...
config->setPluginsToSerialize(pluginLibs, nbPluginLibs);

Python

config = builder.create_builder_config()
...
config.plugins_to_serialize = plugin_libs

Alternatively, you can keep the plugins external to the engine. You will need to ship these libraries along with the engine when it is deployed and load them explicitly in the runtime before deserializing the engine:

C++

// In this example, getExternalPluginLibs() is a user-implemented method that retrieves the list of libraries to use with the engine
std::vector<std::string> pluginLibs = getExternalPluginLibs();
for (auto const &pluginLib : pluginLibs)
{
    runtime->getPluginRegistry().loadLibrary(pluginLib.c_str());
}

Python

# In this example, get_external_plugin_libs() is a user-implemented method that retrieves the list of libraries to use with the engine
plugin_libs = get_external_plugin_libs()
for plugin_lib in plugin_libs:
    runtime.get_plugin_registry().load_library(plugin_lib)
