Plugin API Description#

Every new plugin needs a plugin class that derives from IPluginV3 and a matching creator class that derives from IPluginCreatorV3One. New plugins should also be registered in the plugin registry, either dynamically by using IPluginRegistry::registerCreator() or statically by using the REGISTER_TENSORRT_PLUGIN(...) macro. Custom plugin libraries can also implement an init function equivalent to initLibNvInferPlugins() to perform bulk registration.

Note

Automotive safety users must use the REGISTER_SAFE_TENSORRT_PLUGIN(...) macro instead of REGISTER_TENSORRT_PLUGIN(...). Refer to the NVIDIA TensorRT Safety Production Guide for DriveOS for any safety-related activities.

IPluginV3 API Description#

The following section describes the functions of IPluginV3 and, by extension, IPluginV3OneCore, IPluginV3OneBuild or IPluginV3OneBuildV2, and IPluginV3OneRuntime.

Since an IPluginV3 object consists of different capabilities, IPluginV3::getCapabilityInterface can be called anytime during its lifetime. An IPluginV3 object added for the build phase must return a valid capability interface for all capability types: core, build, and runtime. The build capability can be omitted for objects added for the runtime phase.
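The capability dispatch described above can be sketched without the TensorRT headers. Every type below (Capability, CapabilityBase, CoreCap, BuildCap, RuntimeCap, SketchPlugin) is a hypothetical stand-in rather than an nvinfer1 declaration, and the buildPhase flag is an assumption made for illustration. The sketch only shows one class handing out the matching capability pointer and omitting the build capability for a runtime-phase object.

```cpp
#include <cstdint>

// Hypothetical stand-ins for PluginCapabilityType and the capability base class.
enum class Capability : int32_t { kCORE = 0, kBUILD = 1, kRUNTIME = 2 };

struct CapabilityBase { virtual ~CapabilityBase() = default; };
struct CoreCap : CapabilityBase {};
struct BuildCap : CapabilityBase {};
struct RuntimeCap : CapabilityBase {};

// One class implements all three capabilities and returns the matching
// base-class pointer on request.
class SketchPlugin : public CoreCap, public BuildCap, public RuntimeCap
{
public:
    // buildPhase == false models an object created for the runtime phase,
    // where the build capability may be omitted.
    explicit SketchPlugin(bool buildPhase) : mBuildPhase(buildPhase) {}

    CapabilityBase* getCapabilityInterface(Capability type) noexcept
    {
        switch (type)
        {
        case Capability::kCORE: return static_cast<CoreCap*>(this);
        case Capability::kBUILD: return mBuildPhase ? static_cast<BuildCap*>(this) : nullptr;
        case Capability::kRUNTIME: return static_cast<RuntimeCap*>(this);
        }
        return nullptr; // unknown capability type
    }

private:
    bool mBuildPhase;
};
```

The explicit static_cast to the capability type matters when one class inherits several capability interfaces, because each base sub-object has its own address.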

There are a few methods used to request identifying information about the plugin. They can also be called during any stage of the plugin’s lifetime.

  • IPluginV3OneCore::getPluginName: Used to query for the plugin’s name

  • IPluginV3OneCore::getPluginVersion: Used to query for the plugin’s version

  • IPluginV3OneCore::getPluginNamespace: Used to query for the plugin’s namespace

  • IPluginV3OneBuild::getMetadataString: Used to query for a string representation of any metadata associated with the plugin, such as the values of its attributes.

To connect a plugin layer to neighboring layers and set up input and output data structures, the builder checks for the number of outputs and their shapes by calling the following plugin methods:

  • IPluginV3OneBuild::getNbOutputs: Used to specify the number of output tensors.

  • IPluginV3OneBuild::getOutputShapes: This function specifies the output shapes as a function of the input shapes or constants. The exception is data-dependent shapes with a specified upper bound and optimal tuning value.

  • IPluginV3OneBuild::supportsFormatCombination: Used to check if a plugin supports a given data type and format combination.

  • IPluginV3OneBuild::getOutputDataTypes: This function retrieves the data types of the output tensors. The returned data types must be in a format supported by the plugin.
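As an illustration of the format query in the list above, the following is a minimal sketch of a supportsFormatCombination-style check for a plugin with one input and one output. DType, Format, and TensorDesc are hypothetical stand-ins for the TensorRT types, and the policy shown (LINEAR only, FP32 or FP16, output type equal to input type) is an assumption chosen for the example.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the data type, tensor format, and tensor descriptor.
enum class DType : int32_t { kFLOAT, kHALF, kINT8 };
enum class Format : int32_t { kLINEAR, kCHW32 };
struct TensorDesc { DType type; Format format; };

// Position 0 is the input and position 1 is the output. The answer for
// position `pos` depends only on entries at indices <= pos.
bool supportsFormatCombination(int32_t pos, TensorDesc const* inOut, int32_t nbInputs, int32_t nbOutputs) noexcept
{
    if (inOut == nullptr || nbInputs != 1 || nbOutputs != 1 || pos < 0 || pos >= nbInputs + nbOutputs)
    {
        return false;
    }
    TensorDesc const& desc = inOut[pos];
    bool const typeOk = desc.type == DType::kFLOAT || desc.type == DType::kHALF;
    if (desc.format != Format::kLINEAR || !typeOk)
    {
        return false;
    }
    // The output must use the same data type as the input.
    return pos == 0 || desc.type == inOut[0].type;
}
```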

If the IPluginV3OneBuildV2 build capability is used, the plugin can also communicate to TensorRT that certain input-output pairs are aliased (share the same data buffer). TensorRT will query IPluginV3OneBuildV2::getAliasedInput to determine any such aliasing behavior. To use this feature, PreviewFeature::kALIASED_PLUGIN_IO_10_03 must be enabled.

Plugin layers can support the following data formats:

  • LINEAR single-precision (FP32), half-precision (FP16), brain floating-point (BF16), 8-bit floating-point E4M3 (FP8), integer (INT8), and integer (INT32) tensors

  • CHW32 single-precision (FP32) and integer (INT8) tensors.

  • CHW2, HWC8, HWC16, and DHWC8 half-precision (FP16) tensors.

  • CHW4 half-precision (FP16), and integer (INT8) tensors.

  • HWC8, HWC4, NDHWC8, NC2HW brain floating-point (BF16) tensors.

PluginFormat enumerates these formats.

Plugins that do not compute all data in place and need memory space in addition to input and output tensors can specify the additional memory requirements with the IPluginV3OneBuild::getWorkspaceSize method, which the builder calls to determine and preallocate scratch space.
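A workspace-size computation might look like the sketch below. MaxShape is a hypothetical stand-in for the maximum dimensions that a descriptor carries, and the 256-byte alignment and the two scratch buffers are assumptions for illustration, not values required by TensorRT.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the max-shape part of a tensor descriptor.
struct MaxShape { int32_t nbDims; int64_t d[8]; };

// Round `bytes` up to a multiple of `alignment` (which must be a power of two).
constexpr std::size_t alignUp(std::size_t bytes, std::size_t alignment)
{
    return (bytes + alignment - 1) & ~(alignment - 1);
}

// Sketch of a getWorkspaceSize-style computation: one float scratch buffer
// and one int32 index buffer, both sized for the largest input shape.
std::size_t workspaceSize(MaxShape const& maxInput)
{
    std::size_t volume = 1;
    for (int32_t i = 0; i < maxInput.nbDims; ++i)
    {
        volume *= static_cast<std::size_t>(maxInput.d[i]);
    }
    constexpr std::size_t kAlign = 256; // assumed alignment for each sub-buffer
    return alignUp(volume * sizeof(float), kAlign) + alignUp(volume * sizeof(int32_t), kAlign);
}
```

Sizing from the maximum shape keeps the preallocated scratch space large enough for every shape the plugin can see at inference time.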

The layer is configured, executed, and destroyed at build time to discover optimal configurations. After selecting the optimal configuration for a plugin, the chosen tactic and concrete shape/format information (except for data-dependent dimensions) are communicated to the plugin during inference. It is executed as many times as needed for the lifetime of the inference application and finally destroyed when the engine is destroyed.

The builder and the runtime drive these steps through the following plugin methods. Methods that are also called during inference are marked with an asterisk (*); all others are called only by the builder.

  • IPluginV3OneRuntime::attachToContext*: This function requests that a plugin clone be attached to an ExecutionContext, allowing the plugin to access any context-specific resources.

  • IPluginV3OneBuild::getTimingCacheID: This function queries for a timing cache ID that TensorRT can use. If provided, it enables timing caching for the plugin (it is disabled by default).

  • IPluginV3OneBuild::getNbTactics: Used to query for the number of custom tactics the plugin chooses to use.

  • IPluginV3OneBuild::getValidTactics: This function queries for any custom tactics the plugin can use. The plugin will be profiled for each tactic up to a maximum indicated by IPluginV3OneBuild::getFormatCombinationLimit().

  • IPluginV3OneBuild::getFormatCombinationLimit: This function queries the maximum number of format combinations that can be timed for each tactic (or for the default tactic, 0, if no custom tactics are advertised).

  • IPluginV3OneRuntime::setTactic*: Communicates the tactic to be used during the subsequent enqueue(). If no custom tactics were advertised, this would always be 0.

  • IPluginV3OneBuild::configurePlugin: Communicates the number of inputs and outputs and their shapes, data types, and formats. The min, opt, and max of each input or output’s DynamicPluginTensorDesc correspond to the kMIN, kOPT, and kMAX values of the optimization profile that the plugin is currently profiled for. The desc.dims field corresponds to the dimensions of plugin inputs specified at network creation. Wildcard dimensions can exist during this phase in the desc.dims field. At this point, the plugin can set up its internal state and select the most appropriate algorithm and data structures for the given configuration.

  • IPluginV3OneRuntime::onShapeChange*: Communicates the number of inputs and outputs and their shapes, data types, and formats. The dimensions are concrete, except if data-dependent dimensions exist, which wildcards will indicate.

  • IPluginV3OneRuntime::enqueue*: Encapsulates the actual algorithm and kernel calls of the plugin and provides pointers to input, output, and scratch space, as well as the CUDA stream to be used for kernel execution.

  • IPluginV3::clone: This is called every time a new builder, network, or engine is created that includes this plugin layer. It must return a new plugin object with the correct parameters.
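The tactic-related calls in the list above can be sketched as follows. TacticSketch, kTILED, kNAIVE, and currentTactic are hypothetical names, and the class does not derive from any TensorRT interface. The sketch uses non-zero tactic values so that 0 stays free for the case where no custom tactics are advertised.

```cpp
#include <cstdint>

// Hypothetical plugin fragment that ships two kernel variants and advertises
// them as custom tactics.
class TacticSketch
{
public:
    int32_t getNbTactics() const noexcept { return 2; }

    // Fill the caller-provided array; return 0 on success, non-zero on error.
    int32_t getValidTactics(int32_t* tactics, int32_t nbTactics) const noexcept
    {
        if (tactics == nullptr || nbTactics != getNbTactics())
        {
            return -1;
        }
        tactics[0] = kTILED;
        tactics[1] = kNAIVE;
        return 0;
    }

    // Remember the tactic chosen for the subsequent enqueue.
    int32_t setTactic(int32_t tactic) noexcept
    {
        if (tactic != kTILED && tactic != kNAIVE)
        {
            return -1;
        }
        mTactic = tactic;
        return 0;
    }

    int32_t currentTactic() const noexcept { return mTactic; }

private:
    static constexpr int32_t kTILED = 1;
    static constexpr int32_t kNAIVE = 2;
    int32_t mTactic{kTILED};
};
```

An enqueue implementation would then branch on the stored tactic to launch the matching kernel variant.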

After the builder completes profiling, before the engine is serialized, IPluginV3OneRuntime::getFieldsToSerialize is called to query for any plugin fields that must be serialized into the engine. These are expected to be data that the plugin needs to function properly during the inference stage after the engine has been deserialized.

IPluginCreatorV3One API Description#

The following methods in the IPluginCreatorV3One class are used to find and create the appropriate plugin from the plugin registry:

  • getPluginName: This returns the plugin name and should match the return value of IPluginV3OneCore::getPluginName.

  • getPluginVersion: Returns the plugin version. For all internal TensorRT plugins, this defaults to 1.

  • getPluginNamespace: Returns the plugin namespace. The default can be "".

  • getFieldNames: To successfully create a plugin, you must know all the plugin’s field parameters. This method returns the PluginFieldCollection struct with the PluginField entries populated to reflect the field name and PluginFieldType (the data should point to nullptr).

  • createPlugin: This method creates a plugin, passing a PluginFieldCollection and a TensorRTPhase argument.

During engine deserialization, TensorRT calls this method with the TensorRTPhase argument set to TensorRTPhase::kRUNTIME and the PluginFieldCollection populated with the same PluginFields as in the one returned by IPluginV3OneRuntime::getFieldsToSerialize(). In this case, TensorRT takes ownership of plugin objects returned by createPlugin.

You can also invoke createPlugin to produce plugin objects to add to a TensorRT network. In this case, setting the phase argument to TensorRTPhase::kBUILD is recommended. The data passed with the PluginFieldCollection must be allocated by the caller and freed by the caller before the program exits. Ownership of the plugin object returned by createPlugin passes to the caller, who must destroy it.
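A createPlugin-style factory could be structured like the sketch below. Field, FieldCollection, FieldType, ScalePlugin, and the field names "axis" and "scales" are all hypothetical stand-ins, and the function returns a std::unique_ptr for safety, whereas a real creator returns a raw plugin pointer. The point is the pattern: look up fields by name, validate type and length, copy the data into plugin-owned storage, and return null on any failed check.

```cpp
#include <cstdint>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical stand-ins mirroring the shape of PluginField / PluginFieldCollection.
enum class FieldType : int32_t { kINT32, kFLOAT32, kUNKNOWN };
struct Field { char const* name; void const* data; FieldType type; int32_t length; };
struct FieldCollection { int32_t nbFields; Field const* fields; };

struct ScalePlugin
{
    int32_t axis{0};
    std::vector<float> scales; // plugin-owned copy of the field data
};

std::unique_ptr<ScalePlugin> createFromFields(FieldCollection const* fc) noexcept
{
    if (fc == nullptr || (fc->nbFields > 0 && fc->fields == nullptr))
    {
        return nullptr;
    }
    try
    {
        auto plugin = std::make_unique<ScalePlugin>();
        bool sawAxis = false;
        for (int32_t i = 0; i < fc->nbFields; ++i)
        {
            Field const& f = fc->fields[i];
            if (f.name == nullptr || f.data == nullptr)
            {
                return nullptr;
            }
            if (std::strcmp(f.name, "axis") == 0 && f.type == FieldType::kINT32 && f.length == 1)
            {
                plugin->axis = *static_cast<int32_t const*>(f.data);
                sawAxis = true;
            }
            else if (std::strcmp(f.name, "scales") == 0 && f.type == FieldType::kFLOAT32 && f.length >= 0)
            {
                auto const* p = static_cast<float const*>(f.data);
                plugin->scales.assign(p, p + f.length);
            }
        }
        if (!sawAxis)
        {
            return nullptr; // required attribute missing
        }
        return plugin;
    }
    catch (...)
    {
        return nullptr; // never let an exception escape the factory
    }
}
```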

Migrating V2 Plugins to IPluginV3#

IPluginV2 and IPluginV2Ext have been deprecated since TensorRT 8.5, and IPluginV2IOExt and IPluginV2DynamicExt are deprecated in TensorRT 10.0. Therefore, new plugins should target IPluginV3, and old ones should be refactored.

Key migration points from IPluginV2DynamicExt to IPluginV3

Keep in mind the following key points when migrating an IPluginV2DynamicExt plugin to IPluginV3:

  • The plugin creator associated with the plugin must be migrated to IPluginCreatorV3One, the factory class for IPluginV3 (IPluginCreator is the factory class for IPluginV2 derivatives). This simply consists of migrating IPluginCreator::deserializePlugin. For more information, refer to the Plugin Serialization and Deserialization section.

  • There is no equivalent to IPluginV2::initialize(), IPluginV2::terminate(), and IPluginV2::destroy() in IPluginV3. For more information, refer to the Plugin Initialization and Termination section.

  • There is no equivalent to IPluginV2Ext::detachFromContext() in IPluginV3. For more information, refer to the Accessing Context-Specific Resources Provided by TensorRT section.

  • IPluginV3OneRuntime::attachToContext() is markedly different from IPluginV2Ext::attachToContext() regarding arguments and behavior. For more information, refer to the Accessing Context-Specific Resources Provided by TensorRT section.

  • In IPluginV3, plugin serialization is through a PluginFieldCollection that gets passed to TensorRT by IPluginV3OneRuntime::getFieldsToSerialize() and deserialization is through the same PluginFieldCollection that gets passed back by TensorRT to IPluginCreatorV3One::createPlugin(...). For more information, refer to the Plugin Serialization and Deserialization section.

  • The IPluginV3 equivalents of void return methods in IPluginV2DynamicExt will expect an integer status code as a return value (such as configurePlugin).

  • supportsFormatCombination and getWorkspaceSize get dynamic tensor descriptors (DynamicPluginTensorDesc) instead of static descriptors (PluginTensorDesc).

  • IPluginV2DynamicExt::getOutputDimensions() becomes IPluginV3OneBuild::getOutputShapes() and changes to an output parameter signature instead of a return value. It also shifts from per-output index querying to one-shot querying. A similar transition applies from IPluginV2Ext::getOutputDataType to IPluginV3OneBuild::getOutputDataTypes.

Plugin Initialization and Termination

IPluginV2 provided several APIs for plugin initialization and termination: namely, IPluginV2::initialize(), IPluginV2::terminate(), and IPluginV2::destroy(). In IPluginV3, plugins are expected to be constructed in an initialized state; if your V2 plugin had any lazy initialization in initialize, it can be deferred to onShapeChange or configurePlugin. Any resource release or termination logic in IPluginV2::terminate() or IPluginV2::destroy() can be moved to the class destructor. The exception is in the Python API; IPluginV3.destroy() is provided as an alternative for a C++-like destructor.
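The lifecycle shift can be illustrated with a small sketch. LifecycleSketch and its members are hypothetical, the class is not derived from any TensorRT interface, and host memory stands in for whatever resource a V2 plugin acquired in initialize() and released in terminate().

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>

class LifecycleSketch
{
public:
    // Constructed in an initialized state: there is no separate initialize().
    explicit LifecycleSketch(int32_t channels) : mChannels(channels) {}

    // Lazy, shape-dependent setup that used to live in initialize().
    // Returns 0 on success, non-zero on failure.
    int32_t onShapeChange(int32_t batch) noexcept
    {
        if (batch <= 0 || mChannels <= 0)
        {
            return -1;
        }
        std::size_t const needed = static_cast<std::size_t>(batch) * static_cast<std::size_t>(mChannels);
        if (needed > mCapacity)
        {
            // Grow only when the new shape needs more room than is already held.
            mScratch.reset(new (std::nothrow) float[needed]);
            mCapacity = mScratch ? needed : 0;
        }
        return mScratch ? 0 : -1;
    }

    std::size_t capacity() const noexcept { return mCapacity; }

    // No terminate() or destroy(): the implicit destructor releases mScratch.

private:
    int32_t mChannels;
    std::unique_ptr<float[]> mScratch;
    std::size_t mCapacity{0};
};
```

Holding the resource in an owning member means the teardown logic that previously lived in terminate() or destroy() runs automatically when the plugin object is destroyed.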

Accessing Context-Specific Resources Provided by TensorRT

IPluginV2Ext::attachToContext() provided plugins access to context-specific resources, namely the GPU allocator and cuDNN and cuBLAS handles. IPluginV3OneRuntime::attachToContext() is meant to provide a similar service to plugins, but it instead provides an IPluginResourceContext, which in turn exposes resources that plugins can request.

In a departure from IPluginV2Ext::attachToContext(), cuDNN and cuBLAS handles are no longer provided by IPluginResourceContext; any plugins that depended on those should migrate to initialize their own cuDNN and cuBLAS resources. If sharing cuDNN/cuBLAS resources among plugins is preferred, you can utilize the functionality provided by IPluginResource and the plugin registry’s key-value store to accomplish this. For more information, refer to the Sharing Custom Resources Among Plugins section.

IPluginV3OneRuntime::attachToContext(...) is a clone-and-attach operation. It is asked to clone the entire IPluginV3 object, not just the runtime capability. Therefore, if the runtime capability is implemented as a separate class, it may need to hold a reference to the IPluginV3 object of which it is a part.

Any context-specific resource obtained through IPluginResourceContext can be used until the plugin is destroyed. Therefore, any termination logic implemented in IPluginV2Ext::detachFromContext() can be moved to the plugin destructor.

Plugin Serialization and Deserialization

For V2 plugins, serialization and deserialization were determined by the implementation of IPluginV2::serialize, IPluginV2::getSerializationSize, and IPluginCreator::deserializePlugin; IPluginV3OneRuntime::getFieldsToSerialize and IPluginCreatorV3One::createPlugin have replaced these. Note that the workflow has shifted from writing to/reading from a raw buffer to constructing and parsing a PluginFieldCollection.

TensorRT handles the serialization of types defined in PluginFieldType. Custom types can be serialized as PluginFieldType::kUNKNOWN. For example:

struct DummyStruct
{
    int32_t a;
    float b;
};

DummyPlugin::DummyPlugin()
{
    // Members assumed to be declared in the class:
    // std::vector<nvinfer1::PluginField> mDataToSerialize;
    // nvinfer1::PluginFieldCollection mFCToSerialize;
    // int32_t mIntValue;
    // std::vector<float> mFloatVector;
    // DummyStruct mDummyStruct;
    mDataToSerialize.clear();
    mDataToSerialize.emplace_back(PluginField("intScalar", &mIntValue, PluginFieldType::kINT32, 1));
    mDataToSerialize.emplace_back(PluginField("floatVector", mFloatVector.data(), PluginFieldType::kFLOAT32, static_cast<int32_t>(mFloatVector.size())));
    // For kUNKNOWN, the length is the size of the data in bytes.
    mDataToSerialize.emplace_back(PluginField("dummyStruct", &mDummyStruct, PluginFieldType::kUNKNOWN, sizeof(DummyStruct)));
    mFCToSerialize.nbFields = static_cast<int32_t>(mDataToSerialize.size());
    mFCToSerialize.fields = mDataToSerialize.data();
}

// The override specifier belongs on the in-class declaration, not on this out-of-class definition.
nvinfer1::PluginFieldCollection const* DummyPlugin::getFieldsToSerialize() noexcept
{
    return &mFCToSerialize;
}

Migrating Older V2 Plugins to IPluginV3

If migrating from IPluginV2 or IPluginV2Ext to IPluginV3, it is easier to migrate first to IPluginV2DynamicExt and then follow the guidelines above to migrate to IPluginV3. The new features in IPluginV2DynamicExt are as follows:

virtual DimsExprs getOutputDimensions(int outputIndex, const DimsExprs* inputs, int nbInputs, IExprBuilder& exprBuilder) = 0;

virtual bool supportsFormatCombination(int pos, const PluginTensorDesc* inOut, int nbInputs, int nbOutputs) = 0;

virtual void configurePlugin(const DynamicPluginTensorDesc* in, int nbInputs, const DynamicPluginTensorDesc* out, int nbOutputs) = 0;

virtual size_t getWorkspaceSize(const PluginTensorDesc* inputs, int nbInputs, const PluginTensorDesc* outputs, int nbOutputs) const = 0;

virtual int enqueue(const PluginTensorDesc* inputDesc, const PluginTensorDesc* outputDesc, const void* const* inputs, void* const* outputs, void* workspace, cudaStream_t stream) = 0;

Guidelines for migration to IPluginV2DynamicExt are:

  • getOutputDimensions implements the expression for output tensor dimensions given the inputs.

  • supportsFormatCombination checks if the plugin supports the format and datatype for the specified I/O.

  • configurePlugin mimics the behavior of equivalent configurePlugin in IPluginV2Ext but accepts tensor descriptors.

  • getWorkspaceSize and enqueue mimic the behavior of equivalent APIs in IPluginV2Ext but accept tensor descriptors.

Side-by-Side V2 ↔ V3 API Mapping#

The following tables map IPluginV2DynamicExt and IPluginCreator methods to their IPluginV3 / IPluginCreatorV3One equivalents, grouped by lifecycle phase. Use this as the at-a-glance reference when porting a plugin; the conceptual sections above describe the why for each change.

Core / Identity

| IPluginV2* method | IPluginV3* equivalent | Notes |
| --- | --- | --- |
| IPluginV2::getPluginType() | IPluginV3OneCore::getPluginName() | Renamed; same semantics. |
| IPluginV2::getPluginVersion() | IPluginV3OneCore::getPluginVersion() | Unchanged. |
| IPluginV2::getPluginNamespace() | IPluginV3OneCore::getPluginNamespace() | Unchanged. |
| N/A | IPluginV3::getCapabilityInterface(PluginCapabilityType) | New. Required dispatch entry point for core/build/runtime capabilities. |
| N/A | IPluginV3OneBuild::getMetadataString() | New (optional). Used by engine inspector and logs. |

Build phase

| IPluginV2DynamicExt method | IPluginV3OneBuild equivalent | Notes |
| --- | --- | --- |
| getNbOutputs() | getNbOutputs() | Unchanged. |
| getOutputDimensions(int, DimsExprs const*, int, IExprBuilder&) | getOutputShapes(DimsExprs const*, int, DimsExprs const*, int, DimsExprs*, int, IExprBuilder&) | Per-index → one-shot; output via parameter, returns int32_t status. Adds shape inputs and supports data-dependent shapes via IExprBuilder::declareSizeTensor. |
| IPluginV2Ext::getOutputDataType(int, DataType const*, int) | getOutputDataTypes(DataType*, int, DataType const*, int) | Per-index → one-shot; returns int32_t status. |
| supportsFormatCombination(int, PluginTensorDesc const*, int, int) | supportsFormatCombination(int, DynamicPluginTensorDesc const*, int, int) | Receives DynamicPluginTensorDesc (includes min/opt/max). |
| configurePlugin(DynamicPluginTensorDesc const*, int, DynamicPluginTensorDesc const*, int) | configurePlugin(DynamicPluginTensorDesc const*, int, DynamicPluginTensorDesc const*, int) | Parameter list unchanged; now returns int32_t status instead of void. |
| getWorkspaceSize(PluginTensorDesc const*, int, PluginTensorDesc const*, int) const | getWorkspaceSize(DynamicPluginTensorDesc const*, int, DynamicPluginTensorDesc const*, int) const | Switched to dynamic descriptors. |
| N/A | getNbTactics(), getValidTactics(int32_t*, int32_t), getFormatCombinationLimit() | New (optional). Enable custom tactic profiling. |
| N/A | getTimingCacheID(...) | New (optional). Enables timing-cache reuse for the plugin. |
| N/A | IPluginV3OneBuildV2::getAliasedInput(int) | New (optional). Requires PreviewFeature::kALIASED_PLUGIN_IO_10_03. |

Runtime phase

| IPluginV2DynamicExt method | IPluginV3OneRuntime equivalent | Notes |
| --- | --- | --- |
| enqueue(PluginTensorDesc const*, PluginTensorDesc const*, void const* const*, void* const*, void*, cudaStream_t) | enqueue(PluginTensorDesc const*, PluginTensorDesc const*, void const* const*, void* const*, void*, cudaStream_t) | Signature unchanged. |
| N/A | onShapeChange(PluginTensorDesc const*, int, PluginTensorDesc const*, int) | New. Called when concrete shapes change between enqueue invocations. |
| N/A | setTactic(int32_t) | New. Communicates the chosen tactic before enqueue. |
| IPluginV2Ext::attachToContext(cudnnContext*, cublasContext*, IGpuAllocator*) | attachToContext(IPluginResourceContext*) | Now a clone-and-attach operation that returns a new IPluginV3*. cuDNN/cuBLAS handles are no longer provided. |
| IPluginV2Ext::detachFromContext() | Removed | Move teardown logic to the destructor. |

Serialization and lifetime

| IPluginV2* method | IPluginV3* equivalent | Notes |
| --- | --- | --- |
| IPluginV2::getSerializationSize() const + IPluginV2::serialize(void*) const | IPluginV3OneRuntime::getFieldsToSerialize() | Raw byte buffer → structured PluginFieldCollection. |
| IPluginCreator::deserializePlugin(char const*, void const*, size_t) | IPluginCreatorV3One::createPlugin(char const*, PluginFieldCollection const*, TensorRTPhase) | Unified create/deserialize; phase == kRUNTIME indicates deserialization. |
| IPluginV2::clone() const | IPluginV3::clone() | Non-const; returns IPluginV3*. |
| IPluginV2::initialize() | Removed | Plugin must be constructed in an initialized state. Defer lazy init to configurePlugin or onShapeChange. |
| IPluginV2::terminate(), IPluginV2::destroy() | Removed | Move teardown logic to the destructor. |

Network attachment

| 10.x | 11.x | Notes |
| --- | --- | --- |
| INetworkDefinition::addPluginV2(ITensor* const*, int, IPluginV2&) | INetworkDefinition::addPluginV3(ITensor* const*, int, ITensor* const*, int, IPluginV3&) | Adds a separate shape-inputs argument list. |
| IPluginV2Layer | IPluginV3Layer | 1:1 layer-class replacement. |

Known Migration Issues#

Use strongly-typed networks with IPluginV3

Mixing IPluginV3 plugins with weakly-typed networks (those still relying on BuilderFlag::kFP16 / kBF16 style precision selection in 10.x) can hit fusion paths that were not exercised by IPluginV2DynamicExt. The supported and recommended configuration in 11.x is to build with strongly-typed networks. Refer to Migrating from Weak Typing to Strong Typing for the conversion steps.

In TensorRT 11.0 all precision-enabling builder flags have been removed from non-DLA builds, so any plugin migrated as part of a 10.x → 11.x upgrade is automatically running in a strongly-typed network. Authors back-porting V3 plugins to a 10.x build for evaluation should explicitly opt into a strongly-typed network with createNetworkV2(NetworkDefinitionCreationFlag::kSTRONGLY_TYPED).

attachToContext is now clone-and-attach

IPluginV2Ext::attachToContext mutated the existing plugin instance. IPluginV3OneRuntime::attachToContext instead clones the entire IPluginV3 object and returns the new instance. Plugins that store a back-pointer from a runtime-capability sub-object to the owning IPluginV3 must update that back-pointer in the cloned instance, otherwise the cloned runtime capability will dispatch back into the original plugin and produce stale or freed-memory accesses.
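The back-pointer fix can be sketched in isolation. OwnerSketch and RuntimeSketch are hypothetical classes that stand in for the IPluginV3-derived plugin and a separately implemented runtime capability, and the attachToContext shown here takes no arguments and returns a std::unique_ptr purely to keep the example self-contained.

```cpp
#include <memory>

class OwnerSketch; // stands in for the class deriving from IPluginV3

// Hypothetical runtime capability implemented as a separate sub-object that
// keeps a back-pointer to the plugin that owns it.
struct RuntimeSketch
{
    explicit RuntimeSketch(OwnerSketch* owner) : mOwner(owner) {}
    OwnerSketch* mOwner;
};

class OwnerSketch
{
public:
    explicit OwnerSketch(int attr) : mAttr(attr), mRuntime(this) {}

    // Copying must not copy the back-pointer: the new sub-object points at
    // the new owner, not at `other`.
    OwnerSketch(OwnerSketch const& other) : mAttr(other.mAttr), mRuntime(this) {}
    OwnerSketch& operator=(OwnerSketch const&) = delete;

    // Clone-and-attach: return a whole new plugin object.
    std::unique_ptr<OwnerSketch> attachToContext() const
    {
        return std::make_unique<OwnerSketch>(*this);
    }

    RuntimeSketch const& runtime() const { return mRuntime; }
    int attr() const { return mAttr; }

private:
    int mAttr;
    RuntimeSketch mRuntime;
};
```

A compiler-generated copy constructor would copy mOwner verbatim, which is exactly the stale back-pointer described above; writing the copy constructor by hand avoids it.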

PluginFieldCollection lifetime during deserialization

In V2, deserializePlugin received a raw byte buffer that the plugin was free to consume immediately. In V3, IPluginCreatorV3One::createPlugin receives a PluginFieldCollection whose underlying buffers are owned by TensorRT and only valid for the duration of the call. Plugins must copy any data they need to retain (vectors, structs, weight blobs) into plugin-owned storage before returning from createPlugin. Holding pointers into the incoming PluginField::data past the call is a use-after-free.

Aliased I/O migration

Plugins that relied on overlapping input/output buffers in V2 should migrate to IPluginV3OneBuildV2 and implement getAliasedInput to declare the aliasing explicitly, then enable PreviewFeature::kALIASED_PLUGIN_IO_10_03 on the builder config. Refer to the IPluginV3 API Description section for details.

Performance: Resolving V2 → V3 Regressions#

A plugin that was performance-tuned against IPluginV2DynamicExt may regress when first ported to IPluginV3 because the V3 lifecycle exposes new opportunities (and a few new costs) that the V2 path did not. The following checklist resolves the most common regressions.

  1. Hoist allocations out of enqueue. IPluginV2 had explicit initialize() / terminate() hooks that authors often used as a one-time setup site. IPluginV3 removes these, so any setup that previously lived in initialize() should move to the constructor, configurePlugin, or onShapeChange, not into enqueue. Per-call allocations are a frequent source of measured regressions.

  2. Advertise tactics where multiple kernels exist. Implement getNbTactics and getValidTactics so the builder can profile each kernel variant your plugin ships and pick the fastest. V2 had no equivalent and forced the plugin to choose at build time.

  3. Enable timing-cache reuse. Implement getTimingCacheID so repeated builds of the same network reuse cached timings for the plugin. Without this, every build re-times every tactic.

  4. Use getWorkspaceSize instead of internal allocations. Request scratch space through getWorkspaceSize so TensorRT pools the allocation across the network. Internal cudaMalloc in enqueue defeats this pooling.

  5. Build with strongly-typed networks. Strong typing avoids autotuner fallback paths and exposes more fusion opportunities to the V3 plugin’s neighbors. See Known Migration Issues above.

  6. Avoid device allocations in clone. clone is called frequently during the build phase; defer device-side allocations to configurePlugin and release them in the destructor. Refer to Coding Guidelines for Plugins below.

  7. Profile build vs. runtime separately. Use trtexec --verbose --profilingVerbosity=detailed to confirm whether the regression is in the build phase (extra autotuning) or the inference phase (extra per-call work). They have different remediations.

Coding Guidelines for Plugins#

Memory Allocation

Memory allocated in the plugin must be freed to ensure no memory leak. If resources are acquired in the plugin constructor or at a later stage, like onShapeChange, they must be released, possibly in the plugin class destructor.

Another option is to request any additional workspace memory required through getWorkspaceSize, which will be available during enqueue.

Add Checks to Ensure Proper Configuration and Validate Inputs

A common source for unexpected plugin behavior is improper configuration (such as invalid plugin attributes) and invalid inputs. As such, it is good practice to add checks/assertions during the initial plugin development for cases where the plugin is not expected to work. The following are places where checks could be added:

  • createPlugin: Plugin attributes checks

  • configurePlugin or onShapeChange: Input dimension checks

  • enqueue: Input value checks

Return Null at Errors for Methods That Create a New Plugin Object

Methods like createPlugin, clone, and attachToContext can be expected to create and return new plugin objects. In these methods, ensure a null object (nullptr in C++) is returned in case of any error or failed check. This ensures that non-null plugin objects are not returned when configured incorrectly.

Avoid Device Memory Allocations in clone()

Because the builder calls clone multiple times, device memory allocations made there can be expensive. One option is to do persistent memory allocations in the constructor, copy to the device when the plugin is ready (such as in configurePlugin), and release the memory during destruction.

Serializing Arbitrary Pieces of Data and Custom Types

Plugin authors can utilize PluginField of PluginFieldType::kUNKNOWN to indicate arbitrary pieces of data to be serialized. In this case, the length of the respective PluginField should be the number of bytes corresponding to the buffer pointed to by data. The serialization of non-primitive types can be achieved in this way.
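The byte-length convention can be sketched for both directions. RawField, makeStructField, and readStructField are hypothetical names, and RawField is a stand-in for a PluginField of type kUNKNOWN. This approach also assumes that the struct layout is identical where the engine is built and where it is deserialized.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a PluginField of type kUNKNOWN.
struct RawField { char const* name; void const* data; int32_t length; };

struct DummyStruct
{
    int32_t a;
    float b;
};

// Serialization side: for kUNKNOWN data, length is the size in bytes.
// The field only points at `value`, so `value` must outlive the field.
RawField makeStructField(DummyStruct const& value)
{
    return RawField{"dummyStruct", &value, static_cast<int32_t>(sizeof(DummyStruct))};
}

// Deserialization side: verify the byte count, then copy into plugin-owned storage.
bool readStructField(RawField const& field, DummyStruct& out) noexcept
{
    if (field.data == nullptr || field.length != static_cast<int32_t>(sizeof(DummyStruct)))
    {
        return false;
    }
    std::memcpy(&out, field.data, sizeof(DummyStruct));
    return true;
}
```

Checking the length before copying guards against reading an engine that was serialized with a different version of the struct.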

Plugin Shared Libraries#

TensorRT contains built-in plugins that can be loaded statically into your application.

You can explicitly register custom plugins with TensorRT using the REGISTER_TENSORRT_PLUGIN and registerCreator interfaces (refer to Adding Custom Layers). However, you may want TensorRT to manage the registration of a plugin library and, in particular, serialize plugin libraries with the plan file so they are automatically loaded when the engine is created. This can be especially useful when you want to include the plugins in a version-compatible engine so that you do not need to manage them after building the engine. To take advantage of this, you can build shared libraries with specific entry points recognized by TensorRT.

Generating Plugin Shared Libraries#

To create a shared library for plugins, the library must have the following public symbols defined:

extern "C" void setLoggerFinder(ILoggerFinder* finder);
extern "C" IPluginCreatorInterface* const* getCreators(int32_t& nbCreators);

extern "C" above is only used to prevent name mangling, and the methods should be implemented in C++. Consult your compiler’s ABI documentation for more details.

setLoggerFinder() should set a global ILoggerFinder pointer in the library so that plugin code can log. getCreators() returns a list of the plugin creators your library contains. An example of these entry points can be found in plugin/common/vfcCommon.h/cpp.
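The shape of these two entry points can be sketched with stand-in types. LoggerFinder, CreatorInterface, and MyCreator are hypothetical placeholders for the corresponding TensorRT types, and the decision to keep only the first non-null finder is an assumption made for this example.

```cpp
#include <cstdint>

// Hypothetical stand-ins; a real library would use the nvinfer1 types.
struct LoggerFinder {};
struct CreatorInterface { virtual ~CreatorInterface() = default; };
struct MyCreator : CreatorInterface {};

namespace
{
LoggerFinder* gLoggerFinder = nullptr; // library-wide logger finder
MyCreator gMyCreator;                  // one statically allocated creator
CreatorInterface* const gCreators[] = {&gMyCreator};
} // namespace

extern "C" void setLoggerFinder(LoggerFinder* finder)
{
    // Keep the first finder that is provided; ignore null.
    if (gLoggerFinder == nullptr && finder != nullptr)
    {
        gLoggerFinder = finder;
    }
}

extern "C" CreatorInterface* const* getCreators(int32_t& nbCreators)
{
    // Report how many creators the library contains and return the list.
    nbCreators = static_cast<int32_t>(sizeof(gCreators) / sizeof(gCreators[0]));
    return gCreators;
}
```

Statically allocated creators keep the returned pointers valid for as long as the library stays loaded.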

To serialize your plugin libraries with your engine plan, provide the plugin library paths to TensorRT using IBuilderConfig::setPluginsToSerialize().

You can also package plugins in the plan when building version-compatible engines. The packaged plugins will have the same lifetime as the engine and will be automatically registered/deregistered when running the engine.

Using Plugin Shared Libraries#

After building your shared libraries, you can configure the builder to serialize them with the engine. Next time you load the engine into TensorRT, the serialized plugin libraries will be loaded and registered automatically.

Note

IPluginRegistry::loadLibrary() (available in both the C++ and Python APIs) now supports plugin shared libraries containing both V2 and V3 plugin creators through the getCreators() entry point. The getPluginCreators() entry point is still valid but is deprecated. TensorRT first checks whether the getCreators() symbol is available and, if not, falls back to getPluginCreators() for backward compatibility. You can also query the entry point yourself to enumerate each plugin creator and register it manually using IPluginRegistry::registerCreator() (C++ and Python).

Load the plugins for use with the builder before building the engine:

// C++
for (size_t i = 0; i < nbPluginLibs; ++i)
{
    builder->getPluginRegistry().loadLibrary(pluginLibs[i]);
}

# Python
for plugin_lib in plugin_libs:
    builder.get_plugin_registry().load_library(plugin_lib)

Next, decide if the plugins should be included with the engine or shipped externally. You can serialize the plugins with the plan as follows:

// C++
IBuilderConfig *config = builder->createBuilderConfig();
...
config->setPluginsToSerialize(pluginLibs, nbPluginLibs);

# Python
config = builder.create_builder_config()
...
config.plugins_to_serialize = plugin_libs

Alternatively, you can keep the plugins external to the engine. You will need to ship these libraries along with the engine when it is deployed and load them explicitly in the runtime before deserializing the engine:

// C++
// In this example, getExternalPluginLibs() is a user-implemented method that retrieves the list of libraries to use with the engine
std::vector<std::string> pluginLibs = getExternalPluginLibs();
for (auto const &pluginLib : pluginLibs)
{
    runtime->getPluginRegistry().loadLibrary(pluginLib.c_str());
}

# Python
# In this example, get_external_plugin_libs() is a user-implemented method that retrieves the list of libraries to use with the engine
plugin_libs = get_external_plugin_libs()
for plugin_lib in plugin_libs:
    runtime.get_plugin_registry().load_library(plugin_lib)