Adding Custom Layers using the Python API (TensorRT >= 10.6)#
For most use cases, the decorator-based approach to defining Python plugins is recommended (available starting in TensorRT 10.6). Refer to Writing Custom Operators with TensorRT Python Plugins in the TensorRT Python API documentation for a guide describing different use cases and best practices. Note that embedding Python-defined plugins into TensorRT engines, such that the engine is independent of both Python and the plugin source itself, is only possible with this approach.
Adding Custom Layers using the Python API (Advanced/TensorRT <= 10.5)#
Plugins can also be implemented in Python with a class-based approach (this is the only supported approach for TensorRT <= 10.5). In contrast to decorator-based Python plugins (described in the preceding section), class-based plugins offer the following:
- Statefulness: class-based plugins have state (such as configured/non-configured) and support more granular querying by TensorRT for different plugin properties and behaviors.
- Shape tensor input support.
- Fine-grained control over the plugin instances TensorRT creates during engine deserialization, which requires custom plugin creator definitions and is only available with the class-based approach.
- Manual serialization and deserialization of plugin attributes.
- The ability to pre-request a device memory scratch space (a workspace, in addition to the input/output buffers) to avoid execution-time device memory allocations.
These often come at the expense of increased implementation complexity and code bloat, which can lead to more bugs. Therefore, a tradeoff analysis is recommended before considering class-based plugin implementations in Python.
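As a rough illustration of the workspace pre-request mentioned above, a class-based plugin can compute its scratch requirement from the input shapes at build time rather than allocating at execution time. The following is a framework-free sketch; the helper name and the sizing policy are hypothetical, not a TensorRT API:

```python
import numpy as np

# Hypothetical sketch of a get_workspace_size-style computation: request
# enough device scratch for one float32 copy of the first input, so that
# enqueue never has to allocate device memory at execution time.
def workspace_bytes_for(input_shapes):
    elems = int(np.prod(input_shapes[0]))
    return elems * np.dtype(np.float32).itemsize

assert workspace_bytes_for([(2, 3, 4)]) == 96  # 24 elements x 4 bytes
```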
Implementing a class-based plugin in Python is similar to C++ in that implementation of IPluginV3 and IPluginCreatorV3One is necessary. Interface methods in Python have mostly similar APIs to their C++ counterparts, and most differences are minor and self-explanatory.
The following list includes a few selected changes. Subsequent subsections describe the differences involved in more detail.
The following plugin APIs have been omitted in favor of reading/writing to an appropriately named attribute.
| Class | Method | Replaced with Attribute |
|---|---|---|
| IPluginV3OneCore | getPluginName() | plugin_name |
| IPluginV3OneCore | getPluginVersion() | plugin_version |
| IPluginV3OneCore | getPluginNamespace() | plugin_namespace |
| IPluginV3OneBuild | getNbOutputs() | num_outputs |
| IPluginV3OneBuild | getTimingCacheID() | timing_cache_id |
| IPluginV3OneBuild | getMetadataString() | metadata_string |
| IPluginV3OneBuild | getFormatCombinationLimit() | format_combination_limit |
| IPluginCreatorV3One | getPluginName() | name |
| IPluginCreatorV3One | getPluginVersion() | plugin_version |
| IPluginCreatorV3One | getPluginNamespace() | plugin_namespace |
| IPluginCreatorV3One | getFieldNames() | field_names |
Some methods have default implementations; these can be left unimplemented, and the default behaviors outlined below will take effect:
```python
class trt.IPluginV3:
    def destroy(self):
        pass

class trt.IPluginV3OneBuild:
    def get_valid_tactics(self):
        return []

    def get_workspace_size(self, input_desc, output_desc):
        return 0
```
Methods that must return integer status codes in IPluginV3OneBuild and IPluginV3OneRuntime should raise exceptions in Python instead. For example, the C++ method

```cpp
int32_t configurePlugin(DynamicPluginTensorDesc const* in, int32_t nbInputs, DynamicPluginTensorDesc const* out, int32_t nbOutputs)
```

corresponds to the following Python API:

```python
configure_plugin(self: trt.IPluginV3OneBuild, in: List[trt.DynamicPluginTensorDesc], out: List[trt.DynamicPluginTensorDesc]) -> None
```

For example, you can raise a ValueError during enqueue if an input has an illegal value.

The Python API IPluginV3.destroy() has no direct equivalent in the C++ API. Python plugins are expected to perform, within the IPluginV3.destroy() method, any functionality that would be performed in an IPluginV3 C++ destructor.

For complete examples demonstrating Python plugins, refer to the python_plugin sample.
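To make the error-handling convention concrete, the following is a framework-free sketch. DummyDesc and the validation logic are hypothetical stand-ins for trt.DynamicPluginTensorDesc and a real plugin's checks; the point is only that where C++ would return a failure status code, the Python method raises:

```python
from typing import List, Tuple

# Hypothetical stand-in for trt.DynamicPluginTensorDesc.
class DummyDesc:
    def __init__(self, dims: Tuple[int, ...]):
        self.dims = dims

def configure_plugin(in_descs: List[DummyDesc], out_descs: List[DummyDesc]) -> None:
    # Where C++ would `return STATUS_FAILURE`, Python raises an exception.
    if len(in_descs) != 1:
        raise ValueError(f"expected exactly 1 input, got {len(in_descs)}")
    if any(d < 0 for d in in_descs[0].dims):
        raise ValueError("wildcard dims must be resolved before configure")

configure_plugin([DummyDesc((1, 8))], [DummyDesc((1, 8))])  # OK: no exception
```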
Registration of a Python Plugin#
Python plugins must be registered dynamically through the IPluginRegistry.register_creator() API. There is no analog of the REGISTER_TENSORRT_PLUGIN macro, which C++ plugins can use for static registration.
Building and Running TensorRT Engines Containing Python Plugins#
It is possible to build TensorRT engines that use Python-based plugins. However, running such engines outside of Python is currently not possible, because the plugin must be available in the scope in which the engine is deserialized. For example, you cannot use a tool like trtexec directly.
Implementing enqueue of a Python Plugin#
The API for IPluginV3OneRuntime::enqueue() in C++ and Python are as follows:
```cpp
int32_t enqueue(PluginTensorDesc const *inputDesc, PluginTensorDesc const *outputDesc, void const *const *inputs, void *const *outputs, void *workspace, cudaStream_t stream)
```

```python
enqueue(self: trt.IPluginV3OneRuntime, input_desc: List[trt.PluginTensorDesc], output_desc: List[trt.PluginTensorDesc], inputs: List[int], outputs: List[int], workspace: int, stream: int) -> None
```
Here, inputs, outputs, and workspace are passed in as intptr_t casts of the respective device pointers. Similarly, stream is an intptr_t cast of a pointer to the CUDA stream handle. Python offers flexibility in how these buffers are read and written, and the approach can be chosen to suit the particular use case. For example, with CUDA Python, this is quite simple, since cuda.cuLaunchKernel accepts ints representing the pointers, wrapped in NumPy arrays:
```python
d_input = np.array([inputs[0]], dtype=np.uint64)
d_output = np.array([outputs[0]], dtype=np.uint64)
stream_ptr = np.array([stream], dtype=np.uint64)

args = [d_input, d_output]
kernel_args = np.array([arg.ctypes.data for arg in args], dtype=np.uint64)

...

checkCudaErrors(cuda.cuLaunchKernel(_float_kernel,
                                    num_blocks, 1, 1,
                                    block_size, 1, 1,
                                    0,
                                    stream_ptr,
                                    kernel_args, 0))
```
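The same pointer-as-integer convention can be exercised on the host with plain NumPy and ctypes: given only the integer address of a buffer, a usable array view can be reconstructed over it. This is a standalone sketch requiring no TensorRT or GPU; the "device buffer" here is an ordinary host array:

```python
import ctypes

import numpy as np

# Host-side sketch of the pointer-as-int convention used by enqueue:
# TensorRT hands the plugin plain integers, and any framework that
# accepts raw addresses can consume them.
src = np.arange(4, dtype=np.float32)
addr = src.ctypes.data  # intptr_t-style integer address of the buffer

# Reconstruct an array view over the same memory from the bare integer:
buf = (ctypes.c_float * src.size).from_address(addr)
view = np.ctypeslib.as_array(buf)

view[0] = 9.0  # writes through to the original buffer
```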
Translating Device Buffers/CUDA Stream Pointers in enqueue to other Frameworks#
Constructing CuPy arrays on top of device buffers is possible using CuPy’s UnownedMemory class.
```python
def enqueue(self, input_desc, output_desc, inputs, outputs, workspace, stream):
    ...
    inp_dtype = trt.nptype(input_desc[0].type)

    inp_mem = cp.cuda.UnownedMemory(
        inputs[0], volume(input_desc[0].dims) * cp.dtype(inp_dtype).itemsize, self
    )
    out_mem = cp.cuda.UnownedMemory(
        outputs[0],
        volume(output_desc[0].dims) * cp.dtype(inp_dtype).itemsize,
        self,
    )

    inp_ptr = cp.cuda.MemoryPointer(inp_mem, 0)
    out_ptr = cp.cuda.MemoryPointer(out_mem, 0)

    inp = cp.ndarray((volume(input_desc[0].dims)), dtype=inp_dtype, memptr=inp_ptr)
    out = cp.ndarray((volume(output_desc[0].dims)), dtype=inp_dtype, memptr=out_ptr)
```
If needed, torch.as_tensor() can then be used to construct a Torch tensor on top of the CuPy array:

```python
inp_d = cp.ndarray(tuple(input_desc[0].dims), dtype=inp_dtype, memptr=inp_ptr)
inp_t = torch.as_tensor(inp_d, device='cuda')
```
Similarly, CuPy stream handles can be constructed from the passed-in stream pointer through CuPy’s ExternalStream class.
```python
cuda_stream = cp.cuda.ExternalStream(stream)
```
Automatic Downcasting#
The TensorRT Python bindings perform automatic downcasting for custom types written in Python that implement interfaces like IPluginCreatorV3One or IPluginResource. For example, consider the following method from IPluginRegistry:
```python
get_creator(self: trt.IPluginRegistry, name: string, version: string,
            namespace: string = "") -> trt.IPluginCreatorInterface
```
The return type is annotated as IPluginCreatorInterface. However, in practice, if you write a class MyPluginCreator implementing IPluginCreatorV3One (which in turn implements IPluginCreatorInterface), get_creator will return it automatically downcast to MyPluginCreator.
This extends to trt.IPluginRegistry.all_creators, which is a List[trt.IPluginCreatorInterface]. If you registered a plugin creator of type MyPluginCreator and another of type MyOtherPluginCreator, both plugin creators will be present in the list as those respective types.
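The effect of this downcasting can be illustrated in plain Python. The sketch below is hypothetical: the classes only mirror the TensorRT interface names, and a plain dict stands in for the plugin registry. The point is that code annotated to return the base interface still hands back the concrete subclass, with all of its custom members intact:

```python
# Pure-Python illustration of the downcasting behavior described above.
class IPluginCreatorInterface:
    pass

class IPluginCreatorV3One(IPluginCreatorInterface):
    pass

class MyPluginCreator(IPluginCreatorV3One):
    def custom_helper(self) -> str:
        return "still accessible"

# Plain dict standing in for trt.IPluginRegistry (hypothetical sketch).
registry = {("MY_PLUGIN", "1", ""): MyPluginCreator()}

# Declared to yield IPluginCreatorInterface, but the object keeps its type:
creator = registry[("MY_PLUGIN", "1", "")]
assert isinstance(creator, MyPluginCreator)
assert creator.custom_helper() == "still accessible"
```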
Example: Adding a Custom Layer to a TensorRT Network Using Python#
Using plugin nodes, custom layers can be added to any TensorRT network in Python. The Python API has a function called add_plugin_v3 that enables adding a plugin node to a network. The following example illustrates this. It creates a simple TensorRT network and adds a hypothetical plugin node by looking up the TensorRT plugin registry.
```python
import tensorrt as trt
import numpy as np

TRT_LOGGER = trt.Logger()
trt.init_libnvinfer_plugins(TRT_LOGGER, '')

def get_trt_plugin(plugin_name, plugin_version, plugin_namespace):
    plugin = None
    plugin_creator = trt.get_plugin_registry().get_creator(plugin_name, plugin_version, plugin_namespace)
    # TensorRT automatically downcasts the returned creator to its concrete type.
    # Inspect plugin_creator.interface_info to make sure.
    if plugin_creator is not None:
        epsilon_field = trt.PluginField("epsilon", np.array([0.00000001], dtype=np.float32), trt.PluginFieldType.FLOAT32)
        field_collection = trt.PluginFieldCollection([epsilon_field])
        plugin = plugin_creator.create_plugin(name=plugin_name, field_collection=field_collection, phase=trt.TensorRTPhase.BUILD)
    return plugin

def main():
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network()
    config = builder.create_builder_config()
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 2**20)

    input_layer = network.add_input(name="input_layer", dtype=trt.float32, shape=(1, 1))
    plugin = network.add_plugin_v3(inputs=[input_layer], shape_inputs=[], plugin=get_trt_plugin("MY_PLUGIN", "1", ""))
    plugin.get_output(0).name = "outputs"
    network.mark_output(plugin.get_output(0))
```