Migrating from TensorRT 4 to 5¶
TensorRT 5.0 includes an all-new Python API. The Python bindings have been entirely rewritten, with significant changes and improvements. This page highlights some of these changes and outlines the steps you can take to migrate your existing Python code to TensorRT 5.
It is possible to continue using the legacy API by importing the TensorRT legacy package. To do so, replace
import tensorrt as trt (for example) with
import tensorrt.legacy as trt in your scripts.
Eventually, legacy support will be dropped, so it is still advisable to migrate to the new API.
The new API removes the
lite submodules. Instead, all functionality is now included in the top-level tensorrt module.
Create and Destroy Functions¶
The new API no longer includes create and destroy functions. Constructors are used for object creation, and
destruction is instead handled by scoping (that is, when an object goes out of scope,
the garbage collector can free it). It is also possible to destroy objects manually with
del trt_object. Note that
del only decrements the reference count of the object;
it is still up to the garbage collector to free it. If you really want to free memory immediately, you can instead
invoke the object’s
The above changes apply to the following classes:
|Legacy API||New API|
It is generally advisable to scope objects using
with ... as ... clauses.
TensorRT classes implement
__exit__ in such a way that
instances are destroyed immediately. Therefore, unlike the
with ... as ... clauses free memory immediately when a TensorRT object goes out of scope.
For example, building an engine from an ONNX file using
with ... as ... might look something like this:
    import tensorrt as trt

    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
    ONNX_MODEL = "mnist.onnx"

    def build_engine():
        with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
            # Configure the builder here.
            builder.max_workspace_size = 2**30
            # Parse the model to create a network.
            with open(ONNX_MODEL, 'rb') as model:
                parser.parse(model.read())
            # Build and return the engine. Note that the builder, network and parser are destroyed when this function returns.
            return builder.build_cuda_engine(network)
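The difference between del and with ... as ... described above can be seen without TensorRT installed. The following is a minimal sketch using a hypothetical stand-in class (FakeResource and its release method are illustrative names, not part of the TensorRT API); it only assumes that the real classes release their resources in __exit__, as stated above.

```python
class FakeResource:
    """Hypothetical stand-in for a TensorRT object; not part of TensorRT."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.released = False

    def release(self):
        # Record the release exactly once.
        if not self.released:
            self.released = True
            self.log.append(self.name)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Release as soon as the with block ends.
        self.release()
        return False


log = []

# del removes one name; the object stays alive while another reference exists.
a = FakeResource("a", log)
alias = a
del a
released_after_del = alias.released

# Leaving the with block triggers __exit__, which releases immediately.
with FakeResource("b", log) as b:
    pass
released_after_with = b.released
```

In this sketch only the object managed by the with block has been released; the one targeted by del is still reachable through alias.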
Data types have been given shorthand aliases to better align with naming conventions established by Python libraries such as NumPy.
|Legacy API||New API|
Getters and Setters¶
Where possible, the new API exposes attributes rather than getters and setters. Importantly, getters that take an argument and setters that either take multiple arguments or have a return value have been preserved, because these do not correspond directly to a single object attribute.
In general, attribute and function names align with NumPy naming conventions.
shape is used in place of
dtype instead of
nbytes instead of
num instead of
nb and so on.
In most cases, it is enough to simply drop the
|Legacy API||New API|
In other cases, you can apply the aforementioned rules:
|Legacy API||New API|
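As a rough illustration of the pattern, the sketch below uses two hypothetical stand-in classes (TensorSketch and EngineSketch are not TensorRT classes). It assumes only what the examples on this page show: a plain getter such as get_dimensions() becomes the shape attribute, while a getter that takes an argument, such as get_binding_shape(index), remains a method.

```python
class TensorSketch:
    """Hypothetical stand-in for a network tensor; not a TensorRT class."""

    def __init__(self, dims):
        self._dims = tuple(dims)

    # Legacy style: an explicit getter with no arguments.
    def get_dimensions(self):
        return self._dims

    # New style: the same value exposed as a read-only attribute.
    @property
    def shape(self):
        return self._dims


class EngineSketch:
    """Hypothetical stand-in for an engine; not a TensorRT class."""

    def __init__(self, binding_shapes):
        self._binding_shapes = [tuple(s) for s in binding_shapes]

    # This getter takes an argument, so it cannot become an attribute.
    def get_binding_shape(self, index):
        return self._binding_shapes[index]


tensor = TensorSketch((1, 28, 28))
engine = EngineSketch([(1, 28, 28), (10, 1, 1)])

legacy_value = tensor.get_dimensions()
new_value = tensor.shape
output_shape = engine.get_binding_shape(1)
```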
tensorrt.Permutation Behave Like Iterables¶
Where previously you would have written code like this:
    # Legacy API
    dims = network.get_input(0).get_dimensions().to_DimsCHW()  # Need to explicitly convert to DimsCHW.
    arr = np.random.randint(5, size=(dims.C(), dims.H(), dims.W()))  # Cannot directly use Dims where iterables are expected.
    dims2 = engine.get_binding_dimensions(1).to_DimsCHW()
    # Compute volume. This will differ between different subclasses of Dims.
    elt_count = dims2.vol()  # This will not work for DimsHW or DimsNCHW, for example.
    # Or equivalently,
    elt_count = dims2.C() * dims2.H() * dims2.W()
It is now possible to write:
    # New API
    from functools import reduce  # reduce is not a builtin in Python 3.

    import numpy as np
    import tensorrt as trt

    dims = network.get_input(0).shape  # All Dims subclasses behave like iterables, so we don't care about the exact subclass.
    arr = np.random.randint(5, size=dims)  # Dims act exactly like any other iterable.
    dims2 = engine.get_binding_shape(1)
    # Compute volume. Works for any iterable, including lists and tuples.
    elt_count = trt.volume(dims2)
    # Or equivalently,
    elt_count = reduce(lambda x, y: x * y, dims2)
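To make the volume computation concrete, here is a small standalone sketch that runs without TensorRT. The volume function below is a local helper written for illustration; it mirrors the reduce-based expression above and is not the library's implementation of trt.volume.

```python
from functools import reduce
import operator


def volume(dims):
    """Return the product of all entries of an iterable of dimensions."""
    return reduce(operator.mul, dims, 1)


# Tuples, lists and generators are all handled the same way.
chw = (1, 28, 28)
nchw = [8, 3, 224, 224]

chw_count = volume(chw)
nchw_count = volume(nchw)
generator_count = volume(d for d in (2, 3))
```

Because the helper only iterates over its argument, the number of dimensions does not matter, which is the property the new API relies on.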
Previously, the tensorrt.Weights class would perform deep copies of any buffers used to create weights.
To better align with the C++ API, and for the sake of efficiency, the new bindings no longer create
these deep copies, but instead increment the reference count of the existing buffer.
Therefore, modifying the buffer used to create a
will also modify the
Note that the
tensorrt.ICudaEngine will still create its own copies of weights internally.
The above only applies to
tensorrt.Weights created before engine construction (when using the Network API, for example).
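The practical consequence of sharing a buffer rather than copying it can be shown with the standard library alone. This is an analogy, not TensorRT code: a memoryview stands in for an object that references an existing buffer, and a second array stands in for the old deep-copy behavior.

```python
from array import array

# The original buffer, for example weights loaded from a file.
buffer = array("f", [1.0, 2.0, 3.0])

# References the same memory; no data is copied.
shared = memoryview(buffer)

# Deep copy: owns its own memory.
copied = array("f", buffer)

# Modify the original buffer after both objects were created.
buffer[0] = 42.0

shared_first = shared[0]  # Sees the modification.
copied_first = copied[0]  # Still holds the original value.
```

If a buffer must be reused or modified after weights have been created from it, making an explicit copy first avoids this kind of aliasing.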
tensorrt.Weights Behave like NumPy Arrays¶
tensorrt.Weights class is now interchangeable with NumPy arrays.
More concretely, this means that NumPy arrays can be used directly wherever
tensorrt.Weights objects are expected - no explicit conversion required.
Furthermore, the API now returns NumPy arrays instead of Weights objects.
tensorrt.CaffeParser Returns NumPy Arrays¶
parse_binary_proto function of the
tensorrt.CaffeParser now returns NumPy arrays instead of
enqueue Is Now execute_async¶
To better match CUDA naming conventions, the
enqueue function has been renamed to execute_async.
Keyword Arguments and Default Parameters¶
All functions and constructors in the new API now use named parameters. Additionally, they include default arguments where possible to better mirror the C++ API.
Where previously, you had to order function parameters correctly:
    # Legacy API
    context.enqueue(batch_size, bindings, stream.handle, None)
You can now use keyword arguments instead:
    # New API
    context.execute_async(stream_handle=stream.handle, bindings=bindings, batch_size=batch_size)
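The effect of named parameters and defaults can be tried out with a plain Python function. The function below is a hypothetical stand-in that only echoes its arguments; its name, parameter order, and default values are assumptions made for this sketch and should not be read as the real signature of execute_async.

```python
def execute_async_sketch(batch_size=1, bindings=None, stream_handle=0):
    """Hypothetical stand-in that echoes its arguments; not a TensorRT call."""
    return {
        "batch_size": batch_size,
        "bindings": list(bindings or []),
        "stream_handle": stream_handle,
    }


# Positional call: the caller must remember the parameter order.
positional = execute_async_sketch(4, [100, 200], 7)

# Keyword call: the order no longer matters.
keyword = execute_async_sketch(stream_handle=7, bindings=[100, 200], batch_size=4)

# Parameters with defaults can be omitted.
defaulted = execute_async_sketch(bindings=[100, 200], stream_handle=7)
```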
Serializing and Deserializing Engines¶
tensorrt.IHostMemory class now implements Python’s buffer interface.
This means that serialized engines can be treated like any other Python buffer.
Serializing An Engine¶
    # New API
    with open("sample.engine", "wb") as f:
        f.write(engine.serialize())
Deserializing An Engine¶
    # New API
    with open("sample.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        engine = runtime.deserialize_cuda_engine(f.read())
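The reason these snippets need no explicit conversion is that file objects accept any object that supports the buffer interface. The sketch below demonstrates this round trip with a memoryview over a bytearray as a hypothetical stand-in for a serialized engine; the byte contents and the temporary file location are made up for illustration.

```python
import os
import tempfile

# Hypothetical stand-in for a serialized engine: any object that exposes
# Python's buffer interface can be written to a binary file directly.
serialized = memoryview(bytearray(b"\x00\x01engine-bytes\xff"))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.engine")

    # Write the buffer without converting it to bytes first.
    with open(path, "wb") as f:
        f.write(serialized)

    # Read the raw bytes back, as would be done before deserialization.
    with open(path, "rb") as f:
        restored = f.read()

round_trip_ok = restored == serialized.tobytes()
```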
SWIG-based wrappers are incompatible with TensorRT’s pybind11-based bindings. In order to migrate existing plugins, you need to write a pybind11-based wrapper for the plugin. Typically, this involves writing bindings for the plugin itself, as well as the PluginFactory. For more details, refer to the fc_plugin_caffe_mnist sample.
You can refer to the Pybind11 User Guide for detailed information on pybind11 usage.
Using the Plugin Registry¶
Alternatively, it is possible to bypass the need for bindings completely by using the plugin registry. This involves two steps:
- Build a shared object file for the plugin. Make sure the
REGISTER_TENSORRT_PLUGIN macro is used in the plugin implementation.
- In Python, load the file created above using
At this point, you can add plugins to your network using the
which can be retrieved with
tensorrt.get_plugin_registry(). For more details, refer to the uff_custom_plugin sample.