Migrating from TensorRT 4 to 5

TensorRT 5.0 includes an all-new Python API. The Python bindings have been entirely rewritten, with significant changes and improvements. This page highlights some of these changes and outlines the steps you can take to migrate your existing Python code to TensorRT 5.

Legacy Compatibility

It is possible to continue using the legacy API by importing the TensorRT legacy package. Simply replace import tensorrt as trt (for example) with import tensorrt.legacy as trt in your scripts. Eventually, legacy support will be dropped, so it is still advisable to migrate to the new API.


Submodules

The new API removes the infer, parsers, utils, and lite submodules. Instead, all functionality is now included in the top-level tensorrt module.

Create and Destroy Functions

The new API no longer includes create and destroy functions. Constructors are used for object creation, and destruction is instead handled by scoping (that is, when an object goes out of scope, the garbage collector can free it). It is also possible to destroy objects manually with del trt_object. Note that del only decrements the reference count of the object; it is still up to the garbage collector to free it. If you really want to free memory immediately, you can instead invoke the object’s __del__ function.

The above changes apply to the following classes:

Legacy API                                                         New API
G_LOGGER = trt.infer.ConsoleLogger(trt.infer.LogSeverity.WARNING)  TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.infer.create_infer_builder(logger)                   builder = trt.Builder(logger)
runtime = trt.infer.create_infer_runtime(logger)                   runtime = trt.Runtime(logger)
parser = trt.parsers.caffeparser.create_caffe_parser()             parser = trt.CaffeParser()
parser = trt.parsers.uffparser.create_uff_parser()                 parser = trt.UffParser()
parser.destroy()  # Or any other TensorRT object                   del parser  # with ... as ... clauses strongly preferred (see below)

It is generally advisable to scope objects using with ... as ... clauses. TensorRT classes implement __exit__ in such a way that instances are destroyed immediately. Therefore, unlike the del operator, with ... as ... clauses free memory immediately when a TensorRT object goes out of scope.

For example, building an engine from an ONNX file using with ... as ... might look something like this:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
ONNX_MODEL = "mnist.onnx"

def build_engine():
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
        # Configure the builder here.
        builder.max_workspace_size = 2**30
        # Parse the model to create a network.
        with open(ONNX_MODEL, 'rb') as model:
            parser.parse(model.read())
        # Build and return the engine. Note that the builder, network and parser are destroyed when this function returns.
        return builder.build_cuda_engine(network)
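
The difference between del and a with ... as ... clause can be seen without TensorRT installed. The following is a minimal sketch that uses a hypothetical FakeResource class (not part of TensorRT) whose __exit__ releases its resource eagerly, which is the behavior described above for TensorRT classes.

```python
class FakeResource:
    """Hypothetical stand-in for a TensorRT object; not a TensorRT class."""

    def __init__(self, name):
        self.name = name
        self.freed = False

    def free(self):
        # Release the underlying resource; safe to call more than once.
        self.freed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Free eagerly when the with block ends, even if the block raised.
        self.free()
        return False  # Do not swallow exceptions.


# del removes one name only; the object lives on while another reference exists.
parser = FakeResource("parser")
alias = parser
del parser
freed_after_del = alias.freed  # Nothing has been released yet.

# A with clause calls __exit__ as soon as the block ends.
with FakeResource("builder") as builder:
    freed_inside_with = builder.freed
freed_after_with = builder.freed
```

In this sketch the resource is released at the end of the with block regardless of how many references remain, whereas del alone releases nothing while alias still refers to the object.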

Data Types

Data types have been given short-hand aliases to better align with naming conventions established by Python libraries such as NumPy.

Legacy API                New API
trt.infer.DataType.FLOAT  trt.float32
trt.infer.DataType.HALF   trt.float16
trt.infer.DataType.INT32  trt.int32
trt.infer.DataType.INT8   trt.int8

Getters and Setters

Where possible, the new API exposes attributes rather than getters and setters. Importantly, getters that take an argument and setters that either take multiple arguments or have a return value have been preserved, as these do not correspond directly to a single object attribute.

In general, attribute and function names align with NumPy naming conventions. That is, shape is used in place of get_dimensions, dtype instead of get_type, nbytes instead of size, num instead of nb and so on.

In most cases, it is enough to simply drop the get_ or set_ prefix:

Legacy API                               New API
builder.set_fp16_mode(True)              builder.fp16_mode = True
int8_mode = builder.get_int8_mode()      int8_mode = builder.int8_mode
builder.set_max_workspace_size(1 << 20)  builder.max_workspace_size = 1 << 20

In other cases, you can apply the aforementioned rules:

Legacy API                                           New API
num_inputs = network.get_nb_inputs()                 num_inputs = network.num_inputs
input_shape = network.get_input(0).get_dimensions()  input_shape = network.get_input(0).shape
input_type = network.get_input(0).get_type()         input_type = network.get_input(0).dtype
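
To illustrate how a getter and setter pair maps onto a single attribute, here is a minimal pure-Python sketch. FakeBuilder is a hypothetical class written for this example; it is not how the TensorRT bindings are implemented, and it keeps the legacy-style methods only to show that both spellings reach the same state.

```python
class FakeBuilder:
    """Hypothetical stand-in used for illustration; not a TensorRT class."""

    def __init__(self):
        self._fp16_mode = False

    # Legacy-style accessors.
    def get_fp16_mode(self):
        return self._fp16_mode

    def set_fp16_mode(self, value):
        self._fp16_mode = bool(value)

    # New-style attribute backed by the same accessors.
    fp16_mode = property(get_fp16_mode, set_fp16_mode)


builder = FakeBuilder()
builder.fp16_mode = True           # Attribute assignment replaces set_fp16_mode(True).
fp16_enabled = builder.fp16_mode   # Attribute read replaces get_fp16_mode().
```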

tensorrt.Dims, tensorrt.Permutation Behave Like Iterables

The tensorrt.Dims class and all its subclasses, as well as the tensorrt.Permutation class, are interchangeable with Python iterables (such as tuples and lists).

Where previously you would have written code like this:

# Legacy API
dims = network.get_input(0).get_dimensions().to_DimsCHW() # Need to explicitly convert to DimsCHW.
arr = np.random.randint(5, size=(dims.C(), dims.H(), dims.W())) # Cannot directly use Dims where iterables are expected.

dims2 = engine.get_binding_dimensions(1).to_DimsCHW()
# Compute volume. This will differ between different subclasses of Dims.
elt_count = dims2.vol() # This will not work for DimsHW or DimsNCHW, for example.
# Or equivalently,
elt_count = dims2.C() * dims2.H() * dims2.W()

It is now possible to write:

# New API
dims = network.get_input(0).shape # All Dims subclasses behave like iterables, so we don't care about the exact subclass.
arr = np.random.randint(5, size=dims) # Dims act exactly like any other iterable.

dims2 = engine.get_binding_shape(1)
# Compute volume.  Works for any iterable, including lists and tuples.
elt_count = trt.volume(dims2)
# Or equivalently (in Python 3, reduce must be imported from functools),
from functools import reduce
elt_count = reduce(lambda x, y: x * y, dims2)
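
Because the volume computation only needs an iterable of integers, it can be tried out on plain tuples and lists. The sketch below defines its own volume helper for illustration; it is a hypothetical pure-Python function, not trt.volume, and makes no claim about how trt.volume treats edge cases. Passing an initial value of 1 to reduce avoids the TypeError that reduce raises on an empty iterable.

```python
from functools import reduce
import operator


def volume(shape):
    """Return the number of elements described by an iterable of dimensions."""
    # The initial value 1 makes an empty shape yield 1 instead of raising TypeError.
    return reduce(operator.mul, shape, 1)


chw_tuple = (1, 28, 28)
chw_list = [1, 28, 28]

tuple_count = volume(chw_tuple)
list_count = volume(chw_list)
empty_count = volume(())
```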

Lightweight tensorrt.Weights

Previously, the tensorrt.Weights class would perform deep copies of any buffers used to create weights. To better align with the C++ API, and for the sake of efficiency, the new bindings no longer create these deep copies, but instead increment the reference count of the existing buffer. Therefore, modifying the buffer used to create a tensorrt.Weights object will also modify the tensorrt.Weights object.

Note that the tensorrt.ICudaEngine will still create its own copies of weights internally. The above only applies to tensorrt.Weights created before engine construction (when using the Network API, for example).
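
The practical consequence of sharing a buffer instead of copying it can be shown with the standard library alone. This sketch uses memoryview as an analogy for an object that references an existing buffer; it does not use tensorrt.Weights, and the variable names are chosen for this example only.

```python
from array import array

# A buffer of 32-bit floats, standing in for the data used to create weights.
buffer = array("f", [1.0, 2.0, 3.0])

shared = memoryview(buffer)   # References the same memory; no copy is made.
copied = array("f", buffer)   # Independent deep copy of the data.

# Modify the original buffer after both objects were created.
buffer[0] = 42.0

shared_first = shared[0]      # Reflects the modification.
copied_first = copied[0]      # Keeps the original value.
```

If weights must stay unchanged while the source buffer is reused, making an explicit copy first, as with copied above, keeps the two independent.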

tensorrt.Weights Behave Like NumPy Arrays

The tensorrt.Weights class is now interchangeable with NumPy arrays. More concretely, this means that NumPy arrays can be used directly wherever tensorrt.Weights objects are expected - no explicit conversion required. Furthermore, the API now returns NumPy arrays instead of Weights objects.

tensorrt.CaffeParser Returns NumPy Arrays

The parse_binary_proto function of the tensorrt.CaffeParser now returns NumPy arrays instead of BinaryProtoBlob objects.

enqueue Is Now execute_async

To better match CUDA naming conventions, tensorrt.IExecutionContext’s enqueue function has been renamed to execute_async.

Keyword Arguments and Default Parameters

All functions and constructors in the new API now use named parameters. Additionally, they include default arguments where possible to better mirror the C++ API.

Where previously you had to pass function parameters in the correct order:

# Legacy API
context.enqueue(batch_size, bindings, stream.handle, None)

You can now use keyword arguments instead:

# New API
context.execute_async(stream_handle=stream.handle, bindings=bindings, batch_size=batch_size)
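
The effect of named parameters and defaults can be demonstrated with a stub. The function below is hypothetical: it borrows the parameter names from the call above, but its default values and its return value are assumptions made for this example and should not be read as the TensorRT signature.

```python
def execute_async(batch_size=1, bindings=None, stream_handle=0):
    """Hypothetical stub that records its arguments; not the TensorRT method."""
    return {
        "batch_size": batch_size,
        "bindings": list(bindings or []),
        "stream_handle": stream_handle,
    }


# Positional call: the order of the arguments matters.
positional = execute_async(4, [1, 2], 99)

# Keyword call: the arguments can appear in any order.
keyword = execute_async(stream_handle=99, bindings=[1, 2], batch_size=4)

# Omitted arguments fall back to the defaults assumed by this stub.
defaulted = execute_async(bindings=[1, 2], stream_handle=99)
```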

Serializing and Deserializing Engines

The tensorrt.IHostMemory class now implements Python’s buffer interface. This means that serialized engines can be treated like any other Python buffer.

Serializing An Engine

# New API
with open("sample.engine", "wb") as f:
    f.write(engine.serialize())

Deserializing An Engine

# New API
with open("sample.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
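
What "can be treated like any other Python buffer" means in practice is that the object can be passed straight to file.write without conversion. The sketch below shows the round trip with a memoryview standing in for a serialized engine; the byte contents and the temporary file location are made up for this example, and no TensorRT objects are involved.

```python
import os
import tempfile

# Stand-in for a serialized engine: any object that supports the buffer interface.
serialized = memoryview(bytearray(b"\x00fake-engine-bytes\xff"))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.engine")

    # file.write accepts any buffer object directly.
    with open(path, "wb") as f:
        f.write(serialized)

    # Reading the file back yields the same bytes.
    with open(path, "rb") as f:
        restored = f.read()

round_trip_ok = restored == serialized.tobytes()
```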