Appendix: Migrating Python Code from TensorRT 8.x to 10.x#
This page describes how to update Python code when you migrate from TensorRT 8.x to 10.x: paired examples for the name-based tensor API (buffers and I/O), enqueueV3 execution, and build_serialized_network for engine build, followed by lists of Python APIs added and removed in 10.x.
Note
The Python API is not supported on QNX.
Migrating I/O Buffer Allocation to Named Tensors#
TensorRT 10.x replaces the binding-based API with a name-based tensor API. Use num_io_tensors and get_tensor_name() to iterate over I/O tensors, and get_tensor_mode() to check whether a tensor is an input or output.
TensorRT 8.x (before):

    import pycuda.driver as cuda
    import tensorrt as trt

    def allocate_buffers(self, engine):
        '''
        Allocates all buffers required for an engine; that is, host and device inputs and outputs.
        '''
        inputs = []
        outputs = []
        bindings = []
        stream = cuda.Stream()

        # Iterating over the engine yields binding names (inputs and outputs).
        for binding in engine:
            size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
            dtype = trt.nptype(engine.get_binding_dtype(binding))

            # Allocate host and device buffers.
            host_mem = cuda.pagelocked_empty(size, dtype)  # page-locked memory buffer (won't be swapped to disk)
            device_mem = cuda.mem_alloc(host_mem.nbytes)

            # Append the device buffer address to device bindings.
            # When cast to int, it's a linear index into the context's memory (like a memory address).
            bindings.append(int(device_mem))

            # Append to the appropriate input/output list.
            if engine.binding_is_input(binding):
                inputs.append(self.HostDeviceMem(host_mem, device_mem))
            else:
                outputs.append(self.HostDeviceMem(host_mem, device_mem))

        return inputs, outputs, bindings, stream
TensorRT 10.x (after):

    import pycuda.driver as cuda
    import tensorrt as trt

    def allocate_buffers(self, engine):
        '''
        Allocates all buffers required for an engine; that is, host and device inputs and outputs.
        '''
        inputs = []
        outputs = []
        bindings = []
        stream = cuda.Stream()

        # Iterate over I/O tensors by index and look each one up by name.
        for i in range(engine.num_io_tensors):
            tensor_name = engine.get_tensor_name(i)
            size = trt.volume(engine.get_tensor_shape(tensor_name))
            dtype = trt.nptype(engine.get_tensor_dtype(tensor_name))

            # Allocate host and device buffers.
            host_mem = cuda.pagelocked_empty(size, dtype)  # page-locked memory buffer (won't be swapped to disk)
            device_mem = cuda.mem_alloc(host_mem.nbytes)

            # Append the device buffer address to device bindings.
            # When cast to int, it's a linear index into the context's memory (like a memory address).
            bindings.append(int(device_mem))

            # Append to the appropriate input/output list.
            if engine.get_tensor_mode(tensor_name) == trt.TensorIOMode.INPUT:
                inputs.append(self.HostDeviceMem(host_mem, device_mem))
            else:
                outputs.append(self.HostDeviceMem(host_mem, device_mem))

        return inputs, outputs, bindings, stream
Summary of Changes#
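- Changed from binding-based iteration to the name-based API using num_io_tensors and get_tensor_name().
- Replaced get_binding_shape() and get_binding_dtype() with get_tensor_shape() and get_tensor_dtype(), and dropped the engine.max_batch_size multiplier because implicit batch support was removed.
- Replaced binding_is_input() with get_tensor_mode() to check I/O mode.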
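The name-based iteration can also be separated from buffer allocation. The following is a minimal sketch rather than part of the TensorRT samples: `split_io_tensor_names`, `FakeIOMode`, and `FakeEngine` are hypothetical names introduced here so that the snippet runs without TensorRT or a GPU. With a real engine, you would pass `trt.TensorIOMode.INPUT` as `input_mode`.

```python
from enum import Enum


def split_io_tensor_names(engine, input_mode):
    """Return (input_names, output_names) using the name-based tensor API.

    `engine` only needs `num_io_tensors`, `get_tensor_name(i)` and
    `get_tensor_mode(name)`. `input_mode` is the value that marks an input
    (trt.TensorIOMode.INPUT for a real TensorRT engine).
    """
    input_names, output_names = [], []
    for i in range(engine.num_io_tensors):
        name = engine.get_tensor_name(i)
        if engine.get_tensor_mode(name) == input_mode:
            input_names.append(name)
        else:
            output_names.append(name)
    return input_names, output_names


# --- Hypothetical stand-ins so the sketch runs without TensorRT installed ---
class FakeIOMode(Enum):
    INPUT = 1
    OUTPUT = 2


class FakeEngine:
    """Mimics only the three engine members used above."""

    def __init__(self, modes_by_name):
        self._modes = dict(modes_by_name)
        self._names = list(self._modes)

    @property
    def num_io_tensors(self):
        return len(self._names)

    def get_tensor_name(self, index):
        return self._names[index]

    def get_tensor_mode(self, name):
        return self._modes[name]


engine = FakeEngine({"images": FakeIOMode.INPUT, "scores": FakeIOMode.OUTPUT})
input_names, output_names = split_io_tensor_names(engine, FakeIOMode.INPUT)
print(input_names, output_names)
```

Passing the input marker as an argument keeps the helper free of a hard dependency on the `tensorrt` module, which makes it easy to exercise in unit tests.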
Note
The HostDeviceMem helper class used in the examples above is a simple container that pairs host (CPU) and device (GPU) memory allocations. It is not part of the TensorRT API. A minimal implementation is:
from collections import namedtuple
HostDeviceMem = namedtuple("HostDeviceMem", ["host", "device"])
Migrating from enqueueV2 to enqueueV3 (Python)#
The examples below show TensorRT 8.x first, then TensorRT 10.x, for the same inference task. In TensorRT 10.x, enqueueV3 replaces enqueueV2; in Python, this means calling execute_async_v3() instead of execute_async_v2(). Call set_tensor_address() for each I/O tensor before execute_async_v3(), as shown in the second example.
TensorRT 8.x (before):

    import numpy as np
    import pycuda.driver as cuda

    # Allocate device memory for inputs.
    d_inputs = [cuda.mem_alloc(input_nbytes) for _ in range(input_num)]

    # Allocate host and device memory for the output.
    h_output = cuda.pagelocked_empty(output_nbytes, dtype=np.float32)
    d_output = cuda.mem_alloc(h_output.nbytes)

    # Transfer data from host to device.
    cuda.memcpy_htod_async(d_inputs[0], input_a, stream)
    cuda.memcpy_htod_async(d_inputs[1], input_b, stream)
    cuda.memcpy_htod_async(d_inputs[2], input_c, stream)

    # Run inference, passing the device addresses as a bindings list.
    bindings = [int(d_inp) for d_inp in d_inputs] + [int(d_output)]
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)

    # Synchronize the stream.
    stream.synchronize()
TensorRT 10.x (after):

    import numpy as np
    import pycuda.driver as cuda

    # Allocate device memory for inputs.
    d_inputs = [cuda.mem_alloc(input_nbytes) for _ in range(input_num)]

    # Allocate host and device memory for the output.
    h_output = cuda.pagelocked_empty(output_nbytes, dtype=np.float32)
    d_output = cuda.mem_alloc(h_output.nbytes)

    # Transfer data from host to device.
    cuda.memcpy_htod_async(d_inputs[0], input_a, stream)
    cuda.memcpy_htod_async(d_inputs[1], input_b, stream)
    cuda.memcpy_htod_async(d_inputs[2], input_c, stream)

    # Set up tensor addresses. Indexing bindings by i relies on the engine
    # listing its I/O tensors in the same order: inputs first, then the output.
    bindings = [int(d_inp) for d_inp in d_inputs] + [int(d_output)]

    for i in range(engine.num_io_tensors):
        context.set_tensor_address(engine.get_tensor_name(i), bindings[i])

    # Run inference; addresses were set above, so no bindings argument is passed.
    context.execute_async_v3(stream_handle=stream.handle)

    # Synchronize the stream.
    stream.synchronize()
Summary of Changes#
- Added explicit tensor address setup using set_tensor_address() with tensor names.
- Changed from execute_async_v2() to execute_async_v3().
- The bindings parameter is no longer passed to execute_async_v3(); tensor addresses must be set beforehand.
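Because addresses are now set per tensor name, one option is to keep them in a dictionary keyed by name instead of relying on the position of each tensor in a list. The following is a minimal sketch under that assumption: `bind_and_execute` and `FakeContext` are hypothetical names, and the fake context only records calls so that the snippet runs without TensorRT. A real `IExecutionContext` would be used in its place, with real device pointers and a real CUDA stream handle.

```python
def bind_and_execute(context, addresses_by_name, stream_handle):
    """Set every tensor address by name, then launch execute_async_v3.

    `addresses_by_name` maps tensor name -> device pointer (as int).
    Raises RuntimeError if an address is rejected; otherwise returns the
    result of execute_async_v3.
    """
    for name, address in addresses_by_name.items():
        if not context.set_tensor_address(name, int(address)):
            raise RuntimeError(f"Could not set address for tensor {name!r}")
    return context.execute_async_v3(stream_handle=stream_handle)


# --- Hypothetical stand-in that records calls instead of running inference ---
class FakeContext:
    def __init__(self):
        self.addresses = {}
        self.launched_on = None

    def set_tensor_address(self, name, address):
        self.addresses[name] = address
        return True

    def execute_async_v3(self, stream_handle):
        self.launched_on = stream_handle
        return True


context = FakeContext()
ok = bind_and_execute(
    context,
    {"input_a": 0x1000, "input_b": 0x2000, "output": 0x3000},  # made-up pointers
    stream_handle=7,  # made-up stream handle
)
print(ok, context.addresses)
```

Keying addresses by name avoids depending on the I/O tensor order matching the order in which buffers were allocated.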
Migrating Engine Builds to build_serialized_network#
The examples below show TensorRT 8.x first, then TensorRT 10.x, for the same engine build path. In TensorRT 10.x, build_serialized_network() is the standard build path and is always available. Omit the build_engine() / serialize() fallback from the 8.x sample, and check for a None return instead of catching AttributeError, as shown in the second example.
TensorRT 8.x (before):

    engine_bytes = None
    try:
        engine_bytes = self.builder.build_serialized_network(self.network, self.config)
    except AttributeError:
        # Older releases without build_serialized_network(): build, then serialize.
        engine = self.builder.build_engine(self.network, self.config)
        engine_bytes = engine.serialize()
        del engine
    assert engine_bytes
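TensorRT 10.x (after):

    import sys

    # `log` is assumed to be a logging.Logger created elsewhere in the sample.
    engine_bytes = self.builder.build_serialized_network(self.network, self.config)
    if engine_bytes is None:
        log.error("Failed to create engine")
        sys.exit(1)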
Summary of Changes#
- The fallback to build_engine() and serialize() is no longer needed in TensorRT 10.x.
- build_serialized_network() is now the standard method and always available.
- Error handling should check for a None return value instead of catching AttributeError.
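In library code, exiting the process on a failed build may be too abrupt. As one possible variation on the 10.x pattern, the None check can be wrapped in a helper that raises an exception instead. This is a sketch only: `build_engine_bytes` and `FakeBuilder` are hypothetical names, and the fake builder stands in for `trt.Builder` so that the snippet runs without TensorRT.

```python
def build_engine_bytes(builder, network, config):
    """Build a serialized engine and raise if the builder reports failure.

    build_serialized_network() signals failure by returning None, so the
    result is checked explicitly instead of catching AttributeError.
    """
    engine_bytes = builder.build_serialized_network(network, config)
    if engine_bytes is None:
        raise RuntimeError("Failed to create engine")
    return engine_bytes


# --- Hypothetical stand-in for trt.Builder, returning a preset result ---
class FakeBuilder:
    def __init__(self, result):
        self._result = result

    def build_serialized_network(self, network, config):
        return self._result


plan = build_engine_bytes(FakeBuilder(b"serialized-plan"), network=None, config=None)
print(len(plan))

try:
    build_engine_bytes(FakeBuilder(None), network=None, config=None)
except RuntimeError as err:
    print(f"build failed: {err}")
```

Raising lets the caller decide whether to log, retry with a different configuration, or exit.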
Python APIs Added in 10.x#
The following Python APIs have been added in TensorRT 10.x to support new features and improved functionality.
Types#
- APILanguage
- ExecutionContextAllocationStrategy
- IGpuAsyncAllocator
- InterfaceInfo
- IPluginResource
- IPluginV3
- IStreamReader
- IVersionedInterface
Methods and Properties#
- ICudaEngine.is_debug_tensor()
- ICudaEngine.minimum_weight_streaming_budget
- ICudaEngine.streamable_weights_size
- ICudaEngine.weight_streaming_budget
- IExecutionContext.get_debug_listener()
- IExecutionContext.get_debug_state()
- IExecutionContext.set_all_tensors_debug_state()
- IExecutionContext.set_debug_listener()
- IExecutionContext.set_tensor_debug_state()
- IExecutionContext.update_device_memory_size_for_shapes()
- IGpuAllocator.allocate_async()
- IGpuAllocator.deallocate_async()
- INetworkDefinition.add_plugin_v3()
- INetworkDefinition.is_debug_tensor()
- INetworkDefinition.mark_debug()
- INetworkDefinition.unmark_debug()
- IPluginRegistry.acquire_plugin_resource()
- IPluginRegistry.all_creators
- IPluginRegistry.deregister_creator()
- IPluginRegistry.get_creator()
- IPluginRegistry.register_creator()
- IPluginRegistry.release_plugin_resource()
Removed Python APIs and Replacements#
Warning
The APIs listed below have been removed in TensorRT 10.x and will cause runtime errors if called. Review each entry for its replacement before upgrading.
| Removed API | Replacement |
|---|---|
| … | Builder optimization level 2. |
| … | Use … |
| … | … |
| … | … |
| … | … |
| … | Strides, data type, and vectorization information are sufficient to identify tensor formats uniquely. |
| … | Implicit batch support was removed. |
| … | … |
| … | … |
| … | … |
| … | … |
| … | … |
| … | … |
| … | … |
| … | … |
| … | No name-based equivalent replacement. |
| … | No name-based equivalent replacement. |
| … | … |
| … | … |
| … | … |
| … | … |
| … | … |
| … | Implicit batch is no longer supported. |
| … | No name-based equivalent replacement. |
| … | … |
| … | Implicit batch is no longer supported. |
| … | … |
| … | … |
| … | … |
| … | … |
| … | … |
| … | … |
| … | … |
| … | … |
| … | … |
| … | … |
| … | … |
| … | Explicit precision support was removed in 10.0. |
| … | Implicit batch support was removed. |
| … | … |
| … | Support was removed in 10.0. |
| … | Support was removed in 10.0. |
| … | Caffe support was removed. |
| … | Caffe support was removed. |
| … | External tactics are always disabled for core code. |
| … | This flag is on by default. |
| … | … |
| … | … |
| … | Use … |
| … | … |
| … | Use … |