Appendix: Migrating Python Code from TensorRT 8.x to 10.x#

This page describes how to update Python code when you migrate from TensorRT 8.x to 10.x. It gives paired 8.x and 10.x examples for the name-based tensor API (buffers and I/O), for execution with execute_async_v3 (enqueueV3), and for engine builds with build_serialized_network, followed by lists of the Python APIs added and removed in 10.x.

Note

The Python API is not supported on QNX.

Migrating I/O Buffer Allocation to Named Tensors#

TensorRT 10.x replaces the binding-based API with a name-based tensor API. Use num_io_tensors and get_tensor_name() to iterate over I/O tensors, and get_tensor_mode() to check whether a tensor is an input or output.

TensorRT 8.x (before):

# Assumes: import tensorrt as trt; import pycuda.driver as cuda
def allocate_buffers(self, engine):
    '''
    Allocates all buffers required for an engine; that is, host and device inputs and outputs.
    '''
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()

    # Iterating over an 8.x engine yields the binding (input/output) names.
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))

        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)  # page-locked memory buffer (won't be swapped to disk)
        device_mem = cuda.mem_alloc(host_mem.nbytes)

        # Append the device buffer address to device bindings.
        # When cast to int, it's a linear index into the context's memory (like memory address).
        bindings.append(int(device_mem))

        # Append to the appropriate input/output list.
        if engine.binding_is_input(binding):
            inputs.append(self.HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(self.HostDeviceMem(host_mem, device_mem))

    return inputs, outputs, bindings, stream

TensorRT 10.x (after):

# Assumes: import tensorrt as trt; import pycuda.driver as cuda
def allocate_buffers(self, engine):
    '''
    Allocates all buffers required for an engine; that is, host and device inputs and outputs.
    '''
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()

    for i in range(engine.num_io_tensors):
        tensor_name = engine.get_tensor_name(i)
        # Assumes static shapes; dynamic dimensions are reported as -1.
        size = trt.volume(engine.get_tensor_shape(tensor_name))
        dtype = trt.nptype(engine.get_tensor_dtype(tensor_name))

        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)  # page-locked memory buffer (won't be swapped to disk)
        device_mem = cuda.mem_alloc(host_mem.nbytes)

        # Append the device buffer address to device bindings.
        # When cast to int, it's a linear index into the context's memory (like memory address).
        bindings.append(int(device_mem))

        # Append to the appropriate input/output list.
        if engine.get_tensor_mode(tensor_name) == trt.TensorIOMode.INPUT:
            inputs.append(self.HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(self.HostDeviceMem(host_mem, device_mem))

    return inputs, outputs, bindings, stream

Summary of Changes#

  • Changed from binding-based iteration to name-based API using num_io_tensors and get_tensor_name()

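  • Replaced binding_is_input() with get_tensor_mode() to check I/O mode

  • Dropped the engine.max_batch_size multiplier from the buffer size calculation, because implicit batch is no longer supported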

Note

The HostDeviceMem helper class used in the examples above is a simple container that pairs host (CPU) and device (GPU) memory allocations. It is not part of the TensorRT API. A minimal implementation is:

from collections import namedtuple

HostDeviceMem = namedtuple("HostDeviceMem", ["host", "device"])
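
The name-based queries also make it straightforward to group tensors by name instead of by position. The following is a minimal sketch, not part of the TensorRT API: split_io_tensor_names is a hypothetical helper, and input_mode is assumed to be trt.TensorIOMode.INPUT, passed in by the caller so that the helper itself does not need to import TensorRT.

```python
def split_io_tensor_names(engine, input_mode):
    """Return (input_names, output_names) for an engine-like object.

    `engine` is expected to expose `num_io_tensors`, `get_tensor_name(i)`,
    and `get_tensor_mode(name)`, as ICudaEngine does in TensorRT 10.x.
    `input_mode` is the value that marks an input tensor, normally
    trt.TensorIOMode.INPUT (an assumption of this sketch).
    """
    input_names, output_names = [], []
    for i in range(engine.num_io_tensors):
        name = engine.get_tensor_name(i)
        if engine.get_tensor_mode(name) == input_mode:
            input_names.append(name)
        else:
            output_names.append(name)
    return input_names, output_names
```

With a real engine, the call would look like split_io_tensor_names(engine, trt.TensorIOMode.INPUT).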

Migrating from enqueueV2 to enqueueV3 (Python)#

The examples below show TensorRT 8.x first, then TensorRT 10.x, for the same inference task. In TensorRT 10.x, execute_async_v3() (the Python counterpart of enqueueV3) replaces execute_async_v2() (enqueueV2): call set_tensor_address() for each I/O tensor before calling execute_async_v3(), as the TensorRT 10.x example shows.

TensorRT 8.x (before):

# Assumes: import numpy as np; import pycuda.driver as cuda
# Allocate device memory for inputs.
d_inputs = [cuda.mem_alloc(input_nbytes) for _ in range(input_num)]

# Allocate host and device memory for the output.
h_output = cuda.pagelocked_empty(output_nbytes, dtype=np.float32)
d_output = cuda.mem_alloc(h_output.nbytes)

# Transfer data from host to device.
cuda.memcpy_htod_async(d_inputs[0], input_a, stream)
cuda.memcpy_htod_async(d_inputs[1], input_b, stream)
cuda.memcpy_htod_async(d_inputs[2], input_c, stream)

# Run inference
context.execute_async_v2(bindings=[int(d_inp) for d_inp in d_inputs] + [int(d_output)], stream_handle=stream.handle)

# Synchronize the stream
stream.synchronize()

TensorRT 10.x (after):

# Assumes: import numpy as np; import pycuda.driver as cuda
# Allocate device memory for inputs.
d_inputs = [cuda.mem_alloc(input_nbytes) for _ in range(input_num)]

# Allocate host and device memory for the output.
h_output = cuda.pagelocked_empty(output_nbytes, dtype=np.float32)
d_output = cuda.mem_alloc(h_output.nbytes)

# Transfer data from host to device.
cuda.memcpy_htod_async(d_inputs[0], input_a, stream)
cuda.memcpy_htod_async(d_inputs[1], input_b, stream)
cuda.memcpy_htod_async(d_inputs[2], input_c, stream)

# Set up tensor addresses.
# The list order must match the engine's I/O tensor order (here: three inputs, then the output).
bindings = [int(d_inp) for d_inp in d_inputs] + [int(d_output)]

for i in range(engine.num_io_tensors):
    context.set_tensor_address(engine.get_tensor_name(i), bindings[i])

# Run inference
context.execute_async_v3(stream_handle=stream.handle)

# Synchronize the stream
stream.synchronize()

Summary of Changes#

  • Added explicit tensor address setup using set_tensor_address() with tensor names

  • Changed from execute_async_v2() to execute_async_v3()

  • The bindings parameter is no longer passed to execute_async_v3(); tensor addresses must be set beforehand
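
Because addresses are now set per tensor name, they can be supplied as a name-to-pointer mapping instead of an index-ordered list. The following is a minimal sketch under stated assumptions: bind_tensor_addresses is a hypothetical helper, device_ptrs is a caller-supplied dict, and the tensor names in the commented usage lines are placeholders. Only num_io_tensors, get_tensor_name(), and set_tensor_address() are TensorRT calls.

```python
def bind_tensor_addresses(context, engine, device_ptrs):
    """Call context.set_tensor_address() for every I/O tensor of `engine`.

    `device_ptrs` maps tensor name -> device address (anything int() accepts,
    for example a PyCUDA DeviceAllocation). A KeyError is raised when a tensor
    has no entry, so a missing buffer is reported before execute_async_v3() runs.
    """
    for i in range(engine.num_io_tensors):
        name = engine.get_tensor_name(i)
        if name not in device_ptrs:
            raise KeyError(f"no device buffer provided for tensor {name!r}")
        context.set_tensor_address(name, int(device_ptrs[name]))


# Hypothetical usage (tensor names are placeholders, not taken from a real engine):
# bind_tensor_addresses(context, engine, {"input_a": d_inputs[0], "output": d_output})
# context.execute_async_v3(stream_handle=stream.handle)
```

Keying the buffers by name avoids relying on the engine's I/O tensor order matching the order in which the buffers were allocated.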

Migrating Engine Builds to build_serialized_network#

The examples below show TensorRT 8.x first, then TensorRT 10.x, for the same engine build path. In TensorRT 10.x, build_serialized_network() is the standard build path and is always available. Omit the build_engine() / serialize() fallback from the 8.x sample, and check for a None return instead of catching AttributeError, as the TensorRT 10.x example shows.

TensorRT 8.x (before):

engine_bytes = None
try:
    engine_bytes = self.builder.build_serialized_network(self.network, self.config)
except AttributeError:
    # Older releases without build_serialized_network(): build, then serialize.
    engine = self.builder.build_engine(self.network, self.config)
    engine_bytes = engine.serialize()
    del engine
assert engine_bytes

TensorRT 10.x (after):

# Assumes `import sys` and a configured logger named `log`.
engine_bytes = self.builder.build_serialized_network(self.network, self.config)
if engine_bytes is None:
    log.error("Failed to create engine")
    sys.exit(1)

Summary of Changes#

  • The fallback to build_engine() and serialize() is no longer needed in TensorRT 10.x

  • build_serialized_network() is now the standard method and always available

  • Error handling should check for None return value instead of catching AttributeError
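
The None check can be wrapped together with writing the serialized engine to disk. The following is a minimal sketch, not the sample's own code: build_and_save_engine and engine_path are hypothetical names, and the builder, network, and config objects are assumed to have been created elsewhere. It raises an exception instead of exiting so that the caller decides how to handle a failed build.

```python
def build_and_save_engine(builder, network, config, engine_path):
    """Build a serialized engine and write it to `engine_path`.

    Raises RuntimeError when build_serialized_network() returns None,
    which is how a failed build is reported in TensorRT 10.x.
    """
    engine_bytes = builder.build_serialized_network(network, config)
    if engine_bytes is None:
        raise RuntimeError("Failed to create engine")
    with open(engine_path, "wb") as f:
        # The returned buffer can be written directly to a binary file.
        f.write(engine_bytes)
    return engine_path
```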

Python APIs Added in 10.x#

The following Python APIs have been added in TensorRT 10.x to support new features and improved functionality.

Types#

  • APILanguage

  • ExecutionContextAllocationStrategy

  • IGpuAsyncAllocator

  • InterfaceInfo

  • IPluginResource

  • IPluginV3

  • IStreamReader

  • IVersionedInterface

Methods and Properties#

  • ICudaEngine.is_debug_tensor()

  • ICudaEngine.minimum_weight_streaming_budget

  • ICudaEngine.streamable_weights_size

  • ICudaEngine.weight_streaming_budget

  • IExecutionContext.get_debug_listener()

  • IExecutionContext.get_debug_state()

  • IExecutionContext.set_all_tensors_debug_state()

  • IExecutionContext.set_debug_listener()

  • IExecutionContext.set_tensor_debug_state()

  • IExecutionContext.update_device_memory_size_for_shapes()

  • IGpuAllocator.allocate_async()

  • IGpuAllocator.deallocate_async()

  • INetworkDefinition.add_plugin_v3()

  • INetworkDefinition.is_debug_tensor()

  • INetworkDefinition.mark_debug()

  • INetworkDefinition.unmark_debug()

  • IPluginRegistry.acquire_plugin_resource()

  • IPluginRegistry.all_creators

  • IPluginRegistry.deregister_creator()

  • IPluginRegistry.get_creator()

  • IPluginRegistry.register_creator()

  • IPluginRegistry.release_plugin_resource()

Removed Python APIs and Replacements#

Warning

The APIs listed below have been removed in TensorRT 10.x and will cause runtime errors if called. Review each entry for its replacement before upgrading.

| Removed API | Replacement |
|---|---|
| BuilderFlag.ENABLE_TACTIC_HEURISTIC | Builder optimization level 2. |
| BuilderFlag.STRICT_TYPES | Use BuilderFlag.DIRECT_IO, BuilderFlag.PREFER_PRECISION_CONSTRAINTS, and BuilderFlag.REJECT_EMPTY_ALGORITHMS. |
| EngineCapability.DEFAULT | EngineCapability.STANDARD |
| EngineCapability.SAFE_DLA | EngineCapability.DLA_STANDALONE |
| EngineCapability.SAFE_GPU | EngineCapability.SAFETY |
| IAlgorithmIOInfo.tensor_format | Strides, data type, and vectorization information are sufficient to identify tensor formats uniquely. |
| IBuilder.max_batch_size | Implicit batch support was removed. |
| IBuilderConfig.max_workspace_size | IBuilderConfig.set_memory_pool_limit() or get_memory_pool_limit() with MemoryPoolType.WORKSPACE |
| IBuilderConfig.min_timing_iterations | IBuilderConfig.avg_timing_iterations |
| ICudaEngine.binding_is_input() | ICudaEngine.get_tensor_mode() |
| ICudaEngine.get_binding_bytes_per_component() | ICudaEngine.get_tensor_bytes_per_component() |
| ICudaEngine.get_binding_components_per_element() | ICudaEngine.get_tensor_components_per_element() |
| ICudaEngine.get_binding_dtype() | ICudaEngine.get_tensor_dtype() |
| ICudaEngine.get_binding_format() | ICudaEngine.get_tensor_format() |
| ICudaEngine.get_binding_format_desc() | ICudaEngine.get_tensor_format_desc() |
| ICudaEngine.get_binding_index() | No name-based equivalent replacement. |
| ICudaEngine.get_binding_name() | No name-based equivalent replacement. |
| ICudaEngine.get_binding_shape() | ICudaEngine.get_tensor_shape() |
| ICudaEngine.get_binding_vectorized_dim() | ICudaEngine.get_tensor_vectorized_dim() |
| ICudaEngine.get_location() | ITensor.location |
| ICudaEngine.get_profile_shape() | ICudaEngine.get_tensor_profile_shape() |
| ICudaEngine.get_profile_shape_input() | ICudaEngine.get_tensor_profile_values() |
| ICudaEngine.has_implicit_batch_dimension() | Implicit batch is no longer supported. |
| ICudaEngine.is_execution_binding() | No name-based equivalent replacement. |
| ICudaEngine.is_shape_binding() | ICudaEngine.is_shape_inference_io() |
| ICudaEngine.max_batch_size | Implicit batch is no longer supported. |
| ICudaEngine.num_bindings | ICudaEngine.num_io_tensors |
| IExecutionContext.get_binding_shape() | IExecutionContext.get_tensor_shape() |
| IExecutionContext.get_strides() | IExecutionContext.get_tensor_strides() |
| IExecutionContext.set_binding_shape() | IExecutionContext.set_input_shape() |
| IFullyConnectedLayer | IMatrixMultiplyLayer |
| INetworkDefinition.add_convolution() | INetworkDefinition.add_convolution_nd() |
| INetworkDefinition.add_deconvolution() | INetworkDefinition.add_deconvolution_nd() |
| INetworkDefinition.add_fully_connected() | INetworkDefinition.add_matrix_multiply() |
| INetworkDefinition.add_padding() | INetworkDefinition.add_padding_nd() |
| INetworkDefinition.add_pooling() | INetworkDefinition.add_pooling_nd() |
| INetworkDefinition.add_rnn_v2() | INetworkDefinition.add_loop() |
| INetworkDefinition.has_explicit_precision | Explicit precision support was removed in 10.0. |
| INetworkDefinition.has_implicit_batch_dimension | Implicit batch support was removed. |
| IRNNv2Layer | ILoop |
| NetworkDefinitionCreationFlag.EXPLICIT_BATCH | Support was removed in 10.0. |
| NetworkDefinitionCreationFlag.EXPLICIT_PRECISION | Support was removed in 10.0. |
| PaddingMode.CAFFE_ROUND_DOWN | Caffe support was removed. |
| PaddingMode.CAFFE_ROUND_UP | Caffe support was removed. |
| PreviewFeature.DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805 | External tactics are always disabled for core code. |
| PreviewFeature.FASTER_DYNAMIC_SHAPES_0805 | This flag is on by default. |
| ProfilingVerbosity.DEFAULT | ProfilingVerbosity.LAYER_NAMES_ONLY |
| ProfilingVerbosity.VERBOSE | ProfilingVerbosity.DETAILED |
| ResizeMode | Use InterpolationMode. Alias was removed. |
| SampleMode.DEFAULT | SampleMode.STRICT_BOUNDS |
| SliceMode | Use SampleMode. Alias was removed. |
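
A quick way to find affected call sites before upgrading is to scan source files for the removed names. The following is a rough sketch rather than an official tool: find_removed_apis and REMOVED_APIS are hypothetical names, the mapping covers only a few entries from the table above, and the match is purely textual, so it also reports names that appear in comments or that belong to unrelated objects.

```python
import re

# Subset of the removed-API table: old attribute name -> suggested replacement.
REMOVED_APIS = {
    "binding_is_input": "get_tensor_mode()",
    "get_binding_dtype": "get_tensor_dtype()",
    "get_binding_shape": "get_tensor_shape()",
    "set_binding_shape": "set_input_shape()",
    "num_bindings": "num_io_tensors",
    "max_workspace_size": "set_memory_pool_limit() with MemoryPoolType.WORKSPACE",
    "add_fully_connected": "add_matrix_multiply()",
    "max_batch_size": "none (implicit batch is no longer supported)",
}

# Match ".name" used as an attribute or method, ending at a word boundary.
_PATTERN = re.compile(r"\.(" + "|".join(map(re.escape, REMOVED_APIS)) + r")\b")


def find_removed_apis(source):
    """Return a list of (line_number, old_name, replacement) tuples."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            name = match.group(1)
            hits.append((lineno, name, REMOVED_APIS[name]))
    return hits
```

Running find_removed_apis(open(path).read()) over each file in a project gives a starting list of lines to review against the full table.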