Migrating Python Code from TensorRT 8.x to 10.x#
This page describes how to update Python code when migrating from TensorRT 8.x to 10.x. The subsections below pair 8.x and 10.x code for typical tasks: the name-based tensor API (buffers and I/O), enqueueV3 execution, and build_serialized_network for engine builds. Later sections list the Python APIs added and removed in 10.x.
See also
- Migration Guide Overview
Landing page with links to all migration surfaces.
- Python API Reference
Full Python API documentation.
- Python API Tutorial
Python API usage guide with code examples.
Note
The Python API is not supported on QNX.
Migrating I/O Buffer Allocation to Named Tensors#
Before (TensorRT 8.x)#
def allocate_buffers(self, engine):
    '''
    Allocates all buffers required for an engine; that is, host and device inputs and outputs.
    '''
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()

    # binding is the name of the input/output
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))

        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)  # page-locked memory buffer (will not be swapped to disk)
        device_mem = cuda.mem_alloc(host_mem.nbytes)

        # Append the device buffer address to device bindings.
        # When cast to int, it is a linear index into the context's memory (like a memory address).
        bindings.append(int(device_mem))

        # Append to the appropriate input/output list.
        if engine.binding_is_input(binding):
            inputs.append(self.HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(self.HostDeviceMem(host_mem, device_mem))

    return inputs, outputs, bindings, stream
After (TensorRT 10.x)#
def allocate_buffers(self, engine):
    '''
    Allocates all buffers required for an engine; that is, host and device inputs and outputs.
    '''
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()

    for i in range(engine.num_io_tensors):
        tensor_name = engine.get_tensor_name(i)
        size = trt.volume(engine.get_tensor_shape(tensor_name))
        dtype = trt.nptype(engine.get_tensor_dtype(tensor_name))

        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)  # page-locked memory buffer (will not be swapped to disk)
        device_mem = cuda.mem_alloc(host_mem.nbytes)

        # Append the device buffer address to device bindings.
        # When cast to int, it is a linear index into the context's memory (like a memory address).
        bindings.append(int(device_mem))

        # Append to the appropriate input/output list.
        if engine.get_tensor_mode(tensor_name) == trt.TensorIOMode.INPUT:
            inputs.append(self.HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(self.HostDeviceMem(host_mem, device_mem))

    return inputs, outputs, bindings, stream
Summary of Changes#
- Changed from binding-based iteration to the name-based API using num_io_tensors and get_tensor_name()
- Replaced binding_is_input() with get_tensor_mode() to check I/O mode
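The size arithmetic is the same in both versions: trt.volume() returns the product of the tensor's dimensions, which becomes the element count of the host buffer. A minimal sketch of that computation, using math.prod in place of trt.volume and an assumed shape of (1, 3, 224, 224):

```python
import math

# trt.volume(shape) returns the product of the dimensions; math.prod
# reproduces that arithmetic here for a concrete, assumed shape.
shape = (1, 3, 224, 224)
num_elements = math.prod(shape)  # 1 * 3 * 224 * 224 = 150528

# A float32 element occupies 4 bytes, so the page-locked host buffer
# (and the matching device allocation) needs this many bytes.
nbytes = num_elements * 4
```

In the 10.x sample above, num_elements corresponds to size and nbytes to host_mem.nbytes.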
Note
The HostDeviceMem helper class used in the examples above is a simple container that pairs host (CPU) and device (GPU) memory allocations. It is not part of the TensorRT API. A minimal implementation is:
from collections import namedtuple
HostDeviceMem = namedtuple("HostDeviceMem", ["host", "device"])
Migrating from enqueueV2 to enqueueV3 (Python)#
The examples below show TensorRT 8.x first, then TensorRT 10.x, for the same inference task.
Before (TensorRT 8.x)#
# Allocate device memory for inputs.
d_inputs = [cuda.mem_alloc(input_nbytes) for binding in range(input_num)]

# Allocate host and device memory for outputs.
h_output = cuda.pagelocked_empty(output_nbytes, dtype=np.float32)
d_output = cuda.mem_alloc(h_output.nbytes)

# Transfer data from host to device.
cuda.memcpy_htod_async(d_inputs[0], input_a, stream)
cuda.memcpy_htod_async(d_inputs[1], input_b, stream)
cuda.memcpy_htod_async(d_inputs[2], input_c, stream)

# Run inference.
context.execute_async_v2(bindings=[int(d_inp) for d_inp in d_inputs] + [int(d_output)], stream_handle=stream.handle)

# Synchronize the stream.
stream.synchronize()
Warning
In TensorRT 10.x, enqueueV3 replaces enqueueV2. You must call set_tensor_address for each I/O tensor before execute_async_v3. Passing bindings directly to the execution call is no longer supported.
The After sample shows this pattern.
After (TensorRT 10.x)#
# Allocate device memory for inputs.
d_inputs = [cuda.mem_alloc(input_nbytes) for binding in range(input_num)]

# Allocate host and device memory for outputs.
h_output = cuda.pagelocked_empty(output_nbytes, dtype=np.float32)
d_output = cuda.mem_alloc(h_output.nbytes)

# Transfer data from host to device.
cuda.memcpy_htod_async(d_inputs[0], input_a, stream)
cuda.memcpy_htod_async(d_inputs[1], input_b, stream)
cuda.memcpy_htod_async(d_inputs[2], input_c, stream)

# Set up the tensor addresses.
bindings = [int(d_inputs[i]) for i in range(3)] + [int(d_output)]

for i in range(engine.num_io_tensors):
    context.set_tensor_address(engine.get_tensor_name(i), bindings[i])

# Run inference.
context.execute_async_v3(stream_handle=stream.handle)

# Synchronize the stream.
stream.synchronize()
Summary of Changes#
- Added explicit tensor address setup using set_tensor_address() with tensor names
- Changed from execute_async_v2() to execute_async_v3()
- The bindings parameter is no longer passed to execute_async_v3(); tensor addresses must be set beforehand
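Because set_tensor_address is keyed by tensor name, addresses can be collected in a dict and applied in one loop, independent of the positional binding order that enqueueV2 required. A minimal sketch; bind_tensors is a hypothetical helper (not a TensorRT API), and the Fake* classes are illustrative stand-ins for ICudaEngine and IExecutionContext:

```python
# Hypothetical helper: hand each named I/O tensor its device address.
def bind_tensors(context, engine, address_by_name):
    for i in range(engine.num_io_tensors):
        name = engine.get_tensor_name(i)
        context.set_tensor_address(name, address_by_name[name])

# Minimal stand-ins for the real TensorRT objects, for illustration only.
class FakeEngine:
    def __init__(self, names):
        self._names = names
        self.num_io_tensors = len(names)

    def get_tensor_name(self, i):
        return self._names[i]

class FakeContext:
    def __init__(self):
        self.addresses = {}

    def set_tensor_address(self, name, addr):
        self.addresses[name] = addr

engine = FakeEngine(["input_a", "input_b", "output"])
context = FakeContext()
bind_tensors(context, engine, {"input_a": 100, "input_b": 200, "output": 300})
```

With the real classes, the addresses would be the int(device_mem) values from the allocation step.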
Migrating Engine Builds to build_serialized_network#
The examples below show TensorRT 8.x first, then TensorRT 10.x, for the same engine build path.
Before (TensorRT 8.x)#
engine_bytes = None
try:
    engine_bytes = self.builder.build_serialized_network(self.network, self.config)
except AttributeError:
    engine = self.builder.build_engine(self.network, self.config)
    engine_bytes = engine.serialize()
    del engine
assert engine_bytes
Important
In TensorRT 10.x, build_serialized_network() is the standard build path and is always available. The build_engine() / serialize() fallback is no longer needed. Check for a None return instead of catching AttributeError.
The After sample shows this pattern.
After (TensorRT 10.x)#
engine_bytes = self.builder.build_serialized_network(self.network, self.config)
if engine_bytes is None:
    log.error("Failed to create engine")
    sys.exit(1)
Summary of Changes#
- The fallback to build_engine() and serialize() is no longer needed in TensorRT 10.x
- build_serialized_network() is now the standard method and is always available
- Error handling should check for a None return value instead of catching AttributeError
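The serialized network is a plain byte buffer, so persisting it is an ordinary file write; later runs can then skip the (slow) build and load the engine with trt.Runtime.deserialize_cuda_engine. A minimal sketch, using a placeholder byte string where real code would use the builder's output:

```python
import os
import tempfile

# Stand-in for the buffer returned by build_serialized_network();
# in real code this comes from the builder.
engine_bytes = b"\x00engine-bytes-placeholder"

# Write the serialized engine so later runs can deserialize it
# instead of rebuilding from the network definition.
path = os.path.join(tempfile.mkdtemp(), "model.engine")
with open(path, "wb") as f:
    f.write(engine_bytes)

# Reading it back yields the same buffer that was serialized.
with open(path, "rb") as f:
    restored = f.read()
```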
Python APIs Added in 10.x#
The following Python APIs have been added in TensorRT 10.x to support new features and improved functionality.
Types#
- ExecutionContextAllocationStrategy
- IGpuAsyncAllocator
- InterfaceInfo
- IPluginResource
- IPluginV3
- IStreamReader
- IVersionedInterface
Methods and Properties#
- ICudaEngine.is_debug_tensor()
- ICudaEngine.minimum_weight_streaming_budget
- ICudaEngine.streamable_weights_size
- ICudaEngine.weight_streaming_budget
- IExecutionContext.get_debug_listener()
- IExecutionContext.get_debug_state()
- IExecutionContext.set_all_tensors_debug_state()
- IExecutionContext.set_debug_listener()
- IExecutionContext.set_tensor_debug_state()
- IExecutionContext.update_device_memory_size_for_shapes()
- IGpuAllocator.allocate_async()
- IGpuAllocator.deallocate_async()
- INetworkDefinition.add_plugin_v3()
- INetworkDefinition.is_debug_tensor()
- INetworkDefinition.mark_debug()
- INetworkDefinition.unmark_debug()
- IPluginRegistry.acquire_plugin_resource()
- IPluginRegistry.all_creators
- IPluginRegistry.deregister_creator()
- IPluginRegistry.get_creator()
- IPluginRegistry.register_creator()
- IPluginRegistry.release_plugin_resource()
Removed Python APIs and Replacements#
Warning
The APIs listed below have been removed in TensorRT 10.x and will cause runtime errors if called. Review each entry for its replacement before upgrading.
The following Python APIs have been removed. Each entry shows the removed API and its replacement or migration path.
- BuilderFlag.ENABLE_TACTIC_HEURISTIC → Builder optimization level 2
- BuilderFlag.STRICT_TYPES → Use BuilderFlag.DIRECT_IO, BuilderFlag.PREFER_PRECISION_CONSTRAINTS, and BuilderFlag.REJECT_EMPTY_ALGORITHMS
- EngineCapability.DEFAULT → EngineCapability.STANDARD
- EngineCapability.SAFE_DLA → EngineCapability.DLA_STANDALONE
- EngineCapability.SAFE_GPU → EngineCapability.SAFETY
- IAlgorithmIOInfo.tensor_format → No replacement; strides, data type, and vectorization information are sufficient to identify a tensor format uniquely
- IBuilder.max_batch_size → No replacement; implicit batch support was removed
- IBuilderConfig.max_workspace_size → IBuilderConfig.set_memory_pool_limit() or get_memory_pool_limit() with MemoryPoolType.WORKSPACE
- IBuilderConfig.min_timing_iterations → IBuilderConfig.avg_timing_iterations
- ICudaEngine.binding_is_input() → ICudaEngine.get_tensor_mode()
- ICudaEngine.get_binding_bytes_per_component() → ICudaEngine.get_tensor_bytes_per_component()
- ICudaEngine.get_binding_components_per_element() → ICudaEngine.get_tensor_components_per_element()
- ICudaEngine.get_binding_dtype() → ICudaEngine.get_tensor_dtype()
- ICudaEngine.get_binding_format() → ICudaEngine.get_tensor_format()
- ICudaEngine.get_binding_format_desc() → ICudaEngine.get_tensor_format_desc()
- ICudaEngine.get_binding_index() → No name-based equivalent replacement
- ICudaEngine.get_binding_name() → No name-based equivalent replacement
- ICudaEngine.get_binding_shape() → ICudaEngine.get_tensor_shape()
- ICudaEngine.get_binding_vectorized_dim() → ICudaEngine.get_tensor_vectorized_dim()
- ICudaEngine.get_location() → ITensor.location
- ICudaEngine.get_profile_shape() → ICudaEngine.get_tensor_profile_shape()
- ICudaEngine.get_profile_shape_input() → ICudaEngine.get_tensor_profile_values()
- ICudaEngine.has_implicit_batch_dimension() → Implicit batch is no longer supported
- ICudaEngine.is_execution_binding() → No name-based equivalent replacement
- ICudaEngine.is_shape_binding() → ICudaEngine.is_shape_inference_io()
- ICudaEngine.max_batch_size → Implicit batch is no longer supported
- ICudaEngine.num_bindings → ICudaEngine.num_io_tensors
- IExecutionContext.get_binding_shape() → IExecutionContext.get_tensor_shape()
- IExecutionContext.get_strides() → IExecutionContext.get_tensor_strides()
- IExecutionContext.set_binding_shape() → IExecutionContext.set_input_shape()
- IFullyConnectedLayer → IMatrixMultiplyLayer
- INetworkDefinition.add_convolution() → INetworkDefinition.add_convolution_nd()
- INetworkDefinition.add_deconvolution() → INetworkDefinition.add_deconvolution_nd()
- INetworkDefinition.add_fully_connected() → INetworkDefinition.add_matrix_multiply()
- INetworkDefinition.add_padding() → INetworkDefinition.add_padding_nd()
- INetworkDefinition.add_pooling() → INetworkDefinition.add_pooling_nd()
- INetworkDefinition.add_rnn_v2() → INetworkDefinition.add_loop()
- INetworkDefinition.has_explicit_precision → Explicit precision support was removed in 10.0
- INetworkDefinition.has_implicit_batch_dimension → Implicit batch support was removed
- IRNNv2Layer → ILoop
- NetworkDefinitionCreationFlag.EXPLICIT_BATCH → Support was removed in 10.0
- NetworkDefinitionCreationFlag.EXPLICIT_PRECISION → Support was removed in 10.0
- PaddingMode.CAFFE_ROUND_DOWN → Caffe support was removed
- PaddingMode.CAFFE_ROUND_UP → Caffe support was removed
- PreviewFeature.DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805 → External tactics are always disabled for core code
- PreviewFeature.FASTER_DYNAMIC_SHAPES_0805 → This flag is on by default
- ProfilingVerbosity.DEFAULT → ProfilingVerbosity.LAYER_NAMES_ONLY
- ProfilingVerbosity.VERBOSE → ProfilingVerbosity.DETAILED
- ResizeMode → InterpolationMode (the alias was removed)
- SampleMode.DEFAULT → SampleMode.STRICT_BOUNDS
- SliceMode → SampleMode (the alias was removed)
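Because calling a removed API fails at runtime, code that must support both major versions can branch on the installed version. A minimal sketch; trt_major_version is a hypothetical helper, and in real code the version string would come from tensorrt's trt.__version__ attribute:

```python
# Hypothetical helper: parse the major component of a version string
# such as trt.__version__ (e.g. "10.0.1" or "8.6.1").
def trt_major_version(version_string):
    return int(version_string.split(".")[0])

# Shared code can then choose between the binding-based (8.x) and
# name-based (10.x) call paths at runtime.
uses_named_tensors = trt_major_version("10.0.1") >= 10
```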