Migrating Python Code from TensorRT 8.x to 10.x#

This page describes how to update Python code when migrating from TensorRT 8.x to 10.x. The sections below pair 8.x and 10.x code for typical tasks: the name-based tensor API (buffers and I/O), enqueueV3 execution, and build_serialized_network for engine builds. Later sections list the Python APIs added and removed in 10.x.

See also

Migration Guide Overview

Landing page with links to all migration surfaces.

Python API Reference

Full Python API documentation.

Python API Tutorial

Python API usage guide with code examples.

Note

The Python API is not supported on QNX.

Migrating I/O Buffer Allocation to Named Tensors#

Before (TensorRT 8.x)#

def allocate_buffers(self, engine):
    '''
    Allocates all buffers required for an engine; that is, host and device inputs and outputs.
    '''
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()

    # binding is the name of an input/output tensor
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))

        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)  # page-locked memory buffer (will not be swapped to disk)
        device_mem = cuda.mem_alloc(host_mem.nbytes)

        # Append the device buffer address to device bindings.
        # When cast to int, it gives the device memory address.
        bindings.append(int(device_mem))

        # Append to the appropriate input/output list.
        if engine.binding_is_input(binding):
            inputs.append(self.HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(self.HostDeviceMem(host_mem, device_mem))

    return inputs, outputs, bindings, stream

After (TensorRT 10.x)#

def allocate_buffers(self, engine):
    '''
    Allocates all buffers required for an engine; that is, host and device inputs and outputs.
    '''
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()

    for i in range(engine.num_io_tensors):
        tensor_name = engine.get_tensor_name(i)
        size = trt.volume(engine.get_tensor_shape(tensor_name))
        dtype = trt.nptype(engine.get_tensor_dtype(tensor_name))

        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)  # page-locked memory buffer (will not be swapped to disk)
        device_mem = cuda.mem_alloc(host_mem.nbytes)

        # Append the device buffer address to device bindings.
        # When cast to int, it gives the device memory address.
        bindings.append(int(device_mem))

        # Append to the appropriate input/output list.
        if engine.get_tensor_mode(tensor_name) == trt.TensorIOMode.INPUT:
            inputs.append(self.HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(self.HostDeviceMem(host_mem, device_mem))

    return inputs, outputs, bindings, stream

Summary of Changes#

  • Changed from binding-based iteration to name-based API using num_io_tensors and get_tensor_name()

  • Replaced binding_is_input() with get_tensor_mode() to check I/O mode

Note

The HostDeviceMem helper class used in the examples above is a simple container that pairs host (CPU) and device (GPU) memory allocations. It is not part of the TensorRT API. A minimal implementation is:

from collections import namedtuple

HostDeviceMem = namedtuple("HostDeviceMem", ["host", "device"])

Migrating from enqueueV2 to enqueueV3 (Python)#

The examples below show TensorRT 8.x first, then TensorRT 10.x, for the same inference task.

Before (TensorRT 8.x)#

# Allocate device memory for inputs.
d_inputs = [cuda.mem_alloc(input_nbytes) for _ in range(input_num)]

# Allocate host and device memory for the output.
h_output = cuda.pagelocked_empty(output_nbytes, dtype=np.float32)
d_output = cuda.mem_alloc(h_output.nbytes)

# Transfer input data from host to device.
cuda.memcpy_htod_async(d_inputs[0], input_a, stream)
cuda.memcpy_htod_async(d_inputs[1], input_b, stream)
cuda.memcpy_htod_async(d_inputs[2], input_c, stream)

# Run inference: all I/O addresses are passed as a bindings list.
context.execute_async_v2(bindings=[int(d_inp) for d_inp in d_inputs] + [int(d_output)], stream_handle=stream.handle)

# Synchronize the stream
stream.synchronize()

Warning

In TensorRT 10.x, enqueueV3 replaces enqueueV2. You must call set_tensor_address for each I/O tensor before execute_async_v3. Passing bindings directly to the execution call is no longer supported.

The After sample shows this pattern.

After (TensorRT 10.x)#

# Allocate device memory for inputs.
d_inputs = [cuda.mem_alloc(input_nbytes) for _ in range(input_num)]

# Allocate host and device memory for the output.
h_output = cuda.pagelocked_empty(output_nbytes, dtype=np.float32)
d_output = cuda.mem_alloc(h_output.nbytes)

# Transfer input data from host to device.
cuda.memcpy_htod_async(d_inputs[0], input_a, stream)
cuda.memcpy_htod_async(d_inputs[1], input_b, stream)
cuda.memcpy_htod_async(d_inputs[2], input_c, stream)

# Set the address of each I/O tensor by name.
# This assumes the engine's I/O tensor order is the inputs first, then the output.
bindings = [int(d_inp) for d_inp in d_inputs] + [int(d_output)]

for i in range(engine.num_io_tensors):
    context.set_tensor_address(engine.get_tensor_name(i), bindings[i])

# Run inference; tensor addresses were set above, so no bindings argument is passed.
context.execute_async_v3(stream_handle=stream.handle)

# Synchronize the stream
stream.synchronize()

Summary of Changes#

  • Added explicit tensor address setup using set_tensor_address() with tensor names

  • Changed from execute_async_v2() to execute_async_v3()

  • The bindings parameter is no longer passed to execute_async_v3(); tensor addresses must be set beforehand
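
The per-tensor address setup can be factored into a small helper so the execution code stays readable. The sketch below is only an illustration of the name-based binding loop; bind_io_tensors and the device_buffers mapping are our own names, not TensorRT API.

```python
def bind_io_tensors(engine, context, device_buffers):
    """Attach a device address to every I/O tensor by name (TensorRT 10.x style).

    device_buffers maps tensor name -> device pointer cast to int.
    A missing entry raises KeyError, which surfaces a forgotten binding
    before execute_async_v3() is called.
    """
    for i in range(engine.num_io_tensors):
        name = engine.get_tensor_name(i)
        context.set_tensor_address(name, device_buffers[name])
```

With such a helper, the execution step reduces to calling it once and then invoking context.execute_async_v3(stream_handle=stream.handle).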

Migrating Engine Builds to build_serialized_network#

The examples below show TensorRT 8.x first, then TensorRT 10.x, for the same engine build path.

Before (TensorRT 8.x)#

engine_bytes = None
try:
    engine_bytes = self.builder.build_serialized_network(self.network, self.config)
except AttributeError:
    # Older TensorRT: fall back to build_engine() and serialize the engine manually.
    engine = self.builder.build_engine(self.network, self.config)
    engine_bytes = engine.serialize()
    del engine
assert engine_bytes

Important

In TensorRT 10.x, build_serialized_network() is the standard build path and is always available. The build_engine() / serialize() fallback is no longer needed. Check for a None return instead of catching AttributeError.

The After sample shows this pattern.

After (TensorRT 10.x)#

engine_bytes = self.builder.build_serialized_network(self.network, self.config)
if engine_bytes is None:
    log.error("Failed to create engine")
    sys.exit(1)

Summary of Changes#

  • The fallback to build_engine() and serialize() is no longer needed in TensorRT 10.x

  • build_serialized_network() is now the standard method and always available

  • Error handling should check for None return value instead of catching AttributeError
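
Once built, the serialized bytes are typically turned back into an engine with trt.Runtime.deserialize_cuda_engine(), which likewise returns None on failure. The wrapper below is a minimal sketch of that check; deserialize_engine is our own name, and the runtime argument is assumed to behave like trt.Runtime.

```python
def deserialize_engine(runtime, engine_bytes):
    """Deserialize engine bytes, failing loudly instead of returning None.

    runtime is expected to expose deserialize_cuda_engine(), which
    trt.Runtime provides and which returns None on failure.
    """
    engine = runtime.deserialize_cuda_engine(engine_bytes)
    if engine is None:
        raise RuntimeError("Failed to deserialize the TensorRT engine")
    return engine
```

This mirrors the build-time pattern above: check for None rather than relying on exceptions from the bindings.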

Python APIs Added in 10.x#

The following Python APIs have been added in TensorRT 10.x to support new features and improved functionality.

Types#

  • APILanguage

  • ExecutionContextAllocationStrategy

  • IGpuAsyncAllocator

  • InterfaceInfo

  • IPluginResource

  • IPluginV3

  • IStreamReader

  • IVersionedInterface

Methods and Properties#

  • ICudaEngine.is_debug_tensor()

  • ICudaEngine.minimum_weight_streaming_budget

  • ICudaEngine.streamable_weights_size

  • ICudaEngine.weight_streaming_budget

  • IExecutionContext.get_debug_listener()

  • IExecutionContext.get_debug_state()

  • IExecutionContext.set_all_tensors_debug_state()

  • IExecutionContext.set_debug_listener()

  • IExecutionContext.set_tensor_debug_state()

  • IExecutionContext.update_device_memory_size_for_shapes()

  • IGpuAllocator.allocate_async()

  • IGpuAllocator.deallocate_async()

  • INetworkDefinition.add_plugin_v3()

  • INetworkDefinition.is_debug_tensor()

  • INetworkDefinition.mark_debug()

  • INetworkDefinition.unmark_debug()

  • IPluginRegistry.acquire_plugin_resource()

  • IPluginRegistry.all_creators

  • IPluginRegistry.deregister_creator()

  • IPluginRegistry.get_creator()

  • IPluginRegistry.register_creator()

  • IPluginRegistry.release_plugin_resource()

Removed Python APIs and Replacements#

Warning

The APIs listed below have been removed in TensorRT 10.x and will cause runtime errors if called. Review each entry for its replacement before upgrading.

The following Python APIs have been removed. Each entry shows the removed API and its replacement or migration path.

  • BuilderFlag.ENABLE_TACTIC_HEURISTIC → Use builder optimization level 2

  • BuilderFlag.STRICT_TYPES → Use BuilderFlag.DIRECT_IO, BuilderFlag.PREFER_PRECISION_CONSTRAINTS, and BuilderFlag.REJECT_EMPTY_ALGORITHMS

  • EngineCapability.DEFAULT → EngineCapability.STANDARD

  • EngineCapability.SAFE_DLA → EngineCapability.DLA_STANDALONE

  • EngineCapability.SAFE_GPU → EngineCapability.SAFETY

  • IAlgorithmIOInfo.tensor_format → No direct replacement; strides, data type, and vectorization information are sufficient to uniquely identify a tensor format

  • IBuilder.max_batch_size → No replacement; implicit batch support was removed

  • IBuilderConfig.max_workspace_size → IBuilderConfig.set_memory_pool_limit() / IBuilderConfig.get_memory_pool_limit() with MemoryPoolType.WORKSPACE

  • IBuilderConfig.min_timing_iterations → IBuilderConfig.avg_timing_iterations

  • ICudaEngine.binding_is_input() → ICudaEngine.get_tensor_mode()

  • ICudaEngine.get_binding_bytes_per_component() → ICudaEngine.get_tensor_bytes_per_component()

  • ICudaEngine.get_binding_components_per_element() → ICudaEngine.get_tensor_components_per_element()

  • ICudaEngine.get_binding_dtype() → ICudaEngine.get_tensor_dtype()

  • ICudaEngine.get_binding_format() → ICudaEngine.get_tensor_format()

  • ICudaEngine.get_binding_format_desc() → ICudaEngine.get_tensor_format_desc()

  • ICudaEngine.get_binding_index() → No name-based equivalent replacement

  • ICudaEngine.get_binding_name() → No name-based equivalent replacement

  • ICudaEngine.get_binding_shape() → ICudaEngine.get_tensor_shape()

  • ICudaEngine.get_binding_vectorized_dim() → ICudaEngine.get_tensor_vectorized_dim()

  • ICudaEngine.get_location() → ITensor.location

  • ICudaEngine.get_profile_shape() → ICudaEngine.get_tensor_profile_shape()

  • ICudaEngine.get_profile_shape_input() → ICudaEngine.get_tensor_profile_values()

  • ICudaEngine.has_implicit_batch_dimension → No replacement; implicit batch is no longer supported

  • ICudaEngine.is_execution_binding() → No name-based equivalent replacement

  • ICudaEngine.is_shape_binding() → ICudaEngine.is_shape_inference_io()

  • ICudaEngine.max_batch_size → No replacement; implicit batch is no longer supported

  • ICudaEngine.num_bindings → ICudaEngine.num_io_tensors

  • IExecutionContext.get_binding_shape() → IExecutionContext.get_tensor_shape()

  • IExecutionContext.get_strides() → IExecutionContext.get_tensor_strides()

  • IExecutionContext.set_binding_shape() → IExecutionContext.set_input_shape()

  • IFullyConnectedLayer → IMatrixMultiplyLayer

  • INetworkDefinition.add_convolution() → INetworkDefinition.add_convolution_nd()

  • INetworkDefinition.add_deconvolution() → INetworkDefinition.add_deconvolution_nd()

  • INetworkDefinition.add_fully_connected() → INetworkDefinition.add_matrix_multiply()

  • INetworkDefinition.add_padding() → INetworkDefinition.add_padding_nd()

  • INetworkDefinition.add_pooling() → INetworkDefinition.add_pooling_nd()

  • INetworkDefinition.add_rnn_v2() → INetworkDefinition.add_loop()

  • INetworkDefinition.has_explicit_precision → No replacement; explicit precision support was removed in 10.0

  • INetworkDefinition.has_implicit_batch_dimension → No replacement; implicit batch support was removed

  • IRNNv2Layer → ILoop

  • NetworkDefinitionCreationFlag.EXPLICIT_BATCH → Flag removed in 10.0; explicit batch is now the only mode

  • NetworkDefinitionCreationFlag.EXPLICIT_PRECISION → Flag removed in 10.0

  • PaddingMode.CAFFE_ROUND_DOWN → No replacement; Caffe support was removed

  • PaddingMode.CAFFE_ROUND_UP → No replacement; Caffe support was removed

  • PreviewFeature.DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805 → No replacement; external tactics are always disabled for core code

  • PreviewFeature.FASTER_DYNAMIC_SHAPES_0805 → No replacement; this behavior is on by default

  • ProfilingVerbosity.DEFAULT → ProfilingVerbosity.LAYER_NAMES_ONLY

  • ProfilingVerbosity.VERBOSE → ProfilingVerbosity.DETAILED

  • ResizeMode → InterpolationMode (the alias was removed)

  • SampleMode.DEFAULT → SampleMode.STRICT_BOUNDS

  • SliceMode → SampleMode (the alias was removed)
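
Many of the ICudaEngine removals above collapse into a single pattern: iterate I/O tensors by name instead of by binding index. The sketch below gathers engine introspection using only the 10.x name-based calls; describe_io is an illustrative helper of our own, not a TensorRT API.

```python
def describe_io(engine):
    """Collect (name, mode, shape, dtype) for every I/O tensor.

    Uses the 10.x name-based calls in place of the removed
    binding-index APIs noted in the comments.
    """
    io = []
    for i in range(engine.num_io_tensors):         # was: range(engine.num_bindings)
        name = engine.get_tensor_name(i)
        io.append((
            name,
            engine.get_tensor_mode(name),          # was: engine.binding_is_input(i)
            tuple(engine.get_tensor_shape(name)),  # was: engine.get_binding_shape(i)
            engine.get_tensor_dtype(name),         # was: engine.get_binding_dtype(i)
        ))
    return io
```

Because every call is keyed by tensor name, the same loop works regardless of the order in which inputs and outputs appear in the engine.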