Appendix: Migrating Python Code from TensorRT 8.x to 10.x#

This page describes how to update Python code when you migrate from TensorRT 8.x to 10.x. It gives paired before-and-after examples for the name-based tensor API (buffers and I/O), execution with execute_async_v3() (the Python counterpart of enqueueV3), and engine builds with build_serialized_network(), followed by lists of the Python APIs added and removed in 10.x.

Note

The Python API is not supported on QNX.

Migrating I/O Buffer Allocation to Named Tensors#

TensorRT 10.x replaces the binding-based API with a name-based tensor API. Use num_io_tensors and get_tensor_name() to iterate over I/O tensors, and get_tensor_mode() to check whether a tensor is an input or output.

Before (TensorRT 8.x)

def allocate_buffers(self, engine):
    '''
    Allocates all buffers required for an engine; that is, host and device inputs and outputs.
    '''
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()

    # Iterating over the engine yields the binding names (inputs and outputs).
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))

        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype) # page-locked memory buffer (won't be swapped to disk)
        device_mem = cuda.mem_alloc(host_mem.nbytes)

        # Append the device buffer address to device bindings.
        # When cast to int, it's a linear index into the context's memory (like memory address).
        bindings.append(int(device_mem))

        # Append to the appropriate input/output list.
        if engine.binding_is_input(binding):
            inputs.append(self.HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(self.HostDeviceMem(host_mem, device_mem))

    return inputs, outputs, bindings, stream

After (TensorRT 10.x)

def allocate_buffers(self, engine):
    '''
    Allocates all buffers required for an engine; that is, host and device inputs and outputs.
    '''
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()

    for i in range(engine.num_io_tensors):
        tensor_name = engine.get_tensor_name(i)
        size = trt.volume(engine.get_tensor_shape(tensor_name))
        dtype = trt.nptype(engine.get_tensor_dtype(tensor_name))

        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype) # page-locked memory buffer (won't be swapped to disk)
        device_mem = cuda.mem_alloc(host_mem.nbytes)

        # Append the device buffer address to device bindings.
        # When cast to int, it's a linear index into the context's memory (like memory address).
        bindings.append(int(device_mem))

        # Append to the appropriate input/output list.
        if engine.get_tensor_mode(tensor_name) == trt.TensorIOMode.INPUT:
            inputs.append(self.HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(self.HostDeviceMem(host_mem, device_mem))

    return inputs, outputs, bindings, stream

Summary of Changes#

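Changed from binding-based iteration to the name-based API using num_io_tensors and get_tensor_name()
Replaced get_binding_shape() and get_binding_dtype() with get_tensor_shape() and get_tensor_dtype()
Replaced binding_is_input() with get_tensor_mode() to check I/O mode
Dropped the engine.max_batch_size multiplier, because implicit batch is no longer supported in TensorRT 10.x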

Note

The HostDeviceMem helper class used in the examples above is a simple container that pairs host (CPU) and device (GPU) memory allocations. It is not part of the TensorRT API. A minimal implementation is:

from collections import namedtuple

HostDeviceMem = namedtuple("HostDeviceMem", ["host", "device"])
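
The name-based iteration described in this section can also be tried without a GPU. The following is a minimal sketch, not part of the TensorRT samples: `describe_io_tensors`, `TensorInfo`, and `FakeEngine` are hypothetical names introduced here, and `FakeEngine` only imitates the four engine members the loop relies on (`num_io_tensors`, `get_tensor_name()`, `get_tensor_shape()`, `get_tensor_mode()`). With a real engine you would pass `trt.TensorIOMode.INPUT` as `input_mode`.

```python
from collections import namedtuple

# Hypothetical record type for this sketch; not part of the TensorRT API.
TensorInfo = namedtuple("TensorInfo", ["name", "shape", "is_input"])


def describe_io_tensors(engine, input_mode):
    """List every I/O tensor of `engine` using only the name-based calls.

    `input_mode` is the value that marks an input tensor; with a real
    TensorRT engine this would be trt.TensorIOMode.INPUT.
    """
    infos = []
    for i in range(engine.num_io_tensors):
        name = engine.get_tensor_name(i)
        shape = tuple(engine.get_tensor_shape(name))
        is_input = engine.get_tensor_mode(name) == input_mode
        infos.append(TensorInfo(name, shape, is_input))
    return infos


class FakeEngine:
    """Stand-in for an engine (assumption: demonstration only)."""

    def __init__(self, tensors):
        # tensors: list of (name, shape, mode) tuples in engine order.
        self._tensors = tensors
        self._by_name = {t[0]: t for t in tensors}

    @property
    def num_io_tensors(self):
        return len(self._tensors)

    def get_tensor_name(self, index):
        return self._tensors[index][0]

    def get_tensor_shape(self, name):
        return self._by_name[name][1]

    def get_tensor_mode(self, name):
        return self._by_name[name][2]


engine = FakeEngine([("input_a", (1, 3, 224, 224), "INPUT"),
                     ("scores", (1, 1000), "OUTPUT")])
io_tensors = describe_io_tensors(engine, input_mode="INPUT")
for info in io_tensors:
    print(info)
```

Keeping the loop in a helper that only touches these four members makes it easy to check migrated code against a stub before running it on a device.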

Migrating from `enqueueV2` to `enqueueV3` (Python)#

The examples below show TensorRT 8.x first, then TensorRT 10.x, for the same inference task. In TensorRT 10.x, enqueueV3 replaces enqueueV2; in Python, this means execute_async_v3() replaces execute_async_v2(). Call set_tensor_address() for each I/O tensor before calling execute_async_v3(), as shown in the After example.

Before (TensorRT 8.x)

# Allocate device memory for inputs.
d_inputs = [cuda.mem_alloc(input_nbytes) for binding in range(input_num)]

# Allocate page-locked host memory and device memory for the output.
h_output = cuda.pagelocked_empty(output_nbytes, dtype=np.float32)
d_output = cuda.mem_alloc(h_output.nbytes)

# Transfer data from host to device.
cuda.memcpy_htod_async(d_inputs[0], input_a, stream)
cuda.memcpy_htod_async(d_inputs[1], input_b, stream)
cuda.memcpy_htod_async(d_inputs[2], input_c, stream)

# Run inference
context.execute_async_v2(bindings=[int(d_inp) for d_inp in d_inputs] + [int(d_output)], stream_handle=stream.handle)

# Synchronize the stream
stream.synchronize()

After (TensorRT 10.x)

# Allocate device memory for inputs.
d_inputs = [cuda.mem_alloc(input_nbytes) for binding in range(input_num)]

# Allocate page-locked host memory and device memory for the output.
h_output = cuda.pagelocked_empty(output_nbytes, dtype=np.float32)
d_output = cuda.mem_alloc(h_output.nbytes)

# Transfer data from host to device.
cuda.memcpy_htod_async(d_inputs[0], input_a, stream)
cuda.memcpy_htod_async(d_inputs[1], input_b, stream)
cuda.memcpy_htod_async(d_inputs[2], input_c, stream)

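# Set the tensor addresses. The positional mapping below assumes the engine
# lists its I/O tensors as the inputs followed by the output.
bindings = [int(d_inp) for d_inp in d_inputs] + [int(d_output)]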

for i in range(engine.num_io_tensors):
    context.set_tensor_address(engine.get_tensor_name(i), bindings[i])

# Run inference
context.execute_async_v3(stream_handle=stream.handle)

# Synchronize the stream
stream.synchronize()

Summary of Changes#

Added explicit tensor address setup using set_tensor_address() with tensor names
Changed from execute_async_v2() to execute_async_v3()
The bindings parameter is no longer passed to execute_async_v3(); tensor addresses must be set beforehand
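
Because addresses are now attached to tensor names, they do not have to be passed in engine order. The sketch below is one possible way to set them from a name-to-pointer dictionary and to fail early when a tensor has no address. `bind_tensor_addresses`, `FakeEngine`, and `FakeContext` are hypothetical names for illustration; the fake classes only imitate `num_io_tensors`, `get_tensor_name()`, and `set_tensor_address()`, and the integers stand in for `int(cuda.mem_alloc(...))` device pointers.

```python
def bind_tensor_addresses(engine, context, addresses):
    """Set one device address per I/O tensor, looked up by tensor name.

    `addresses` maps a tensor name to an integer device pointer.
    Returns the tensor names in engine order.
    """
    names = [engine.get_tensor_name(i) for i in range(engine.num_io_tensors)]
    missing = [name for name in names if name not in addresses]
    if missing:
        raise KeyError(f"no device address given for: {missing}")
    for name in names:
        # set_tensor_address() reports success as a boolean.
        if not context.set_tensor_address(name, int(addresses[name])):
            raise RuntimeError(f"set_tensor_address failed for {name!r}")
    return names


class FakeEngine:
    """Stand-in for an engine (assumption: demonstration only)."""

    def __init__(self, names):
        self._names = names

    @property
    def num_io_tensors(self):
        return len(self._names)

    def get_tensor_name(self, index):
        return self._names[index]


class FakeContext:
    """Stand-in for an execution context that records the addresses it is given."""

    def __init__(self):
        self.addresses = {}

    def set_tensor_address(self, name, ptr):
        self.addresses[name] = ptr
        return True


engine = FakeEngine(["input_a", "input_b", "input_c", "output"])
context = FakeContext()
bound = bind_tensor_addresses(
    engine,
    context,
    {"output": 4000, "input_a": 1000, "input_b": 2000, "input_c": 3000},
)
print(bound)
```

Looking addresses up by name avoids relying on the order in which the engine happens to enumerate its inputs and outputs.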

Migrating Engine Builds to `build_serialized_network`#

The examples below show TensorRT 8.x first, then TensorRT 10.x, for the same engine build path. In TensorRT 10.x, build_serialized_network() is the standard build path and is always available. Omit the build_engine() / serialize() fallback from the 8.x sample, and check for a None return value instead of catching AttributeError, as shown in the After example.

Before (TensorRT 8.x)

engine_bytes = None
try:
    engine_bytes = self.builder.build_serialized_network(self.network, self.config)
except AttributeError:
    engine = self.builder.build_engine(self.network, self.config)
    engine_bytes = engine.serialize()
    del engine
assert engine_bytes

After (TensorRT 10.x)

engine_bytes = self.builder.build_serialized_network(self.network, self.config)
if engine_bytes is None:
    log.error("Failed to create engine")
    sys.exit(1)

Summary of Changes#

The fallback to build_engine() and serialize() is no longer needed in TensorRT 10.x
build_serialized_network() is now the standard method and always available
Error handling should check for None return value instead of catching AttributeError
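
If the calling code should not exit the process on failure, the None check can be wrapped in a small helper that raises instead. This is a minimal sketch under stated assumptions: `build_engine_bytes`, `save_engine`, and `FakeBuilder` are hypothetical names, and `FakeBuilder` only imitates the `build_serialized_network()` call so that the example runs without TensorRT installed.

```python
import os
import tempfile


def build_engine_bytes(builder, network, config):
    """Build a serialized engine and fail loudly if the build returns None."""
    engine_bytes = builder.build_serialized_network(network, config)
    if engine_bytes is None:
        raise RuntimeError("build_serialized_network() returned None")
    return engine_bytes


def save_engine(engine_bytes, path):
    """Write the serialized engine to `path` in binary mode."""
    with open(path, "wb") as f:
        f.write(engine_bytes)


class FakeBuilder:
    """Stand-in for a builder (assumption: demonstration only)."""

    def __init__(self, result):
        self._result = result

    def build_serialized_network(self, network, config):
        return self._result


# Successful build: the bytes are returned and can be written to disk.
plan = build_engine_bytes(FakeBuilder(b"serialized-engine"), network=None, config=None)
plan_path = os.path.join(tempfile.mkdtemp(), "model.engine")
save_engine(plan, plan_path)

# Failed build: None is turned into an exception at the point of failure.
try:
    build_engine_bytes(FakeBuilder(None), network=None, config=None)
    build_failed = False
except RuntimeError:
    build_failed = True
print(build_failed)
```

Raising an exception keeps the decision about logging and exiting with the caller, which may suit library code better than sys.exit().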

Python APIs Added in 10.x#

The following Python APIs have been added in TensorRT 10.x to support new features and improved functionality.

Types#

ExecutionContextAllocationStrategy
IGpuAsyncAllocator
InterfaceInfo
IPluginResource
IPluginV3
IStreamReader
IVersionedInterface

Methods and Properties#

ICudaEngine.is_debug_tensor()
ICudaEngine.minimum_weight_streaming_budget
ICudaEngine.streamable_weights_size
ICudaEngine.weight_streaming_budget
IExecutionContext.get_debug_listener()
IExecutionContext.get_debug_state()
IExecutionContext.set_all_tensors_debug_state()
IExecutionContext.set_debug_listener()
IExecutionContext.set_tensor_debug_state()
IExecutionContext.update_device_memory_size_for_shapes()
IGpuAllocator.allocate_async()
IGpuAllocator.deallocate_async()
INetworkDefinition.add_plugin_v3()
INetworkDefinition.is_debug_tensor()
INetworkDefinition.mark_debug()
INetworkDefinition.unmark_debug()
IPluginRegistry.acquire_plugin_resource()
IPluginRegistry.all_creators
IPluginRegistry.deregister_creator()
IPluginRegistry.get_creator()
IPluginRegistry.register_creator()
IPluginRegistry.release_plugin_resource()

Removed Python APIs and Replacements#

Warning

The APIs listed below have been removed in TensorRT 10.x and will cause runtime errors if called. Review each entry for its replacement before upgrading.

Removed API	Replacement
`BuilderFlag.ENABLE_TACTIC_HEURISTIC`	Builder optimization level 2.
`BuilderFlag.STRICT_TYPES`	Use `BuilderFlag.DIRECT_IO`, `BuilderFlag.PREFER_PRECISION_CONSTRAINTS`, and `BuilderFlag.REJECT_EMPTY_ALGORITHMS`.
`EngineCapability.DEFAULT`	`EngineCapability.STANDARD`
`EngineCapability.SAFE_DLA`	`EngineCapability.DLA_STANDALONE`
`EngineCapability.SAFE_GPU`	`EngineCapability.SAFETY`
`IAlgorithmIOInfo.tensor_format`	Strides, data type, and vectorization information are sufficient to identify tensor formats uniquely.
`IBuilder.max_batch_size`	Implicit batch support was removed.
`IBuilderConfig.max_workspace_size`	`IBuilderConfig.set_memory_pool_limit()` or `get_memory_pool_limit()` with `MemoryPoolType.WORKSPACE`
`IBuilderConfig.min_timing_iterations`	`IBuilderConfig.avg_timing_iterations`
`ICudaEngine.binding_is_input()`	`ICudaEngine.get_tensor_mode()`
`ICudaEngine.get_binding_bytes_per_component()`	`ICudaEngine.get_tensor_bytes_per_component()`
`ICudaEngine.get_binding_components_per_element()`	`ICudaEngine.get_tensor_components_per_element()`
`ICudaEngine.get_binding_dtype()`	`ICudaEngine.get_tensor_dtype()`
`ICudaEngine.get_binding_format()`	`ICudaEngine.get_tensor_format()`
`ICudaEngine.get_binding_format_desc()`	`ICudaEngine.get_tensor_format_desc()`
`ICudaEngine.get_binding_index()`	No name-based equivalent replacement.
`ICudaEngine.get_binding_name()`	No name-based equivalent replacement.
`ICudaEngine.get_binding_shape()`	`ICudaEngine.get_tensor_shape()`
`ICudaEngine.get_binding_vectorized_dim()`	`ICudaEngine.get_tensor_vectorized_dim()`
`ICudaEngine.get_location()`	`ITensor.location`
`ICudaEngine.get_profile_shape()`	`ICudaEngine.get_tensor_profile_shape()`
`ICudaEngine.get_profile_shape_input()`	`ICudaEngine.get_tensor_profile_values()`
`ICudaEngine.has_implicit_batch_dimension()`	Implicit batch is no longer supported.
`ICudaEngine.is_execution_binding()`	No name-based equivalent replacement.
`ICudaEngine.is_shape_binding()`	`ICudaEngine.is_shape_inference_io()`
`ICudaEngine.max_batch_size`	Implicit batch is no longer supported.
`ICudaEngine.num_bindings`	`ICudaEngine.num_io_tensors`
`IExecutionContext.get_binding_shape()`	`IExecutionContext.get_tensor_shape()`
`IExecutionContext.get_strides()`	`IExecutionContext.get_tensor_strides()`
`IExecutionContext.set_binding_shape()`	`IExecutionContext.set_input_shape()`
`IFullyConnectedLayer`	`IMatrixMultiplyLayer`
`INetworkDefinition.add_convolution()`	`INetworkDefinition.add_convolution_nd()`
`INetworkDefinition.add_deconvolution()`	`INetworkDefinition.add_deconvolution_nd()`
`INetworkDefinition.add_fully_connected()`	`INetworkDefinition.add_matrix_multiply()`
`INetworkDefinition.add_padding()`	`INetworkDefinition.add_padding_nd()`
`INetworkDefinition.add_pooling()`	`INetworkDefinition.add_pooling_nd()`
`INetworkDefinition.add_rnn_v2()`	`INetworkDefinition.add_loop()`
`INetworkDefinition.has_explicit_precision`	Explicit precision support was removed in 10.0.
`INetworkDefinition.has_implicit_batch_dimension`	Implicit batch support was removed.
`IRNNv2Layer`	`ILoop`
`NetworkDefinitionCreationFlag.EXPLICIT_BATCH`	Support was removed in 10.0.
`NetworkDefinitionCreationFlag.EXPLICIT_PRECISION`	Support was removed in 10.0.
`PaddingMode.CAFFE_ROUND_DOWN`	Caffe support was removed.
`PaddingMode.CAFFE_ROUND_UP`	Caffe support was removed.
`PreviewFeature.DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805`	External tactics are always disabled for core code.
`PreviewFeature.FASTER_DYNAMIC_SHAPES_0805`	This flag is on by default.
`ProfilingVerbosity.DEFAULT`	`ProfilingVerbosity.LAYER_NAMES_ONLY`
`ProfilingVerbosity.VERBOSE`	`ProfilingVerbosity.DETAILED`
`ResizeMode`	Use `InterpolationMode`. Alias was removed.
`SampleMode.DEFAULT`	`SampleMode.STRICT_BOUNDS`
`SliceMode`	Use `SampleMode`. Alias was removed.
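
Before upgrading, it can help to search existing code for names from the table above. The following is a rough sketch of such a check using only the standard library. `REMOVED_APIS` copies a handful of rows from the table, and `find_removed_apis` is a hypothetical helper that matches attribute names textually, so it can report false positives (for example, an unrelated object that happens to have a `max_batch_size` attribute).

```python
import re

# A subset of the table above: removed attribute name -> replacement hint.
REMOVED_APIS = {
    "binding_is_input": "ICudaEngine.get_tensor_mode()",
    "get_binding_shape": "get_tensor_shape()",
    "get_binding_dtype": "ICudaEngine.get_tensor_dtype()",
    "set_binding_shape": "IExecutionContext.set_input_shape()",
    "max_workspace_size": "IBuilderConfig.set_memory_pool_limit() with MemoryPoolType.WORKSPACE",
    "max_batch_size": "no replacement; implicit batch support was removed",
    "add_fully_connected": "INetworkDefinition.add_matrix_multiply()",
}

# Match ".name" only as a whole attribute, so ".get_binding_shape_x" is ignored.
_PATTERN = re.compile(r"\.(" + "|".join(map(re.escape, REMOVED_APIS)) + r")\b")


def find_removed_apis(source):
    """Return (line number, removed name, replacement hint) for each match."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            name = match.group(1)
            findings.append((lineno, name, REMOVED_APIS[name]))
    return findings


# A few lines in the style of the TensorRT 8.x sample from this page.
sample = "\n".join([
    "size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size",
    "if engine.binding_is_input(binding):",
    "    pass",
    "config.max_workspace_size = 1 << 30",
])

findings = find_removed_apis(sample)
for lineno, name, hint in findings:
    print(f"line {lineno}: {name} -> {hint}")
```

A textual scan like this is only a first pass; each reported line still needs to be reviewed against the full table.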
