Migrating Python Code from TensorRT 8.x to 10.x#

This page describes how to update Python code when migrating from TensorRT 8.x to 10.x. The sections below pair 8.x and 10.x code for typical tasks: the name-based tensor API (buffers and I/O), enqueueV3 execution, and build_serialized_network for engine builds. Later sections list the Python APIs added and removed in 10.x.

See also

Migration Guide Overview

Landing page with links to all migration surfaces.

Python API Reference

Full Python API documentation.

Python API Tutorial

Python API usage guide with code examples.

Note

The Python API is not supported on QNX.

Migrating I/O Buffer Allocation to Named Tensors#

Before (TensorRT 8.x)#

def allocate_buffers(self, engine):
    '''
    Allocates all buffers required for an engine; that is, host and device inputs and outputs.
    '''
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()

    # binding is the name of an input/output tensor
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))

        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)  # page-locked memory buffer (will not be swapped to disk)
        device_mem = cuda.mem_alloc(host_mem.nbytes)

        # Append the device buffer address to device bindings.
        # When cast to int, it gives the device memory address.
        bindings.append(int(device_mem))

        # Append to the appropriate input/output list.
        if engine.binding_is_input(binding):
            inputs.append(self.HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(self.HostDeviceMem(host_mem, device_mem))

    return inputs, outputs, bindings, stream

After (TensorRT 10.x)#

def allocate_buffers(self, engine):
    '''
    Allocates all buffers required for an engine; that is, host and device inputs and outputs.
    '''
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()

    for i in range(engine.num_io_tensors):
        tensor_name = engine.get_tensor_name(i)
        size = trt.volume(engine.get_tensor_shape(tensor_name))
        dtype = trt.nptype(engine.get_tensor_dtype(tensor_name))

        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)  # page-locked memory buffer (will not be swapped to disk)
        device_mem = cuda.mem_alloc(host_mem.nbytes)

        # Append the device buffer address to device bindings.
        # When cast to int, it gives the device memory address.
        bindings.append(int(device_mem))

        # Append to the appropriate input/output list.
        if engine.get_tensor_mode(tensor_name) == trt.TensorIOMode.INPUT:
            inputs.append(self.HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(self.HostDeviceMem(host_mem, device_mem))

    return inputs, outputs, bindings, stream

Summary of Changes#

  • Changed from binding-based iteration to name-based API using num_io_tensors and get_tensor_name()

  • Replaced binding_is_input() with get_tensor_mode() to check I/O mode

Note

The HostDeviceMem helper class used in the examples above is a simple container that pairs host (CPU) and device (GPU) memory allocations. It is not part of the TensorRT API. A minimal implementation is:

from collections import namedtuple

HostDeviceMem = namedtuple("HostDeviceMem", ["host", "device"])

Migrating from enqueueV2 to enqueueV3 (Python)#

The examples below show TensorRT 8.x first, then TensorRT 10.x, for the same inference task.

Before (TensorRT 8.x)#

# Allocate device memory for inputs.
d_inputs = [cuda.mem_alloc(input_nbytes) for _ in range(input_num)]

# Allocate host and device memory for the output.
h_output = cuda.pagelocked_empty(output_nbytes, dtype=np.float32)
d_output = cuda.mem_alloc(h_output.nbytes)

# Transfer input data from host to device.
cuda.memcpy_htod_async(d_inputs[0], input_a, stream)
cuda.memcpy_htod_async(d_inputs[1], input_b, stream)
cuda.memcpy_htod_async(d_inputs[2], input_c, stream)

# Run inference: all I/O addresses are passed as a bindings list.
context.execute_async_v2(bindings=[int(d_inp) for d_inp in d_inputs] + [int(d_output)], stream_handle=stream.handle)

# Synchronize the stream
stream.synchronize()

Warning

In TensorRT 10.x, enqueueV3 replaces enqueueV2. You must call set_tensor_address for each I/O tensor before execute_async_v3. Passing bindings directly to the execution call is no longer supported.

The After sample shows this pattern.

After (TensorRT 10.x)#

# Allocate device memory for inputs.
d_inputs = [cuda.mem_alloc(input_nbytes) for _ in range(input_num)]

# Allocate host and device memory for the output.
h_output = cuda.pagelocked_empty(output_nbytes, dtype=np.float32)
d_output = cuda.mem_alloc(h_output.nbytes)

# Transfer input data from host to device.
cuda.memcpy_htod_async(d_inputs[0], input_a, stream)
cuda.memcpy_htod_async(d_inputs[1], input_b, stream)
cuda.memcpy_htod_async(d_inputs[2], input_c, stream)

# Set the address of each I/O tensor by name.
# This assumes the engine's I/O tensor order is the inputs first, then the output.
bindings = [int(d_inp) for d_inp in d_inputs] + [int(d_output)]

for i in range(engine.num_io_tensors):
    context.set_tensor_address(engine.get_tensor_name(i), bindings[i])

# Run inference; tensor addresses were set above, so no bindings argument is passed.
context.execute_async_v3(stream_handle=stream.handle)

# Synchronize the stream
stream.synchronize()

Summary of Changes#

  • Added explicit tensor address setup using set_tensor_address() with tensor names

  • Changed from execute_async_v2() to execute_async_v3()

  • The bindings parameter is no longer passed to execute_async_v3(); tensor addresses must be set beforehand
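
The per-tensor address setup can be factored into a small helper so the execution code stays readable. The sketch below is only an illustration of the name-based binding loop; bind_io_tensors and the device_buffers mapping are our own names, not TensorRT API.

```python
def bind_io_tensors(engine, context, device_buffers):
    """Attach a device address to every I/O tensor by name (TensorRT 10.x style).

    device_buffers maps tensor name -> device pointer cast to int.
    A missing entry raises KeyError, which surfaces a forgotten binding
    before execute_async_v3() is called.
    """
    for i in range(engine.num_io_tensors):
        name = engine.get_tensor_name(i)
        context.set_tensor_address(name, device_buffers[name])
```

With such a helper, the execution step reduces to calling it once and then invoking context.execute_async_v3(stream_handle=stream.handle).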

Migrating Engine Builds to build_serialized_network#

The examples below show TensorRT 8.x first, then TensorRT 10.x, for the same engine build path.

Before (TensorRT 8.x)#

engine_bytes = None
try:
    engine_bytes = self.builder.build_serialized_network(self.network, self.config)
except AttributeError:
    # Older TensorRT: fall back to build_engine() and serialize the engine manually.
    engine = self.builder.build_engine(self.network, self.config)
    engine_bytes = engine.serialize()
    del engine
assert engine_bytes

Important

In TensorRT 10.x, build_serialized_network() is the standard build path and is always available. The build_engine() / serialize() fallback is no longer needed. Check for a None return instead of catching AttributeError.

The After sample shows this pattern.

After (TensorRT 10.x)#

engine_bytes = self.builder.build_serialized_network(self.network, self.config)
if engine_bytes is None:
    log.error("Failed to create engine")
    sys.exit(1)

Summary of Changes#

  • The fallback to build_engine() and serialize() is no longer needed in TensorRT 10.x

  • build_serialized_network() is now the standard method and always available

  • Error handling should check for None return value instead of catching AttributeError
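
Once built, the serialized bytes are typically turned back into an engine with trt.Runtime.deserialize_cuda_engine(), which likewise returns None on failure. The wrapper below is a minimal sketch of that check; deserialize_engine is our own name, and the runtime argument is assumed to behave like trt.Runtime.

```python
def deserialize_engine(runtime, engine_bytes):
    """Deserialize engine bytes, failing loudly instead of returning None.

    runtime is expected to expose deserialize_cuda_engine(), which
    trt.Runtime provides and which returns None on failure.
    """
    engine = runtime.deserialize_cuda_engine(engine_bytes)
    if engine is None:
        raise RuntimeError("Failed to deserialize the TensorRT engine")
    return engine
```

This mirrors the build-time pattern above: check for None rather than relying on exceptions from the bindings.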

Python APIs Added in 10.x#

The following Python APIs have been added in TensorRT 10.x to support new features and improved functionality.

Types#

  • APILanguage

  • ExecutionContextAllocationStrategy

  • IGpuAsyncAllocator

  • InterfaceInfo

  • IPluginResource

  • IPluginV3

  • IStreamReader

  • IVersionedInterface

Methods and Properties#

  • ICudaEngine.is_debug_tensor()

  • ICudaEngine.minimum_weight_streaming_budget

  • ICudaEngine.streamable_weights_size

  • ICudaEngine.weight_streaming_budget

  • IExecutionContext.get_debug_listener()

  • IExecutionContext.get_debug_state()

  • IExecutionContext.set_all_tensors_debug_state()

  • IExecutionContext.set_debug_listener()

  • IExecutionContext.set_tensor_debug_state()

  • IExecutionContext.update_device_memory_size_for_shapes()

  • IGpuAllocator.allocate_async()

  • IGpuAllocator.deallocate_async()

  • INetworkDefinition.add_plugin_v3()

  • INetworkDefinition.is_debug_tensor()

  • INetworkDefinition.mark_debug()

  • INetworkDefinition.unmark_debug()

  • IPluginRegistry.acquire_plugin_resource()

  • IPluginRegistry.all_creators

  • IPluginRegistry.deregister_creator()

  • IPluginRegistry.get_creator()

  • IPluginRegistry.register_creator()

  • IPluginRegistry.release_plugin_resource()

Removed Python APIs and Replacements#

Warning

The APIs listed below have been removed in TensorRT 10.x and will cause runtime errors if called. Review each entry for its replacement before upgrading.

The following Python APIs have been removed. Each entry shows the removed API and its replacement or migration path.

  • BuilderFlag.ENABLE_TACTIC_HEURISTIC → Use builder optimization level 2

  • BuilderFlag.STRICT_TYPES → Use BuilderFlag.DIRECT_IO, BuilderFlag.PREFER_PRECISION_CONSTRAINTS, and BuilderFlag.REJECT_EMPTY_ALGORITHMS

  • EngineCapability.DEFAULT → EngineCapability.STANDARD

  • EngineCapability.SAFE_DLA → EngineCapability.DLA_STANDALONE

  • EngineCapability.SAFE_GPU → EngineCapability.SAFETY

  • IAlgorithmIOInfo.tensor_format → No direct replacement; strides, data type, and vectorization information are sufficient to uniquely identify a tensor format

  • IBuilder.max_batch_size → No replacement; implicit batch support was removed

  • IBuilderConfig.max_workspace_size → IBuilderConfig.set_memory_pool_limit() / IBuilderConfig.get_memory_pool_limit() with MemoryPoolType.WORKSPACE

  • IBuilderConfig.min_timing_iterations → IBuilderConfig.avg_timing_iterations

  • ICudaEngine.binding_is_input() → ICudaEngine.get_tensor_mode()

  • ICudaEngine.get_binding_bytes_per_component() → ICudaEngine.get_tensor_bytes_per_component()

  • ICudaEngine.get_binding_components_per_element() → ICudaEngine.get_tensor_components_per_element()

  • ICudaEngine.get_binding_dtype() → ICudaEngine.get_tensor_dtype()

  • ICudaEngine.get_binding_format() → ICudaEngine.get_tensor_format()

  • ICudaEngine.get_binding_format_desc() → ICudaEngine.get_tensor_format_desc()

  • ICudaEngine.get_binding_index() → No name-based equivalent replacement

  • ICudaEngine.get_binding_name() → No name-based equivalent replacement

  • ICudaEngine.get_binding_shape() → ICudaEngine.get_tensor_shape()

  • ICudaEngine.get_binding_vectorized_dim() → ICudaEngine.get_tensor_vectorized_dim()

  • ICudaEngine.get_location() → ITensor.location

  • ICudaEngine.get_profile_shape() → ICudaEngine.get_tensor_profile_shape()

  • ICudaEngine.get_profile_shape_input() → ICudaEngine.get_tensor_profile_values()

  • ICudaEngine.has_implicit_batch_dimension → No replacement; implicit batch is no longer supported

  • ICudaEngine.is_execution_binding() → No name-based equivalent replacement

  • ICudaEngine.is_shape_binding() → ICudaEngine.is_shape_inference_io()

  • ICudaEngine.max_batch_size → No replacement; implicit batch is no longer supported

  • ICudaEngine.num_bindings → ICudaEngine.num_io_tensors

  • IExecutionContext.get_binding_shape() → IExecutionContext.get_tensor_shape()

  • IExecutionContext.get_strides() → IExecutionContext.get_tensor_strides()

  • IExecutionContext.set_binding_shape() → IExecutionContext.set_input_shape()

  • IFullyConnectedLayer → IMatrixMultiplyLayer

  • INetworkDefinition.add_convolution() → INetworkDefinition.add_convolution_nd()

  • INetworkDefinition.add_deconvolution() → INetworkDefinition.add_deconvolution_nd()

  • INetworkDefinition.add_fully_connected() → INetworkDefinition.add_matrix_multiply()

  • INetworkDefinition.add_padding() → INetworkDefinition.add_padding_nd()

  • INetworkDefinition.add_pooling() → INetworkDefinition.add_pooling_nd()

  • INetworkDefinition.add_rnn_v2() → INetworkDefinition.add_loop()

  • INetworkDefinition.has_explicit_precision → No replacement; explicit precision support was removed in 10.0

  • INetworkDefinition.has_implicit_batch_dimension → No replacement; implicit batch support was removed

  • IRNNv2Layer → ILoop

  • NetworkDefinitionCreationFlag.EXPLICIT_BATCH → Flag removed in 10.0; explicit batch is now the only mode

  • NetworkDefinitionCreationFlag.EXPLICIT_PRECISION → Flag removed in 10.0

  • PaddingMode.CAFFE_ROUND_DOWN → No replacement; Caffe support was removed

  • PaddingMode.CAFFE_ROUND_UP → No replacement; Caffe support was removed

  • PreviewFeature.DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805 → No replacement; external tactics are always disabled for core code

  • PreviewFeature.FASTER_DYNAMIC_SHAPES_0805 → No replacement; this behavior is on by default

  • ProfilingVerbosity.DEFAULT → ProfilingVerbosity.LAYER_NAMES_ONLY

  • ProfilingVerbosity.VERBOSE → ProfilingVerbosity.DETAILED

  • ResizeMode → InterpolationMode (the alias was removed)

  • SampleMode.DEFAULT → SampleMode.STRICT_BOUNDS

  • SliceMode → SampleMode (the alias was removed)
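
Many of the ICudaEngine removals above collapse into a single pattern: iterate I/O tensors by name instead of by binding index. The sketch below gathers engine introspection using only the 10.x name-based calls; describe_io is an illustrative helper of our own, not a TensorRT API.

```python
def describe_io(engine):
    """Collect (name, mode, shape, dtype) for every I/O tensor.

    Uses the 10.x name-based calls in place of the removed
    binding-index APIs noted in the comments.
    """
    io = []
    for i in range(engine.num_io_tensors):         # was: range(engine.num_bindings)
        name = engine.get_tensor_name(i)
        io.append((
            name,
            engine.get_tensor_mode(name),          # was: engine.binding_is_input(i)
            tuple(engine.get_tensor_shape(name)),  # was: engine.get_binding_shape(i)
            engine.get_tensor_dtype(name),         # was: engine.get_binding_dtype(i)
        ))
    return io
```

Because every call is keyed by tensor name, the same loop works regardless of the order in which inputs and outputs appear in the engine.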