Migrating Python Code from TensorRT 10.x to 11.x#

This page describes how to update Python code when you migrate from TensorRT 10.x to 11.x: paired examples for strong typing, explicit quantization, plugin migration, and updated runtime APIs, followed by lists of Python APIs added and removed in 11.x.

Note

The Python API is not supported on QNX.

Migrating from Weak Typing to Strong Typing#

TensorRT 11.x removes all precision-enabling builder flags such as BuilderFlag.FP16 and BuilderFlag.INT8. Use ModelOpt AutoCast to convert your ONNX model to mixed precision before building.

Before (TensorRT 10.x)#

 import tensorrt as trt

 logger = trt.Logger(trt.Logger.WARNING)
 builder = trt.Builder(logger)
 network = builder.create_network()
 config = builder.create_builder_config()

 # Weak typing: TensorRT automatically considers FP16 kernels
 config.set_flag(trt.BuilderFlag.FP16)

 parser = trt.OnnxParser(network, logger)
 with open("model.onnx", "rb") as f:
     parser.parse(f.read())

 engine_bytes = builder.build_serialized_network(network, config)

The 11.x version drops the flag. It first converts the FP32 ONNX model to mixed precision with ModelOpt AutoCast, then builds the converted model as a strongly typed network.

After (TensorRT 11.x)#

 import tensorrt as trt
 import modelopt.onnx.autocast as autocast
 import onnx

 # Step 1: Convert FP32 model to mixed-precision FP16 using ModelOpt AutoCast
 converted_model = autocast.convert_to_mixed_precision(
     onnx_path="model.onnx",
     low_precision_type="fp16",
     keep_io_types=True,
 )
 onnx.save(converted_model, "model_fp16.onnx")

 # Step 2: Build with strongly typed network (strong typing is always on in 11.x)
 logger = trt.Logger(trt.Logger.WARNING)
 builder = trt.Builder(logger)
 network = builder.create_network()
 config = builder.create_builder_config()

 parser = trt.OnnxParser(network, logger)
 with open("model_fp16.onnx", "rb") as f:
     parser.parse(f.read())

 engine_bytes = builder.build_serialized_network(network, config)

Summary of Changes#

Removed config.set_flag(trt.BuilderFlag.FP16) and all other precision-enabling builder flags
Added a preprocessing step using ModelOpt AutoCast to convert the model to mixed precision
The STRONGLY_TYPED network definition creation flag is no longer needed - all networks are strongly typed by default
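
The listings in this section call parser.parse() and discard its return value, so a malformed or unsupported model only surfaces later as a failed build. A minimal sketch of a stricter wrapper follows. The name parse_onnx_or_raise is hypothetical, and the sketch assumes the parser exposes parse(), num_errors, and get_error(i).desc() the way trt.OnnxParser does in 10.x.

```python
def parse_onnx_or_raise(parser, model_bytes):
    """Parse ONNX bytes and raise with the parser's messages on failure.

    `parser` is assumed to behave like trt.OnnxParser: parse() returns a
    bool, and num_errors / get_error(i).desc() expose the error list.
    """
    if parser.parse(model_bytes):
        return
    # Collect every recorded parser error into one readable message.
    messages = [parser.get_error(i).desc() for i in range(parser.num_errors)]
    raise RuntimeError("ONNX parse failed: " + "; ".join(messages))
```

In the listings above, this would replace the bare parser.parse(f.read()) call.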

Migrating INT8 Calibration to Explicit Quantization#

TensorRT 11.x removes IInt8Calibrator and all its subclasses. Use ModelOpt or manual Q/DQ nodes for explicit quantization instead of runtime calibration.

Before (TensorRT 10.x)#

 import tensorrt as trt

 class MyCalibrator(trt.IInt8EntropyCalibrator2):
     def __init__(self, data_loader):
         super().__init__()
         self.data_loader = data_loader
         self.batch_iter = iter(data_loader)

     def get_batch_size(self):
         return 1

     def get_batch(self, names):
         try:
             batch = next(self.batch_iter)
             # Copy batch to device and return device pointers
             return [int(batch.data_ptr())]
         except StopIteration:
             return None

     def read_calibration_cache(self):
         return None

     def write_calibration_cache(self, cache):
         pass

 logger = trt.Logger(trt.Logger.WARNING)
 builder = trt.Builder(logger)
 network = builder.create_network()
 config = builder.create_builder_config()

 # Implicit quantization via calibrator
 config.set_flag(trt.BuilderFlag.INT8)
 config.int8_calibrator = MyCalibrator(data_loader)

 parser = trt.OnnxParser(network, logger)
 with open("model.onnx", "rb") as f:
     parser.parse(f.read())

 engine_bytes = builder.build_serialized_network(network, config)

The 11.x version has no calibrator. The model is quantized offline, with ModelOpt or with manually inserted Q/DQ nodes, and the builder consumes an ONNX graph that already contains the Q/DQ nodes.

After (TensorRT 11.x)#

 import tensorrt as trt

 # Step 1: Quantize the model offline using ModelOpt (recommended).
 # --output_path names the file that Step 2 loads.
 # python -m modelopt.onnx.quantization --onnx_path model.onnx --calibration_data data.npz --output_path model_quantized.onnx

 # Step 2: Build the quantized model (Q/DQ nodes are already in the ONNX graph)
 logger = trt.Logger(trt.Logger.WARNING)
 builder = trt.Builder(logger)
 network = builder.create_network()
 config = builder.create_builder_config()

 parser = trt.OnnxParser(network, logger)
 with open("model_quantized.onnx", "rb") as f:
     parser.parse(f.read())

 engine_bytes = builder.build_serialized_network(network, config)

Summary of Changes#

Removed the IInt8Calibrator subclass and all calibration-related code
Removed config.set_flag(trt.BuilderFlag.INT8) and config.int8_calibrator assignment
Quantization is now applied to the model itself (via Q/DQ nodes) before building, not during building

Migrating Plugins from V2 to V3#

The entire V2 plugin family (IPluginV2, IPluginV2Ext, IPluginV2IOExt, IPluginV2DynamicExt, IPluginCreator, IPluginV2Layer, and INetworkDefinition.add_plugin_v2()) has been removed in TensorRT 11.0. V2 plugin code fails to import or build against the TensorRT 11.0 Python bindings. Port existing V2 plugins to the IPluginV3 interface, pair them with an IPluginCreatorV3One creator, and add them to the network through INetworkDefinition.add_plugin_v3().

Before (TensorRT 10.x)#

 import tensorrt as trt

 class MyPluginV2(trt.IPluginV2DynamicExt):
     def __init__(self):
         super().__init__()
         self.plugin_namespace = ""
         self.plugin_type = "MyPlugin"
         self.plugin_version = "1"
         self.num_outputs = 1

     def get_output_dimensions(self, output_index, inputs, exprBuilder):
         return inputs[0]  # Same shape as input

     def configure_plugin(self, inp, out):
         pass

     def supports_format_combination(self, pos, in_out, num_inputs):
         return in_out[pos].format == trt.TensorFormat.LINEAR and \
               in_out[pos].type == trt.DataType.FLOAT

     def enqueue(self, input_desc, output_desc, inputs, outputs, workspace, stream):
         # Kernel execution
         pass

     def clone(self):
         return MyPluginV2()

     def serialize(self):
         return b""

 # Creator
 class MyPluginCreatorV2(trt.IPluginCreator):
     def __init__(self):
         super().__init__()
         self.name = "MyPlugin"
         self.plugin_namespace = ""
         self.plugin_version = "1"
         self.field_names = trt.PluginFieldCollection([])

     def create_plugin(self, name, fc):
         return MyPluginV2()

     def deserialize_plugin(self, name, data):
         return MyPluginV2()

 # Usage
 trt.get_plugin_registry().register_creator(MyPluginCreatorV2())
 plugin = MyPluginV2()
 layer = network.add_plugin_v2([input_tensor], plugin)

After (TensorRT 11.x)#

 import tensorrt as trt

 class MyPluginV3(trt.IPluginV3, trt.IPluginV3OneCore,
                 trt.IPluginV3OneBuild, trt.IPluginV3OneRuntime):
     def __init__(self):
         trt.IPluginV3.__init__(self)
         trt.IPluginV3OneCore.__init__(self)
         trt.IPluginV3OneBuild.__init__(self)
         trt.IPluginV3OneRuntime.__init__(self)
         self.num_outputs = 1
         self.plugin_namespace = ""
         self.plugin_name = "MyPlugin"
         self.plugin_version = "1"

     def get_capability_interface(self, type):
         return self

     def get_output_data_types(self, input_types):
         return [input_types[0]]  # Same type as input

     def get_output_shapes(self, inputs, shape_inputs, exprBuilder):
         # Return a list of DimsExprs, one per output
         output = trt.DimsExprs(len(inputs[0]))
         for i in range(len(inputs[0])):
             output[i] = inputs[0][i]
         return [output]

     def supports_format_combination(self, pos, in_out, num_inputs):
         return in_out[pos].format == trt.TensorFormat.LINEAR and \
               in_out[pos].type == trt.DataType.FLOAT

     def configure_plugin(self, inp, out):
         pass

     def on_shape_change(self, inp, out):
         return 0

     def enqueue(self, input_desc, output_desc, inputs, outputs, workspace, stream):
         # Kernel execution
         pass

     def get_fields_to_serialize(self):
         return trt.PluginFieldCollection([])

     def get_valid_tactics(self):
         return []

     def set_tactic(self, tactic):
         return 0

     def clone(self):
         cloned_plugin = MyPluginV3()
         cloned_plugin.__dict__.update(self.__dict__)
         return cloned_plugin

     def attach_to_context(self, context):
         return self.clone()

 # Creator
 class MyPluginCreatorV3(trt.IPluginCreatorV3One):
     def __init__(self):
         trt.IPluginCreatorV3One.__init__(self)
         self.name = "MyPlugin"
         self.plugin_namespace = ""
         self.plugin_version = "1"
         self.field_names = trt.PluginFieldCollection([])

     def create_plugin(self, name, fc, phase):
         return MyPluginV3()

 # Usage
 trt.get_plugin_registry().register_creator(MyPluginCreatorV3())
 plugin = MyPluginV3()
 layer = network.add_plugin_v3([input_tensor], [], plugin)

Summary of Changes#

Plugin class now inherits from IPluginV3, IPluginV3OneCore, IPluginV3OneBuild, and IPluginV3OneRuntime instead of IPluginV2DynamicExt
Added get_capability_interface() to return the appropriate capability for each phase
get_output_dimensions() replaced by get_output_shapes(), which receives both data inputs and shape inputs and returns a list of DimsExprs
get_output_data_types() is a new required method
serialize() replaced by get_fields_to_serialize(), which returns a PluginFieldCollection
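clone() is still required, and the new attach_to_context() method returns the plugin instance to use for an execution context (the example returns self.clone())
on_shape_change(), get_valid_tactics(), and set_tactic() are new methods implemented by the V3 example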
Creator inherits from IPluginCreatorV3One instead of IPluginCreator; create_plugin() receives an additional phase parameter and deserialize_plugin() is no longer needed
network.add_plugin_v2() replaced by network.add_plugin_v3(), which accepts an additional list of shape inputs

Known Issues When Migrating Plugins#

Empty PluginField initializers can crash V3 dispatch. When a plugin advertises a PluginField whose data is empty (b"" or a zero-length array), the V3 creator dispatch path can crash during build or deserialization. Populate every entry with a non-empty sentinel value, even when it is unused at runtime:

import numpy as np
import tensorrt as trt

# Bad: empty initializer
fields = trt.PluginFieldCollection([
    trt.PluginField("flag", b"", trt.PluginFieldType.INT32),
])

# Good: non-empty sentinel keeps the dispatch path safe
fields = trt.PluginFieldCollection([
    trt.PluginField("flag", np.array([0], dtype=np.int32), trt.PluginFieldType.INT32),
])
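
A small pre-flight check can catch empty initializers before they reach the creator. The sketch below is hypothetical (empty_field_names is not a TensorRT API) and assumes each field exposes .name and .data, as trt.PluginField is understood to do.

```python
def empty_field_names(fields):
    """Return the names of fields whose data payload has zero length.

    `fields` is any iterable of objects exposing .name and .data; this is
    an assumption about how trt.PluginField entries look.
    """
    empty = []
    for field in fields:
        data = field.data
        # Treat a missing payload the same as a zero-length one.
        if data is None or len(data) == 0:
            empty.append(field.name)
    return empty
```

Calling it on the collection before registering the creator and raising when the result is non-empty keeps the failure in Python rather than in the dispatch path.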

Use strongly-typed networks with IPluginV3. Mixing IPluginV3 plugins with weakly-typed networks can hit fusion paths that were not exercised by IPluginV2DynamicExt and trigger crashes. In TensorRT 11.0.0, all precision-enabling builder flags (BuilderFlag.FP16, INT8, BF16, FP8, INT4, FP4) have been removed, so every network you build is strongly typed by default and no action is required for fresh 11.x builds. Authors back-porting V3 plugins to a 10.x build for evaluation must explicitly opt in with builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.STRONGLY_TYPED)); create_network() takes a bit mask, so the enum value is used as a shift amount.
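
For plugin code that has to run on both 10.x and 11.x during evaluation, the opt-in can be gated on the installed version. This is a sketch only: strongly_typed_network_flags is a hypothetical helper, and it assumes trt.__version__ starts with the major version number.

```python
def strongly_typed_network_flags(trt):
    """Return the create_network() bit mask for a strongly typed network.

    `trt` is the imported tensorrt module (or a stand-in with the same
    attributes). For 11.x and later, 0 is returned because every network
    is described above as strongly typed by default.
    """
    major = int(trt.__version__.split(".")[0])
    if major >= 11:
        return 0
    # The enum value is a bit position, so shift 1 by it to form the mask.
    return 1 << int(trt.NetworkDefinitionCreationFlag.STRONGLY_TYPED)
```

Usage would look like network = builder.create_network(strongly_typed_network_flags(trt)).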

Migrating Weight Streaming APIs#

The weight streaming API has been updated in TensorRT 11.x. The weight_streaming_budget property is replaced by weight_streaming_budget_v2, and the minimum_weight_streaming_budget property has been removed; compute a budget from streamable_weights_size and available device memory instead.

Before (TensorRT 10.x)#

 import tensorrt as trt

 runtime = trt.Runtime(trt.Logger(trt.Logger.WARNING))
 engine = runtime.deserialize_cuda_engine(engine_bytes)

 # Old API
 budget = engine.minimum_weight_streaming_budget
 engine.weight_streaming_budget = budget
 current = engine.weight_streaming_budget

After (TensorRT 11.x)#

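 import tensorrt as trt
 import pycuda.autoinit  # noqa: F401 - creates the CUDA context mem_get_info() needs
 import pycuda.driver as cuda

 runtime = trt.Runtime(trt.Logger(trt.Logger.WARNING))
 engine = runtime.deserialize_cuda_engine(engine_bytes)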

 # V2 API
 free, total = cuda.mem_get_info()
 weights_size = engine.streamable_weights_size
 budget = min(free // 2, weights_size // 2)
 engine.weight_streaming_budget_v2 = budget
 current = engine.weight_streaming_budget_v2

Summary of Changes#

weight_streaming_budget replaced by weight_streaming_budget_v2
minimum_weight_streaming_budget removed - compute your own budget based on available memory and streamable_weights_size
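
The budget rule used in the 11.x listing can be factored into a pure function, which makes it easy to unit-test without a GPU. The sketch below is one possible policy, not a recommended value: pick_weight_streaming_budget and its divisor parameter are hypothetical names, and halving both quantities simply mirrors the listing above.

```python
def pick_weight_streaming_budget(free_bytes, streamable_weights_size, divisor=2):
    """Return a budget in bytes, capped by free memory and by weight size.

    Both inputs are divided by `divisor` (2 reproduces the listing above)
    and the smaller result is returned.
    """
    if divisor < 1:
        raise ValueError("divisor must be at least 1")
    if free_bytes < 0 or streamable_weights_size < 0:
        raise ValueError("sizes must be non-negative")
    return min(free_bytes // divisor, streamable_weights_size // divisor)
```

With pycuda, the inputs would come from cuda.mem_get_info()[0] and engine.streamable_weights_size, and the result would be assigned to engine.weight_streaming_budget_v2.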

Migrating Memory Management APIs#

TensorRT 11.x replaces device_memory_size with device_memory_size_v2 (which returns int64) and removes create_execution_context_without_device_memory().

Before (TensorRT 10.x)#

 engine = runtime.deserialize_cuda_engine(engine_bytes)

 # Old API
 mem_size = engine.device_memory_size
 context = engine.create_execution_context_without_device_memory()

After (TensorRT 11.x)#

 engine = runtime.deserialize_cuda_engine(engine_bytes)

 # V2 API (returns int64)
 mem_size = engine.device_memory_size_v2
 context = engine.create_execution_context()

Summary of Changes#

device_memory_size replaced by device_memory_size_v2 (returns int64 instead of size_t)
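create_execution_context_without_device_memory() removed - use create_execution_context(); to keep supplying your own scratch memory, pass the allocation strategy trt.ExecutionContextAllocationStrategy.USER_MANAGED (available since 10.x) and provide the buffer through the device_memory_v2 replacement listed in the table of removed APIs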
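
Code that must run against both 10.x and 11.x engines while the migration is in progress can read the size through a small shim. This is a sketch under the assumption that the v2 property is present whenever it is supported; device_memory_size_compat is a hypothetical name.

```python
def device_memory_size_compat(engine):
    """Read the engine's device memory size on either API generation.

    Prefers device_memory_size_v2 and falls back to the removed
    device_memory_size property only when the v2 property is absent.
    """
    if hasattr(engine, "device_memory_size_v2"):
        return engine.device_memory_size_v2
    return engine.device_memory_size
```

Once every deployment is on 11.x, the shim can be deleted and the v2 property read directly.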

Removed Python APIs and Replacements#

Warning

The APIs listed below have been removed in TensorRT 11.x and will cause runtime errors if called. Review each entry for its replacement before upgrading.

The following Python APIs have been removed. Each entry shows the removed API and its replacement or migration path.

BuilderFlag.FP16: Strong typing with ModelOpt AutoCast
BuilderFlag.INT8: Explicit quantization with Q/DQ nodes
BuilderFlag.FP8: Explicit quantization with Q/DQ nodes
BuilderFlag.BF16: Strong typing with ModelOpt AutoCast
BuilderFlag.INT4: Explicit quantization with Q/DQ nodes
BuilderFlag.FP4: Explicit quantization with Q/DQ nodes
BuilderFlag.OBEY_PRECISION_CONSTRAINTS: Strong typing (always enforced)
BuilderFlag.PREFER_PRECISION_CONSTRAINTS: Strong typing (always enforced)
BuilderFlag.DIRECT_IO: Removed (unneeded)
IBuilderConfig.int8_calibrator: Explicit quantization with Q/DQ nodes
IBuilderConfig.set_calibration_profile(): Explicit quantization with Q/DQ nodes
IBuilderConfig.get_calibration_profile(): Explicit quantization with Q/DQ nodes
IBuilderConfig.set_quantization_flag(): Explicit quantization with Q/DQ nodes
IBuilderConfig.get_quantization_flag(): Explicit quantization with Q/DQ nodes
IBuilderConfig.quantization_flags: Explicit quantization with Q/DQ nodes
IBuilderConfig.clear_quantization_flag(): Explicit quantization with Q/DQ nodes
ICudaEngine.device_memory_size: ICudaEngine.device_memory_size_v2
ICudaEngine.device_memory_size_for_profile(): ICudaEngine.device_memory_size_for_profile_v2()
ICudaEngine.has_implicit_batch_dimension: Removed (always False)
ICudaEngine.weight_streaming_budget: ICudaEngine.weight_streaming_budget_v2
ICudaEngine.minimum_weight_streaming_budget: Compute from streamable_weights_size and available memory
ICudaEngine.create_execution_context_without_device_memory(): ICudaEngine.create_execution_context()
ICudaEngine.get_profile_tensor_values(): ICudaEngine.get_profile_tensor_values_v2()
IExecutionContext.all_input_shapes_specified: Removed (always True)
IExecutionContext.device_memory: IExecutionContext.device_memory_v2
IInt8Calibrator (all subclasses): Explicit quantization with Q/DQ nodes
ILayer.precision: Strong typing (set types on tensors directly)
ILayer.precision_is_set: Removed
ILayer.reset_precision(): Removed
ILayer.set_output_type(): Strong typing (set types on tensors directly)
ILayer.output_type_is_set(): Removed
ILayer.reset_output_type(): Removed
INormalizationLayer.compute_precision: Removed (use strong typing)
INetworkDefinition.add_plugin_v2(): INetworkDefinition.add_plugin_v3()
INetworkDefinition.add_normalization(): INetworkDefinition.add_normalization_v2()
IPluginV2DynamicExt: IPluginV3
IPluginCreator: IPluginCreatorV3One
IPluginRegistry.register_creator() (old overload): IPluginRegistry.register_creator() (accepts IPluginCreatorInterface)
IPluginRegistry.plugin_creator_list: IPluginRegistry.all_creators
IPluginRegistry.get_plugin_creator(): IPluginRegistry.get_creator()
IRefitter.set_dynamic_range(): Explicit quantization with Q/DQ nodes
IRefitter.get_dynamic_range_min(): Explicit quantization with Q/DQ nodes
IRefitter.get_dynamic_range_max(): Explicit quantization with Q/DQ nodes
IRefitter.get_tensors_with_dynamic_range(): Explicit quantization with Q/DQ nodes
ITensor.set_type(): Strong typing (type determined by network construction)
ITensor.dynamic_range: Explicit quantization with Q/DQ nodes
ITensor.is_dynamic_range_set: Removed
ITensor.reset_dynamic_range(): Removed
TacticSource.CUBLAS: Removed
TacticSource.CUBLAS_LT: Removed
TacticSource.CUDNN: Removed
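
Before upgrading, it can help to scan a codebase for the names in this list. The sketch below is a plain text search, not a TensorRT tool: REMOVED_APIS holds only a hand-picked subset of the entries above, find_removed_apis is a hypothetical helper, and matches inside comments or strings are reported as well.

```python
import re

# Subset of the removed-API list above; extend it with the entries you need.
REMOVED_APIS = {
    "BuilderFlag.FP16": "strong typing with ModelOpt AutoCast",
    "BuilderFlag.INT8": "explicit quantization with Q/DQ nodes",
    "int8_calibrator": "explicit quantization with Q/DQ nodes",
    "add_plugin_v2": "add_plugin_v3",
    "IPluginV2DynamicExt": "IPluginV3",
    "device_memory_size": "device_memory_size_v2",
    "weight_streaming_budget": "weight_streaming_budget_v2",
    "minimum_weight_streaming_budget": "compute from streamable_weights_size",
    "create_execution_context_without_device_memory": "create_execution_context",
}


def find_removed_apis(source):
    """Return (line_number, api, replacement) for each removed name found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for api, replacement in REMOVED_APIS.items():
            # Word boundaries keep device_memory_size from matching
            # device_memory_size_v2, which is the supported replacement.
            if re.search(r"\b" + re.escape(api) + r"\b", line):
                hits.append((lineno, api, replacement))
    return hits
```

Running it over each .py file and printing the hits gives a checklist of call sites to migrate.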