Migrating Python Code from TensorRT 10.x to 11.x#
This page describes how to update Python code when you migrate from TensorRT 10.x to 11.x: paired examples for strong typing, explicit quantization, plugin migration, and updated runtime APIs, followed by lists of Python APIs added and removed in 11.x.
Note
The Python API is not supported on QNX.
Migrating from Weak Typing to Strong Typing#
TensorRT 11.x removes all precision-enabling builder flags such as BuilderFlag.FP16 and BuilderFlag.INT8. Use ModelOpt AutoCast to convert your ONNX model to mixed precision before building.
Before (TensorRT 10.x)#
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network()
config = builder.create_builder_config()

# Weak typing: TensorRT automatically considers FP16 kernels
config.set_flag(trt.BuilderFlag.FP16)

parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    parser.parse(f.read())

engine_bytes = builder.build_serialized_network(network, config)
In TensorRT 11.x, the BuilderFlag.FP16 flag has been removed along with all other weak typing flags. Use ModelOpt AutoCast to convert your FP32 ONNX model to mixed precision before building, then build with a strongly typed network.
After (TensorRT 11.x)#
import tensorrt as trt
import modelopt.onnx.autocast as autocast
import onnx

# Step 1: Convert FP32 model to mixed-precision FP16 using ModelOpt AutoCast
converted_model = autocast.convert_to_mixed_precision(
    onnx_path="model.onnx",
    low_precision_type="fp16",
    keep_io_types=True,
)
onnx.save(converted_model, "model_fp16.onnx")

# Step 2: Build with strongly typed network (strong typing is always on in 11.x)
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network()
config = builder.create_builder_config()

parser = trt.OnnxParser(network, logger)
with open("model_fp16.onnx", "rb") as f:
    parser.parse(f.read())

engine_bytes = builder.build_serialized_network(network, config)
Summary of Changes#
- Removed config.set_flag(trt.BuilderFlag.FP16) and all other precision-enabling builder flags
- Added a preprocessing step using ModelOpt AutoCast to convert the model to mixed precision
- The STRONGLY_TYPED network definition creation flag is no longer needed; all networks are strongly typed by default
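Code that has to run against both 10.x and 11.x bindings during a transition can guard the removed flag instead of referencing it directly. The following is a minimal sketch, not an official pattern: the helper name `enable_fp16_if_supported` is hypothetical, and it assumes a removed flag is simply absent as an attribute of `BuilderFlag`. Stand-in objects replace the real `tensorrt` module so the sketch runs without TensorRT installed.

```python
from types import SimpleNamespace


def enable_fp16_if_supported(trt_module, config):
    """Set BuilderFlag.FP16 only when the installed bindings still define it.

    Returns True when the flag was set (10.x-style weak typing) and False
    when the flag is absent, in which case the ONNX model is expected to
    have been converted to mixed precision before building.
    """
    flag = getattr(trt_module.BuilderFlag, "FP16", None)
    if flag is None:
        return False
    config.set_flag(flag)
    return True


class _FakeConfig:
    """Stand-in for IBuilderConfig that only records the flags it receives."""

    def __init__(self):
        self.flags = []

    def set_flag(self, flag):
        self.flags.append(flag)


# Hypothetical stand-ins for the 10.x and 11.x modules (assumption, not real bindings).
fake_trt_10 = SimpleNamespace(BuilderFlag=SimpleNamespace(FP16="FP16"))
fake_trt_11 = SimpleNamespace(BuilderFlag=SimpleNamespace())

legacy_config = _FakeConfig()
used_weak_typing_10 = enable_fp16_if_supported(fake_trt_10, legacy_config)
used_weak_typing_11 = enable_fp16_if_supported(fake_trt_11, _FakeConfig())
```

With the real bindings you would pass the imported `tensorrt` module and the object returned by `builder.create_builder_config()`.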
Migrating INT8 Calibration to Explicit Quantization#
TensorRT 11.x removes IInt8Calibrator and all its subclasses. Use ModelOpt or manual Q/DQ nodes for explicit quantization instead of runtime calibration.
Before (TensorRT 10.x)#
import tensorrt as trt

class MyCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, data_loader):
        super().__init__()
        self.data_loader = data_loader
        self.batch_iter = iter(data_loader)

    def get_batch_size(self):
        return 1

    def get_batch(self, names):
        try:
            batch = next(self.batch_iter)
            # Copy batch to device and return device pointers
            return [int(batch.data_ptr())]
        except StopIteration:
            return None

    def read_calibration_cache(self):
        return None

    def write_calibration_cache(self, cache):
        pass

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network()
config = builder.create_builder_config()

# Implicit quantization via calibrator
config.set_flag(trt.BuilderFlag.INT8)
config.int8_calibrator = MyCalibrator(data_loader)

parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    parser.parse(f.read())

engine_bytes = builder.build_serialized_network(network, config)
In TensorRT 11.x, IInt8Calibrator and all its subclasses have been removed. Use ModelOpt or manual Q/DQ nodes for explicit quantization.
After (TensorRT 11.x)#
import tensorrt as trt

# Step 1: Quantize the model offline using ModelOpt (recommended)
# python -m modelopt.onnx.quantization --onnx_path model.onnx --calibration_data data.npz

# Step 2: Build the quantized model (Q/DQ nodes are already in the ONNX graph)
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network()
config = builder.create_builder_config()

parser = trt.OnnxParser(network, logger)
with open("model_quantized.onnx", "rb") as f:
    parser.parse(f.read())

engine_bytes = builder.build_serialized_network(network, config)
Summary of Changes#
- Removed the IInt8Calibrator subclass and all calibration-related code
- Removed config.set_flag(trt.BuilderFlag.INT8) and config.int8_calibrator assignment
- Quantization is now applied to the model itself (via Q/DQ nodes) before building, not during building
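If the offline quantization step is driven from a Python build script rather than typed by hand, the command from Step 1 can be assembled as an argument list. This is only a sketch: the helper name `modelopt_quantize_command` is hypothetical, it uses only the two options shown in the example above, and the name of the output file ModelOpt writes is not specified here.

```python
import shlex
import sys


def modelopt_quantize_command(onnx_path, calibration_data):
    """Build the argv list for the ModelOpt CLI call shown in Step 1."""
    return [
        sys.executable, "-m", "modelopt.onnx.quantization",
        "--onnx_path", onnx_path,
        "--calibration_data", calibration_data,
    ]


command = modelopt_quantize_command("model.onnx", "data.npz")

# Human-readable form for logging; the interpreter path is skipped.
printable = shlex.join(command[1:])

# To actually run the step (requires ModelOpt to be installed):
#   import subprocess
#   subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.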
Migrating Plugins from V2 to V3#
The entire V2 plugin family — IPluginV2, IPluginV2Ext, IPluginV2IOExt, IPluginV2DynamicExt, IPluginCreator, IPluginV2Layer, and INetworkDefinition.add_plugin_v2() — has been removed in TensorRT 11.0. V2 plugin code will fail to import or build against the TensorRT 11.0 Python bindings; existing V2 plugins must be ported to the IPluginV3 interface with IPluginCreatorV3One and added through INetworkDefinition.add_plugin_v3().
See also
- Side-by-Side V2 ↔ V3 API Mapping
  Method-by-method mapping table grouped by lifecycle phase (core, build, runtime, serialization, network attachment).
- Known Migration Issues
  Known issues encountered when porting V2 plugins, including the empty PluginField initializer crash and the strongly-typed network requirement.
- Performance: Resolving V2 → V3 Regressions
  Checklist for resolving performance regressions after migrating a plugin from IPluginV2DynamicExt to IPluginV3.
Before (TensorRT 10.x)#
import tensorrt as trt

class MyPluginV2(trt.IPluginV2DynamicExt):
    def __init__(self):
        super().__init__()
        self.plugin_namespace = ""
        self.plugin_type = "MyPlugin"
        self.plugin_version = "1"
        self.num_outputs = 1

    def get_output_dimensions(self, output_index, inputs, exprBuilder):
        return inputs[0]  # Same shape as input

    def configure_plugin(self, inp, out):
        pass

    def supports_format_combination(self, pos, in_out, num_inputs):
        return in_out[pos].format == trt.TensorFormat.LINEAR and \
            in_out[pos].type == trt.DataType.FLOAT

    def enqueue(self, input_desc, output_desc, inputs, outputs, workspace, stream):
        # Kernel execution
        pass

    def clone(self):
        return MyPluginV2()

    def serialize(self):
        return b""

# Creator
class MyPluginCreatorV2(trt.IPluginCreator):
    def __init__(self):
        super().__init__()
        self.name = "MyPlugin"
        self.plugin_namespace = ""
        self.plugin_version = "1"
        self.field_names = trt.PluginFieldCollection([])

    def create_plugin(self, name, fc):
        return MyPluginV2()

    def deserialize_plugin(self, name, data):
        return MyPluginV2()

# Usage
trt.get_plugin_registry().register_creator(MyPluginCreatorV2())
plugin = MyPluginV2()
layer = network.add_plugin_v2([input_tensor], plugin)
After (TensorRT 11.x)#
import tensorrt as trt

class MyPluginV3(trt.IPluginV3, trt.IPluginV3OneCore,
                 trt.IPluginV3OneBuild, trt.IPluginV3OneRuntime):
    def __init__(self):
        trt.IPluginV3.__init__(self)
        trt.IPluginV3OneCore.__init__(self)
        trt.IPluginV3OneBuild.__init__(self)
        trt.IPluginV3OneRuntime.__init__(self)
        self.num_outputs = 1
        self.plugin_namespace = ""
        self.plugin_name = "MyPlugin"
        self.plugin_version = "1"

    def get_capability_interface(self, type):
        return self

    def get_output_data_types(self, input_types):
        return [input_types[0]]  # Same type as input

    def get_output_shapes(self, inputs, shape_inputs, exprBuilder):
        # Return a list of DimsExprs, one per output
        output = trt.DimsExprs(len(inputs[0]))
        for i in range(len(inputs[0])):
            output[i] = inputs[0][i]
        return [output]

    def supports_format_combination(self, pos, in_out, num_inputs):
        return in_out[pos].format == trt.TensorFormat.LINEAR and \
            in_out[pos].type == trt.DataType.FLOAT

    def configure_plugin(self, inp, out):
        pass

    def on_shape_change(self, inp, out):
        return 0

    def enqueue(self, input_desc, output_desc, inputs, outputs, workspace, stream):
        # Kernel execution
        pass

    def get_fields_to_serialize(self):
        return trt.PluginFieldCollection([])

    def get_valid_tactics(self):
        return []

    def set_tactic(self, tactic):
        return 0

    def clone(self):
        cloned_plugin = MyPluginV3()
        cloned_plugin.__dict__.update(self.__dict__)
        return cloned_plugin

    def attach_to_context(self, context):
        return self.clone()

# Creator
class MyPluginCreatorV3(trt.IPluginCreatorV3One):
    def __init__(self):
        trt.IPluginCreatorV3One.__init__(self)
        self.name = "MyPlugin"
        self.plugin_namespace = ""
        self.plugin_version = "1"
        self.field_names = trt.PluginFieldCollection([])

    def create_plugin(self, name, fc, phase):
        return MyPluginV3()

# Usage
trt.get_plugin_registry().register_creator(MyPluginCreatorV3())
plugin = MyPluginV3()
layer = network.add_plugin_v3([input_tensor], [], plugin)
Summary of Changes#
- Plugin class now inherits from IPluginV3, IPluginV3OneCore, IPluginV3OneBuild, and IPluginV3OneRuntime instead of IPluginV2DynamicExt
- Added get_capability_interface() to return the appropriate capability for each phase
- get_output_dimensions() replaced by get_output_shapes(), which receives both data inputs and shape inputs and returns a list of DimsExprs
- get_output_data_types() is a new required method
- serialize() replaced by get_fields_to_serialize(), which returns a PluginFieldCollection
- clone() is still defined on the V3 plugin; attach_to_context() is new and returns a clone in this example
- Creator inherits from IPluginCreatorV3One instead of IPluginCreator; create_plugin() receives an additional phase parameter and deserialize_plugin() is no longer needed
- network.add_plugin_v2() replaced by network.add_plugin_v3(), which accepts an additional list of shape inputs
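While porting, it can help to check which of the V3 methods a half-migrated class still lacks. The sketch below is a hypothetical helper, not part of TensorRT: its method list simply mirrors the MyPluginV3 example above rather than any official list of required methods, and it assumes that classes coming from the TensorRT bindings report a module name starting with `tensorrt`, so that only methods written in Python are counted.

```python
# Method names taken from the MyPluginV3 example (not an official required set).
V3_METHODS_IN_EXAMPLE = (
    "get_capability_interface",
    "get_output_data_types",
    "get_output_shapes",
    "supports_format_combination",
    "configure_plugin",
    "on_shape_change",
    "enqueue",
    "get_fields_to_serialize",
    "get_valid_tactics",
    "set_tactic",
    "clone",
    "attach_to_context",
)


def missing_v3_methods(plugin_cls):
    """Return the example method names that plugin_cls does not define in Python."""
    defined = set()
    for klass in plugin_cls.__mro__:
        # Skip `object` and (by assumption) base classes from the TensorRT bindings.
        if klass is object or klass.__module__.startswith("tensorrt"):
            continue
        defined.update(vars(klass))
    return [name for name in V3_METHODS_IN_EXAMPLE if name not in defined]


class HalfPortedPlugin:
    """Stand-in for a plugin that still carries only its V2 method set."""

    def get_output_dimensions(self, output_index, inputs, exprBuilder):
        return inputs[0]

    def configure_plugin(self, inp, out):
        pass

    def supports_format_combination(self, pos, in_out, num_inputs):
        return True

    def enqueue(self, input_desc, output_desc, inputs, outputs, workspace, stream):
        pass

    def clone(self):
        return HalfPortedPlugin()

    def serialize(self):
        return b""


todo = missing_v3_methods(HalfPortedPlugin)
```

Here `todo` lists the methods that remain to be written, in the order they appear in the example.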
Known Issues When Migrating Plugins#
Empty PluginField initializers can crash V3 dispatch. When a plugin advertises a PluginField whose data is empty (b"" or a zero-length array), the V3 creator dispatch path can crash during build or deserialization. Populate every entry with a non-empty sentinel value, even when it is unused at runtime:

import numpy as np
import tensorrt as trt

# Bad: empty initializer
fields = trt.PluginFieldCollection([
    trt.PluginField("flag", b"", trt.PluginFieldType.INT32),
])

# Good: non-empty sentinel keeps the dispatch path safe
fields = trt.PluginFieldCollection([
    trt.PluginField("flag", np.array([0], dtype=np.int32), trt.PluginFieldType.INT32),
])

Use strongly-typed networks with IPluginV3. Mixing IPluginV3 plugins with weakly-typed networks can hit fusion paths that were not exercised by IPluginV2DynamicExt and trigger crashes. In TensorRT 11.0.0, all precision-enabling builder flags (BuilderFlag.FP16, INT8, BF16, FP8, INT4, FP4) have been removed, so any network you build is strongly typed by default; no action is required for fresh 11.x builds. Authors back-porting V3 plugins to a 10.x build for evaluation must explicitly opt in. Because create_network() takes a bitmask of creation flags, shift the flag into position: builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.STRONGLY_TYPED)).
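A small pre-flight check can catch empty field initializers before they reach the creator. The following is a sketch under stated assumptions: the helper name `empty_plugin_fields` is hypothetical, and it only relies on each field exposing `name` and `data` attributes with a `len()`-compatible payload. Stand-in objects are used so the sketch runs without TensorRT installed.

```python
from types import SimpleNamespace


def empty_plugin_fields(fields):
    """Return the names of fields whose data is missing or has zero length."""
    return [
        field.name
        for field in fields
        if field.data is None or len(field.data) == 0
    ]


# Stand-ins for PluginField objects (assumption: real fields expose .name and .data).
candidate_fields = [
    SimpleNamespace(name="flag", data=b""),                    # empty: flagged
    SimpleNamespace(name="axis", data=b"\x00\x00\x00\x00"),    # 4-byte sentinel: fine
    SimpleNamespace(name="scale", data=[1.0]),                 # one-element payload: fine
]

offenders = empty_plugin_fields(candidate_fields)
```

Running a check like this in the creator's constructor, and raising if `offenders` is non-empty, turns a possible crash into an explicit error message.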
Migrating Weight Streaming APIs#
The weight streaming API has been updated in TensorRT 11.x. The minimum_weight_streaming_budget property has been removed; compute a budget from streamable_weights_size and available device memory instead.
Before (TensorRT 10.x)#
import tensorrt as trt
import pycuda.driver as cuda

engine = runtime.deserialize_cuda_engine(engine_bytes)

# Old API: minimum_weight_streaming_budget is a property, not a method
budget = engine.minimum_weight_streaming_budget
engine.weight_streaming_budget = budget
current = engine.weight_streaming_budget
After (TensorRT 11.x)#
import tensorrt as trt
import pycuda.driver as cuda

engine = runtime.deserialize_cuda_engine(engine_bytes)

# V2 API
free, total = cuda.mem_get_info()
weights_size = engine.streamable_weights_size
budget = min(free // 2, weights_size // 2)
engine.weight_streaming_budget_v2 = budget
current = engine.weight_streaming_budget_v2
Summary of Changes#
- weight_streaming_budget replaced by weight_streaming_budget_v2
- minimum_weight_streaming_budget removed; compute your own budget based on available memory and streamable_weights_size
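The budget arithmetic from the 11.x example can be pulled into a small function so that it is testable without a GPU. This is a sketch only: the function name `pick_weight_streaming_budget` is hypothetical, and the "half of free memory, half of the streamable weights" policy is just the heuristic used in the example above, not a recommended setting.

```python
GIB = 1024 ** 3


def pick_weight_streaming_budget(free_device_bytes, streamable_weights_size):
    """Return min(free // 2, weights // 2), the heuristic from the example."""
    if free_device_bytes < 0 or streamable_weights_size < 0:
        raise ValueError("sizes must be non-negative")
    return min(free_device_bytes // 2, streamable_weights_size // 2)


# Plenty of free memory: the budget is bounded by the weights.
budget_roomy = pick_weight_streaming_budget(8 * GIB, 2 * GIB)

# Little free memory: the budget is bounded by what the device can spare.
budget_tight = pick_weight_streaming_budget(1 * GIB, 6 * GIB)
```

In a real script the two arguments would come from `cuda.mem_get_info()` and `engine.streamable_weights_size`, and the result would be assigned to `engine.weight_streaming_budget_v2`.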
Migrating Memory Management APIs#
TensorRT 11.x replaces device_memory_size with device_memory_size_v2 (which returns int64) and removes create_execution_context_without_device_memory().
Before (TensorRT 10.x)#
engine = runtime.deserialize_cuda_engine(engine_bytes)

# Old API
mem_size = engine.device_memory_size
context = engine.create_execution_context_without_device_memory()
After (TensorRT 11.x)#
engine = runtime.deserialize_cuda_engine(engine_bytes)

# V2 API (returns int64)
mem_size = engine.device_memory_size_v2
context = engine.create_execution_context()
Summary of Changes#
- device_memory_size replaced by device_memory_size_v2 (returns int64 instead of size_t)
- create_execution_context_without_device_memory() removed; use create_execution_context() with appropriate runtime configuration
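Tooling that inspects engines under more than one TensorRT version can read the memory size through a small accessor that prefers the V2 property. The sketch below is illustrative: the helper name `device_memory_size` is hypothetical, and the stand-in objects only mimic the two property names discussed above.

```python
from types import SimpleNamespace


def device_memory_size(engine):
    """Read device_memory_size_v2 when present, else the older property."""
    size = getattr(engine, "device_memory_size_v2", None)
    if size is None:
        # Older bindings without the V2 property (assumption for illustration).
        size = engine.device_memory_size
    return int(size)


# Stand-ins for engines from different binding versions.
new_style_engine = SimpleNamespace(device_memory_size_v2=6 * 1024 ** 3)
old_style_engine = SimpleNamespace(device_memory_size=512 * 1024 ** 2)

new_size = device_memory_size(new_style_engine)
old_size = device_memory_size(old_style_engine)
```

Python integers are arbitrary precision, so sizes above 4 GiB need no special handling on the Python side.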
Removed Python APIs and Replacements#
Warning
The APIs listed below have been removed in TensorRT 11.x and will cause runtime errors if called. Review each entry for its replacement before upgrading.
The following Python APIs have been removed. Each entry shows the removed API and its replacement or migration path.
| Removed API | Replacement or migration path |
| --- | --- |
| BuilderFlag.FP16 | Strong typing with ModelOpt AutoCast |
| BuilderFlag.INT8 | Explicit quantization with Q/DQ nodes |
| BuilderFlag.FP8 | Explicit quantization with Q/DQ nodes |
| BuilderFlag.BF16 | Strong typing with ModelOpt AutoCast |
| BuilderFlag.INT4 | Explicit quantization with Q/DQ nodes |
| BuilderFlag.FP4 | Explicit quantization with Q/DQ nodes |
| BuilderFlag.OBEY_PRECISION_CONSTRAINTS | Strong typing (always enforced) |
| BuilderFlag.PREFER_PRECISION_CONSTRAINTS | Strong typing (always enforced) |
| BuilderFlag.DIRECT_IO | Removed (unneeded) |
| IBuilderConfig.int8_calibrator | Explicit quantization with Q/DQ nodes |
| IBuilderConfig.set_calibration_profile() | Explicit quantization with Q/DQ nodes |
| IBuilderConfig.get_calibration_profile() | Explicit quantization with Q/DQ nodes |
| IBuilderConfig.set_quantization_flag() | Explicit quantization with Q/DQ nodes |
| IBuilderConfig.get_quantization_flag() | Explicit quantization with Q/DQ nodes |
| IBuilderConfig.quantization_flags | Explicit quantization with Q/DQ nodes |
| IBuilderConfig.clear_quantization_flag() | Explicit quantization with Q/DQ nodes |
| ICudaEngine.device_memory_size | ICudaEngine.device_memory_size_v2 |
| ICudaEngine.device_memory_size_for_profile() | ICudaEngine.device_memory_size_for_profile_v2() |
| ICudaEngine.has_implicit_batch_dimension | Removed (always False) |
| ICudaEngine.weight_streaming_budget | ICudaEngine.weight_streaming_budget_v2 |
| ICudaEngine.minimum_weight_streaming_budget | Compute from streamable_weights_size and available memory |
| ICudaEngine.create_execution_context_without_device_memory() | ICudaEngine.create_execution_context() |
| ICudaEngine.get_profile_tensor_values() | ICudaEngine.get_profile_tensor_values_v2() |
| IExecutionContext.all_input_shapes_specified | Removed (always True) |
| IExecutionContext.device_memory | IExecutionContext.device_memory_v2 |
| IInt8Calibrator (all subclasses) | Explicit quantization with Q/DQ nodes |
| ILayer.precision | Strong typing (set types on tensors directly) |
| ILayer.precision_is_set | Removed |
| ILayer.reset_precision() | Removed |
| ILayer.set_output_type() | Strong typing (set types on tensors directly) |
| ILayer.output_type_is_set() | Removed |
| ILayer.reset_output_type() | Removed |
| INormalizationLayer.compute_precision | Removed (use strong typing) |
| INetworkDefinition.add_plugin_v2() | INetworkDefinition.add_plugin_v3() |
| INetworkDefinition.add_normalization() | INetworkDefinition.add_normalization_v2() |
| IPluginV2DynamicExt | IPluginV3 |
| IPluginCreator | IPluginCreatorV3One |
| IPluginRegistry.register_creator() (old overload) | IPluginRegistry.register_creator() (accepts IPluginCreatorInterface) |
| IPluginRegistry.plugin_creator_list | IPluginRegistry.all_creators |
| IPluginRegistry.get_plugin_creator() | IPluginRegistry.get_creator() |
| IRefitter.set_dynamic_range() | Explicit quantization with Q/DQ nodes |
| IRefitter.get_dynamic_range_min() | Explicit quantization with Q/DQ nodes |
| IRefitter.get_dynamic_range_max() | Explicit quantization with Q/DQ nodes |
| IRefitter.get_tensors_with_dynamic_range() | Explicit quantization with Q/DQ nodes |
| ITensor.set_type() | Strong typing (type determined by network construction) |
| ITensor.dynamic_range | Explicit quantization with Q/DQ nodes |
| ITensor.is_dynamic_range_set | Removed |
| ITensor.reset_dynamic_range() | Removed |
| TacticSource.CUBLAS | Removed |
| TacticSource.CUBLAS_LT | Removed |
| TacticSource.CUDNN | Removed |
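Before upgrading, a plain text scan of your own code base can point to call sites that use removed APIs. The sketch below is a hypothetical helper, not an NVIDIA tool: its lookup dictionary holds only a handful of entries copied from the list above, and it matches whole identifiers with a regular expression, so it will miss aliased or dynamically constructed names.

```python
import re

# A few entries copied from the removed-API list; extend as needed.
REMOVED_APIS = {
    "BuilderFlag.FP16": "strong typing with ModelOpt AutoCast",
    "BuilderFlag.INT8": "explicit quantization with Q/DQ nodes",
    "int8_calibrator": "explicit quantization with Q/DQ nodes",
    "add_plugin_v2": "add_plugin_v3",
    "device_memory_size": "device_memory_size_v2",
    "plugin_creator_list": "all_creators",
    "get_plugin_creator": "get_creator",
}


def find_removed_apis(source):
    """Return (line number, removed name, replacement) for each match in source."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, replacement in REMOVED_APIS.items():
            # \b keeps device_memory_size from matching device_memory_size_v2.
            if re.search(r"\b" + re.escape(name) + r"\b", line):
                hits.append((lineno, name, replacement))
    return hits


sample_source = """config.set_flag(trt.BuilderFlag.FP16)
layer = network.add_plugin_v2([x], plugin)
mem = engine.device_memory_size_v2
"""

hits = find_removed_apis(sample_source)
```

The third line of the sample already uses the V2 property, so only the first two lines are reported.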