Migrating Python Code from TensorRT 10.x to 11.x

This page describes how to update Python code when you migrate from TensorRT 10.x to 11.x: paired examples for strong typing, explicit quantization, plugin migration, and updated runtime APIs, followed by lists of Python APIs added and removed in 11.x.

Note

The Python API is not supported on QNX.

Migrating from Weak Typing to Strong Typing

TensorRT 11.x removes all precision-enabling builder flags such as BuilderFlag.FP16 and BuilderFlag.INT8. Use ModelOpt AutoCast to convert your ONNX model to mixed precision before building.

Before (TensorRT 10.x)

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network()
config = builder.create_builder_config()

# Weak typing: TensorRT automatically considers FP16 kernels
config.set_flag(trt.BuilderFlag.FP16)

parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    parser.parse(f.read())

engine_bytes = builder.build_serialized_network(network, config)

In TensorRT 11.x, the BuilderFlag.FP16 flag has been removed along with all other weak typing flags. Use ModelOpt AutoCast to convert your FP32 ONNX model to mixed precision before building, then build with a strongly typed network.

After (TensorRT 11.x)

import tensorrt as trt
import modelopt.onnx.autocast as autocast
import onnx

# Step 1: Convert FP32 model to mixed-precision FP16 using ModelOpt AutoCast
converted_model = autocast.convert_to_mixed_precision(
    onnx_path="model.onnx",
    low_precision_type="fp16",
    keep_io_types=True,
)
onnx.save(converted_model, "model_fp16.onnx")

# Step 2: Build with strongly typed network (strong typing is always on in 11.x)
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network()
config = builder.create_builder_config()

parser = trt.OnnxParser(network, logger)
with open("model_fp16.onnx", "rb") as f:
    parser.parse(f.read())

engine_bytes = builder.build_serialized_network(network, config)

Summary of Changes

  • Removed config.set_flag(trt.BuilderFlag.FP16) and all other precision-enabling builder flags

  • Added a preprocessing step using ModelOpt AutoCast to convert the model to mixed precision

  • The STRONGLY_TYPED network definition creation flag is no longer needed, because all networks are strongly typed by default
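AutoCast moves the FP16/FP32 decision out of the builder and into the ONNX graph. One reason it cannot simply cast every node is that FP16 has a limited range: any magnitude above 65504 overflows to infinity. The NumPy sketch below shows that kind of range check. The function names and the single-threshold rule are illustrative assumptions, not AutoCast's actual node-selection logic.

```python
import numpy as np

# Largest finite FP16 value: 65504.0
FP16_MAX = float(np.finfo(np.float16).max)


def fits_in_fp16(values: np.ndarray, threshold: float = FP16_MAX) -> bool:
    """True if no magnitude in `values` exceeds `threshold` (empty arrays pass)."""
    if values.size == 0:
        return True
    return bool(np.max(np.abs(values)) <= threshold)


def choose_precision(values: np.ndarray, threshold: float = FP16_MAX) -> np.ndarray:
    """Cast to FP16 only when the range check passes; otherwise keep FP32.

    A simplified illustration of a range check, not AutoCast's actual rules.
    """
    if fits_in_fp16(values, threshold):
        return values.astype(np.float16)
    return values.astype(np.float32)


# Without the check, out-of-range values silently become infinity
# (NumPy may also emit an overflow RuntimeWarning here).
print(np.array([70000.0], dtype=np.float32).astype(np.float16))  # [inf]

small = np.array([0.5, -3.0, 1000.0], dtype=np.float32)
large = np.array([0.5, 70000.0], dtype=np.float32)
print(choose_precision(small).dtype)  # float16
print(choose_precision(large).dtype)  # float32
```

A tensor that fails the check stays in FP32. That is the kind of per-node decision which, in 11.x, is baked into the converted ONNX file instead of being made by builder flags.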

Migrating INT8 Calibration to Explicit Quantization

TensorRT 11.x removes IInt8Calibrator and all its subclasses. Use ModelOpt or manual Q/DQ nodes for explicit quantization instead of runtime calibration.

Before (TensorRT 10.x)

import tensorrt as trt

class MyCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, data_loader):
        super().__init__()
        # data_loader is assumed to yield torch tensors shaped like the network input
        self.data_loader = data_loader
        self.batch_iter = iter(data_loader)
        self.device_batch = None  # Keeps the current device copy alive

    def get_batch_size(self):
        return 1

    def get_batch(self, names):
        try:
            batch = next(self.batch_iter)
        except StopIteration:
            return None  # No more data: calibration ends
        # Copy the batch to the device and return its device pointer
        self.device_batch = batch.cuda().contiguous()
        return [int(self.device_batch.data_ptr())]

    def read_calibration_cache(self):
        return None

    def write_calibration_cache(self, cache):
        pass

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network()
config = builder.create_builder_config()

# Implicit quantization via calibrator (data_loader is defined elsewhere)
config.set_flag(trt.BuilderFlag.INT8)
config.int8_calibrator = MyCalibrator(data_loader)

parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    parser.parse(f.read())

engine_bytes = builder.build_serialized_network(network, config)

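In TensorRT 11.x, quantization happens offline instead. ModelOpt (or manual graph editing) inserts QuantizeLinear/DequantizeLinear (Q/DQ) nodes and their scales into the ONNX graph, and TensorRT builds the network with the precisions those nodes specify.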

After (TensorRT 11.x)

import tensorrt as trt

# Step 1: Quantize the model offline using ModelOpt (recommended), for example:
# python -m modelopt.onnx.quantization --onnx_path model.onnx --quantize_mode int8 \
#     --calibration_data data.npz --output_path model_quantized.onnx

# Step 2: Build the quantized model (Q/DQ nodes are already in the ONNX graph)
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network()
config = builder.create_builder_config()

parser = trt.OnnxParser(network, logger)
with open("model_quantized.onnx", "rb") as f:
    parser.parse(f.read())

engine_bytes = builder.build_serialized_network(network, config)
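The ModelOpt command above reads representative inputs from `data.npz`. The sketch below produces such a file with NumPy. It assumes ModelOpt matches the .npz keys to the ONNX model's input names, so check the ModelOpt documentation for the exact format your version expects. The input name `input`, the shape, and the random samples are hypothetical placeholders.

```python
import numpy as np

# Hypothetical input name and shape: replace them with the names and shapes
# your ONNX model actually declares (for example, onnx.load(...).graph.input).
INPUT_NAME = "input"
BATCH_SHAPE = (1, 3, 224, 224)
NUM_SAMPLES = 8


def make_calibration_file(path: str, samples: list) -> None:
    """Stack representative samples and save them under the model's input name."""
    stacked = np.concatenate(samples, axis=0).astype(np.float32)
    np.savez(path, **{INPUT_NAME: stacked})


# Random data only keeps the sketch self-contained; real calibration data
# should come from the distribution the model sees in production.
rng = np.random.default_rng(0)
samples = [rng.standard_normal(BATCH_SHAPE, dtype=np.float32) for _ in range(NUM_SAMPLES)]
make_calibration_file("data.npz", samples)

with np.load("data.npz") as data:
    print(data.files, data[INPUT_NAME].shape)  # ['input'] (8, 3, 224, 224)
```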

Summary of Changes

  • Removed the IInt8Calibrator subclass and all calibration-related code

  • Removed config.set_flag(trt.BuilderFlag.INT8) and config.int8_calibrator assignment

  • Quantization is now applied to the model itself (via Q/DQ nodes) before building, not during building
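With explicit quantization, the scales that implicit calibration used to compute at build time are stored in the Q/DQ nodes of the ONNX graph. The NumPy sketch below shows what a symmetric, per-tensor INT8 QuantizeLinear/DequantizeLinear pair computes. It assumes a zero point of 0 and a scale derived from a calibrated absolute maximum (amax / 127). ModelOpt's calibration methods may derive the scale differently.

```python
import numpy as np

INT8_MIN, INT8_MAX = -128, 127


def int8_scale_from_amax(amax: float) -> float:
    """Symmetric per-tensor scale from a calibrated absolute maximum."""
    return amax / INT8_MAX


def quantize(x: np.ndarray, scale: float) -> np.ndarray:
    """QuantizeLinear with zero point 0: round half to even, then saturate to INT8."""
    return np.clip(np.round(x / scale), INT8_MIN, INT8_MAX).astype(np.int8)


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """DequantizeLinear with zero point 0."""
    return q.astype(np.float32) * np.float32(scale)


x = np.array([-2.0, -0.5, 0.0, 0.7, 2.0, 3.0], dtype=np.float32)
scale = int8_scale_from_amax(2.0)  # values beyond +/-2.0 saturate
q = quantize(x, scale)
print(q.tolist())  # [-127, -32, 0, 44, 127, 127]

# For values inside the calibrated range the round-trip error is at most scale / 2;
# 3.0 is outside the range and comes back as roughly 2.0.
print(dequantize(q, scale))
```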

Migrating Plugins from V2 to V3

The entire V2 plugin family (IPluginV2, IPluginV2Ext, IPluginV2IOExt, IPluginV2DynamicExt, IPluginCreator, IPluginV2Layer, and INetworkDefinition.add_plugin_v2()) has been removed in TensorRT 11.0. V2 plugin code will fail to import or build against the TensorRT 11.0 Python bindings. Existing V2 plugins must be ported to the IPluginV3 interface, paired with an IPluginCreatorV3One creator, and added to the network through INetworkDefinition.add_plugin_v3().

See also

Side-by-Side V2 ↔ V3 API Mapping

Method-by-method mapping table grouped by lifecycle phase (core, build, runtime, serialization, network attachment).

Known Migration Issues

Known issues encountered when porting V2 plugins, including the empty PluginField initializer crash and the strongly-typed network requirement.

Performance: Resolving V2 → V3 Regressions

Checklist for resolving performance regressions after migrating a plugin from IPluginV2DynamicExt to IPluginV3.

Before (TensorRT 10.x)

import tensorrt as trt

class MyPluginV2(trt.IPluginV2DynamicExt):
    def __init__(self):
        super().__init__()
        self.plugin_namespace = ""
        self.plugin_type = "MyPlugin"
        self.plugin_version = "1"
        self.num_outputs = 1

    def get_output_dimensions(self, output_index, inputs, exprBuilder):
        return inputs[0]  # Same shape as input

    def configure_plugin(self, inp, out):
        pass

    def supports_format_combination(self, pos, in_out, num_inputs):
        return in_out[pos].format == trt.TensorFormat.LINEAR and \
               in_out[pos].type == trt.DataType.FLOAT

    def enqueue(self, input_desc, output_desc, inputs, outputs, workspace, stream):
        # Kernel execution
        pass

    def clone(self):
        return MyPluginV2()

    def serialize(self):
        return b""

# Creator
class MyPluginCreatorV2(trt.IPluginCreator):
    def __init__(self):
        super().__init__()
        self.name = "MyPlugin"
        self.plugin_namespace = ""
        self.plugin_version = "1"
        self.field_names = trt.PluginFieldCollection([])

    def create_plugin(self, name, fc):
        return MyPluginV2()

    def deserialize_plugin(self, name, data):
        return MyPluginV2()

# Usage (network and input_tensor come from the surrounding network-building code)
trt.get_plugin_registry().register_creator(MyPluginCreatorV2())
plugin = MyPluginV2()
layer = network.add_plugin_v2([input_tensor], plugin)

After (TensorRT 11.x)

import tensorrt as trt

class MyPluginV3(trt.IPluginV3, trt.IPluginV3OneCore,
                 trt.IPluginV3OneBuild, trt.IPluginV3OneRuntime):
    def __init__(self):
        trt.IPluginV3.__init__(self)
        trt.IPluginV3OneCore.__init__(self)
        trt.IPluginV3OneBuild.__init__(self)
        trt.IPluginV3OneRuntime.__init__(self)
        self.num_outputs = 1
        self.plugin_namespace = ""
        self.plugin_name = "MyPlugin"
        self.plugin_version = "1"

    def get_capability_interface(self, type):
        return self

    def get_output_data_types(self, input_types):
        return [input_types[0]]  # Same type as input

    def get_output_shapes(self, inputs, shape_inputs, exprBuilder):
        # Return a list of DimsExprs, one per output
        output = trt.DimsExprs(len(inputs[0]))
        for i in range(len(inputs[0])):
            output[i] = inputs[0][i]
        return [output]

    def supports_format_combination(self, pos, in_out, num_inputs):
        # V3 passes DynamicPluginTensorDesc entries; format and type live on .desc
        desc = in_out[pos].desc
        return desc.format == trt.TensorFormat.LINEAR and \
               desc.type == trt.DataType.FLOAT

    def configure_plugin(self, inp, out):
        pass

    def on_shape_change(self, inp, out):
        return 0

    def enqueue(self, input_desc, output_desc, inputs, outputs, workspace, stream):
        # Kernel execution
        pass

    def get_fields_to_serialize(self):
        return trt.PluginFieldCollection([])

    def get_valid_tactics(self):
        return []

    def set_tactic(self, tactic):
        return 0

    def clone(self):
        cloned_plugin = MyPluginV3()
        cloned_plugin.__dict__.update(self.__dict__)
        return cloned_plugin

    def attach_to_context(self, context):
        return self.clone()

# Creator
class MyPluginCreatorV3(trt.IPluginCreatorV3One):
    def __init__(self):
        trt.IPluginCreatorV3One.__init__(self)
        self.name = "MyPlugin"
        self.plugin_namespace = ""
        self.plugin_version = "1"
        self.field_names = trt.PluginFieldCollection([])

    def create_plugin(self, name, fc, phase):
        return MyPluginV3()

# Usage (network and input_tensor come from the surrounding network-building code)
trt.get_plugin_registry().register_creator(MyPluginCreatorV3())
plugin = MyPluginV3()
layer = network.add_plugin_v3([input_tensor], [], plugin)

Summary of Changes

  • Plugin class now inherits from IPluginV3, IPluginV3OneCore, IPluginV3OneBuild, and IPluginV3OneRuntime instead of IPluginV2DynamicExt

  • Added get_capability_interface() to return the appropriate capability for each phase

  • get_output_dimensions() replaced by get_output_shapes(), which receives both data inputs and shape inputs and returns a list of DimsExprs

  • get_output_data_types() is a required method that returns a list with the data type of each output

  • serialize() replaced by get_fields_to_serialize(), which returns a PluginFieldCollection

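  • clone() is still required. attach_to_context() is new; TensorRT calls it when an execution context is created, and the example returns a fresh clone. on_shape_change(), get_valid_tactics(), and set_tactic() are also new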

  • Creator inherits from IPluginCreatorV3One instead of IPluginCreator; create_plugin() receives an additional phase parameter and deserialize_plugin() is no longer needed

  • network.add_plugin_v2() replaced by network.add_plugin_v3(), which accepts an additional list of shape inputs
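Because attach_to_context() in the example returns self.clone(), clone() determines whether execution contexts share Python-side state. The `__dict__.update()` pattern is a shallow copy: mutable attributes such as lists or NumPy arrays stay shared between the original and every clone. The stand-in below is plain Python with no TensorRT types, and the `pads` attribute is hypothetical. Whether deep-copying is safe for a real plugin depends on what its attributes hold, so treat this as a sketch of the pitfall rather than a drop-in clone().

```python
import copy


class PluginState:
    """Plain-Python stand-in for a plugin's attributes (not a TensorRT class)."""

    def __init__(self):
        self.plugin_name = "MyPlugin"
        self.pads = [1, 1]  # hypothetical mutable attribute

    def clone_shallow(self):
        # Same pattern as MyPluginV3.clone(): mutable values are shared.
        cloned = PluginState()
        cloned.__dict__.update(self.__dict__)
        return cloned

    def clone_deep(self):
        # Each attribute value is copied, so clones no longer share state.
        cloned = PluginState()
        cloned.__dict__.update(copy.deepcopy(self.__dict__))
        return cloned


original = PluginState()
shallow = original.clone_shallow()
shallow.pads.append(2)
print(original.pads)  # [1, 1, 2]: the change leaked into the original

original = PluginState()
deep = original.clone_deep()
deep.pads.append(2)
print(original.pads)  # [1, 1]
```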

Known Issues When Migrating Plugins

  • Empty PluginField initializers can crash V3 dispatch. When a plugin advertises a PluginField whose data is empty (b"" or a zero-length array), the V3 creator dispatch path can crash during build or deserialization. Populate every entry with a non-empty sentinel value, even when it is unused at runtime:

    import numpy as np
    import tensorrt as trt
    
    # Bad: empty initializer
    fields = trt.PluginFieldCollection([
        trt.PluginField("flag", b"", trt.PluginFieldType.INT32),
    ])
    
    # Good: non-empty sentinel keeps the dispatch path safe
    fields = trt.PluginFieldCollection([
        trt.PluginField("flag", np.array([0], dtype=np.int32), trt.PluginFieldType.INT32),
    ])
    
  • Use strongly typed networks with IPluginV3. Mixing IPluginV3 plugins with weakly typed networks can hit fusion paths that IPluginV2DynamicExt never exercised, and these paths can trigger crashes. TensorRT 11.0.0 removes all precision-enabling builder flags (BuilderFlag.FP16, INT8, BF16, FP8, INT4, FP4), so every network you build is strongly typed by default and no action is required for fresh 11.x builds. Authors back-porting V3 plugins to a 10.x build for evaluation must opt in explicitly with builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.STRONGLY_TYPED)); the argument is a bitmask, so the flag must be shifted into position.
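When plugin field values come from user configuration, a small helper can apply the non-empty sentinel rule from the first bullet automatically. The sketch below assumes the fields are collected as NumPy arrays before they are wrapped in trt.PluginField. The helper name is hypothetical, and the sentinel value 0 is only safe if the plugin interprets it as "unused".

```python
import numpy as np


def with_nonempty_sentinels(fields: dict) -> dict:
    """Return a copy of `fields` in which every empty array is replaced by a
    one-element zero array of the same dtype; non-empty arrays pass through."""
    safe = {}
    for name, data in fields.items():
        arr = np.asarray(data)
        safe[name] = np.zeros(1, dtype=arr.dtype) if arr.size == 0 else arr
    return safe


# Hypothetical field names; the results would then be passed to trt.PluginField.
fields = {
    "flag": np.array([], dtype=np.int32),
    "pads": np.array([1, 1, 2, 2], dtype=np.int32),
}
safe = with_nonempty_sentinels(fields)
print({name: arr.tolist() for name, arr in safe.items()})  # {'flag': [0], 'pads': [1, 1, 2, 2]}
```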

Migrating Weight Streaming APIs

The weight streaming API has been updated in TensorRT 11.x. The minimum_weight_streaming_budget property has been removed; compute a budget from streamable_weights_size and available device memory instead.

Before (TensorRT 10.x)

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
engine = runtime.deserialize_cuda_engine(engine_bytes)

# Old API
budget = engine.minimum_weight_streaming_budget
engine.weight_streaming_budget = budget
current = engine.weight_streaming_budget

After (TensorRT 11.x)

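import tensorrt as trt
import pycuda.autoinit  # Creates the CUDA context that mem_get_info() needs
import pycuda.driver as cuda

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
engine = runtime.deserialize_cuda_engine(engine_bytes)

# V2 API: choose a budget from free device memory and the streamable weight size
free, total = cuda.mem_get_info()
weights_size = engine.streamable_weights_size
budget = min(free // 2, weights_size // 2)  # Example heuristic: half of each
engine.weight_streaming_budget_v2 = budget
current = engine.weight_streaming_budget_v2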

Summary of Changes

  • weight_streaming_budget replaced by weight_streaming_budget_v2

  • minimum_weight_streaming_budget removed; compute your own budget from available device memory and streamable_weights_size
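The budget heuristic in the After example (half of the free device memory, capped at half of the streamable weights) can be isolated in a small pure-Python function, which makes it easy to tune and unit-test. In this sketch, the function name and the `fraction` parameter are hypothetical. One half is only an example value that leaves room for activations and other allocations.

```python
def weight_streaming_budget(free_bytes: int, streamable_bytes: int,
                            fraction: float = 0.5) -> int:
    """Budget = min(fraction of free device memory, fraction of streamable weights)."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return min(int(free_bytes * fraction), int(streamable_bytes * fraction))


GiB = 1024 ** 3
# Hypothetical numbers: 6 GiB free, 10 GiB of streamable weights.
print(weight_streaming_budget(6 * GiB, 10 * GiB) / GiB)  # 3.0
```

In the After example, free_bytes would be the first value returned by cuda.mem_get_info() and streamable_bytes would be engine.streamable_weights_size. The result is then assigned to engine.weight_streaming_budget_v2.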

Migrating Memory Management APIs

TensorRT 11.x replaces device_memory_size with device_memory_size_v2 (which returns int64) and removes create_execution_context_without_device_memory().

Before (TensorRT 10.x)

engine = runtime.deserialize_cuda_engine(engine_bytes)

# Old API
mem_size = engine.device_memory_size
context = engine.create_execution_context_without_device_memory()

After (TensorRT 11.x)

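import tensorrt as trt

engine = runtime.deserialize_cuda_engine(engine_bytes)

# V2 API (returns int64)
mem_size = engine.device_memory_size_v2
# USER_MANAGED: the context allocates no scratch memory; the application must
# provide at least mem_size bytes of device memory before running inference
context = engine.create_execution_context(trt.ExecutionContextAllocationStrategy.USER_MANAGED)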

Summary of Changes

  • device_memory_size replaced by device_memory_size_v2 (returns int64 instead of size_t)

  • create_execution_context_without_device_memory() removed; call create_execution_context() with trt.ExecutionContextAllocationStrategy.USER_MANAGED and supply the device memory yourself before running inference
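A common reason to manage execution-context memory yourself is to let several contexts that are never enqueued at the same time share one scratch allocation. The plain-Python sketch below shows the sizing arithmetic behind that choice. The function names are hypothetical, and in a real application each per-context size would come from device_memory_size_v2 of the corresponding engine.

```python
MiB = 1024 ** 2


def shared_scratch_bytes(per_context_bytes: list[int]) -> int:
    """One buffer shared by contexts that never run at the same time only
    has to hold the largest single requirement."""
    if not per_context_bytes:
        raise ValueError("at least one context size is required")
    return max(per_context_bytes)


def concurrent_scratch_bytes(per_context_bytes: list[int]) -> int:
    """Contexts that may run concurrently each need their own buffer."""
    return sum(per_context_bytes)


# Hypothetical per-context requirements, in bytes
sizes = [256 * MiB, 512 * MiB, 128 * MiB]
print(shared_scratch_bytes(sizes) // MiB)      # 512
print(concurrent_scratch_bytes(sizes) // MiB)  # 896
```

Sharing saves memory only if the application guarantees the contexts never overlap in time, for example by enqueueing them on the same CUDA stream.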

Removed Python APIs and Replacements

Warning

The APIs listed below have been removed in TensorRT 11.x. Code that still references them fails at runtime, typically with an AttributeError. Review each entry for its replacement before upgrading.

The following Python APIs have been removed. Each entry shows the removed API and its replacement or migration path.

  • BuilderFlag.FP16 → Strong typing with ModelOpt AutoCast

  • BuilderFlag.INT8 → Explicit quantization with Q/DQ nodes

  • BuilderFlag.FP8 → Explicit quantization with Q/DQ nodes

  • BuilderFlag.BF16 → Strong typing with ModelOpt AutoCast

  • BuilderFlag.INT4 → Explicit quantization with Q/DQ nodes

  • BuilderFlag.FP4 → Explicit quantization with Q/DQ nodes

  • BuilderFlag.OBEY_PRECISION_CONSTRAINTS → Strong typing (always enforced)

  • BuilderFlag.PREFER_PRECISION_CONSTRAINTS → Strong typing (always enforced)

  • BuilderFlag.DIRECT_IO → Removed (unneeded)

  • IBuilderConfig.int8_calibrator → Explicit quantization with Q/DQ nodes

  • IBuilderConfig.set_calibration_profile() → Explicit quantization with Q/DQ nodes

  • IBuilderConfig.get_calibration_profile() → Explicit quantization with Q/DQ nodes

  • IBuilderConfig.set_quantization_flag() → Explicit quantization with Q/DQ nodes

  • IBuilderConfig.get_quantization_flag() → Explicit quantization with Q/DQ nodes

  • IBuilderConfig.quantization_flags → Explicit quantization with Q/DQ nodes

  • IBuilderConfig.clear_quantization_flag() → Explicit quantization with Q/DQ nodes

  • ICudaEngine.device_memory_size → ICudaEngine.device_memory_size_v2

  • ICudaEngine.device_memory_size_for_profile() → ICudaEngine.device_memory_size_for_profile_v2()

  • ICudaEngine.has_implicit_batch_dimension → Removed (always False)

  • ICudaEngine.weight_streaming_budget → ICudaEngine.weight_streaming_budget_v2

  • ICudaEngine.minimum_weight_streaming_budget → Compute from streamable_weights_size and available memory

  • ICudaEngine.create_execution_context_without_device_memory() → ICudaEngine.create_execution_context()

  • ICudaEngine.get_profile_tensor_values() → ICudaEngine.get_profile_tensor_values_v2()

  • IExecutionContext.all_input_shapes_specified → Removed (always True)

  • IExecutionContext.device_memory → IExecutionContext.device_memory_v2

  • IInt8Calibrator (all subclasses) → Explicit quantization with Q/DQ nodes

  • ILayer.precision → Strong typing (set types on tensors directly)

  • ILayer.precision_is_set → Removed

  • ILayer.reset_precision() → Removed

  • ILayer.set_output_type() → Strong typing (set types on tensors directly)

  • ILayer.output_type_is_set() → Removed

  • ILayer.reset_output_type() → Removed

  • INormalizationLayer.compute_precision → Removed (use strong typing)

  • INetworkDefinition.add_plugin_v2() → INetworkDefinition.add_plugin_v3()

  • INetworkDefinition.add_normalization() → INetworkDefinition.add_normalization_v2()

  • IPluginV2DynamicExt → IPluginV3

  • IPluginCreator → IPluginCreatorV3One

  • IPluginRegistry.register_creator() (old overload) → IPluginRegistry.register_creator() (accepts IPluginCreatorInterface)

  • IPluginRegistry.plugin_creator_list → IPluginRegistry.all_creators

  • IPluginRegistry.get_plugin_creator() → IPluginRegistry.get_creator()

  • IRefitter.set_dynamic_range() → Explicit quantization with Q/DQ nodes

  • IRefitter.get_dynamic_range_min() → Explicit quantization with Q/DQ nodes

  • IRefitter.get_dynamic_range_max() → Explicit quantization with Q/DQ nodes

  • IRefitter.get_tensors_with_dynamic_range() → Explicit quantization with Q/DQ nodes

  • ITensor.set_type() → Strong typing (type determined by network construction)

  • ITensor.dynamic_range → Explicit quantization with Q/DQ nodes

  • ITensor.is_dynamic_range_set → Removed

  • ITensor.reset_dynamic_range() → Removed

  • TacticSource.CUBLAS → Removed

  • TacticSource.CUBLAS_LT → Removed

  • TacticSource.CUDNN → Removed
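A plain-text scan is often enough to find removed names in an existing code base before upgrading. The stdlib sketch below covers only a subset of the list above; extend REMOVED_APIS with the remaining entries. It matches names textually, so it can report false positives (for example, an unrelated variable called device_memory_size). It also cannot see names built dynamically, and its comment stripping is naive.

```python
import re

# A subset of the removed-API list: removed name -> replacement or migration path.
REMOVED_APIS = {
    "BuilderFlag.FP16": "Strong typing with ModelOpt AutoCast",
    "BuilderFlag.BF16": "Strong typing with ModelOpt AutoCast",
    "BuilderFlag.INT8": "Explicit quantization with Q/DQ nodes",
    "BuilderFlag.FP8": "Explicit quantization with Q/DQ nodes",
    "int8_calibrator": "Explicit quantization with Q/DQ nodes",
    "IInt8EntropyCalibrator2": "Explicit quantization with Q/DQ nodes",
    "device_memory_size": "ICudaEngine.device_memory_size_v2",
    "weight_streaming_budget": "ICudaEngine.weight_streaming_budget_v2",
    "minimum_weight_streaming_budget": "Compute from streamable_weights_size and available memory",
    "create_execution_context_without_device_memory": "ICudaEngine.create_execution_context()",
    "add_plugin_v2": "INetworkDefinition.add_plugin_v3()",
    "IPluginV2DynamicExt": "IPluginV3",
}

# Longest names first, so the alternation prefers the longer name when two start
# at the same position; \b stops device_memory_size from matching inside
# device_memory_size_v2.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in sorted(REMOVED_APIS, key=len, reverse=True)) + r")\b"
)


def find_removed_apis(source: str) -> list[tuple[int, str, str]]:
    """Return (line number, removed name, replacement) for each match in `source`."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        code = line.split("#", 1)[0]  # Naive: also drops text after '#' inside strings
        for match in _PATTERN.finditer(code):
            name = match.group(1)
            hits.append((lineno, name, REMOVED_APIS[name]))
    return hits


legacy_source = """config.set_flag(trt.BuilderFlag.FP16)
mem = engine.device_memory_size
mem_v2 = engine.device_memory_size_v2
layer = network.add_plugin_v2([x], plugin)
"""
for hit in find_removed_apis(legacy_source):
    print(hit)
```

On the sample source, the scan reports lines 1, 2, and 4. It skips line 3 because device_memory_size_v2 is the replacement API.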