Migrating from TensorRT 10.x to 11.x
TensorRT 11.x introduces several breaking changes that affect how you build engines, run inference, and use plugins. The sections below summarize the key changes. For language-specific and tool-specific migration details, select the page that matches your integration:
- Python Integration
Migrate Python inference and engine-build code, and review Python APIs removed in 11.x with their replacements.
- C++ Integration
Migrate C++ inference and engine-build code, and review C++ APIs and plugins removed in 11.x with their replacements.
- trtexec Command-Line Usage
Migrate trtexec command lines, and review replacements for removed or deprecated flags.
- Safety Runtime
Migrate safety runtime code paths, including build-time safety scope validation and kernel checker changes.
- NVIDIA DriveOS
Platform-specific migration guidance for DRIVE Thor and DRIVE Orin, including version compatibility and DLA considerations.
- Jetson / JetPack
Platform-specific migration guidance for Jetson Orin and Jetson Thor, including JetPack dependencies and cross-compilation.
- IEngineInspector JSON Output
Migrate code that parses IEngineInspector JSON output, including the replacement of Bindings with I/O Tensors and the split of the Format/DataType field.
Strongly Typed Networks Are Now Required
TensorRT 11.x removes all weak-typing APIs. In TensorRT 10.x, you could enable reduced-precision types at the model level by setting builder flags such as BuilderFlag::kFP16 or BuilderFlag::kINT8, and TensorRT would then automatically consider lower-precision kernels. While this made higher throughput easy to obtain, it could compromise accuracy and did not give model authors adequate control over precision.
Implicit quantization is also no longer supported in TensorRT 11.x. To quantize a model, you must add explicit quantize and dequantize layers to the network.
The most straightforward way to get results similar to the trtexec --fp16 flag or BuilderFlag::kFP16 is to run ModelOpt's AutoCast feature on your model. For a basic conversion of FP32 layers to FP16, run the following command; refer to the ModelOpt AutoCast documentation for more options:
python -m modelopt.onnx.autocast --onnx_path model.onnx
To quantize a network using calibration data, run:
python -m modelopt.onnx.quantization --onnx_path model.onnx --calibration_data data.npz
For more control or other advanced quantization topics (such as quantization-aware training), refer to the ModelOpt Quantization documentation.
You can also manually control a layer's I/O types by using INetworkDefinition::addCast or by adding Cast nodes to the ONNX graph.
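Casting a tensor to FP16 rounds every value to the nearest representable half-precision number, which is why the placement of casts matters for accuracy. The following minimal Python sketch models that rounding with the standard struct module's "e" (IEEE 754 binary16) format; it does not call TensorRT, and the helper name round_to_fp16 is hypothetical. Because Python floats are double precision, it models only the final rounding to FP16, and it maps overflow to infinity as an IEEE round-to-nearest conversion does, which is an assumption rather than a statement about any particular TensorRT kernel.

```python
import math
import struct


def round_to_fp16(value: float) -> float:
    """Return the FP16 value nearest to `value` (hypothetical helper).

    struct's "e" format packs IEEE 754 binary16 with round-half-to-even.
    It raises OverflowError when the rounded magnitude exceeds the FP16
    range, so that case is mapped to a signed infinity here (an assumed
    IEEE-style overflow, not measured TensorRT behavior).
    """
    try:
        return struct.unpack("<e", struct.pack("<e", value))[0]
    except OverflowError:
        return math.copysign(math.inf, value)


if __name__ == "__main__":
    # Small fractions gain a little error, large integers lose low bits,
    # very large values overflow, and tiny values flush to zero.
    for x in (0.1, 2049.0, 70000.0, 1e-8):
        print(f"{x!r:>8} -> {round_to_fp16(x)!r}")
```

In this model, 0.1 becomes 0.0999755859375, 2049.0 becomes 2048.0, and anything beyond the FP16 maximum of 65504 overflows, so tensors with large magnitudes or a wide dynamic range are often the ones worth keeping in FP32.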
Similarly, you can apply quantization to a layer in your network using the INetworkDefinition::addQuantize and INetworkDefinition::addDequantize APIs, or by adding QuantizeLinear and DequantizeLinear nodes to the ONNX graph manually. This approach works, but determining which layers must run at high precision and calibrating the quantization scales for optimal accuracy can be challenging. For most models, use ModelOpt to apply quantization effectively.
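To illustrate what a quantize/dequantize pair computes and why calibration matters, here is a minimal Python sketch of symmetric per-tensor INT8 quantization. It follows the ONNX QuantizeLinear and DequantizeLinear definitions (round half to even, saturate to [-128, 127]) with a zero point of 0, and derives the scale with simple absolute-max calibration. Absolute-max is only one calibration strategy and is not necessarily what ModelOpt uses; the function names and sample values are hypothetical.

```python
def absmax_scale(calibration_values, qmax=127):
    """Symmetric per-tensor scale: the largest observed magnitude maps to qmax."""
    amax = max(abs(v) for v in calibration_values)
    # Guard against an all-zero calibration set, which would give a zero scale.
    return amax / qmax if amax > 0 else 1.0


def quantize_linear(x, scale, zero_point=0, qmin=-128, qmax=127):
    """ONNX QuantizeLinear for INT8: saturate(round(x / scale) + zero_point)."""
    # Python's built-in round() rounds half to even, as the ONNX spec requires.
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize_linear(q, scale, zero_point=0):
    """ONNX DequantizeLinear: (q - zero_point) * scale."""
    return (q - zero_point) * scale


if __name__ == "__main__":
    calibration = [10.0, -63.5, 3.25]  # hypothetical activation samples
    scale = absmax_scale(calibration)
    for x in (3.25, 100.0):
        q = quantize_linear(x, scale)
        print(x, "->", q, "->", dequantize_linear(q, scale))
```

With a scale of 0.5, the value 3.25 maps to 6.5, rounds to 6, and dequantizes to 3.0, while 100.0 lies outside the calibrated range and saturates to 127 (63.5 after dequantization). Balancing these rounding and clipping errors across every quantized tensor is the work that ModelOpt automates.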
V3 Is the Recommended Plugin Interface
The entire V2 plugin family has been removed in TensorRT 11.0: IPluginV2, IPluginV2Ext, IPluginV2IOExt, IPluginV2DynamicExt, IPluginCreator, IPluginV2Layer, and INetworkDefinition::addPluginV2() are no longer available. Code that references any of these APIs fails to compile against the TensorRT 11.0 headers, so existing V2 plugins must be ported to IPluginV3 with IPluginCreatorV3One and added to networks through INetworkDefinition::addPluginV3(). Related removals that affected V2 plugins in earlier releases include:
- IPluginV2Ext::isOutputBroadcastAcrossBatch and IPluginV2Ext::canBroadcastInputAcrossBatch (also inherited by IPluginV2DynamicExt): implicit-batch broadcast helpers. No replacement is needed, because implicit batch has been unsupported since TensorRT 10.0.
- The legacy IPluginRegistry overloads that took IPluginCreator: registerCreator(IPluginCreator&, ...), deregisterCreator(IPluginCreator const&), getPluginCreator(...), and getPluginCreatorList(...). Use the IPluginCreatorInterface-typed overloads instead.
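Before porting, it can help to inventory where a codebase still references the removed V2 APIs. The following Python sketch is a hypothetical, text-based scanner (not an NVIDIA tool) that matches whole-word identifiers from the lists above and reports a suggested 11.x target. Because it does not parse C++, it also flags matches in comments and strings, and it cannot tell the legacy IPluginRegistry overloads from the IPluginCreatorInterface-typed ones by method name alone; legacy calls are caught only where the IPluginCreator type itself appears.

```python
import re

# Removed V2 plugin APIs from the lists above, mapped to a suggested 11.x target.
REMOVED_V2_SYMBOLS = {
    "IPluginV2": "IPluginV3 (with IPluginV3OneCore/Build/Runtime)",
    "IPluginV2Ext": "IPluginV3 (with IPluginV3OneCore/Build/Runtime)",
    "IPluginV2IOExt": "IPluginV3 (with IPluginV3OneCore/Build/Runtime)",
    "IPluginV2DynamicExt": "IPluginV3 (with IPluginV3OneCore/Build/Runtime)",
    "IPluginCreator": "IPluginCreatorV3One",
    "IPluginV2Layer": "IPluginV3Layer (returned by addPluginV3())",
    "addPluginV2": "addPluginV3",
    "isOutputBroadcastAcrossBatch": "no replacement (implicit batch is gone)",
    "canBroadcastInputAcrossBatch": "no replacement (implicit batch is gone)",
}

# The \b anchors require whole-word matches, so IPluginV2 never matches inside
# IPluginV2Ext and IPluginCreator never matches inside IPluginCreatorV3One.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, REMOVED_V2_SYMBOLS)) + r")\b")


def find_removed_v2_symbols(source: str):
    """Return (line_number, symbol, suggestion) tuples for each match in `source`."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            symbol = match.group(1)
            hits.append((lineno, symbol, REMOVED_V2_SYMBOLS[symbol]))
    return hits


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python find_v2.py plugin.cpp plugin.h
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            for lineno, symbol, suggestion in find_removed_v2_symbols(f.read()):
                print(f"{path}:{lineno}: {symbol} -> {suggestion}")
```

A plain regular expression keeps the sketch dependency-free; for large codebases, a compiler-based tool such as a clang AST matcher would avoid false positives in comments.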
Key differences between V2 and V3:
- Capability-based design: IPluginV3 uses separate capability interfaces (IPluginV3OneCore, IPluginV3OneBuild, IPluginV3OneRuntime) instead of a single monolithic class.
- Data-dependent output shapes: V3 supports declaring size tensors through IExprBuilder::declareSizeTensor(), enabling plugins with data-dependent output shapes.
- Shape inputs: addPluginV3() accepts both data inputs and shape inputs, unlike addPluginV2(), which accepts only data inputs.
- Phase-aware creation: IPluginCreatorV3One::createPlugin() receives a TensorRTPhase parameter, allowing different behavior during the build and runtime phases.
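The capability-based design and phase-aware creation can be hard to picture from the class names alone. The following pure-Python sketch models the pattern with stand-in types; it is not TensorRT's Python plugin API, and every class in it apart from the mirrored enum names is hypothetical. A toy plugin exposes its core, build, and runtime logic as separate objects that callers look up by capability type, and a factory modeled on IPluginCreatorV3One::createPlugin() uses the phase argument to decide what to construct. The real createPlugin() also receives the plugin name and a PluginFieldCollection, which the sketch reduces to a dictionary.

```python
from enum import Enum, auto


class PluginCapabilityType(Enum):
    """Stand-in mirroring TensorRT's PluginCapabilityType (kCORE, kBUILD, kRUNTIME)."""
    CORE = auto()
    BUILD = auto()
    RUNTIME = auto()


class TensorRTPhase(Enum):
    """Stand-in mirroring TensorRT's TensorRTPhase (kBUILD, kRUNTIME)."""
    BUILD = auto()
    RUNTIME = auto()


class ToyCore:
    """Identity information needed in every phase (hypothetical values)."""
    name, version, namespace = "ToyScale", "1", ""


class ToyBuild:
    """Build-only logic, such as output shape inference."""
    def output_shape(self, input_shape):
        return list(input_shape)  # elementwise op: output shape equals input shape


class ToyRuntime:
    """Runtime-only logic; a plain Python loop stands in for a kernel launch."""
    def __init__(self, factor):
        self.factor = factor

    def enqueue(self, values):
        return [v * self.factor for v in values]


class ToyPlugin:
    """Exposes each capability through a separate object, looked up by type."""
    def __init__(self, factor, phase):
        self._caps = {
            PluginCapabilityType.CORE: ToyCore(),
            PluginCapabilityType.RUNTIME: ToyRuntime(factor),
        }
        # Toy design choice: attach build-time logic only for the build phase.
        if phase is TensorRTPhase.BUILD:
            self._caps[PluginCapabilityType.BUILD] = ToyBuild()

    def get_capability_interface(self, cap_type):
        return self._caps.get(cap_type)


def create_plugin(fields, phase):
    """Phase-aware factory, loosely modeled on IPluginCreatorV3One::createPlugin()."""
    return ToyPlugin(fields["factor"], phase)
```

The point of the split is that build-only logic and runtime logic live in separate interfaces, so a plugin created for the runtime phase does not need to carry build-time state. Dropping the build capability entirely at runtime, as this toy does, is a choice made for the sketch, not a TensorRT requirement.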
V2 API Methods Replaced with Updated Versions
Several API methods have been replaced with updated versions that use int64_t instead of size_t, accept additional parameters, or support asynchronous operation:
| Category | Removed | Replacement |
|---|---|---|
| Device memory size | | |
| Device memory (profile) | | |
| Weight streaming budget | | |
| Profile tensor values | | |
| Deserialization | | |
Other Removals
- Tactic sources: The kCUBLAS, kCUBLAS_LT, and kCUDNN values of the TacticSource enum have been removed. TensorRT no longer uses these external libraries for core operations.
- Implicit batch: allInputShapesSpecified() has been removed.
- Built-in plugins removed: 16 plugins deprecated before TensorRT 10 have been removed, including the NMS family (BatchedNMS_TRT, BatchedNMSDynamic_TRT, EfficientNMS_ONNX_TRT, NMS_TRT, NMSDynamic_TRT), activations (Clip_TRT, CustomGeluPluginDynamic, LReLU_TRT), and BatchTilePlugin_TRT, CoordConvAC, Normalize_TRT, Proposal, SingleStepLSTMPlugin, SpecialSlice_TRT, and Split. Refer to Removed C++ Plugins and Replacements for alternatives.
- BERT plugin family deprecated: The OSS BERT plugin classes (bertQKVToContextPlugin/CustomQKVToContextPluginDynamic versions 1 through 4, and CustomEmbLayerNormPluginDynamic) are deprecated in 11.0.0 and scheduled for removal in a future release. Use the native attention path (addAttentionV2) for QKV-to-context, and a standard IGatherLayer followed by addNormalizationV2() for embedding plus layer normalization. Refer to Deprecated BERT Plugins for per-plugin replacements.
- ONNX parser: supportsModel() and parseWithWeightDescriptors() have been removed.
- Layer overloads: Older overloads of addTopK, addNonZero, and addNMS that did not accept an indicesType parameter have been removed. Use the versions that accept a DataType parameter for the indices output type.
- Multi-Device preview flag: The PreviewFeature::kMULTIDEVICE_RUNTIME_10_16 value has been removed. setPreviewFeature() is no longer required to access Multi-Device Inference features.
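To check whether an existing ONNX model depends on any of the plugins listed above, you can compare its node op types against those names. The following Python sketch is a hypothetical helper, assuming plugin nodes use the plugin name as their op type. Because the removed list above is introduced with "including", the set here is not exhaustive, so treat an empty result as a hint rather than a guarantee. Split is also a standard ONNX operator, so the sketch reports it separately instead of calling it removed.

```python
# Plugin names from the list above; the removed set is not exhaustive.
REMOVED_PLUGINS = frozenset({
    "BatchedNMS_TRT", "BatchedNMSDynamic_TRT", "EfficientNMS_ONNX_TRT",
    "NMS_TRT", "NMSDynamic_TRT", "Clip_TRT", "CustomGeluPluginDynamic",
    "LReLU_TRT", "BatchTilePlugin_TRT", "CoordConvAC", "Normalize_TRT",
    "Proposal", "SingleStepLSTMPlugin", "SpecialSlice_TRT", "Split",
})
DEPRECATED_PLUGINS = frozenset({
    "CustomQKVToContextPluginDynamic", "CustomEmbLayerNormPluginDynamic",
})
# "Split" is also a standard ONNX operator, so a match may be the built-in op.
AMBIGUOUS_NAMES = frozenset({"Split"})


def audit_plugin_ops(op_types):
    """Bucket op type names into removed, deprecated, and ambiguous lists.

    Each list keeps first-seen order and contains no duplicates.
    """
    report = {"removed": [], "deprecated": [], "ambiguous": []}
    for op in op_types:
        if op in AMBIGUOUS_NAMES:
            key = "ambiguous"
        elif op in REMOVED_PLUGINS:
            key = "removed"
        elif op in DEPRECATED_PLUGINS:
            key = "deprecated"
        else:
            continue
        if op not in report[key]:
            report[key].append(op)
    return report
```

With the onnx package installed, you could pass it [node.op_type for node in onnx.load("model.onnx").graph.node]; note that nodes inside subgraphs (for example, in If or Loop bodies) are not included in that top-level list.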
Note
If you are migrating from TensorRT 8.x, refer to the appendix first, then return here for the 10.x-to-11.x changes.