
Migrating from TensorRT 10.x to 11.x

TensorRT 11.x introduces several breaking changes that affect how you build engines, run inference, and use plugins. The sections below summarize the key changes. For language-specific and tool-specific migration details, select the page that matches your integration:

Python Integration: Migrate Python inference and engine-build code, and review Python APIs removed in 11.x with their replacements.
C++ Integration: Migrate C++ inference and engine-build code, and review C++ APIs and plugins removed in 11.x with their replacements.
trtexec Command-Line Usage: Migrate trtexec command lines, and review removed or deprecated flag replacements.
Safety Runtime: Migrate safety runtime code paths, including build-time safety scope validation and kernel checker changes.
NVIDIA DriveOS: Platform-specific migration guidance for DRIVE Thor and DRIVE Orin, including version compatibility and DLA considerations.
Jetson / JetPack: Platform-specific migration guidance for Jetson Orin and Jetson Thor, including JetPack dependencies and cross-compilation.
IEngineInspector JSON Output: Migrate code that parses IEngineInspector JSON output, including the replacement of Bindings with I/O Tensors and the split of the Format/DataType field.

Strongly Typed Networks Are Now Required

TensorRT 11.x removes all weak typing APIs. In TensorRT 10.x, you could enable reduced-precision types at the model level by setting builder flags such as BuilderFlag::kFP16 or BuilderFlag::kINT8, and TensorRT would then automatically consider lower-precision kernels. This made higher throughput easy to get, but it could compromise accuracy and did not give model authors adequate control over which layers ran at which precision.

Implicit quantization is likewise not supported in TensorRT 11.x. To quantize a model, you must add explicit quantize and dequantize layers to the network.
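When auditing a 10.x codebase, a useful first step is to find every place that still relies on weak typing. The sketch below is a hypothetical helper, not part of TensorRT or ModelOpt. It scans C++ source text for the BuilderFlag::kFP16 and BuilderFlag::kINT8 references named above. It only finds textual matches in the C++ spelling, so each hit still needs a manual decision.

```python
import re
from typing import List, Tuple

# Hypothetical audit helper (not part of TensorRT): matches the C++ spelling of
# the weak-typing builder flags named above. Python-API spellings are not matched.
WEAK_TYPING_FLAGS = re.compile(r"\bBuilderFlag::k(?:FP16|INT8)\b")


def find_weak_typing(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, flag) for every weak-typing flag reference in source."""
    hits: List[Tuple[int, str]] = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in WEAK_TYPING_FLAGS.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


if __name__ == "__main__":
    legacy = (
        "config->setFlag(BuilderFlag::kFP16);\n"
        "config->setFlag(BuilderFlag::kINT8);\n"
    )
    for lineno, flag in find_weak_typing(legacy):
        print(f"line {lineno}: {flag} needs an explicit cast or quantization replacement")
```

Each hit marks a place where precision selection was previously left to TensorRT. The replacements described in this section are ModelOpt AutoCast, explicit casts, and quantize/dequantize layers.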

The most straightforward way to get results similar to the trtexec --fp16 flag or BuilderFlag::kFP16 is to run ModelOpt’s AutoCast feature on your model. For a basic conversion of FP32 layers to FP16, run the following (refer to the ModelOpt AutoCast documentation for more options):

python -m modelopt.onnx.autocast --onnx_path model.onnx

To quantize a network using calibration data, run:

python -m modelopt.onnx.quantization --onnx_path model.onnx --calibration_data data.npz

For more control or other advanced quantization topics (such as quantization-aware training), refer to the ModelOpt Quantization documentation.

You can also control a layer’s I/O types manually by using INetworkDefinition::addCast or by adding Cast nodes to the ONNX graph.
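A cast to FP16 is lossy, so it is worth checking a tensor's values before deciding where to place casts. The sketch below does not call TensorRT. It models only the numeric effect of the cast, using Python's standard struct module, whose "e" format is IEEE 754 half precision (the FP16 bit layout). It shows the two things to look for: rounding of fractions and of integers above 2048, and values beyond the FP16 maximum of 65504.

```python
import struct


def to_fp16(value: float) -> float:
    """Round a Python float to IEEE 754 half precision and back.

    struct raises OverflowError when the value is outside the FP16 range
    (largest finite value 65504), instead of producing infinity.
    """
    return struct.unpack("<e", struct.pack("<e", value))[0]


if __name__ == "__main__":
    for x in (0.1, 2049.0, 65504.0):
        print(f"{x!r} -> {to_fp16(x)!r}")
    try:
        to_fp16(70000.0)
    except OverflowError as err:
        print("70000.0 does not fit in FP16:", err)
```

Here 0.1 becomes 0.0999755859375, and 2049.0 rounds to 2048.0 because FP16 has an 11-bit significand. Tensors whose values span a wide range, or that feed accuracy-sensitive reductions, are therefore typical candidates to keep in FP32.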

Similarly, you can quantize a layer in your network by using the INetworkDefinition::addQuantize and INetworkDefinition::addDequantize APIs, or by manually adding QuantizeLinear and DequantizeLinear nodes to the ONNX graph. This approach works, but deciding which layers must stay at higher precision and calibrating the quantization scales for the best accuracy can be challenging. For most models, use ModelOpt to apply quantization.
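To make the quantize/dequantize pair concrete, the following sketch computes what it does to individual values in plain Python. It assumes per-tensor INT8 with a zero point of 0, round-half-to-even, and clamping to [-128, 127], as in symmetric ONNX QuantizeLinear/DequantizeLinear. The scale comes from simple max calibration (largest absolute value divided by 127). Max calibration is only one of several calibration methods and is not necessarily what ModelOpt uses. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

INT8_MIN, INT8_MAX = -128, 127


def max_calibrate(samples: Sequence[float]) -> float:
    """Per-tensor symmetric scale from the largest absolute sample (max calibration)."""
    amax = max(abs(x) for x in samples)
    return amax / INT8_MAX if amax > 0 else 1.0


def quantize(x: float, scale: float) -> int:
    """Quantize with zero point 0: round half to even, then clamp to the INT8 range."""
    return max(INT8_MIN, min(INT8_MAX, round(x / scale)))


def dequantize(q: int, scale: float) -> float:
    """Dequantize with zero point 0."""
    return q * scale


def fake_quant(values: Sequence[float], scale: float) -> List[Tuple[float, int, float]]:
    """Return (original, int8 code, reconstructed) for each value passed through Q then DQ."""
    return [(x, quantize(x, scale), dequantize(quantize(x, scale), scale)) for x in values]


if __name__ == "__main__":
    scale = max_calibrate([0.5, -1.27, 0.9, 1.0])  # 1.27 / 127, about 0.01
    for x, q, x_hat in fake_quant([0.123, -1.27, 2.0], scale):
        print(f"{x:+.3f} -> {q:4d} -> {x_hat:+.3f}")
```

The last value (2.0) lies outside the calibrated range and is clipped to about 1.27. This is why the calibration data must be representative of real inputs.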

V3 Is the Recommended Plugin Interface

The entire V2 plugin family has been removed in TensorRT 11.0: IPluginV2, IPluginV2Ext, IPluginV2IOExt, IPluginV2DynamicExt, IPluginCreator, IPluginV2Layer, and INetworkDefinition::addPluginV2() are no longer available. Code that references any of these APIs fails to compile against the TensorRT 11.0 headers. Existing V2 plugins must be ported to IPluginV3 with IPluginCreatorV3One and added to networks through INetworkDefinition::addPluginV3(). Related removals that affected V2 plugins in earlier releases include:

IPluginV2Ext::isOutputBroadcastAcrossBatch and IPluginV2Ext::canBroadcastInputAcrossBatch (also inherited by IPluginV2DynamicExt): implicit-batch broadcast helpers. No replacement is needed because implicit batch has been unsupported since 10.0.
The legacy IPluginRegistry overloads that took IPluginCreator: registerCreator(IPluginCreator&, ...), deregisterCreator(IPluginCreator const&), getPluginCreator(...), and getPluginCreatorList(...). Use the IPluginCreatorInterface-typed overloads instead.

Key differences between V2 and V3:

Capability-based design: IPluginV3 uses separate capability interfaces (IPluginV3OneCore, IPluginV3OneBuild, IPluginV3OneRuntime) instead of a single monolithic class.
Data-dependent output shapes: V3 supports declaring size tensors via IExprBuilder::declareSizeTensor(), enabling plugins with data-dependent output shapes.
Shape inputs: addPluginV3() accepts both data inputs and shape inputs, unlike addPluginV2(), which accepted only data inputs.
Phase-aware creation: IPluginCreatorV3One::createPlugin() receives a TensorRTPhase parameter, allowing different behavior during build vs. runtime.
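The capability-based design is easier to see in isolation. The following plain-Python model is illustrative only. It does not use TensorRT, and every class and method name in it is hypothetical. It mirrors the idea behind IPluginV3::getCapabilityInterface(): TensorRT asks the plugin for a core, build, or runtime capability object instead of calling every method on one monolithic class.

```python
from enum import Enum, auto
from typing import Dict, List, Optional


class Capability(Enum):
    """Hypothetical stand-in for the core/build/runtime capability kinds."""
    CORE = auto()
    BUILD = auto()
    RUNTIME = auto()


class CoreCapability:
    """Identity data the plugin always exposes."""
    def __init__(self, name: str, version: str) -> None:
        self.name = name
        self.version = version


class BuildCapability:
    """Build-time queries, such as the number of outputs."""
    def num_outputs(self) -> int:
        return 1


class RuntimeCapability:
    """Execution; a toy element-wise doubling stands in for a CUDA kernel."""
    def enqueue(self, values: List[float]) -> List[float]:
        return [2.0 * v for v in values]


class ToyDoublePlugin:
    """Plugin composed of separate capability objects rather than one monolithic class."""
    def __init__(self) -> None:
        self._capabilities: Dict[Capability, object] = {
            Capability.CORE: CoreCapability("ToyDouble", "1"),
            Capability.BUILD: BuildCapability(),
            Capability.RUNTIME: RuntimeCapability(),
        }

    def get_capability_interface(self, kind: Capability) -> Optional[object]:
        """Return the object implementing the requested capability, if any."""
        return self._capabilities.get(kind)


if __name__ == "__main__":
    plugin = ToyDoublePlugin()
    core = plugin.get_capability_interface(Capability.CORE)
    runtime = plugin.get_capability_interface(Capability.RUNTIME)
    print(core.name, core.version)      # ToyDouble 1
    print(runtime.enqueue([1.0, 2.5]))  # [2.0, 5.0]
```

Splitting the plugin this way keeps build-only logic (shape and output queries) separate from execution logic. This is the separation that the IPluginV3OneCore, IPluginV3OneBuild, and IPluginV3OneRuntime interfaces express in TensorRT itself.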

V2 API Methods Replaced with Updated Versions

Several API methods have been replaced with updated versions that use int64_t instead of size_t, accept additional parameters, or support asynchronous operation:

Category	Removed	Replacement
Device memory size	`getDeviceMemorySize()`	`getDeviceMemorySizeV2()` (returns `int64_t`)
Device memory (profile)	`getDeviceMemorySizeForProfile()`	`getDeviceMemorySizeForProfileV2()` (returns `int64_t`)
Weight streaming budget	`setWeightStreamingBudget()` / `getWeightStreamingBudget()`	`setWeightStreamingBudgetV2()` / `getWeightStreamingBudgetV2()`
Profile tensor values	`getProfileTensorValues()`	`getProfileTensorValuesV2()` (returns `int64_t` values)
Deserialization	`deserializeCudaEngine(IStreamReader&)`	`deserializeCudaEngine(IStreamReaderV2&)`
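Most rows in the table above are mechanical renames, so a first migration pass can be scripted. The sketch below is a hypothetical helper, not a TensorRT tool. It rewrites the removed names from the table in C++ source text and reports what it changed. Word boundaries keep it from matching inside longer names such as getDeviceMemorySizeForProfile, and from touching names that already end in V2.

```python
import re
from typing import Dict, List, Tuple

# Removed 10.x names mapped to their 11.x replacements, taken from the table above.
RENAMES: Dict[str, str] = {
    "getDeviceMemorySize": "getDeviceMemorySizeV2",
    "getDeviceMemorySizeForProfile": "getDeviceMemorySizeForProfileV2",
    "setWeightStreamingBudget": "setWeightStreamingBudgetV2",
    "getWeightStreamingBudget": "getWeightStreamingBudgetV2",
    "getProfileTensorValues": "getProfileTensorValuesV2",
    "IStreamReader": "IStreamReaderV2",
}

# \b on both sides: "getDeviceMemorySize" cannot match inside
# "getDeviceMemorySizeForProfile" or an already-migrated "...V2" name.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def migrate_names(source: str) -> Tuple[str, List[str]]:
    """Rewrite removed method and type names; return new text and the names replaced."""
    replaced: List[str] = []

    def swap(match):
        replaced.append(match.group(1))
        return RENAMES[match.group(1)]

    return _PATTERN.sub(swap, source), replaced


if __name__ == "__main__":
    new_text, changed = migrate_names("size_t n = engine->getDeviceMemorySize();")
    print(new_text)  # size_t n = engine->getDeviceMemorySizeV2();
    print(changed)   # ['getDeviceMemorySize']
```

A rename is only a starting point. The replacements return int64_t, so variables declared as size_t (as in the usage line) should be updated too. A class that implemented IStreamReader also needs its overrides reviewed against IStreamReaderV2 rather than simply renamed.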

Other Removals

Tactic sources: The kCUBLAS, kCUBLAS_LT, and kCUDNN values of the TacticSource enum have been removed. TensorRT no longer uses these external libraries for core operations.
Implicit batch: allInputShapesSpecified() has been removed.
Built-in plugins removed: 16 plugins deprecated before TensorRT 10 have been removed, including the NMS family (BatchedNMS_TRT, BatchedNMSDynamic_TRT, EfficientNMS_ONNX_TRT, NMS_TRT, NMSDynamic_TRT), activation plugins (Clip_TRT, CustomGeluPluginDynamic, LReLU_TRT), and BatchTilePlugin_TRT, CoordConvAC, Normalize_TRT, Proposal, SingleStepLSTMPlugin, SpecialSlice_TRT, and Split. Refer to Removed C++ Plugins and Replacements for replacements.
BERT plugin family deprecated: The OSS BERT plugin classes (bertQKVToContextPlugin / CustomQKVToContextPluginDynamic versions 1 through 4, and CustomEmbLayerNormPluginDynamic) are deprecated in 11.0.0 and scheduled for removal in a future release. Use the native attention path (addAttentionV2) for QKV-to-context, and standard IGatherLayer + addNormalizationV2() for embedding + layer normalization. Refer to Deprecated BERT Plugins for per-plugin replacements.
ONNX parser: supportsModel() and parseWithWeightDescriptors() have been removed.
Layer overloads: Older overloads of addTopK, addNonZero, and addNMS that did not accept an indicesType parameter have been removed. Use the versions that accept a DataType parameter for the indices output type.
Multi-Device preview flag: The PreviewFeature::kMULTIDEVICE_RUNTIME_10_16 value has been removed. setPreviewFeature() is no longer required to access Multi-Device Inference features.
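Unlike the renamed methods, these removals have no mechanical replacement, so it helps to locate them before compiling against 11.x. The sketch below is a hypothetical scanner that reports the identifiers and plugin names listed above, with line numbers. API identifiers are matched as whole words. Plugin names are matched only as double-quoted string literals (the usual way they are passed to a plugin registry lookup), because a bare word such as Split would produce many false positives. The lists cover only what this section names, not every 11.x removal.

```python
import re
from typing import List, Tuple

# API identifiers removed in TensorRT 11.x, as listed above.
REMOVED_IDENTIFIERS = [
    "kCUBLAS", "kCUBLAS_LT", "kCUDNN",                # TacticSource values
    "allInputShapesSpecified",                        # implicit batch
    "supportsModel", "parseWithWeightDescriptors",    # ONNX parser
    "kMULTIDEVICE_RUNTIME_10_16",                     # preview flag
]

# Removed built-in plugin names named above (matched only inside double quotes).
REMOVED_PLUGINS = [
    "BatchedNMS_TRT", "BatchedNMSDynamic_TRT", "EfficientNMS_ONNX_TRT",
    "NMS_TRT", "NMSDynamic_TRT", "Clip_TRT", "CustomGeluPluginDynamic",
    "LReLU_TRT", "BatchTilePlugin_TRT", "CoordConvAC", "Normalize_TRT",
    "Proposal", "SingleStepLSTMPlugin", "SpecialSlice_TRT", "Split",
]

_IDENT = re.compile(r"\b(" + "|".join(map(re.escape, REMOVED_IDENTIFIERS)) + r")\b")
_PLUGIN = re.compile(r'"(' + "|".join(map(re.escape, REMOVED_PLUGINS)) + r')"')


def scan_removed(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, name) for each removed identifier or quoted plugin name."""
    hits: List[Tuple[int, str]] = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern in (_IDENT, _PLUGIN):
            hits.extend((lineno, m.group(1)) for m in pattern.finditer(line))
    return hits


if __name__ == "__main__":
    legacy = (
        "auto mask = 1U << int(TacticSource::kCUBLAS_LT);\n"
        'auto* creator = lookup("BatchedNMS_TRT", "1");\n'
        "// Split the work in two\n"
    )
    for lineno, name in scan_removed(legacy):
        print(f"line {lineno}: {name} was removed in 11.x")
```

In the usage example, the comment on the third line is not reported because Split appears there without quotes. The lookup function in that example is a placeholder, not a TensorRT API.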

Note

If you are migrating from TensorRT 8.x, refer to the appendix first, then return here for the 10.x-to-11.x changes.