Migrating C++ Code from TensorRT 10.x to 11.x
This page describes how to update C++ code when you migrate from TensorRT 10.x to 11.x: paired examples for strongly typed networks, explicit quantization, plugin migration, and updated runtime APIs, followed by lists of C++ APIs added and removed in 11.x.
Migrating from Weak Typing to Strong Typing
TensorRT 11.x removes all precision-enabling builder flags such as BuilderFlag::kFP16 and BuilderFlag::kINT8. Use ModelOpt AutoCast to convert your ONNX model to mixed precision before building.
Before (TensorRT 10.x)
auto builder = SampleUniquePtr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(logger));
auto network = SampleUniquePtr<nvinfer1::INetworkDefinition>(builder->createNetworkV2(0));
auto config = SampleUniquePtr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());

// Weak typing: TensorRT automatically considers FP16 kernels
config->setFlag(BuilderFlag::kFP16);

auto parser = SampleUniquePtr<nvonnxparser::IParser>(
    nvonnxparser::createParser(*network, logger));
parser->parseFromFile("model.onnx", static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));

auto plan = SampleUniquePtr<IHostMemory>(builder->buildSerializedNetwork(*network, *config));
The 11.x version drops the setFlag() call and parses a model that ModelOpt AutoCast has already converted to mixed precision.
After (TensorRT 11.x)
// Step 1: Convert model to mixed precision offline using ModelOpt:
// python -m modelopt.onnx.autocast --onnx_path model.onnx

// Step 2: Build with strongly typed network (always on in 11.x)
auto builder = SampleUniquePtr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(logger));
auto network = SampleUniquePtr<nvinfer1::INetworkDefinition>(builder->createNetworkV2(0));
auto config = SampleUniquePtr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());

// No precision flags needed - the model itself specifies types

auto parser = SampleUniquePtr<nvonnxparser::IParser>(
    nvonnxparser::createParser(*network, logger));
parser->parseFromFile("model_fp16.onnx", static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));

auto plan = SampleUniquePtr<IHostMemory>(builder->buildSerializedNetwork(*network, *config));
Summary of Changes
- Removed config->setFlag(BuilderFlag::kFP16) and all other precision flags (kINT8, kFP8, kBF16, kINT4, kFP4)
- Added an offline preprocessing step using ModelOpt AutoCast to produce a mixed-precision ONNX model
- No code changes needed for the build path itself beyond removing the flag
Migrating INT8 Calibration to Explicit Quantization
TensorRT 11.x removes IInt8Calibrator and all its subclasses, along with setInt8Calibrator(). Use ModelOpt or manual Q/DQ nodes for explicit quantization instead.
Before (TensorRT 10.x)
class MyCalibrator : public nvinfer1::IInt8EntropyCalibrator2
{
public:
    int32_t getBatchSize() const noexcept override { return 1; }

    bool getBatch(void* bindings[], char const* names[], int32_t nbBindings) noexcept override
    {
        // Fill bindings with calibration data
        if (mCurrentBatch >= mNumBatches)
            return false;
        // ... copy data to GPU
        mCurrentBatch++;
        return true;
    }

    void const* readCalibrationCache(size_t& length) noexcept override { return nullptr; }
    void writeCalibrationCache(void const* ptr, size_t length) noexcept override {}

private:
    int32_t mCurrentBatch{0};
    int32_t mNumBatches{100};
};

// Usage
auto config = SampleUniquePtr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());
config->setFlag(BuilderFlag::kINT8);

MyCalibrator calibrator;
config->setInt8Calibrator(&calibrator);

auto plan = SampleUniquePtr<IHostMemory>(builder->buildSerializedNetwork(*network, *config));
The 11.x version has no calibrator class: quantization happens offline, and the builder parses the pre-quantized model.
After (TensorRT 11.x)
// Step 1: Quantize the model offline using ModelOpt:
// python -m modelopt.onnx.quantization --onnx_path model.onnx --calibration_data data.npz
//
// Alternatively, add QuantizeLinear/DequantizeLinear nodes to the ONNX graph manually,
// or use the INetworkDefinition::addQuantize() and addDequantize() APIs.

// Step 2: Build the pre-quantized model
auto builder = SampleUniquePtr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(logger));
auto network = SampleUniquePtr<nvinfer1::INetworkDefinition>(builder->createNetworkV2(0));
auto config = SampleUniquePtr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());

auto parser = SampleUniquePtr<nvonnxparser::IParser>(
    nvonnxparser::createParser(*network, logger));
parser->parseFromFile("model_quantized.onnx",
    static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));

auto plan = SampleUniquePtr<IHostMemory>(builder->buildSerializedNetwork(*network, *config));
Summary of Changes
- Removed the IInt8Calibrator subclass entirely
- Removed config->setFlag(BuilderFlag::kINT8) and config->setInt8Calibrator()
- Quantization is applied to the model offline (Q/DQ nodes in the ONNX graph) or via the addQuantize() / addDequantize() network definition APIs
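To make the Q/DQ terminology concrete, here is a minimal sketch of the arithmetic that a symmetric INT8 quantize/dequantize pair applies to a single value. It assumes a per-tensor scale and a zero point of 0. The function names are hypothetical and are not part of the TensorRT API. In practice the scale comes from calibration data during the offline ModelOpt step.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: symmetric INT8 quantization of one value.
// Assumes a zero point of 0 and a strictly positive scale.
inline std::int8_t quantizeInt8(float x, float scale)
{
    assert(scale > 0.0f);
    // std::nearbyint honors the current rounding mode, which is
    // round-half-to-even unless the program changes it.
    float const rounded = std::nearbyint(x / scale);
    // Saturate to the INT8 range instead of wrapping.
    float const clamped = std::min(127.0f, std::max(-128.0f, rounded));
    return static_cast<std::int8_t>(clamped);
}

// Hypothetical helper: maps the quantized value back to floating point.
inline float dequantizeInt8(std::int8_t q, float scale)
{
    return static_cast<float>(q) * scale;
}
```

Values whose magnitude exceeds 127 * scale saturate, which is why the choice of scale during calibration determines the accuracy of the quantized model.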
Migrating Plugins from IPluginV2DynamicExt to IPluginV3
The following example shows a complete plugin migration from V2 to V3 using a NonZero plugin that computes the indices of non-zero elements. This demonstrates V3’s support for data-dependent output shapes, which was not possible with V2.
See also
- Side-by-Side V2 ↔ V3 API Mapping
Method-by-method mapping table grouped by lifecycle phase (core, build, runtime, serialization, network attachment).
- Known Migration Issues
Known issues encountered when porting V2 plugins, including the empty PluginField initializer crash and the strongly-typed network requirement.
- Performance: Resolving V2 → V3 Regressions
Checklist for resolving performance regressions after migrating a plugin from IPluginV2DynamicExt to IPluginV3.
Before (TensorRT 10.x - IPluginV2DynamicExt)
class NonZeroPluginV2 : public nvinfer1::IPluginV2DynamicExt
{
public:
    // Constructor used by clone() and by the creator below
    explicit NonZeroPluginV2(bool rowOrder) : mRowOrder(rowOrder) {}

    // IPluginV2 core methods
    char const* getPluginType() const noexcept override { return "NonZeroPlugin"; }
    char const* getPluginVersion() const noexcept override { return "1"; }
    int32_t getNbOutputs() const noexcept override { return 1; }

    // Output dimensions - limited to expressions of input dimensions only
    DimsExprs getOutputDimensions(int32_t outputIndex, DimsExprs const* inputs,
        int32_t nbInputs, IExprBuilder& exprBuilder) noexcept override
    {
        // Cannot express data-dependent shapes - must use an upper bound
        DimsExprs output;
        output.nbDims = 2;
        output.d[0] = exprBuilder.operation(DimensionOperation::kPROD,
            *inputs[0].d[0], *inputs[0].d[1]); // Upper bound: R * C
        output.d[1] = exprBuilder.constant(2);
        return output;
    }

    bool supportsFormatCombination(int32_t pos, PluginTensorDesc const* inOut,
        int32_t nbInputs, int32_t nbOutputs) noexcept override
    {
        return inOut[pos].format == TensorFormat::kLINEAR;
    }

    void configurePlugin(DynamicPluginTensorDesc const* in, int32_t nbInputs,
        DynamicPluginTensorDesc const* out, int32_t nbOutputs) noexcept override {}

    int32_t enqueue(PluginTensorDesc const* inputDesc, PluginTensorDesc const* outputDesc,
        void const* const* inputs, void* const* outputs,
        void* workspace, cudaStream_t stream) noexcept override
    {
        // Execute kernel
        return 0;
    }

    size_t getWorkspaceSize(PluginTensorDesc const* inputs, int32_t nbInputs,
        PluginTensorDesc const* outputs, int32_t nbOutputs) const noexcept override
    {
        return 0;
    }

    // Serialization
    size_t getSerializationSize() const noexcept override { return sizeof(bool); }
    void serialize(void* buffer) const noexcept override
    {
        *reinterpret_cast<bool*>(buffer) = mRowOrder;
    }

    IPluginV2DynamicExt* clone() const noexcept override
    {
        return new NonZeroPluginV2(mRowOrder);
    }

    // ... other required IPluginV2 methods (destroy, setPluginNamespace, etc.)

private:
    bool mRowOrder{true};
};

// V2 Plugin Creator
class NonZeroCreatorV2 : public nvinfer1::IPluginCreator
{
public:
    char const* getPluginName() const noexcept override { return "NonZeroPlugin"; }
    char const* getPluginVersion() const noexcept override { return "1"; }
    PluginFieldCollection const* getFieldNames() noexcept override { return &mFC; }

    IPluginV2* createPlugin(char const* name, PluginFieldCollection const* fc) noexcept override
    {
        return new NonZeroPluginV2(/*rowOrder=*/true);
    }

    IPluginV2* deserializePlugin(char const* name, void const* data,
        size_t length) noexcept override
    {
        bool rowOrder = *reinterpret_cast<bool const*>(data);
        return new NonZeroPluginV2(rowOrder);
    }

    // ... other required methods

private:
    PluginFieldCollection mFC{};
};

// Usage - addPluginV2 expects an array of tensor pointers
NonZeroPluginV2 plugin(/*rowOrder=*/true);
ITensor* inputs[] = {&inputTensor};
auto* layer = network->addPluginV2(inputs, 1, plugin);
After (TensorRT 11.x - IPluginV3)
class NonZeroPlugin : public IPluginV3, public IPluginV3OneCore,
                      public IPluginV3OneBuild, public IPluginV3OneRuntime
{
public:
    NonZeroPlugin(bool rowOrder) : mRowOrder(rowOrder) {}

    // IPluginV3 - return the appropriate capability interface
    IPluginCapability* getCapabilityInterface(PluginCapabilityType type) noexcept override
    {
        if (type == PluginCapabilityType::kBUILD)
            return static_cast<IPluginV3OneBuild*>(this);
        if (type == PluginCapabilityType::kRUNTIME)
            return static_cast<IPluginV3OneRuntime*>(this);
        return static_cast<IPluginV3OneCore*>(this);
    }

    // IPluginV3OneCore
    AsciiChar const* getPluginName() const noexcept override { return "NonZeroPlugin"; }
    AsciiChar const* getPluginVersion() const noexcept override { return "1"; }
    AsciiChar const* getPluginNamespace() const noexcept override { return ""; }

    // IPluginV3OneBuild
    int32_t getNbOutputs() const noexcept override { return 2; } // data + size tensor

    int32_t getOutputDataTypes(DataType* outputTypes, int32_t nbOutputs,
        DataType const* inputTypes, int32_t nbInputs) const noexcept override
    {
        outputTypes[0] = DataType::kINT32; // non-zero indices
        outputTypes[1] = DataType::kINT64; // size tensor
        return 0;
    }

    // Output shapes - V3 supports data-dependent shapes via declareSizeTensor
    int32_t getOutputShapes(DimsExprs const* inputs, int32_t nbInputs,
        DimsExprs const* shapeInputs, int32_t nbShapeInputs,
        DimsExprs* outputs, int32_t nbOutputs,
        IExprBuilder& exprBuilder) noexcept override
    {
        auto upperBound = exprBuilder.operation(DimensionOperation::kPROD,
            *inputs[0].d[0], *inputs[0].d[1]);
        auto optValue = exprBuilder.operation(DimensionOperation::kFLOOR_DIV,
            *upperBound, *exprBuilder.constant(2));

        // Declare a size tensor - enables data-dependent output shapes
        auto numNonZero = exprBuilder.declareSizeTensor(1, *optValue, *upperBound);

        outputs[0].nbDims = 2;
        outputs[0].d[0] = numNonZero; // Data-dependent dimension
        outputs[0].d[1] = exprBuilder.constant(2);

        outputs[1].nbDims = 0; // Size tensor is a scalar
        return 0;
    }

    bool supportsFormatCombination(int32_t pos, DynamicPluginTensorDesc const* inOut,
        int32_t nbInputs, int32_t nbOutputs) noexcept override
    {
        return inOut[pos].desc.format == TensorFormat::kLINEAR;
    }

    int32_t configurePlugin(DynamicPluginTensorDesc const* in, int32_t nbInputs,
        DynamicPluginTensorDesc const* out, int32_t nbOutputs) noexcept override
    {
        return 0;
    }

    size_t getWorkspaceSize(DynamicPluginTensorDesc const* inputs, int32_t nbInputs,
        DynamicPluginTensorDesc const* outputs, int32_t nbOutputs) const noexcept override
    {
        return 0;
    }

    // IPluginV3OneRuntime
    int32_t enqueue(PluginTensorDesc const* inputDesc, PluginTensorDesc const* outputDesc,
        void const* const* inputs, void* const* outputs,
        void* workspace, cudaStream_t stream) noexcept override
    {
        // Execute kernel - same as V2
        return 0;
    }

    int32_t onShapeChange(PluginTensorDesc const* in, int32_t nbInputs,
        PluginTensorDesc const* out, int32_t nbOutputs) noexcept override
    {
        return 0;
    }

    // Serialization - uses PluginFieldCollection instead of raw bytes
    PluginFieldCollection const* getFieldsToSerialize() noexcept override
    {
        mDataToSerialize.clear();
        mDataToSerialize.emplace_back("rowOrder", &mRowOrder, PluginFieldType::kINT32, 1);
        mFCToSerialize.nbFields = mDataToSerialize.size();
        mFCToSerialize.fields = mDataToSerialize.data();
        return &mFCToSerialize;
    }

    IPluginV3* attachToContext(IPluginResourceContext* context) noexcept override
    {
        return clone();
    }

    IPluginV3* clone() noexcept override
    {
        return new NonZeroPlugin(mRowOrder);
    }

private:
    bool mRowOrder{true};
    std::vector<PluginField> mDataToSerialize;
    PluginFieldCollection mFCToSerialize{};
};

// V3 Plugin Creator
class NonZeroCreator : public nvinfer1::IPluginCreatorV3One
{
public:
    NonZeroCreator()
    {
        mPluginAttributes.emplace_back("rowOrder", nullptr, PluginFieldType::kINT32, 1);
        mFC.nbFields = mPluginAttributes.size();
        mFC.fields = mPluginAttributes.data();
    }

    char const* getPluginName() const noexcept override { return "NonZeroPlugin"; }
    char const* getPluginVersion() const noexcept override { return "1"; }
    char const* getPluginNamespace() const noexcept override { return ""; }
    PluginFieldCollection const* getFieldNames() noexcept override { return &mFC; }

    // Phase-aware creation - no separate deserializePlugin needed
    IPluginV3* createPlugin(char const* name, PluginFieldCollection const* fc,
        TensorRTPhase phase) noexcept override
    {
        bool rowOrder = true;
        for (int32_t i = 0; i < fc->nbFields; ++i)
        {
            if (std::string_view(fc->fields[i].name) == "rowOrder")
                rowOrder = *static_cast<bool const*>(fc->fields[i].data);
        }
        return new NonZeroPlugin(rowOrder);
    }

private:
    PluginFieldCollection mFC{};
    std::vector<PluginField> mPluginAttributes;
};

// Usage - addPluginV3 accepts both data inputs and shape inputs
NonZeroPlugin plugin(/*rowOrder=*/true);
ITensor* inputs[] = {&inputTensor};
auto* layer = network->addPluginV3(inputs, 1, nullptr, 0, plugin);
Summary of Changes
- Plugin class inherits from IPluginV3, IPluginV3OneCore, IPluginV3OneBuild, and IPluginV3OneRuntime instead of IPluginV2DynamicExt
- Added getCapabilityInterface() to return the appropriate interface for each phase (core, build, runtime)
- getOutputDimensions() replaced by getOutputShapes(), which supports data-dependent output shapes using exprBuilder.declareSizeTensor()
- Added required getOutputDataTypes() method
- serialize() / getSerializationSize() replaced by getFieldsToSerialize(), which returns a PluginFieldCollection for structured serialization
- Added onShapeChange() and attachToContext() methods
- Creator inherits from IPluginCreatorV3One instead of IPluginCreator; createPlugin() takes a TensorRTPhase parameter, and deserializePlugin() is no longer needed - createPlugin() handles both build and runtime phases
- addPluginV2(inputs, nbInputs, plugin) replaced by addPluginV3(inputs, nbInputs, shapeInputs, nbShapeInputs, plugin)
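Structured serialization describes each value by a field type and a length, so the storage behind a field should match what the field declares. The following is a minimal sketch of reading a flag back from a field collection under that assumption: the flag is kept in an int32_t, which matches a field declared as a single 32-bit integer. The Field, FieldCollection, and FieldType types are hypothetical stand-ins so that the sketch compiles without the TensorRT headers. Real code would use PluginField and PluginFieldCollection.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Stand-ins for illustration only (hypothetical, not TensorRT types).
enum class FieldType { kINT32 };

struct Field
{
    char const* name;
    void const* data;
    FieldType type;
    std::int32_t length;
};

struct FieldCollection
{
    std::int32_t nbFields;
    Field const* fields;
};

// Reads the "rowOrder" flag from a collection, or returns defaultValue
// when no matching single-element 32-bit integer field is present.
inline bool readRowOrder(FieldCollection const& fc, bool defaultValue)
{
    for (std::int32_t i = 0; i < fc.nbFields; ++i)
    {
        Field const& f = fc.fields[i];
        if (std::string_view(f.name) == "rowOrder" && f.type == FieldType::kINT32 && f.length == 1)
        {
            assert(f.data != nullptr);
            // The field declares one int32_t, so read exactly that type.
            return *static_cast<std::int32_t const*>(f.data) != 0;
        }
    }
    return defaultValue;
}
```

Because the same lookup runs in both the build and runtime phases, one helper of this kind can cover the work that createPlugin() and deserializePlugin() split between them in V2.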
Known Issues When Migrating Plugins
- Empty PluginField initializers can crash V3 dispatch. When a plugin advertises a PluginField with a nullptr data pointer and length == 0, the V3 creator dispatch path can dereference the pointer during build or deserialization. Populate every entry with a non-null sentinel buffer, even when the value is unused at runtime:

    // Bad - empty initializer
    mPluginAttributes.emplace_back("flag", nullptr, PluginFieldType::kINT32, 0);

    // Good - non-null sentinel keeps the dispatch path safe
    static int32_t kDummy = 0;
    mPluginAttributes.emplace_back("flag", &kDummy, PluginFieldType::kINT32, 1);

- Use strongly-typed networks with IPluginV3. Mixing IPluginV3 plugins with weakly-typed networks can hit fusion paths that were not exercised by IPluginV2DynamicExt and trigger crashes. In TensorRT 11.0.0 all precision-enabling builder flags (BuilderFlag::kFP16, kINT8, kBF16, kFP8, kINT4, kFP4) have been removed, so any network you build is strongly typed by default, and no action is required for fresh 11.x builds. Authors back-porting V3 plugins to a 10.x build for evaluation must explicitly opt in by setting the NetworkDefinitionCreationFlag::kSTRONGLY_TYPED bit in the flags passed to createNetworkV2().
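One way to catch the empty-initializer problem early is to scan the advertised fields before registering the creator. The sketch below is a conservative check that flags any field with a null data pointer or a non-positive length. Field and FieldCollection are hypothetical stand-ins that carry only the members the check needs, and findEmptyFields is not a TensorRT function.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Stand-ins for illustration only (hypothetical, not TensorRT types).
struct Field
{
    char const* name;
    void const* data;
    std::int32_t length;
};

struct FieldCollection
{
    std::int32_t nbFields;
    Field const* fields;
};

// Returns the names of fields that lack a usable sentinel buffer.
inline std::vector<std::string> findEmptyFields(FieldCollection const& fc)
{
    assert(fc.nbFields == 0 || fc.fields != nullptr);
    std::vector<std::string> emptyNames;
    for (std::int32_t i = 0; i < fc.nbFields; ++i)
    {
        Field const& f = fc.fields[i];
        if (f.data == nullptr || f.length <= 0)
        {
            emptyNames.emplace_back(f.name != nullptr ? f.name : "<unnamed>");
        }
    }
    return emptyNames;
}
```

A debug build could call such a check in the creator's constructor and log or assert when the returned list is not empty.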
Migrating Weight Streaming APIs
The weight streaming API has been updated in TensorRT 11.x. The getMinimumWeightStreamingBudget() method has been removed; compute a budget from getStreamableWeightsSize() and available device memory instead.
Before (TensorRT 10.x)
auto engine = SampleUniquePtr<ICudaEngine>(
    runtime->deserializeCudaEngine(engineData, engineSize));

// Old API
int64_t minBudget = engine->getMinimumWeightStreamingBudget();
engine->setWeightStreamingBudget(minBudget);
int64_t currentBudget = engine->getWeightStreamingBudget();
After (TensorRT 11.x)
auto engine = SampleUniquePtr<ICudaEngine>(
    runtime->deserializeCudaEngine(engineData, engineSize));

// V2 API
size_t freeMem, totalMem;
cudaMemGetInfo(&freeMem, &totalMem);
int64_t weightsSize = engine->getStreamableWeightsSize();
int64_t budget = std::min(static_cast<int64_t>(freeMem / 2), weightsSize / 2);
engine->setWeightStreamingBudgetV2(budget);
int64_t currentBudget = engine->getWeightStreamingBudgetV2();
Summary of Changes
- setWeightStreamingBudget() replaced by setWeightStreamingBudgetV2()
- getWeightStreamingBudget() replaced by getWeightStreamingBudgetV2()
- getMinimumWeightStreamingBudget() removed - compute a budget using getStreamableWeightsSize() and available device memory
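The budget computation can be isolated in a small function so that it is easy to test without a GPU. The sketch below mirrors the heuristic from the 11.x example (half of the free device memory, capped at half of the streamable weights). The one-half factors are that example's choice rather than a TensorRT requirement, and chooseWeightStreamingBudget is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper, not part of TensorRT.
// streamableWeightsSize: value reported by getStreamableWeightsSize().
// freeDeviceBytes: free device memory, for example from cudaMemGetInfo().
inline std::int64_t chooseWeightStreamingBudget(
    std::int64_t streamableWeightsSize, std::size_t freeDeviceBytes)
{
    assert(streamableWeightsSize >= 0);
    // Clamp the unsigned byte count into the int64_t range before mixing
    // signed and unsigned arithmetic.
    std::uint64_t const cap = static_cast<std::uint64_t>(std::numeric_limits<std::int64_t>::max());
    std::int64_t const freeBytes
        = static_cast<std::int64_t>(std::min<std::uint64_t>(freeDeviceBytes, cap));
    return std::min(freeBytes / 2, streamableWeightsSize / 2);
}
```

The result would then be passed to setWeightStreamingBudgetV2(). Applications with other memory consumers on the same device may want a smaller fraction of free memory.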
Migrating Memory Management APIs
TensorRT 11.x replaces getDeviceMemorySize() with getDeviceMemorySizeV2() (which returns int64_t), removes createExecutionContextWithoutDeviceMemory(), and replaces setDeviceMemory(void*) with setDeviceMemoryV2(void*, int64_t).
Before (TensorRT 10.x)
auto engine = SampleUniquePtr<ICudaEngine>(
    runtime->deserializeCudaEngine(engineData, engineSize));

// Old APIs
size_t memSize = engine->getDeviceMemorySize();
auto context = SampleUniquePtr<IExecutionContext>(
    engine->createExecutionContextWithoutDeviceMemory());

void* deviceMem;
cudaMalloc(&deviceMem, memSize);
context->setDeviceMemory(deviceMem);
After (TensorRT 11.x)
auto engine = SampleUniquePtr<ICudaEngine>(
    runtime->deserializeCudaEngine(engineData, engineSize));

// V2 APIs - int64_t sizes, explicit size parameter
int64_t memSize = engine->getDeviceMemorySizeV2();
// kUSER_MANAGED tells TensorRT that the caller supplies the device memory
auto context = SampleUniquePtr<IExecutionContext>(
    engine->createExecutionContext(ExecutionContextAllocationStrategy::kUSER_MANAGED));

void* deviceMem;
cudaMalloc(&deviceMem, memSize);
context->setDeviceMemoryV2(deviceMem, memSize);
Summary of Changes
- getDeviceMemorySize() replaced by getDeviceMemorySizeV2() (returns int64_t instead of size_t)
- createExecutionContextWithoutDeviceMemory() removed - use createExecutionContext(), passing ExecutionContextAllocationStrategy::kUSER_MANAGED when you supply the device memory yourself
- setDeviceMemory(void*) replaced by setDeviceMemoryV2(void*, int64_t), which takes an explicit size parameter
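Because the V2 size query returns int64_t while cudaMalloc takes size_t, a checked conversion keeps an unexpected negative value from silently wrapping into a huge allocation request. The following is a minimal sketch. toAllocationSize is a hypothetical helper and is not part of TensorRT or CUDA.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical helper: converts a signed byte count to size_t, or returns
// std::nullopt when the value is negative or does not fit in size_t.
constexpr std::optional<std::size_t> toAllocationSize(std::int64_t bytes) noexcept
{
    if (bytes < 0)
    {
        return std::nullopt;
    }
    if (static_cast<std::uint64_t>(bytes) > std::numeric_limits<std::size_t>::max())
    {
        return std::nullopt;
    }
    return static_cast<std::size_t>(bytes);
}
```

A caller would allocate only when the optional holds a value, and would still pass the original int64_t size to setDeviceMemoryV2().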
Removed C++ APIs and Replacements
Warning
The APIs listed below have been removed in TensorRT 11.x and will cause compile-time errors if used. Review each entry for its replacement before upgrading.
| Removed API | Replacement |
| --- | --- |
| BuilderFlag::kFP16 | Strong typing with ModelOpt AutoCast |
| BuilderFlag::kINT8 | Explicit quantization with Q/DQ nodes |
| BuilderFlag::kFP8 | Explicit quantization with Q/DQ nodes |
| BuilderFlag::kBF16 | Strong typing with ModelOpt AutoCast |
| BuilderFlag::kINT4 | Explicit quantization with Q/DQ nodes |
| BuilderFlag::kFP4 | Explicit quantization with Q/DQ nodes |
| BuilderFlag::kOBEY_PRECISION_CONSTRAINTS | Strong typing (always enforced) |
| BuilderFlag::kPREFER_PRECISION_CONSTRAINTS | Strong typing (always enforced) |
| BuilderFlag::kDIRECT_IO | Removed (not needed in 11.x) |
| IAlgorithm, IAlgorithmContext, IAlgorithmIOInfo, IAlgorithmSelector, IAlgorithmVariant | Use editable mode in ITimingCache instead. |
| IBuilderConfig::setInt8Calibrator(IInt8Calibrator*) | Explicit quantization with Q/DQ nodes |
| IBuilderConfig::getInt8Calibrator() | Removed |
| IBuilderConfig::setCalibrationProfile(IOptimizationProfile const*) | Removed |
| IBuilderConfig::getCalibrationProfile() | Removed |
| IBuilderConfig::setQuantizationFlags(QuantizationFlags) | Removed |
| IBuilderConfig::getQuantizationFlags() | Removed |
| IBuilderConfig::clearQuantizationFlag(QuantizationFlag) | Removed |
| IBuilderConfig::setQuantizationFlag(QuantizationFlag) | Removed |
| IBuilderConfig::getQuantizationFlag(QuantizationFlag) | Removed |
| ICudaEngine::createExecutionContextWithoutDeviceMemory() | ICudaEngine::createExecutionContext() |
| ICudaEngine::getDeviceMemorySize() | ICudaEngine::getDeviceMemorySizeV2() |
| ICudaEngine::getDeviceMemorySizeForProfile(int32_t) | ICudaEngine::getDeviceMemorySizeForProfileV2(int32_t) |
| ICudaEngine::getMinimumWeightStreamingBudget() | Compute from getStreamableWeightsSize() |
| ICudaEngine::getProfileTensorValues(char const*, int32_t, OptProfileSelector) | ICudaEngine::getProfileTensorValuesV2() |
| ICudaEngine::getWeightStreamingBudget() | ICudaEngine::getWeightStreamingBudgetV2() |
| ICudaEngine::hasImplicitBatchDimension() | Removed (always false) |
| ICudaEngine::setWeightStreamingBudget(int64_t) | ICudaEngine::setWeightStreamingBudgetV2(int64_t) |
| IExecutionContext::allInputShapesSpecified() | Removed (always true) |
| IExecutionContext::setDeviceMemory(void*) | IExecutionContext::setDeviceMemoryV2(void*, int64_t) |
| IGpuAllocator::allocate(uint64_t, uint64_t, AllocatorFlags) | IGpuAllocator::allocateAsync(uint64_t, uint64_t, AllocatorFlags, cudaStream_t) |
| IGpuAllocator::deallocate(void*) | IGpuAllocator::deallocateAsync(void*, cudaStream_t) |
| IInt8Calibrator (all subclasses) | Explicit quantization with Q/DQ nodes |
| ILayer::setPrecision(DataType) | Strong typing (set types on tensors directly) |
| ILayer::getPrecision() | Removed |
| ILayer::precisionIsSet() | Removed |
| ILayer::resetPrecision() | Removed |
| ILayer::setOutputType(int32_t, DataType) | Strong typing (set types on tensors directly) |
| ILayer::outputTypeIsSet(int32_t) | Removed |
| ILayer::resetOutputType(int32_t) | Removed |
| INetworkDefinition::addAttention(..., bool) | INetworkDefinition::addAttentionV2(..., CausalMaskKind) |
| INetworkDefinition::addNMS(ITensor&, ITensor&, ITensor&) | INetworkDefinition::addNMS(..., DataType) (4-arg version) |
| INetworkDefinition::addNonZero(ITensor&) | INetworkDefinition::addNonZero(ITensor&, DataType) |
| INetworkDefinition::addNormalization(...) | INetworkDefinition::addNormalizationV2(...) |
| INetworkDefinition::addPluginV2(ITensor* const*, int32_t, IPluginV2&) | INetworkDefinition::addPluginV3(...) |
| INetworkDefinition::addTopK(ITensor&, TopKOperation, int32_t, uint32_t) | INetworkDefinition::addTopK(..., DataType) (5-arg version) |
| INormalizationLayer::setComputePrecision(DataType) | Removed (use strong typing) |
| IOutputAllocator::reallocateOutput(char const*, void*, uint64_t, uint64_t) | IOutputAllocator::reallocateOutputAsync(..., cudaStream_t) |
| IPluginCreator | IPluginCreatorV3One |
| IPluginRegistry::deregisterCreator(IPluginCreator const&) | IPluginRegistry::deregisterCreator(IPluginCreatorInterface const&) |
| IPluginRegistry::getPluginCreator(...) | IPluginRegistry::getCreator(...) |
| IPluginRegistry::getPluginCreatorList(int32_t*) | IPluginRegistry::getAllCreators(int32_t*) |
| IPluginRegistry::registerCreator(IPluginCreator&, ...) | IPluginRegistry::registerCreator(IPluginCreatorInterface&, ...) |
| IPluginV2DynamicExt | IPluginV3 |
| IPluginV2Ext | IPluginV3 |
| IPluginV2IOExt | IPluginV3 |
| IPluginV2Layer | IPluginV3Layer |
| IRefitter::setDynamicRange(char const*, float, float) | Explicit quantization with Q/DQ nodes |
| IRefitter::getDynamicRangeMin(char const*) | Removed |
| IRefitter::getDynamicRangeMax(char const*) | Removed |
| IRefitter::getTensorsWithDynamicRange() | Removed |
| IRuntime::deserializeCudaEngine(IStreamReader&) | IRuntime::deserializeCudaEngine(IStreamReaderV2&) |
| ITensor::setType(DataType) | Strong typing (type determined by network construction) |
| ITensor::setDynamicRange(float, float) | Explicit quantization with Q/DQ nodes |
| ITensor::dynamicRangeIsSet() | Removed |
| ITensor::resetDynamicRange() | Removed |
| ITensor::getDynamicRangeMin() | Removed |
| ITensor::getDynamicRangeMax() | Removed |
| ITensor::setBroadcastAcrossBatch(bool) | Removed (implicit batch not supported) |
| ITensor::getBroadcastAcrossBatch() | Removed (implicit batch not supported) |
| TacticSource::kCUBLAS | Removed |
| TacticSource::kCUBLAS_LT | Removed |
| TacticSource::kCUDNN | Removed |
| DetectionOutputParameters | Removed |
| NMSParameters | Removed |
| CodeTypeSSD | Removed |
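Code bases that must build against both 10.x and 11.x during the transition can gate the removed calls on the TensorRT major version. The sketch below assumes the NV_TENSORRT_MAJOR macro from NvInferVersion.h. The fallback definition exists only so that the sketch compiles without the TensorRT headers, and kHasLegacyPrecisionApis and modelPathFor are hypothetical names.

```cpp
#include <string_view>

// NV_TENSORRT_MAJOR normally comes from NvInferVersion.h. This fallback is an
// assumption that lets the sketch compile on its own.
#ifndef NV_TENSORRT_MAJOR
#define NV_TENSORRT_MAJOR 11
#endif

// True when the APIs removed in 11.x (precision flags, calibrators,
// V2 plugins) are still available to the compiler.
constexpr bool kHasLegacyPrecisionApis = (NV_TENSORRT_MAJOR < 11);

// Hypothetical helper: selects which ONNX file each code path parses,
// using the file names from the strong typing example.
constexpr std::string_view modelPathFor(bool legacy)
{
    return legacy ? "model.onnx" : "model_fp16.onnx";
}

// Typical use inside build code (shown as a comment because it needs the
// TensorRT headers):
//   #if NV_TENSORRT_MAJOR < 11
//       config->setFlag(nvinfer1::BuilderFlag::kFP16);
//   #endif
```

Preprocessor guards are needed for the calls themselves, because a removed symbol fails to compile even inside a branch that never runs.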
Removed C++ Plugins and Replacements
Warning
The plugins listed below have been removed in TensorRT 11.x. Using them will cause compilation or linker errors. Review each entry for its replacement before upgrading.
| Removed plugin | Replacement |
| --- | --- |
| BatchedNMS_TRT | Use INetworkDefinition::addNMS() |
| BatchedNMSDynamic_TRT | Use INetworkDefinition::addNMS() |
| BatchTilePlugin_TRT | Implement with standard TensorRT layers |
| Clip_TRT | Use INetworkDefinition::addActivation() with kCLIP |
| CoordConvAC | Implement with standard TensorRT layers (concatenate coordinate channels with IConcatenationLayer, then apply convolution) |
| CustomGeluPluginDynamic | Use INetworkDefinition::addActivation() with kGELU_ERF or kGELU_TANH |
| EfficientNMS_ONNX_TRT | Use INetworkDefinition::addNMS() |
| LReLU_TRT | Use INetworkDefinition::addActivation() with kLEAKY_RELU |
| NMS_TRT | Use INetworkDefinition::addNMS() |
| NMSDynamic_TRT | Use INetworkDefinition::addNMS() |
| Normalize_TRT | Use INetworkDefinition::addNormalizationV2() |
| Proposal | Implement with standard TensorRT layers |
| SingleStepLSTMPlugin | Use INetworkDefinition::addLoop() or standard RNN decomposition |
| SpecialSlice_TRT | Use INetworkDefinition::addSlice() |
| Split | Use INetworkDefinition::addSlice() |
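When a plugin is swapped for a built-in activation, a host-side reference makes it easy to compare outputs before and after the change. The sketch below assumes the documented ActivationType definitions, kCLIP as max(alpha, min(beta, x)) and kLEAKY_RELU as x for x >= 0 and alpha * x otherwise. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Host reference for the kCLIP activation: clamps x to [alpha, beta].
inline float clipRef(float x, float alpha, float beta)
{
    assert(alpha <= beta);
    return std::max(alpha, std::min(beta, x));
}

// Host reference for the kLEAKY_RELU activation with negative slope alpha.
inline float leakyReluRef(float x, float alpha)
{
    return x >= 0.0f ? x : alpha * x;
}
```

Comparing engine outputs against such references on a handful of inputs is a quick way to confirm that the alpha and beta parameters were carried over from the old plugin attributes correctly.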
Deprecated BERT Plugins
The following OSS BERT plugin classes are deprecated in 11.0.0 and scheduled for removal in a future release. Migrate to the listed replacements before upgrading beyond 11.x.
| Deprecated plugin | Replacement |
| --- | --- |
| bertQKVToContextPlugin / CustomQKVToContextPluginDynamic | Refer to Migrate to IAttention for more information. |