Migrating C++ Code from TensorRT 10.x to 11.x#

This page describes how to update C++ code when you migrate from TensorRT 10.x to 11.x. It gives paired before-and-after examples for strongly typed networks, explicit quantization, plugin migration, and updated runtime APIs, followed by lists of C++ APIs added and removed in 11.x.

Migrating from Weak Typing to Strong Typing#

TensorRT 11.x removes all precision-enabling builder flags such as BuilderFlag::kFP16 and BuilderFlag::kINT8. Use ModelOpt AutoCast to convert your ONNX model to mixed precision before building.

Before (TensorRT 10.x)#

    auto builder = SampleUniquePtr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(logger));
    auto network = SampleUniquePtr<nvinfer1::INetworkDefinition>(builder->createNetworkV2(0));
    auto config = SampleUniquePtr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());

    // Weak typing: TensorRT automatically considers FP16 kernels
    config->setFlag(nvinfer1::BuilderFlag::kFP16);

    auto parser = SampleUniquePtr<nvonnxparser::IParser>(
        nvonnxparser::createParser(*network, logger));
    parser->parseFromFile("model.onnx", static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));

    auto plan = SampleUniquePtr<nvinfer1::IHostMemory>(builder->buildSerializedNetwork(*network, *config));

After (TensorRT 11.x)#

    // Step 1: Convert model to mixed precision offline using ModelOpt:
    //   python -m modelopt.onnx.autocast --onnx_path model.onnx

    // Step 2: Build with strongly typed network (always on in 11.x)
    auto builder = SampleUniquePtr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(logger));
    auto network = SampleUniquePtr<nvinfer1::INetworkDefinition>(builder->createNetworkV2(0));
    auto config = SampleUniquePtr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());

    // No precision flags needed - the model itself specifies types

    // "model_fp16.onnx" stands for the mixed-precision file produced in Step 1
    auto parser = SampleUniquePtr<nvonnxparser::IParser>(
        nvonnxparser::createParser(*network, logger));
    parser->parseFromFile("model_fp16.onnx", static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));

    auto plan = SampleUniquePtr<nvinfer1::IHostMemory>(builder->buildSerializedNetwork(*network, *config));

Summary of Changes#

  • Removed config->setFlag(BuilderFlag::kFP16) and all other precision flags (kINT8, kFP8, kBF16, kINT4, kFP4)

  • Added an offline preprocessing step using ModelOpt AutoCast to produce a mixed-precision ONNX model

  • No code changes needed for the build path itself beyond removing the flag
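
If one code base has to build against both 10.x and 11.x during the transition, the flag removal can be isolated behind a version check. The following is a minimal sketch, assuming the NV_TENSORRT_MAJOR macro from NvInferVersion.h. The fallback definition and the constant name kPrecisionFlagsAvailable are illustrative only and exist so that the snippet compiles without TensorRT installed.

```cpp
// In a real build, include <NvInferVersion.h>, which defines NV_TENSORRT_MAJOR.
// The fallback below is an assumption for illustration only, so that this
// sketch compiles on a machine without the TensorRT headers.
#ifndef NV_TENSORRT_MAJOR
#define NV_TENSORRT_MAJOR 11
#endif

// Hypothetical constant: true only where precision-enabling builder flags exist.
constexpr bool kPrecisionFlagsAvailable = (NV_TENSORRT_MAJOR < 11);

// Usage inside the build function (shown as a comment because it needs TensorRT):
//
//   #if NV_TENSORRT_MAJOR < 11
//       config->setFlag(nvinfer1::BuilderFlag::kFP16);   // 10.x only
//   #endif
//   // On 11.x, parse the AutoCast output instead of the original model.
```

A preprocessor guard is used rather than a runtime branch because on 11.x the removed enumerators no longer compile at all.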

Migrating INT8 Calibration to Explicit Quantization#

TensorRT 11.x removes IInt8Calibrator and all its subclasses, along with setInt8Calibrator(). Use ModelOpt or manual Q/DQ nodes for explicit quantization instead.

Before (TensorRT 10.x)#

    class MyCalibrator : public nvinfer1::IInt8EntropyCalibrator2
    {
    public:
        int32_t getBatchSize() const noexcept override { return 1; }

        bool getBatch(void* bindings[], char const* names[], int32_t nbBindings) noexcept override
        {
            // Fill bindings with calibration data
            if (mCurrentBatch >= mNumBatches)
                return false;
            // ... copy data to GPU
            mCurrentBatch++;
            return true;
        }

        void const* readCalibrationCache(size_t& length) noexcept override { return nullptr; }
        void writeCalibrationCache(void const* ptr, size_t length) noexcept override {}

    private:
        int32_t mCurrentBatch{0};
        int32_t mNumBatches{100};
    };

    // Usage
    auto config = SampleUniquePtr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());
    config->setFlag(BuilderFlag::kINT8);

    MyCalibrator calibrator;
    config->setInt8Calibrator(&calibrator);

    auto plan = SampleUniquePtr<IHostMemory>(builder->buildSerializedNetwork(*network, *config));

After (TensorRT 11.x)#

    // Step 1: Quantize the model offline using ModelOpt:
    //   python -m modelopt.onnx.quantization --onnx_path model.onnx --calibration_data data.npz
    //
    // Alternatively, add QuantizeLinear/DequantizeLinear nodes to the ONNX graph manually,
    // or use the INetworkDefinition::addQuantize() and addDequantize() APIs.

    // Step 2: Build the pre-quantized model
    auto builder = SampleUniquePtr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(logger));
    auto network = SampleUniquePtr<nvinfer1::INetworkDefinition>(builder->createNetworkV2(0));
    auto config = SampleUniquePtr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());

    auto parser = SampleUniquePtr<nvonnxparser::IParser>(
        nvonnxparser::createParser(*network, logger));
    parser->parseFromFile("model_quantized.onnx",
        static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));

    auto plan = SampleUniquePtr<IHostMemory>(builder->buildSerializedNetwork(*network, *config));

Summary of Changes#

  • Removed the IInt8Calibrator subclass entirely

  • Removed config->setFlag(BuilderFlag::kINT8) and config->setInt8Calibrator()

  • Quantization is applied to the model offline (Q/DQ nodes in the ONNX graph) or via the addQuantize() / addDequantize() network definition APIs
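
For orientation, the arithmetic that a Q/DQ pair represents can be sketched on the CPU. This is a minimal illustration, assuming symmetric per-tensor INT8 quantization with a zero point of 0 and round-to-nearest-even, as ONNX QuantizeLinear specifies. The function names quantizeLinear and dequantizeLinear are hypothetical and are not TensorRT or ModelOpt APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Quantize one value to INT8: divide by the scale, round, then saturate.
inline int8_t quantizeLinear(float x, float scale)
{
    assert(scale > 0.0f);
    // std::nearbyint rounds ties to even under the default rounding mode.
    float const rounded = std::nearbyint(x / scale);
    float const clamped = std::max(-128.0f, std::min(127.0f, rounded));
    return static_cast<int8_t>(clamped);
}

// Dequantize: multiply the stored integer by the same scale.
inline float dequantizeLinear(int8_t q, float scale)
{
    return static_cast<float>(q) * scale;
}
```

With a scale of 0.5, the value 1.0 quantizes to 2 and dequantizes back to 1.0, while 1000.0 saturates at 127. The scale is what calibration previously estimated at build time; with explicit quantization it is stored in the model itself.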

Migrating Plugins from IPluginV2DynamicExt to IPluginV3#

The following example shows a complete plugin migration from V2 to V3 using a NonZero plugin that computes the indices of non-zero elements. This demonstrates V3’s support for data-dependent output shapes, which was not possible with V2.

See also

  • Side-by-Side V2 ↔ V3 API Mapping: method-by-method mapping table grouped by lifecycle phase (core, build, runtime, serialization, network attachment).

  • Known Migration Issues: known issues encountered when porting V2 plugins, including the empty PluginField initializer crash and the strongly-typed network requirement.

  • Performance: Resolving V2 → V3 Regressions: checklist for resolving performance regressions after migrating a plugin from IPluginV2DynamicExt to IPluginV3.

Before (TensorRT 10.x - IPluginV2DynamicExt)#

    class NonZeroPluginV2 : public nvinfer1::IPluginV2DynamicExt
    {
    public:
        explicit NonZeroPluginV2(bool rowOrder) : mRowOrder(rowOrder) {}

        // IPluginV2 core methods
        char const* getPluginType() const noexcept override { return "NonZeroPlugin"; }
        char const* getPluginVersion() const noexcept override { return "1"; }
        int32_t getNbOutputs() const noexcept override { return 1; }

        // Output dimensions - limited to expressions of input dimensions only
        DimsExprs getOutputDimensions(int32_t outputIndex, DimsExprs const* inputs,
            int32_t nbInputs, IExprBuilder& exprBuilder) noexcept override
        {
            // Cannot express data-dependent shapes - must use an upper bound
            DimsExprs output;
            output.nbDims = 2;
            output.d[0] = exprBuilder.operation(DimensionOperation::kPROD,
                *inputs[0].d[0], *inputs[0].d[1]); // Upper bound: R * C
            output.d[1] = exprBuilder.constant(2);
            return output;
        }

        bool supportsFormatCombination(int32_t pos, PluginTensorDesc const* inOut,
            int32_t nbInputs, int32_t nbOutputs) noexcept override
        {
            return inOut[pos].format == TensorFormat::kLINEAR;
        }

        void configurePlugin(DynamicPluginTensorDesc const* in, int32_t nbInputs,
            DynamicPluginTensorDesc const* out, int32_t nbOutputs) noexcept override {}

        int32_t enqueue(PluginTensorDesc const* inputDesc, PluginTensorDesc const* outputDesc,
            void const* const* inputs, void* const* outputs,
            void* workspace, cudaStream_t stream) noexcept override
        {
            // Execute kernel
            return 0;
        }

        size_t getWorkspaceSize(PluginTensorDesc const* inputs, int32_t nbInputs,
            PluginTensorDesc const* outputs, int32_t nbOutputs) const noexcept override
        {
            return 0;
        }

        // Serialization
        size_t getSerializationSize() const noexcept override { return sizeof(bool); }
        void serialize(void* buffer) const noexcept override
        {
            *reinterpret_cast<bool*>(buffer) = mRowOrder;
        }

        IPluginV2DynamicExt* clone() const noexcept override
        {
            return new NonZeroPluginV2(mRowOrder);
        }

        // ... other required IPluginV2 methods (destroy, setPluginNamespace, etc.)

    private:
        bool mRowOrder{true};
    };

    // V2 Plugin Creator
    class NonZeroCreatorV2 : public nvinfer1::IPluginCreator
    {
    public:
        char const* getPluginName() const noexcept override { return "NonZeroPlugin"; }
        char const* getPluginVersion() const noexcept override { return "1"; }
        PluginFieldCollection const* getFieldNames() noexcept override { return &mFC; }

        IPluginV2* createPlugin(char const* name, PluginFieldCollection const* fc) noexcept override
        {
            return new NonZeroPluginV2(/*rowOrder=*/true);
        }

        IPluginV2* deserializePlugin(char const* name, void const* data,
            size_t length) noexcept override
        {
            bool rowOrder = *reinterpret_cast<bool const*>(data);
            return new NonZeroPluginV2(rowOrder);
        }

        // ... other required methods

    private:
        PluginFieldCollection mFC{};
    };

    // Usage - addPluginV2 expects an array of input tensor pointers
    NonZeroPluginV2 plugin(/*rowOrder=*/true);
    ITensor* inputs[] = {&inputTensor};
    auto* layer = network->addPluginV2(inputs, 1, plugin);

After (TensorRT 11.x - IPluginV3)#

    class NonZeroPlugin : public IPluginV3, public IPluginV3OneCore,
                      public IPluginV3OneBuild, public IPluginV3OneRuntime
    {
    public:
        explicit NonZeroPlugin(bool rowOrder) : mRowOrder(rowOrder ? 1 : 0) {}

        // IPluginV3 - return the appropriate capability interface
        IPluginCapability* getCapabilityInterface(PluginCapabilityType type) noexcept override
        {
            if (type == PluginCapabilityType::kBUILD)
                return static_cast<IPluginV3OneBuild*>(this);
            if (type == PluginCapabilityType::kRUNTIME)
                return static_cast<IPluginV3OneRuntime*>(this);
            return static_cast<IPluginV3OneCore*>(this);
        }

        // IPluginV3OneCore
        AsciiChar const* getPluginName() const noexcept override { return "NonZeroPlugin"; }
        AsciiChar const* getPluginVersion() const noexcept override { return "1"; }
        AsciiChar const* getPluginNamespace() const noexcept override { return ""; }

        // IPluginV3OneBuild
        int32_t getNbOutputs() const noexcept override { return 2; } // data + size tensor

        int32_t getOutputDataTypes(DataType* outputTypes, int32_t nbOutputs,
            DataType const* inputTypes, int32_t nbInputs) const noexcept override
        {
            outputTypes[0] = DataType::kINT32; // non-zero indices
            outputTypes[1] = DataType::kINT64; // size tensor
            return 0;
        }

        // Output shapes - V3 supports data-dependent shapes via declareSizeTensor
        int32_t getOutputShapes(DimsExprs const* inputs, int32_t nbInputs,
            DimsExprs const* shapeInputs, int32_t nbShapeInputs,
            DimsExprs* outputs, int32_t nbOutputs,
            IExprBuilder& exprBuilder) noexcept override
        {
            auto upperBound = exprBuilder.operation(DimensionOperation::kPROD,
                *inputs[0].d[0], *inputs[0].d[1]);
            auto optValue = exprBuilder.operation(DimensionOperation::kFLOOR_DIV,
                *upperBound, *exprBuilder.constant(2));

            // Declare a size tensor - enables data-dependent output shapes
            auto numNonZero = exprBuilder.declareSizeTensor(1, *optValue, *upperBound);

            outputs[0].nbDims = 2;
            outputs[0].d[0] = numNonZero; // Data-dependent dimension
            outputs[0].d[1] = exprBuilder.constant(2);

            outputs[1].nbDims = 0; // Size tensor is a scalar
            return 0;
        }

        bool supportsFormatCombination(int32_t pos, DynamicPluginTensorDesc const* inOut,
            int32_t nbInputs, int32_t nbOutputs) noexcept override
        {
            return inOut[pos].desc.format == TensorFormat::kLINEAR;
        }

        int32_t configurePlugin(DynamicPluginTensorDesc const* in, int32_t nbInputs,
            DynamicPluginTensorDesc const* out, int32_t nbOutputs) noexcept override
        {
            return 0;
        }

        size_t getWorkspaceSize(DynamicPluginTensorDesc const* inputs, int32_t nbInputs,
            DynamicPluginTensorDesc const* outputs, int32_t nbOutputs) const noexcept override
        {
            return 0;
        }

        // IPluginV3OneRuntime
        int32_t enqueue(PluginTensorDesc const* inputDesc, PluginTensorDesc const* outputDesc,
            void const* const* inputs, void* const* outputs,
            void* workspace, cudaStream_t stream) noexcept override
        {
            // Execute kernel - same as V2
            return 0;
        }

        int32_t onShapeChange(PluginTensorDesc const* in, int32_t nbInputs,
            PluginTensorDesc const* out, int32_t nbOutputs) noexcept override
        {
            return 0;
        }

        // Serialization - uses PluginFieldCollection instead of raw bytes
        PluginFieldCollection const* getFieldsToSerialize() noexcept override
        {
            mDataToSerialize.clear();
            // mRowOrder is stored as int32_t so that it matches the kINT32 field type
            mDataToSerialize.emplace_back("rowOrder", &mRowOrder, PluginFieldType::kINT32, 1);
            mFCToSerialize.nbFields = static_cast<int32_t>(mDataToSerialize.size());
            mFCToSerialize.fields = mDataToSerialize.data();
            return &mFCToSerialize;
        }

        IPluginV3* attachToContext(IPluginResourceContext* context) noexcept override
        {
            return clone();
        }

        IPluginV3* clone() noexcept override
        {
            return new NonZeroPlugin(mRowOrder != 0);
        }

    private:
        int32_t mRowOrder{1};
        std::vector<PluginField> mDataToSerialize;
        PluginFieldCollection mFCToSerialize{};
    };

    // V3 Plugin Creator
    class NonZeroCreator : public nvinfer1::IPluginCreatorV3One
    {
    public:
        NonZeroCreator()
        {
            mPluginAttributes.emplace_back("rowOrder", nullptr, PluginFieldType::kINT32, 1);
            mFC.nbFields = static_cast<int32_t>(mPluginAttributes.size());
            mFC.fields = mPluginAttributes.data();
        }

        char const* getPluginName() const noexcept override { return "NonZeroPlugin"; }
        char const* getPluginVersion() const noexcept override { return "1"; }
        char const* getPluginNamespace() const noexcept override { return ""; }
        PluginFieldCollection const* getFieldNames() noexcept override { return &mFC; }

        // Phase-aware creation - no separate deserializePlugin needed
        IPluginV3* createPlugin(char const* name, PluginFieldCollection const* fc,
            TensorRTPhase phase) noexcept override
        {
            bool rowOrder = true;
            for (int32_t i = 0; i < fc->nbFields; ++i)
            {
                // Read the field as int32_t, the type it was serialized with
                if (std::string_view(fc->fields[i].name) == "rowOrder" && fc->fields[i].data != nullptr)
                    rowOrder = *static_cast<int32_t const*>(fc->fields[i].data) != 0;
            }
            return new NonZeroPlugin(rowOrder);
        }

    private:
        PluginFieldCollection mFC{};
        std::vector<PluginField> mPluginAttributes;
    };

    // Usage - addPluginV3 accepts both data inputs and shape inputs
    NonZeroPlugin plugin(/*rowOrder=*/true);
    ITensor* inputs[] = {&inputTensor};
    auto* layer = network->addPluginV3(inputs, 1, nullptr, 0, plugin);

Summary of Changes#

  • Plugin class inherits from IPluginV3, IPluginV3OneCore, IPluginV3OneBuild, and IPluginV3OneRuntime instead of IPluginV2DynamicExt

  • Added getCapabilityInterface() to return the appropriate interface for each phase (core, build, runtime)

  • getOutputDimensions() replaced by getOutputShapes(), which supports data-dependent output shapes using exprBuilder.declareSizeTensor()

  • Added required getOutputDataTypes() method

  • serialize() / getSerializationSize() replaced by getFieldsToSerialize(), which returns a PluginFieldCollection for structured serialization

  • Added onShapeChange() and attachToContext() methods

  • Creator inherits from IPluginCreatorV3One instead of IPluginCreator; createPlugin() takes a TensorRTPhase parameter, and deserializePlugin() is no longer needed - createPlugin() handles both build and runtime phases

  • addPluginV2(inputs, nbInputs, plugin) replaced by addPluginV3(inputs, nbInputs, shapeInputs, nbShapeInputs, plugin)
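
To make the data-dependent shape concrete, the following CPU sketch computes what a NonZero operation returns for a small matrix, together with the bounds that the V3 example passes to declareSizeTensor(). It is a reference illustration only. The names nonZeroIndices, SizeTensorBounds, and sizeTensorBounds are hypothetical and are not part of the plugin or of TensorRT.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns flattened (row, col) pairs of the non-zero entries of a row-major
// rows x cols matrix, scanned in row-major order. Logical shape: [count, 2].
inline std::vector<int32_t> nonZeroIndices(std::vector<float> const& data, int32_t rows, int32_t cols)
{
    assert(static_cast<int64_t>(data.size()) == static_cast<int64_t>(rows) * cols);
    std::vector<int32_t> out;
    for (int32_t r = 0; r < rows; ++r)
    {
        for (int32_t c = 0; c < cols; ++c)
        {
            if (data[static_cast<std::size_t>(r) * cols + c] != 0.0f)
            {
                out.push_back(r);
                out.push_back(c);
            }
        }
    }
    return out;
}

// Mirrors the expressions in getOutputShapes(): upper = R * C, opt = upper / 2.
struct SizeTensorBounds
{
    int64_t opt;
    int64_t upper;
};

inline SizeTensorBounds sizeTensorBounds(int64_t rows, int64_t cols)
{
    int64_t const upper = rows * cols;
    return SizeTensorBounds{upper / 2, upper};
}
```

For the 2 x 3 input {0, 1, 0, 2, 0, 3} there are three non-zero entries, so the first output dimension is 3, within the declared upper bound of 6. The V2 plugin had to report the upper bound as its output dimension regardless of the data.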

Known Issues When Migrating Plugins#

  • Empty PluginField initializers can crash V3 dispatch. When a plugin advertises a PluginField with a nullptr data pointer and length == 0, the V3 creator dispatch path can dereference the pointer during build or deserialization. Populate every entry with a non-null sentinel buffer, even when the value is unused at runtime:

    // Bad — empty initializer
    mPluginAttributes.emplace_back("flag", nullptr, PluginFieldType::kINT32, 0);
    
    // Good — non-null sentinel keeps the dispatch path safe
    static int32_t kDummy = 0;
    mPluginAttributes.emplace_back("flag", &kDummy, PluginFieldType::kINT32, 1);
    
  • Use strongly-typed networks with IPluginV3. Mixing IPluginV3 plugins with weakly-typed networks can hit fusion paths that were not exercised by IPluginV2DynamicExt and trigger crashes. TensorRT 11.0.0 removes all precision-enabling builder flags (BuilderFlag::kFP16, kINT8, kBF16, kFP8, kINT4, kFP4), so every network you build is strongly typed by default and fresh 11.x builds need no action. If you back-port V3 plugins to a 10.x build for evaluation, opt in explicitly with createNetworkV2(1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kSTRONGLY_TYPED)).
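
The first issue above can be caught before a plugin is registered by scanning its field list. The sketch below assumes a stand-in struct, FieldSketch, that mirrors only the three PluginField members it needs. Both FieldSketch and findEmptyFields are hypothetical names used for illustration, so the snippet compiles without the TensorRT headers.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Stand-in for nvinfer1::PluginField (assumption: only the members used here).
struct FieldSketch
{
    char const* name;
    void const* data;
    int32_t length;
};

// Returns the names of entries that match the pattern described above:
// a null data pointer combined with a length of zero.
inline std::vector<std::string> findEmptyFields(std::vector<FieldSketch> const& fields)
{
    std::vector<std::string> bad;
    for (auto const& f : fields)
    {
        if (f.data == nullptr && f.length == 0)
        {
            bad.emplace_back(f.name != nullptr ? f.name : "<unnamed>");
        }
    }
    return bad;
}
```

In a real plugin the same loop would run over the PluginFieldCollection returned by getFieldNames(), for example in a debug-only assertion inside the creator's constructor.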

Migrating Weight Streaming APIs#

The weight streaming API has been updated in TensorRT 11.x. The getMinimumWeightStreamingBudget() method has been removed; compute a budget from getStreamableWeightsSize() and available device memory instead.

Before (TensorRT 10.x)#

    auto engine = SampleUniquePtr<ICudaEngine>(
        runtime->deserializeCudaEngine(engineData, engineSize));

    // Old API
    int64_t minBudget = engine->getMinimumWeightStreamingBudget();
    engine->setWeightStreamingBudget(minBudget);
    int64_t currentBudget = engine->getWeightStreamingBudget();

After (TensorRT 11.x)#

    auto engine = SampleUniquePtr<ICudaEngine>(
        runtime->deserializeCudaEngine(engineData, engineSize));

    // V2 API
    size_t freeMem{0};
    size_t totalMem{0};
    cudaMemGetInfo(&freeMem, &totalMem);
    int64_t weightsSize = engine->getStreamableWeightsSize();
    int64_t budget = std::min(static_cast<int64_t>(freeMem / 2), weightsSize / 2);
    engine->setWeightStreamingBudgetV2(budget);
    int64_t currentBudget = engine->getWeightStreamingBudgetV2();

Summary of Changes#

  • setWeightStreamingBudget() replaced by setWeightStreamingBudgetV2()

  • getWeightStreamingBudget() replaced by getWeightStreamingBudgetV2()

  • getMinimumWeightStreamingBudget() removed - compute a budget using getStreamableWeightsSize() and available device memory
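
The budget heuristic from the 11.x example can be factored into a small pure function so that it can be unit-tested without a GPU. This is only a sketch of that one heuristic (half of the streamable weights, capped at half of the free device memory). The helper name chooseWeightStreamingBudget is hypothetical, and a production policy may well choose different fractions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Keep at most half of the streamable weights resident on the GPU, and never
// claim more than half of the currently free device memory.
inline int64_t chooseWeightStreamingBudget(int64_t streamableWeightsSize, std::size_t freeDeviceMem)
{
    assert(streamableWeightsSize >= 0);
    int64_t const memoryCap = static_cast<int64_t>(freeDeviceMem / 2);
    return std::min(memoryCap, streamableWeightsSize / 2);
}
```

The inputs would come from getStreamableWeightsSize() and cudaMemGetInfo(), and the result would be passed to setWeightStreamingBudgetV2().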

Migrating Memory Management APIs#

TensorRT 11.x replaces getDeviceMemorySize() with getDeviceMemorySizeV2() (which returns int64_t), removes createExecutionContextWithoutDeviceMemory(), and replaces setDeviceMemory(void*) with setDeviceMemoryV2(void*, int64_t).

Before (TensorRT 10.x)#

    auto engine = SampleUniquePtr<ICudaEngine>(
        runtime->deserializeCudaEngine(engineData, engineSize));

    // Old APIs
    size_t memSize = engine->getDeviceMemorySize();
    auto context = SampleUniquePtr<IExecutionContext>(
        engine->createExecutionContextWithoutDeviceMemory());

    void* deviceMem;
    cudaMalloc(&deviceMem, memSize);
    context->setDeviceMemory(deviceMem);

After (TensorRT 11.x)#

    auto engine = SampleUniquePtr<ICudaEngine>(
        runtime->deserializeCudaEngine(engineData, engineSize));

    // V2 APIs - int64_t sizes, explicit size parameter
    int64_t memSize = engine->getDeviceMemorySizeV2();

    // kUSER_MANAGED tells TensorRT that the application supplies the device memory
    auto context = SampleUniquePtr<IExecutionContext>(
        engine->createExecutionContext(ExecutionContextAllocationStrategy::kUSER_MANAGED));

    void* deviceMem{nullptr};
    cudaMalloc(&deviceMem, static_cast<size_t>(memSize));
    context->setDeviceMemoryV2(deviceMem, memSize);

Summary of Changes#

  • getDeviceMemorySize() replaced by getDeviceMemorySizeV2() (returns int64_t instead of size_t)

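  • createExecutionContextWithoutDeviceMemory() removed - use createExecutionContext(), passing ExecutionContextAllocationStrategy::kUSER_MANAGED when you supply the device memory yourself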

  • setDeviceMemory(void*) replaced by setDeviceMemoryV2(void*, int64_t), which takes an explicit size parameter
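
Because the V2 size APIs return a signed int64_t while cudaMalloc() takes a size_t, it can be worth validating the value once at the boundary. The helper below is a hypothetical sketch of such a check and is not a TensorRT API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Convert a signed 64-bit byte count into size_t for allocation calls,
// rejecting negative values and values that do not fit in size_t.
inline std::size_t toAllocationSize(int64_t bytes)
{
    if (bytes < 0)
    {
        throw std::invalid_argument("negative device memory size");
    }
    if (static_cast<uint64_t>(bytes) > std::numeric_limits<std::size_t>::max())
    {
        throw std::overflow_error("device memory size does not fit in size_t");
    }
    std::size_t const result = static_cast<std::size_t>(bytes);
    assert(static_cast<int64_t>(result) == bytes); // round trip must be lossless
    return result;
}
```

A call such as cudaMalloc(&deviceMem, toAllocationSize(memSize)) then makes the conversion explicit instead of relying on an implicit signed-to-unsigned cast.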

Removed C++ APIs and Replacements#

Warning

The APIs listed below have been removed in TensorRT 11.x and will cause compile-time errors if used. Review each entry for its replacement before upgrading.

| Removed API | Replacement |
|---|---|
| BuilderFlag::kFP16 | Strong typing with ModelOpt AutoCast |
| BuilderFlag::kINT8 | Explicit quantization with Q/DQ nodes |
| BuilderFlag::kFP8 | Explicit quantization with Q/DQ nodes |
| BuilderFlag::kBF16 | Strong typing with ModelOpt AutoCast |
| BuilderFlag::kINT4 | Explicit quantization with Q/DQ nodes |
| BuilderFlag::kFP4 | Explicit quantization with Q/DQ nodes |
| BuilderFlag::kOBEY_PRECISION_CONSTRAINTS | Strong typing (always enforced) |
| BuilderFlag::kPREFER_PRECISION_CONSTRAINTS | Strong typing (always enforced) |
| BuilderFlag::kDIRECT_IO | Removed (not needed in 11.x) |
| IAlgorithm, IAlgorithmContext, IAlgorithmIOInfo, IAlgorithmSelector, IAlgorithmVariant | Use editable mode in ITimingCache instead. |
| IBuilderConfig::setInt8Calibrator(IInt8Calibrator*) | Explicit quantization with Q/DQ nodes |
| IBuilderConfig::getInt8Calibrator() | Removed |
| IBuilderConfig::setCalibrationProfile(IOptimizationProfile const*) | Removed |
| IBuilderConfig::getCalibrationProfile() | Removed |
| IBuilderConfig::setQuantizationFlags(QuantizationFlags) | Removed |
| IBuilderConfig::getQuantizationFlags() | Removed |
| IBuilderConfig::clearQuantizationFlag(QuantizationFlag) | Removed |
| IBuilderConfig::setQuantizationFlag(QuantizationFlag) | Removed |
| IBuilderConfig::getQuantizationFlag(QuantizationFlag) | Removed |
| ICudaEngine::createExecutionContextWithoutDeviceMemory() | ICudaEngine::createExecutionContext() |
| ICudaEngine::getDeviceMemorySize() | ICudaEngine::getDeviceMemorySizeV2() |
| ICudaEngine::getDeviceMemorySizeForProfile(int32_t) | ICudaEngine::getDeviceMemorySizeForProfileV2(int32_t) |
| ICudaEngine::getMinimumWeightStreamingBudget() | Compute from getStreamableWeightsSize() |
| ICudaEngine::getProfileTensorValues(char const*, int32_t, OptProfileSelector) | ICudaEngine::getProfileTensorValuesV2() |
| ICudaEngine::getWeightStreamingBudget() | ICudaEngine::getWeightStreamingBudgetV2() |
| ICudaEngine::hasImplicitBatchDimension() | Removed (always false) |
| ICudaEngine::setWeightStreamingBudget(int64_t) | ICudaEngine::setWeightStreamingBudgetV2(int64_t) |
| IExecutionContext::allInputShapesSpecified() | Removed (always true) |
| IExecutionContext::setDeviceMemory(void*) | IExecutionContext::setDeviceMemoryV2(void*, int64_t) |
| IGpuAllocator::allocate(uint64_t, uint64_t, AllocatorFlags) | IGpuAllocator::allocateAsync(uint64_t, uint64_t, AllocatorFlags, cudaStream_t) |
| IGpuAllocator::deallocate(void*) | IGpuAllocator::deallocateAsync(void*, cudaStream_t) |
| IInt8Calibrator (all subclasses) | Explicit quantization with Q/DQ nodes |
| ILayer::setPrecision(DataType) | Strong typing (set types on tensors directly) |
| ILayer::getPrecision() | Removed |
| ILayer::precisionIsSet() | Removed |
| ILayer::resetPrecision() | Removed |
| ILayer::setOutputType(int32_t, DataType) | Strong typing (set types on tensors directly) |
| ILayer::outputTypeIsSet(int32_t) | Removed |
| ILayer::resetOutputType(int32_t) | Removed |
| INetworkDefinition::addAttention(..., bool) | INetworkDefinition::addAttentionV2(..., CausalMaskKind) |
| INetworkDefinition::addNMS(ITensor&, ITensor&, ITensor&) | INetworkDefinition::addNMS(..., DataType) (4-arg version) |
| INetworkDefinition::addNonZero(ITensor&) | INetworkDefinition::addNonZero(ITensor&, DataType) |
| INetworkDefinition::addNormalization(...) | INetworkDefinition::addNormalizationV2(...) |
| INetworkDefinition::addPluginV2(ITensor* const*, int32_t, IPluginV2&) | INetworkDefinition::addPluginV3(...) |
| INetworkDefinition::addTopK(ITensor&, TopKOperation, int32_t, uint32_t) | INetworkDefinition::addTopK(..., DataType) (5-arg version) |
| INormalizationLayer::setComputePrecision(DataType) | Removed (use strong typing) |
| IOutputAllocator::reallocateOutput(char const*, void*, uint64_t, uint64_t) | IOutputAllocator::reallocateOutputAsync(..., cudaStream_t) |
| IPluginCreator | IPluginCreatorV3One |
| IPluginRegistry::deregisterCreator(IPluginCreator const&) | IPluginRegistry::deregisterCreator(IPluginCreatorInterface const&) |
| IPluginRegistry::getPluginCreator(...) | IPluginRegistry::getCreator(...) |
| IPluginRegistry::getPluginCreatorList(int32_t*) | IPluginRegistry::getAllCreators(int32_t*) |
| IPluginRegistry::registerCreator(IPluginCreator&, ...) | IPluginRegistry::registerCreator(IPluginCreatorInterface&, ...) |
| IPluginV2DynamicExt | IPluginV3 |
| IPluginV2Ext | IPluginV3 |
| IPluginV2IOExt | IPluginV3 |
| IPluginV2Layer | IPluginV3Layer |
| IRefitter::setDynamicRange(char const*, float, float) | Explicit quantization with Q/DQ nodes |
| IRefitter::getDynamicRangeMin(char const*) | Removed |
| IRefitter::getDynamicRangeMax(char const*) | Removed |
| IRefitter::getTensorsWithDynamicRange() | Removed |
| IRuntime::deserializeCudaEngine(IStreamReader&) | IRuntime::deserializeCudaEngine(IStreamReaderV2&) |
| ITensor::setType(DataType) | Strong typing (type determined by network construction) |
| ITensor::setDynamicRange(float, float) | Explicit quantization with Q/DQ nodes |
| ITensor::dynamicRangeIsSet() | Removed |
| ITensor::resetDynamicRange() | Removed |
| ITensor::getDynamicRangeMin() | Removed |
| ITensor::getDynamicRangeMax() | Removed |
| ITensor::setBroadcastAcrossBatch(bool) | Removed (implicit batch not supported) |
| ITensor::getBroadcastAcrossBatch() | Removed (implicit batch not supported) |
| TacticSource::kCUBLAS | Removed |
| TacticSource::kCUBLAS_LT | Removed |
| TacticSource::kCUDNN | Removed |
| DetectionOutputParameters | Removed |
| NMSParameters | Removed |
| CodeTypeSSD | Removed |

Removed C++ Plugins and Replacements#

Warning

The plugins listed below have been removed in TensorRT 11.x. Using them will cause compilation or linker errors. Review each entry for its replacement before upgrading.

| Removed plugin | Replacement |
|---|---|
| BatchedNMS_TRT | Use INetworkDefinition::addNMS() |
| BatchedNMSDynamic_TRT | Use INetworkDefinition::addNMS() |
| BatchTilePlugin_TRT | Implement with standard TensorRT layers |
| Clip_TRT | Use INetworkDefinition::addActivation() with kCLIP |
| CoordConvAC | Implement with standard TensorRT layers (concatenate coordinate channels with IConcatenationLayer, then apply convolution) |
| CustomGeluPluginDynamic | Use INetworkDefinition::addActivation() with kGELU_ERF or kGELU_TANH |
| EfficientNMS_ONNX_TRT | Use INetworkDefinition::addNMS() |
| LReLU_TRT | Use INetworkDefinition::addActivation() with kLEAKY_RELU |
| NMS_TRT | Use INetworkDefinition::addNMS() |
| NMSDynamic_TRT | Use INetworkDefinition::addNMS() |
| Normalize_TRT | Use INetworkDefinition::addNormalizationV2() |
| Proposal | Implement with standard TensorRT layers |
| SingleStepLSTMPlugin | Use INetworkDefinition::addLoop() or standard RNN decomposition |
| SpecialSlice_TRT | Use INetworkDefinition::addSlice() |
| Split | Use INetworkDefinition::addSlice() |
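
When replacing CustomGeluPluginDynamic, the choice between kGELU_ERF and kGELU_TANH depends on which formula the original model expects. The CPU sketch below evaluates the two standard GELU formulas so that the outputs can be compared against reference values from the existing plugin. The function names geluErf and geluTanh are hypothetical, and the snippet does not call TensorRT.

```cpp
#include <cassert>
#include <cmath>

// Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2)))
inline float geluErf(float x)
{
    return 0.5f * x * (1.0f + std::erf(x / std::sqrt(2.0f)));
}

// Tanh approximation: 0.5 * x * (1 + tanh(sqrt(2 / pi) * (x + 0.044715 * x^3)))
inline float geluTanh(float x)
{
    float const k = std::sqrt(2.0f / 3.14159265358979f);
    return 0.5f * x * (1.0f + std::tanh(k * (x + 0.044715f * x * x * x)));
}
```

The two variants agree closely for typical activations, but they are not bit-identical, so it is safer to validate accuracy after switching.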

Deprecated BERT Plugins#

The following OSS BERT plugin classes are deprecated in 11.0.0 and scheduled for removal in a future release. Migrate to the listed replacements before upgrading beyond 11.x.

| Deprecated plugin | Replacement |
|---|---|
| bertQKVToContextPlugin / CustomQKVToContextPluginDynamic | Refer to Migrate to IAttention for more information. |