Migrating C++ Code from TensorRT 10.x to 11.x#

This page describes how to update C++ code when you migrate from TensorRT 10.x to 11.x: paired examples for strongly typed networks, explicit quantization, plugin migration, and updated runtime APIs, followed by lists of C++ APIs added and removed in 11.x.

Migrating from Weak Typing to Strong Typing#

TensorRT 11.x removes all precision-enabling builder flags such as BuilderFlag::kFP16 and BuilderFlag::kINT8. Use ModelOpt AutoCast to convert your ONNX model to mixed precision before building.

Before (TensorRT 10.x)#

 auto builder = SampleUniquePtr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(logger));
 auto network = SampleUniquePtr<nvinfer1::INetworkDefinition>(builder->createNetworkV2(0));
 auto config = SampleUniquePtr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());

 // Weak typing: TensorRT automatically considers FP16 kernels
 config->setFlag(BuilderFlag::kFP16);

 auto parser = SampleUniquePtr<nvonnxparser::IParser>(
     nvonnxparser::createParser(*network, logger));
 parser->parseFromFile("model.onnx", static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));

 auto plan = SampleUniquePtr<IHostMemory>(builder->buildSerializedNetwork(*network, *config));

In TensorRT 11.x, BuilderFlag::kFP16 and all other precision-enabling builder flags have been removed. Use ModelOpt AutoCast to convert the ONNX model to mixed precision before building.

After (TensorRT 11.x)#

 // Step 1: Convert model to mixed precision offline using ModelOpt:
 //   python -m modelopt.onnx.autocast --onnx_path model.onnx

 // Step 2: Build with strongly typed network (always on in 11.x)
 auto builder = SampleUniquePtr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(logger));
 auto network = SampleUniquePtr<nvinfer1::INetworkDefinition>(builder->createNetworkV2(0));
 auto config = SampleUniquePtr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());

 // No precision flags needed - the model itself specifies types

 auto parser = SampleUniquePtr<nvonnxparser::IParser>(
     nvonnxparser::createParser(*network, logger));
 parser->parseFromFile("model_fp16.onnx", static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));

 auto plan = SampleUniquePtr<IHostMemory>(builder->buildSerializedNetwork(*network, *config));

Summary of Changes#

Removed config->setFlag(BuilderFlag::kFP16) and all other precision flags (kINT8, kFP8, kBF16, kINT4, kFP4)
Added an offline preprocessing step using ModelOpt AutoCast to produce a mixed-precision ONNX model
No code changes needed for the build path itself beyond removing the flag
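If one code base has to compile against both 10.x and 11.x during the transition, the removed flag can be isolated behind a version check. The sketch below assumes the `NV_TENSORRT_MAJOR` macro from `NvInferVersion.h`; the fallback definition, `kPrecisionFlagsAvailable`, and `selectModelPath` are illustrative names that do not appear in TensorRT.

```cpp
#include <cassert>
#include <string_view>

#if defined(__has_include)
#  if __has_include(<NvInferVersion.h>)
#    include <NvInferVersion.h> // defines NV_TENSORRT_MAJOR when TensorRT is installed
#  endif
#endif
#ifndef NV_TENSORRT_MAJOR
#  define NV_TENSORRT_MAJOR 11 // assumption: fallback so this sketch compiles without TensorRT
#endif

// True only when building against a release that still has BuilderFlag::kFP16.
constexpr bool kPrecisionFlagsAvailable = (NV_TENSORRT_MAJOR < 11);

// Hypothetical helper: pick the original model for 10.x (precision chosen by
// builder flag) or the AutoCast output for 11.x (precision stored in the model).
inline std::string_view selectModelPath(std::string_view original, std::string_view autocast)
{
    assert(!original.empty() && !autocast.empty());
    return kPrecisionFlagsAvailable ? original : autocast;
}
```

With this in place, the only call that needs a guard is `config->setFlag(BuilderFlag::kFP16)`, which can be wrapped in `#if NV_TENSORRT_MAJOR < 11`.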

Migrating INT8 Calibration to Explicit Quantization#

TensorRT 11.x removes IInt8Calibrator and all its subclasses, along with setInt8Calibrator(). Use ModelOpt or manual Q/DQ nodes for explicit quantization instead.

Before (TensorRT 10.x)#

 class MyCalibrator : public nvinfer1::IInt8EntropyCalibrator2
 {
 public:
     int32_t getBatchSize() const noexcept override { return 1; }

     bool getBatch(void* bindings[], char const* names[], int32_t nbBindings) noexcept override
     {
         // Fill bindings with calibration data
         if (mCurrentBatch >= mNumBatches)
             return false;
         // ... copy data to GPU
         mCurrentBatch++;
         return true;
     }

     void const* readCalibrationCache(size_t& length) noexcept override { return nullptr; }
     void writeCalibrationCache(void const* ptr, size_t length) noexcept override {}

 private:
     int32_t mCurrentBatch{0};
     int32_t mNumBatches{100};
 };

 // Usage
 auto config = SampleUniquePtr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());
 config->setFlag(BuilderFlag::kINT8);

 MyCalibrator calibrator;
 config->setInt8Calibrator(&calibrator);

 auto plan = SampleUniquePtr<IHostMemory>(builder->buildSerializedNetwork(*network, *config));

In TensorRT 11.x, IInt8Calibrator and all subclasses have been removed along with setInt8Calibrator(). Use ModelOpt or manual Q/DQ nodes.

After (TensorRT 11.x)#

 // Step 1: Quantize the model offline using ModelOpt:
 //   python -m modelopt.onnx.quantization --onnx_path model.onnx --calibration_data data.npz
 //
 // Alternatively, add QuantizeLinear/DequantizeLinear nodes to the ONNX graph manually,
 // or use the INetworkDefinition::addQuantize() and addDequantize() APIs.

 // Step 2: Build the pre-quantized model
 auto builder = SampleUniquePtr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(logger));
 auto network = SampleUniquePtr<nvinfer1::INetworkDefinition>(builder->createNetworkV2(0));
 auto config = SampleUniquePtr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());

 auto parser = SampleUniquePtr<nvonnxparser::IParser>(
     nvonnxparser::createParser(*network, logger));
 parser->parseFromFile("model_quantized.onnx",
     static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));

 auto plan = SampleUniquePtr<IHostMemory>(builder->buildSerializedNetwork(*network, *config));

Summary of Changes#

Removed the IInt8Calibrator subclass entirely
Removed config->setFlag(BuilderFlag::kINT8) and config->setInt8Calibrator()
Quantization is applied to the model offline (Q/DQ nodes in the ONNX graph) or via the addQuantize() / addDequantize() network definition APIs
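To make concrete what a Q/DQ pair encodes, here is a minimal sketch of the arithmetic behind an INT8 QuantizeLinear/DequantizeLinear pair. It assumes symmetric per-tensor quantization with a zero point of 0, and the function names are illustrative. It is not TensorRT code and does not reflect how TensorRT implements these layers.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Quantize one value to INT8: divide by the scale, round to nearest (ties go to
// even under the default floating-point rounding mode), saturate to [-128, 127].
inline std::int8_t quantizeLinear(float x, float scale)
{
    assert(scale > 0.0f);
    float const rounded = std::nearbyint(x / scale);
    return static_cast<std::int8_t>(std::clamp(rounded, -128.0f, 127.0f));
}

// Map the INT8 value back to floating point using the same scale.
inline float dequantizeLinear(std::int8_t q, float scale)
{
    return static_cast<float>(q) * scale;
}
```

With explicit quantization the scale is stored in the model next to the Q/DQ nodes, so no calibrator has to run at build time.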

Migrating Plugins from `IPluginV2DynamicExt` to `IPluginV3`#

The following example shows a complete plugin migration from V2 to V3 using a NonZero plugin that computes the indices of non-zero elements. It demonstrates V3’s support for data-dependent output shapes, which V2 could not express.

Before (TensorRT 10.x - `IPluginV2DynamicExt`)#

 class NonZeroPluginV2 : public nvinfer1::IPluginV2DynamicExt
 {
 public:
     explicit NonZeroPluginV2(bool rowOrder) : mRowOrder(rowOrder) {}

     // IPluginV2 core methods
     char const* getPluginType() const noexcept override { return "NonZeroPlugin"; }
     char const* getPluginVersion() const noexcept override { return "1"; }
     int32_t getNbOutputs() const noexcept override { return 1; }

     // Output dimensions - limited to expressions of input dimensions only
     DimsExprs getOutputDimensions(int32_t outputIndex, DimsExprs const* inputs,
         int32_t nbInputs, IExprBuilder& exprBuilder) noexcept override
     {
         // Cannot express data-dependent shapes - must use an upper bound
         DimsExprs output;
         output.nbDims = 2;
         output.d[0] = exprBuilder.operation(DimensionOperation::kPROD,
             *inputs[0].d[0], *inputs[0].d[1]); // Upper bound: R * C
         output.d[1] = exprBuilder.constant(2);
         return output;
     }

     bool supportsFormatCombination(int32_t pos, PluginTensorDesc const* inOut,
         int32_t nbInputs, int32_t nbOutputs) noexcept override
     {
         return inOut[pos].format == TensorFormat::kLINEAR;
     }

     void configurePlugin(DynamicPluginTensorDesc const* in, int32_t nbInputs,
         DynamicPluginTensorDesc const* out, int32_t nbOutputs) noexcept override {}

     int32_t enqueue(PluginTensorDesc const* inputDesc, PluginTensorDesc const* outputDesc,
         void const* const* inputs, void* const* outputs,
         void* workspace, cudaStream_t stream) noexcept override
     {
         // Execute kernel
         return 0;
     }

     size_t getWorkspaceSize(PluginTensorDesc const* inputs, int32_t nbInputs,
         PluginTensorDesc const* outputs, int32_t nbOutputs) const noexcept override
     {
         return 0;
     }

     // Serialization
     size_t getSerializationSize() const noexcept override { return sizeof(bool); }
     void serialize(void* buffer) const noexcept override
     {
         *reinterpret_cast<bool*>(buffer) = mRowOrder;
     }

     IPluginV2DynamicExt* clone() const noexcept override
     {
         return new NonZeroPluginV2(mRowOrder);
     }

     // ... other required IPluginV2 methods (destroy, setPluginNamespace, etc.)

 private:
     bool mRowOrder{true};
 };

 // V2 Plugin Creator
 class NonZeroCreatorV2 : public nvinfer1::IPluginCreator
 {
 public:
     char const* getPluginName() const noexcept override { return "NonZeroPlugin"; }
     char const* getPluginVersion() const noexcept override { return "1"; }
     PluginFieldCollection const* getFieldNames() noexcept override { return &mFC; }

     IPluginV2* createPlugin(char const* name, PluginFieldCollection const* fc) noexcept override
     {
         return new NonZeroPluginV2(/*rowOrder=*/true);
     }

     IPluginV2* deserializePlugin(char const* name, void const* data,
         size_t length) noexcept override
     {
         bool rowOrder = *reinterpret_cast<bool const*>(data);
         return new NonZeroPluginV2(rowOrder);
     }

     // ... other required methods

 private:
     PluginFieldCollection mFC{};
 };

 // Usage
 NonZeroPluginV2 plugin(/*rowOrder=*/true);
 auto* layer = network->addPluginV2(&inputTensor, 1, plugin);

After (TensorRT 11.x - `IPluginV3`)#

 class NonZeroPlugin : public IPluginV3, public IPluginV3OneCore,
                   public IPluginV3OneBuild, public IPluginV3OneRuntime
 {
 public:
     NonZeroPlugin(bool rowOrder) : mRowOrder(rowOrder) {}

     // IPluginV3 - return the appropriate capability interface
     IPluginCapability* getCapabilityInterface(PluginCapabilityType type) noexcept override
     {
         if (type == PluginCapabilityType::kBUILD)
             return static_cast<IPluginV3OneBuild*>(this);
         if (type == PluginCapabilityType::kRUNTIME)
             return static_cast<IPluginV3OneRuntime*>(this);
         return static_cast<IPluginV3OneCore*>(this);
     }

     // IPluginV3OneCore
     AsciiChar const* getPluginName() const noexcept override { return "NonZeroPlugin"; }
     AsciiChar const* getPluginVersion() const noexcept override { return "1"; }
     AsciiChar const* getPluginNamespace() const noexcept override { return ""; }

     // IPluginV3OneBuild
     int32_t getNbOutputs() const noexcept override { return 2; } // data + size tensor

     int32_t getOutputDataTypes(DataType* outputTypes, int32_t nbOutputs,
         DataType const* inputTypes, int32_t nbInputs) const noexcept override
     {
         outputTypes[0] = DataType::kINT32; // non-zero indices
         outputTypes[1] = DataType::kINT64; // size tensor
         return 0;
     }

     // Output shapes - V3 supports data-dependent shapes via declareSizeTensor
     int32_t getOutputShapes(DimsExprs const* inputs, int32_t nbInputs,
         DimsExprs const* shapeInputs, int32_t nbShapeInputs,
         DimsExprs* outputs, int32_t nbOutputs,
         IExprBuilder& exprBuilder) noexcept override
     {
         auto upperBound = exprBuilder.operation(DimensionOperation::kPROD,
             *inputs[0].d[0], *inputs[0].d[1]);
         auto optValue = exprBuilder.operation(DimensionOperation::kFLOOR_DIV,
             *upperBound, *exprBuilder.constant(2));

         // Declare a size tensor - enables data-dependent output shapes
         auto numNonZero = exprBuilder.declareSizeTensor(1, *optValue, *upperBound);

         outputs[0].nbDims = 2;
         outputs[0].d[0] = numNonZero; // Data-dependent dimension
         outputs[0].d[1] = exprBuilder.constant(2);

         outputs[1].nbDims = 0; // Size tensor is a scalar
         return 0;
     }

     bool supportsFormatCombination(int32_t pos, DynamicPluginTensorDesc const* inOut,
         int32_t nbInputs, int32_t nbOutputs) noexcept override
     {
         return inOut[pos].desc.format == TensorFormat::kLINEAR;
     }

     int32_t configurePlugin(DynamicPluginTensorDesc const* in, int32_t nbInputs,
         DynamicPluginTensorDesc const* out, int32_t nbOutputs) noexcept override
     {
         return 0;
     }

     size_t getWorkspaceSize(DynamicPluginTensorDesc const* inputs, int32_t nbInputs,
         DynamicPluginTensorDesc const* outputs, int32_t nbOutputs) const noexcept override
     {
         return 0;
     }

     // IPluginV3OneRuntime
     int32_t enqueue(PluginTensorDesc const* inputDesc, PluginTensorDesc const* outputDesc,
         void const* const* inputs, void* const* outputs,
         void* workspace, cudaStream_t stream) noexcept override
     {
         // Execute kernel - same as V2
         return 0;
     }

     int32_t onShapeChange(PluginTensorDesc const* in, int32_t nbInputs,
         PluginTensorDesc const* out, int32_t nbOutputs) noexcept override
     {
         return 0;
     }

     // Serialization - uses PluginFieldCollection instead of raw bytes
     PluginFieldCollection const* getFieldsToSerialize() noexcept override
     {
         mDataToSerialize.clear();
         mDataToSerialize.emplace_back("rowOrder", &mRowOrder, PluginFieldType::kINT32, 1);
         mFCToSerialize.nbFields = mDataToSerialize.size();
         mFCToSerialize.fields = mDataToSerialize.data();
         return &mFCToSerialize;
     }

     IPluginV3* attachToContext(IPluginResourceContext* context) noexcept override
     {
         return clone();
     }

     IPluginV3* clone() noexcept override
     {
         return new NonZeroPlugin(mRowOrder);
     }

 private:
     int32_t mRowOrder{1}; // 32-bit storage so it matches the kINT32 field serialized above
     std::vector<PluginField> mDataToSerialize;
     PluginFieldCollection mFCToSerialize{};
 };

 // V3 Plugin Creator
 class NonZeroCreator : public nvinfer1::IPluginCreatorV3One
 {
 public:
     NonZeroCreator()
     {
         mPluginAttributes.emplace_back("rowOrder", nullptr, PluginFieldType::kINT32, 1);
         mFC.nbFields = mPluginAttributes.size();
         mFC.fields = mPluginAttributes.data();
     }

     char const* getPluginName() const noexcept override { return "NonZeroPlugin"; }
     char const* getPluginVersion() const noexcept override { return "1"; }
     char const* getPluginNamespace() const noexcept override { return ""; }
     PluginFieldCollection const* getFieldNames() noexcept override { return &mFC; }

     // Phase-aware creation - no separate deserializePlugin needed
     IPluginV3* createPlugin(char const* name, PluginFieldCollection const* fc,
         TensorRTPhase phase) noexcept override
     {
         bool rowOrder = true;
         for (int32_t i = 0; i < fc->nbFields; ++i)
         {
             if (std::string_view(fc->fields[i].name) == "rowOrder")
                 rowOrder = *static_cast<bool const*>(fc->fields[i].data);
         }
         return new NonZeroPlugin(rowOrder);
     }

 private:
     PluginFieldCollection mFC{};
     std::vector<PluginField> mPluginAttributes;
 };

 // Usage - addPluginV3 accepts both data inputs and shape inputs
 NonZeroPlugin plugin(/*rowOrder=*/true);
 ITensor* inputs[] = {&inputTensor};
 auto* layer = network->addPluginV3(inputs, 1, nullptr, 0, plugin);

Summary of Changes#

Plugin class inherits from IPluginV3, IPluginV3OneCore, IPluginV3OneBuild, and IPluginV3OneRuntime instead of IPluginV2DynamicExt
Added getCapabilityInterface() to return the appropriate interface for each phase (core, build, runtime)
getOutputDimensions() replaced by getOutputShapes(), which supports data-dependent output shapes using exprBuilder.declareSizeTensor()
Added required getOutputDataTypes() method
serialize() / getSerializationSize() replaced by getFieldsToSerialize(), which returns a PluginFieldCollection for structured serialization
Added onShapeChange() and attachToContext() methods
Creator inherits from IPluginCreatorV3One instead of IPluginCreator; createPlugin() takes a TensorRTPhase parameter, and deserializePlugin() is no longer needed - createPlugin() handles both build and runtime phases
addPluginV2(inputs, nbInputs, plugin) replaced by addPluginV3(inputs, nbInputs, shapeInputs, nbShapeInputs, plugin)
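The explicit `static_cast` in `getCapabilityInterface()` is required, not a style choice. The sketch below uses mock types whose names mirror the TensorRT interfaces but are not the real definitions, and it omits `IPluginV3` itself. It assumes, as in the TensorRT headers, that each capability interface derives separately from `IPluginCapability`.

```cpp
#include <cassert>

// Mock stand-ins for illustration only; NOT the TensorRT definitions.
enum class PluginCapabilityType { kCORE, kBUILD, kRUNTIME };
struct IPluginCapability { virtual ~IPluginCapability() = default; };
struct IPluginV3OneCore : IPluginCapability {};
struct IPluginV3OneBuild : IPluginCapability {};
struct IPluginV3OneRuntime : IPluginCapability {};

struct MockPlugin : IPluginV3OneCore, IPluginV3OneBuild, IPluginV3OneRuntime
{
    IPluginCapability* getCapabilityInterface(PluginCapabilityType type) noexcept
    {
        // `return this;` would not compile here: MockPlugin contains three
        // distinct IPluginCapability subobjects, so the conversion is ambiguous.
        switch (type)
        {
        case PluginCapabilityType::kBUILD: return static_cast<IPluginV3OneBuild*>(this);
        case PluginCapabilityType::kRUNTIME: return static_cast<IPluginV3OneRuntime*>(this);
        case PluginCapabilityType::kCORE: return static_cast<IPluginV3OneCore*>(this);
        }
        return nullptr; // unknown capability type
    }
};
```

Each cast selects a different base subobject, so the three returned pointers have different addresses even though they belong to the same plugin object.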

Known Issues When Migrating Plugins#

Empty PluginField initializers can crash V3 dispatch. When a plugin advertises a PluginField with a nullptr data pointer and length == 0, the V3 creator dispatch path can dereference the pointer during build or deserialization. Populate every entry with a non-null sentinel buffer, even when the value is unused at runtime:
```
// Bad — empty initializer
mPluginAttributes.emplace_back("flag", nullptr, PluginFieldType::kINT32, 0);

// Good — non-null sentinel keeps the dispatch path safe
static int32_t kDummy = 0;
mPluginAttributes.emplace_back("flag", &kDummy, PluginFieldType::kINT32, 1);
```
Use strongly-typed networks with IPluginV3. Mixing IPluginV3 plugins with weakly-typed networks can hit fusion paths that were not exercised by IPluginV2DynamicExt and trigger crashes. TensorRT 11.0.0 removes all precision-enabling builder flags (BuilderFlag::kFP16, kINT8, kBF16, kFP8, kINT4, kFP4), so every network you build is strongly typed and fresh 11.x builds need no action. If you back-port V3 plugins to a 10.x build for evaluation, opt in explicitly with createNetworkV2(1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kSTRONGLY_TYPED)), because createNetworkV2() takes a bitmask rather than the enum value itself.
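For the empty-initializer issue in the first item above, a creator can check its own field list before handing it to TensorRT. This is a minimal sketch using a mock struct that keeps only three members of `nvinfer1::PluginField`; `MockPluginField` and `firstUnsafeField` are hypothetical names, and the check covers only the exact case described (null data pointer with a length of 0).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mock stand-in for nvinfer1::PluginField, reduced to the members used here.
struct MockPluginField
{
    char const* name;
    void const* data;
    std::int32_t length;
};

// Returns the index of the first entry with a null data pointer and zero
// length, or -1 when no entry matches that pattern.
inline std::int32_t firstUnsafeField(std::vector<MockPluginField> const& fields)
{
    for (std::size_t i = 0; i < fields.size(); ++i)
    {
        if (fields[i].data == nullptr && fields[i].length == 0)
        {
            return static_cast<std::int32_t>(i);
        }
    }
    return -1;
}
```

Running a check like this in a debug build of the creator's constructor would surface the problem before the build or deserialization path touches the field.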

Migrating Weight Streaming APIs#

The weight streaming API has been updated in TensorRT 11.x. The getMinimumWeightStreamingBudget() method has been removed; compute a budget from getStreamableWeightsSize() and available device memory instead.

Before (TensorRT 10.x)#

 auto engine = SampleUniquePtr<ICudaEngine>(
     runtime->deserializeCudaEngine(engineData, engineSize));

 // Old API
 int64_t minBudget = engine->getMinimumWeightStreamingBudget();
 engine->setWeightStreamingBudget(minBudget);
 int64_t currentBudget = engine->getWeightStreamingBudget();

After (TensorRT 11.x)#

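 auto engine = SampleUniquePtr<ICudaEngine>(
     runtime->deserializeCudaEngine(engineData, engineSize));

 // V2 API
 size_t freeMem{0};
 size_t totalMem{0};
 cudaMemGetInfo(&freeMem, &totalMem); // check the returned cudaError_t in production code
 int64_t weightsSize = engine->getStreamableWeightsSize();
 // Example heuristic: use at most half of the free memory or half of the streamable weights
 int64_t budget = std::min(static_cast<int64_t>(freeMem / 2), weightsSize / 2);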
 engine->setWeightStreamingBudgetV2(budget);
 int64_t currentBudget = engine->getWeightStreamingBudgetV2();

Summary of Changes#

setWeightStreamingBudget() replaced by setWeightStreamingBudgetV2()
getWeightStreamingBudget() replaced by getWeightStreamingBudgetV2()
getMinimumWeightStreamingBudget() removed - compute a budget using getStreamableWeightsSize() and available device memory
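The budget calculation from the 11.x example can be pulled into a pure function so that it is testable without a GPU. This is a sketch only: `chooseWeightStreamingBudget` is a hypothetical helper, and the "half of each" rule is the heuristic from the example above, not a recommended value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// streamableWeightsSize: value returned by getStreamableWeightsSize().
// freeDeviceMem: free bytes reported by cudaMemGetInfo().
inline std::int64_t chooseWeightStreamingBudget(
    std::int64_t streamableWeightsSize, std::size_t freeDeviceMem)
{
    assert(streamableWeightsSize >= 0);
    // Clamp the unsigned byte count into int64_t range before mixing signedness.
    std::uint64_t const maxI64 = std::numeric_limits<std::int64_t>::max();
    std::int64_t const freeBytes
        = static_cast<std::int64_t>(std::min<std::uint64_t>(freeDeviceMem, maxI64));
    return std::min(freeBytes / 2, streamableWeightsSize / 2);
}
```

The result would then be passed to `setWeightStreamingBudgetV2()`.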

Migrating Memory Management APIs#

TensorRT 11.x replaces getDeviceMemorySize() with getDeviceMemorySizeV2() (which returns int64_t), removes createExecutionContextWithoutDeviceMemory(), and replaces setDeviceMemory(void*) with setDeviceMemoryV2(void*, int64_t).

Before (TensorRT 10.x)#

 auto engine = SampleUniquePtr<ICudaEngine>(
     runtime->deserializeCudaEngine(engineData, engineSize));

 // Old APIs
 size_t memSize = engine->getDeviceMemorySize();
 auto context = SampleUniquePtr<IExecutionContext>(
     engine->createExecutionContextWithoutDeviceMemory());

 void* deviceMem;
 cudaMalloc(&deviceMem, memSize);
 context->setDeviceMemory(deviceMem);

After (TensorRT 11.x)#

 auto engine = SampleUniquePtr<ICudaEngine>(
     runtime->deserializeCudaEngine(engineData, engineSize));

 // V2 APIs - int64_t sizes, explicit size parameter
 int64_t memSize = engine->getDeviceMemorySizeV2();
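 // Request a context that does not allocate its own device memory, so it can be supplied below
 auto context = SampleUniquePtr<IExecutionContext>(
     engine->createExecutionContext(ExecutionContextAllocationStrategy::kUSER_MANAGED));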

 void* deviceMem;
 cudaMalloc(&deviceMem, memSize);
 context->setDeviceMemoryV2(deviceMem, memSize);

Summary of Changes#

getDeviceMemorySize() replaced by getDeviceMemorySizeV2() (returns int64_t instead of size_t)
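createExecutionContextWithoutDeviceMemory() removed - use createExecutionContext(), passing ExecutionContextAllocationStrategy::kUSER_MANAGED when you supply the device memory yourself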
setDeviceMemory(void*) replaced by setDeviceMemoryV2(void*, int64_t), which takes an explicit size parameter
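Because the V2 size query returns a signed `int64_t` while `cudaMalloc` takes a `size_t`, it can be worth validating the value before allocating. The helper below is a hedged sketch: `toAllocationSize` is a hypothetical name, and treating a negative size as an error is an assumption made here, not documented TensorRT behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>

// Convert a size reported as int64_t into a size_t that is safe to pass to an
// allocator. Returns std::nullopt if the value is negative or does not fit.
inline std::optional<std::size_t> toAllocationSize(std::int64_t reportedSize)
{
    if (reportedSize < 0)
    {
        return std::nullopt;
    }
    if (static_cast<std::uint64_t>(reportedSize) > std::numeric_limits<std::size_t>::max())
    {
        return std::nullopt; // only reachable where size_t is narrower than 64 bits
    }
    return static_cast<std::size_t>(reportedSize);
}
```

A caller would allocate only when the optional holds a value, then pass the original `int64_t` to `setDeviceMemoryV2()`.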

Removed C++ APIs and Replacements#

Warning

The APIs listed below have been removed in TensorRT 11.x and will cause compile-time errors if used. Review each entry for its replacement before upgrading.

BuilderFlag::kFP16: Strong typing with ModelOpt AutoCast
BuilderFlag::kINT8: Explicit quantization with Q/DQ nodes
BuilderFlag::kFP8: Explicit quantization with Q/DQ nodes
BuilderFlag::kBF16: Strong typing with ModelOpt AutoCast
BuilderFlag::kINT4: Explicit quantization with Q/DQ nodes
BuilderFlag::kFP4: Explicit quantization with Q/DQ nodes
BuilderFlag::kOBEY_PRECISION_CONSTRAINTS: Strong typing (always enforced)
BuilderFlag::kPREFER_PRECISION_CONSTRAINTS: Strong typing (always enforced)
BuilderFlag::kDIRECT_IO: Removed (not needed in 11.x)
IAlgorithm, IAlgorithmContext, IAlgorithmIOInfo, IAlgorithmSelector, IAlgorithmVariant: Use editable mode in ITimingCache instead.
IBuilderConfig::setInt8Calibrator(IInt8Calibrator*): Explicit quantization with Q/DQ nodes
IBuilderConfig::getInt8Calibrator(): Removed
IBuilderConfig::setCalibrationProfile(IOptimizationProfile const*): Removed
IBuilderConfig::getCalibrationProfile(): Removed
IBuilderConfig::setQuantizationFlags(QuantizationFlags): Removed
IBuilderConfig::getQuantizationFlags(): Removed
IBuilderConfig::clearQuantizationFlag(QuantizationFlag): Removed
IBuilderConfig::setQuantizationFlag(QuantizationFlag): Removed
IBuilderConfig::getQuantizationFlag(QuantizationFlag): Removed
ICudaEngine::createExecutionContextWithoutDeviceMemory(): ICudaEngine::createExecutionContext()
ICudaEngine::getDeviceMemorySize(): ICudaEngine::getDeviceMemorySizeV2()
ICudaEngine::getDeviceMemorySizeForProfile(int32_t): ICudaEngine::getDeviceMemorySizeForProfileV2(int32_t)
ICudaEngine::getMinimumWeightStreamingBudget(): Compute from getStreamableWeightsSize()
ICudaEngine::getProfileTensorValues(char const*, int32_t, OptProfileSelector): ICudaEngine::getProfileTensorValuesV2()
ICudaEngine::getWeightStreamingBudget(): ICudaEngine::getWeightStreamingBudgetV2()
ICudaEngine::hasImplicitBatchDimension(): Removed (always false)
ICudaEngine::setWeightStreamingBudget(int64_t): ICudaEngine::setWeightStreamingBudgetV2(int64_t)
IExecutionContext::allInputShapesSpecified(): Removed (always true)
IExecutionContext::setDeviceMemory(void*): IExecutionContext::setDeviceMemoryV2(void*, int64_t)
IGpuAllocator::allocate(uint64_t, uint64_t, AllocatorFlags): IGpuAllocator::allocateAsync(uint64_t, uint64_t, AllocatorFlags, cudaStream_t)
IGpuAllocator::deallocate(void*): IGpuAllocator::deallocateAsync(void*, cudaStream_t)
IInt8Calibrator (all subclasses): Explicit quantization with Q/DQ nodes
ILayer::setPrecision(DataType): Strong typing (set types on tensors directly)
ILayer::getPrecision(): Removed
ILayer::precisionIsSet(): Removed
ILayer::resetPrecision(): Removed
ILayer::setOutputType(int32_t, DataType): Strong typing (set types on tensors directly)
ILayer::outputTypeIsSet(int32_t): Removed
ILayer::resetOutputType(int32_t): Removed
INetworkDefinition::addAttention(..., bool): INetworkDefinition::addAttentionV2(..., CausalMaskKind)
INetworkDefinition::addNMS(ITensor&, ITensor&, ITensor&): INetworkDefinition::addNMS(..., DataType) (4-arg version)
INetworkDefinition::addNonZero(ITensor&): INetworkDefinition::addNonZero(ITensor&, DataType)
INetworkDefinition::addNormalization(...): INetworkDefinition::addNormalizationV2(...)
INetworkDefinition::addPluginV2(ITensor* const*, int32_t, IPluginV2&): INetworkDefinition::addPluginV3(...)
INetworkDefinition::addTopK(ITensor&, TopKOperation, int32_t, uint32_t): INetworkDefinition::addTopK(..., DataType) (5-arg version)
INormalizationLayer::setComputePrecision(DataType): Removed (use strong typing)
IOutputAllocator::reallocateOutput(char const*, void*, uint64_t, uint64_t): IOutputAllocator::reallocateOutputAsync(..., cudaStream_t)
IPluginCreator: IPluginCreatorV3One
IPluginRegistry::deregisterCreator(IPluginCreator const&): IPluginRegistry::deregisterCreator(IPluginCreatorInterface const&)
IPluginRegistry::getPluginCreator(...): IPluginRegistry::getCreator(...)
IPluginRegistry::getPluginCreatorList(int32_t*): IPluginRegistry::getAllCreators(int32_t*)
IPluginRegistry::registerCreator(IPluginCreator&, ...): IPluginRegistry::registerCreator(IPluginCreatorInterface&, ...)
IPluginV2DynamicExt: IPluginV3
IPluginV2Ext: IPluginV3
IPluginV2IOExt: IPluginV3
IPluginV2Layer: IPluginV3Layer
IRefitter::setDynamicRange(char const*, float, float): Explicit quantization with Q/DQ nodes
IRefitter::getDynamicRangeMin(char const*): Removed
IRefitter::getDynamicRangeMax(char const*): Removed
IRefitter::getTensorsWithDynamicRange(): Removed
IRuntime::deserializeCudaEngine(IStreamReader&): IRuntime::deserializeCudaEngine(IStreamReaderV2&)
ITensor::setType(DataType): Strong typing (type determined by network construction)
ITensor::setDynamicRange(float, float): Explicit quantization with Q/DQ nodes
ITensor::dynamicRangeIsSet(): Removed
ITensor::resetDynamicRange(): Removed
ITensor::getDynamicRangeMin(): Removed
ITensor::getDynamicRangeMax(): Removed
ITensor::setBroadcastAcrossBatch(bool): Removed (implicit batch not supported)
ITensor::getBroadcastAcrossBatch(): Removed (implicit batch not supported)
TacticSource::kCUBLAS: Removed
TacticSource::kCUBLAS_LT: Removed
TacticSource::kCUDNN: Removed
DetectionOutputParameters: Removed
NMSParameters: Removed
CodeTypeSSD: Removed

Removed C++ Plugins and Replacements#

Warning

The plugins listed below have been removed in TensorRT 11.x. Using them will cause compilation or linker errors. Review each entry for its replacement before upgrading.

BatchedNMS_TRT: Use INetworkDefinition::addNMS()
BatchedNMSDynamic_TRT: Use INetworkDefinition::addNMS()
BatchTilePlugin_TRT: Implement with standard TensorRT layers
Clip_TRT: Use INetworkDefinition::addActivation() with kCLIP
CoordConvAC: Implement with standard TensorRT layers (concatenate coordinate channels with IConcatenationLayer, then apply convolution)
CustomGeluPluginDynamic: Use INetworkDefinition::addActivation() with kGELU_ERF or kGELU_TANH
EfficientNMS_ONNX_TRT: Use INetworkDefinition::addNMS()
LReLU_TRT: Use INetworkDefinition::addActivation() with kLEAKY_RELU
NMS_TRT: Use INetworkDefinition::addNMS()
NMSDynamic_TRT: Use INetworkDefinition::addNMS()
Normalize_TRT: Use INetworkDefinition::addNormalizationV2()
Proposal: Implement with standard TensorRT layers
SingleStepLSTMPlugin: Use INetworkDefinition::addLoop() or standard RNN decomposition
SpecialSlice_TRT: Use INetworkDefinition::addSlice()
Split: Use INetworkDefinition::addSlice()
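When auditing an existing model or code base, the list above can be turned into a small lookup table that flags removed plugin names. This is an illustrative sketch; `kRemovedPlugins` and `replacementFor` are hypothetical names, and the strings are copied from the list rather than from any TensorRT API.

```cpp
#include <array>
#include <cassert>
#include <string_view>
#include <utility>

using PluginHint = std::pair<std::string_view, std::string_view>;

// Removed plugin name -> replacement hint, taken from the list above.
constexpr std::array<PluginHint, 15> kRemovedPlugins{{
    {"BatchedNMS_TRT", "INetworkDefinition::addNMS()"},
    {"BatchedNMSDynamic_TRT", "INetworkDefinition::addNMS()"},
    {"BatchTilePlugin_TRT", "standard TensorRT layers"},
    {"Clip_TRT", "INetworkDefinition::addActivation() with kCLIP"},
    {"CoordConvAC", "standard TensorRT layers"},
    {"CustomGeluPluginDynamic", "INetworkDefinition::addActivation() with kGELU_ERF or kGELU_TANH"},
    {"EfficientNMS_ONNX_TRT", "INetworkDefinition::addNMS()"},
    {"LReLU_TRT", "INetworkDefinition::addActivation() with kLEAKY_RELU"},
    {"NMS_TRT", "INetworkDefinition::addNMS()"},
    {"NMSDynamic_TRT", "INetworkDefinition::addNMS()"},
    {"Normalize_TRT", "INetworkDefinition::addNormalizationV2()"},
    {"Proposal", "standard TensorRT layers"},
    {"SingleStepLSTMPlugin", "INetworkDefinition::addLoop() or standard RNN decomposition"},
    {"SpecialSlice_TRT", "INetworkDefinition::addSlice()"},
    {"Split", "INetworkDefinition::addSlice()"},
}};

// Returns the replacement hint, or an empty view if the plugin was not removed.
constexpr std::string_view replacementFor(std::string_view pluginName)
{
    for (auto const& entry : kRemovedPlugins)
    {
        if (entry.first == pluginName)
        {
            return entry.second;
        }
    }
    return {};
}
```

An empty result means the name is not in the removed list; it does not mean the plugin is supported.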

Deprecated BERT Plugins#

The following OSS BERT plugin classes are deprecated in 11.0.0 and scheduled for removal in a future release. Migrate to the listed replacements before upgrading beyond 11.x.

bertQKVToContextPlugin / CustomQKVToContextPluginDynamic: Refer to Migrate to IAttention for more information.
