TensorRT 8.x/10.x Migration Guide#

This section documents API changes between TensorRT 8.x and TensorRT 10.x safety runtimes. TensorRT 10.x safety runtime support will be available in an upcoming DriveOS 7.2 release.

If you are unfamiliar with these changes, refer to our sample code for clarification.

Python#

Python API Changes#

Note

These Python migrations are not applicable on QNX, where the Python API is not supported.

Allocating Buffers and Using a Name-Based Engine API

TensorRT 8.x:

def allocate_buffers(self, engine):
    '''
    Allocates all buffers required for an engine, i.e. host/device inputs/outputs.
    '''
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()

    # binding is the name of an input/output tensor
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))

        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)  # page-locked memory buffer (won't be swapped to disk)
        device_mem = cuda.mem_alloc(host_mem.nbytes)

        # Append the device buffer address to device bindings.
        # When cast to int, it's a linear index into the context's memory (like a memory address).
        bindings.append(int(device_mem))

        # Append to the appropriate input/output list.
        if engine.binding_is_input(binding):
            inputs.append(self.HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(self.HostDeviceMem(host_mem, device_mem))

    return inputs, outputs, bindings, stream
TensorRT 10.x:

def allocate_buffers(self, engine):
    '''
    Allocates all buffers required for an engine, i.e. host/device inputs/outputs.
    '''
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()

    for i in range(engine.num_io_tensors):
        tensor_name = engine.get_tensor_name(i)
        size = trt.volume(engine.get_tensor_shape(tensor_name))
        dtype = trt.nptype(engine.get_tensor_dtype(tensor_name))

        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)  # page-locked memory buffer (won't be swapped to disk)
        device_mem = cuda.mem_alloc(host_mem.nbytes)

        # Append the device buffer address to device bindings.
        # When cast to int, it's a linear index into the context's memory (like a memory address).
        bindings.append(int(device_mem))

        # Append to the appropriate input/output list.
        if engine.get_tensor_mode(tensor_name) == trt.TensorIOMode.INPUT:
            inputs.append(self.HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(self.HostDeviceMem(host_mem, device_mem))

    return inputs, outputs, bindings, stream
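In both versions, trt.volume collapses a tensor shape into an element count so the host buffer can be sized. For static, fully specified shapes it is equivalent to the following pure-Python sketch (an illustration only; the real trt.volume also accepts TensorRT Dims objects, and dynamic shapes with -1 dimensions must be resolved first):

```python
import math

def volume(shape):
    """Element count of a fully specified shape, like trt.volume on static shapes."""
    return math.prod(shape)

# Sizing a page-locked host buffer for a float32 tensor of shape (3, 224, 224):
n_elements = volume((3, 224, 224))
n_bytes = n_elements * 4  # np.float32 is 4 bytes per element
```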

Transition from enqueueV2 to enqueueV3 for Python

TensorRT 8.x:

# Allocate device memory for inputs.
d_inputs = [cuda.mem_alloc(input_nbytes) for _ in range(input_num)]

# Allocate device memory for outputs.
h_output = cuda.pagelocked_empty(output_nbytes, dtype=np.float32)
d_output = cuda.mem_alloc(h_output.nbytes)

# Transfer data from host to device.
cuda.memcpy_htod_async(d_inputs[0], input_a, stream)
cuda.memcpy_htod_async(d_inputs[1], input_b, stream)
cuda.memcpy_htod_async(d_inputs[2], input_c, stream)

# Run inference
context.execute_async_v2(bindings=[int(d_inp) for d_inp in d_inputs] + [int(d_output)], stream_handle=stream.handle)

# Synchronize the stream
stream.synchronize()
TensorRT 10.x:

# Allocate device memory for inputs.
d_inputs = [cuda.mem_alloc(input_nbytes) for _ in range(input_num)]

# Allocate device memory for outputs.
h_output = cuda.pagelocked_empty(output_nbytes, dtype=np.float32)
d_output = cuda.mem_alloc(h_output.nbytes)

# Transfer data from host to device.
cuda.memcpy_htod_async(d_inputs[0], input_a, stream)
cuda.memcpy_htod_async(d_inputs[1], input_b, stream)
cuda.memcpy_htod_async(d_inputs[2], input_c, stream)

# Set up tensor addresses, ordered as the engine enumerates its I/O tensors.
bindings = [int(d_inputs[i]) for i in range(3)] + [int(d_output)]

for i in range(engine.num_io_tensors):
    context.set_tensor_address(engine.get_tensor_name(i), bindings[i])

# Run inference
context.execute_async_v3(stream_handle=stream.handle)

# Synchronize the stream
stream.synchronize()
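The key difference is that execute_async_v2 takes a positional bindings list, while execute_async_v3 relies on each tensor name being associated with an address beforehand. A positional list therefore only works if it is ordered exactly as the engine enumerates its I/O tensors. A minimal pure-Python sketch of that pairing (the names and integer addresses below are made up for illustration):

```python
# Hypothetical I/O tensor names, in engine enumeration order.
io_tensor_names = ["input_a", "input_b", "input_c", "output"]

# Device pointers as integers, in the same order as the names above --
# the implicit contract behind bindings[i] in the snippet.
bindings = [0x7F0000, 0x7F1000, 0x7F2000, 0x7F3000]

# enqueueV3-style association: one explicit (name, address) pair per tensor,
# which is what set_tensor_address establishes on the execution context.
tensor_addresses = dict(zip(io_tensor_names, bindings))
```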

Engine Building: Use Only build_serialized_network

TensorRT 8.x:

engine_bytes = None
try:
    engine_bytes = self.builder.build_serialized_network(self.network, self.config)
except AttributeError:
    engine = self.builder.build_engine(self.network, self.config)
    engine_bytes = engine.serialize()
    del engine
assert engine_bytes

TensorRT 10.x:

engine_bytes = self.builder.build_serialized_network(self.network, self.config)
if engine_bytes is None:
    log.error("Failed to create engine")
    sys.exit(1)

Added Python APIs#

Types

  • APILanguage

  • ExecutionContextAllocationStrategy

  • IGpuAsyncAllocator

  • InterfaceInfo

  • IPluginResource

  • IPluginV3

  • IStreamReader

  • IVersionedInterface

Methods and Properties

  • ICudaEngine.is_debug_tensor()

  • ICudaEngine.minimum_weight_streaming_budget

  • ICudaEngine.streamable_weights_size

  • ICudaEngine.weight_streaming_budget

  • IExecutionContext.get_debug_listener()

  • IExecutionContext.get_debug_state()

  • IExecutionContext.set_all_tensors_debug_state()

  • IExecutionContext.set_debug_listener()

  • IExecutionContext.set_tensor_debug_state()

  • IExecutionContext.update_device_memory_size_for_shapes()

  • IGpuAllocator.allocate_async()

  • IGpuAllocator.deallocate_async()

  • INetworkDefinition.add_plugin_v3()

  • INetworkDefinition.is_debug_tensor()

  • INetworkDefinition.mark_debug()

  • INetworkDefinition.unmark_debug()

  • IPluginRegistry.acquire_plugin_resource()

  • IPluginRegistry.all_creators

  • IPluginRegistry.deregister_creator()

  • IPluginRegistry.get_creator()

  • IPluginRegistry.register_creator()

  • IPluginRegistry.release_plugin_resource()

Removed Python APIs#

The following removed Python APIs are listed next to their superseding API.

  • BuilderFlag.ENABLE_TACTIC_HEURISTIC > Builder optimization level 2

  • BuilderFlag.STRICT_TYPES > Use all three flags: BuilderFlag.DIRECT_IO, BuilderFlag.PREFER_PRECISION_CONSTRAINTS, and BuilderFlag.REJECT_EMPTY_ALGORITHMS

  • EngineCapability.DEFAULT > EngineCapability.STANDARD

  • EngineCapability.kSAFE_DLA > EngineCapability.DLA_STANDALONE

  • EngineCapability.SAFE_GPU > EngineCapability.SAFETY

  • IAlgorithmIOInfo.tensor_format > The strides, data type, and vectorization information are sufficient to identify tensor formats uniquely.

  • IBuilder.max_batch_size > Implicit batch support was removed

  • IBuilderConfig.max_workspace_size > IBuilderConfig.set_memory_pool_limit() with MemoryPoolType.WORKSPACE or IBuilderConfig.get_memory_pool_limit() with MemoryPoolType.WORKSPACE

  • IBuilderConfig.min_timing_iterations > IBuilderConfig.avg_timing_iterations

  • ICudaEngine.binding_is_input() > ICudaEngine.get_tensor_mode()

  • ICudaEngine.get_binding_bytes_per_component() > ICudaEngine.get_tensor_bytes_per_component()

  • ICudaEngine.get_binding_components_per_element() > ICudaEngine.get_tensor_components_per_element()

  • ICudaEngine.get_binding_dtype() > ICudaEngine.get_tensor_dtype()

  • ICudaEngine.get_binding_format() > ICudaEngine.get_tensor_format()

  • ICudaEngine.get_binding_format_desc() > ICudaEngine.get_tensor_format_desc()

  • ICudaEngine.get_binding_index() > No name-based equivalent replacement

  • ICudaEngine.get_binding_name() > No name-based equivalent replacement

  • ICudaEngine.get_binding_shape() > ICudaEngine.get_tensor_shape()

  • ICudaEngine.get_binding_vectorized_dim() > ICudaEngine.get_tensor_vectorized_dim()

  • ICudaEngine.get_location() > ITensor.location

  • ICudaEngine.get_profile_shape() > ICudaEngine.get_tensor_profile_shape()

  • ICudaEngine.get_profile_shape_input() > ICudaEngine.get_tensor_profile_values()

  • ICudaEngine.has_implicit_batch_dimension() > Implicit batch is no longer supported

  • ICudaEngine.is_execution_binding() > No name-based equivalent replacement

  • ICudaEngine.is_shape_binding() > ICudaEngine.is_shape_inference_io()

  • ICudaEngine.max_batch_size() > Implicit batch is no longer supported

  • ICudaEngine.num_bindings() > ICudaEngine.num_io_tensors()

  • IExecutionContext.get_binding_shape() > IExecutionContext.get_tensor_shape()

  • IExecutionContext.get_strides() > IExecutionContext.get_tensor_strides()

  • IExecutionContext.set_binding_shape() > IExecutionContext.set_input_shape()

  • IFullyConnectedLayer > IMatrixMultiplyLayer

  • INetworkDefinition.add_convolution() > INetworkDefinition.add_convolution_nd()

  • INetworkDefinition.add_deconvolution() > INetworkDefinition.add_deconvolution_nd()

  • INetworkDefinition.add_fully_connected() > INetworkDefinition.add_matrix_multiply()

  • INetworkDefinition.add_padding() > INetworkDefinition.add_padding_nd()

  • INetworkDefinition.add_pooling() > INetworkDefinition.add_pooling_nd()

  • INetworkDefinition.add_rnn_v2() > INetworkDefinition.add_loop()

  • INetworkDefinition.has_explicit_precision > Explicit precision support was removed in 10.0

  • INetworkDefinition.has_implicit_batch_dimension > Implicit batch support was removed

  • IRNNv2Layer > ILoop

  • NetworkDefinitionCreationFlag.EXPLICIT_BATCH > Support was removed in 10.0

  • NetworkDefinitionCreationFlag.EXPLICIT_PRECISION > Support was removed in 10.0

  • PaddingMode.CAFFE_ROUND_DOWN > Caffe support was removed

  • PaddingMode.CAFFE_ROUND_UP > Caffe support was removed

  • PreviewFeature.DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805 > External tactics are always disabled for core code

  • PreviewFeature.FASTER_DYNAMIC_SHAPES_0805 > This flag is on by default

  • ProfilingVerbosity.DEFAULT > ProfilingVerbosity.LAYER_NAMES_ONLY

  • ProfilingVerbosity.VERBOSE > ProfilingVerbosity.DETAILED

  • ResizeMode > Use InterpolationMode. Alias was removed.

  • SampleMode.DEFAULT > SampleMode.STRICT_BOUNDS

  • SliceMode > Use SampleMode. Alias was removed.

C++#

C++ API Changes#

Transition from enqueueV2 to enqueueV3 for C++

TensorRT 8.x:

// Create RAII buffer manager object.
samplesCommon::BufferManager buffers(mEngine);

auto context = SampleUniquePtr<nvinfer1::IExecutionContext>(mEngine->createExecutionContext());
if (!context)
{
    return false;
}

// Pick a random digit to try to infer.
srand(time(NULL));
int32_t const digit = rand() % 10;

// Read the input data into the managed buffers.
// There should be just 1 input tensor.
ASSERT(mParams.inputTensorNames.size() == 1);

if (!processInput(buffers, mParams.inputTensorNames[0], digit))
{
    return false;
}

// Create a CUDA stream to execute this inference.
cudaStream_t stream;
CHECK(cudaStreamCreate(&stream));

// Asynchronously copy data from host input buffers to device input buffers.
buffers.copyInputToDeviceAsync(stream);

// Asynchronously enqueue the inference work.
if (!context->enqueueV2(buffers.getDeviceBindings().data(), stream, nullptr))
{
    return false;
}

// Asynchronously copy data from device output buffers to host output buffers.
buffers.copyOutputToHostAsync(stream);

// Wait for the work in the stream to complete.
CHECK(cudaStreamSynchronize(stream));

// Release stream.
CHECK(cudaStreamDestroy(stream));
TensorRT 10.x:

// Create RAII buffer manager object.
samplesCommon::BufferManager buffers(mEngine);

auto context = SampleUniquePtr<nvinfer1::IExecutionContext>(mEngine->createExecutionContext());
if (!context)
{
    return false;
}

for (int32_t i = 0, e = mEngine->getNbIOTensors(); i < e; i++)
{
    auto const name = mEngine->getIOTensorName(i);
    context->setTensorAddress(name, buffers.getDeviceBuffer(name));
}

// Pick a random digit to try to infer.
srand(time(NULL));
int32_t const digit = rand() % 10;

// Read the input data into the managed buffers.
// There should be just 1 input tensor.
ASSERT(mParams.inputTensorNames.size() == 1);

if (!processInput(buffers, mParams.inputTensorNames[0], digit))
{
    return false;
}

// Create a CUDA stream to execute this inference.
cudaStream_t stream;
CHECK(cudaStreamCreate(&stream));

// Asynchronously copy data from host input buffers to device input buffers.
buffers.copyInputToDeviceAsync(stream);

// Asynchronously enqueue the inference work.
if (!context->enqueueV3(stream))
{
    return false;
}

// Asynchronously copy data from device output buffers to host output buffers.
buffers.copyOutputToHostAsync(stream);

// Wait for the work in the stream to complete.
CHECK(cudaStreamSynchronize(stream));

// Release stream.
CHECK(cudaStreamDestroy(stream));

64-Bit Dimension Changes#

The dimensions held by Dims changed from int32_t to int64_t. However, TensorRT 10.x will generally reject networks whose dimensions exceed the range of int32_t. The tensor type returned by IShapeLayer is now DataType::kINT64; use ICastLayer to cast the result to a tensor of type DataType::kINT32 if 32-bit dimensions are required.

Inspect code that bitwise copies to and from Dims to ensure it is correct for int64_t dimensions.
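Code that narrows dimensions back to 32-bit integers (or bitwise copies Dims into 32-bit structures) should validate the values first. The range check itself is simple arithmetic, shown here in Python for brevity (an illustrative helper, not a TensorRT API):

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def dims_fit_int32(shape):
    """True if every dimension value fits in int32_t; TensorRT 10.x generally
    rejects networks whose dimensions fall outside this range."""
    return all(INT32_MIN <= d <= INT32_MAX for d in shape)

ok = dims_fit_int32((1, 3, 8192, 8192))  # all dims well within int32_t
bad = dims_fit_int32((1, 2**31, 1))      # one dim exceeds int32_t
```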

Added C++ APIs#

Enums

  • ActivationType::kGELU_ERF

  • ActivationType::kGELU_TANH

  • BuilderFlag::kREFIT_IDENTICAL

  • BuilderFlag::kSTRICT_NANS

  • BuilderFlag::kSTRIP_PLAN

  • BuilderFlag::kWEIGHT_STREAMING

  • DataType::kINT4

  • LayerType::kPLUGIN_V3

Types

  • APILanguage

  • Dims64

  • ExecutionContextAllocationStrategy

  • IGpuAsyncAllocator

  • InterfaceInfo

  • IPluginResource

  • IPluginV3

  • IStreamReader

  • IVersionedInterface

Methods and Properties

  • getInferLibBuildVersion

  • getInferLibMajorVersion

  • getInferLibMinorVersion

  • getInferLibPatchVersion

  • IBuilderConfig::setMaxNbTactics

  • IBuilderConfig::getMaxNbTactics

  • ICudaEngine::createRefitter

  • ICudaEngine::getMinimumWeightStreamingBudget

  • ICudaEngine::getStreamableWeightsSize

  • ICudaEngine::getWeightStreamingBudget

  • ICudaEngine::isDebugTensor

  • ICudaEngine::setWeightStreamingBudget

  • IExecutionContext::getDebugListener

  • IExecutionContext::getTensorDebugState

  • IExecutionContext::setAllTensorsDebugState

  • IExecutionContext::setDebugListener

  • IExecutionContext::setOutputTensorAddress

  • IExecutionContext::setTensorDebugState

  • IExecutionContext::updateDeviceMemorySizeForShapes

  • IGpuAllocator::allocateAsync

  • IGpuAllocator::deallocateAsync

  • INetworkDefinition::addPluginV3

  • INetworkDefinition::isDebugTensor

  • INetworkDefinition::markDebug

  • INetworkDefinition::unmarkDebug

  • IPluginRegistry::acquirePluginResource

  • IPluginRegistry::deregisterCreator

  • IPluginRegistry::getAllCreators

  • IPluginRegistry::getCreator

  • IPluginRegistry::registerCreator

  • IPluginRegistry::releasePluginResource

Removed C++ APIs#

The following removed C++ APIs are listed next to their superseding API.

  • BuilderFlag::kENABLE_TACTIC_HEURISTIC > Builder optimization level 2

  • BuilderFlag::kSTRICT_TYPES [1] > Use all three flags: kREJECT_EMPTY_ALGORITHMS, kDIRECT_IO, and kPREFER_PRECISION_CONSTRAINTS

  • EngineCapability::kDEFAULT > EngineCapability::kSTANDARD

  • EngineCapability::kSAFE_DLA > EngineCapability::kDLA_STANDALONE

  • EngineCapability::kSAFE_GPU > EngineCapability::kSAFETY

  • IAlgorithm::getAlgorithmIOInfo() > IAlgorithm::getAlgorithmIOInfoByIndex()

  • IAlgorithmIOInfo::getTensorFormat() > The strides, data type, and vectorization information are sufficient to identify tensor formats uniquely.

  • IBuilder::buildEngineWithConfig() > IBuilder::buildSerializedNetwork()

  • IBuilder::destroy() > delete ObjectName

  • IBuilder::getMaxBatchSize() > Implicit batch support was removed

  • IBuilder::setMaxBatchSize() > Implicit batch support was removed

  • IBuilderConfig::destroy() > delete ObjectName

  • IBuilderConfig::getMaxWorkspaceSize() > IBuilderConfig::getMemoryPoolLimit() with MemoryPoolType::kWORKSPACE

  • IBuilderConfig::getMinTimingIterations() > IBuilderConfig::getAvgTimingIterations()

  • IBuilderConfig::setMaxWorkspaceSize() > IBuilderConfig::setMemoryPoolLimit() with MemoryPoolType::kWORKSPACE

  • IBuilderConfig::setMinTimingIterations() > IBuilderConfig::setAvgTimingIterations()

  • IConvolutionLayer::getDilation() > IConvolutionLayer::getDilationNd()

  • IConvolutionLayer::getKernelSize() > IConvolutionLayer::getKernelSizeNd()

  • IConvolutionLayer::getPadding() > IConvolutionLayer::getPaddingNd()

  • IConvolutionLayer::getStride() > IConvolutionLayer::getStrideNd()

  • IConvolutionLayer::setDilation() > IConvolutionLayer::setDilationNd()

  • IConvolutionLayer::setKernelSize() > IConvolutionLayer::setKernelSizeNd()

  • IConvolutionLayer::setPadding() > IConvolutionLayer::setPaddingNd()

  • IConvolutionLayer::setStride() > IConvolutionLayer::setStrideNd()

  • ICudaEngine::bindingIsInput() > ICudaEngine::getTensorIOMode()

  • ICudaEngine::destroy() > delete ObjectName

  • ICudaEngine::getBindingBytesPerComponent() > ICudaEngine::getTensorBytesPerComponent()

  • ICudaEngine::getBindingComponentsPerElement() > ICudaEngine::getTensorComponentsPerElement()

  • ICudaEngine::getBindingDataType() > ICudaEngine::getTensorDataType()

  • ICudaEngine::getBindingDimensions() > ICudaEngine::getTensorShape()

  • ICudaEngine::getBindingFormat() > ICudaEngine::getTensorFormat()

  • ICudaEngine::getBindingFormatDesc() > ICudaEngine::getTensorFormatDesc()

  • ICudaEngine::getBindingIndex() > Name-based methods

  • ICudaEngine::getBindingName() > Name-based methods

  • ICudaEngine::getBindingVectorizedDim() > ICudaEngine::getTensorVectorizedDim()

  • ICudaEngine::getLocation() > ITensor::getLocation()

  • ICudaEngine::getMaxBatchSize() > Implicit batch support was removed

  • ICudaEngine::getNbBindings() > ICudaEngine::getNbIOTensors()

  • ICudaEngine::getProfileDimensions() > ICudaEngine::getProfileShape()

  • ICudaEngine::getProfileShapeValues() > ICudaEngine::getShapeValues()

  • ICudaEngine::hasImplicitBatchDimension() > Implicit batch support was removed

  • ICudaEngine::isExecutionBinding() > No name-based equivalent replacement

  • ICudaEngine::isShapeBinding() > ICudaEngine::isShapeInferenceIO()

  • IDeconvolutionLayer::getKernelSize() > IDeconvolutionLayer::getKernelSizeNd()

  • IDeconvolutionLayer::getPadding() > IDeconvolutionLayer::getPaddingNd()

  • IDeconvolutionLayer::getStride() > IDeconvolutionLayer::getStrideNd()

  • IDeconvolutionLayer::setKernelSize() > IDeconvolutionLayer::setKernelSizeNd()

  • IDeconvolutionLayer::setPadding() > IDeconvolutionLayer::setPaddingNd()

  • IDeconvolutionLayer::setStride() > IDeconvolutionLayer::setStrideNd()

  • IExecutionContext::destroy() > delete ObjectName

  • IExecutionContext::enqueue() > IExecutionContext::enqueueV3()

  • IExecutionContext::enqueueV2() > IExecutionContext::enqueueV3()

  • IExecutionContext::execute() > IExecutionContext::executeV2()

  • IExecutionContext::getBindingDimensions() > IExecutionContext::getTensorShape()

  • IExecutionContext::getShapeBinding() > IExecutionContext::getTensorAddress() or getOutputTensorAddress()

  • IExecutionContext::getStrides() > IExecutionContext::getTensorStrides()

  • IExecutionContext::setBindingDimensions() > IExecutionContext::setInputShape()

  • IExecutionContext::setInputShapeBinding() > IExecutionContext::setInputTensorAddress() or setTensorAddress()

  • IExecutionContext::setOptimizationProfile() > IExecutionContext::setOptimizationProfileAsync()

  • IFullyConnectedLayer > IMatrixMultiplyLayer

  • IGpuAllocator::free() > IGpuAllocator::deallocate()

  • IHostMemory::destroy() > delete ObjectName

  • INetworkDefinition::addConvolution() > INetworkDefinition::addConvolutionNd()

  • INetworkDefinition::addDeconvolution() > INetworkDefinition::addDeconvolutionNd()

  • INetworkDefinition::addFullyConnected() > INetworkDefinition::addMatrixMultiply()

  • INetworkDefinition::addPadding() > INetworkDefinition::addPaddingNd()

  • INetworkDefinition::addPooling() > INetworkDefinition::addPoolingNd()

  • INetworkDefinition::addRNNv2() > INetworkDefinition::addLoop()

  • INetworkDefinition::destroy() > delete ObjectName

  • INetworkDefinition::hasExplicitPrecision() > Explicit precision support was removed in 10.0

  • INetworkDefinition::hasImplicitBatchDimension() > Implicit batch support was removed

  • IOnnxConfig::destroy() > delete ObjectName

  • IPaddingLayer::getPostPadding() > IPaddingLayer::getPostPaddingNd()

  • IPaddingLayer::getPrePadding() > IPaddingLayer::getPrePaddingNd()

  • IPaddingLayer::setPostPadding() > IPaddingLayer::setPostPaddingNd()

  • IPaddingLayer::setPrePadding() > IPaddingLayer::setPrePaddingNd()

  • IPoolingLayer::getPadding() > IPoolingLayer::getPaddingNd()

  • IPoolingLayer::getStride() > IPoolingLayer::getStrideNd()

  • IPoolingLayer::getWindowSize() > IPoolingLayer::getWindowSizeNd()

  • IPoolingLayer::setPadding() > IPoolingLayer::setPaddingNd()

  • IPoolingLayer::setStride() > IPoolingLayer::setStrideNd()

  • IPoolingLayer::setWindowSize() > IPoolingLayer::setWindowSizeNd()

  • IRefitter::destroy() > delete ObjectName

  • IResizeLayer::getAlignCorners() > IResizeLayer::getAlignCornersNd()

  • IResizeLayer::setAlignCorners() > IResizeLayer::setAlignCornersNd()

  • IRuntime::deserializeCudaEngine(void const* blob, std::size_t size, IPluginFactory* pluginFactory) > Use deserializeCudaEngine with two parameters

  • IRuntime::destroy() > delete ObjectName

  • IRNNv2Layer > ILoop

  • kNV_TENSORRT_VERSION_IMPL [2] > #define NV_TENSORRT_VERSION_INT(major, minor, patch) ((major) * 10000L + (minor) * 100L + (patch) * 1L)

  • NetworkDefinitionCreationFlag::kEXPLICIT_BATCH > Support was removed in 10.0

  • NetworkDefinitionCreationFlag::kEXPLICIT_PRECISION > Support was removed in 10.0

  • NV_TENSORRT_SONAME_MAJOR > NV_TENSORRT_MAJOR

  • NV_TENSORRT_SONAME_MINOR > NV_TENSORRT_MINOR

  • NV_TENSORRT_SONAME_PATCH > NV_TENSORRT_PATCH

  • nvinfer1::safe::IPluginRegistry* getBuilderSafePluginRegistry(nvinfer1::EngineCapability capability) > API will not be implemented

  • PaddingMode::kCAFFE_ROUND_DOWN > Caffe support was removed

  • PaddingMode::kCAFFE_ROUND_UP > Caffe support was removed

  • PreviewFeature::kDISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805 > External tactics are always disabled for core code

  • PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 > This flag is on by default

  • ProfilingVerbosity::kDEFAULT > ProfilingVerbosity::kLAYER_NAMES_ONLY

  • ProfilingVerbosity::kVERBOSE > ProfilingVerbosity::kDETAILED

  • ResizeMode > Use InterpolationMode. Alias was removed.

  • RNNDirection > RNN-related data structures were removed

  • RNNGateType > RNN-related data structures were removed

  • RNNInputMode > RNN-related data structures were removed

  • RNNOperation > RNN-related data structures were removed

  • SampleMode::kDEFAULT > SampleMode::kSTRICT_BOUNDS

  • SliceMode > Use SampleMode. Alias was removed.
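The NV_TENSORRT_VERSION_INT macro that replaces kNV_TENSORRT_VERSION_IMPL above is plain arithmetic, transliterated here into Python for a quick sanity check:

```python
def nv_tensorrt_version_int(major, minor, patch):
    """Python transliteration of the C macro
    NV_TENSORRT_VERSION_INT(major, minor, patch)."""
    return major * 10000 + minor * 100 + patch

version = nv_tensorrt_version_int(10, 3, 0)  # 100300
```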

Removed C++ Plugins#

The following removed C++ plugins are listed next to their superseding plugin.

  • createAnchorGeneratorPlugin() > GridAnchorPluginCreator::createPlugin()

  • createBatchedNMSPlugin() > BatchedNMSPluginCreator::createPlugin()

  • createInstanceNormalizationPlugin() > InstanceNormalizationPluginCreator::createPlugin()

  • createNMSPlugin() > NMSPluginCreator::createPlugin()

  • createNormalizePlugin() > NormalizePluginCreator::createPlugin()

  • createPriorBoxPlugin() > PriorBoxPluginCreator::createPlugin()

  • createRegionPlugin() > RegionPluginCreator::createPlugin()

  • createReorgPlugin() > ReorgPluginCreator::createPlugin()

  • createRPNROIPlugin() > RPROIPluginCreator::createPlugin()

  • createSplitPlugin() > INetworkDefinition::addSlice()

  • struct Quadruple > Related plugins were removed

trtexec#

trtexec Flag Changes#

Changes to the --workspace and --minTiming flags.

TensorRT 8.x:

trtexec \
    --onnx=/path/to/model.onnx \
    --saveEngine=/path/to/engine.trt \
    --optShapes=input:$INPUT_SHAPE \
    --workspace=1024 \
    --minTiming=1

TensorRT 10.x:

trtexec \
    --onnx=/path/to/model.onnx \
    --saveEngine=/path/to/engine.trt \
    --optShapes=input:$INPUT_SHAPE \
    --memPoolSize=workspace:1024

Removed trtexec Flags#

The following removed trtexec flags are listed next to their superseding flag.

  • --deploy > TensorRT 10.x does not support Caffe input, UFF input, and implicit batch dimension mode.

  • --output > TensorRT 10.x does not support Caffe input, UFF input, and implicit batch dimension mode.

  • --model > TensorRT 10.x does not support Caffe input, UFF input, and implicit batch dimension mode.

  • --uff > TensorRT 10.x does not support Caffe input, UFF input, and implicit batch dimension mode.

  • --uffInput > TensorRT 10.x does not support Caffe input, UFF input, and implicit batch dimension mode.

  • --uffNHWC > TensorRT 10.x does not support Caffe input, UFF input, and implicit batch dimension mode.

  • --batch > TensorRT 10.x does not support Caffe input, UFF input, and implicit batch dimension mode.

  • --maxBatch > TensorRT 10.x does not support Caffe input, UFF input, and implicit batch dimension mode.

  • --minTiming > --avgTiming

  • --preview=features > The disableExternalTacticSourcesForCore0805 and fasterDynamicShapes0805 preview features were removed; their behavior is now the default

  • --workspace=N > --memPoolSize=poolspec

  • --explicitPrecision > Removed

  • --nativeInstanceNorm > Removed

  • --heuristic > --builderOptimizationLevel=<N> (where <N> can be 0, 1, or 2)

  • --buildOnly > --skipInference

  • --nvtxMode > --profilingVerbosity

Deprecated trtexec Flags#

The following deprecated trtexec flags are listed next to their superseding flag.

  • --sparsity=force > Use polygraphy surgeon prune to rewrite the weights to a sparsity pattern and then run --sparsity=enable.

  • --plugins > --staticPlugins

  • --preview=profileSharing0806 > Enabled by default and has no effect.

  • --profilingVerbosity=default > --profilingVerbosity=layer_names_only

  • --profilingVerbosity=verbose > --profilingVerbosity=detailed

  • --streams > --infStreams

  • --weightless > --stripWeights

Safety Runtime#

Transition from TensorRT 8.x safety runtime to TensorRT 10.x safety runtime.

  1. Load the engine from the user’s local file system into a memory buffer.

  2. Create an InferRuntime object with which to deserialize the CUDA engine. Use the InferRuntime’s deserializeCudaEngine method to retrieve the ICudaEngine object from the memory buffer.

    TensorRT 8.x:

    using namespace nvinfer1::safe;
    auto infer = createInferRuntime(logger);
    infer->setErrorRecorder(recorder.get());
    auto engine = infer->deserializeCudaEngine(enginebuffer.data(), enginebufferSize);

    // The runtime object acts as the primary entry point for TensorRT. It is in charge of
    // deserializing a serialized engine buffer into an ICudaEngine object.
    
    TensorRT 10.x:

    using namespace nvinfer2::safe;
    ITRTGraph* graph = nullptr;
    ErrorCode code = createTRTGraph(/* ITRTGraph*& */ graph,
                                    /* engine buffer */ buffer.data(),
                                    /* engine buffer size */ buffer.size(),
                                    /* ISafeRecorder& */ recorder,
                                    /* trtManagedScratch */ true,
                                    /* ISafeMemAllocator* */ allocator);

    // ITRTGraph is an abstraction of a neural network graph. The engine data buffer is passed
    // directly to createTRTGraph to construct a graph object that represents the user's
    // serialized neural network engine file.
    // The parameter "recorder" is a reference to an ISafeRecorder object (sample provided),
    // which is a class derived from IErrorRecorder.
    // The parameter "trtManagedScratch" defaults to true, indicating that the memory allocator
    // (either user-supplied or default) will be used to allocate scratch memory (equivalent to
    // createExecutionContext in TensorRT 8.x). If set to false (equivalent to
    // createExecutionContextWithoutScratch), the user has to supply scratch memory that they
    // manage themselves.
    // The parameter "allocator" (default nullptr) is a custom memory allocator implementing the
    // ISafeMemAllocator interface. A default allocator is used if left unspecified.
    
  3. Get the input/output tensor attributes from the engine.

    TensorRT 8.x:

    int32_t nb = engine->getNbIOTensors();
    char const* inputName = engine->getIOTensorName(inputIndex);
    char const* outputName = engine->getIOTensorName(outputIndex);
    Dims inputDims = engine->getTensorShape(inputName);
    Dims outputDims = engine->getTensorShape(outputName);
    DataType inputType = engine->getTensorDataType(inputName);
    DataType outputType = engine->getTensorDataType(outputName);

    // Here the user might also need getTensorVectorizedDim, getTensorBytesPerComponent, and
    // getTensorComponentsPerElement to calculate the tensor volume.
    
    TensorRT 10.x:

    int64_t nb;
    ErrorCode code = graph->getNbIOTensors(nb);
    char const* inputName;
    code = graph->getIOTensorName(inputName, inputIndex);
    char const* outputName;
    code = graph->getIOTensorName(outputName, outputIndex);
    TensorDescriptor inputDescriptor;
    code = graph->getIOTensorDescriptor(inputDescriptor, inputName);
    TensorDescriptor outputDescriptor;
    code = graph->getIOTensorDescriptor(outputDescriptor, outputName);

    // In TensorRT 10.x, TensorRT provides a convenience class, TensorDescriptor, to the user.
    // This class contains all the information that users need to construct an IOTensor outside
    // of TensorRT and pass its address to TensorRT later.

    struct TensorDescriptor
    {
        char const* const tensorName;
        Dims const shape;                  // Extent of the tensor.
        Dims const stride;                 // Strides of the tensor.
        DataType const dataType;           // Type of the data in the buffer.
        int64_t const bytesPerComponent;   // Size of the tensor data type in bytes (4 for float and int32, 2 for half, 1 for int8).
        int64_t const componentsPerVector; // Vector length (in scalars) for a vectorized tensor; 1 if the tensor is scalar.
        int64_t const vectorizedDim;       // Dimension index along which the buffer is vectorized; -1 if the tensor is scalar.
        uint64_t const sizeInBytes;        // Size of this tensor in bytes.
        TensorIOMode const ioMode;         // Whether the tensor is an input or output.
        IOTensorProperty const ioProperty; // I/O tensor property for async/sync tensors.
    };

    // The tensor volume may be calculated using these properties.
    
  4. Allocate device/host buffers for the input/output tensors of the engine.

  5. Create an execution context from the engine, with or without scratch memory. If created without scratch memory, the user must also allocate and set device memory for the context (TensorRT 8.x) or scratch memory for the graph (TensorRT 10.x).

    TensorRT 8.x:

    auto context = engine->createExecutionContext();

    // If without scratch memory:
    auto context = engine->createExecutionContextWithoutDeviceMemory();
    size_t size = engine->getDeviceMemorySize();
    void* mem;
    cudaError_t code = cudaMalloc(&mem, size);
    context->setDeviceMemory(mem);
    
    TensorRT 10.x:

    // There is no need to call createExecutionContext; the TensorRT graph carries its own context.

    // If without scratch memory (trtManagedScratch = false when creating the graph):
    size_t size;
    code = graph->getScratchMemorySize(size);
    void* mem;
    cudaError_t cudaStatus = cudaMalloc(&mem, size);
    code = graph->setScratchMemory(mem);
    
  6. Preprocess the input data with the host buffer and copy it to the device buffer.

  7. Set the input/output tensor addresses of the context with the device buffer addresses accordingly.

    TensorRT 8.x:

    bool result = context->setInputTensorAddress(inputName, inDataPtr);
    result = context->setOutputTensorAddress(outputName, outDataPtr);
    
    TensorRT 10.x:

    TypedArray inputData(inDataPtr, inSize);
    code = graph->setInputTensorAddress(inputName, inputData);
    TypedArray outputData(outDataPtr, outSize);
    code = graph->setOutputTensorAddress(outputName, outputData);

    // In TensorRT 10.x, the I/O tensor memory must be strongly typed and wrapped in a TypedArray object.
    
  8. Call enqueueV3 (TensorRT 8.x) or executeAsync (TensorRT 10.x) to start inference.

    TensorRT 8.x:

    result = context->enqueueV3(stream);
    cudaStreamSynchronize(stream);
    
    TensorRT 10.x:

    code = graph->executeAsync(stream);
    code = graph->sync();

    // In TensorRT 10.x, the CUDA stream synchronization is wrapped into the sync() API, which performs additional finalization.
    
  9. Postprocess the output device buffer to retrieve the inference output.
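The comment in step 3 notes that the tensor volume may be calculated from the TensorDescriptor properties. The arithmetic can be sketched as follows (plain Python, not a TensorRT API; the round-up padding of the vectorized dimension is an assumption about vectorized formats, and the descriptor's own sizeInBytes field remains authoritative):

```python
import math

def tensor_size_in_bytes(shape, bytes_per_component, components_per_vector,
                         vectorized_dim):
    """Sketch of the buffer-size calculation implied by TensorDescriptor:
    the vectorized dimension is padded up to a multiple of the vector length."""
    dims = list(shape)
    if vectorized_dim >= 0:
        dims[vectorized_dim] = math.ceil(
            dims[vectorized_dim] / components_per_vector) * components_per_vector
    return math.prod(dims) * bytes_per_component

# Scalar (non-vectorized) float tensor of shape (1, 3, 224, 224):
scalar_bytes = tensor_size_in_bytes((1, 3, 224, 224), 4, 1, -1)

# int8 tensor vectorized along the channel dimension with 32 scalars per
# vector (a CHW32-style format): channels are padded from 3 up to 32.
vectorized_bytes = tensor_size_in_bytes((1, 3, 224, 224), 1, 32, 1)
```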

Removed Safety C++ APIs#

The following removed safety C++ APIs are listed next to their superseding API.

  • safe::ICudaEngine::bindingIsInput() > safe::ICudaEngine::getTensorIOMode()

  • safe::ICudaEngine::getBindingBytesPerComponent() > safe::ICudaEngine::getTensorBytesPerComponent()

  • safe::ICudaEngine::getBindingComponentsPerElement() > safe::ICudaEngine::getTensorComponentsPerElement()

  • safe::ICudaEngine::getBindingDataType() > safe::ICudaEngine::getTensorDataType()

  • safe::ICudaEngine::getBindingDimensions() > safe::ICudaEngine::getTensorShape()

  • safe::ICudaEngine::getBindingIndex() > Name-based methods

  • safe::ICudaEngine::getBindingName() > Name-based methods

  • safe::ICudaEngine::getBindingVectorizedDim() > safe::ICudaEngine::getTensorVectorizedDim()

  • safe::ICudaEngine::getNbBindings() > safe::ICudaEngine::getNbIOTensors()

  • safe::ICudaEngine::getBindingFormat() > safe::ICudaEngine::getTensorFormat()

  • safe::IExecutionContext::enqueueV2() > safe::IExecutionContext::enqueueV3()

  • safe::IExecutionContext::getStrides() > safe::IExecutionContext::getTensorStrides()

Footnotes