TensorRT 10.16.1 Release Notes
These Release Notes apply to x86 Linux and Windows users, ARM-based CPU cores for Server Base System Architecture (SBSA) users on Linux, and JetPack users. This release includes several fixes from the previous TensorRT releases and additional changes.
Announcements
TensorRT 11.0 is coming soon with powerful new capabilities designed to accelerate your AI inference workflows:
Enhanced Developer Experience: Improved ease of use and seamless integration with PyTorch and Hugging Face ecosystems
Optimized for High-Growth Workloads: Stronger performance alignment across edge, automotive, and data center deployments
Modernized API: To streamline development, TensorRT 11.0 will remove legacy APIs including Weakly-typed APIs, Implicit INT8 quantization, IPluginV2, and TREX
Action Required: We recommend migrating early to Strongly Typed Networks, Explicit Quantization, IPluginV3, and Nsight Deep Learning Designer.
JetPack support for Orin iGPUs
This TensorRT release adds support for Orin iGPUs when using the ARM SBSA build. This release will be bundled in an upcoming JetPack 7.x release, but it is also being made available as a separate download for users who want earlier access. Orin DLA is not supported at this time, but support is planned for a future release.
Safety headers now included in standard releases
The TensorRT functional safety headers are now included in all standard TensorRT packages. These headers are intended for use in ISO 26262-compliant functional safety applications. If you are not developing safety-certified software, no action is required; these headers can be safely ignored.
Key Features and Enhancements
Interactive Sample Explorer
A new interactive Sample Explorer provides filtered, searchable access to all TensorRT samples. Browse samples by difficulty, language, or use case to find the right starting point for your project.
Interactive Support Matrix
A new Interactive Support Matrix provides filterable access to TensorRT 10.x compatibility information. Use three explorers for system requirements, hardware capabilities, and feature support across releases.
API Enhancements
API Change Tracking: To view API changes between releases, refer to the TensorRT GitHub repository and use the compare tool.
Breaking ABI Changes
The object files, and therefore the symbols, inside the static library libonnx_proto.a have been merged into the libnvonnxparser_static.a static library. A symlink has been created for backward compatibility. Migrate your build to use the dynamic library libnvonnxparser.so. The static libraries libonnx_proto.a and libnvonnxparser_static.a will be removed in TensorRT 11.0.
The TensorRT Windows library files, with extension *.dll, were previously located under the lib subdirectory within the TensorRT zip package. These files are now located under the bin subdirectory within the TensorRT zip package, which is a more common packaging schema for Windows.
Compatibility
For comprehensive platform compatibility, hardware requirements, and feature availability information, refer to the TensorRT Support Matrix. The support matrix provides detailed information about supported operating systems, CUDA versions, GPU architectures, precision modes, compiler requirements, and ONNX operator support for TensorRT 10.16.1.
Limitations
The high-precision weights used in FP4 double quantization are not refittable.
Python samples do not support Python 3.13. Only the 3.13 Python bindings are currently supported.
Loops with scan outputs (ILoopOutputLayer with the LoopOutput property being either LoopOutput::kCONCATENATE or LoopOutput::kREVERSE) must have the number of iterations set; that is, they must have an ITripLimitLayer with TripLimit::kCOUNT. This requirement has always been present, but is now explicitly enforced instead of quietly having undefined behavior.
ISelectLayer must have data inputs (thenInput and elseInput) of the same datatype.
When implementing a custom layer using the IPluginV3 plugin class where the custom layer has data-dependent shapes (DDS), the size tensors must be of INT64 type only and not INT32 type, as the latter would result in a compilation failure. Related samples have been updated accordingly.
There are no optimized FP8 Convolutions for Group Convolutions; therefore, INT8 is still recommended for ConvNets containing these convolution ops.
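The trip-count requirement for scan outputs can be motivated with a plain-Python analogy (a sketch only, not the TensorRT API): a concatenated scan output holds one entry per iteration, so its size is only defined when the iteration count is known up front.

```python
def scan_concat(body, initial_state, trip_count):
    """Analogue of a loop with a kCONCATENATE scan output: the result
    holds one entry per iteration, so trip_count (the analogue of an
    ITripLimitLayer with TripLimit::kCOUNT) must be known to size it."""
    state = initial_state
    outputs = []
    for _ in range(trip_count):
        state = body(state)
        outputs.append(state)
    return outputs

# A doubling loop body run for 4 iterations.
print(scan_concat(lambda s: s * 2, 1, 4))  # [2, 4, 8, 16]
```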
In some cases, a Shuffle op cannot be transformed into a no-op for performance improvement. For the NCHW32 format, TensorRT takes the third-to-last dimension as the channel dimension. When a Shuffle op such as [N, C, H, 1] -> [N, C, H] is added, the channel dimension changes to N, so the op cannot be transformed into a no-op.
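To make the NCHW32 rule concrete, here is a plain-Python helper (illustrative only, not TensorRT code) that picks the third-to-last dimension as the channel axis and shows how dropping a trailing unit dimension changes which axis is treated as channels:

```python
def nchw32_channel_axis(shape):
    """Return the index of the dimension treated as channels for the
    NCHW32 format: the third-to-last dimension of the shape."""
    if len(shape) < 3:
        raise ValueError("NCHW32 tensors need at least 3 dimensions")
    return len(shape) - 3

# Before the shuffle: [N, C, H, 1] -> channels at index 1 (C).
before = (8, 64, 32, 1)
# After [N, C, H, 1] -> [N, C, H]: the third-to-last dimension is now N,
# so N would be treated as the channel dimension.
after = (8, 64, 32)

print(nchw32_channel_axis(before))  # 1
print(nchw32_channel_axis(after))   # 0
```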
When running an FP32 model in FP16 or BF16 weakly typed mode on Blackwell GPUs, if the FP32 weight values are used by FP16 kernels, TensorRT does not clip the weights to [fp16_lowest, fp16_max] or [bf16_lowest, bf16_max] to avoid overflow to inf values. If you see inf graph outputs on Blackwell GPUs only, check whether any FP32 weights cannot be represented by either FP16 or BF16, and update the weights.
The FP8 Convolutions on GPUs with SM89/90/120/121 do not support kernel sizes larger than 32 (for example, 7x7 convolutions); FP16 or FP32 fallback kernels will be used with suboptimal performance. Therefore, for better performance, do not add FP8 Q/DQ ops before Convolutions with large kernel sizes.
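The FP16/BF16 representability check suggested above can be sketched in plain Python by comparing FP32 weight magnitudes against the largest finite FP16 and BF16 values (the constants are the standard binary16 and bfloat16 maxima; the check deliberately ignores round-to-nearest behavior exactly at the boundary):

```python
# Largest finite values of the half-precision formats.
FP16_MAX = 65504.0                 # IEEE 754 binary16
BF16_MAX = 3.3895313892515355e38   # bfloat16: (2 - 2**-7) * 2**127

def overflows(value, max_finite):
    """True if casting this FP32 value to the narrower type would
    overflow to +/-inf (magnitude above the largest finite value)."""
    return abs(value) > max_finite

weights = [0.02, -1.5, 70000.0, 3.4e38]
fp16_offenders = [w for w in weights if overflows(w, FP16_MAX)]
bf16_offenders = [w for w in weights if overflows(w, BF16_MAX)]
print(fp16_offenders)  # [70000.0, 3.4e38]
print(bf16_offenders)  # [3.4e38]
```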
When building the nonZeroPlugin sample on Windows, you might need to modify the CUDA version specified in the BuildCustomizations paths in the vcxproj file to match the installed version of CUDA.
Deprecated API Lifetime
APIs deprecated in TensorRT 10.16 will be retained until 3/2027.
APIs deprecated in TensorRT 10.15 will be retained until 1/2027.
APIs deprecated in TensorRT 10.14 will be retained until 10/2026.
APIs deprecated in TensorRT 10.13 will be retained until 8/2026.
APIs deprecated in TensorRT 10.12 will be retained until 6/2026.
APIs deprecated in TensorRT 10.11 will be retained until 5/2026.
APIs deprecated in TensorRT 10.10 will be retained until 4/2026.
Deprecated and Removed Features
For a complete list of deprecated C++ APIs, refer to the C++ API Deprecated List.
The TensorRT static libraries are deprecated on Linux starting with TensorRT 10.11. If you are using the static libraries for building your application, migrate to building your application with the shared libraries. The following library files will be removed in TensorRT 11.0.
libnvinfer_static.a
libnvinfer_plugin_static.a
libnvinfer_lean_static.a
libnvinfer_dispatch_static.a
libnvinfer_vc_plugin_static.a
libnvonnxparser_static.a
libonnx_proto.a
Fixed Issues
Valgrind showed Invalid read of size 8 when calling cuMemcpyDtoHAsync_v2 while using CUDA 13.0 on edge Blackwell devices. This issue has been fixed in CUDA 13.1.
The CUDA driver library libcuda.so.1 was required at TensorRT load time on SBSA systems when it was expected to be a runtime dependency. This dependency has been removed and libcuda.so.1 is no longer required at load time.
Known Issues
Functional
The ECCommunicatorAPITests.SetCommunicatorFailsWithoutSupportedLayer and ECCommunicatorAPITests.SetCommunicatorSucceedsWithDistCollective runtime tests report errors under valgrind (host) and compute-sanitizer memcheck (GPU). Host-side memory leaks are caused by NCCL internal allocations during ncclCommInitRank and ncclCommSplit and reproduce on H100 and B100 platforms. GPU-side errors are NCCL kernel probing failures that occur on H100 (SM 90) when the sanitizer intercepts CUDA API errors before NCCL can handle them internally; B100 is not affected on the GPU side.
Inputs to the IRecurrenceLayer must always have the same shape. This means that ONNX models with loops whose recurrence inputs change shapes will be rejected.
On CUDA versions prior to 13.2, the compute-sanitizer initcheck tool may flag false positive Uninitialized __global__ memory read errors when running TensorRT applications on NVIDIA Hopper GPUs. These errors can be safely ignored.
For broadcasting elementwise layers running on DLA with GPU fallback enabled with one NxCxHxW input and one Nx1x1x1 input, there is a known accuracy issue if at least one of the inputs is consumed in kDLA_LINEAR format. It is recommended to explicitly set the input formats of such elementwise layers to different tensor formats.
The inplace_add mini-sample of the quickly_deployable_plugins Python sample may produce incorrect outputs on Windows.
TensorRT may exit if inputs with invalid values are provided to the RoiAlign plugin (ROIAlign_TRT), especially if there is an inconsistency between the indices specified in the batch_indices input and the actual batch size used.
The ONNX specification of the NonMaxSuppression operation requires the iou_threshold parameter to be in the range [0.0, 1.0]. However, TensorRT does not validate the value of the parameter; TensorRT will accept values outside of this range, in which case the engine will continue executing as if the value was capped at either end of this range.
PluginV2 in a loop or conditional scope is not supported. Upgrade to the PluginV3 interface as a workaround. This will impact some TensorRT-LLM models with GEMM plugins in a conditional scope.
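The capping behavior described above for iou_threshold can be modeled in plain Python (a sketch of the described behavior, not TensorRT code): out-of-range values act as if clamped to the nearest end of the valid range.

```python
def effective_iou_threshold(iou_threshold: float) -> float:
    """Model the described behavior: values outside [0.0, 1.0]
    behave as if capped at the nearest end of the range."""
    return min(max(iou_threshold, 0.0), 1.0)

print(effective_iou_threshold(-0.3))  # 0.0
print(effective_iou_threshold(0.5))   # 0.5
print(effective_iou_threshold(7.0))   # 1.0
```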
Performance
EfficientNet/RegNet has a ~5 - 10% performance regression on RTX PRO 6000 Blackwell platform.
A non-zero tilingOptimizationLevel might introduce engine build failures for some networks on L4 GPUs.
Engines built with kREFIT or kREFIT_IDENTICAL have performance regressions compared with non-refit engines where convolution layers are present within a branch or loop and the precision is FP16/INT8. This issue will be addressed in future releases.