TensorRT 10.16.1 Release Notes#

These Release Notes apply to x86 Linux and Windows users, ARM-based CPU cores for Server Base System Architecture (SBSA) users on Linux, and JetPack users. This release includes several fixes from the previous TensorRT releases and additional changes.

Announcements#

TensorRT 11.0 is coming soon with powerful new capabilities designed to accelerate your AI inference workflows:

  • Enhanced Developer Experience: Improved ease of use and seamless integration with PyTorch and Hugging Face ecosystems

  • Optimized for High-Growth Workloads: Stronger performance alignment across edge, automotive, and data center deployments

  • Modernized API: To streamline development, TensorRT 11.0 will remove legacy APIs including Weakly-typed APIs, Implicit INT8 quantization, IPluginV2, and TREX

    Action Required: We recommend migrating early to strongly typed networks, explicit quantization, IPluginV3, and Nsight Deep Learning Designer.

JetPack support for Orin iGPUs

This TensorRT release adds support for the Orin iGPU when using the ARM SBSA build. This release will be bundled in an upcoming JetPack 7.x release, but it is being made available as a separate download for users who want earlier access. Orin DLA is not supported at this time, but support is planned for a future release.

Safety headers now included in standard releases

The TensorRT functional safety headers are now included in all standard TensorRT packages. These headers are intended for use in ISO 26262-compliant functional safety applications. If you are not developing safety-certified software, no action is required: these headers can be safely ignored.

Key Features and Enhancements#

Interactive Sample Explorer

A new interactive Sample Explorer provides filtered, searchable access to all TensorRT samples. Browse samples by difficulty, language, or use case to find the right starting point for your project.

Interactive Support Matrix

A new Interactive Support Matrix provides filterable access to TensorRT 10.x compatibility information. Three explorers cover system requirements, hardware capabilities, and feature support across releases.

API Enhancements

API Change Tracking: To view API changes between releases, refer to the TensorRT GitHub repository and use the compare tool.

Breaking ABI Changes#

  • The object files, and therefore the symbols, inside the static library libonnx_proto.a have been merged into the libnvonnxparser_static.a static library. A symlink has been created for backward compatibility. Migrate your build to use the dynamic library libnvonnxparser.so; both libonnx_proto.a and libnvonnxparser_static.a will be removed in TensorRT 11.0.

  • The TensorRT Windows library files, with extension *.dll, were previously located under the lib subdirectory within the TensorRT zip package. These files are now located under the bin subdirectory within the TensorRT zip package, which is a more common packaging schema for Windows.
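The static-to-shared parser migration above can be sketched as a build-configuration change. This is a minimal, hypothetical CMake fragment: the target name my_app and the install path /opt/tensorrt are placeholders for your own project, not values shipped by TensorRT.

```cmake
# Before (deprecated): linking the static ONNX parser archives, which
# will be removed in TensorRT 11.0.
# target_link_libraries(my_app PRIVATE nvonnxparser_static onnx_proto)

# After: link the shared library libnvonnxparser.so instead.
# Adjust the directory to wherever your TensorRT package is extracted.
target_link_directories(my_app PRIVATE /opt/tensorrt/lib)
target_link_libraries(my_app PRIVATE nvonnxparser)
```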

Compatibility#

For comprehensive platform compatibility, hardware requirements, and feature availability information, refer to the TensorRT Support Matrix. The support matrix provides detailed information about supported operating systems, CUDA versions, GPU architectures, precision modes, compiler requirements, and ONNX operator support for TensorRT 10.16.1.

Limitations#

  • The high-precision weights used in FP4 double quantization are not refittable.

  • Python 3.13 bindings are provided, but the Python samples do not yet support Python 3.13.

  • Loops with scan outputs (ILoopOutputLayer with the LoopOutput property set to either LoopOutput::kCONCATENATE or LoopOutput::kREVERSE) must have the number of iterations set; that is, they must have an ITripLimitLayer with TripLimit::kCOUNT. This requirement has always existed, but it is now explicitly enforced instead of silently resulting in undefined behavior.

  • ISelectLayer must have data inputs (thenInput and elseInput) of the same datatype.

  • When implementing a custom layer using the IPluginV3 plugin class where the custom layer has data-dependent shapes (DDS), the size tensors must be of type INT64, not INT32; using INT32 results in a compilation failure. Related samples have been updated accordingly.

  • There are no optimized FP8 kernels for group convolutions; therefore, INT8 is still recommended for ConvNets containing these convolution ops.

  • In some cases, a Shuffle op cannot be transformed into a no-op for a performance improvement. For the NCHW32 format, TensorRT takes the third-to-last dimension as the channel dimension. When a Shuffle op such as [N, C, H, 1] -> [N, C, H] is added, the channel dimension changes to N, so the op cannot be transformed into a no-op.

  • When running an FP32 model in FP16 or BF16 WeaklyTyped mode on Blackwell GPUs, if the FP32 weight values are used by FP16 kernels, TensorRT does not clip the weights to [fp16_lowest, fp16_max] or [bf16_lowest, bf16_max], so overflow to inf values can occur. If you see inf graph outputs only on Blackwell GPUs, check whether any FP32 weights cannot be represented in either FP16 or BF16, and update those weights.

  • The FP8 Convolutions on GPUs with SM89/90/120/121 do not support kernel sizes larger than 32 (for example, 7x7 convolutions); FP16 or FP32 fallback kernels will be used with suboptimal performance. Therefore, do not add FP8 Q/DQ ops before Convolutions with large kernel sizes for better performance.

  • When building the nonZeroPlugin sample on Windows, you might need to modify the CUDA version specified in the BuildCustomizations paths in the vcxproj file to match the installed version of CUDA.
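The FP16/BF16 overflow check suggested in the weakly typed limitation above can be screened for ahead of time. The following is a pure-Python sketch (the function name and weight values are our own, not a TensorRT API); a real check would iterate over the model's actual FP32 weight tensors.

```python
import math

# Largest finite magnitudes representable in each narrow format.
FP16_MAX = 65504.0                  # float16 max finite value (0x7BFF)
BF16_MAX = 3.3895313892515355e38    # bfloat16 max finite value (0x7F7F0000)

def unrepresentable(weights, fmax):
    """Return weights whose magnitude exceeds the format's largest
    finite value; casting such weights for FP16/BF16 kernels would
    overflow (typically to inf), matching the behavior described above."""
    return [w for w in weights if math.isfinite(w) and abs(w) > fmax]

weights = [0.5, -3.2e4, 7.0e4, 1.0e30]   # hypothetical FP32 weight values
fp16_bad = unrepresentable(weights, FP16_MAX)   # [70000.0, 1e+30]
bf16_bad = unrepresentable(weights, BF16_MAX)   # [] -- all fit in BF16 range
```

Note that BF16 keeps the FP32 exponent range, so weights that overflow FP16 often remain representable in BF16; switching such a model to BF16 weakly typed mode is one way to avoid the inf outputs.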

Deprecated API Lifetime#

  • APIs deprecated in TensorRT 10.16 will be retained until 3/2027.

  • APIs deprecated in TensorRT 10.15 will be retained until 1/2027.

  • APIs deprecated in TensorRT 10.14 will be retained until 10/2026.

  • APIs deprecated in TensorRT 10.13 will be retained until 8/2026.

  • APIs deprecated in TensorRT 10.12 will be retained until 6/2026.

  • APIs deprecated in TensorRT 10.11 will be retained until 5/2026.

  • APIs deprecated in TensorRT 10.10 will be retained until 4/2026.

Deprecated and Removed Features#

  • For a complete list of deprecated C++ APIs, refer to the C++ API Deprecated List.

  • The TensorRT static libraries are deprecated on Linux starting with TensorRT 10.11. If you are using the static libraries for building your application, migrate to building your application with the shared libraries. The following library files will be removed in TensorRT 11.0.

    • libnvinfer_static.a

    • libnvinfer_plugin_static.a

    • libnvinfer_lean_static.a

    • libnvinfer_dispatch_static.a

    • libnvinfer_vc_plugin_static.a

    • libnvonnxparser_static.a

    • libonnx_proto.a

Fixed Issues#

  • Valgrind reported an "Invalid read of size 8" error when calling cuMemcpyDtoHAsync_v2 while using CUDA 13.0 on edge Blackwell devices. This issue has been fixed in CUDA 13.1.

  • The CUDA driver library libcuda.so.1 was required at TensorRT load time on SBSA systems when it was expected to be a runtime dependency. This dependency has been removed and libcuda.so.1 is no longer required at load time.

Known Issues#

Functional

  • The ECCommunicatorAPITests.SetCommunicatorFailsWithoutSupportedLayer and ECCommunicatorAPITests.SetCommunicatorSucceedsWithDistCollective runtime tests report errors under valgrind (host) and compute-sanitizer memcheck (GPU). Host-side memory leaks are caused by NCCL internal allocations during ncclCommInitRank and ncclCommSplit and reproduce on H100 and B100 platforms. GPU-side errors are NCCL kernel probing failures that occur on H100 (SM 90) when the sanitizer intercepts CUDA API errors before NCCL can handle them internally; B100 is not affected on the GPU side.

  • Inputs to the IRecurrenceLayer must always have the same shape; ONNX models with loops whose recurrence inputs change shape will be rejected.

  • On CUDA versions prior to 13.2, the compute-sanitizer initcheck tool may flag false positive Uninitialized __global__ memory read errors when running TensorRT applications on NVIDIA Hopper GPUs. These errors can be safely ignored.

  • For broadcasting elementwise layers running on DLA with GPU fallback enabled, with one NxCxHxW input and one Nx1x1x1 input, there is a known accuracy issue if at least one of the inputs is consumed in kDLA_LINEAR format. As a workaround, explicitly set the input formats of such elementwise layers to different tensor formats.

  • The inplace_add mini-sample of the quickly_deployable_plugins Python sample may produce incorrect outputs on Windows.

  • TensorRT may exit if inputs with invalid values are provided to the RoiAlign plugin (ROIAlign_TRT), especially if the indices specified in the batch_indices input are inconsistent with the actual batch size used.

  • The ONNX specification of the NonMaxSuppression operation requires the iou_threshold parameter to be in the range [0.0, 1.0]. However, TensorRT does not validate this parameter and will accept values outside of this range; in that case, the engine continues executing as if the value were capped at the nearest end of the range.

  • PluginV2 in a loop or conditional scope is not supported; upgrade to the PluginV3 interface as a workaround. This impacts some TensorRT-LLM models with GEMM plugins in a conditional scope.
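The effective iou_threshold behavior described in the NonMaxSuppression issue above amounts to a clamp. The following is a pure-Python illustration of that semantics (the function name is ours, not a TensorRT API):

```python
def effective_iou_threshold(iou_threshold: float) -> float:
    """Mimic the engine behavior described above: out-of-range values
    behave as if capped at the nearest end of [0.0, 1.0]."""
    return min(max(iou_threshold, 0.0), 1.0)

# In-range values pass through unchanged; out-of-range values are capped.
assert effective_iou_threshold(0.45) == 0.45
assert effective_iou_threshold(-2.0) == 0.0
assert effective_iou_threshold(7.5) == 1.0
```

Because no error is raised, validating iou_threshold in your own export or preprocessing code is the only way to catch out-of-range values before engine execution.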

Performance

  • EfficientNet/RegNet models show a ~5-10% performance regression on the RTX PRO 6000 Blackwell platform.

  • A non-zero tilingOptimizationLevel might introduce engine build failures for some networks on L4 GPUs.

  • Engines built with kREFIT or kREFIT_IDENTICAL have performance regressions compared with non-refittable engines when convolution layers are present within a branch or loop and the precision is FP16/INT8. This issue will be addressed in future releases.