NVIDIA TensorRT Documentation
NVIDIA TensorRT is an SDK for optimizing and accelerating deep learning inference on NVIDIA GPUs. It takes models trained in frameworks such as PyTorch and TensorFlow, or exported to the ONNX format, and optimizes them for high-performance deployment, with support for mixed precision (FP32/FP16/BF16/FP8/INT8/FP4/INT4), dynamic shapes, and specialized optimizations for transformers and large language models (LLMs).
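To make the precision list concrete, the sketch below estimates the raw weight storage of a model at each supported precision. It is plain arithmetic (bits per element times parameter count), not a TensorRT API. It assumes nominal bit widths and ignores quantization scale factors, activations, builder workspace, and KV cache, so real engine sizes will differ.

```python
# Nominal storage per element, in bits, for each precision listed above.
# Assumption: per-block scale factors, padding, and packing metadata are ignored.
BITS_PER_ELEMENT = {
    "FP32": 32,
    "FP16": 16,
    "BF16": 16,
    "FP8": 8,
    "INT8": 8,
    "FP4": 4,
    "INT4": 4,
}


def weight_bytes(num_params: int, precision: str) -> int:
    """Raw weight storage in bytes, rounding up to whole bytes for 4-bit formats."""
    bits = BITS_PER_ELEMENT[precision]
    return (num_params * bits + 7) // 8


def footprint_gib(num_params: int) -> dict:
    """Map each precision to its raw weight footprint in GiB."""
    return {p: weight_bytes(num_params, p) / 2**30 for p in BITS_PER_ELEMENT}


if __name__ == "__main__":
    # Hypothetical example: a model with 8 billion parameters.
    for precision, gib in footprint_gib(8_000_000_000).items():
        print(f"{precision:>4}: {gib:6.2f} GiB")
```

Moving from FP16 or BF16 to FP8 or INT8 halves raw weight storage, and FP4 or INT4 halves it again; whether accuracy holds at a given precision depends on the model and the quantization recipe.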
Quick Start
New to NVIDIA TensorRT? → Start with Build Your First Engine to build and run your first optimized inference engine in about 10 minutes
Ready for the full workflow menu? → After your first engine, use the Quick Start Guide for PyTorch and ONNX export paths, multiple runtimes, dynamic shapes, and quantization
Install TensorRT → Start with Installation Guide Overview and Installing TensorRT
Python only (pip)? → Follow Method 1: Python Package Index (pip) (pip install tensorrt)
C++ or CLI workflows? → Choose a Debian/RPM, tar/zip, or container install on Installing TensorRT; run trtexec from the package bin directory
Upgrading from 11.0 or earlier? → Refer to What's New in NVIDIA TensorRT 11.1.0 below
Upgrading from TensorRT 10.x? → Use the NVIDIA TensorRT Migration Guide to plan your API and builder changes
Need help with a specific task? → Jump to the Inference Library Overview for API walkthroughs, dynamic shapes, quantization, and more, or to the Troubleshooting section
Optimize inference performance → Best Practices
What's New in NVIDIA TensorRT 11.1.0
Release Highlights
CUDA 13.3 dependency upgrade: Updates the CUDA Toolkit baseline to 13.3 across Linux x86-64, Windows x64, and SBSA platforms
Ubuntu 26.04 support: Adds Ubuntu 26.04 LTS to the supported Linux x86-64 and SBSA platform lists alongside the existing Ubuntu 22.04/24.04 packages
Python 3.14 bindings: Extends the Python wheel matrix to Python 3.14 on supported platforms
NVFP4 dual-GEMM fusion for SM121: Fuses the gate and up projection GEMMs in NVFP4 MoE/MLP blocks on NVIDIA DGX Spark (compute capability 12.1)
Global Performance Tuner: Automates a trtexec build-route search that explores internal builder knobs, benchmarks candidate engines, and optionally validates accuracy before selecting the fastest valid route. Refer to Global Performance Tuning.
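Fusing the gate and up projections is possible because both GEMMs share the same input: concatenating the two weight matrices lets a single GEMM produce both outputs, which is what typically allows a fused kernel to read the input once and launch once instead of twice. The pure-Python sketch below demonstrates only that equivalence in ordinary integer arithmetic. It does not model NVFP4 quantization or scale factors, and it is not how TensorRT's SM121 kernel is implemented; the helper names are illustrative.

```python
def matmul(a, b):
    """Multiply an (m x k) matrix by a (k x n) matrix, both given as lists of rows."""
    k, n = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(k)) for j in range(n)] for row in a]


def concat_columns(w1, w2):
    """Place two (k x n) weight matrices side by side as one (k x 2n) matrix."""
    return [r1 + r2 for r1, r2 in zip(w1, w2)]


def gate_up_unfused(x, w_gate, w_up):
    """Baseline: two separate GEMMs that each read the same input x."""
    return matmul(x, w_gate), matmul(x, w_up)


def gate_up_fused(x, w_gate, w_up):
    """Fused: one GEMM over the concatenated weights, then split the output columns."""
    n = len(w_gate[0])
    both = matmul(x, concat_columns(w_gate, w_up))  # shape (m x 2n)
    gate = [row[:n] for row in both]
    up = [row[n:] for row in both]
    return gate, up


if __name__ == "__main__":
    x = [[1, 2], [3, 4]]             # 2 tokens, hidden size 2
    w_gate = [[1, 0, 2], [0, 1, 1]]  # hidden size 2 -> intermediate size 3
    w_up = [[2, 1, 0], [1, 0, 3]]
    assert gate_up_fused(x, w_gate, w_up) == gate_up_unfused(x, w_gate, w_up)
    print("fused and unfused gate/up projections match")
```

In a real deployment the weight concatenation would be done once ahead of time rather than on every call.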
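The Global Performance Tuner item describes a final step that keeps the fastest route that also passes the optional accuracy check. The sketch below models only that selection rule under stated assumptions: the Candidate record, its fields, and select_route are hypothetical names for illustration, not the tuner's interface, and nothing here builds or benchmarks engines.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Candidate:
    """Hypothetical record for one benchmarked build route (not a TensorRT type)."""
    route: str                          # label for the builder-knob combination
    latency_ms: float                   # measured latency for this engine
    accuracy_ok: Optional[bool] = None  # None means accuracy was not validated


def select_route(candidates: Sequence[Candidate],
                 require_accuracy: bool = False) -> Optional[Candidate]:
    """Return the fastest valid candidate, or None if none qualifies.

    With require_accuracy=True only candidates that passed validation count;
    otherwise only candidates that explicitly failed validation are dropped.
    """
    def is_valid(c: Candidate) -> bool:
        if require_accuracy:
            return c.accuracy_ok is True
        return c.accuracy_ok is not False

    valid = [c for c in candidates if is_valid(c)]
    return min(valid, key=lambda c: c.latency_ms, default=None)


if __name__ == "__main__":
    results = [
        Candidate("baseline", 4.8, accuracy_ok=True),
        Candidate("route-a", 3.1, accuracy_ok=False),  # fastest, but fails accuracy
        Candidate("route-b", 3.9, accuracy_ok=True),
        Candidate("route-c", 3.5),                     # not validated
    ]
    print(select_route(results).route)
    print(select_route(results, require_accuracy=True).route)
```

The design choice worth noting is that an accuracy failure disqualifies a route outright, so a faster but inaccurate engine can never win.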
Previous Releases
Release 11.0.0 Highlights
Strongly typed networks are now the default: Weak-typing APIs (setPrecision, setDynamicRange, and the per-precision BuilderFlag family) and implicit quantization (IInt8Calibrator) have been removed. Use the NVIDIA TensorRT Migration Guide to plan your upgrade.
IPluginV2 has been removed: The entire IPluginV2 family is gone; migrate custom plugins to IPluginV3 with addPluginV3(). Refer to the V2 → V3 walkthrough for a side-by-side API mapping.
Multi-Device Inference is generally available: The preview flag has been retired, and this release adds AllToAll, Gather, and Scatter collective ops, automatic NCCL library fallback, and a new context-parallel attention sample. Refer to Multi-Device Inference.
Ragged batching for attention: IAttention and IKVCacheUpdateLayer now support packed (kPACKED_NHD) layouts, so variable-length sequences can be concatenated end to end without padding. Refer to Fused Attention.
MoE inference performance: Significant Blackwell (SM10x/SM110) backend improvements close the gap with specialized external MoE kernels; the previous "keep seqLen ≤ 16" guidance no longer applies. Refer to MoE (Mixture of Experts).
Rewritten Best Practices and Benchmarking guide: Reframed as a measure-then-optimize loop with side-by-side ONNX-TRT (trtexec) and Torch-TRT workflows in synchronized tabs covering quantization, dynamic shapes, CUDA graphs, profiling, and Nsight Systems timeline reading. Refer to Performance Benchmarking.
Platform updates: RHEL 10 / Rocky Linux 10 RPM and tar packages, and a new TensorRT 10.x to 11.x migration path with dedicated DriveOS and Jetson/JetPack chapters.
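For readers new to the AllToAll, Gather, and Scatter collectives mentioned in the Multi-Device Inference item, the single-process simulation below shows the data movement each one performs under the conventional MPI/NCCL-style semantics, where each list index stands for a rank (device). It is a model for intuition only, not TensorRT's API; how TensorRT's collective ops map onto tensor axes and devices is covered in Multi-Device Inference.

```python
from typing import Any, List, Optional, Sequence


def all_to_all(send: Sequence[Sequence[Any]]) -> List[List[Any]]:
    """send[src][dst] is the chunk that rank src sends to rank dst.

    Afterwards, rank dst holds one chunk from every rank, ordered by source rank.
    """
    world = len(send)
    return [[send[src][dst] for src in range(world)] for dst in range(world)]


def gather(data: Sequence[Any], root: int = 0) -> List[Optional[List[Any]]]:
    """Collect every rank's data on the root, in rank order; other ranks get None."""
    return [list(data) if rank == root else None for rank in range(len(data))]


def scatter(root_chunks: Sequence[Any], world: int) -> List[Any]:
    """Split the root's buffer so that rank r receives root_chunks[r]."""
    if len(root_chunks) != world:
        raise ValueError("root must provide exactly one chunk per rank")
    return [root_chunks[rank] for rank in range(world)]


if __name__ == "__main__":
    # Two ranks, each holding one chunk destined for each rank.
    print(all_to_all([["a0", "a1"], ["b0", "b1"]]))
    print(gather(["x", "y", "z"], root=1))
    print(scatter([10, 20, 30], world=3))
```

All-to-all effectively transposes the send matrix across ranks, while Gather and Scatter move data toward and away from a single root rank.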
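To illustrate what a packed layout saves in the ragged-batching item, the sketch below concatenates a ragged batch of token IDs end to end, records cumulative offsets marking where each sequence starts, and compares the slot count against padding every sequence to the longest length. It is a generic pure-Python illustration: the offsets array is a common convention for packed attention and an assumption here, not a description of how IAttention or IKVCacheUpdateLayer receive sequence-length metadata, and real attention inputs carry per-token, per-head vectors rather than scalar IDs.

```python
from typing import List, Sequence, Tuple


def pack_sequences(seqs: Sequence[Sequence[int]]) -> Tuple[List[int], List[int]]:
    """Concatenate variable-length sequences end to end, with no padding.

    Returns (packed, offsets): sequence i occupies packed[offsets[i]:offsets[i + 1]].
    """
    packed: List[int] = []
    offsets = [0]
    for seq in seqs:
        packed.extend(seq)
        offsets.append(offsets[-1] + len(seq))
    return packed, offsets


def unpack(packed: Sequence[int], offsets: Sequence[int]) -> List[List[int]]:
    """Recover the original sequences from the packed buffer and offsets."""
    return [list(packed[offsets[i]:offsets[i + 1]]) for i in range(len(offsets) - 1)]


def pad_sequences(seqs: Sequence[Sequence[int]], pad: int = 0) -> List[List[int]]:
    """Padded alternative: extend every sequence to the longest length."""
    max_len = max(len(s) for s in seqs)
    return [list(s) + [pad] * (max_len - len(s)) for s in seqs]


if __name__ == "__main__":
    batch = [[11, 12, 13], [21], [31, 32, 33, 34, 35]]  # token IDs, lengths 3, 1, 5
    packed, offsets = pack_sequences(batch)
    padded = pad_sequences(batch)
    print("packed slots:", len(packed), "offsets:", offsets)
    print("padded slots:", sum(len(row) for row in padded))
    assert unpack(packed, offsets) == batch
```

In this batch, packing stores 9 token slots where padding would need 15, and the gap grows as sequence lengths become more uneven.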
Archived Releases
Earlier TensorRT releases with key highlights:
TensorRT 10.x.x Releases - Release Notes and documentation for TensorRT 10.x.x
Legacy Versions
TensorRT 8.6.1 Release (GitHub and Documentation)
Note
For complete version history and detailed changelogs, visit the Release Notes section or the TensorRT GitHub Releases.