NVIDIA TensorRT Documentation

NVIDIA TensorRT is an SDK for optimizing and accelerating deep learning inference on NVIDIA GPUs. It takes trained models from frameworks such as PyTorch and TensorFlow, or models in the ONNX interchange format, and optimizes them for high-performance deployment. It supports mixed precision (FP32/FP16/BF16/FP8/INT8/FP4/INT4), dynamic shapes, and specialized optimizations for transformers and large language models (LLMs).

Quick Start

🆕 New to NVIDIA TensorRT? → Start with Build Your First Engine to build and run your first optimized inference engine in about 10 minutes
📖 Ready for the full workflow menu? → After your first engine, use the Quick Start Guide for PyTorch and ONNX export paths, multiple runtimes, dynamic shapes, and quantization
📦 Install TensorRT → Start with Installation Guide Overview and Installing TensorRT
🐍 Python only (pip)? → Follow Method 1: Python Package Index (pip) (pip install tensorrt)
🛠️ C++ or CLI workflows? → Choose Debian/RPM, tar/zip, or container on Installing TensorRT; run trtexec from the package bin directory
⬆️ Upgrading from 11.0 or earlier? → Refer to What’s New in 11.1.0 below
🔄 Upgrading from TensorRT 10.x? → Use the NVIDIA TensorRT Migration Guide to plan your API and builder changes
🔧 Need help with a specific task? → Jump to the Inference Library Overview for API walkthroughs, dynamic shapes, quantization, and more, or go to the Troubleshooting section
⚡ Optimize inference performance → Best Practices
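
After a pip install, a quick way to confirm that the Python bindings are visible to your interpreter is to query the package version. The snippet below is a minimal sketch rather than an official verification step: the helper name `tensorrt_version` is hypothetical, and it assumes only that the package is importable as `tensorrt` and exposes a `__version__` attribute.

```python
import importlib
import importlib.util


def tensorrt_version():
    """Return the installed TensorRT version string, or None if unavailable.

    Hypothetical helper for illustration; not part of the TensorRT API.
    """
    # find_spec checks whether the package can be located without importing it.
    if importlib.util.find_spec("tensorrt") is None:
        return None
    try:
        trt = importlib.import_module("tensorrt")
    except (ImportError, OSError):
        # The wheel is present but its native libraries failed to load.
        return None
    return getattr(trt, "__version__", None)


if __name__ == "__main__":
    version = tensorrt_version()
    if version is None:
        print("TensorRT Python bindings not found; try: pip install tensorrt")
    else:
        print(f"TensorRT {version} is installed")
```

If the helper returns `None` even though the package is installed, the native libraries probably failed to load, and the Troubleshooting section is the next place to look.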

🆕 What’s New in NVIDIA TensorRT 11.1.0

Release Highlights

CUDA 13.3 dependency upgrade: Updates the CUDA Toolkit baseline across Linux x86-64, Windows x64, and SBSA platforms
Ubuntu 26.04 support: Adds Ubuntu 26.04 LTS to the supported Linux x86-64 and SBSA platform lists alongside the existing Ubuntu 22.04/24.04 packages
Python 3.14 bindings: Extends the Python wheel matrix to Python 3.14 on supported platforms
NVFP4 dual-GEMM fusion for SM121: Fuses the gate and up projection GEMMs in NVFP4 MoE/MLP blocks on NVIDIA DGX Spark (compute capability 12.1)
Global Performance Tuner: Automates trtexec build-route search to explore internal builder knobs, benchmark candidate engines, and optionally validate accuracy before selecting the fastest valid route. Refer to Global Performance Tuning.

View 11.1.0 Release Notes

Previous Releases

Note

For complete version history and detailed changelogs, visit the Release Notes section or the TensorRT GitHub Releases.