Is this page helpful?

NVIDIA TensorRT Documentation#

NVIDIA TensorRT is an SDK for optimizing and accelerating deep learning inference on NVIDIA GPUs. It takes trained models from frameworks such as PyTorch, TensorFlow, and ONNX, and optimizes them for high-performance deployment with support for mixed precision (FP32/FP16/BF16/FP8/INT8), dynamic shapes, and specialized optimizations for transformers and large language models (LLMs).

Quick Start#

🆕 New to NVIDIA TensorRT? → Start with the Quick Start Guide to build and deploy your first optimized inference engine in 30–60 minutes
⬆️ Upgrading from 10.16.0 or earlier? → Refer to What’s New in 10.16.1 below
🔧 Need help with a specific task? → Jump to the Installing TensorRT or Troubleshooting section

🆕 What’s New in NVIDIA TensorRT 10.16.1#

Latest Release Highlights

TensorRT 11.0 Coming Soon — New capabilities for PyTorch/Hugging Face integration, modernized APIs, removal of legacy weakly-typed APIs. Migrate early to Strongly Typed Networks, Explicit Quantization, and IPluginV3
JetPack Support for Orin iGPUs — Orin iGPU support via the ARM SBSA build, available as an early-access download ahead of JetPack 7.x
Safety Headers Included — Functional safety headers for ISO 26262-compliant applications are now included in all standard TensorRT packages
Interactive Sample Explorer — Browse all TensorRT samples by difficulty, language, or use case
Interactive Support Matrix — Filterable support matrix with three explorers for system requirements, hardware capabilities, and feature support

View 10.16.1 Release Notes

Previous Releases#

Note

For complete version history and detailed changelogs, visit the Release Notes section or the TensorRT GitHub Releases.