NVIDIA TensorRT Documentation#
NVIDIA TensorRT is an SDK that facilitates high-performance machine learning inference. It complements training frameworks such as TensorFlow, PyTorch, and MXNet. TensorRT focuses on running an already-trained network quickly and efficiently on NVIDIA hardware.
Quick Start#
🆕 New to NVIDIA TensorRT? → Start with the Quick Start Guide to build and deploy your first optimized inference engine in 30–60 minutes
⬆️ Upgrading from 10.11 or earlier? → See What’s New in 10.12.0 below
🔧 Need help with a specific task? → Jump to the Installing TensorRT or Troubleshooting section
🆕 What’s New in NVIDIA TensorRT 10.12.0#
Latest Release Highlights
MXFP8 Quantization Support - Block quantization across 32 high-precision elements with E8M0 scaling factor for improved model compression
Enhanced Debug Tensor Feature - Mark all unfused tensors as debug tensors without preventing fusion, with support for NumPy, string, and raw data formats
Distributive Independence Determinism - Guarantees identical outputs across the distributive axis when inputs are identical, improving reproducibility
Weak Typing APIs Deprecated - TensorRT is moving exclusively to strong typing; refer to the Strong Typing vs Weak Typing guide for migration details
Refactored Python Samples - New samples with cleaner structure: `1_run_onnx_with_tensorrt` and `2_construct_network_with_layer_apis`
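To make the MXFP8 highlight above concrete, here is a minimal NumPy sketch of the block-quantization idea: each block of 32 high-precision values shares one power-of-two (E8M0-style) scale chosen so the scaled values fit the FP8 E4M3 range. This is an illustration of the numerics only, not the TensorRT API; the function name and rounding choice are assumptions for the example.

```python
import numpy as np

E4M3_MAX = 448.0   # largest finite FP8 E4M3 value
BLOCK = 32         # MXFP8 block size

def mx_block_scales(x):
    """Compute one power-of-two (E8M0-style) scale per 32-element block.

    Illustrative sketch only: the actual MXFP8 encoding and rounding
    are handled inside TensorRT.
    """
    blocks = x.reshape(-1, BLOCK)
    amax = np.abs(blocks).max(axis=1)
    # Smallest power-of-two scale so that amax / scale fits in E4M3 range.
    exp = np.ceil(np.log2(np.maximum(amax, 1e-38) / E4M3_MAX))
    return 2.0 ** exp

x = np.linspace(-1000.0, 1000.0, 64)       # two blocks of 32 values
scales = mx_block_scales(x)
scaled = x.reshape(-1, BLOCK) / scales[:, None]
assert np.abs(scaled).max() <= E4M3_MAX    # every block now fits E4M3
```

Because E8M0 stores only an exponent, the per-block scale is restricted to powers of two, which keeps the metadata overhead at one byte per 32 elements.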
What You’ll Find Here#
🚀 Getting Started - Quick start guide, release notes, and platform support matrix
📦 Installing TensorRT - Installation requirements, prerequisites, and step-by-step setup instructions
🏗️ Architecture - TensorRT design overview, optimization capabilities, and how the inference engine works
🔧 Inference Library - C++ and Python APIs, code samples, and advanced features like quantization and dynamic shapes
⚡ Performance - Best practices for optimization and using trtexec for benchmarking
📚 API - Complete API references for C++, Python, ONNX GraphSurgeon, and Polygraphy tools
📖 Reference - Troubleshooting guides, operator support, command-line tools, and glossary
Previous Releases#
📦 Archived Releases
Earlier TensorRT 10.x releases with key highlights:
10.11.0 Release Notes - Condition-dependent shapes, large tensor support, static libraries deprecation
10.10.0 Release Notes - Enhanced large tensor handling, Blackwell GPU performance improvements
10.9.0 Release Notes - Same compute capability compatibility, AOT compilable Python plugins
10.8.0 Release Notes - Blackwell GPU support, E2M1 FP4 data type, tiling optimization
10.7.0 Release Notes - Nsight Deep Learning Designer support, engine deserialization API
10.6.0 Release Notes - Quickly Deployable Plugins (QDPs), FP8 MHA on Ada GPUs
10.5.0 Release Notes - Linux SBSA Python wheels, Volta support removed
10.4.0 Release Notes - Ubuntu 24.04 support, LLM build time improvements
10.3.0 Release Notes - Cross-platform engine support (experimental), FP8 convolution on Ada
10.2.0 Release Notes - FP8 convolution support, fine-grained refit control
10.1.0 Release Notes - Advanced weight streaming APIs, enhanced device memory management
10.0.1 Release Notes - Weight streaming, INT4 weight-only quantization, IPluginV3 framework
10.0.0 Early Access Release Notes - Initial TensorRT 10.x preview release
📦 Legacy Versions
Note
For complete version history and detailed changelogs, visit the Release Notes section or the TensorRT GitHub Releases.