NVIDIA TensorRT Documentation

NVIDIA TensorRT is an SDK that facilitates high-performance machine learning inference. It complements training frameworks such as TensorFlow, PyTorch, and MXNet. TensorRT focuses on running an already-trained network quickly and efficiently on NVIDIA hardware.

Quick Start

🆕 What’s New in NVIDIA TensorRT 10.12.0

Latest Release Highlights

  • MXFP8 Quantization Support - Block quantization in which each block of 32 high-precision elements shares one E8M0 scaling factor, for improved model compression

  • Enhanced Debug Tensor Feature - Mark all unfused tensors as debug tensors without preventing fusion, with support for NumPy, string, and raw data formats

  • Distributive Independence Determinism - Guarantee identical outputs across the distributive axis when the inputs are identical, improving reproducibility

  • Weak Typing APIs Deprecated - Weak typing APIs are deprecated in favor of strong typing exclusively; refer to the Strong Typing vs Weak Typing guide for migration steps

  • Refactored Python Samples - New samples with cleaner structure: 1_run_onnx_with_tensorrt and 2_construct_network_with_layer_apis

View 10.12.0 Release Notes
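
As a rough illustration of the MXFP8 highlight above, the following NumPy sketch simulates how one power-of-two scale might be derived for each block of 32 elements. It is not TensorRT code and calls no TensorRT API. The function names `block_scales` and `scale_blocks` are hypothetical, the FP8 E4M3 element format is an assumption, and rounding to the FP8 grid is left out.

```python
import numpy as np

BLOCK = 32         # elements that share one scaling factor (from the highlight above)
E4M3_MAX = 448.0   # largest finite FP8 E4M3 value (assumed element format)
E4M3_EMAX = 8      # exponent of that value: 448 = 1.75 * 2**8


def block_scales(x):
    """Return one power-of-two scale per block of 32 elements (hypothetical helper)."""
    blocks = np.asarray(x).reshape(-1, BLOCK)
    amax = np.abs(blocks).max(axis=1)
    # Substitute 1.0 for all-zero blocks so log2 stays finite.
    safe = np.where(amax > 0, amax, 1.0)
    exponent = np.floor(np.log2(safe)) - E4M3_EMAX
    # An E8M0 scale stores only an exponent; clip to its representable range.
    exponent = np.clip(exponent, -127, 127)
    return np.exp2(exponent)


def scale_blocks(x):
    """Divide each block by its scale and clamp to the assumed FP8 range.

    Rounding to actual FP8 values is omitted in this sketch.
    """
    scales = block_scales(x)
    scaled = np.asarray(x).reshape(-1, BLOCK) / scales[:, None]
    scaled = np.clip(scaled, -E4M3_MAX, E4M3_MAX)
    return scaled, scales


if __name__ == "__main__":
    data = np.linspace(-3.0, 3.0, 64, dtype=np.float32)
    scaled, scales = scale_blocks(data)
    print("per-block scales:", scales)
    print("max |scaled value|:", np.abs(scaled).max())
```

Because each scale is a power of two, dividing by it changes only the exponent of each element, which is why a single 8-bit exponent per block is enough to store it.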

What You’ll Find Here

  • 🚀 Getting Started - Quick start guide, release notes, and platform support matrix

  • 📦 Installing TensorRT - Installation requirements, prerequisites, and step-by-step setup instructions

  • 🏗️ Architecture - TensorRT design overview, optimization capabilities, and how the inference engine works

  • 🔧 Inference Library - C++ and Python APIs, code samples, and advanced features like quantization and dynamic shapes

  • Performance - Best practices for optimization and using trtexec for benchmarking

  • 📚 API - Complete API references for C++, Python, ONNX GraphSurgeon, and Polygraphy tools

  • 📖 Reference - Troubleshooting guides, operator support, command-line tools, and glossary

Previous Releases

📦 Archived Releases

Earlier TensorRT 10.x releases with key highlights:

📦 Legacy Versions

Note

For the complete version history and detailed changelogs, visit the Release Notes section or the TensorRT GitHub Releases page.