📦 Archived Documentation – Reference Only: This documentation is retained for customers using legacy product versions. It is no longer actively maintained, validated, or updated, and should not be relied upon for current product capabilities, security guidance, or operational decisions. For supported and up-to-date documentation, visit the NVIDIA Docs Hub: NVIDIA TensorRT.
NVIDIA TensorRT


Advanced Topics

This section covers advanced TensorRT features and configuration options.

  • Version Compatibility
    • Manually Loading the Runtime
    • Loading from Storage
    • Using Version Compatibility with the ONNX Parser
    • NVIDIA Ampere GPU Architecture (and Later) Compatibility Level
    • Same Compute Capability Compatibility Level
  • Refitting an Engine
    • Weight-Stripping
    • Refitting a Weight-Stripped Engine Directly from ONNX
    • Weight-Stripping Work with Lean Runtime
    • Fine-Grained Refit Build
    • Stripping Weights with Fine-Grained Refit Build
  • Algorithm Selection and Reproducible Builds
    • Strongly Typed Networks
    • Reduced Precision in Weakly-Typed Networks
    • Control of Computational Precision
  • I/O Formats
    • Sparsity
    • Empty Tensors
    • Reusing Input Buffers
  • Engine Inspector
    • Optimizer Callbacks
    • Preview Features
    • Debug Tensors
  • Weight Streaming
    • Cross-Platform Compatibility
  • Tiling Optimization
    • Multi-Device Inference (Preview Feature)



Copyright © 2021-2026, NVIDIA Corporation.

Last updated on May 23, 2026.