Prerequisites#

Before installing TensorRT, ensure your system meets the following requirements. This page is organized by category to help you quickly find the information you need.

Quick Checklist:

✓ NVIDIA GPU (Turing architecture or later)
✓ CUDA Toolkit 12.x or 13.x installed
✓ Appropriate GPU drivers (r535+ on Linux, r537+ on Windows; r580+ on both platforms for CUDA 13.x)
✓ Python 3.10-3.14 recommended (bindings for 3.8-3.9 are available, but the samples are not supported on those versions)
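
As a quick way to check the Python item in the checklist, the following minimal sketch classifies an interpreter version against the ranges listed above. The function name and the returned labels are illustrative choices, not part of TensorRT; versions outside the listed ranges are reported as "not-listed" because this page does not state how they are treated.

```python
import sys


def classify_python(version=None):
    """Classify a (major, minor) version against the checklist ranges.

    With no argument, the running interpreter's version is used.
    The labels returned here are hypothetical names for this sketch.
    """
    major, minor = (version or sys.version_info)[:2]
    if major != 3:
        return "not-listed"
    if 10 <= minor <= 14:
        return "recommended"
    if 8 <= minor <= 9:
        # Bindings are available, but the samples are not supported.
        return "bindings-only"
    return "not-listed"
```

Calling `classify_python()` with no argument checks the interpreter that runs the script, which is usually the one you intend to install the Python bindings into.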

Before You Begin#

Review Release Information

Before installation, familiarize yourself with the NVIDIA TensorRT Release Notes to understand:

New features in this release
Known issues and limitations
Platform-specific considerations
Compatibility changes

Choose Your API

TensorRT provides both C++ and Python APIs:

C++ API - Full functionality, no Python dependency
Python API - Convenient for rapid prototyping and integration
Both - Most users install both (default)

The installation instructions assume you want both APIs. For a C++-only installation, skip the Python-specific packages.

Required Software#

NVIDIA CUDA Toolkit

TensorRT requires the NVIDIA CUDA Toolkit. If not already installed, refer to the NVIDIA CUDA Installation Guide.

Supported CUDA Versions:

CUDA 13.x

CUDA 12.x

Driver Requirements:

Linux: NVIDIA driver r535 or later
Windows: NVIDIA driver r537 or later
CUDA 13.x: NVIDIA driver r580 or later (both platforms)
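
The driver minimums above can be expressed as a small check. This is a sketch under stated assumptions: the helper names are hypothetical, the platform is passed as the string "linux" or "windows", and the driver version is assumed to be a dotted string whose first component is the driver branch (for example "535.104.05"). The optional `query_driver_version` helper reads that string from `nvidia-smi`, so it only works on a machine with an NVIDIA driver installed.

```python
import subprocess

# Minimum driver branches, taken from the list above.
MIN_DRIVER_CUDA12 = {"linux": 535, "windows": 537}
MIN_DRIVER_CUDA13 = 580  # applies to both platforms


def required_driver_branch(platform, cuda_major):
    """Return the minimum driver branch for a platform and CUDA major version."""
    if cuda_major == 13:
        return MIN_DRIVER_CUDA13
    if cuda_major == 12:
        return MIN_DRIVER_CUDA12[platform]
    raise ValueError(f"CUDA {cuda_major}.x is not listed as supported")


def driver_is_sufficient(driver_version, platform, cuda_major):
    """Compare the branch of a version string such as '535.104.05' to the minimum."""
    branch = int(driver_version.strip().split(".")[0])
    return branch >= required_driver_branch(platform, cuda_major)


def query_driver_version():
    """Read the installed driver version with nvidia-smi (first GPU only)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()[0].strip()
```

For example, `driver_is_sufficient(query_driver_version(), "linux", 13)` reports whether the installed Linux driver is new enough for CUDA 13.x.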

For more information, refer to the TensorRT Support Matrix.

Optional Dependencies#

The following libraries are optional and only needed for specific use cases:

cuBLAS (Optional)

cuBLAS is used by only a few specific layers.

When needed: If your model requires cuBLAS-accelerated layers
Installation: Refer to the NVIDIA cuBLAS website

CUDA-Python (Optional)

CUDA-Python enables direct CUDA kernel calls from Python.

When needed: If you use TensorRT Python API with custom CUDA operations
Installation: Refer to the NVIDIA CUDA-Python documentation

NCCL (Optional)

NCCL is required only for the multi-device inference feature.

When needed: When using IDistCollectiveLayer (SM 80+ / Ampere and later) or multi-device attention via IAttention::setNbRanks (SM 100+ / Blackwell and later)
Installation: Refer to the NVIDIA NCCL Installation Guide. The Deep Learning Framework Containers include a compatible NCCL build.
B300 platforms: When using multi-device inference on NVIDIA B300, use NCCL 2.30.x or later to avoid long cold-initialization latency on the first ncclCommInitRank call. Refer to the TensorRT 11.0.0 release notes (Known Issues) and TensorRT 11.1.0 release notes (Fixed Issues) for details.
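
The SM thresholds and the B300 NCCL recommendation above can be sketched as two small checks. The function names are hypothetical, the SM version is assumed to be passed as an integer (compute capability 8.0 becomes 80, 10.0 becomes 100), and the NCCL version is assumed to be a dotted string such as "2.30.3".

```python
def multi_device_features(sm):
    """Return the multi-device features from the list above that an SM version can use."""
    features = []
    if sm >= 80:  # Ampere and later
        features.append("IDistCollectiveLayer")
    if sm >= 100:  # Blackwell and later
        features.append("IAttention::setNbRanks")
    return features


def nccl_meets_b300_recommendation(nccl_version):
    """Check a dotted NCCL version string against the 2.30.x-or-later recommendation."""
    major, minor = (int(part) for part in nccl_version.split(".")[:2])
    return (major, minor) >= (2, 30)
```

If PyTorch is installed, one way to obtain the SM value is `major, minor = torch.cuda.get_device_capability()` followed by `major * 10 + minor`; that dependency is an assumption of this note, not a TensorRT requirement.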

For more information, refer to the TensorRT Support Matrix.

Framework and Model Support#

PyTorch Integration

If you plan to use TensorRT with PyTorch:

Tested with: PyTorch >= 2.0
Compatibility: May work with older versions, but these are not tested
Use case: Examples and integration samples

ONNX Model Support

The ONNX-TensorRT parser supports:

ONNX version: 1.20.0
Opset support: Up to opset 25
Backward compatibility: Official support is provided for opset 9 and above
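
The opset range above lends itself to a simple pre-flight check before handing a model to the parser. The sketch below assumes you already know the model's default-domain opset as an integer; the function name and the returned labels are illustrative only.

```python
# Opset bounds taken from the list above.
MIN_OFFICIAL_OPSET = 9
MAX_OPSET = 25


def opset_status(opset):
    """Classify an ONNX opset number against the parser's stated range."""
    if opset > MAX_OPSET:
        return "too-new"
    if opset < MIN_OFFICIAL_OPSET:
        # Below the officially supported range; may or may not parse.
        return "not-officially-supported"
    return "supported"
```

If the `onnx` Python package is available, the opset can be read from `onnx.load(path).opset_import`, where each entry carries a `domain` and a `version`; the default ONNX domain is the empty string.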

For more information, refer to the TensorRT Support Matrix.

TensorRT Installation Modes#

TensorRT offers three installation modes to suit different deployment scenarios:

Full Installation (Recommended for Development)

Includes: Complete builder and runtime functionality
Use for: Model development, optimization, and deployment
Size: Largest footprint (~2-3 GB)
Capabilities: Build engines, run inference, full API access

Lean Runtime (Recommended for Production)

Includes: Runtime-only functionality
Use for: Production deployment with pre-built engines
Size: Significantly smaller (~200-300 MB)
Capabilities: Run version-compatible engines
Limitations: Cannot build new engines

Dispatch Runtime (Recommended for Minimal Footprint)

Includes: Minimal runtime with lean runtime functionality
Use for: Memory-constrained deployments
Size: Smallest footprint (~100-150 MB)
Capabilities: Run version-compatible engines
Limitations: Cannot build new engines

Tip

Development to Production Workflow:

Use Full Installation during development to build and optimize models
Serialize optimized engines to plan files
Deploy with Lean Runtime or Dispatch Runtime in production
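
The mode descriptions and the workflow above reduce to a short decision rule. The following sketch is only a summary of this page: the function name, parameters, and returned labels are hypothetical and do not correspond to package names or installer options.

```python
def choose_install_mode(builds_engines, minimal_footprint=False):
    """Pick an installation mode from the three described above.

    builds_engines: True if this machine must build or optimize engines.
    minimal_footprint: True if the deployment is memory-constrained.
    """
    if builds_engines:
        # Only the full installation includes the builder.
        return "full"
    return "dispatch" if minimal_footprint else "lean"
```

A development workstation would typically map to `choose_install_mode(True)`, while a production host that only loads pre-built, version-compatible plan files would map to `choose_install_mode(False)` or `choose_install_mode(False, minimal_footprint=True)`.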

Next Steps#

After verifying prerequisites:

Proceed to Installing TensorRT to choose your installation method
Select the installation mode that fits your use case
Follow the step-by-step installation instructions