TensorRT for RTX Documentation

TensorRT for RTX builds on the proven performance of the NVIDIA TensorRT inference library, and simplifies the deployment of AI models on NVIDIA RTX GPUs across desktops, laptops, and workstations.

TensorRT for RTX is a drop-in replacement for NVIDIA TensorRT in applications targeting NVIDIA RTX GPUs from the Turing through Blackwell generations. It introduces a Just-In-Time (JIT) optimizer in the runtime that compiles optimized inference engines directly on the end user's RTX-accelerated PC in under 30 seconds. This eliminates lengthy pre-compilation steps and enables rapid engine generation, improved application portability, and cutting-edge inference performance.

To support integration into lightweight applications and deployment in memory-constrained environments, TensorRT for RTX has a compact footprint of under 200 MB.

Together, these properties make real-time, responsive AI applications for image processing, speech synthesis, and generative AI practical and performant on consumer-grade devices.

NVIDIA GeForce RTX GPUs have an installed base of more than 100 million devices. If you are building AI apps for desktops, laptops, or workstations that target this large user base on Windows or Linux, TensorRT for RTX is the library of choice. You can integrate it into your apps through its native C++ or Python APIs, or use it as the default execution path through the Windows ML interface.

Relation to Other TensorRT Ecosystem Libraries

TensorRT for RTX optimizes CNN, diffusion, and speech models, defined in ONNX or through the native C++ APIs, to run on NVIDIA RTX GPUs. Unlike the NVIDIA TensorRT inference library, it does not support other NVIDIA GPU platforms, such as data center, edge, or embedded GPUs.

TensorRT for RTX exposes a subset of the TensorRT APIs and shares the same namespace, so existing TensorRT applications for RTX devices can be ported to TensorRT for RTX simply by linking against the new library. Detailed porting instructions are in the Porting section. Note that the TensorRT for RTX APIs may diverge from those of TensorRT in the future.

As mentioned above, TensorRT for RTX is the default execution provider in the Windows ML framework, a preview build of which can be downloaded from the Microsoft Store. TensorRT for RTX is also available as an execution provider in ONNX Runtime.

TensorRT for RTX works with quantized ONNX models exported by TensorRT Model Optimizer or any other third-party quantization library. Data type support is detailed in the Support Matrix.

TensorRT for RTX does not support native deployment of LLMs. However, there are plans to accelerate LLM pipelines with TensorRT for RTX through integration with other frameworks.

Note

TensorRT for RTX framework integrations with NVIDIA TensorRT-LLM, Torch-TensorRT, TensorFlow-TensorRT, and NVIDIA Triton Inference Server are currently unavailable.

TensorRT for RTX Documentation

Throughout the rest of this documentation, TensorRT for RTX is referred to as TensorRT-RTX.

Attention

Make sure to read the Release Notes, which describe the newest features, software enhancements and improvements, and known issues for the TensorRT-RTX release.

The Support Matrix provides an overview of the supported platforms, features, and hardware capabilities of the TensorRT-RTX APIs, parsers, and layers.
The Installing TensorRT-RTX section provides the installation requirements and step-by-step instructions for installing TensorRT-RTX.
The Inference Library section demonstrates how to use the C++ and Python APIs for implementing the most common deep learning layers. It shows how you can take an existing model built with a deep learning framework and build a TensorRT-RTX engine using the provided parsers.
The Performance section explains how to use tensorrt_rtx, a command-line tool for benchmarking TensorRT-RTX, to measure the inference performance of your deep learning models.
The API section enables developers working in C++ and Python development environments, as well as those looking to experiment with TensorRT-RTX, to parse models (for example, from ONNX) and to generate and run TensorRT-RTX engine files.
The Reference section links to the TensorRT-RTX Operators documentation, all cybersecurity disclosures, and licenses.