NVIDIA TensorRT Product Family

NVIDIA® TensorRT™ is a high-performance deep learning inference SDK that optimizes trained neural networks for deployment on NVIDIA GPUs. TensorRT transforms models from TensorFlow, PyTorch, ONNX, and other frameworks into optimized runtime engines that deliver low-latency, high-throughput inference across datacenter, cloud, edge, embedded, and consumer platforms.

The TensorRT family includes three products tailored for different deployment scenarios:

- TensorRT (Enterprise): Full-featured inference for datacenter, edge, and embedded systems
- TensorRT-LLM: Specialized toolkit for Large Language Model (LLM) inference optimization
- TensorRT for RTX: Optimized for consumer RTX GPUs in desktops, laptops, and workstations

Choose the TensorRT product that matches your deployment target and use case.

TensorRT (Enterprise)

The comprehensive inference SDK for production AI deployments across datacenter, edge, and embedded platforms.

TensorRT delivers maximum performance for deep learning inference on NVIDIA datacenter GPUs (A100, H100, H200), edge devices (Jetson), and automotive platforms (DRIVE). It provides the complete TensorRT feature set with extensive model support, advanced optimizations, and enterprise-grade tooling.
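As an illustrative sketch of the typical workflow (the file names `model.onnx` and `model.plan` are placeholders, and the snippet assumes a local TensorRT Python installation with GPU access), an ONNX model can be parsed and compiled into a serialized engine with the TensorRT Python API:

```python
import tensorrt as trt

# Logger, builder, and network definition (explicit-batch mode).
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

# Parse a trained model exported to ONNX.
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("failed to parse ONNX model")

# Configure optimizations; FP16 is applied where the GPU supports it.
config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)

# Build and serialize an engine optimized for the current GPU.
engine_bytes = builder.build_serialized_network(network, config)
with open("model.plan", "wb") as f:
    f.write(engine_bytes)
```

The serialized engine is hardware-specific: it is deserialized at inference time by the TensorRT runtime on the same GPU architecture it was built for.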

TensorRT-LLM

Specialized toolkit for optimizing Large Language Model (LLM) inference with state-of-the-art performance on NVIDIA GPUs.

TensorRT-LLM provides a Python API to define LLMs and build TensorRT engines optimized specifically for LLM workloads. It includes pre-built implementations of popular open-source models, multi-GPU and multi-node support, in-flight batching, paged KV caching, and quantization techniques (FP8, INT8, INT4) to maximize LLM serving throughput and minimize latency.

TensorRT-LLM is the recommended solution for deploying LLMs in production at scale across datacenter and cloud environments.
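As a minimal sketch of the Python API mentioned above (the model identifier is a placeholder, and the exact class and parameter names here assume a recent TensorRT-LLM release with its high-level `LLM` entry point):

```python
from tensorrt_llm import LLM, SamplingParams

# Placeholder model name; TensorRT-LLM builds an optimized engine
# for the model under the hood (multi-GPU is configurable).
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

# Sampling settings for generation.
params = SamplingParams(temperature=0.8, max_tokens=64)

# Batched generation; in-flight batching and paged KV caching
# are handled by the runtime.
for output in llm.generate(["What does TensorRT optimize?"], params):
    print(output.outputs[0].text)
```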

TensorRT for RTX

Optimized inference for NVIDIA RTX GPUs in consumer desktops, laptops, and workstations.

TensorRT for RTX targets the 100M+ install base of NVIDIA RTX GPUs (GeForce RTX 20, 30, 40, 50 series and professional RTX GPUs). It delivers a compact runtime (under 200 MB) with Just-In-Time (JIT) optimization that generates inference engines in under 30 seconds directly on end-user devices.

This approach eliminates lengthy pre-compilation, enables rapid engine generation, improves application portability across RTX GPU generations, and provides cutting-edge inference performance for consumer AI applications.