NVIDIA Deep Learning TensorRT Documentation
Last updated August 5, 2024
NVIDIA TensorRT
- Quick Start Guide
- This TensorRT Quick Start Guide is a starting point for developers who want to try out the TensorRT SDK; specifically, it demonstrates how to quickly construct an application to run inference on a TensorRT engine.
- Installation Guide
- This NVIDIA TensorRT 10.3.0 Installation Guide provides the installation requirements, a list of what is included in the TensorRT package, and step-by-step instructions for installing TensorRT.
- Release Notes
- NVIDIA TensorRT is a C++ library that facilitates high-performance inference on NVIDIA GPUs. It is designed to work with the deep learning frameworks commonly used for training. TensorRT focuses specifically on running an already trained network quickly and efficiently on a GPU to generate a result, a process also known as inferencing. These release notes describe the key features, software enhancements and improvements, and known issues for the TensorRT 10.3.0 product package.
- Support Matrix
- These support matrices provide an overview of the supported platforms, features, and hardware capabilities of the TensorRT APIs, parsers, and layers.
Inference Library
- Developer Guide
- This TensorRT Developer Guide demonstrates how to use the C++ and Python APIs to implement the most common deep learning layers. It shows how to take an existing model built with a deep learning framework and build a TensorRT engine from it using the provided parsers. The Developer Guide also provides step-by-step instructions for common user tasks, using either the C++ or the Python API: creating a TensorRT network definition, invoking the TensorRT builder, serializing and deserializing the engine, feeding the engine with data, and performing inference.
- Migration Guide
- This document highlights the TensorRT API modifications. If you are unfamiliar with these changes, refer to our sample code for clarification.
- Sample Support Guide
- This Sample Support Guide provides an overview of all the supported NVIDIA TensorRT 10.3.0 samples included on GitHub and in the product package. The TensorRT samples specifically help in areas such as recommenders, machine comprehension, character recognition, image classification, and object detection.
Optimized Frameworks
- Container Release Notes
- The TensorRT container is an easy-to-use container for TensorRT development. It lets you build, modify, and execute the TensorRT samples. These release notes provide a list of key features, packaged software included in the container, software enhancements and improvements, and known issues. The TensorRT container is released monthly to provide you with the latest NVIDIA deep learning software libraries and GitHub code contributions that have been sent upstream. The libraries and contributions have all been tested, tuned, and optimized.
API Documentation
- C++ API
- This is the API documentation for the NVIDIA TensorRT library. The NVIDIA TensorRT C++ API allows developers to import, calibrate, generate and deploy networks using C++. Networks can be imported directly from ONNX. They may also be created programmatically by instantiating individual layers and setting parameters and weights directly.
- Python API
- The NVIDIA TensorRT Python API enables developers in Python-based development environments, and those looking to experiment with TensorRT, to easily parse models (for example, from ONNX) and to generate and run PLAN files.
- ONNX GraphSurgeon API
- ONNX GraphSurgeon provides a convenient way to create and modify ONNX models.
- Polygraphy API
- Polygraphy is a toolkit designed to assist in running and debugging deep learning models in various frameworks.
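
The Python API entry above mentions parsing an ONNX model and generating a PLAN file. A minimal sketch of that flow follows, assuming the `tensorrt` Python package from the TensorRT 10.x release is installed and an NVIDIA GPU is available. The function names `mib_to_bytes` and `build_plan_from_onnx`, the file paths, and the 1024 MiB workspace default are hypothetical choices made for illustration; they do not come from the TensorRT documentation.

```python
def mib_to_bytes(mib: int) -> int:
    """Convert mebibytes to bytes (helper for the workspace limit)."""
    return mib * 1024 * 1024


def build_plan_from_onnx(onnx_path: str, plan_path: str,
                         workspace_mib: int = 1024) -> None:
    """Parse an ONNX file and write a serialized engine (PLAN file).

    Illustrative only: the default workspace size is an arbitrary value.
    """
    # Imported lazily so this module still loads where TensorRT is absent.
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(0)
    parser = trt.OnnxParser(network, logger)

    # Populate the network definition from the ONNX model bytes.
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            errors = [str(parser.get_error(i))
                      for i in range(parser.num_errors)]
            raise RuntimeError("ONNX parse failed:\n" + "\n".join(errors))

    # Cap the scratch memory the builder may use while selecting kernels.
    config = builder.create_builder_config()
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE,
                                 mib_to_bytes(workspace_mib))

    # Build and serialize in one step; None signals a failed build.
    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        raise RuntimeError("TensorRT engine build failed")

    with open(plan_path, "wb") as f:
        f.write(serialized)
```

A call such as `build_plan_from_onnx("model.onnx", "model.plan")` would write the PLAN file next to the model. Both file names are placeholders.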
References
- Operators Documentation
- In TensorRT, operators represent distinct flavors of mathematical and programmatic operations. The following sections describe every operator that TensorRT supports. The minimum workspace required by TensorRT depends on the operators used by the network. A suggested minimum build-time setting is 16 MB. Regardless of the maximum workspace value provided to the builder, TensorRT will allocate at runtime no more than the workspace it requires.
- PyTorch-Quantization Toolkit User Guide
- PyTorch-Quantization is a toolkit for training and evaluating PyTorch models with simulated quantization. Quantization can be added to the model automatically, or manually, allowing the model to be tuned for accuracy and performance. The quantized model can be exported to ONNX and imported to an upcoming version of TensorRT.
- TensorFlow Quantization Toolkit User Guide
- TensorFlow Quantization Toolkit provides a simple API to quantize a given Keras model. Initially, the network is trained on the target dataset until fully converged. The quantization step consists of inserting Q/DQ nodes in the pretrained network to simulate quantization during training. The network is then retrained for a few epochs to recover accuracy in a step called fine-tuning.
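
The Q/DQ (quantize/dequantize) nodes mentioned above simulate quantization by rounding a value to an integer grid and then mapping it back to floating point. The sketch below shows that arithmetic in plain Python, assuming symmetric signed 8-bit quantization with a single per-tensor scale. The function names and the scale value are hypothetical, and the code does not use or represent the toolkit's own API.

```python
def quantize(x: float, scale: float, qmin: int = -128, qmax: int = 127) -> int:
    """Q step: scale, round to the nearest integer, clamp to the INT8 range."""
    q = round(x / scale)
    return max(qmin, min(qmax, q))


def dequantize(q: int, scale: float) -> float:
    """DQ step: map the integer code back to a floating-point value."""
    return q * scale


def fake_quantize(x: float, scale: float) -> float:
    """Q followed by DQ: the output is a float that carries quantization error."""
    return dequantize(quantize(x, scale), scale)


if __name__ == "__main__":
    scale = 0.5  # assumed per-tensor scale, chosen only for illustration
    for value in (1.3, 100.0, -100.0):
        print(value, "->", fake_quantize(value, scale))
```

Values inside the representable range snap to the nearest multiple of the scale, and values outside it saturate at the clamp limits. Fine-tuning with these nodes in place lets the network adapt to both effects.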
Licenses
- LICENSE AGREEMENT FOR NVIDIA SOFTWARE DEVELOPMENT KITS
- This document is the LICENSE AGREEMENT FOR NVIDIA SOFTWARE DEVELOPMENT KITS for NVIDIA TensorRT. This document contains specific license terms and conditions for NVIDIA TensorRT. By accepting this agreement, you agree to comply with all the terms and conditions applicable to the specific product(s) included herein.
Archives
- Documentation Archives
- This Archives document provides access to previously released NVIDIA TensorRT documentation versions.