TensorRT Overview

The core of NVIDIA TensorRT is a C++ library that facilitates high-performance inference on NVIDIA graphics processing units (GPUs). TensorRT takes a trained network, which consists of a network definition and a set of trained parameters, and produces a highly optimized runtime engine that performs inference for that network.

You can describe a TensorRT network using a C++ or Python API, or you can import an existing Caffe, ONNX, or TensorFlow model using one of the provided parsers.

The TensorRT API includes import methods to help you express your trained deep learning models for TensorRT to optimize and run. TensorRT applies graph optimizations and layer fusion, finds the fastest implementation of the model by drawing on a diverse collection of highly optimized kernels, and provides a runtime that you can use to execute the network in an inference context.
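Layer fusion and "finding the fastest implementation" can be hard to picture in the abstract. The following is only a conceptual sketch in plain Python and does not use TensorRT. TensorRT performs these steps on GPU kernels inside its builder. The function names `bias_relu_unfused`, `bias_relu_fused`, and `pick_fastest` are hypothetical. The sketch shows a bias-add layer and a ReLU layer that are fused into a single pass. A selector then times each candidate implementation, checks that all candidates agree on the output, and keeps the fastest one.

```python
import time
from typing import Callable, Dict, List, Optional

Vector = List[float]
Kernel = Callable[[Vector, float], Vector]


def bias_relu_unfused(x: Vector, bias: float) -> Vector:
    # Two separate "layers": each makes a full pass over the data, and the
    # first one materializes an intermediate buffer.
    biased = [v + bias for v in x]          # layer 1: bias add
    return [max(0.0, v) for v in biased]    # layer 2: ReLU


def bias_relu_fused(x: Vector, bias: float) -> Vector:
    # One fused "layer": a single pass with no intermediate buffer.
    return [max(0.0, v + bias) for v in x]


def pick_fastest(candidates: Dict[str, Kernel], x: Vector, bias: float,
                 repeats: int = 20) -> str:
    """Return the name of the fastest candidate that matches the reference output."""
    reference: Optional[Vector] = None
    best_name, best_time = "", float("inf")
    for name, kernel in candidates.items():
        out = kernel(x, bias)
        # A candidate is eligible only if it computes the same result.
        if reference is None:
            reference = out
        elif out != reference:
            raise ValueError(f"candidate {name!r} disagrees with the reference")
        # Time several runs on the sample input.
        start = time.perf_counter()
        for _ in range(repeats):
            kernel(x, bias)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name


if __name__ == "__main__":
    data = [float(i - 500) for i in range(1000)]
    kernels = {"unfused": bias_relu_unfused, "fused": bias_relu_fused}
    print("selected:", pick_fastest(kernels, data, bias=0.5))
```

The timing result depends on the machine, so the selected name can vary from run to run. Measuring candidates on representative input, rather than assuming one implementation always wins, is the idea this sketch is meant to convey.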

TensorRT includes an infrastructure that allows you to leverage the high-speed mixed-precision capabilities of Pascal and Volta GPUs as an optional optimization.
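Mixed precision means that some layers run in a reduced-precision format such as FP16 instead of FP32. This trades numeric precision and range for speed. The sketch below does not use TensorRT. It relies only on the `e` (IEEE 754 half-precision) format code of Python's standard `struct` module to show how values change when they are rounded to FP16. The helper `to_fp16` and the constant `FP16_MAX` are illustrative names introduced here.

```python
import struct


def to_fp16(value: float) -> float:
    """Round a Python float (IEEE 754 double) to half precision and back."""
    # The "e" format code packs an IEEE 754 binary16 value (Python 3.6+).
    # struct raises OverflowError if the value is too large for FP16.
    return struct.unpack("<e", struct.pack("<e", value))[0]


FP16_MAX = 65504.0  # largest finite half-precision value

if __name__ == "__main__":
    for v in [1.0, 0.1, 1.0 / 3.0, 2049.0, FP16_MAX]:
        h = to_fp16(v)
        print(f"{v!r:>20} -> {h!r:<20} abs error {abs(v - h):.3g}")
    try:
        to_fp16(70000.0)
    except OverflowError:
        print("70000.0 is outside the FP16 range")
```

Small integers and the FP16 limit survive the round trip unchanged. Values such as 0.1 or 2049.0 pick up rounding error, and anything much beyond 65504 cannot be represented at all. Reduced precision is therefore offered as an optional optimization rather than applied unconditionally.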

TensorRT for Ubuntu 14.04 is built using gcc 4.8.4.

TensorRT for Ubuntu 16.04 is built using gcc 5.4.0.

TensorRT for Android is built using NDK r13b.

TensorRT for QNX is built using gcc 5.4.0.