TensorRT Release 2.1

Key Features and Enhancements

This TensorRT release includes the following key features and enhancements.

Custom Layer API
If you want TensorRT to use novel, unique or proprietary layers in the evaluation of certain networks, the Custom Layer API lets you provide a CUDA kernel function that implements the functionality you want.
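
In practice, a custom layer derives from the plugin interface in NvInfer.h and launches its CUDA kernel from the enqueue() method. The following is a minimal CUDA C++ sketch only, patterned on the shipped samplePlugin sample; the kernel myCustomKernel and its identity behavior are hypothetical placeholders, and the exact method signatures should be checked against the installed NvInfer.h.

    #include "NvInfer.h"
    #include <cuda_runtime.h>

    // Hypothetical CUDA kernel providing the custom layer's math
    // (an identity copy here, purely as a placeholder).
    __global__ void myCustomKernel(const float* in, float* out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = in[i];
    }

    class MyCustomLayer : public nvinfer1::IPlugin
    {
    public:
        int getNbOutputs() const override { return 1; }

        nvinfer1::Dims3 getOutputDimensions(int index, const nvinfer1::Dims3* inputs,
                                            int nbInputDims) override
        {
            return inputs[0];                      // same shape as the input
        }

        void configure(const nvinfer1::Dims3* inputs, int nbInputs,
                       const nvinfer1::Dims3* outputs, int nbOutputs,
                       int maxBatchSize) override
        {
            mCount = inputs[0].c * inputs[0].h * inputs[0].w;
        }

        int initialize() override { return 0; }
        void terminate() override {}
        size_t getWorkspaceSize(int maxBatchSize) const override { return 0; }

        int enqueue(int batchSize, const void* const* inputs, void** outputs,
                    void* workspace, cudaStream_t stream) override
        {
            int n = batchSize * mCount;
            myCustomKernel<<<(n + 255) / 256, 256, 0, stream>>>(
                static_cast<const float*>(inputs[0]),
                static_cast<float*>(outputs[0]), n);
            return 0;
        }

        size_t getSerializationSize() override { return 0; }
        void serialize(void* buffer) override {}

    private:
        int mCount{0};                             // elements per batch item
    };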

Installers
There are two ways to install TensorRT 2.1:
  1. Ubuntu deb packages. If you have root access and prefer to use package management to keep dependencies consistent, use the apt-get command and the deb packages.
  2. Tar file based installers. If you do not have root access, or you want to install multiple versions of TensorRT side-by-side for comparison, use the tar file install. The tar file installation uses a target-dependent directory structure so that you can install the TensorRT libraries for multiple architectures and then cross compile.

INT8 support
TensorRT can be used on supported GPUs (such as the Tesla P4 and P40) to execute networks using INT8 rather than FP32 precision, which can deliver significant performance improvements.
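
In the builder API, INT8 is requested before the engine is built. The following is a minimal sketch only; it assumes a network definition has already been populated (for example, by the Caffe parser), the batch and workspace sizes are arbitrary example values, and buildInt8Engine is an illustrative wrapper rather than a TensorRT API.

    #include "NvInfer.h"

    // Sketch: build an INT8 engine from an already-populated network definition.
    nvinfer1::ICudaEngine* buildInt8Engine(nvinfer1::IBuilder* builder,
                                           nvinfer1::INetworkDefinition* network,
                                           nvinfer1::IInt8Calibrator* calibrator)
    {
        builder->setMaxBatchSize(32);            // example value
        builder->setMaxWorkspaceSize(1 << 25);   // example value: 32 MiB of scratch space
        builder->setInt8Mode(true);              // request INT8 kernels
        builder->setInt8Calibrator(calibrator);  // supplies per-tensor dynamic ranges
        return builder->buildCudaEngine(*network);
    }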

Recurrent Neural Network
LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit) are two popular and powerful variants of the recurrent neural network (RNN) cell. Recurrent neural networks are designed to work with sequences, such as sequences of characters, words, sounds, or images. TensorRT 2.1 provides implementations of the LSTM, GRU, and original RNN layers.
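
These layers are added through the network definition API. The following is a minimal sketch of adding an LSTM, patterned on the shipped sampleCharRNN sample; addLstm is an illustrative wrapper, the layer sizes are arbitrary example values, and the required layout of the weights and bias arrays is described in NvInfer.h.

    #include "NvInfer.h"

    // Sketch: add a two-layer unidirectional LSTM to a network definition.
    nvinfer1::ILayer* addLstm(nvinfer1::INetworkDefinition* network,
                              nvinfer1::ITensor& input,
                              nvinfer1::Weights weights, nvinfer1::Weights bias)
    {
        const int layerCount = 2;    // example values
        const int hiddenSize = 512;
        const int maxSeqLen  = 32;
        return network->addRNN(input, layerCount, hiddenSize, maxSeqLen,
                               nvinfer1::RNNOperation::kLSTM,   // or kGRU, kRELU, kTANH
                               nvinfer1::RNNInputMode::kLINEAR,
                               nvinfer1::RNNDirection::kUNIDIRECTION,
                               weights, bias);
    }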

Using TensorRT 2.1

Ensure you are familiar with the following notes when using this release.
  • Running networks in FP16 or INT8 may not work correctly on platforms without hardware support for the appropriate reduced precision instructions.
  • GTX 750 and K1200 users will need to upgrade to CUDA 8 to use TensorRT.
  • If you have previously installed TensorRT 2.0 EA or TensorRT 2.1 RC and you install TensorRT 2.1, you may find that the old meta package is still installed. It can be safely removed with the apt-get command.
  • Debian packages are supplied in the form of local repositories. Once you have installed TensorRT, you can safely remove the TensorRT local repository Debian package.
  • The implementation of deconvolution is now deterministic. To ensure determinism, the new algorithm requires more workspace; the workspace limit can be raised as shown in the sketch after this list.
  • FP16 performance was significantly improved for batch size = 1. The new algorithm is sometimes slower for batch sizes greater than one.
  • Calibration for INT8 does not require labeled data. SampleINT8 uses labels only to compare the accuracy of INT8 inference with the accuracy of FP32 inference.
  • Running with larger batch sizes gives higher overall throughput but uses more memory. When trying TensorRT out on GPUs with smaller memory, be aware that some of the samples may not work with batch sizes of 128.
  • The included Caffe parser library does not currently understand the NVIDIA/Caffe format for batch normalization. The BVLC/Caffe batch normalization format is parsed correctly.
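
As referenced in the deconvolution note above, the builder's workspace limit controls how much scratch memory TensorRT may use per layer. A one-line sketch; the 256 MiB figure is an arbitrary example rather than a recommendation, and 'builder' is the same IBuilder as in the INT8 sketch above.

    builder->setMaxWorkspaceSize(256 << 20);   // allow up to 256 MiB of per-layer scratch space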

Deprecated Features

The parameterized calibration technique introduced in the 2.0 EA pre-release has been replaced by the new entropy calibration mechanism.
  • The legacy calibrator class, IInt8LegacyCalibrator, is deprecated.
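
New code should implement the entropy calibrator interface instead. The following is a minimal sketch only; getNextDeviceBatch() is a hypothetical helper that would return a device pointer to the next preprocessed batch, and a single input binding is assumed. Note that, consistent with the calibration note above, no labels are involved.

    #include "NvInfer.h"
    #include <cstddef>

    // Sketch: an entropy calibrator that feeds unlabeled calibration batches.
    class MyEntropyCalibrator : public nvinfer1::IInt8EntropyCalibrator
    {
    public:
        int getBatchSize() const override { return 32; }   // example value

        bool getBatch(void* bindings[], const char* names[], int nbBindings) override
        {
            void* deviceBatch = getNextDeviceBatch();      // hypothetical helper
            if (!deviceBatch)
                return false;                              // calibration set exhausted
            bindings[0] = deviceBatch;                     // single input binding assumed
            return true;
        }

        // No calibration cache in this sketch; returning nullptr means
        // calibration runs from scratch every time the engine is built.
        const void* readCalibrationCache(std::size_t& length) override
        {
            length = 0;
            return nullptr;
        }
        void writeCalibrationCache(const void* cache, std::size_t length) override {}

    private:
        void* getNextDeviceBatch();                        // hypothetical batch source
    };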

Known Issues

  • When using reduced precision, either INT8 or FP16, on platforms with hardware support for those types, pooling with window sizes other than 1, 2, 3, 5, or 7 will fail.
  • When using MAX_AVERAGE_BLEND or AVERAGE pooling in INT8 with a channel count that is not a multiple of 4, TensorRT may generate incorrect results.
  • When downloading the Faster R-CNN data on Jetson TX1, users may see the following error:
    ERROR: cannot verify dl.dropboxusercontent.com's certificate, issued by 'CN=DigiCert SHA2 High Assurance Server CA,OU=www.digicert.com,O=DigiCert Inc,C=US':
      Unable to locally verify the issuer's authority.
    To connect to dl.dropboxusercontent.com insecurely, use `--no-check-certificate`.

    Adding the --no-check-certificate flag should resolve the issue.