TensorRT Release 3.x.x

TensorRT Release 3.0.4

This TensorRT 3.0.4 General Availability release is a minor release and includes some improvements and fixes compared to the previously released TensorRT 3.0.2.

Key Features and Enhancements

This TensorRT release includes the following key features and enhancements.

  • Fixed an issue with INT8 deconvolution bias. If you have seen an issue with INT8 deconvolution accuracy, especially compared to TensorRT 2.1, then this fix should resolve the issue.
  • Fixed an accuracy issue in FP16 mode for NVCaffe models.

Using TensorRT 3.0.4

Ensure you are familiar with the following notes when using this release.
  • The UFF converter script is packaged only for x86 users. If you are not an x86 user, and you want to convert TensorFlow models into UFF, you need to obtain the conversion script from the x86 package of TensorRT.

TensorRT Release 3.0.3

This TensorRT 3.0.3 General Availability release is a minor release and includes some improvements and fixes compared to the previously released TensorRT 3.0.2. This release is for AArch64 only.

Key Features and Enhancements

This TensorRT release includes the following key features and enhancements.

  • Added support for Xavier

Using TensorRT 3.0.3

Ensure you are familiar with the following notes when using this release.
  • When building the samples in this release, it is necessary to specify CUDA_INSTALL_DIR as an argument to the Makefile.
  • This release does not support TensorRT Python bindings.

Known Issues

  • When building the samples on aarch64 natively, there is an issue in the Makefile.config file that requires you to provide an additional option to make, namely CUDA_LIBDIR.
  • The infer_caffe_static test fails on D5L Parker dGPU. This is a regression from the previous release.
  • QNX has known performance issues with the mmap() and malloc() operating system memory allocation routines. These issues can reduce TensorRT performance by up to 10x.

TensorRT Release 3.0.2

This TensorRT 3.0.2 General Availability release is a minor release and includes some improvements and fixes compared to the previously released TensorRT 3.0.1.

Key Features and Enhancements

This TensorRT release includes the following key features and enhancements.

  • Fixed a bug in one of the INT8 deconvolution kernels that was generating incorrect results. This fixes an accuracy regression from TensorRT 2.1 for networks that use deconvolutions.
  • Fixed a bug where the builder would report an out-of-memory error when compiling a low-precision network if a low-precision version of the kernel could not be found. The builder now correctly falls back to a higher precision version of the kernel.
  • Fixed a bug where the existence of some low-precision kernels was being incorrectly reported to the builder.

Using TensorRT 3.0.2

Ensure you are familiar with the following notes when using this release.
  • When working with large networks and large batch sizes on the Jetson TX1, you may see failures that are the result of CUDA error 4. This error generally means a CUDA kernel failed to execute properly, but it can also mean that the kernel timed out. Because the CPU and GPU share memory on the Jetson TX1, reducing the memory used by the CPU can help. If you are not using the graphical display on L4T, you can stop the X11 server to free up CPU and GPU memory. This can be done using:
    $ sudo systemctl stop lightdm.service

Known Issues

  • INT8 deconvolutions with biases have the bias scaled incorrectly. U-Net based segmentation networks typically have non-zero biases and may be affected.
  • For TensorRT Android 32-bit, if your memory usage is high, then you may see TensorRT failures. The issue occurs when a CUDA-allocated buffer address is at or above 0x80000000, and it is difficult to know the exact memory usage after which this issue is hit.
  • If you are installing TensorRT from a tar package (instead of using the .deb packages and apt-get), you will need to update the custom_plugins example to point to the location that the tar package was installed into. For example, in the <PYTHON_INSTALL_PATH>/tensorrt/examples/custom_layers/tensorrtplugins/setup.py file change the following:
    • Change TENSORRT_INC_DIR to point to the <TAR_INSTALL_ROOT>/include directory.
    • Change TENSORRT_LIB_DIR to point to the <TAR_INSTALL_ROOT>/lib directory.
  • If you were previously using the machine learning Debian repository, then it will conflict with the version of libcudnn7 that is contained within the local repository for TensorRT. The following commands downgrade libcudnn7 to the CUDA 9.0 version, which is supported by TensorRT, and hold the package at that version.
    sudo apt-get install libcudnn7=7.0.5.15-1+cuda9.0 libcudnn7-dev=7.0.5.15-1+cuda9.0
    sudo apt-mark hold libcudnn7 libcudnn7-dev
    
    If you would like to later upgrade libcudnn7 to the latest version, then you can use the following commands to remove the hold.
    sudo apt-mark unhold libcudnn7 libcudnn7-dev
    sudo apt-get dist-upgrade
    

TensorRT Release 3.0.1

This TensorRT 3.0.1 General Availability release includes several enhancements and improvements compared to the previously released TensorRT 2.1.

Key Features and Enhancements

This TensorRT release includes the following key features and enhancements.

NvCaffeParser
NVCaffe 0.16 is now supported.

New deep learning layers or algorithms
  • The TensorRT deconvolution layer previously did not support non-zero padding, or stride values that were distinct from kernel size. These restrictions have now been lifted.
  • The TensorRT deconvolution layer now supports groups.
  • Non-determinism in the deconvolution layer implementation has been eliminated.
  • The TensorRT convolution layer API now supports dilated convolutions.
  • The TensorRT API now supports these new layers (but they are not supported via the NvCaffeParser):
    • unary
    • shuffle
    • padding
  • The Elementwise (eltwise) layer now supports broadcasting of input dimensions.
  • The Flatten layer flattens the input while maintaining the batch_size. This layer was added in the UFF converter and NvUffParser.
  • The Squeeze layer removes dimensions of size 1 from the shape of a tensor. This layer was added in the UFF converter and NvUffParser.

Universal Framework Format 0.2
UFF format is designed to encapsulate trained neural networks so that they can be parsed by TensorRT. It stores the information about a neural network that is needed to create an inference engine based on that network.

Performance
  • Performance regressions seen from v2.1 to 3.0.1 Release Candidate for INT8 and FP16 are now fixed.
    • The INT8 regression in LRN that impacted networks like GoogleNet and AlexNet is now fixed.
    • The FP16 regression that impacted networks like AlexNet and ResNet-50 is now fixed.
  • The performance of the Xception network has improved; for example, it is more than 3 times faster at batch size 8 on Tesla P4.
  • Changed how the CPU synchronizes with the GPU in order to reduce the overall load on the CPU when running inference with TensorRT.
  • The deconvolution layer implementation included with TensorRT was, in some circumstances, using significantly more memory and had lower performance than the implementation provided by the cuDNN library. This has now been fixed.
  • MAX_TENSOR_SIZE changed from (1<<30) to ((1<<31)-1). This change enables the user to run larger batch sizes for networks with large input images.

Samples
  • All Python examples now import TensorRT after the appropriate framework is imported. For example, the tf_to_trt.py example imports TensorFlow before importing TensorRT. This is done to avoid cuDNN version conflict issues; a minimal import-order sketch follows this list.
  • The tf_to_trt and pytorch_to_trt samples shipped with the TensorRT 3.0 Release Candidate included network models that were improperly trained with the MNIST dataset, resulting in poor classification accuracy. This version has new models that have been properly trained with the MNIST dataset to provide better classification accuracy.
  • The pytorch_to_trt sample originally showed low accuracy with MNIST; the data and training parameters have been modified to address this.
  • The giexec command line wrapper in earlier versions would fail if users specified a workspace size >= 2048 MB. This issue is now fixed.
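
The following is a minimal sketch of the import order described in the samples note above; it assumes TensorFlow, the UFF converter, and the TensorRT Python bindings are all installed.
    # Import the framework (and the UFF converter) before TensorRT to avoid
    # cuDNN and protobuf version conflicts.
    import tensorflow as tf
    import uff
    import tensorrt as trt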

Functionality
The AverageCountExcludesPadding attribute has been added to the pooling layer to control whether to use inclusive or exclusive averaging. The default is true, as used by most frameworks. The NvCaffeParser sets this to false, restoring compatibility of padded average pooling between NVCaffe and TensorRT.

TensorRT Python API
TensorRT 3.0.1 introduces the TensorRT Python API, which provides developers interfaces to:
  • the NvCaffeParser
  • the NvUffParser
  • the nvinfer graph definition API
  • the inference engine builder
  • the engine executor
  • calibration for running inference with INT8
  • a workflow to include C++ custom layer implementations
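
As an illustration of these interfaces, the following is a minimal sketch of building an engine from an NVCaffe model with the tensorrt.utils and tensorrt.infer helper modules; the file names and the output layer name are placeholders, and the exact argument order should be checked against the shipped samples.
    import tensorrt as trt

    # Create a console logger for the builder and runtime.
    G_LOGGER = trt.infer.ConsoleLogger(trt.infer.LogSeverity.ERROR)

    # Parse a Caffe model and build an engine in a single call.
    engine = trt.utils.caffe_to_trt_engine(G_LOGGER,
                                           "deploy.prototxt",        # placeholder deploy file
                                           "model.caffemodel",       # placeholder weights
                                           1,                        # max batch size
                                           1 << 20,                  # max workspace size
                                           ["prob"],                 # output layer names
                                           trt.infer.DataType.FLOAT) # precision
Passing trt.infer.DataType.HALF instead of FLOAT is expected to request FP16 (half2mode) kernels, which allows Tensor Cores to be used on Volta as described below.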

TensorRT Lite: A simplified API for inference
TensorRT 3.0.1 provides a streamlined set of API functions (tensorrt.lite) that allow users to export a trained model, build an engine, and run inference, with only a few lines of Python code.
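
For example, a lite workflow might look like the following sketch; the constructor keywords, node names, and shapes shown here are illustrative assumptions and should be checked against the tensorrt.lite documentation for this release.
    from tensorrt.lite import Engine

    # Build an engine directly from a frozen TensorFlow model
    # (the path, node names, and shapes are placeholders).
    engine = Engine(framework="tf",
                    path="frozen_graph.pb",
                    max_batch_size=1,
                    input_nodes={"input": (1, 28, 28)},
                    output_nodes=["output"])

    # Run inference on NumPy data shaped to match the registered input.
    results = engine.infer(input_data)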

Streamlined export of models trained in TensorFlow into TensorRT
With this release, you can take a trained model in TensorFlow saved in a TensorFlow protobuf and convert it to run in TensorRT. The TensorFlow model exporter creates an output file in a format called UFF (Universal Framework Format), which can then be parsed by TensorRT.
Currently the export path is expected to support the following:
  • TensorFlow 1.3
  • FP32 CNNs
  • FP16 CNNs
The TensorFlow export path is currently not expected to support the following:
  • Other versions of TensorFlow (0.9, 1.1, etc.)
  • RNNs
  • INT8 CNNs
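
As a sketch of this export path, the following assumes a frozen TensorFlow graph and uses the uff converter together with the NvUffParser Python bindings; the graph file, node names, input shape, and exact signatures shown here are assumptions and should be verified against the shipped samples.
    import uff
    import tensorrt as trt
    from tensorrt.parsers import uffparser

    # Convert a frozen TensorFlow GraphDef to UFF (file and node names are placeholders).
    uff_model = uff.from_tensorflow_frozen_model("frozen_graph.pb", ["fc2/Relu"])

    # Parse the UFF model and build an engine.
    G_LOGGER = trt.infer.ConsoleLogger(trt.infer.LogSeverity.ERROR)
    parser = uffparser.create_uff_parser()
    parser.register_input("Placeholder", (1, 28, 28), 0)  # input name, CHW shape, input order
    parser.register_output("fc2/Relu")
    engine = trt.utils.uff_to_trt_engine(G_LOGGER, uff_model, parser, 1, 1 << 20)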

Volta
The NVIDIA Volta architecture is now supported, including the Tesla V100 GPU. On Volta devices, the Tensor Core feature provides a large performance improvement, and Tensor Cores are automatically used when the builder is set to half2mode.

QNX
TensorRT 3.0.1 runs on the QNX operating system on the Drive PX2 platform.

Release Notes 3.0.1 Errata

  • Due to the cuDNN symbol conflict issues between TensorRT and TensorFlow, the tf_to_trt Python example works with TensorFlow 1.4.0 only and not prior versions of TensorFlow.
  • If your system has multiple libcudnnX-dev versions installed, ensure that cuDNN 7 is used for compiling and running TensorRT samples. This problem can occur when you have TensorRT and a framework installed. TensorRT uses cuDNN 7 while most frameworks are currently on cuDNN 6.
  • There are various details in the Release Notes and Developer Guide about the pytorch_to_trt Python example. This sample is no longer part of the package because of cuDNN symbol conflict issues between PyTorch and TensorRT.
  • In the Installation and Setup section of the Release Notes, it is mentioned that TENSORRT_LIB_DIR should point to <TAR_INSTALL_ROOT>/lib64. Instead, TENSORRT_LIB_DIR should point to <TAR_INSTALL_ROOT>/lib.
  • There are some known minor performance regressions for FP32 mode on K80 for large batch sizes on CUDA 8. Update to CUDA 9 if you see a similar performance regression.

Using TensorRT 3.0.1

Ensure you are familiar with the following notes when using this release.
  • Although networks can use NHWC and NCHW, TensorFlow users are encouraged to convert their networks to use NCHW data ordering explicitly in order to achieve the best possible performance.
  • The libnvcaffe_parsers.so library file is now called libnvparsers.so. The links for libnvcaffe_parsers are updated to point to the new libnvparsers library. The static library libnvcaffe_parser.a is also linked to the new libnvparsers.

Known Issues

Installation and Setup
  • If you are installing TensorRT from a tar package (instead of using the .deb packages and apt-get), you will need to update the custom_plugins example to point to the location that the tar package was installed into. For example, in the <PYTHON_INSTALL_PATH>/tensorrt/examples/custom_layers/tensorrtplugins/setup.py file change the following:
    • Change TENSORRT_INC_DIR to point to the <TAR_INSTALL_ROOT>/include directory.
    • Change TENSORRT_LIB_DIR to point to <TAR_INSTALL_ROOT>/lib64 directory.
  • The PyTorch based sample will not work with the CUDA 9 Toolkit. It will only work with the CUDA 8 Toolkit.
  • When using the TensorRT APIs from Python, import the tensorflow and uff modules before importing the tensorrt module. This is required to avoid a potential namespace conflict with the protobuf library as well as the cuDNN version. In a future update, the modules will be fixed to allow these Python modules to be loaded in an arbitrary order.
  • The TensorRT Python APIs are only supported on x86 based systems. Some installation packages for ARM based systems may contain Python .whl files. Do not install these on the ARM systems, as they will not function.
  • The TensorRT product version is incremented from 2.1 to 3.0.1 because we added major new functionality to the product. The libnvinfer package version number was incremented from 3.0.2 to 4.0 because we made non-backward compatible changes to the application programming interface.
  • The TensorRT debian package name was simplified in this release to tensorrt. In previous releases, the product version was used as a suffix, for example tensorrt-2.1.2.
  • If you have trouble installing the TensorRT Python modules on Ubuntu 14.04, refer to the steps on installing swig to resolve the issue. For installation instructions, see Unix Installation.
  • The Flatten layer can only be placed in front of the Fully Connected layer. This means that the Flatten layer can only be used if its output is directly fed to a Fully Connected layer.
  • The Squeeze layer only implements the binary squeeze (removing specific size 1 dimensions). The batch dimension cannot be removed.
  • If you see the numpy.core.multiarray failed to import error message, upgrade your NumPy to version 1.13.0 or greater.
  • For Ubuntu 14.04, use pip version >= 9.0.1 to get all the dependencies installed.

TensorFlow Model Conversion
  • The TensorFlow to TensorRT model export works only when running TensorFlow with GPU support enabled. The converter does not work if TensorFlow is running without GPU acceleration.
  • The TensorFlow to TensorRT model export does not work with network models specified using the TensorFlow Slim interface, nor does it work with models specified using the Keras interface.
  • The TensorFlow to TensorRT model export does not support recurrent neural network (RNN) models.
  • The TensorFlow to TensorRT model export may produce a model that has extra tensor reformatting layers compared to a model generated directly using the C++ or Python TensorRT graph builder API. This may cause the model that originated from TensorFlow to run slower than the model constructed directly with the TensorRT APIs.
  • Although TensorFlow models can use either NHWC or NCHW tensor layouts, TensorFlow users are encouraged to convert their models to use the NCHW tensor layout explicitly, in order to achieve the best possible performance when exporting the model to TensorRT; see the sketch after this list.
  • The TensorFlow parser requires that input be fed to the network in NCHW format.
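
A minimal sketch of converting an NHWC input to NCHW in TensorFlow before export is shown below; the placeholder name and shape are illustrative only.
    import tensorflow as tf

    # NHWC placeholder as commonly produced by TensorFlow models (illustrative shape).
    nhwc_input = tf.placeholder(tf.float32, [None, 28, 28, 1], name="input_nhwc")
    # Transpose to NCHW before exporting the network to TensorRT.
    nchw_input = tf.transpose(nhwc_input, perm=[0, 3, 1, 2])
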
Other known issues
  • On the V100 GPU, running models with INT8 only works if the batch size is evenly divisible by 4.
  • The TensorRT Python interface requires NumPy 1.13.0, while installing TensorRT using pip may only install 1.11.0. Use sudo pip install numpy -U to update NumPy if the version on your machine is not 1.13.0.

TensorRT Release 3.0 Release Candidate (RC)

This is the second preview release of TensorRT. For production use of TensorRT, continue to use 2.1.

Key Features and Enhancements

This TensorRT release includes the following key features and enhancements.

Volta
The NVIDIA Volta architecture is now supported, including the Tesla V100 GPU. On Volta devices, the Tensor Core feature provides a large performance improvement, and Tensor Cores are automatically used when the builder is set to half2mode.

Streamlined export of models trained in TensorFlow into TensorRT
With this release you can take a trained model in TensorFlow saved in a TensorFlow protobuf and convert it to run in TensorRT. The TensorFlow model exporter creates an output file in a format called UFF (Universal Framework Format), which can then be parsed by TensorRT.
Currently the export path is expected to support the following:
  • TensorFlow 1.3
  • FP32 CNNs
  • FP16 CNNs
The TensorFlow export path is currently not expected to support the following:
  • Other versions of TensorFlow (0.9, 1.1, etc.)
  • RNNs
  • INT8 CNNs

TensorFlow convenience functions
NVIDIA provides convenience functions so that when using UFF and TensorRT to export a model and run inference, only a few lines of code are needed.

Universal Framework Format 0.1
UFF format is designed to encapsulate trained neural networks so they can be parsed by TensorRT.

Python API
TensorRT 3.0 introduces the TensorRT Python API, which provides developers interfaces to:
  • the NvCaffeParser
  • the NvUffParser
  • the nvinfer graph definition API
  • the inference engine builder
  • the engine executor

TensorRT also introduces a workflow to include C++ custom layer implementations in Python based TensorRT applications.

New deep learning layers or algorithms
  • The TensorRT deconvolution layer previously did not support non-zero padding, or stride values that were distinct from kernel size. These restrictions have now been lifted.
  • The TensorRT deconvolution layer now supports groups.
  • Non-determinism in the deconvolution layer implementation has been eliminated.
  • The TensorRT convolution layer API now supports dilated convolutions.
  • The TensorRT API now supports these new layers (but they are not supported via the NvCaffeParser):
    • unary
    • shuffle
    • padding
  • The Elementwise (eltwise) layer now supports broadcasting of input dimensions.

QNX
TensorRT 3.0 runs on the QNX operating system on the Drive PX2 platform.

Known Issues

Installation and Setup
  • If you are installing TensorRT from a tar package (instead of using the .deb packages and apt-get), then the custom_plugins example will need to be updated to point to the location that the tar package was installed to. For example, in the <PYTHON_INSTALL_PATH>/tensorrt/examples/custom_layers/tensorrtplugins/setup.py file change the following:
    • Change TENSORRT_INC_DIR to point to the <TAR_INSTALL_ROOT>/include directory.
    • Change TENSORRT_LIB_DIR to point to the <TAR_INSTALL_ROOT>/lib directory.
  • The PyTorch based sample will not work with the CUDA 9 Toolkit. It will only work with the CUDA 8 Toolkit.
  • When using the TensorRT APIs from Python, import the tensorflow and uff modules before importing the tensorrt module. This is required to avoid a potential namespace conflict with the protobuf library. In a future update, the modules will be fixed to allow these Python modules to be loaded in an arbitrary order.
  • The TensorRT Python APIs are only supported on x86 based systems. Some installation packages for ARM based systems may contain Python .whl files. Do not install these on the ARM systems, as they will not function.
  • The TensorRT product version is incremented from 2.1 to 3.0 because we added major new functionality to the product. The libnvinfer package version number was incremented from 3.0.2 to 4.0 because we made non-backward compatible changes to the application programming interface.
  • The TensorRT debian package name was simplified in this release to tensorrt. In previous releases, the product version was used as a suffix, for example tensorrt-2.1.2.
  • If you have trouble installing the TensorRT Python modules on Ubuntu 14.04, refer to the steps on installing swig to resolve the issue. For installation instructions, see Unix Installation.
  • There is a performance regression in the LRN layer when the network is running in INT8 mode. It impacts networks like GoogleNet and AlexNet but not ResNet-50, VGG-19 etc.
TensorFlow Model Conversion
  • The TensorFlow to TensorRT model export works only when running TensorFlow with GPU support enabled. The converter does not work if TensorFlow is running without GPU acceleration.
  • The TensorFlow to TensorRT model export does not work with network models specified using the TensorFlow Slim interface, nor does it work with models specified using the Keras interface.
  • The TensorFlow to TensorRT model export does not support recurrent neural network (RNN) models.
  • The TensorFlow to TensorRT model export does not support convolutional layers that have asymmetric padding (a different number of zero-padded rows and columns).
  • The TensorFlow to TensorRT model export may produce a model that has extra tensor reformatting layers compared to a model generated directly using the C++ or Python TensorRT graph builder API. This may cause the model that originated from TensorFlow to run slower than the model constructed directly with the TensorRT APIs.
  • Although TensorFlow models can use either NHWC or NCHW tensor layouts, TensorFlow users are encouraged to convert their models to use the NCHW tensor layout explicitly, in order to achieve the best possible performance.
Other known issues
  • The Inception v4 network models are not supported with FP16 on V100 in this Release Candidate.
  • On V100, running models in INT8 mode does not work if the batch size is not divisible by 4.
  • The Average Pooling behavior has changed to exclude padding from the computation, which is how all other Pooling modes handle padding. This results in incorrect behavior for network models which rely on Average Pooling and which include padding, such as Inception v3. This issue will be addressed in a future release.
  • In this Release Candidate, the arguments for the tensorrt_exec.py script are slightly different than the ones for the giexec executable, and can be a source of confusion for users. Consult the documentation carefully to avoid unexpected errors. The command-line arguments will be changed to match giexec in a future update.
  • The INT8 Calibration feature is not available in the TensorRT Python APIs.
  • The examples/custom_layer sample will not work on Ubuntu 14.04 x86_64 systems, however, it does work properly on Ubuntu 16.04 systems. This will be fixed in the next update of the software.

TensorRT Release 3.0 Early Access (EA)

This is a preview release of TensorRT. For production use of TensorRT, continue to use 2.1.

Key Features and Enhancements

This TensorRT release includes the following key features and enhancements.

Streamlined export for models trained in TensorFlow to TensorRT
With this release you can take a TensorFlow trained model saved in a TensorFlow protobuf and convert it to run in TensorRT. The TensorFlow to UFF converter creates an output file in a format called UFF (Universal Framework Format) which can then be read into TensorRT.
Currently the export path is expected to support the following:
  • TensorFlow 1.0
  • FP32 CNNs
  • FP16 CNNs
The TensorFlow export path is currently not expected to support the following:
  • Other versions of TensorFlow (0.9, 1.1, etc.)
  • RNNs
  • INT8 CNNs

TensorFlow convenience functions
NVIDIA provides convenience functions so that when using UFF and TensorRT to export a model and run inference, only a few lines of code are needed.

Universal Framework Format 0.1
The UFF format is designed to store the information about a neural network that is needed to create an inference engine based on that network.

Python API
TensorRT 3.0 introduces the TensorRT Python API, allowing developers to access:
  • the NvCaffeParser
  • the NvUffParser
  • the nvinfer graph definition API
  • the inference engine builder
  • the inference-time interface for engine execution within Python

TensorRT also introduces a workflow to include C++ custom layer implementations in Python based TensorRT applications.

Using TensorRT 3.0

Ensure you are familiar with the following notes when using this release.
  • Although networks can use NHWC and NCHW, TensorFlow users are encouraged to convert their networks to use NCHW data ordering explicitly in order to achieve the best possible performance.
  • Average pooling behavior has changed to exclude the padding from the computation. The padding is now excluded from the computation in all of the pooling modes. This results in incorrect behavior for networks that rely on average pooling that includes padding, such as Inception v3. This issue will be addressed in a future release.
  • The libnvcaffe_parsers.so library file is now called libnvparsers.so. The links for libnvcaffe_parsers are updated to point to the new libnvparsers library. The static library libnvcaffe_parser.a is also linked to the new libnvparsers. For example:
    • Old structure: libnvcaffe_parsers.4.0.0.so links to libnvcaffe_parsers.4.so, which links to libnvcaffe_parsers.so.
    • New structure: libnvcaffe_parsers.4.0.0.so links to libnvcaffe_parsers.4.so, which links to libnvcaffe_parsers.so, which links to libnvparsers.so (the actual file).

Known Issues

  • TensorRT does not support asymmetric padding.
  • Some TensorRT optimizations are disabled for this Early Access (EA) release to ensure that the UFF model runs properly. This will be addressed in TensorRT 3.0.
  • The TensorFlow conversion path is not fully optimized.
  • INT8 Calibration is not available in Python.
  • Deconvolution is not implemented in the UFF workflow.