TensorRT Release 4.x.x

TensorRT Release 4.0.1

This TensorRT 4.0.1 General Availability release includes several enhancements and improvements compared to the previously released TensorRT 3.0.4.

Key Features and Enhancements

This TensorRT release includes the following key features and enhancements.

  • TensorRT 4.0.1 GA has been tested with cuDNN 7.1.3 and now requires cuDNN 7.1.x.

  • Support for ONNX 1.0 (Open Neural Network Exchange) has been implemented. ONNX is a standard for representing deep learning models that enables them to be transferred between frameworks. TensorRT can now parse network definitions in ONNX format, in addition to NVCaffe and UFF formats.
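
    The following is a minimal sketch of importing an ONNX model with the C++ API. It uses the createParser/parseFromFile entry points of the open-source ONNX parser together with the standard builder API; the exact parser interface bundled with a given 4.x build may differ.
      // Build a TensorRT engine from an ONNX model (sketch).
      #include "NvInfer.h"
      #include "NvOnnxParser.h"

      nvinfer1::ICudaEngine* buildFromOnnx(nvinfer1::ILogger& logger)
      {
          nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(logger);
          nvinfer1::INetworkDefinition* network = builder->createNetwork();
          nvonnxparser::IParser* parser = nvonnxparser::createParser(*network, logger);
          if (!parser->parseFromFile("mnist.onnx", /*verbosity*/ 1))
              return nullptr;  // inspect the parser's error output here
          return builder->buildCudaEngine(*network);
      }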

  • The Custom Layer API now supports user-defined layers that take half precision, or FP16, inputs and return FP16 outputs.
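
    The sketch below shows the format-negotiation hook involved, assuming the IPluginExt interface; the remaining required methods (getOutputDimensions, configureWithFormat, enqueue, and so on) are omitted, and exact signatures may differ across 4.x builds.
      // Plugin fragment that advertises FP16 support (sketch).
      class MyFp16Plugin : public nvinfer1::IPluginExt
      {
      public:
          bool supportsFormat(nvinfer1::DataType type,
                              nvinfer1::PluginFormat format) const override
          {
              // Accept FP32 or FP16 data in the default NCHW format.
              return (type == nvinfer1::DataType::kFLOAT
                      || type == nvinfer1::DataType::kHALF)
                  && format == nvinfer1::PluginFormat::kNCHW;
          }
          // ... other IPluginExt methods omitted for brevity ...
      };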

  • Added support for the MatrixMultiply, Constant, Gather, Ragged SoftMax, Reduce, RNNv2 and TopK layers (for K up to 25).
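
    As an example of the new layer APIs, the sketch below adds a TopK layer. It assumes an existing INetworkDefinition* network and an ITensor* scores; the addTopK signature shown follows the headers of this era.
      // Keep the 5 largest values (and their indices) along axis 1 (sketch).
      uint32_t reduceAxes = 1u << 1;  // bit mask selecting the axis to reduce
      nvinfer1::ITopKLayer* topk = network->addTopK(
          *scores, nvinfer1::TopKOperation::kMAX, /*k*/ 5, reduceAxes);
      nvinfer1::ITensor* values  = topk->getOutput(0);  // top-k values
      nvinfer1::ITensor* indices = topk->getOutput(1);  // top-k indices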

  • This release includes optimizations that target recommender systems such as Neural Collaborative Filtering.

  • Many layers now support broadcasting across the batch dimension.
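
    For example, a tensor holding data shared by every item in the batch can be marked as broadcast so that a single copy serves the whole batch. A minimal sketch, assuming an existing INetworkDefinition* network and that ITensor::setBroadcastAcrossBatch is available in your build:
      // Mark an input as shared across the batch (sketch).
      nvinfer1::ITensor* bias = network->addInput(
          "bias", nvinfer1::DataType::kFLOAT, nvinfer1::DimsCHW(1, 10, 1));
      bias->setBroadcastAcrossBatch(true);  // one copy serves all batch items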

  • In TensorRT 3.0, INT8 had rounding and striding issues in the Activation layer that could cause INT8 accuracy to be low. Those issues have been fixed.

  • The C++ samples and Python examples were tested with TensorFlow 1.8 and PyTorch 0.4.0 where applicable.

  • Added sampleOnnxMNIST. This sample shows the conversion of an MNIST network in ONNX format to a TensorRT network.

  • Added sampleNMT. Neural Machine Translation (NMT) using sequence to sequence (seq2seq) models has garnered a lot of attention and is used in various NMT frameworks. sampleNMT is a highly modular sample for performing inference with the C++ TensorRT API, and you can use it as a reference point in your own projects.

  • Updated sampleCharRNN to use RNNv2 and to convert weights from TensorFlow to TensorRT.

  • Added sampleUffSSD. This sample converts the TensorFlow Single Shot MultiBox Detector (SSD) network to UFF format and runs it on TensorRT using plugins. This sample also demonstrates how other TensorFlow networks can be preprocessed and converted to UFF format with support for custom plugin nodes.

  • Memory management improvements (see the Memory Management section in the Developer Guide for details):
    • Applications may now provide their own memory for activations and workspace during inference; it is used only while the pipeline is running.
    • An allocator callback is available for all memory allocated on the GPU. In addition, model deserialization from system memory is significantly faster, up to 10x on large models.
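
    A minimal sketch of both hooks, assuming the IGpuAllocator interface and the execution-context device-memory calls of this era; error handling is omitted.
      // Route TensorRT's GPU allocations through your own allocator,
      // and supply application-owned activation/workspace memory (sketch).
      class MyAllocator : public nvinfer1::IGpuAllocator
      {
      public:
          void* allocate(uint64_t size, uint64_t alignment, uint32_t flags) override
          {
              void* ptr = nullptr;
              cudaMalloc(&ptr, size);  // substitute your own pooled allocation
              return ptr;
          }
          void free(void* memory) override { cudaFree(memory); }
      };

      void configureMemory(nvinfer1::IBuilder* builder, nvinfer1::ICudaEngine* engine)
      {
          static MyAllocator allocator;
          builder->setGpuAllocator(&allocator);  // covers builder-time allocations

          nvinfer1::IExecutionContext* ctx =
              engine->createExecutionContextWithoutDeviceMemory();
          void* scratch = nullptr;
          cudaMalloc(&scratch, engine->getDeviceMemorySize());
          ctx->setDeviceMemory(scratch);  // used only while this context runs
      }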

Using TensorRT 4.0.1

Ensure you are familiar with the following notes when using this release.
  • The builder methods setHalf2Mode and getHalf2Mode have been superseded by setFp16Mode and getFp16Mode, which better represent their intended usage.
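
    For example, enabling FP16 kernels now reads as follows, where builder is an existing nvinfer1::IBuilder*:
      // Before (TensorRT 3.x):
      builder->setHalf2Mode(true);
      // After (TensorRT 4.0.1):
      builder->setFp16Mode(true);
      bool fp16Enabled = builder->getFp16Mode();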

  • The sample utility giexec has been renamed to trtexec to be consistent with the product name, TensorRT, which is often shortened to TRT. A compatibility script has been included to help giexec users make the transition.

Deprecated Features

  • The RNN layer type is deprecated in favor of RNNv2; however, it is still available for backwards compatibility.
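
    A minimal migration sketch, assuming an existing INetworkDefinition* network and an ITensor* input; the per-gate weight calls follow the IRNNv2Layer API of this era.
      // Replace an addRNN() call with addRNNv2() (sketch).
      nvinfer1::IRNNv2Layer* rnn = network->addRNNv2(
          *input, /*layerCount*/ 2, /*hiddenSize*/ 512, /*maxSeqLen*/ 50,
          nvinfer1::RNNOperation::kLSTM);
      // Weights and biases are now supplied per layer and per gate, e.g.:
      // rnn->setWeightsForGate(0, nvinfer1::RNNGateType::kINPUT, true, w);
      // rnn->setBiasForGate(0, nvinfer1::RNNGateType::kINPUT, true, b);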

  • Legacy GIE version defines in NvInfer.h have been removed. They were NV_GIE_MAJOR, NV_GIE_MINOR, NV_GIE_PATCH, and NV_GIE_VERSION. The correct alternatives are NV_TENSORRT_MAJOR, NV_TENSORRT_MINOR, NV_TENSORRT_PATCH, and NV_TENSORRT_VERSION which existed in TensorRT 3.0.4 as well.

  • Dimension types are now ignored in the API; however, they are still available for backwards compatibility.

Known Issues

  • If the ONNX parser included with TensorRT is unable to parse your model, then try updating to the latest open source ONNX parser, which may resolve your issue.

  • PyTorch no longer supports Python 3.4 as of its current release (0.4.0). Therefore, the TensorRT PyTorch examples will not work when using Python 3 on Ubuntu 14.04.

  • Reshaping to a tensor with more dimensions than the input tensor is not supported.

  • The Reformat layer has a known memory overwrite issue on Volta when FP16 is used with the Concatenation and Reformat layers.

  • If you have TensorRT installed from local repos for two different CUDA versions, such as CUDA 8.0 and CUDA 9.0 or CUDA 9.2, then you will need to run an additional command to install the CUDA 8.0 version of TensorRT and prevent it from being upgraded to the CUDA 9.0 or CUDA 9.2 version of TensorRT:
    sudo apt-get install libnvinfer4=4.1.2-1+cuda8.0 \
      libnvinfer-dev=4.1.2-1+cuda8.0
    sudo apt-mark hold libnvinfer4 libnvinfer-dev

  • sampleNMT
    • Performance is not fully optimized

  • sampleUffSSD
    • Some precision loss was observed while running the network in INT8 mode, causing some objects to go undetected in the image. Our general observation is that having at least 500 images for calibration is a good starting point.
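
      The sketch below shows the calibrator hooks involved, assuming the IInt8EntropyCalibrator interface of this era; nextBatchOnGpu() is a hypothetical user-supplied helper that copies the next preprocessed batch of calibration images into GPU memory.
        // INT8 calibrator fed with ~500 images (sketch).
        bool nextBatchOnGpu(void* bindings[], int nbBindings);  // hypothetical helper

        class SSDCalibrator : public nvinfer1::IInt8EntropyCalibrator
        {
        public:
            int getBatchSize() const override { return 50; }  // 10 batches x 50 images
            bool getBatch(void* bindings[], const char* names[], int nbBindings) override
            {
                // Fill bindings[0] with the next batch; return false when done.
                return nextBatchOnGpu(bindings, nbBindings);
            }
            const void* readCalibrationCache(size_t& length) override
            { length = 0; return nullptr; }  // no cache reuse in this sketch
            void writeCalibrationCache(const void* cache, size_t length) override {}
        };

        SSDCalibrator calibrator;
        builder->setInt8Mode(true);
        builder->setInt8Calibrator(&calibrator);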

  • Performance regressions
    • Compared to earlier TensorRT versions, a 5% slowdown was observed on AlexNet when running on GP102 devices with batch size 2 using the NvCaffeParser.
    • Compared to earlier TensorRT versions, a 5% to 10% slowdown was observed on variants of Inception and some instances of ResNet when using the NvUffParser.

  • The NvUffParser returns the output tensor in the shape specified by the user, and not in NCHW shape as in earlier versions of TensorRT. In other words, the output tensor shape will match the shape of the tensor returned by TensorFlow, for the same network.

  • The Python 3.4 documentation is missing from the Ubuntu 14.04 packages. Refer to the Python 2.7 documentation or view the online Python documentation as an alternative.

  • Some samples do not provide a -h argument to print the sample usage. You can refer to the README.txt file in the sample directory for usage examples. Also, if the data files for some samples cannot be found, the sample will sometimes raise an exception and abort instead of exiting normally.

  • If you have more than one version of the CUDA toolkit installed on your system and the CUDA version for TensorRT is not the latest version of the CUDA toolkit, then you will need to provide an additional argument when compiling the samples. For example, if you have CUDA 9.0 and CUDA 9.2 installed and you are using TensorRT for CUDA 9.0:
    make CUDA_INSTALL_DIR=/usr/local/cuda-9.0

  • When you pip uninstall the tensorrtplugins Python package, you may see the following error, which can be ignored:
    OSError: [Errno 2] No such file or directory: '/usr/local/lib/python2.7/dist-packages/tensorrtplugins-4.0.1.0-py2.7-linux-x86_64.egg'

  • Due to a bug in cuDNN 7.1.3, which is the version of cuDNN TensorRT has been validated against, using RNNs with half precision on Kepler GPUs will cause TensorRT to abort. FP16 support is non-native on Kepler GPUs; therefore, using any precision other than FP32 is discouraged except for testing.

  • sampleMovieLens is currently limited to running a maximum of 8 concurrent processes on a Titan V and may result in suboptimal engines during parallel execution. The sample will be enhanced in the near future to support a greater degree of concurrency. Additionally, to ensure compatibility with TensorRT, use TensorFlow <= 1.7.0 to train the model. Because there may be a conflict between the versions of CUDA and/or cuDNN used by TensorRT and TensorFlow 1.7, we suggest installing the CPU version of TensorFlow 1.7 to complete the sample:
    python -m pip install tensorflow==1.7.0

TensorRT Release 4.0 Release Candidate (RC) 2

This TensorRT 4.0 Release Candidate (RC) 2 includes several enhancements and improvements compared to the previously released TensorRT 3.0.4. TensorRT 4.0 RC2 supports desktop and Tegra platforms. This release candidate is for early testing and feedback; for production use of TensorRT, continue to use 3.0.4.

Key Features and Enhancements

This TensorRT release includes the following key features and enhancements.

  • TensorRT 4.0 RC2 for mobile supports cuDNN 7.1.2.

  • TensorRT 4.0 RC2 for desktop supports cuDNN 7.1.3.

  • Support for ONNX 1.0 (Open Neural Network Exchange) has been implemented. TensorRT can now parse network definitions in ONNX format, in addition to NVCaffe and UFF formats.

  • The Custom Layer API now supports user-defined layers that take half precision, or FP16, inputs and return FP16 tensors.

  • Added support for the MatrixMultiply, Constant, Gather, Ragged SoftMax, Reduce, RNNv2 and TopK layers (for K up to 25).

  • Added SampleONNXMNIST sample. Open Neural Network Exchange (ONNX) is a standard for representing deep learning models that enables models to be transferred between frameworks. This sample shows the conversion of an MNIST network in ONNX format to a TensorRT network.

Deprecated Features

  • The RNN layer type is deprecated in favor of RNNv2; however, it is still available for backwards compatibility.

  • Legacy GIE version defines in NvInfer.h have been removed. They were NV_GIE_MAJOR, NV_GIE_MINOR, NV_GIE_PATCH, and NV_GIE_VERSION. The correct alternatives are NV_TENSORRT_MAJOR, NV_TENSORRT_MINOR, NV_TENSORRT_PATCH, and NV_TENSORRT_VERSION which existed in TensorRT 3.0.4 as well.

  • Dimension types are now ignored in the API; however, they are still available for backwards compatibility.

Known Issues

SampleMLP and SampleNMT are included in this release; however, they are beta samples and are not currently optimized for mobile platforms.

TensorRT Release 4.0 Release Candidate (RC)

This TensorRT 4.0 Release Candidate (RC) includes several enhancements and improvements compared to the previously released TensorRT 3.0.4. TensorRT 4.0 RC supports x86 desktop platforms only. This release candidate is for early testing and feedback; for production use of TensorRT, continue to use 3.0.4.

Key Features and Enhancements

This TensorRT release includes the following key features and enhancements.

  • Support for ONNX 1.0 (Open Neural Network Exchange) has been implemented. TensorRT can now parse network definitions in ONNX format, in addition to NVCaffe and UFF formats.

  • The Custom Layer API now supports user-defined layers that take half precision, or FP16, inputs and return FP16 tensors.

  • Added support for the MatrixMultiply, Constant, Gather, Ragged SoftMax, Reduce, RNNv2 and TopK layers (for K up to 25).

  • The samples were tested with TensorFlow 1.6. You must use cuDNN 7.0.x in order to use both TensorRT and TensorFlow at the same time, since TensorFlow 1.6 does not yet support cuDNN 7.1.x.

  • Added SampleMLP sample for multi-layer perceptrons.

  • Added SampleONNXMNIST sample. Open Neural Network Exchange (ONNX) is a standard for representing deep learning models that enables models to be transferred between frameworks. This sample shows the conversion of an MNIST network in ONNX format to a TensorRT network.

  • Added SampleNMT sample. Neural Machine Translation (NMT) using sequence to sequence (seq2seq) models has garnered a lot of attention and is used in various NMT frameworks. SampleNMT is a highly modular sample for performing inference with the C++ TensorRT API, and you can use it as a reference point in your own projects.

  • Updated SampleCharRNN sample to use RNNv2 and to convert weights from TensorFlow to TensorRT.

Deprecated Features

  • The RNN layer type is deprecated in favor of RNNv2; however, it is still available for backwards compatibility.

  • Legacy GIE version defines in NvInfer.h have been removed. They were NV_GIE_MAJOR, NV_GIE_MINOR, NV_GIE_PATCH, and NV_GIE_VERSION. The correct alternatives are NV_TENSORRT_MAJOR, NV_TENSORRT_MINOR, NV_TENSORRT_PATCH, and NV_TENSORRT_VERSION which existed in TensorRT 3.0.4 as well.

  • Dimension types are now ignored in the API; however, they are still available for backwards compatibility.

Known Issues

  • If you were previously using the machine learning Debian repository, then it will conflict with the version of libcudnn7 that is contained within the local repository for TensorRT. The following commands will downgrade libcudnn7 to version 7.0.5.15, which is supported and tested with TensorRT, and hold the package at this version. If you are using CUDA 8.0 for your application, ensure you replace cuda9.0 with cuda8.0.
    sudo apt-get install libcudnn7=7.0.5.15-1+cuda9.0 libcudnn7-dev=7.0.5.15-1+cuda9.0
    sudo apt-mark hold libcudnn7 libcudnn7-dev
    
    If you would like to later upgrade libcudnn7 to the latest version, then you can use the following commands to remove the hold.
    sudo apt-mark unhold libcudnn7 libcudnn7-dev
    sudo apt-get dist-upgrade
    

  • If you have both the CUDA 8.0 and CUDA 9.0 local repos installed for TensorRT, then you will need to execute an additional command to install the CUDA 8.0 version of TensorRT and prevent it from upgrading to the CUDA 9.0 version of TensorRT.
    sudo apt-get install libnvinfer4=4.1.0-1+cuda8.0 libnvinfer-dev=4.1.0-1+cuda8.0
    sudo apt-mark hold libnvinfer4 libnvinfer-dev
    
  • If you installed the dependencies for the TensorRT Python examples using pip install tensorrt[examples], then it may have replaced the GPU accelerated version of TensorFlow with the CPU accelerated version. You will need to remove the version of TensorFlow installed as a TensorRT dependency and install the GPU accelerated version in its place:
    pip uninstall tensorflow
    pip install tensorflow-gpu
    
  • SampleNMT
    • Performance is not fully optimized
    • SampleNMT does not support FP16
    • The vocabulary files are expected to be in the ../../../../data/samples/nmt/deen directory relative to the executable. The sample does not print usage information if the vocabulary files are not present at that path. See the README.txt file for usage details.

  • SampleMLP
    • Performance is not fully optimized
    • SampleMLP does not support FP16
    • The accuracy of MLPs for handwritten digit recognition is lower than CNNs, therefore, the sample may give an incorrect prediction in some cases.
    • SampleMLP usage has incorrect details on the -a parameter. It should read "-a <#> The activation to use on the layers, defaults to 1. Valid values are 1[ReLU], 2[Sigmoid], and 3[TanH]" instead of "-a <#> The activation to use in on the layers, defaults to 1. Valid values are 0[ReLU], 1[Sigmoid], and 2[TanH]".
    • The timing information printed by the sample may not be accurate.

  • Performance regressions
    • A 5% slowdown was observed on AlexNet when running on GP102 devices with batch size 2 using the Caffe parser.
    • A 5% to 10% slowdown was observed on variants of Inception, some instances of ResNet, and some instances of SSD when using the UFF parser.