TensorRT Release 5.0 Release Candidate (RC)

This is the release candidate (RC) for TensorRT 5.0. It includes several enhancements and improvements over the previously released TensorRT 4.0.1. This preview release is intended for early testing and feedback; for production use of TensorRT, continue to use TensorRT 4.0.1.

Key Features and Enhancements

This TensorRT release includes the following key features and enhancements.

Platforms
Added support for CentOS 7.5 and Ubuntu 18.04.

Turing
You must use CUDA 10.0 or later if you are using a Turing GPU.

DLA (Deep Learning Accelerator)
The layers supported by DLA are Activation, Concatenation, Convolution, Deconvolution, ElementWise, FullyConnected, LRN, Pooling, and Scale. For layer-specific constraints, see DLA Supported Layers. Networks such as AlexNet, GoogleNet, ResNet-50, and MNIST work with DLA. Other CNN networks may work, but they have not been extensively tested and may result in failures, including segfaults.
The trtexec tool can be used to run networks on DLA using the --useDLA=N and --fp16 options. To run the AlexNet network on DLA using trtexec, issue:
 ./trtexec --deploy=data/AlexNet/AlexNet_N2.prototxt --output=prob --useDLA=1 --fp16 --allowGPUFallback

trtexec does not support running ONNX models on DLA.

Redesigned Python API
The Python API has been rewritten from scratch and includes various improvements. In addition to several bug fixes, it is now possible to serialize and deserialize an engine to and from a file using the Python API. Python samples using the new API include parser samples for ResNet-50, a Network API sample for MNIST, a plugin sample using Caffe, and an end-to-end sample using TensorFlow.
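For example, engine serialization and deserialization with the new Python API might look like the following minimal sketch. The helper names save_engine and load_engine are illustrative only, and the engine passed in is assumed to have been built earlier (for example with trt.Builder.build_cuda_engine):

    import tensorrt as trt

    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

    def save_engine(engine, path):
        # engine is a built tensorrt.ICudaEngine; serialize() returns a
        # host-memory buffer that can be written directly to disk.
        with open(path, "wb") as f:
            f.write(engine.serialize())

    def load_engine(path):
        # Deserialize a previously saved engine from disk.
        with open(path, "rb") as f:
            runtime = trt.Runtime(TRT_LOGGER)
            return runtime.deserialize_cuda_engine(f.read())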

INT8
Added support for user-defined INT8 scales using the new ITensor::setDynamicRange function. This makes it possible to provide custom INT8 calibration without a calibration data set. setDynamicRange currently supports only symmetric quantization. Furthermore, if no calibration table is provided, calibration scales must be provided for each layer.
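A minimal sketch of setting per-tensor ranges from Python is shown below. It assumes the Python binding exposes set_dynamic_range, mirroring the C++ ITensor::setDynamicRange, and that a dictionary of precomputed ranges keyed by tensor name is available; both are assumptions for illustration:

    # Assumes `network` is a populated tensorrt.INetworkDefinition and
    # `ranges` maps tensor names to absolute-maximum values computed offline
    # (for example from framework activation statistics).
    def set_network_ranges(network, ranges):
        for i in range(network.num_layers):
            layer = network.get_layer(i)
            for j in range(layer.num_outputs):
                tensor = layer.get_output(j)
                if tensor.name in ranges:
                    r = ranges[tensor.name]
                    # Symmetric quantization: the range is [-r, r].
                    # Network input tensors need ranges as well.
                    tensor.set_dynamic_range(-r, r)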

Plugin Registry
Added a new searchable plugin registry, IPluginRegistry, which is a single registration point for all plugins in an application and is used to find plugin implementations during deserialization.
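A minimal lookup sketch is shown below. Whether the registry bindings (init_libnvinfer_plugins, get_plugin_registry) and the plugin name "NMS_TRT" used here are available in the 5.0 RC Python API is an assumption; the corresponding C++ entry points are getPluginRegistry() and IPluginRegistry::getPluginCreator():

    import tensorrt as trt

    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

    # Register TensorRT's bundled plugins with the registry (assumed binding).
    trt.init_libnvinfer_plugins(TRT_LOGGER, "")

    # Look up a plugin creator by its {name, version} tuple.
    registry = trt.get_plugin_registry()
    creator = registry.get_plugin_creator("NMS_TRT", "1")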

sampleSSD
Added a new sample called sampleSSD. This sample demonstrates how to preprocess the input to the SSD network, perform inference on the SSD network in TensorRT, use TensorRT plugins to speed up inference, and perform INT8 calibration on an SSD network.
See the TensorRT Developer Guide for details.

Breaking API Changes

  • The IPluginExt API has four new methods: getPluginType, getPluginVersion, destroy, and clone. All plugins of type IPluginExt will have to implement these new methods and be recompiled. This is a temporary issue; we expect to restore compatibility with the 4.0 API in the GA release. For guidance on migration, see Migrating Plugins From TensorRT 4.0.x To TensorRT 5.0 RC.

  • Upcoming changes in TensorRT 5.0 GA for plugins
    • A new plugin class, IPluginV2, and a corresponding IPluginV2 layer will be introduced. The IPluginV2 class includes methods similar to those of IPlugin and IPluginExt, so if your plugin previously implemented IPluginExt, you can change the base class name to IPluginV2.

    • The IPluginCreator class will create and deserialize plugins of type IPluginV2 as opposed to IPluginExt.

    • The create*Plugin() methods in NvInferPlugin.h will return plugin objects of type IPluginV2 as opposed to IPluginExt.

Compatibility

  • TensorRT 5.0 RC has been tested with cuDNN 7.3.0.

  • TensorRT 5.0 RC has been tested with TensorFlow 1.9.

  • This TensorRT release supports CUDA 10.0 and CUDA 9.0. CUDA 8.0 and CUDA 9.2 are no longer supported.

Limitations In 5.0 RC

  • For this release, there are separate JetPack L4T and Drive D5L packages due to differences in the DLA library dependencies. These packages should be unified in a future release.

  • Android is not supported in TensorRT 5.0 RC.

  • The Python API does not support DLA.

  • The create*Plugin functions in the NvInferPlugin.h file do not have Python bindings.

  • The choice of which DLA device to run on is currently made at build time. In GA, it will be selectable at runtime.

  • ONNX models are not supported on DLA in TensorRT 5.0 RC.

  • The included resnet_v1_152, resnet_v1_50, lenet5, and vgg19 UFF files do not support FP16 mode. This is because some of the weights fall outside the range of FP16.

Deprecated Features

The following features are deprecated in TensorRT 5.0:
  • The majority of the old Python API, including the Lite and Utils APIs, is deprecated. It is currently still accessible in the tensorrt.legacy package, but will be removed in a future release.

  • The following Python examples:
    • caffe_to_trt
    • pytorch_to_trt
    • tf_to_trt
    • onnx_mnist
    • uff_mnist
    • mnist_api
    • sample_onnx
    • googlenet
    • custom_layers
    • lite_examples
    • resnet_as_a_service

  • The detectionOutput Plugin has been renamed to the NMS Plugin.

  • The old ONNX parser will no longer be packaged with TensorRT; instead, use the open-source ONNX parser.

  • The DimensionTypes class.

  • The plugin APIs that return IPlugin are deprecated and now return IPluginExt. These APIs will be removed in a future release. Refer to the NvInferPlugin.h file inside the package.

  • nvinfer1::IPluginFactory, nvuffparser::IPluginFactory, and nvuffparser::IPluginFactoryExt (still available for backward compatibility). Instead, use the Plugin Registry and implement IPluginCreator for all new plugins.

  • libnvinfer.a, libnvinfer_plugin.a, and libnvparsers.a have been renamed to libnvinfer_static.a, libnvinfer_plugin_static.a, and libnvparsers_static.a respectively. This makes TensorRT consistent with CUDA, cuDNN, and other NVIDIA software libraries. It also avoids some ambiguity between dynamic and static libraries during linking.

Known Issues

  • The Plugin Registry will only register plugins with a unique {name, version} tuple. The API for this is likely to change in future versions to support multiple plugins with the same name and version.

  • Only AlexNet, GoogleNet, ResNet-50, and MNIST are known to work with DLA. Other networks may work, but they have not been extensively tested.

  • The static library libnvparsers_static.a requires a special build of protobuf to complete static linking. Due to filename conflicts with the official protobuf packages, these additional libraries are only included in the tar file at this time. The two additional libraries that you will need to link against are libprotobuf.a and libprotobuf-lite.a from the tar file; see the link-line sketch after this list.

  • The ONNX static libraries libnvonnxparser_static.a and libnvonnxparser_runtime_static.a require additional static libraries, missing from the package, to complete static linking: libonnx_proto.a and libnvonnxparser_plugin.a, as well as the protobuf libraries mentioned earlier. You will need to build these two missing static libraries from the open-source ONNX project. This issue will be resolved in a future release.

  • If you upgrade only uff-converter-tf, for example using apt-get install uff-converter-tf, then graphsurgeon-tf will not be upgraded due to inexact dependencies between the two packages. You will need to specify both packages on the command line, for example apt-get install uff-converter-tf graphsurgeon-tf. This will be fixed in a future release.

  • The fc_plugin_caffe_mnist Python sample cannot be executed if the sample is built using pybind11 v2.2.4. We suggest that you instead clone pybind11 v2.2.3 using the following command:
    git clone -b v2.2.3 https://github.com/pybind/pybind11.git
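
For reference, a static link line that accounts for the protobuf issue above might look like the following. This is a sketch only; the application objects, library paths, and the exact set of system and CUDA libraries are placeholders that will vary by environment. Note that archive order matters with GNU ld, so the parser libraries must precede the protobuf libraries:

    g++ -o my_app my_app.o \
        libnvinfer_static.a libnvinfer_plugin_static.a libnvparsers_static.a \
        libprotobuf.a libprotobuf-lite.a \
        -lcudart -lcublas -lcudnn -lpthread -ldl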