TensorRT Release 6.x.x

TensorRT Release 6.0.1

These are the TensorRT 6.0.1 release notes for Linux and Windows users. This release includes fixes from the previous TensorRT 5.x.x releases as well as the following additional changes. These release notes apply to workstation, server, and JetPack users unless an item is specifically marked with (not applicable for Jetson platforms).

For previous TensorRT release notes, see the TensorRT Archived Documentation.

Key Features And Enhancements

This TensorRT release includes the following key features and enhancements.
  • New layers:
    IResizeLayer

    The IResizeLayer implements the resize operation on an input tensor. For more information, see IResizeLayer: TensorRT API and IResizeLayer: TensorRT Developer Guide. A combined usage sketch for IResizeLayer and IShapeLayer follows the IShapeLayer entry below.

    IShapeLayer

    The IShapeLayer gets the shape of a tensor. For more information, see IShapeLayer: TensorRT API and IShapeLayer: TensorRT Developer Guide.
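
    As a hedged illustration of both new layers, the following C++ sketch adds them to an existing network; the network and input objects, the 4D NCHW input, and the 2x spatial scales are assumptions for illustration, not requirements.

      // Minimal sketch: resize a 4D tensor spatially by 2x with IResizeLayer,
      // then recover the resulting runtime shape with IShapeLayer.
      #include "NvInfer.h"

      void addResizeAndShape(nvinfer1::INetworkDefinition* network,
                             nvinfer1::ITensor* input)  // assumed 4D: NCHW
      {
          nvinfer1::IResizeLayer* resize = network->addResize(*input);
          const float scales[] = {1.0f, 1.0f, 2.0f, 2.0f};  // N, C, H, W
          resize->setScales(scales, 4);
          resize->setResizeMode(nvinfer1::ResizeMode::kNEAREST);

          // IShapeLayer emits a 1D Int32 tensor holding the shape of its input.
          nvinfer1::IShapeLayer* shape = network->addShape(*resize->getOutput(0));
          shape->getOutput(0)->setName("resized_shape");
      }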

    PointWise fusion

    Multiple adjacent pointwise layers can be fused into a single pointwise layer to improve performance. For more information, see the TensorRT Best Practices Guide.

  • New operators:
    3-dimensional convolution

    Performs a convolution operation with 3D filters on a 5D tensor. For more information, see addConvolutionNd in the TensorRT API and IConvolutionLayer in the TensorRT Developer Guide. A usage sketch covering the new Nd APIs follows the 3-dimensional pooling entry below.

    3-dimensional deconvolution

    Performs a deconvolution operation with 3D filters on a 5D tensor. For more information, see addDeconvolutionNd in the TensorRT API and IDeconvolutionLayer in the TensorRT Developer Guide.

    3-dimensional pooling

    Performs a pooling operation with a 3D sliding window on a 5D tensor. For more information, see addPoolingNd in the TensorRT API and IPoolingLayer in the TensorRT Developer Guide.
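
    The following hedged C++ sketch chains the new Nd APIs on a 5D (N, C, D, H, W) input; the input tensor, the weights, and the layer parameters (32 output maps, a 3x3x3 filter, a 2x2x2 window) are illustrative assumptions. addDeconvolutionNd follows the same pattern as addConvolutionNd.

      // Minimal sketch: 3D convolution followed by 3D max pooling.
      #include "NvInfer.h"

      void add3DBlock(nvinfer1::INetworkDefinition* network,
                      nvinfer1::ITensor* input,        // assumed 5D: NCDHW
                      nvinfer1::Weights kernelWeights,
                      nvinfer1::Weights biasWeights)
      {
          // 3D convolution: 32 output maps with a 3x3x3 filter.
          auto* conv = network->addConvolutionNd(
              *input, 32, nvinfer1::Dims3{3, 3, 3}, kernelWeights, biasWeights);
          conv->setStrideNd(nvinfer1::Dims3{1, 1, 1});

          // 3D max pooling with a 2x2x2 sliding window.
          auto* pool = network->addPoolingNd(*conv->getOutput(0),
                                             nvinfer1::PoolingType::kMAX,
                                             nvinfer1::Dims3{2, 2, 2});
          pool->setStrideNd(nvinfer1::Dims3{2, 2, 2});
      }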

  • New plugins:

    Added a persistent LSTM plugin: a half-precision persistent LSTM plugin that supports variable sequence lengths, bidirectional mode, setting initial hidden/cell values, storing final hidden/cell values, and multiple layers. The plugin is used through the PluginV2 interface, achieves better performance with small batch sizes, and is currently supported only on Linux. For more information, see Persistent LSTM Plugin in the TensorRT Developer Guide. (not applicable for Jetson platforms)
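
    Plugins exposed through the PluginV2 interface are typically instantiated via the plugin registry. The C++ sketch below shows that general pattern only; the creator name "SomeLSTMPlugin_TRT", its version, and the empty field collection are placeholders, so enumerate the registry (as the sketch does) to find the persistent LSTM plugin's actual registered name and required fields.

      // Hedged sketch: enumerate registered plugin creators, then instantiate
      // one by name and add it to a network via addPluginV2. TensorRT-shipped
      // plugins may first need registering with initLibNvInferPlugins().
      #include "NvInfer.h"
      #include <cstdio>

      void addRegisteredPlugin(nvinfer1::INetworkDefinition* network,
                               nvinfer1::ITensor* const* inputs, int nbInputs)
      {
          int nbCreators = 0;
          nvinfer1::IPluginCreator* const* creators =
              getPluginRegistry()->getPluginCreatorList(&nbCreators);
          for (int i = 0; i < nbCreators; ++i)  // list what is available
              std::printf("%s (version %s)\n", creators[i]->getPluginName(),
                          creators[i]->getPluginVersion());

          // Placeholder name/version -- replace with a registry entry above.
          nvinfer1::IPluginCreator* creator =
              getPluginRegistry()->getPluginCreator("SomeLSTMPlugin_TRT", "1");
          if (creator == nullptr)
              return;
          nvinfer1::PluginFieldCollection fields{0, nullptr};  // plugin-specific
          nvinfer1::IPluginV2* plugin = creator->createPlugin("lstm", &fields);
          network->addPluginV2(inputs, nbInputs, *plugin);
      }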

  • New operations:
    TensorFlow

    Added ResizeBilinear and ResizeNearest ops.

    ONNX

    Added Resize op.

    For more information, see the full list of Supported Ops in the Support Matrix.

  • New samples:
    sampleDynamicReshape

    Added sampleDynamicReshape which demonstrates how to use dynamic input dimensions in TensorRT by creating an engine for resizing dynamically shaped inputs to the correct size for an ONNX MNIST model. For more information, see Working With Dynamic Shapes in the TensorRT Developer Guide, Digit Recognition With Dynamic Shapes in the TensorRT Samples Support Guide and the GitHub: sampleDynamicReshape directory.

    sampleReformatFreeIO

    Added sampleReformatFreeIO which uses a Caffe model that was trained on the MNIST dataset and performs engine building and inference using TensorRT. Specifically, it shows how to use the reformat-free I/O tensors APIs to explicitly specify the I/O formats TensorFormat::kLINEAR, TensorFormat::kCHW2, and TensorFormat::kHWC8 for Float16 and INT8 precision. For more information, see Specifying I/O Formats Using The Reformat Free I/O Tensors APIs in the TensorRT Samples Support Guide and the GitHub: sampleReformatFreeIO directory.

    sampleUffPluginV2Ext

    Added sampleUffPluginV2Ext which implements the custom pooling layer for the MNIST model (data/samples/lenet5_custom_pool.uff) and demonstrates how to extend INT8 I/O for a plugin. For more information, see Adding A Custom Layer That Supports INT8 I/O To Your Network In TensorRT in the TensorRT Samples Support Guide and the GitHub: sampleUffPluginV2Ext directory.

    sampleNMT

    Added sampleNMT which demonstrates the implementation of Neural Machine Translation (NMT) based on a TensorFlow seq2seq model using the TensorRT API. The TensorFlow seq2seq model is an open-source NMT project that uses deep neural networks to translate text from one language to another. For more information, see Neural Machine Translation (NMT) Using A Sequence To Sequence (seq2seq) Model in the TensorRT Samples Support Guide, Importing A Model Using The C++ API For Safety in the TensorRT Developer Guide, and the GitHub: sampleNMT directory.

    sampleUffMaskRCNN

    This sample, sampleUffMaskRCNN, performs inference on the Mask R-CNN network in TensorRT. Mask R-CNN, described in the Mask R-CNN paper, performs object detection and object mask prediction on a target image. This sample’s model is based on the Keras implementation of Mask R-CNN, and its training framework can be found in the Mask R-CNN GitHub repository. For more information, see sampleUffMaskRCNN in the TensorRT Sample Support Guide. This sample is available only in GitHub: sampleUffMaskRCNN and is not packaged with the product. (not applicable for Jetson platforms)

    sampleUffFasterRCNN

    This sample, sampleUffFasterRCNN, is a UFF TensorRT sample for the Faster-RCNN model in the NVIDIA Transfer Learning Toolkit SDK. It demonstrates how to use a Faster-RCNN model pretrained with the Transfer Learning Toolkit to perform inference with TensorRT. For more information, see sampleUffFasterRCNN in the TensorRT Sample Support Guide. This sample is available only in GitHub: sampleUffFasterRCNN and is not packaged with the product. (not applicable for Jetson platforms)

  • New optimizations:
    Dynamic shapes

    The size of a tensor can vary at runtime. IShuffleLayer, ISliceLayer, and the new IResizeLayer now have optional inputs that can specify runtime dimensions. IShapeLayer can get the dimensions of tensors at runtime, and some layers can compute new dimensions. For more information, see Working With Dynamic Shapes and TensorRT Layers in the TensorRT Developer Guide, Digit Recognition With Dynamic Shapes in the TensorRT Samples Support Guide and the GitHub: sampleDynamicReshape directory.
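
    A minimal build-time sketch follows, assuming a network created with the explicit-batch flag whose input "data" has dimensions {-1, 1, 28, 28}; the tensor name, the MNIST-like shape, and the chosen min/opt/max batch sizes are assumptions for illustration. An optimization profile tells the builder which runtime dimensions to optimize for.

      // Minimal sketch: build an engine whose batch dimension is chosen at
      // runtime by registering min/opt/max shapes in an optimization profile.
      #include "NvInfer.h"

      nvinfer1::ICudaEngine* buildDynamicEngine(
          nvinfer1::IBuilder* builder, nvinfer1::INetworkDefinition* network)
      {
          nvinfer1::IOptimizationProfile* profile =
              builder->createOptimizationProfile();
          profile->setDimensions("data", nvinfer1::OptProfileSelector::kMIN,
                                 nvinfer1::Dims4{1, 1, 28, 28});
          profile->setDimensions("data", nvinfer1::OptProfileSelector::kOPT,
                                 nvinfer1::Dims4{8, 1, 28, 28});
          profile->setDimensions("data", nvinfer1::OptProfileSelector::kMAX,
                                 nvinfer1::Dims4{32, 1, 28, 28});

          nvinfer1::IBuilderConfig* config = builder->createBuilderConfig();
          config->addOptimizationProfile(profile);
          return builder->buildEngineWithConfig(*network, *config);
      }

    At runtime, the concrete input shape is then supplied through IExecutionContext::setBindingDimensions before inference is enqueued.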

    Reformat free I/O

    Network I/O tensors are no longer restricted to linear FP32; new APIs let you specify their formats explicitly. Eliminating these reformatting steps benefits many applications and, in particular, saves considerable memory-traffic time. For more information, see Working With Reformat-Free Network I/O Tensors and Example 4: Add A Custom Layer With INT8 I/O Support Using C++ in the TensorRT Developer Guide.
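
    As a hedged C++ sketch of these APIs (the choice of FP16 data in kHWC8 layout and the single-input, single-output network are assumptions for illustration):

      // Minimal sketch: request FP16 I/O in kHWC8 layout so TensorRT can
      // skip reformatting at the network boundary.
      #include "NvInfer.h"
      #include <cstdint>

      void useReformatFreeIO(nvinfer1::INetworkDefinition* network,
                             nvinfer1::IBuilderConfig* config)
      {
          nvinfer1::ITensor* input = network->getInput(0);
          input->setType(nvinfer1::DataType::kHALF);
          input->setAllowedFormats(
              1U << static_cast<uint32_t>(nvinfer1::TensorFormat::kHWC8));

          nvinfer1::ITensor* output = network->getOutput(0);
          output->setType(nvinfer1::DataType::kHALF);
          output->setAllowedFormats(
              1U << static_cast<uint32_t>(nvinfer1::TensorFormat::kHWC8));

          // Enable FP16 mode and strict types so the requested formats stick.
          config->setFlag(nvinfer1::BuilderFlag::kFP16);
          config->setFlag(nvinfer1::BuilderFlag::kSTRICT_TYPES);
      }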

    Layer optimizations

    Shuffle operations that are equivalent to identity operations on the underlying data are omitted if the input tensor is used only in the shuffle layer and the input and output tensors of this layer are not input and output tensors of the network. TensorRT no longer executes additional kernels or memory copies for such operations. For more information, see How Does TensorRT Work in the TensorRT Developer Guide.

    New INT8 calibrator

    Added MinMaxCalibrator, the preferred calibrator for NLP tasks. It supports per-activation-tensor scaling and computes the scale using the per-tensor absolute maximum value. For more information, see INT8 Calibration Using C++.
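
    A minimal C++ sketch of adopting this strategy follows; deriving from IInt8MinMaxCalibrator is what selects MinMax calibration, while the data handling shown (one preloaded device batch, served once) is a simplifying assumption.

      // Hedged sketch of a MinMax INT8 calibrator.
      #include "NvInfer.h"
      #include <cstddef>
      #include <vector>

      class SketchMinMaxCalibrator : public nvinfer1::IInt8MinMaxCalibrator
      {
      public:
          SketchMinMaxCalibrator(int batchSize, void* deviceBatch)
              : mBatchSize(batchSize), mDeviceBatch(deviceBatch) {}

          int getBatchSize() const override { return mBatchSize; }

          bool getBatch(void* bindings[], const char*[], int) override
          {
              if (mDone)
                  return false;            // no more calibration data
              bindings[0] = mDeviceBatch;  // device pointer holding the batch
              mDone = true;
              return true;
          }

          const void* readCalibrationCache(std::size_t& length) override
          {
              length = mCache.size();
              return mCache.empty() ? nullptr : mCache.data();
          }

          void writeCalibrationCache(const void* cache, std::size_t length) override
          {
              const char* p = static_cast<const char*>(cache);
              mCache.assign(p, p + length);
          }

      private:
          int mBatchSize;
          void* mDeviceBatch;
          bool mDone{false};
          std::vector<char> mCache;
      };

    The calibrator is then registered with IBuilderConfig::setInt8Calibrator alongside BuilderFlag::kINT8.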

    Explicit precision

    You can manually configure a network to be an explicit precision network in TensorRT. This feature enables users to import pre-quantized models with explicit quantizing and dequantizing scale layers into TensorRT. Setting the network to be an explicit precision network implies that you will set the precision of all the network input tensors and layer output tensors in the network. TensorRT will not quantize the weights of any layer (including those running in lower precision). Instead, weights will simply be cast into the required precision. For more information about explicit precision, see Working With Explicit Precision Using C++ and Working With Explicit Precision Using Python in the TensorRT Developer Guide.
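
    A minimal C++ creation sketch follows; pairing the explicit-precision flag with the explicit-batch flag mirrors typical ONNX import and is an illustrative choice rather than a stated requirement.

      // Minimal sketch: create an explicit-precision network. Each layer
      // added to it must then have its precision pinned explicitly.
      #include "NvInfer.h"
      #include <cstdint>

      nvinfer1::INetworkDefinition* makeExplicitPrecisionNetwork(
          nvinfer1::IBuilder* builder)
      {
          const uint32_t flags =
              (1U << static_cast<uint32_t>(
                   nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH))
              | (1U << static_cast<uint32_t>(
                   nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_PRECISION));
          nvinfer1::INetworkDefinition* network = builder->createNetworkV2(flags);
          // For each layer, for example:
          //   layer->setPrecision(nvinfer1::DataType::kINT8);
          //   layer->setOutputType(0, nvinfer1::DataType::kINT8);
          return network;
      }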

  • Installation:
    • Added support for RPM and Debian packages for PowerPC users. (not applicable for Jetson platforms)

Compatibility

Limitations

  • Upgrading TensorRT to the latest version is only supported when the currently installed TensorRT version is from one of the last two public releases. For example, TensorRT 6.x.x supports upgrading from TensorRT 5.0.x and TensorRT 5.1.x. (not applicable for Jetson platforms)

  • Calibration for a network with INT8 I/O tensors requires FP32 calibration data.

  • Shape tensors cannot be network inputs or outputs. Shape tensors can be created by IConstantLayer, IShapeLayer, or any of the following operations on shape tensors: IConcatenationLayer, IElementWiseLayer, IGatherLayer, IReduceLayer (kSUM, kMAX, kMIN, kPROD), IShuffleLayer, or ISliceLayer.

Deprecated Features

The following features are deprecated in TensorRT 6.0.1:
Samples changes
  • The PGM files for the MNIST samples have been removed. A script called generate_pgms.py (or download_pgms.py for CUDA 10.2) is provided in the samples/mnist/data directory to generate the images using the dataset.

  • --useDLACore=0 is no longer a valid option for sampleCharRNN because DLA does not support FP32 or RNNs, and the sample is written to work only with FP32.

Fixed Issues

  • Logging level Severity::kVERBOSE is now fully supported. Log messages with this level of severity are verbose messages with debugging information.

  • Deconvolution layer with stride > 32 is now supported on DLA.

  • Deconvolution layer with kernel size > 32 is now supported on DLA.

Known Issues

  • For Ubuntu 14.04 and CentOS 7, in order for ONNX, TensorFlow, and TensorRT to co-exist in the same environment, ONNX and TensorFlow must be built from source using your system's native compilers. It’s especially important to build ONNX and TensorFlow from source when using the IBM Anaconda channel for PowerPC to avoid compatibility issues with pybind11 and protobuf. (not applicable for Jetson platforms)

  • PointWise fusion is disabled when the SM version is lower than 7.0 due to a performance issue. This includes all pre-Volta GPUs, for example, Pascal, Maxwell, Kepler, TX-1, TX-2, and Nano.

  • TensorRT assumes that all resources for the device it is building on are available for optimization purposes. Concurrent use of multiple TensorRT builders (for example, multiple trtexec instances) to compile on different targets (DLA0, DLA1, and GPU) may oversubscribe system resources, causing undefined behavior (for example, inefficient plans, builder failure, or system instability).

    It is recommended to use trtexec with the --saveEngine argument to compile for different targets (DLA and GPU) separately and save their plan files. The plan files can then be reloaded (using trtexec with the --loadEngine argument) to submit multiple inference jobs on the respective targets (DLA0, DLA1, GPU). This two-step process alleviates oversubscription of system resources during the build phase while also allowing execution of the plan file to proceed without interference from the builder.

  • Windows users are currently unable to refit an engine due to some linking issues. You will encounter undefined symbols while building an application designed to use the TensorRT refittable engine feature. (not applicable for Jetson platforms)