TensorRT Release 5.x.x

TensorRT Release 5.1.5

These are the TensorRT 5.1.5 release notes for Linux and Windows users. This release includes fixes from the previous TensorRT 5.1.x releases as well as the following additional changes.

For previously released versions of TensorRT, see the TensorRT Archived Documentation.

Key Features And Enhancements

This TensorRT release includes the following key features and enhancements.
TensorRT Open Source Software (OSS)
The TensorRT GitHub repository contains the Open Source Software (OSS) components of NVIDIA TensorRT. Included are the sources for the TensorRT plugin and parser (Caffe and ONNX) libraries, as well as sample applications demonstrating the usage and capabilities of the TensorRT platform. Refer to the README.md file for prerequisites, steps for downloading and setting up the build environment, and instructions for building the TensorRT OSS components.

For more information, see the NVIDIA Developer news article NVIDIA open sources parsers and plugins in TensorRT.

Compatibility

Deprecated Features

The following features are deprecated in TensorRT 5.1.5:
  • getDIGITS has been removed from the TensorRT package.

Known Issues

  • For Ubuntu 14.04 and CentOS 7, there is a known bug when trying to import the TensorRT and ONNX Python modules together due to different compiler versions used to generate their respective Python bindings. As a workaround, build the ONNX module from source using your system's native compilers.
  • You may see the following warning when running programs linked with TensorRT 5.1.5 and CUDA 10.1 libraries:
    [W] [TRT] TensorRT was compiled against cuBLAS 10.2.0 but is linked against cuBLAS 10.1.0.
    You can resolve this by updating your CUDA 10.1 installation to 10.1 update 1 here.
  • There is a known issue in sample yolov3_onnx with ONNX versions > 1.4.1. To work around this, install version 1.4.1 of ONNX through:
    pip uninstall onnx; pip install onnx==1.4.1

TensorRT Release 5.1.3

These are the TensorRT 5.1.3 release notes for PowerPC users. This release includes fixes from the previous TensorRT 5.1.x releases as well as the following additional changes.

For previously released versions of TensorRT, see the TensorRT Archived Documentation.

Key Features And Enhancements

This TensorRT release includes the following key features and enhancements.
Samples
The README.md files for many samples, located within each sample source directory, have been greatly improved. We hope this makes it easier to understand the sample source code and successfully run the sample.
ONNX parser
The ONNX parser now converts GEMMs and MatMuls using the MatrixMultiply layer, and adds support for scaling the results with the alpha and beta parameters.
Asymmetric padding
  • IConvolutionLayer, IDeconvolutionLayer and IPoolingLayer directly support setting asymmetric padding. You do not need to add an explicit IPaddingLayer.
  • The new APIs are setPaddingMode(), setPrePadding(), and setPostPadding(). When more than one padding method is used, setPaddingMode() takes precedence over setPrePadding() and setPostPadding(); see the sketch after this list.
  • The Caffe, UFF, and ONNX parsers have been updated to support the new asymmetric padding APIs.
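As an illustration, a minimal C++ sketch of the new padding APIs follows. It assumes an existing INetworkDefinition* network, an input ITensor* data, and preloaded kernelWeights and biasWeights; the layer shape and padding values are placeholders, not values from any particular sample.
    // Add a 3x3 convolution padded asymmetrically: one row/column before,
    // two rows/columns after, with no explicit IPaddingLayer.
    nvinfer1::IConvolutionLayer* conv = network->addConvolution(
        *data, 32, nvinfer1::DimsHW{3, 3}, kernelWeights, biasWeights);
    conv->setPrePadding(nvinfer1::DimsHW{1, 1});
    conv->setPostPadding(nvinfer1::DimsHW{2, 2});
    // Alternatively, let TensorRT derive the padding from a policy; if both are
    // set, setPaddingMode() takes precedence.
    // conv->setPaddingMode(nvinfer1::PaddingMode::kSAME_UPPER);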
Precision optimization
TensorRT provides optimized kernels for mixed precision (FP32, FP16 and INT8) workloads on Turing GPUs, and optimizations for depthwise convolution operations. You can control the precision per-layer with the ILayer APIs.
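The following is a hedged C++ sketch of per-layer precision control. It assumes an existing IBuilder* builder and INetworkDefinition* network; which layers you pin to a given precision depends on your network and accuracy requirements.
    // Allow reduced precision globally, then request FP16 for one layer.
    builder->setFp16Mode(true);
    builder->setStrictTypeConstraints(true); // honor per-layer type requests where possible
    nvinfer1::ILayer* layer = network->getLayer(0);
    layer->setPrecision(nvinfer1::DataType::kHALF);
    layer->setOutputType(0, nvinfer1::DataType::kHALF);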

Compatibility

  • TensorRT 5.1.3 has been tested with the following:

  • This TensorRT release supports CUDA 10.1.

  • TensorRT will now emit a warning when the major, minor, and patch versions of cuDNN and cuBLAS do not match the major, minor, and patch versions that TensorRT is expecting.

Limitations

  • For CentOS and RHEL users, when choosing Python 3:
    • Only Python version 3.6 from EPEL is supported by the RPM installation.
    • Only Python versions 3.4 and 3.6 from EPEL are supported by the tar installation.
  • In order to run the UFF converter and its related C++ and Python samples on PowerPC, it’s necessary to install TensorFlow for PowerPC. For more information, see Install TensorFlow on Power systems.
  • In order to run the PyTorch samples on PowerPC, it’s necessary to install PyTorch specifically built for PowerPC, which is not available from PyPI. For more information, see Install PyTorch on Power systems.

Deprecated Features

The following features are deprecated in TensorRT 5.1.3:
  • sampleNMT has been removed from the TensorRT package. The public data source files have changed and no longer work with the sample.

Fixed Issues

The following issues have been resolved in TensorRT 5.1.3:
  • Fixed the behavior of the Caffe crop layer when the layer has an asymmetric crop offset.
  • ITensor::getType() and ILayer::getOutputType() now report the type correctly. Previously, both types reported DataType::kFLOAT even if the output type should have been DataType::kINT32. For example, the output type of IConstantLayer with DataType::kINT32 weights is now correctly reported as DataType::kINT32. The affected layers include:
    • IConstantLayer (when weights have type DataType::kINT32)
    • IConcatenationLayer (when inputs have type DataType::kINT32)
    • IGatherLayer (when first input has type DataType::kINT32)
    • IIdentityLayer (when input has type DataType::kINT32)
    • IShuffleLayer (when input has type DataType::kINT32)
    • ISliceLayer (when input has type DataType::kINT32)
    • ITopKLayer (second output)
  • When using INT8 mode, dynamic ranges are no longer required for INT32 tensors, even if you’re not using automatic quantization.
  • Using an INT32 tensor where a floating-point tensor is expected, or vice-versa, issues an error explaining the mismatch instead of asserting failure.
  • The ONNX TensorRT parser now attempts to downcast INT64 graph weights to INT32.
  • Fixed an issue where the engine would fail to build when asymmetric padding convolutions were present in the network.

Known Issues

  • When running ShuffleNet with small batch sizes between 1 and 4, you may encounter performance regressions of up to 15% compared to TensorRT 5.0.
  • When running ResNeXt101 with a batch size of 4 using INT8 precision on a Volta GPU, you may encounter intermittent performance regressions of up to 10% compared to TensorRT 5.0. Rebuilding the engine may resolve this issue.
  • There is a known issue in sample yolov3_onnx with ONNX versions > 1.4.1. To work around this, install version 1.4.1 of ONNX through:
    pip uninstall onnx; pip install onnx==1.4.1

TensorRT Release 5.1.2 Release Candidate (RC)

This is the release candidate (RC) for TensorRT 5.1.2 and is applicable to Linux and Windows users. This RC includes several enhancements and improvements compared to the previously released TensorRT 5.0.2.

This preview release is for early testing and feedback; therefore, for production use of TensorRT, continue to use TensorRT 5.0.2.

For previously released versions of TensorRT, see the TensorRT Documentation Archives.

Key Features And Enhancements

This TensorRT release includes the following key features and enhancements.
Improved performance of HMMA and IMMA convolution
The performance of Convolution, including Depthwise Separable Convolution and Group Convolution, has improved in FP16 and INT8 modes on Volta and Turing. For example: ResNeXt-101 batch=1 INT8 3x speedup on Tesla T4.
Reload weights for an existing TensorRT engine
Engines can be refitted with new weights. For more information, see Refitting An Engine.
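Below is a minimal C++ sketch of the refit workflow, assuming an existing IBuilder* builder, INetworkDefinition* network, an ILogger instance gLogger, and replacement Weights objects; the layer name "conv1" and the weight variables are illustrative only.
    // Build a refittable engine, then swap in new weights without rebuilding.
    builder->setRefittable(true);
    nvinfer1::ICudaEngine* engine = builder->buildCudaEngine(*network);

    nvinfer1::IRefitter* refitter = nvinfer1::createInferRefitter(*engine, gLogger);
    refitter->setWeights("conv1", nvinfer1::WeightsRole::kKERNEL, newKernelWeights);
    refitter->setWeights("conv1", nvinfer1::WeightsRole::kBIAS, newBiasWeights);
    bool success = refitter->refitCudaEngine(); // must succeed before running inference again
    refitter->destroy();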
New supported operations
Caffe: Added BNLL, Clip and ELU ops. Additionally, the leaky ReLU option for the ReLU op (negative_slope != 0) was added.

UFF: Added ArgMax, ArgMin, Clip, Elu, ExpandDims, Identity, LeakyReLU, Recip, Relu6, Sin, Cos, Tan, Asin, Acos, Atan, Sinh, Cosh, Asinh, Acosh, Atanh, Ceil, Floor, Selu, Slice, Softplus and Softsign ops.

ONNX: Added ArgMax, ArgMin, Clip, Cast, Elu, Selu, HardSigmoid, Softplus, Gather, ImageScaler, LeakyReLU, ParametricSoftplus, Sin, Cos, Tan, Asin, Acos, Atan, Sinh, Cosh, Asinh, Acosh, Atanh, Ceil, Floor, ScaledTanh, Softsign, Slice, ThresholdedRelu and Unsqueeze ops.

For more information, see the TensorRT Support Matrix.

NVTX support
NVIDIA Tools Extension SDK (NVTX) is a C-based API for marking events and ranges in your applications. NVTX annotations were added in TensorRT to help correlate the runtime engine layer execution with CUDA kernel calls. NVIDIA Nsight Systems supports collecting and visualizing these events and ranges on the timeline. NVIDIA Nsight Compute also supports collecting and displaying the state of all active NVTX domains and ranges in a given thread when the application is suspended.
New layer
Added support for the Slice layer. The Slice layer implements a slice operator for tensors. For more information, see ISliceLayer.
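A short C++ sketch of ISliceLayer usage is shown below, assuming an existing INetworkDefinition* network and a CHW-shaped ITensor* data; the start, size, and stride values are illustrative only.
    // Extract a 3x2x3 window starting at (0, 1, 1) with unit stride.
    nvinfer1::ISliceLayer* slice = network->addSlice(
        *data,
        nvinfer1::Dims3{0, 1, 1},   // start
        nvinfer1::Dims3{3, 2, 3},   // size
        nvinfer1::Dims3{1, 1, 1});  // stride
    nvinfer1::ITensor* window = slice->getOutput(0);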
RNNs
Changed RNNv1 and RNNv2 validation of hidden and cell input/output dimensions. This affects only bidirectional RNNs.
EntropyCalibrator2
Added the Entropy Calibration algorithm, which is the preferred calibrator.
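A skeletal C++ calibrator derived from IInt8EntropyCalibrator2 is sketched below. The class name, batch size, data handling, and caching are placeholders; a real calibrator must copy calibration batches to device memory in getBatch() and typically caches the scales it produces.
    class MyEntropyCalibrator : public nvinfer1::IInt8EntropyCalibrator2
    {
    public:
        int getBatchSize() const override { return 8; }

        bool getBatch(void* bindings[], const char* names[], int nbBindings) override
        {
            // Copy the next calibration batch to device memory and set bindings[i]
            // for each input; return false when the calibration data set is exhausted.
            return false;
        }

        const void* readCalibrationCache(std::size_t& length) override { length = 0; return nullptr; }
        void writeCalibrationCache(const void* cache, std::size_t length) override {}
    };

    // Assuming an existing IBuilder* builder:
    MyEntropyCalibrator calibrator;
    builder->setInt8Mode(true);
    builder->setInt8Calibrator(&calibrator);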
Python support
Python 3 is now supported for CentOS and RHEL users. The Python 3 wheel files have been split so that each wheel file now contains the Python bindings for only one Python version and follows pip naming conventions.
New Python samples
  • INT8 Calibration In Python - This sample demonstrates how to create an INT8 calibrator, build and calibrate an engine for INT8 mode, and finally run inference in INT8 mode.
  • Engine Refit In Python - This sample demonstrates the engine refit functionality provided by TensorRT. The sample first trains an MNIST model in PyTorch, then recreates the network in TensorRT.
For more information, see the Samples Support Guide.
NVIDIA Machine Learning network repository installation
TensorRT 5.1 can now be directly installed from the NVIDIA Machine Learning network repository when only the C++ libraries and headers are required. The intermediate step of downloading and installing a local repo from the network repo is no longer required. This simplifies the number of steps required to automate the TensorRT installation. See the TensorRT Installation Guide for more information.

Breaking API Changes

  • A kVERBOSE logging level was added in TensorRT 5.1; however, due to ABI implications, kVERBOSE is not currently being used. Messages at the kVERBOSE logging level may be emitted in a future release.

Compatibility

Limitations

  • A few optimizations are disabled when building refittable engines:
    • IScaleLayer operations that have a non-zero count of weights for shift or scale and are mathematically the identity function will not be removed, since a refit of the shift or scale weights could make them non-identity functions. IScaleLayer operations where the shift and scale weights have a zero count are still removed if the power weights are unity.
    • Optimizations for multilayer perceptrons are disabled. These optimizations target serial compositions of IFullyConnectedLayer, IMatrixMultiplyLayer, and IActivationLayer.

Deprecated Features

The following features are deprecated in TensorRT 5.1.2 RC:
  • The UFF Parser which is used to parse a network in UFF format will be deprecated in a future release. The recommended method of importing TensorFlow models to TensorRT is using TensorFlow with TensorRT (TF-TRT). For step-by-step instructions on how to accelerate inference in TF-TRT, see the TF-TRT User Guide and Release Notes. For source code from GitHub, see Examples for TensorRT in TensorFlow (TF-TRT).

  • Deprecated --engine=<filename> option in trtexec. Use --saveEngine=<filename> and --loadEngine=<filename> instead for clarity.

Known Issues

  • Using the current public data sources, sampleNMT produces incorrect results, which leads to a low BLEU score. This sample will be removed in the next release so that we can update the source code to work with the latest public data.

  • There is a known multilayer perceptron (MLP) performance regression in TensorRT 5.1.2 compared to TensorRT 5.0. During the engine build phase the GPU cache state may lead to different tactic selections on Turing. The magnitude of the regression depends on the batch size and the depth of the network.

  • For sampleSSD and sampleUffSSD during INT8 calibration, you may encounter a file read error in TensorRT-5.1.x.x/data/samples/ssd/VOC2007/list.txt. This is due to line-ending differences between Windows and Linux. To work around this problem, open list.txt in a text editor and ensure that the file is using Unix-style line endings.

  • Python sample yolov3_onnx is functional only for ONNX versions greater than 1.1.0 and less than 1.4.0.

TensorRT Release 5.1.1 Release Candidate (RC)

This is the release candidate (RC) for TensorRT 5.1.1 and is applicable to automotive users on PDK version 5.1.3. This RC includes several enhancements and improvements compared to the previously released TensorRT 5.0.3.

This preview release is for early testing and feedback; therefore, for production use of TensorRT, continue to use TensorRT 5.0.3.

For previously released versions of TensorRT, see the TensorRT Documentation Archives.

Key Features And Enhancements

This TensorRT release includes the following key features and enhancements.

Breaking API Changes

  • A kVERBOSE logging level was added in TensorRT 5.1.0; however, due to ABI implications, kVERBOSE is no longer being used in TensorRT 5.1.1. It may be used again in a future release.

Compatibility

  • TensorRT 5.1.1 RC has been tested with the following:

  • This TensorRT release supports CUDA 10.1.

Limitations

  • The Python API is not included in this package.

Known Issues

  • When linking against CUDA 10.1, performance regressions may occur under Drive 5.0 QNX and Drive 5.0 Linux because of a regression in cuBLAS. This affects the FullyConnected layers in AlexNet, VGG19, and ResNet-50 for small batch sizes (between 1 and 4).

  • Due to a CUDA mobile driver bug, performance regressions of around 10% may be seen when using group convolutions. These regressions might be seen in networks such as ResNeXt and ShuffleNet.

TensorRT Release 5.1.0 Release Candidate (RC)

This is the release candidate (RC) for TensorRT 5.1.0. It includes several enhancements and improvements compared to the previously released TensorRT 5.0.x. This preview release is for early testing and feedback; therefore, for production use of TensorRT, continue to use TensorRT 5.0.2.

Key Features and Enhancements

This TensorRT release includes the following key features and enhancements.

Improved performance of HMMA and IMMA convolution
The performance of Convolution, including Depthwise Separable Convolution and Group Convolution, has improved in FP16 and INT8 modes on Volta, Xavier and Turing. For example:
  • ResNet50 INT8 batch=8 1.2x speedup on Jetson AGX Xavier
  • MobileNetV2 FP16 batch=8 1.2x speedup on Jetson AGX Xavier
  • ResNeXt-101 batch=1 INT8 3x speedup on Tesla T4

Reload weights for an existing TensorRT engine
Engines can be refitted with new weights. For more information, see Refitting An Engine.
DLA with INT8
Added support for running the AlexNet network on DLA using trtexec in INT8 mode. For more information, see Working With DLA.

New supported operations
Caffe: Added BNLL, Clip and ELU ops. Additionally, the leaky ReLU option for the ReLU op (negative_slope != 0) was added.

UFF: Added ArgMax, ArgMin, Clip, Elu, ExpandDims, Identity, LeakyReLU, Recip, Relu6, Sin, Cos, Tan, Asin, Acos, Atan, Sinh, Cosh, Asinh, Acosh, Atanh, Ceil, Floor, Selu, Slice, Softplus and Softsign ops.

ONNX: Added ArgMax, ArgMin, Clip, Cast, Elu, Selu, HardSigmoid, Softplus, Gather, ImageScaler, LeakyReLU, ParametricSoftplus, Sin, Cos, Tan, Asin, Acos, Atan, Sinh, Cosh, Asinh, Acosh, Atanh, Ceil, Floor, ScaledTanh, Softsign, Slice, ThresholdedRelu and Unsqueeze ops.

For more information, see the TensorRT Support Matrix.

NVTX support
NVIDIA Tools Extension SDK (NVTX) is a C-based API for marking events and ranges in your applications. NVTX annotations were added in TensorRT to help correlate the runtime engine layer execution with CUDA kernel calls. NVIDIA Nsight Systems supports collecting and visualizing these events and ranges on the timeline. NVIDIA Nsight Compute also supports collecting and displaying the state of all active NVTX domains and ranges in a given thread when the application is suspended.

New layer
Added support for the Slice layer. The Slice layer implements a slice operator for tensors. For more information, see ISliceLayer.
RNNs
Changed RNNv1 and RNNv2 validation of hidden and cell input/output dimensions. This affects only bidirectional RNNs.

EntropyCalibrator2
Added the Entropy Calibration algorithm, which is the preferred calibrator. It is also the required calibrator for DLA INT8 because it supports per-activation-tensor scaling.

ILogger
Added a verbose severity level to ILogger for emitting debugging messages. Some messages that were previously logged with severity level kINFO are now logged with severity level kVERBOSE. A new ILogger-derived class was added to the samples and trtexec. Most messages should be categorized (using the severity level) as:
  • [V] - verbose debug informational messages
  • [I] - "instructional" informational messages
  • [W] - warning messages
  • [E] - error messages
  • [F] - fatal error messages
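A hedged C++ sketch of such an ILogger-derived class, using the severity prefixes above, might look like the following. It mirrors the style of the sample loggers but is not the exact class shipped with the samples.
    #include <iostream>
    #include "NvInfer.h"

    class ConsoleLogger : public nvinfer1::ILogger
    {
        void log(Severity severity, const char* msg) override
        {
            switch (severity)
            {
            case Severity::kINTERNAL_ERROR: std::cerr << "[F] " << msg << std::endl; break;
            case Severity::kERROR:          std::cerr << "[E] " << msg << std::endl; break;
            case Severity::kWARNING:        std::cerr << "[W] " << msg << std::endl; break;
            case Severity::kINFO:           std::cout << "[I] " << msg << std::endl; break;
            case Severity::kVERBOSE:        std::cout << "[V] " << msg << std::endl; break;
            }
        }
    } gLogger;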

Python
  • INT8 Calibration In Python - This sample demonstrates how to create an INT8 calibrator, build and calibrate an engine for INT8 mode, and finally run inference in INT8 mode.
  • Engine Refit In Python - This sample demonstrates the engine refit functionality provided by TensorRT. The sample first trains an MNIST model in PyTorch, then recreates the network in TensorRT.
For more information, see the Samples Support Guide.

Python bindings
Added Python bindings to the aarch64-gnu release package (debian and tar).

RPM installation
Provided installation support for Red Hat Enterprise Linux (RHEL) and CentOS users to upgrade from TensorRT 5.0.x to TensorRT 5.1.x. For more information, see the upgrading instructions in the Installation Guide.

Breaking API Changes

  • A new logging level, kVERBOSE, was added in TensorRT 5.1.0. Messages are being emitted by the TensorRT builder and/or engine using this new logging level. Since the logging level did not exist in TensorRT 5.0.x, some applications might not handle the new logging level properly and in some cases the application may crash. In the next release, more descriptive messages will appear when using the kINFO logging level because the kVERBOSE messages will be produced using kINFO. However, the kVERBOSE logging level will remain in the API and kVERBOSE messages may be emitted in a future TensorRT release.

Compatibility

  • TensorRT 5.1.0 RC has been tested with cuDNN 7.3.1.

  • TensorRT 5.1.0 RC has been tested with TensorFlow 1.12.0.

  • TensorRT 5.1.0 RC has been tested with PyTorch 1.0.

  • This TensorRT release supports CUDA 10.0.

Limitations

  • A few optimizations are disabled when building refittable engines.
    • IScaleLayer operations that have a non-zero count of weights for shift or scale and are mathematically the identity function will not be removed, since a refit of the shift or scale weights could make them non-identity functions. IScaleLayer operations where the shift and scale weights have a zero count are still removed if the power weights are unity.
    • Optimizations for multilayer perceptrons are disabled. These optimizations target serial compositions of IFullyConnectedLayer, IMatrixMultiplyLayer, and IActivationLayer.

  • DLA limitations
    • FP16 LRN is supported with the following parameters:
      • local_size = 5
      • alpha = 0.0001
      • beta = 0.75
    • INT8 LRN, Sigmoid, and Tanh are not supported.
    For more information, see DLA Supported Layers.

Deprecated Features

The following features are deprecated in TensorRT 5.1.0 RC:
  • Deprecated --engine=<filename> option in trtexec. Use --saveEngine=<filename> and --loadEngine=<filename> instead for clarity.

Known Issues

  • When the tensor size is too large, such as a single tensor that has more than 4G elements, an overflow may occur, which will cause TensorRT to crash. As a workaround, you may need to reduce the batch size.

TensorRT Release 5.0.6

This is the release for TensorRT 5.0.6 and is applicable to JetPack 4.2.0 users. This release includes several enhancements and improvements compared to the previously released TensorRT Release 5.0.5.

Key Features and Enhancements

This TensorRT release includes the following key features and enhancements for JetPack users.
  • Python support for AArch64 Linux is included as an early access release. All features are expected to be available; however, some aspects of functionality and performance will likely be limited compared to a non-EA release.

  • The UFF parser’s memory usage was significantly reduced to better accommodate boards with small amounts of memory.

Compatibility

Known Issues

  • The default workspace size for sampleUffSSD is 1 GB. This may be too large for the Jetson Nano; therefore, change the workspace for the builder in the source file using the following code:
    builder->setMaxWorkspaceSize(16_MB);

  • In order to run larger networks or larger batch sizes with TensorRT, it may be necessary to free memory on the board. This can be accomplished by running in headless mode or killing processes with high memory consumption.

  • Due to limited system memory on the Jetson Nano, which is shared between the CPU and GPU, you may not be able to run some samples, for example, sampleFasterRCNN.

  • Python sample yolov3_onnx is functional only for ONNX versions greater than 1.1.0 and less than 1.4.0.

TensorRT Release 5.0.5

These are the TensorRT 5.0.5 release notes for Android users. This release includes fixes from the previous TensorRT 5.0.x releases as well as the following additional fixes. For previous TensorRT 5.0.x release notes, see TensorRT Release Notes.

Key Features and Enhancements

This TensorRT release includes the following key features and enhancements for Android users.
  • TensorRT 5.0.5 has two sub-releases:
    • TensorRT 5.0.5.0 (without DLA support)
    • TensorRT 5.0.5.1 (with DLA support)

Compatibility

  • TensorRT 5.0.5 supports CUDA 10.0
  • TensorRT 5.0.5 supports cuDNN 7.3.1
  • TensorRT 5.0.5 supports the Android platform with API level 26 or higher

Limitations In 5.0.5

  • TensorRT 5.0.5.1 supports DLA while TensorRT 5.0.5.0 does not.

Known Issues

  • For TensorRT 5.0.5.0, some sample programs accept --useDLACore on their command lines; however, do not use this option because TensorRT 5.0.5.0 does not support DLA.

  • When running trtexec with a saved engine, the --output and --input command line arguments are mandatory. For example:
    ./trtexec --onnx=data/mnist/mnist.onnx --fp16 --engine=./mnist_onnx_fp16.engine
    ./trtexec --engine=./mnist_onnx_fp16.engine --input=Input3 --output=Plus214_Output_0
    

  • When running applications that use DLA on Xavier based platforms that also contain a discrete GPU (dGPU), you may be required to select the integrated GPU (iGPU). This can be done using the following command:
    export CUDA_VISIBLE_DEVICES=1

TensorRT Release 5.0.4

These are the TensorRT 5.0.4 release notes for Windows users. This release includes fixes from the previous TensorRT 5.0.x releases as well as the following additional fixes. For previous TensorRT 5.0.x release notes, see TensorRT Release Notes.

Key Features and Enhancements

This TensorRT release includes the following key features and enhancements for the Windows platform.
  • ONNX model parsing support has been added.

  • Two new samples showcasing ONNX model parsing functionality have been added:
    • sampleOnnxMNIST
    • sampleINT8API

  • CUDA 9.0 support has been added.

Compatibility

  • TensorRT 5.0.4 supports Windows 10
  • TensorRT 5.0.4 supports CUDA 10.0 and CUDA 9.0
  • TensorRT 5.0.4 supports cuDNN 7.3.1
  • TensorRT 5.0.4 supports Visual Studio 2017

Limitations In 5.0.4

  • TensorRT 5.0.4 does not support Python API on Windows.

Known Issues

  • NVIDIA’s Windows display driver sets the timeout detection and recovery threshold to 2 seconds by default. This can cause some timeouts within TensorRT’s builder and cause crashes. If you encounter this problem, see Timeout Detection & Recovery (TDR) for information about increasing the default timeout threshold.

  • TensorRT performance on Windows is slower than on Linux due to operating system and driver differences. There are two driver modes:
    • WDDM (around 15% slower than Linux)
    • TCC (around 10% slower than Linux). TCC mode is generally not supported for GeForce GPUs; however, we recommend it for Quadro and Tesla GPUs. Detailed instructions on setting TCC mode can be found here: Tesla Compute Cluster (TCC).

  • Volta FP16 performance on CUDA 9.0 may be up to 2x slower than on CUDA 10.0. We expect to mitigate this issue in a future release.

  • Most README files that are included with the samples assume that you are working on a Linux workstation. If you are using Windows and do not have access to a Linux system with an NVIDIA GPU, then you can try using VirtualBox to create a virtual machine based on Ubuntu. You may also want to consider using a Docker container for Ubuntu. Many samples do not require any training; therefore, the CPU versions of TensorFlow and PyTorch are enough to complete the samples.

  • For sample_ssd and sample_uff_ssd, the INT8 calibration script is not supported natively on Windows. You can generate the INT8 batches on a Linux machine and copy them over in order to run sample_ssd in INT8 mode.

  • For sample_uff_ssd, the Python script convert-to-uff is not packaged within the .zip. You can generate the required .uff file on a Linux machine and copy it over in order to run sample_uff_ssd. During INT8 calibration, you may encounter a file reading error in TensorRT/data/samples/ssd/VOC2007/list.txt. This is due to line-ending differences on Windows. To work around this, open list.txt in a text editor and ensure that the file is using Unix-style line endings.

  • For sample_int8_api, the legacy runtime option is not supported on Windows.

  • When issuing -h for sampleINT8API, the --write_tensors option is missing. The --write_tensors option generates a file that contains a list of network tensor names. By default, it writes to the network_tensors.txt file. For information about additional options, issue --tensors.

TensorRT Release 5.0.3

These are the TensorRT 5.0.3 release notes for Automotive and L4T users. This release includes fixes from the previous TensorRT 5.0.x releases as well as the following additional fixes. For previous TensorRT 5.0.x release notes, see TensorRT Release Notes.

Key Features and Enhancements

This TensorRT release includes the following key features and enhancements.
  • For this TensorRT release, JetPack L4T and Drive D5L are supported by a single package.

See the TensorRT Developer Guide for details.

Compatibility

TensorRT 5.0.3 supports the following product versions:
  • CUDA 10.0
  • cuDNN 7.3.1
  • NvMedia DLA version 2.2
  • NvMedia VPI Version 2.3

Known Issues

  • For multi-process execution, specifically when executing multiple inference sessions in parallel (for example, multiple instances of trtexec) that target different accelerators, you may observe a performance degradation if cudaEventBlockingSync is used for stream synchronization.
    One way to work around this performance degradation is to use the cudaEventDefault flag when creating the events which internally uses the spin-wait synchronization mechanism. In trtexec, the default behavior is to use blocking events, but this can be overridden with the --useSpinWait option to specify spin-wait based synchronization.
    Note: The spin-wait mechanism can increase CPU utilization on the system.

    For more information about CUDA blocking sync semantics, refer to Event Management.

  • There is a known issue when attempting to cross-compile samples for mobile platforms on an x86_64 host machine. As cross-platform CUDA packages are structured differently, the following changes are required for samples/Makefile.config when cross-compiling.
    Line 80
    Add:
    -L"$(CUDA_INSTALL_DIR)/targets/$(TRIPLE)/$(CUDA_LIBDIR)/stubs"
    Line 109
    Remove:
    -lnvToolsExt

TensorRT Release 5.0.2

These are the TensorRT 5.0.2 release notes for Desktop users. This release includes fixes from the previous TensorRT 5.0.x releases as well as the following additional fixes. For previous TensorRT 5.0.x release notes, see TensorRT Release Notes.

Key Features and Enhancements

This TensorRT release includes the following key features and enhancements.

Platforms
Added support for CentOS 7.5, Ubuntu 18.04, and Windows 10.

Turing
You must use CUDA 10.0 or later if you are using a Turing GPU.

DLA (Deep Learning Accelerator)
The layers supported by DLA are Activation, Concatenation, Convolution, Deconvolution, ElementWise, FullyConnected, LRN, Pooling, and Scale. For layer specific constraints, see DLA Supported Layers. AlexNet, GoogleNet, ResNet-50, and LeNet for MNIST networks have been validated on DLA. Since DLA support is new to this release, it is possible that other CNN networks that have not been validated will not work. Report any failing CNN networks that satisfy the layer constraints by submitting a bug via the NVIDIA Developer website. Ensure you log in, click on your name in the upper right corner, click My account > My Bugs and select Submit a New Bug.
The trtexec tool can be used to run on DLA with the --useDLACore=N where N is 0 or 1, and --fp16 options. To run the MNIST network on DLA using trtexec, issue:
 ./trtexec --deploy=data/mnist/mnist.prototxt --output=prob --useDLACore=0 --fp16 --allowGPUFallback

trtexec does not support ONNX models on DLA.

Redesigned Python API
The Python API has gone through a thorough redesign to bring the API up to modern Python standards. This fixed multiple issues, including making it possible to support serialization via the Python API. Python samples using the new API include parser samples for ResNet-50, a Network API sample for MNIST, a plugin sample using Caffe, and an end-to-end sample using TensorFlow.

INT8
Support has been added for user-defined INT8 scales, using the new ITensor::setDynamicRange function. This makes it possible to define dynamic range for INT8 tensors without the need for a calibration data set. setDynamicRange currently supports only symmetric quantization. A user must either supply a dynamic range for each tensor or use the calibrator interface to take advantage of INT8 support.
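A minimal C++ sketch of supplying per-tensor dynamic ranges without a calibrator is shown below; it assumes an existing IBuilder* builder and INetworkDefinition* network, and the ±127.0f range is a placeholder for values measured from your own data.
    // Enable INT8 and provide a symmetric dynamic range for every tensor.
    builder->setInt8Mode(true);
    builder->setInt8Calibrator(nullptr); // no calibration data set required

    for (int i = 0; i < network->getNbInputs(); ++i)
        network->getInput(i)->setDynamicRange(-127.0f, 127.0f);

    for (int i = 0; i < network->getNbLayers(); ++i)
    {
        nvinfer1::ILayer* layer = network->getLayer(i);
        for (int j = 0; j < layer->getNbOutputs(); ++j)
        {
            // Replace the placeholder range with the real per-tensor range.
            layer->getOutput(j)->setDynamicRange(-127.0f, 127.0f);
        }
    }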

Plugin Registry
A new searchable plugin registry, IPluginRegistry, is a single registration point for all plugins in an application and is used to find plugin implementations during deserialization.
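As a rough C++ sketch of using the registry, the fragment below looks up a creator by {name, version} and instantiates a plugin. It assumes NvInferPlugin.h is included and that the shipped plugins have been registered; the plugin name "NMS_TRT", version "1", and the empty field collection are illustrative, since real plugins require plugin-specific fields.
    // Register the TensorRT shipped plugins, then look up a creator.
    initLibNvInferPlugins(&gLogger, "");
    nvinfer1::IPluginCreator* creator =
        getPluginRegistry()->getPluginCreator("NMS_TRT", "1");
    nvinfer1::PluginFieldCollection fields{}; // plugin-specific parameters go here
    nvinfer1::IPluginV2* plugin = creator->createPlugin("nms", &fields);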

C++ Samples
sampleSSD
This sample demonstrates how to perform inference on the Caffe SSD network in TensorRT, use TensorRT plugins to speed up inference, and perform INT8 calibration on an SSD network. To generate the required prototxt file for this sample, perform the following steps:
  1. Download models_VGGNet_VOC0712_SSD_300x300.tar.gz from: https://drive.google.com/file/d/0BzKzrI_SkD1_WVVTSmQxU0dVRzA/view
  2. Extract the contents of the tar file:
    tar xvf ~/Downloads/models_VGGNet_VOC0712_SSD_300x300.tar.gz
  3. Edit the deploy.prototxt file and change all the Flatten layers to Reshape operations with the following parameters:
    reshape_param {
        shape {
          dim: 0
          dim: -1
          dim: 1
          dim: 1
        }
    }
  4. Update the detection_out layer by adding the keep_count output, for example, add:
    top: "keep_count"
  5. Rename the deploy.prototxt file to ssd.prototxt and run the sample.
  6. To run the sample in INT8 mode, install Pillow first by issuing the $ pip install Pillow command, then follow the instructions from the README.
sampleINT8API
This sample demonstrates how to perform INT8 Inference using per-tensor dynamic range. To generate the required input data files for this sample, perform the following steps:
Running the sample:
  1. Download the Model files from GitHub, for example:
    wget https://s3.amazonaws.com/download.onnx/models/opset_3/resnet50.tar.gz
    
  2. Unzip the tar file:
    tar -xvzf resnet50.tar.gz
  3. Rename resnet50/model.onnx to resnet50/resnet50.onnx, then copy the resnet50.onnx file to the data/int8_api directory.
  4. Run the sample:
    ./sample_int8_api [-v or --verbose]
Running the sample with a custom configuration:
  1. Download the Model files from GitHub.
  2. Create an input image with a PPM extension. Resize it with the dimensions of 224x224x3.
  3. Create a file called reference_labels.txt. Ensure each line corresponds to a single ImageNet label. You can download the ImageNet 1000-class human-readable labels from here. The reference label file contains only a single label name per line, for example, 0:'tench, Tinca tinca' is represented as tench.
  4. Create a file called dynamic_ranges.txt. Ensure each line corresponds to a tensor name and its floating-point dynamic range, for example, <tensor_name> : <float dynamic range>. To generate the tensor names, iterate over the network and record the name of each tensor. The dynamic range can be obtained either from training (by measuring the min/max value of activation tensors in each epoch) or by using custom post-processing techniques (similar to TensorRT calibration). You can also choose to use a dummy per-tensor dynamic range to run the sample.

Python Samples
yolov3_onnx
This sample demonstrates a full ONNX-based pipeline for inference with the network YOLOv3-608, including pre- and post-processing.
uff_ssd
This sample demonstrates a full UFF-based inference pipeline for performing inference with an SSD (InceptionV2 feature extractor) network.

IPluginV2
A plugin class IPluginV2 has been added together with a corresponding IPluginV2 layer. The IPluginV2 class includes methods similar to IPlugin and IPluginExt, so if your plugin previously implemented IPluginExt, you can migrate by changing the class name to IPluginV2. The IPlugin and IPluginExt interfaces will be deprecated in the future; therefore, moving to the IPluginV2 interface in this release is strongly recommended.

See the TensorRT Developer Guide for details.

Breaking API Changes

  • The choice of which DLA core to run a layer on is now made at runtime. You can select the device type at build time, using the following methods:
    IBuilder::setDeviceType(ILayer* layer, DeviceType deviceType)
    IBuilder::setDefaultDeviceType(DeviceType deviceType)
    
    where DeviceType is:
    {
        kGPU,  //!< GPU Device
        kDLA,  //!< DLA Core
    };
    
    The specific DLA core to execute the engine on can be set by the following methods:
    IBuilder::setDLACore(int dlaCore)
    IRuntime::setDLACore(int dlaCore)
    
    The following methods have been added to get the DLA core set on IBuilder or IRuntime objects:
    int IBuilder::getDLACore()
    int IRuntime::getDLACore()
    
    Another API has been added to query the number of accessible DLA cores as follows:
    int IBuilder::getNbDLACores()
    int IRuntime::getNbDLACores()
    

  • The --useDLA=<int> on trtexec tool has been changed to --useDLACore=<int>, the value can range from 0 to N-1, N being the number of DLA cores. Similarly, to run any sample on DLA, use --useDLACore=<int> instead of --useDLA=<int>.

Compatibility

  • TensorRT 5.0.2 has been tested with cuDNN 7.3.1.

  • TensorRT 5.0.2 has been tested with TensorFlow 1.9.

  • This TensorRT release supports CUDA 10.0 and CUDA 9.0. CUDA 8.0 and CUDA 9.2 are no longer supported. On Windows only, CUDA 10.0 is supported for TensorRT 5.0.1 RC.

Limitations In 5.0.2

  • TensorRT 5.0.2 does not include support for DLA with the INT8 data type. Only DLA with the FP16 data type is supported by TensorRT at this time. DLA with INT8 support is planned for a future TensorRT release.

  • Android is not supported in TensorRT 5.0.2.

  • The Python API is only supported on x86-based Linux platforms.

  • The create*Plugin functions in the NvInferPlugin.h file do not have Python bindings.

  • ONNX models are not supported on DLA in TensorRT 5.0.2.

  • The included resnet_v1_152, resnet_v1_50, lenet5, and vgg19 UFF files do not support FP16 mode. This is because some of the weights fall outside the range of FP16.

  • The ONNX parser is not supported on Windows 10. This includes all samples which depend on the ONNX parser. ONNX support will be added in a future release.

  • Tensor Cores supporting INT4 were first introduced with Turing GPUs. This release of TensorRT 5.0 does not support INT4.

  • The yolov3_onnx Python sample is not supported on Ubuntu 14.04 and earlier.

  • The uff_ssd sample requires tensorflow-gpu for performing validation only. Other parts of the sample can use the CPU version of tensorflow.

  • The Leaky ReLU plugin (LReLU_TRT) allows for only a parameterized slope on a per tensor basis.

Deprecated Features

The following features are deprecated in TensorRT 5.0.2:
  • The majority of the old Python API, including the Lite and Utils API, is deprecated. It is currently still accessible in the tensorrt.legacy package, but will be removed in a future release.

  • The following Python examples are deprecated:
    • caffe_to_trt
    • pytorch_to_trt
    • tf_to_trt
    • onnx_mnist
    • uff_mnist
    • mnist_api
    • sample_onnx
    • googlenet
    • custom_layers
    • lite_examples
    • resnet_as_a_service

  • The detectionOutput Plugin has been renamed to the NMS Plugin.

  • The old ONNX parser will no longer be packaged with TensorRT; instead, use the open-source ONNX parser.

  • The DimensionTypes class is deprecated.

  • The plugin APIs that return INvPlugin are being deprecated and they now return IPluginV2. These APIs will be removed in a future release. Refer to NvInferPlugin.h inside the TensorRT package.

  • The nvinfer1::IPluginFactory, nvuffparser1::IPluginFactory, and nvuffparser1::IPluginFactoryExt plugins are still available for backward compatibility. However, it is still recommended to use the Plugin Registry and implement IPluginCreator for all new plugins.

  • The libnvinfer.a, libnvinfer_plugin.a, and libnvparsers.a libraries have been renamed to libnvinfer_static.a, libnvinfer_plugin_static.a, and libnvparsers_static.a respectively. This makes TensorRT consistent with CUDA, cuDNN, and other NVIDIA software libraries. It also avoids some ambiguity between dynamic and static libraries during linking.

Known Issues

  • Only AlexNet, GoogleNet, ResNet-50, and MNIST are known to work with DLA. Other networks may work, but they have not been extensively tested.

  • For this TensorRT release, there are separate JetPack L4T and Drive D5L packages due to differences in the DLA library dependencies. In a future release, this should become unified.

  • The static library libnvparsers_static.a requires a special build of protobuf to complete static linking. Due to filename conflicts with the official protobuf packages, these additional libraries are only included in the tar file at this time. The two additional libraries that you will need to link against are libprotobuf.a and libprotobuf-lite.a from the tar file.

  • The ONNX static libraries libnvonnxparser_static.a and libnvonnxparser_runtime_static.a require static libraries that are missing from the package in order to complete static linking. The two static libraries that are required to complete linking are libonnx_proto.a and libnvonnxparser_plugin.a, as well as the protobuf libraries mentioned earlier. You will need to build these two missing static libraries from the open source ONNX project. This issue will be resolved in a future release.

  • The C++ API documentation is not included in the TensorRT zip file. Refer to the online documentation if you want to view the TensorRT C++ API.

  • Most README files that are included with the samples assume that you are working on a Linux workstation. If you are using Windows and do not have access to a Linux system with an NVIDIA GPU, then you can try using VirtualBox to create a virtual machine based on Ubuntu. Many samples do not require any training; therefore, the CPU versions of TensorFlow and PyTorch are enough to complete the samples.

  • The TensorRT Developer Guide has been written with Linux users in mind. Windows specific instructions, where possible, will be added in a future revision of the document.

  • If sampleMovieLensMPS crashes before completing execution, an artifact (/dev/shm/sem.engine_built) will not be properly destroyed. If the sample complains about being unable to create a semaphore, remove the artifact by running rm /dev/shm/sem.engine_built.

  • To create a valid UFF file for sampleMovieLensMPS, the correct command is:
    python convert_to_uff.py sampleMovieLens.pb -p preprocess.py
    where preprocess.py is a script that is shipped with sampleMovieLens. Do not use the command specified by the README.

  • The trtexec tool does not currently validate command-line arguments. If you encounter failures, double check the command-line parameters that you provided.

TensorRT Release 5.0.1 Release Candidate (RC)

These are the release notes for the TensorRT 5.0.1 Release Candidate (RC). This release is for Windows users only. It includes several enhancements and improvements compared to the previously released TensorRT 4.0.1. This preview release is for early testing and feedback; therefore, for production use of TensorRT, continue to use TensorRT 4.0.1.

Key Features and Enhancements

This TensorRT release includes the following key features and enhancements.

Platforms
Added support for CentOS 7.5, Ubuntu 18.04, and Windows 10.

Turing
You must use CUDA 10.0 or later if you are using a Turing GPU.

DLA (Deep Learning Accelerator)
The layers supported by DLA are Activation, Concatenation, Convolution, Deconvolution, ElementWise, FullyConnected, LRN, Pooling, and Scale. For layer specific constraints, see DLA Supported Layers. Networks such as AlexNet, GoogleNet, ResNet-50, and MNIST work with DLA. Other CNN networks may work, but they have not been extensively tested and may result in failures including segfaults.
The trtexec tool can be used to run on DLA with the --useDLA=N and --fp16 options. To run the AlexNet network on DLA using trtexec, issue:
 ./trtexec --deploy=data/AlexNet/AlexNet_N2.prototxt --output=prob --useDLA=1 --fp16 --allowGPUFallback

trtexec does not support running ONNX models on DLA.

Redesigned Python API
The Python API has been rewritten from scratch and includes various improvements. In addition to several bug fixes, it is now possible to serialize and deserialize an engine to and from a file using the Python API. Python samples using the new API include parser samples for ResNet-50, a Network API sample for MNIST, a plugin sample using Caffe, and an end-to-end sample using TensorFlow.

INT8
Support has been added for user-defined INT8 scales, using the new ITensor::setDynamicRange function. This makes it possible to provide custom INT8 calibration without the need for a calibration data set. setDynamicRange currently supports only symmetric quantization. Furthermore, if no calibration table is provided, calibration scales must be provided for each layer.

Plugin Registry
A new searchable plugin registry, IPluginRegistry, is a single registration point for all plugins in an application and is used to find plugin implementations during deserialization.

sampleSSD
This sample demonstrates how to preprocess the input to the SSD network, perform inference on the SSD network in TensorRT, use TensorRT plugins to speed up inference, and perform INT8 calibration on an SSD network.
See the TensorRT Developer Guide for details.

Breaking API Changes

  • The IPluginExt API has four new methods: getPluginType, getPluginVersion, destroy, and clone. All plugins of type IPluginExt will have to implement these new methods and be recompiled. This is a temporary issue; we expect to restore compatibility with the 4.0 API in the GA release. For more information, see Migrating Plugins From TensorRT 5.0.0 RC To TensorRT 5.0.x for guidance on migration.

Compatibility

  • TensorRT 5.0.1 RC has been tested with cuDNN 7.3.0.

  • TensorRT 5.0.1 RC has been tested with TensorFlow 1.9.

  • TensorRT 5.0.1 RC for Windows has been tested with Visual Studio 2017.

  • This TensorRT release supports CUDA 10.0 and CUDA 9.0. CUDA 8.0 and CUDA 9.2 are no longer supported. On Windows only, CUDA 10.0 is supported for TensorRT 5.0.1 RC.

Limitations In 5.0.1 RC

  • For this release, there are separate JetPack L4T and Drive D5L packages due to differences in the DLA library dependencies. In a future release, this should become unified.

  • Android is not supported in TensorRT 5.0.1 RC.

  • The Python API does not support DLA.

  • The create*Plugin functions in the NvInferPlugin.h file do not have Python bindings.

  • The choice of which DLA device to run on is currently made at build time. In GA, it will be selectable at runtime.

  • ONNX models are not supported on DLA in TensorRT 5.0.1 RC.

  • The included resnet_v1_152, resnet_v1_50, lenet5, and vgg19 UFF files do not support FP16 mode. This is because some of the weights fall outside the range of FP16.

  • Python is not supported on Windows 10. This includes the graphsurgeon and UFF Python modules.

  • The ONNX parser is not supported on Windows 10. This includes all samples which depend on the ONNX parser. ONNX support will be added in a future release.

Deprecated Features

The following features are deprecated in TensorRT 5.0.1 RC:
  • The majority of the old Python API, including the Lite and Utils API, is deprecated. It is currently still accessible in the tensorrt.legacy package, but will be removed in a future release.

  • The following Python examples:
    • caffe_to_trt
    • pytorch_to_trt
    • tf_to_trt
    • onnx_mnist
    • uff_mnist
    • mnist_api
    • sample_onnx
    • googlenet
    • custom_layers
    • lite_examples
    • resnet_as_a_service

  • The detectionOutput Plugin has been renamed to the NMS Plugin.

  • The old ONNX parser will no longer be packaged with TensorRT; instead, use the open-source ONNX parser.

  • The DimensionTypes class.

  • The plugin APIs that return IPlugin are being deprecated and they now return IPluginExt. These APIs will be removed in a future release. Refer to the NvInferPlugin.h file inside the package.

  • nvinfer1::IPluginFactory, nvuffparser1::IPluginFactory, and nvuffparser1::IPluginFactoryExt (still available for backward compatibility). Instead, use the Plugin Registry and implement IPluginCreator for all new plugins.

  • libnvinfer.a, libnvinfer_plugin.a, and libnvparsers.a have been renamed to libnvinfer_static.a, libnvinfer_plugin_static.a, and libnvparsers_static.a respectively. This makes TensorRT consistent with CUDA, cuDNN, and other NVIDIA software libraries. It also avoids some ambiguity between dynamic and static libraries during linking.

Known Issues

  • The Plugin Registry will only register plugins with a unique {name, version} tuple. The API for this is likely to change in future versions to support multiple plugins with the same name and version.

  • Only AlexNet, GoogleNet, ResNet-50, and MNIST are known to work with DLA. Other networks may work, but they have not been extensively tested.

  • The static library libnvparsers_static.a requires a special build of protobuf to complete static linking. Due to filename conflicts with the official protobuf packages, these additional libraries are only included in the tar file at this time. The two additional libraries that you will need to link against are libprotobuf.a and libprotobuf-lite.a from the tar file.

  • The ONNX static libraries libnvonnxparser_static.a and libnvonnxparser_runtime_static.a require static libraries that are missing from the package in order to complete static linking. The two static libraries that are required to complete linking are libonnx_proto.a and libnvonnxparser_plugin.a, as well as the protobuf libraries mentioned earlier. You will need to build these two missing static libraries from the open source ONNX project. This issue will be resolved in a future release.

  • If you upgrade only uff-converter-tf, for example using apt-get install uff-converter-tf, then it will not upgrade graphsurgeon-tf due to inexact dependencies between these two packages. You will need to specify both packages on the command line, such as apt-get install uff-converter-tf graphsurgeon-tf in order to upgrade both packages. This will be fixed in a future release.

  • The fc_plugin_caffe_mnist python sample cannot be executed if the sample is built using pybind11 v2.2.4. We suggest that you instead clone pybind11 v2.2.3 using the following command:
    git clone -b v2.2.3 https://github.com/pybind/pybind11.git

  • The C++ API documentation is not included in the TensorRT zip file. Refer to the online documentation if you want to view the TensorRT C++ API.

  • Most README files that are included with the samples assume that you are working on a Linux workstation. If you are using Windows and do not have access to a Linux system with an NVIDIA GPU, then you can try using VirtualBox to create a virtual machine based on Ubuntu. Many samples do not require any training; therefore, the CPU versions of TensorFlow and PyTorch are enough to complete the samples.

  • The TensorRT Developer Guide has been written with Linux users in mind. Windows specific instructions, where possible, will be added in a future revision of the document.

TensorRT Release 5.0.0 Release Candidate (RC)

This is the release candidate (RC) for TensorRT 5.0.0. It includes several enhancements and improvements compared to the previously released TensorRT 4.0.1. This preview release is for early testing and feedback; therefore, for production use of TensorRT, continue to use TensorRT 4.0.1.

Key Features and Enhancements

This TensorRT release includes the following key features and enhancements.

Platforms
Added support for CentOS 7.5 and Ubuntu 18.04.

Turing
You must use CUDA 10.0 or later if you are using a Turing GPU.

DLA (Deep Learning Accelerator)
The layers supported by DLA are Activation, Concatenation, Convolution, Deconvolution, ElementWise, FullyConnected, LRN, Pooling, and Scale. For layer specific constraints, see DLA Supported Layers. Networks such as AlexNet, GoogleNet, ResNet-50, and MNIST work with DLA. Other CNN networks may work, but they have not been extensively tested and may result in failures including segfaults.
The trtexec tool can be used to run on DLA with the --useDLA=N and --fp16 options. To run the AlexNet network on DLA using trtexec, issue:
 ./trtexec --deploy=data/AlexNet/AlexNet_N2.prototxt --output=prob --useDLA=1 --fp16 --allowGPUFallback

trtexec does not support running ONNX models on DLA.

Redesigned Python API
The Python API has been rewritten from scratch and includes various improvements. In addition to several bug fixes, it is now possible to serialize and deserialize an engine to and from a file using the Python API. Python samples using the new API include parser samples for ResNet-50, a Network API sample for MNIST, a plugin sample using Caffe, and an end-to-end sample using TensorFlow.

INT8
Support has been added for user-defined INT8 scales, using the new ITensor::setDynamicRange function. This makes it possible to provide custom INT8 calibration without the need for a calibration data set. setDynamicRange currently supports only symmetric quantization. Furthermore, if no calibration table is provided, calibration scales must be provided for each layer.

Plugin Registry
A new searchable plugin registry, IPluginRegistry, is a single registration point for all plugins in an application and is used to find plugin implementations during deserialization.

See the TensorRT Developer Guide for details.

Breaking API Changes

  • The IPluginExt API has four new methods: getPluginType, getPluginVersion, destroy, and clone. All plugins of type IPluginExt will have to implement these new methods and be recompiled. This is a temporary issue; we expect to restore compatibility with the 4.0 API in the GA release. For more information, see Migrating Plugins From TensorRT 4.0.x To TensorRT 5.0 RC for guidance on migration.

  • Upcoming changes in TensorRT 5.0 GA for plugins
    • A new plugin class IPluginV2 and a corresponding IPluginV2 layer will be introduced. The IPluginV2 class includes similar methods to IPlugin and IPluginExt, so if your plugin implemented IPluginExt previously, you will change the class name to IPluginV2.

    • The IPluginCreator class will create and deserialize plugins of type IPluginV2 as opposed to IPluginExt.

    • The create*Plugin() methods in NvInferPlugin.h will return plugin objects of type IPluginV2 as opposed to IPluginExt.

Compatibility

  • TensorRT 5.0.0 RC has been tested with cuDNN 7.3.0.

  • TensorRT 5.0.0 RC has been tested with TensorFlow 1.9.

  • This TensorRT release supports CUDA 10.0 and CUDA 9.0. CUDA 8.0 and CUDA 9.2 are no longer supported.

Limitations In 5.0.0 RC

  • For this release, there are separate JetPack L4T and Drive D5L packages due to differences in the DLA library dependencies. In a future release, this should become unified.

  • Android is not supported in TensorRT 5.0.0 RC.

  • The Python API does not support DLA.

  • The create*Plugin functions in the NvInferPlugin.h file do not have Python bindings.

  • The choice of which DLA device to run on is currently made at build time. In GA, it will be selectable at runtime.

  • ONNX models are not supported on DLA in TensorRT 5.0 RC.

  • The included resnet_v1_152, resnet_v1_50, lenet5, and vgg19 UFF files do not support FP16 mode. This is because some of the weights fall outside the range of FP16.

Deprecated Features

The following features are deprecated in TensorRT 5.0.0:
  • The majority of the old Python API, including the Lite and Utils API, is deprecated. It is currently still accessible in the tensorrt.legacy package, but will be removed in a future release.

  • The following Python examples:
    • caffe_to_trt
    • pytorch_to_trt
    • tf_to_trt
    • onnx_mnist
    • uff_mnist
    • mnist_api
    • sample_onnx
    • googlenet
    • custom_layers
    • lite_examples
    • resnet_as_a_service

  • The detectionOutput Plugin has been renamed to the NMS Plugin.

  • The old ONNX parser will no longer be packaged with TensorRT; instead, use the open-source ONNX parser.

  • The DimensionTypes class.

  • The plugin APIs that return IPlugin are being deprecated and they now return IPluginExt. These APIs will be removed in a future release. Refer to the NvInferPlugin.h file inside the package.

  • nvinfer1::IPluginFactory, nvuffparser1::IPluginFactory, and nvuffparser1::IPluginFactoryExt (still available for backward compatibility). Instead, use the Plugin Registry and implement IPluginCreator for all new plugins.

  • libnvinfer.a, libnvinfer_plugin.a, and libnvparsers.a have been renamed to libnvinfer_static.a, libnvinfer_plugin_static.a, and libnvparsers_static.a respectively. This makes TensorRT consistent with CUDA, cuDNN, and other NVIDIA software libraries. It also avoids some ambiguity between dynamic and static libraries during linking.

Known Issues

  • The Plugin Registry will only register plugins with a unique {name, version} tuple. The API for this is likely to change in future versions to support multiple plugins with the same name and version.

  • Only AlexNet, GoogleNet, ResNet-50, and MNIST are known to work with DLA. Other networks may work, but they have not been extensively tested.

  • The static library libnvparsers_static.a requires a special build of protobuf to complete static linking. Due to filename conflicts with the official protobuf packages, these additional libraries are only included in the tar file at this time. The two additional libraries that you will need to link against are libprotobuf.a and libprotobuf-lite.a from the tar file.

  • The ONNX static libraries libnvonnxparser_static.a and libnvonnxparser_runtime_static.a require static libraries that are missing from the package in order to complete static linking. The two static libraries that are required to complete linking are libonnx_proto.a and libnvonnxparser_plugin.a, as well as the protobuf libraries mentioned earlier. You will need to build these two missing static libraries from the open source ONNX project. This issue will be resolved in a future release.

  • If you upgrade only uff-converter-tf, for example using apt-get install uff-converter-tf, then it will not upgrade graphsurgeon-tf due to inexact dependencies between these two packages. You will need to specify both packages on the command line, such as apt-get install uff-converter-tf graphsurgeon-tf in order to upgrade both packages. This will be fixed in a future release.

  • The fc_plugin_caffe_mnist python sample cannot be executed if the sample is built using pybind11 v2.2.4. We suggest that you instead clone pybind11 v2.2.3 using the following command:
    git clone -b v2.2.3 https://github.com/pybind/pybind11.git