TensorRT Release 3.0.2

This TensorRT 3.0.2 General Availability release is a minor release that includes improvements and fixes over the previously released TensorRT 3.0.1.

Key Features and Enhancements

This TensorRT release includes the following key features and enhancements.

  • Fixed a bug in one of the INT8 deconvolution kernels that was generating incorrect results. This fixes an accuracy regression from TensorRT 2.1 for networks that use deconvolutions.
  • Fixed a bug where the builder would report out-of-memory when compiling a low-precision network if a low-precision version of a kernel could not be found. The builder now correctly falls back to a higher-precision version of the kernel. (See the sketch after this list for how low-precision builds are requested.)
  • Fixed a bug where the availability of some low-precision kernels was being incorrectly reported to the builder.
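
As context for the two builder fixes above, low-precision builds are requested through the builder API. The following is a minimal sketch against the TensorRT 3.x C++ API, not taken from this release; the trivial softmax network exists only to keep the example self-contained. With this release, if a layer has no low-precision kernel, buildCudaEngine() selects a higher-precision kernel instead of failing.

    #include <NvInfer.h>
    #include <iostream>

    // Minimal logger required by the TensorRT API.
    class Logger : public nvinfer1::ILogger
    {
        void log(Severity severity, const char* msg) override
        {
            if (severity <= Severity::kWARNING)
                std::cout << msg << std::endl;
        }
    } gLogger;

    int main()
    {
        nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(gLogger);
        nvinfer1::INetworkDefinition* network = builder->createNetwork();

        // Placeholder network; a real application would populate this from
        // a parser or layer by layer.
        auto* input = network->addInput("data", nvinfer1::DataType::kFLOAT,
                                        nvinfer1::DimsCHW{1, 1, 10});
        auto* softmax = network->addSoftMax(*input);
        network->markOutput(*softmax->getOutput(0));

        builder->setMaxBatchSize(1);
        builder->setMaxWorkspaceSize(1 << 20);

        // Request FP16 kernels where the platform has them; INT8 mode
        // (setInt8Mode plus a calibrator) follows the same pattern. Layers
        // without a low-precision kernel now fall back to higher precision.
        if (builder->platformHasFastFp16())
            builder->setHalf2Mode(true);

        nvinfer1::ICudaEngine* engine = builder->buildCudaEngine(*network);
        if (engine)
            engine->destroy();
        network->destroy();
        builder->destroy();
        return 0;
    }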

Using TensorRT 3.0.2

Ensure you are familiar with the following notes when using this release.
  • When working with large networks and large batch sizes on the Jetson TX1, you may see failures that are the result of CUDA error 4. This error generally means a CUDA kernel failed to execute properly, but it can also mean that the CUDA kernel timed out. The CPU and GPU share memory on the Jetson TX1, so reducing the memory used by the CPU helps the situation. If you are not using the graphical display on L4T, you can stop the X11 server to free up CPU and GPU memory using:
    $ sudo systemctl stop lightdm.service
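
    If you later need the graphical display again, the X11 server can be restarted the same way (this assumes lightdm is the display manager in use, as in the command above):

    $ sudo systemctl start lightdm.service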

Known Issues

  • INT8 deconvolutions with biases have the bias scaled incorrectly. U-Net-based segmentation networks typically have non-zero biases.
  • On TensorRT for 32-bit Android, you may see TensorRT failures if memory usage is high. The issue occurs when a CUDA-allocated buffer address is greater than or equal to 0x80000000 (the 2 GB boundary of the 32-bit address space), and it is difficult to predict the exact memory usage at which this issue is hit.
  • If you are installing TensorRT from a tar package (instead of using the .deb packages and apt-get), you will need to update the custom_plugins example to point to the location where the tar package was installed. For example, in the <PYTHON_INSTALL_PATH>/tensorrt/examples/custom_layers/tensorrtplugins/setup.py file, change the following (an example of the edited file follows this list):
    • Change TENSORRT_INC_DIR to point to the <TAR_INSTALL_ROOT>/include directory.
    • Change TENSORRT_LIB_DIR to point to the <TAR_INSTALL_ROOT>/lib directory.
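
    For example, assuming the two settings are plain string assignments in setup.py (a sketch; check the file for the exact form), the edited lines would read:

      TENSORRT_INC_DIR = '<TAR_INSTALL_ROOT>/include'
      TENSORRT_LIB_DIR = '<TAR_INSTALL_ROOT>/lib'
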
  • If you were previously using the machine learning Debian repository, it will conflict with the version of libcudnn7 that is contained in the local repository for TensorRT. The following commands downgrade libcudnn7 to the CUDA 9.0 version, which is supported by TensorRT, and hold the package at this version.
    sudo apt-get install libcudnn7=7.0.5.15-1+cuda9.0 \
        libcudnn7-dev=7.0.5.15-1+cuda9.0
    sudo apt-mark hold libcudnn7 libcudnn7-dev
    
    If you would like to upgrade libcudnn7 to the latest version later, you can use the following commands to remove the hold.
    sudo apt-mark unhold libcudnn7 libcudnn7-dev
    sudo apt-get dist-upgrade