cuDNN Release Notes v7.0.4

Key Features and Enhancements

Improved performance of grouped convolutions when the number of input channels and output channels per group is 1, 2, or 4, for the following algorithms:

  • CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM
  • CUDNN_CONVOLUTION_BWD_DATA_ALGO_0
  • CUDNN_CONVOLUTION_BWD_DATA_ALGO_1
  • CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0
  • CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1
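
As a rough illustration of which layer shapes this item targets, the sketch below checks whether a grouped convolution has 1, 2, or 4 input channels and output channels per group. The helper names and parameters are hypothetical and not part of the cuDNN API. The sketch also assumes that the two per-group counts may differ from each other as long as each is 1, 2, or 4, which the note does not state explicitly.

```c
#include <stdbool.h>

/* Hypothetical helper, not a cuDNN function:
 * true when a per-group channel count is one of the sizes named in the note. */
static bool is_small_group_size(int channels_per_group)
{
    return channels_per_group == 1 ||
           channels_per_group == 2 ||
           channels_per_group == 4;
}

/* Hypothetical helper, not a cuDNN function:
 * true when both the input and output channel counts divide evenly by the
 * group count and each per-group count is 1, 2, or 4. */
bool has_small_channels_per_group(int in_channels, int out_channels,
                                  int group_count)
{
    if (in_channels <= 0 || out_channels <= 0 || group_count <= 0) {
        return false;
    }
    if (in_channels % group_count != 0 || out_channels % group_count != 0) {
        return false;  /* not a valid grouped convolution shape */
    }
    return is_small_group_size(in_channels / group_count) &&
           is_small_group_size(out_channels / group_count);
}
```

For example, a depthwise layer with 32 input channels, 32 output channels, and 32 groups has one channel per group on each side and passes the check. In cuDNN 7 the group count itself is set on the convolution descriptor with cudnnSetConvolutionGroupCount().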

Known Issues

The following are known issues in this release:

  • The CUDA 8.0 build of cuDNN may produce incorrect computations when run on Volta.
  • The cuDNN library triggers a CPU floating-point exception when FP exceptions are enabled by the user. This issue exists in all 7.0.x releases.
  • Some heavy use cases of RNN layers might hit a memory allocation issue in the CUDA driver when using cuDNN v7 with CUDA 8.0 and the R375 driver on pre-Pascal architectures (Kepler and Maxwell). In these cases, subsequent CUDA kernels may fail to launch with Error Code 30. To resolve the issue, use the latest R384 driver (from NVIDIA driver downloads) or ensure that the persistence daemon is started. This behavior is observed in all 7.0.x releases.
  • When using TENSOR_OP_MATH mode with cudnnConvolutionBiasActivationForward, the pointer to the bias must be aligned to 16 bytes and the size of the allocated memory must be a multiple of 256 elements. This behavior exists in all 7.0.x releases.
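
The bias constraint in the last item can be checked on the host before the call. The following is a minimal sketch, assuming the caller tracks the bias length in elements; the macro and function names are hypothetical and not part of cuDNN.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values taken from the known issue above; the names are hypothetical. */
#define BIAS_ALIGNMENT_BYTES  16u
#define BIAS_ELEMENT_MULTIPLE 256u

/* Rounds a bias length (in elements) up to the next multiple of 256,
 * giving the element count to allocate. */
size_t padded_bias_elements(size_t bias_elements)
{
    return (bias_elements + BIAS_ELEMENT_MULTIPLE - 1)
           / BIAS_ELEMENT_MULTIPLE * BIAS_ELEMENT_MULTIPLE;
}

/* True when the pointer value is a multiple of 16 bytes. */
bool bias_pointer_is_aligned(const void *bias)
{
    return ((uintptr_t)bias % BIAS_ALIGNMENT_BYTES) == 0;
}
```

A caller would allocate padded_bias_elements(n) elements rather than n. The alignment check matters mostly when the bias pointer is an offset into a larger allocation rather than the start of its own buffer.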

Fixed Issues

The following issues have been fixed in this release:

  • Fixed out-of-bounds global memory accesses in the 256-point 1D FFT kernel. The problem affected convolutions with 1x1 filters and tall but narrow images, e.g., 1x500 (WxH). In those cases, the workspace size for the FFT_TILING algorithm was computed incorrectly; the FFT kernel itself contained no error.
  • Eliminated a source of floating-point exceptions in the CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD_NONFUSED algorithm. The host code that generated a negative infinity floating-point value was replaced with different logic. FP exceptions are disabled by default, but a user program can enable them by invoking feenableexcept(). There are at least two other sources of FP exceptions in the cuDNN library, affecting, for example, BATCHNORM_SPATIAL_PERSISTENT. Those sources will be eliminated in future releases of the cuDNN library.