cuDNN Release Notes :: Deep Learning SDK Documentation

Key Features and Enhancements

The following enhancements have been added to this release:

Following are known issues in this release:

cuDNN library may trigger a CPU floating point exception when FP exceptions are enabled by user. This issue exists for all 7.0.x releases.
There are heavy use cases of RNN layers that might hit a memory allocation issue in the CUDA driver when using cuDNN v7 with CUDA 8.0 and R375 driver on pre-Pascal architectures (Kepler and Maxwell). In these cases, subsequent CUDA kernels may fail to launch with an Error Code 30. To resolve the issue, it is recommended to use the latest R384 driver (from NVIDIA driver downloads) or to ensure that the persistence daemon is started. This behavior is observed on all 7.0.x releases.
When using TENSOR_OP_MATH mode with cudnnConvolutionBiasActivationForward, the pointer to the bias must be aligned to 16 bytes and the size of allocated memory must be multiples of 256 elements. This behavior exists for all 7.0.x releases.

The following issues have been fixed in this release:

Corrected the algorithm fallback behavior in RNN when user set to use CUDNN_TENSOR_OP_MATH when using compute card without Tensor Cores. Instead of returning CUDNN_STATUS_NOT_SUPPORTED, the RNN algorithm will now continue to run using CUDNN_DEFAULT_MATH. The correct behavior is to fall back to using default math when Tensor Core is not supported. Fixed to the expected behavior.
On Volta hardware, BWD_FILTER_ALGO_1 and BWD_DATA_ALGO_1 convolutions using a number of filter elements greater than 512 were causing CUDA_ERROR_ILLEGAL_ADDRESS and CUDNN_STATUS_INTERNAL_ERROR errors. Logic was added to fall back to a generic kernel for these filter sizes.
cuDNN v7 with CUDA 8.0 produced erroneous results on Volta for some common cases of Algo 1. Logic was added to fall back to a generic kernel when cudnn v7 with CUDA 8.0 is used on Volta.