cuDNN Release Notes v7.0.5
Key Features and Enhancements
The following enhancements have been added to this release:
Known Issues
Following are known issues in this release:
- The cuDNN library may trigger a CPU floating-point exception when FP exceptions are enabled by the user. This issue exists for all 7.0.x releases.
- Heavy use of RNN layers might hit a memory allocation issue in the CUDA driver when using cuDNN v7 with CUDA 8.0 and the R375 driver on pre-Pascal architectures (Kepler and Maxwell). In these cases, subsequent CUDA kernels may fail to launch with Error Code 30. To resolve the issue, use the latest R384 driver (from NVIDIA driver downloads) or ensure that the persistence daemon is started. This behavior is observed on all 7.0.x releases.
- When using TENSOR_OP_MATH mode with cudnnConvolutionBiasActivationForward, the pointer to the bias must be aligned to 16 bytes and the allocated memory must cover a multiple of 256 elements. This behavior exists for all 7.0.x releases.
Fixed Issues
The following issues have been fixed in this release:
- Corrected the algorithm fallback behavior in RNNs when the user requests CUDNN_TENSOR_OP_MATH on a compute card without HMMA support. Instead of returning CUDNN_STATUS_NOT_SUPPORTED, the RNN algorithm now continues to run using CUDNN_DEFAULT_MATH, which is the expected behavior when Tensor Cores are not supported.
- On Volta hardware, BWD_DATA_ALGO_1 convolutions using a number of filter elements greater than 512 were causing CUDNN_STATUS_INTERNAL_ERROR errors. Logic was added to fall back to a generic kernel for these filter sizes.
- cuDNN v7 with CUDA 8.0 produced erroneous results on Volta for some common cases of Algo 1. Logic was added to fall back to a generic kernel when cuDNN v7 with CUDA 8.0 is used on Volta.