cuDNN Release Notes v7.1.3
Following are known issues in this release:
- cudnnGet picks a slow algorithm that does not use Tensor Cores on Volta when the inputs are FP16 and it is possible to do so. A workaround sketch follows this list.
- The cudnnConvolutionBackwardFilter() function may output incorrect results for CUDNN_CONVOLUTION_BWD_FILTER_ALGO_FFT_TILING when the convolution mode is CUDNN_CONVOLUTION and the product n*k (n = batch size, k = number of output feature maps) is large, that is, several thousand or more. The CUDNN_CROSS_CORRELATION mode appears to be unaffected by this bug. A guard sketch also follows this list.
- There is a small memory leak issue when using CUDNN_RNN_ALGO_STANDARD. This issue also affects previous cuDNN v7 releases.
- RNN with half precision will not work on Kepler GPUs and will return CUDNN_EXECUTION_FAILED. This will be fixed in a future release to return a more informative error status.
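As a possible workaround for the first issue, autotuning with cudnnFindConvolutionForwardAlgorithm measures real runtimes and therefore considers Tensor Core algorithms that the cudnnGet heuristic may skip. The following is a minimal sketch, not from the release notes; pick_fwd_algo is an illustrative name, and the handle and descriptors are assumed to be already configured (error checking omitted):

    #include <cudnn.h>

    /* Illustrative workaround sketch: opt in to Tensor Core math and
     * autotune rather than relying on the cudnnGet heuristic. */
    cudnnConvolutionFwdAlgo_t pick_fwd_algo(cudnnHandle_t handle,
                                            cudnnTensorDescriptor_t xDesc,
                                            cudnnFilterDescriptor_t wDesc,
                                            cudnnConvolutionDescriptor_t convDesc,
                                            cudnnTensorDescriptor_t yDesc)
    {
        /* Allow Tensor Core kernels on this convolution (cuDNN 7+). */
        cudnnSetConvolutionMathType(convDesc, CUDNN_TENSOR_OP_MATH);

        int returned = 0;
        cudnnConvolutionFwdAlgoPerf_t perf[CUDNN_CONVOLUTION_FWD_ALGO_COUNT];
        /* cudnnFind* benchmarks each algorithm, so Tensor Core paths that
         * the heuristic skips on Volta with FP16 are still considered. */
        cudnnFindConvolutionForwardAlgorithm(handle, xDesc, wDesc, convDesc,
                                             yDesc,
                                             CUDNN_CONVOLUTION_FWD_ALGO_COUNT,
                                             &returned, perf);
        return perf[0].algo; /* results are sorted fastest-first */
    }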
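For the CUDNN_CONVOLUTION_BWD_FILTER_ALGO_FFT_TILING issue, a caller can guard against the affected configuration before dispatching. A minimal sketch under stated assumptions: avoid_fft_tiling_bwd_filter is an illustrative helper, and the 4096 threshold is a stand-in for "several thousand or more":

    #include <cudnn.h>
    #include <stdbool.h>

    /* Hypothetical guard, not part of cuDNN: true when the chosen algorithm
     * matches the affected configuration described above. */
    bool avoid_fft_tiling_bwd_filter(cudnnConvolutionDescriptor_t convDesc,
                                     cudnnConvolutionBwdFilterAlgo_t algo,
                                     int n,  /* batch size */
                                     int k)  /* output feature maps */
    {
        int pad_h, pad_w, u, v, dil_h, dil_w;
        cudnnConvolutionMode_t mode;
        cudnnDataType_t computeType;
        cudnnGetConvolution2dDescriptor(convDesc, &pad_h, &pad_w, &u, &v,
                                        &dil_h, &dil_w, &mode, &computeType);
        return algo == CUDNN_CONVOLUTION_BWD_FILTER_ALGO_FFT_TILING
            && mode == CUDNN_CONVOLUTION     /* cross-correlation appears safe */
            && (long long)n * k >= 4096;     /* illustrative threshold */
    }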
The following issues have been fixed in this release:
- cudnnRNNBackwardData for LSTM with recurrent projection in half precision may fail in rare cases with misaligned memory access on Pascal and Maxwell.
- cudnnRNNBackwardData for bidirectional LSTM with recurrent projection may produce inaccurate results, or return CUDNN_STATUS_UNSUPPORTED.
- Algo 1 for forward convolution and dgrad may produce erroneous results when the filter size is greater than the input size. This issue is fixed in 7.1.3.
- For very large RNN networks, the function cudnnGetRNNTrainingReserveSize may internally overflow and give incorrect results. A query sketch follows this list.
- The small performance regression on multi-layer RNNs using the STANDARD algorithm and Tensor Core math in 7.1.2, as compared to 7.0.5, is fixed in this release.
- Fixed an issue where the persistent LSTM backward pass with a hidden state size in the range 257 to 512 might hang on GPUs with 22 to 31 SMs. This issue also existed in 7.1.1 and is fixed in 7.1.3.
- Fixed an issue where the persistent GRU backward pass with a hidden state size in the range 513 to 720 would hang on GPUs with exactly 30 SMs. This issue also existed in 7.1.1 and is fixed in 7.1.3.
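For reference, the reserve-size overflow above affected the query shown below. A minimal sketch, assuming rnnDesc is configured and xDesc is an array of seqLength per-time-step tensor descriptors; rnn_reserve_bytes is an illustrative wrapper, not a cuDNN function:

    #include <cudnn.h>

    /* Sketch of the affected query (fixed in 7.1.3). */
    size_t rnn_reserve_bytes(cudnnHandle_t handle,
                             cudnnRNNDescriptor_t rnnDesc,
                             int seqLength,
                             const cudnnTensorDescriptor_t *xDesc)
    {
        size_t bytes = 0;
        /* Before 7.1.3, this computation could overflow internally for very
         * large networks and report an incorrect size. */
        cudnnGetRNNTrainingReserveSize(handle, rnnDesc, seqLength, xDesc,
                                       &bytes);
        return bytes;
    }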