Key Features and Enhancements
The following enhancements have been added to this release:
- None.
Known Issues
The following are known issues in this release:
- cudnnGet picks a slow algorithm that does not use Tensor Cores on Volta when inputs are FP16 and it is possible to do so.
- The cudnnConvolutionBackwardFilter() function may output incorrect results for CUDNN_CONVOLUTION_BWD_FILTER_ALGO_FFT_TILING when the convolution mode is CUDNN_CONVOLUTION and the product n*k (n: batch size, k: number of output feature maps) is large, i.e., several thousand or more. The CUDNN_CROSS_CORRELATION mode does not appear to be affected by this bug.
- There is a small memory leak when calling cudnnRNNBackwardData with CUDNN_RNN_ALGO_STANDARD. This issue also affects previous cuDNN v7 releases.
- RNN with half precision will not work on Kepler GPUs and will return CUDNN_EXECUTION_FAILED. This will be fixed in a future release to return CUDNN_STATUS_UNSUPPORTED.
Fixed Issues
The following issues have been fixed in this release:
- cudnnRNNBackwardData for LSTM with recurrent projection in half precision may fail in rare cases with misaligned memory access on Pascal and Maxwell.
- cudnnRNNBackwardData for bidirectional LSTM with recurrent projection may produce inaccurate results, or return CUDNN_STATUS_UNSUPPORTED.
- Algo 1 for forward convolution and dgrad may produce erroneous results when the filter size is greater than the input size. This issue is fixed in 7.1.3.
- For very large RNN networks, the functions cudnnGetRNNWorkspaceSize and cudnnGetRNNTrainingReserveSize may internally overflow and return incorrect results.
- The small performance regression on multi-layer RNNs using the STANDARD algorithm and Tensor Core math in 7.1.2, as compared to 7.0.5, is fixed in this release.
- Fixed an issue where the Persistent LSTM backward pass, with a hidden state size in the range 257 to 512 on GPUs with between 22 and 31 SMs, might hang. This issue also exists in 7.1.1 and is fixed in 7.1.3.
- Fixed an issue where the Persistent GRU backward pass, with a hidden state size in the range 513 to 720 on GPUs with exactly 30 SMs, would hang. This issue also exists in 7.1.1 and is fixed in 7.1.3.