cuDNN Release Notes v7.1.3

Known Issues

The following are known issues in this release:

  • The cudnnGet* heuristics may pick a slow algorithm that does not use Tensor Cores on Volta when the inputs are FP16, even when a Tensor Core algorithm is available (see the first sketch after this list).
  • The cudnnConvolutionBackwardFilter() function may output incorrect results for CUDNN_CONVOLUTION_BWD_FILTER_ALGO_FFT_TILING when the convolution mode is CUDNN_CONVOLUTION and the product n*k (n = batch size, k = number of output feature maps) is large, that is, several thousand or more. The CUDNN_CROSS_CORRELATION mode appears to be unaffected by this bug (see the second sketch after this list).
  • There is a small memory leak when calling cudnnRNNBackwardData() with CUDNN_RNN_ALGO_STANDARD. This issue also affects previous cuDNN v7 releases.
  • RNNs with half precision do not work on Kepler GPUs and return CUDNN_STATUS_EXECUTION_FAILED. A future release will change this case to return CUDNN_STATUS_NOT_SUPPORTED.
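
For the cudnnGet heuristic issue above, a minimal workaround sketch in C, assuming handle, xDesc, wDesc, convDesc, and yDesc are already created and configured for FP16 by the caller: explicitly opt in to Tensor Core math on the convolution descriptor, use the exhaustive cudnnFindConvolutionForwardAlgorithm() search instead of the cudnnGet heuristic, and check which math type the fastest algorithm actually used.

    #include <cudnn.h>
    #include <stdio.h>

    /* Sketch: prefer Tensor Core math and verify the measured winner uses it.
     * All descriptors are assumed to be set up for FP16 data elsewhere. */
    cudnnStatus_t pickTensorOpFwdAlgo(cudnnHandle_t handle,
                                      cudnnTensorDescriptor_t xDesc,
                                      cudnnFilterDescriptor_t wDesc,
                                      cudnnConvolutionDescriptor_t convDesc,
                                      cudnnTensorDescriptor_t yDesc,
                                      cudnnConvolutionFwdAlgo_t *algo)
    {
        /* Opt in to Tensor Core math; cuDNN falls back to regular math
         * where Tensor Cores cannot be used. */
        cudnnStatus_t st = cudnnSetConvolutionMathType(convDesc,
                                                       CUDNN_TENSOR_OP_MATH);
        if (st != CUDNN_STATUS_SUCCESS) return st;

        cudnnConvolutionFwdAlgoPerf_t perf[CUDNN_CONVOLUTION_FWD_ALGO_COUNT];
        int returned = 0;
        st = cudnnFindConvolutionForwardAlgorithm(
                 handle, xDesc, wDesc, convDesc, yDesc,
                 CUDNN_CONVOLUTION_FWD_ALGO_COUNT, &returned, perf);
        if (st != CUDNN_STATUS_SUCCESS || returned == 0) return st;

        /* perf[0] is the fastest measured algorithm. */
        if (perf[0].mathType != CUDNN_TENSOR_OP_MATH)
            fprintf(stderr, "warning: fastest algo is not using Tensor Cores\n");
        *algo = perf[0].algo;
        return perf[0].status;
    }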
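
For the CUDNN_CONVOLUTION_BWD_FILTER_ALGO_FFT_TILING issue above, a hedged guard in the same spirit. The helper safeBwdFilterAlgo() and the N_K_LIMIT cutoff are illustrative stand-ins for "several thousand", not values from these notes.

    #include <cudnn.h>

    /* Hypothetical cutoff for "several thousand"; tune as needed. */
    #define N_K_LIMIT 4096

    /* If the suggested algorithm hits the affected combination, fall back
     * to the deterministic ALGO_1 path instead. */
    cudnnConvolutionBwdFilterAlgo_t
    safeBwdFilterAlgo(cudnnConvolutionBwdFilterAlgo_t chosen,
                      cudnnConvolutionMode_t mode, long long n, long long k)
    {
        if (chosen == CUDNN_CONVOLUTION_BWD_FILTER_ALGO_FFT_TILING &&
            mode == CUDNN_CONVOLUTION && n * k >= N_K_LIMIT)
            return CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1;
        return chosen;
    }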

Fixed Issues

The following issues have been fixed in this release (a version-check sketch follows the list):

  • cudnnRNNBackwardData() for LSTM with recurrent projection in half precision could fail in rare cases with a misaligned memory access on Pascal and Maxwell GPUs.
  • cudnnRNNBackwardData() for bidirectional LSTM with recurrent projection could produce inaccurate results or return CUDNN_STATUS_NOT_SUPPORTED.
  • Algo 1 for forward convolution and data gradient (dgrad) could produce erroneous results when the filter size is greater than the input size.
  • For very large RNN networks, the functions cudnnGetRNNWorkspaceSize() and cudnnGetRNNTrainingReserveSize() could internally overflow and return incorrect results.
  • The small performance regression on multi-layer RNNs using the STANDARD algorithm and Tensor Core math in 7.1.2, as compared to 7.0.5, is fixed in this release.
  • The Persistent LSTM backward pass could hang when the hidden state size was in the range 257 to 512 on GPUs with between 22 and 31 SMs. This issue also exists in 7.1.1.
  • The Persistent GRU backward pass could hang when the hidden state size was in the range 513 to 720 on GPUs with exactly 30 SMs. This issue also exists in 7.1.1.
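
A minimal sketch, assuming you want to confirm at startup that a deployment picked up these fixes: cudnnGetVersion() reports the loaded library version as a single integer, and 7103 corresponds to v7.1.3.

    #include <cudnn.h>
    #include <stdio.h>

    int main(void)
    {
        size_t v = cudnnGetVersion();  /* 7103 for cuDNN v7.1.3 */
        if (v < 7103) {
            fprintf(stderr, "cuDNN %zu loaded; the fixes above need >= 7103\n", v);
            return 1;
        }
        printf("cuDNN %zu: v7.1.3 fixes present\n", v);
        return 0;
    }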