cuDNN Release Notes v7.1.2
Key Features and Enhancements
The following enhancements have been added to this release:
- RNN search API extended to support all RNN algorithms.
- The newly added projection layer is supported for inference with bidirectional RNN cells, and for backward data and gradient.
- Support IDENTITY activation for all cudnnConvolutionBiasActivationForward data types.
- Added documentation to clarify RNN/LSTM weight formats.
The following are known issues in this release:
- cudnnGet picks a slow algorithm that does not use Tensor Cores on Volta when inputs are FP16 and it is possible to do so.
- There may be a small performance regression on multi-layer RNNs using the STANDARD algorithm with Tensor Core math in this release compared to v7.0.5.
- LSTM projection dgrad half precision may fail in rare cases with misaligned memory access on Pascal and Maxwell.
- Dgrad for bidirectional LSTM with projection should not be used; it may produce inaccurate results.
- The cudnnConvolutionBackwardFilter() function may output incorrect results for CUDNN_CONVOLUTION_BWD_FILTER_ALGO_FFT_TILING when the convolution mode is CUDNN_CONVOLUTION and the product n*k (n: batch size, k: number of output feature maps) is large, i.e., several thousand or more. The CUDNN_CROSS_CORRELATION mode does not appear to be affected by this issue.
- Persistent LSTM backward pass with a hidden state size in the range 257 to 512 on GPUs with number of SMs between 22 and 31 might hang. This issue also exists in 7.1.1 and will be fixed in 7.1.3.
- Persistent GRU backward pass with a hidden state size in the range 513 to 720 on GPUs with exactly 30 SMs would hang. This issue also exists in 7.1.1 and will be fixed in 7.1.3.
- Algo 1 for forward convolution and dgrad may produce erroneous results when the filter size is greater than the input size.
The following issues have been fixed in this release:
- The uint8 input for convolution was previously restricted to Volta and later. Support has been added for older architectures, for algo:
- In some cases when algorithm CUDNN_CONVOLUTION_BWD_FILTER_ALGO1 was selected, the routine cudnnConvolutionBackwardFilter() could fail at runtime and return CUDNN_STATUS_EXECUTION_FAILED. It now returns CUDNN_STATUS_NOT_SUPPORTED.
- cudnnSetRNNDescriptor no longer requires a valid dropout descriptor in inference mode; the user can pass NULL for the dropout descriptor in inference mode.