cuDNN Release Notes v7.1.2
Key Features and Enhancements
The following enhancements have been added to this release:
- RNN search API extended to support all RNN algorithms.
- The newly added projection layer is supported for inference with bidirectional RNN cells, and for backward data and gradient.
- Support IDENTITY activation for all cudnnConvolutionBiasActivationForward data types.
- Added documentation to clarify RNN/LSTM weight formats.
The following are known issues in this release:
- cudnnGet picks a slow algorithm that does not use Tensor Cores on Volta when inputs are FP16 and it is possible to do so.
- There may be a small performance regression on multi-layer RNNs using the STANDARD algorithm with Tensor Core math in this release compared to v7.0.5.
- LSTM projection dgrad half precision may fail in rare cases with misaligned memory access on Pascal and Maxwell.
- Dgrad for bidirectional LSTM with projection should not be used; it may produce inaccurate results.
- The cudnnConvolutionBackwardFilter() function may output incorrect results for CUDNN_CONVOLUTION_BWD_FILTER_ALGO_FFT_TILING when the convolution mode is CUDNN_CONVOLUTION and the product n*k (n: batch size, k: number of output feature maps) is large, i.e., several thousand or more. The CUDNN_CROSS_CORRELATION mode does not appear to be affected by this issue.
- Persistent LSTM backward pass with a hidden state size in the range 257 to 512 on GPUs with number of SMs between 22 and 31 might hang. This issue also exists in 7.1.1 and will be fixed in 7.1.3.
- Persistent GRU backward pass with a hidden state size in the range 513 to 720 on GPUs with exactly 30 SMs would hang. This issue also exists in 7.1.1 and will be fixed in 7.1.3.
- Algo 1 for forward convolution and dgrad may produce erroneous results when the filter size is greater than the input size.
The following issues have been fixed in this release:
- The uint8 input for convolution was previously restricted to Volta and later. Support has been added for older architectures, for algo:
- In some cases when algorithm CUDNN_CONVOLUTION_BWD_FILTER_ALGO1 was selected, the routine cudnnConvolutionBackwardFilter() could fail at runtime and return CUDNN_STATUS_EXECUTION_FAILED. It now returns CUDNN_STATUS_NOT_SUPPORTED.
- cudnnSetRNNDescriptor no longer requires a valid dropout descriptor in inference mode; the user can pass NULL for the dropout descriptor in inference mode.