Key Features and Enhancements
The following enhancements have been added to this release:
- RNN search API extended to support all RNN algorithms.
- The newly added projection layer is supported for inference with bidirectional RNN cells, and for backward data and gradient.
- Support for the IDENTITY activation for all cudnnConvolutionBiasActivationForward data types for CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM.
- Added documentation to clarify RNN/LSTM weight formats.
Known Issues
The following are known issues in this release:
- cudnnGet may pick a slow algorithm that does not use Tensor Cores on Volta when the inputs are FP16, even though a Tensor Core algorithm is available.
- There may be a small performance regression on multi-layer RNNs using the STANDARD algorithm with Tensor Core math in this release compared to v7.0.5.
- The half-precision LSTM projection dgrad may fail in rare cases with a misaligned memory access on Pascal and Maxwell.
- Dgrad for a bidirectional LSTM with projection should not be used; it may produce inaccurate results or return CUDNN_STATUS_UNSUPPORTED.
- The cudnnConvolutionBackwardFilter() function may output incorrect results for CUDNN_CONVOLUTION_BWD_FILTER_ALGO_FFT_TILING when the convolution mode is CUDNN_CONVOLUTION and the product n*k (n - batch size, k - number of output feature maps) is large, i.e., several thousand or more. The CUDNN_CROSS_CORRELATION mode appears to be unaffected.
- The persistent LSTM backward pass with a hidden state size in the range 257 to 512, on GPUs with 22 to 31 SMs, might hang. This issue also exists in 7.1.1 and will be fixed in 7.1.3.
- The persistent GRU backward pass with a hidden state size in the range 513 to 720, on GPUs with exactly 30 SMs, might hang. This issue also exists in 7.1.1 and will be fixed in 7.1.3.
- Algo 1 for forward convolution and dgrad may produce erroneous results when the filter size is greater than the input size.
Fixed Issues
The following issues have been fixed in this release:
- The uint8 input for convolution was previously restricted to Volta and later. Support has been added for older architectures for the algorithm CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM.
- In some cases when the algorithm CUDNN_CONVOLUTION_BWD_FILTER_ALGO1 was selected, the routine cudnnConvolutionBackwardFilter could fail at runtime and return CUDNN_STATUS_EXECUTION_FAILED. It now returns CUDNN_STATUS_NOT_SUPPORTED.
- cudnnSetRNNDescriptor no longer requires a valid dropout descriptor in inference mode; the user can pass NULL for the dropout descriptor in inference mode.