Key Features and Enhancements
The following enhancements have been added to this release:
- Improved performance for some cases of data-gradient convolutions and max pooling. This is expected to improve performance of ResNet-50-like networks.
- The runtime of the RNN Find algorithm suite is improved in v7.1.4, resulting in slightly improved runtime of cudnnFindRNN***AlgorithmEx.
Known Issues
The following are known issues in this release:
- cudnnGet picks a slow algorithm that does not use Tensor Cores on Volta when inputs are FP16 and it is possible to do so.
- The cudnnConvolutionBackwardFilter() function may output incorrect results for CUDNN_CONVOLUTION_BWD_FILTER_ALGO_FFT_TILING when the convolution mode is CUDNN_CONVOLUTION. This function should not be used in this mode.
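One way to act on the FFT_TILING issue above is to filter the candidate algorithms before choosing one. The following is a minimal sketch, not cuDNN code: to keep it compilable without cudnn.h, the enums and the perf struct are local stand-ins with hypothetical demo_ names. In a real application the candidates would more plausibly come from an algorithm search such as cudnnFindConvolutionBackwardFilterAlgorithm, and the mode from the convolution descriptor.

```c
#include <stddef.h>

/* Local stand-ins (hypothetical names) so the sketch compiles without cudnn.h.
   In real code these would correspond to cudnnConvolutionMode_t,
   cudnnConvolutionBwdFilterAlgo_t, and the algo/status fields of
   cudnnConvolutionBwdFilterAlgoPerf_t. */
typedef enum {
    DEMO_MODE_CONVOLUTION,
    DEMO_MODE_CROSS_CORRELATION
} demo_conv_mode_t;

typedef enum {
    DEMO_BWD_FILTER_ALGO_0,
    DEMO_BWD_FILTER_ALGO_1,
    DEMO_BWD_FILTER_ALGO_FFT,
    DEMO_BWD_FILTER_ALGO_FFT_TILING
} demo_bwd_filter_algo_t;

typedef struct {
    demo_bwd_filter_algo_t algo; /* candidate algorithm */
    int ok;                      /* nonzero if the candidate reported success */
} demo_algo_perf_t;

/* Returns the index of the first usable candidate, or -1 if none qualifies.
   Candidates are assumed to be sorted fastest first. FFT_TILING is skipped
   when the convolution mode is CONVOLUTION, per the known issue. */
int demo_pick_bwd_filter_algo(const demo_algo_perf_t *perf, size_t count,
                              demo_conv_mode_t mode)
{
    for (size_t i = 0; i < count; ++i) {
        if (!perf[i].ok) {
            continue; /* candidate failed or is unsupported */
        }
        if (mode == DEMO_MODE_CONVOLUTION &&
            perf[i].algo == DEMO_BWD_FILTER_ALGO_FFT_TILING) {
            continue; /* may produce incorrect results in this mode */
        }
        return (int)i;
    }
    return -1;
}
```

With a candidate list of FFT_TILING followed by ALGO_1, this picks ALGO_1 in CONVOLUTION mode and FFT_TILING in CROSS_CORRELATION mode, where the known issue does not apply.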
Fixed Issues
The following issues have been fixed in this release:
- cudnnAddTensorNd might cause a segmentation fault if called with bad arguments (e.g. a null pointer). This issue is in 7.1.3 only and is fixed in 7.1.4.
- cudnnRNNBackwardData for an LSTM cell with fp16 (half) inputs might silently generate wrong values. This issue exists in cuDNN 7.1.3 binaries compiled with CUDA toolkit 9.0 and CUDA toolkit 9.2, and does not exist in cuDNN 7.1.3 binaries compiled with CUDA toolkit 9.1.
- cudnnGetRNNLinLayerMatrixParams wrongly returns CUDNN_STATUS_BAD_PARAM when cudnnSetRNNDescriptor is called with dataType == CUDNN_DATA_FLOAT. This is an issue in 7.1.3 only and is fixed in 7.1.4. The dataType argument currently supports only CUDNN_DATA_FLOAT; we plan to support additional compute types in the future.
- There is a small memory leak when calling cudnnRNNBackwardData with CUDNN_RNN_ALGO_STANDARD. This issue also affects previous cuDNN v7 releases. This is fixed in 7.1.4.
- RNN with half precision returns CUDNN_STATUS_EXECUTION_FAILED on Kepler GPUs in 7.1.3. This is fixed in 7.1.4 to use pseudo-fp16 computation.
- The RNN Find algorithm suite mistakenly did not test CUDNN_RNN_ALGO_PERSIST_STATIC and CUDNN_RNN_ALGO_PERSIST_DYNAMIC kernels with tensor operations enabled when it was possible to do so. This is fixed in v7.1.4.
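Because several of the fixes above depend on the exact release, an application may want to check the library version at startup. The sketch below assumes the usual cuDNN encoding, where CUDNN_VERSION and the value returned by cudnnGetVersion() equal major * 1000 + minor * 100 + patch level, so 7.1.4 is 7104. The demo_ helper names are hypothetical, and the code takes the version as a plain number so that it compiles without cudnn.h.

```c
#include <stddef.h>

/* Encode a version the way CUDNN_VERSION is assumed to be encoded:
   major * 1000 + minor * 100 + patch level. */
size_t demo_encode_cudnn_version(size_t major, size_t minor, size_t patch)
{
    return major * 1000 + minor * 100 + patch;
}

/* Nonzero if the given version (e.g. the value of cudnnGetVersion())
   is 7.1.4 or later, i.e. it includes the fixes listed above. */
int demo_has_714_fixes(size_t version)
{
    return version >= demo_encode_cudnn_version(7, 1, 4);
}

/* Nonzero if the given version is exactly 7.1.3, the only release
   affected by the issues marked "7.1.3 only". */
int demo_is_713(size_t version)
{
    return version == demo_encode_cudnn_version(7, 1, 3);
}
```

In a real program the argument would typically be cudnnGetVersion(), which reports the version of the library loaded at run time, while CUDNN_VERSION reflects the headers used at compile time.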