cuDNN Release 7.1.4

These are the release notes for cuDNN 7.1.4. This release includes fixes from the previous cuDNN v7.x.x releases as well as the following additional changes.

Key Features and Enhancements

The following enhancements have been added to this release:

  • Improved performance for some cases of data-gradient convolutions and max pooling. This is expected to improve the performance of ResNet-50-like networks.
  • The runtime of the RNN Find algorithm suite is improved in v7.1.4, resulting in a slightly improved runtime for cudnnFindRNN***AlgorithmEx.

Known Issues

The following are known issues in this release:

  • On Volta, cudnnGet picks a slow algorithm that does not use Tensor Cores when the inputs are FP16, even though a Tensor Core algorithm could be used.
  • The cudnnConvolutionBackwardFilter() function may output incorrect results for CUDNN_CONVOLUTION_BWD_FILTER_ALGO_FFT_TILING when the convolution mode is CUDNN_CONVOLUTION. This function should not be used in this mode.
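
One way to stay clear of the FFT_TILING issue above is to filter the algorithm choice before calling cudnnConvolutionBackwardFilter(). The following is a minimal sketch, not an official workaround. The helper names bwd_filter_algo_is_safe and first_safe_bwd_filter_algo are hypothetical, and the two enums are local stand-ins that reuse the cuDNN identifier names only so the sketch compiles without the library.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins so the sketch compiles without cuDNN. When building against the
 * real library, delete these two enums and #include <cudnn.h> instead. Only
 * the enumerators needed here are listed, and their numeric values are not
 * meant to match the real header. */
typedef enum {
    CUDNN_CONVOLUTION,
    CUDNN_CROSS_CORRELATION
} cudnnConvolutionMode_t;

typedef enum {
    CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0,
    CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1,
    CUDNN_CONVOLUTION_BWD_FILTER_ALGO_FFT_TILING
} cudnnConvolutionBwdFilterAlgo_t;

/* Hypothetical helper: returns false only for the combination flagged in the
 * known issue (FFT_TILING together with CUDNN_CONVOLUTION mode). */
bool bwd_filter_algo_is_safe(cudnnConvolutionMode_t mode,
                             cudnnConvolutionBwdFilterAlgo_t algo)
{
    return !(mode == CUDNN_CONVOLUTION &&
             algo == CUDNN_CONVOLUTION_BWD_FILTER_ALGO_FFT_TILING);
}

/* Hypothetical helper: given candidates ordered from most to least preferred,
 * returns the index of the first one that is safe for this mode, or -1 if
 * none is. */
int first_safe_bwd_filter_algo(cudnnConvolutionMode_t mode,
                               const cudnnConvolutionBwdFilterAlgo_t *candidates,
                               size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        if (bwd_filter_algo_is_safe(mode, candidates[i])) {
            return (int)i;
        }
    }
    return -1;
}
```

In an application, the candidate list would typically come from whatever ranking the caller already has (for example, the results of an algorithm search), and the selected entry would then be passed to cudnnConvolutionBackwardFilter().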

Fixed Issues

The following issues have been fixed in this release:

  • cudnnAddTensorNd might cause a segmentation fault if called with bad arguments (e.g. a null pointer). This issue is in 7.1.3 only and is fixed in 7.1.4.
  • cudnnRNNBackwardData for LSTM cells with FP16 (half) inputs might silently generate wrong values. This issue exists in cuDNN 7.1.3 binaries compiled with CUDA toolkit 9.0 and CUDA toolkit 9.2, and does not exist in cuDNN 7.1.3 binaries compiled with CUDA toolkit 9.1.
  • cudnnGetRNNLinLayerMatrixParams wrongly returns CUDNN_STATUS_BAD_PARAM when cudnnSetRNNDescriptor is called with dataType == CUDNN_DATA_FLOAT. This is an issue in 7.1.3 only and is fixed in 7.1.4. The dataType argument currently supports only CUDNN_DATA_FLOAT; support for additional compute types is planned for the future.
  • There is a small memory leak issue when calling cudnnRNNBackwardData with CUDNN_RNN_ALGO_STANDARD. This issue also affects previous cuDNN v7 releases. This is fixed in 7.1.4.
  • RNN with half precision returns CUDNN_STATUS_EXECUTION_FAILED on Kepler GPUs in 7.1.3. This is fixed in 7.1.4 by using pseudo-FP16 computation.
  • The RNN Find algorithm suite mistakenly did not test CUDNN_RNN_ALGO_PERSIST_STATIC and CUDNN_RNN_ALGO_PERSIST_DYNAMIC kernels with tensor operations enabled when it was possible to do so. This is fixed in v7.1.4.
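
Because several of the fixes above are tied to 7.1.4 specifically, an application may want to confirm at startup that the loaded library is at least that version. The sketch below assumes the 7.x encoding used by CUDNN_VERSION and cudnnGetVersion(), namely major * 1000 + minor * 100 + patch, under which 7.1.4 is reported as 7104. The type cudnn_version_parts and the helpers decode_cudnn_version and has_cudnn_7_1_4_fixes are hypothetical names introduced for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the three version components. */
typedef struct {
    unsigned major;
    unsigned minor;
    unsigned patch;
} cudnn_version_parts;

/* Splits a cuDNN 7.x version integer (major * 1000 + minor * 100 + patch)
 * into its components. This encoding is an assumption that holds for the
 * 7.x series; other major versions may encode the value differently. */
cudnn_version_parts decode_cudnn_version(size_t version)
{
    cudnn_version_parts parts;
    parts.major = (unsigned)(version / 1000);
    parts.minor = (unsigned)((version % 1000) / 100);
    parts.patch = (unsigned)(version % 100);
    return parts;
}

/* Returns true when the reported version is 7.1.4 or later, i.e. when the
 * fixes listed in these release notes should be present. */
bool has_cudnn_7_1_4_fixes(size_t version)
{
    return version >= 7104;
}
```

With the real library, the argument would be the value returned by cudnnGetVersion(), for example has_cudnn_7_1_4_fixes(cudnnGetVersion()).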