Key Features and Enhancements
The following enhancements have been added to this release:
- The FFT tiling algorithms for convolution have been enhanced to support strided convolution. Specifically, for the CUDNN_CONVOLUTION_FWD_ALGO_FFT_TILING and CUDNN_CONVOLUTION_BWD_DATA_ALGO_FFT_TILING algorithms, the convDesc's vertical and horizontal filter stride can be 2 when neither the filter width nor the filter height is 1 (see the sketch after this list).
- The CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD algorithm for cudnnConvolutionForward() and cudnnConvolutionBackwardData() now gives superior performance on the Volta architecture. In addition, the mobile version of this algorithm in the same functions gives superior performance on the Maxwell and Pascal architectures.
- Dilated convolutions now give superior performance for cudnnConvolutionForward(), cudnnConvolutionBackwardData(), and cudnnConvolutionBackwardFilter() on the Volta architecture, in some cases.
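The sketch below shows how the new stride support might be exercised: a convolution descriptor is configured with a vertical and horizontal filter stride of 2 and a 5x5 filter (neither dimension is 1), then queried for the CUDNN_CONVOLUTION_FWD_ALGO_FFT_TILING workspace size. The tensor shapes, data types, padding, and the CHECK helper are illustrative assumptions, not values taken from these release notes.

```c
/* Minimal sketch: requesting the FFT tiling forward algorithm with a
 * 2x2 filter stride. Shapes, padding, and data types are illustrative. */
#include <stdio.h>
#include <stdlib.h>
#include <cudnn.h>

#define CHECK(call)                                                       \
    do {                                                                  \
        cudnnStatus_t s = (call);                                         \
        if (s != CUDNN_STATUS_SUCCESS) {                                  \
            fprintf(stderr, "%s: %s\n", #call, cudnnGetErrorString(s));   \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

int main(void)
{
    cudnnHandle_t handle;
    cudnnTensorDescriptor_t xDesc, yDesc;
    cudnnFilterDescriptor_t wDesc;
    cudnnConvolutionDescriptor_t convDesc;
    CHECK(cudnnCreate(&handle));
    CHECK(cudnnCreateTensorDescriptor(&xDesc));
    CHECK(cudnnCreateTensorDescriptor(&yDesc));
    CHECK(cudnnCreateFilterDescriptor(&wDesc));
    CHECK(cudnnCreateConvolutionDescriptor(&convDesc));

    /* Input: N=16, C=32, H=W=64; filter: K=64, 5x5 (neither filter
     * dimension is 1, as the stride-2 support requires). */
    CHECK(cudnnSetTensor4dDescriptor(xDesc, CUDNN_TENSOR_NCHW,
                                     CUDNN_DATA_FLOAT, 16, 32, 64, 64));
    CHECK(cudnnSetFilter4dDescriptor(wDesc, CUDNN_DATA_FLOAT,
                                     CUDNN_TENSOR_NCHW, 64, 32, 5, 5));

    /* Vertical (u) and horizontal (v) filter stride of 2, which the
     * enhanced FFT tiling algorithms now accept. */
    CHECK(cudnnSetConvolution2dDescriptor(convDesc, 2, 2, /* padding  */
                                          2, 2,           /* stride   */
                                          1, 1,           /* dilation */
                                          CUDNN_CROSS_CORRELATION,
                                          CUDNN_DATA_FLOAT));

    int n, c, h, w;
    CHECK(cudnnGetConvolution2dForwardOutputDim(convDesc, xDesc, wDesc,
                                                &n, &c, &h, &w));
    CHECK(cudnnSetTensor4dDescriptor(yDesc, CUDNN_TENSOR_NCHW,
                                     CUDNN_DATA_FLOAT, n, c, h, w));

    /* If this stride/filter combination is supported, the query succeeds
     * and reports the workspace needed by the FFT tiling algorithm. */
    size_t wsSize = 0;
    CHECK(cudnnGetConvolutionForwardWorkspaceSize(
        handle, xDesc, wDesc, convDesc, yDesc,
        CUDNN_CONVOLUTION_FWD_ALGO_FFT_TILING, &wsSize));
    printf("FFT tiling workspace: %zu bytes\n", wsSize);

    CHECK(cudnnDestroyConvolutionDescriptor(convDesc));
    CHECK(cudnnDestroyFilterDescriptor(wDesc));
    CHECK(cudnnDestroyTensorDescriptor(yDesc));
    CHECK(cudnnDestroyTensorDescriptor(xDesc));
    CHECK(cudnnDestroy(handle));
    return 0;
}
```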
Known Issues and Limitations
The following issues and limitations exist in this release:
- For cudnnConvolutionForward(), when using a 1x1 filter with input and output tensors in NHWC format and of CUDNN_DATA_HALF (half precision) type, a filter in NCHW format, and a compute type of float, cuDNN generates incorrect results.
- On Quadro P4000, when calling the cudnnConvolutionForward() function with the CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD_NONFUSED algorithm, there is a small chance of intermittent inaccurate results.
- When using cudnnConvolutionBackwardFilter() with CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0 in mixed-precision computation, with input/output in CUDNN_DATA_HALF (half precision) and a compute type of float, the results might include INF when the number of batches (N) is larger than 1, due to an intermediate down-convert to half float. In contrast, with float accumulation for all intermediate values (as in CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1), the result is a finite half-precision float; see the workaround sketch after this list. This limitation also exists in all previous cuDNN versions.
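For the CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0 limitation, one workaround is to keep half-precision input/output but select CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1, so that intermediate values accumulate in float. The wrapper below is a minimal sketch of that choice; the function and parameter names are illustrative, and descriptor creation and workspace sizing are assumed to happen in the caller.

```c
/* Sketch of the workaround: half-precision I/O, but ALGO_1 so that
 * intermediates accumulate in float and cannot overflow to INF the way
 * the ALGO_0 half-float down-convert can when N > 1.  Descriptors and
 * device pointers are assumed to be set up by the caller. */
#include <cudnn.h>

cudnnStatus_t backward_filter_fp16_safe(
    cudnnHandle_t handle,
    const cudnnTensorDescriptor_t xDesc,  const void *x,  /* CUDNN_DATA_HALF */
    const cudnnTensorDescriptor_t dyDesc, const void *dy, /* CUDNN_DATA_HALF */
    const cudnnConvolutionDescriptor_t convDesc,    /* compute type: float */
    const cudnnFilterDescriptor_t dwDesc, void *dw,
    void *workspace, size_t workspaceSize)
{
    const float alpha = 1.0f, beta = 0.0f;

    /* ALGO_0 down-converts intermediates to half when N > 1 and can
     * produce INF; ALGO_1 keeps float accumulation throughout. */
    return cudnnConvolutionBackwardFilter(
        handle, &alpha,
        xDesc, x,
        dyDesc, dy,
        convDesc,
        CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1,
        workspace, workspaceSize,
        &beta,
        dwDesc, dw);
}
```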
Fixed Issues
The following issues have been fixed in this release:
- Fixed a pointer arithmetic integer overflow issue in the RNN forward and backward functions when the sequence length and mini-batch size are sufficiently large.
- When tensor cores were enabled in cuDNN 7.3.0 (see the fragment after this list for how they are typically enabled), the cudnnConvolutionBackwardFilter() calculations performed an illegal memory access when the K and C values were both non-integral multiples of 8. This issue is fixed.
- For the CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1 algorithm in cudnnConvolutionBackwardFilter() on Volta, the tensor operations occasionally failed when the filter spatial size (filter height * filter width) was greater than 64. This issue is fixed.
- While running cuDNN 7.3.0 on Turing with CUDA 10.0 and the r400 driver, the functions cudnnRNNForwardTraining(Ex) and cudnnRNNForwardInference(Ex) errored out, returning CUDNN_STATUS_NOT_SUPPORTED. This issue is fixed.
- In cuDNN 7.3.0, when using CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1 with tensor data or filter data in NHWC format, the function might have resulted in a silent failure. This is now fixed.
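As a point of reference for the tensor core fix above, the fragment below sketches how tensor ops are typically enabled on a convolution descriptor in cuDNN 7, via cudnnSetConvolutionMathType(); the helper name is illustrative.

```c
/* Sketch: opting a convolution into Tensor Core math, the configuration
 * exercised by the cudnnConvolutionBackwardFilter() fix above.  With the
 * fix, K and C no longer need to be integral multiples of 8 for this
 * path to be safe. */
#include <cudnn.h>

cudnnStatus_t enable_tensor_ops(cudnnConvolutionDescriptor_t convDesc)
{
    /* CUDNN_TENSOR_OP_MATH permits (but does not require) Tensor Core
     * use on Volta and later; cuDNN falls back to regular math where
     * Tensor Cores are not applicable. */
    return cudnnSetConvolutionMathType(convDesc, CUDNN_TENSOR_OP_MATH);
}
```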