Key Features and Enhancements
The following enhancements have been added to this release:
- The FFT tiling algorithms for convolution have been enhanced to support strided convolution. Specifically, for the CUDNN_CONVOLUTION_FWD_ALGO_FFT_TILING and CUDNN_CONVOLUTION_BWD_DATA_ALGO_FFT_TILING algorithms, the convDesc's vertical and horizontal filter stride can be 2 when neither the filter width nor the filter height is 1 (see the sketch after this list).
- The CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD algorithm for cudnnConvolutionForward() and cudnnConvolutionBackwardData() now gives superior performance on the Volta architecture. In addition, the mobile version of this algorithm in the same functions gives superior performance on the Maxwell and Pascal architectures.
- Dilated convolutions now give superior performance for cudnnConvolutionForward(), cudnnConvolutionBackwardData(), and cudnnConvolutionBackwardFilter() on the Volta architecture, in some cases.
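The sketch below shows how the new stride support might be exercised: a convolution descriptor is configured with a vertical and horizontal filter stride of 2 and a 5x5 filter (neither dimension is 1), then queried for the CUDNN_CONVOLUTION_FWD_ALGO_FFT_TILING workspace size. The tensor shapes, data types, padding, and the CHECK helper are illustrative assumptions, not values taken from these release notes.

```c
/* Minimal sketch: requesting the FFT tiling forward algorithm with a
 * 2x2 filter stride. Shapes, padding, and data types are illustrative. */
#include <stdio.h>
#include <stdlib.h>
#include <cudnn.h>

#define CHECK(call)                                                       \
    do {                                                                  \
        cudnnStatus_t s = (call);                                         \
        if (s != CUDNN_STATUS_SUCCESS) {                                  \
            fprintf(stderr, "%s: %s\n", #call, cudnnGetErrorString(s));   \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

int main(void)
{
    cudnnHandle_t handle;
    cudnnTensorDescriptor_t xDesc, yDesc;
    cudnnFilterDescriptor_t wDesc;
    cudnnConvolutionDescriptor_t convDesc;
    CHECK(cudnnCreate(&handle));
    CHECK(cudnnCreateTensorDescriptor(&xDesc));
    CHECK(cudnnCreateTensorDescriptor(&yDesc));
    CHECK(cudnnCreateFilterDescriptor(&wDesc));
    CHECK(cudnnCreateConvolutionDescriptor(&convDesc));

    /* Input: N=16, C=32, H=W=64; filter: K=64, 5x5 (neither filter
     * dimension is 1, as the stride-2 support requires). */
    CHECK(cudnnSetTensor4dDescriptor(xDesc, CUDNN_TENSOR_NCHW,
                                     CUDNN_DATA_FLOAT, 16, 32, 64, 64));
    CHECK(cudnnSetFilter4dDescriptor(wDesc, CUDNN_DATA_FLOAT,
                                     CUDNN_TENSOR_NCHW, 64, 32, 5, 5));

    /* Vertical (u) and horizontal (v) filter stride of 2, which the
     * enhanced FFT tiling algorithms now accept. */
    CHECK(cudnnSetConvolution2dDescriptor(convDesc, 2, 2, /* padding  */
                                          2, 2,           /* stride   */
                                          1, 1,           /* dilation */
                                          CUDNN_CROSS_CORRELATION,
                                          CUDNN_DATA_FLOAT));

    int n, c, h, w;
    CHECK(cudnnGetConvolution2dForwardOutputDim(convDesc, xDesc, wDesc,
                                                &n, &c, &h, &w));
    CHECK(cudnnSetTensor4dDescriptor(yDesc, CUDNN_TENSOR_NCHW,
                                     CUDNN_DATA_FLOAT, n, c, h, w));

    /* If this stride/filter combination is supported, the query succeeds
     * and reports the workspace needed by the FFT tiling algorithm. */
    size_t wsSize = 0;
    CHECK(cudnnGetConvolutionForwardWorkspaceSize(
        handle, xDesc, wDesc, convDesc, yDesc,
        CUDNN_CONVOLUTION_FWD_ALGO_FFT_TILING, &wsSize));
    printf("FFT tiling workspace: %zu bytes\n", wsSize);

    CHECK(cudnnDestroyConvolutionDescriptor(convDesc));
    CHECK(cudnnDestroyFilterDescriptor(wDesc));
    CHECK(cudnnDestroyTensorDescriptor(yDesc));
    CHECK(cudnnDestroyTensorDescriptor(xDesc));
    CHECK(cudnnDestroy(handle));
    return 0;
}
```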
Known Issues and Limitations
The following issues and limitations exist in this release:
- For cudnnConvolutionForward(), when using a 1x1 filter with input and output tensors in NHWC format and of CUDNN_DATA_HALF (half precision) type, a filter in NCHW format, and a compute type of float, cuDNN generates incorrect results.
- On Quadro P4000, when calling the cudnnConvolutionForward() function with the CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD_NONFUSED algorithm, there is a small chance of intermittent inaccurate results.
- When using cudnnConvolutionBackwardFilter() with CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0 in mixed-precision computation, with input/output in CUDNN_DATA_HALF (half precision) and a compute type of float, the results might include INF when the number of batches (N) is larger than 1, due to an intermediate down-convert to half float. In contrast, with float accumulation for all intermediate values (as in CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1), the result is a finite half-precision float; see the workaround sketch after this list. This limitation also exists in all previous cuDNN versions.
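For the CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0 limitation, one workaround is to keep half-precision input/output but select CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1, so that intermediate values accumulate in float. The wrapper below is a minimal sketch of that choice; the function and parameter names are illustrative, and descriptor creation and workspace sizing are assumed to happen in the caller.

```c
/* Sketch of the workaround: half-precision I/O, but ALGO_1 so that
 * intermediates accumulate in float and cannot overflow to INF the way
 * the ALGO_0 half-float down-convert can when N > 1.  Descriptors and
 * device pointers are assumed to be set up by the caller. */
#include <cudnn.h>

cudnnStatus_t backward_filter_fp16_safe(
    cudnnHandle_t handle,
    const cudnnTensorDescriptor_t xDesc,  const void *x,  /* CUDNN_DATA_HALF */
    const cudnnTensorDescriptor_t dyDesc, const void *dy, /* CUDNN_DATA_HALF */
    const cudnnConvolutionDescriptor_t convDesc,    /* compute type: float */
    const cudnnFilterDescriptor_t dwDesc, void *dw,
    void *workspace, size_t workspaceSize)
{
    const float alpha = 1.0f, beta = 0.0f;

    /* ALGO_0 down-converts intermediates to half when N > 1 and can
     * produce INF; ALGO_1 keeps float accumulation throughout. */
    return cudnnConvolutionBackwardFilter(
        handle, &alpha,
        xDesc, x,
        dyDesc, dy,
        convDesc,
        CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1,
        workspace, workspaceSize,
        &beta,
        dwDesc, dw);
}
```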
Fixed Issues
The following issues have been fixed in this release:
- Fixed a pointer arithmetic integer overflow issue in the RNN forward and backward functions when the sequence length and mini-batch size are sufficiently large.
- When tensor cores were enabled in cuDNN 7.3.0 (see the fragment after this list for how they are typically enabled), the cudnnConvolutionBackwardFilter() calculations performed an illegal memory access when the K and C values were both non-integral multiples of 8. This issue is fixed.
- For the CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1 algorithm in cudnnConvolutionBackwardFilter() on Volta, the tensor operations occasionally failed when the filter spatial size (filter height * filter width) was greater than 64. This issue is fixed.
- While running cuDNN 7.3.0 on Turing with CUDA 10.0 and the r400 driver, the functions cudnnRNNForwardTraining(Ex) and cudnnRNNForwardInference(Ex) errored out, returning CUDNN_STATUS_NOT_SUPPORTED. This issue is fixed.
- In cuDNN 7.3.0, when using CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1 with tensor data or filter data in NHWC format, the function might have resulted in a silent failure. This is now fixed.
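As a point of reference for the tensor core fix above, the fragment below sketches how tensor ops are typically enabled on a convolution descriptor in cuDNN 7, via cudnnSetConvolutionMathType(); the helper name is illustrative.

```c
/* Sketch: opting a convolution into Tensor Core math, the configuration
 * exercised by the cudnnConvolutionBackwardFilter() fix above.  With the
 * fix, K and C no longer need to be integral multiples of 8 for this
 * path to be safe. */
#include <cudnn.h>

cudnnStatus_t enable_tensor_ops(cudnnConvolutionDescriptor_t convDesc)
{
    /* CUDNN_TENSOR_OP_MATH permits (but does not require) Tensor Core
     * use on Volta and later; cuDNN falls back to regular math where
     * Tensor Cores are not applicable. */
    return cudnnSetConvolutionMathType(convDesc, CUDNN_TENSOR_OP_MATH);
}
```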