Key Features and Enhancements
The following enhancements have been added to this release:
- Added a new family of fast NHWC batch normalization functions. This release introduces five new functions and one new type descriptor.
- For API Logging, a conversion specifier for the process id is added. With this, the process id can be included in the log file name. See API Logging.
- Performance of cudnnPoolingBackward() is enhanced for average pooling when using the NHWC data format, for both the CUDNN_POOLING_AVERAGE_COUNT_INCLUDE_PADDING and CUDNN_POOLING_AVERAGE_COUNT_EXCLUDE_PADDING cases of cudnnPoolingMode_t.
- Performance of strided convolution in cudnnConvolutionBackwardData() is enhanced when the filter is in NHWC format and the data type configuration is TRUE_HALF_CONFIG, PSEUDO_HALF_CONFIG, or FLOAT_CONFIG. For strides u, v < r, s, the performance is further enhanced.
- Significantly improved the performance of the cudnnConvolutionForward(), cudnnConvolutionBackwardData(), and cudnnConvolutionBackwardFilter() functions on R-CNN models such as Fast R-CNN, Faster R-CNN, and Mask R-CNN.
Fixed Issues
The following issues have been fixed in this release:
- The following setup was giving a “Misaligned Address” error in cuDNN 7.3.x. This is fixed in cuDNN 7.4.1: for the cudnnConvolutionForward() function with the CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM algorithm, in the PSEUDO_HALF_CONFIG data type configuration, when the input and output tensors are in NHWC format, the filter is 1x1 and in NCHW format, and Tensor Op is enabled.
- For a few convolution sizes, the performance of the cudnnConvolutionBackwardFilter() function with ALGO_0 and ALGO_1 was degraded in cuDNN 7.3.1. This is now fixed.
- In cuDNN 7.3.1, the function cudnnAddTensor() computed incorrect results when run on GPUs with compute capability < 6.0 (prior to Pascal). This is now fixed.
Known Issues
The following issues and limitations exist in this release:
- When calling the cudnnConvolutionBiasActivationForward() function with the algo parameter set to CUDNN_CONVOLUTION_FWD_ALGO_FFT, the activationDesc parameter set to CUDNN_ACTIVATION_RELU, and sufficiently large inputs, the ReLU operation is not applied and negative values are passed through to the output. This issue is present in all previous cuDNN versions.
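Because a correctly applied ReLU produces only non-negative values, this issue can be detected on the host after copying the output tensor back from the device. A small illustrative check follows; the function name and buffer are hypothetical, not part of cuDNN.

```c
#include <stddef.h>

/* Returns 1 if every element of `out` is non-negative (i.e. ReLU appears
 * to have been applied), 0 if any negative value leaked through.
 * `out` stands in for the convolution output copied back from the device. */
int relu_was_applied(const float *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        if (out[i] < 0.0f)
            return 0;
    }
    return 1;
}
```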