cuDNN Release 7.4.1

These are the cuDNN 7.4.1 release notes. This release includes fixes from the previous cuDNN v7.x.x releases as well as the following additional changes.

Key Features and Enhancements

The following enhancements have been added to this release:

  • Added a new family of fast NHWC batch normalization functions. See the following five new functions and one new type descriptor:

    • cudnnGetBatchNormalizationForwardTrainingExWorkspaceSize() function

    • cudnnBatchNormalizationForwardTrainingEx() function

    • cudnnGetBatchNormalizationBackwardExWorkspaceSize() function

    • cudnnBatchNormalizationBackwardEx() function

    • cudnnGetBatchNormalizationTrainingExReserveSpaceSize() function

    • cudnnBatchNormOps_t type descriptor

  • For API logging, a conversion specifier for the process ID has been added, so the process ID can be included in the log file name. See API Logging.
  • Performance of cudnnPoolingBackward() is enhanced for average pooling when using the NHWC data format, for both the CUDNN_POOLING_AVERAGE_COUNT_INCLUDE_PADDING and CUDNN_POOLING_AVERAGE_COUNT_EXCLUDE_PADDING cases of cudnnPoolingMode_t.
  • Performance of the strided convolution in cudnnConvolutionBackwardData() is enhanced when the filter is in NHWC format and the data type configuration is TRUE_HALF_CONFIG, PSEUDO_HALF_CONFIG, or FLOAT_CONFIG. For strides u,v < r,s the performance is further enhanced.
  • Significantly improved the performance of the cudnnConvolutionForward(), cudnnConvolutionBackwardData(), and cudnnConvolutionBackwardFilter() functions on RCNN models such as Fast RCNN, Faster RCNN, and Mask RCNN.
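
The two average pooling modes named above differ only in the divisor applied to each window. The following is a minimal CPU sketch in one dimension, assuming the usual meaning of the two modes (divide by the full window size versus divide by the number of non-padded elements). The function avg_pool1d_backward() and its parameters are hypothetical names used for illustration; this is not cuDNN code and does not reflect how cudnnPoolingBackward() is implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU reference (not part of cuDNN): 1-D average-pooling
 * backward pass. dy holds out_len gradients, dx receives in_len gradients.
 *
 * include_padding != 0 mimics CUDNN_POOLING_AVERAGE_COUNT_INCLUDE_PADDING:
 *   every window divides by the full window size.
 * include_padding == 0 mimics CUDNN_POOLING_AVERAGE_COUNT_EXCLUDE_PADDING:
 *   every window divides by the number of elements inside the input. */
void avg_pool1d_backward(const float *dy, int out_len,
                         float *dx, int in_len,
                         int window, int pad, int stride,
                         int include_padding)
{
    assert(window > 0 && stride > 0 && pad >= 0);

    for (int i = 0; i < in_len; ++i)
        dx[i] = 0.0f;

    for (int o = 0; o < out_len; ++o) {
        int start = o * stride - pad;              /* may be negative (padding) */
        int lo = start < 0 ? 0 : start;            /* clip window to the input  */
        int hi = start + window > in_len ? in_len : start + window;
        int valid = hi - lo;                       /* non-padded element count  */
        if (valid <= 0)
            continue;                              /* window lies in padding    */

        int divisor = include_padding ? window : valid;
        for (int i = lo; i < hi; ++i)
            dx[i] += dy[o] / (float)divisor;       /* spread gradient evenly    */
    }
}
```

With a window of 3, padding of 1, and stride of 1 over a 4-element input, the two modes give different gradients only at the borders, where the windows overlap the padding.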

Fixed Issues

The following issues have been fixed in this release:

  • The following setup produced a “Misaligned Address” error in cuDNN 7.3.x and is fixed in cuDNN 7.4.1: the cudnnConvolutionForward() function with the CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM algorithm, in the PSEUDO_HALF_CONFIG data type configuration, when the input and output tensors are in NHWC format, the filter is 1x1 and in NCHW format, and Tensor Op is enabled.
  • For a few convolution sizes for ALGO_0 and ALGO_1, the performance of the function cudnnConvolutionBackwardFilter() was degraded in cuDNN 7.3.1. This is now fixed.
  • In cuDNN 7.3.1, the function cudnnAddTensor() computed incorrect results when run on GPUs with compute capability < 6.0 (prior to Pascal). This is now fixed.

Known Issues

The following issues and limitations exist in this release:

  • When calling the cudnnConvolutionBiasActivationForward() function with the algo parameter set to CUDNN_CONVOLUTION_FWD_ALGO_FFT and the activationDesc parameter set to CUDNN_ACTIVATION_RELU and sufficiently large inputs, the ReLU operation is not applied and negative values are passed through to the output. This issue is present in all previous cuDNN versions.
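
One way to check whether a particular call was affected by the ReLU issue above is to copy the output tensor to host memory and look for negative values, which a correctly applied ReLU would never produce. The sketch below assumes a float output that has already been copied to the host (for example with cudaMemcpy). The helpers count_negative_values() and relu_in_place() are hypothetical names, not cuDNN functions, and the release notes do not prescribe this or any other workaround.

```c
#include <stddef.h>

/* Hypothetical helper (not part of cuDNN): counts strictly negative
 * entries in a host-side copy of the output tensor. A nonzero result
 * after a fused convolution + bias + ReLU call suggests that the
 * activation was not applied. */
size_t count_negative_values(const float *y, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (y[i] < 0.0f)
            ++count;
    }
    return count;
}

/* Hypothetical helper (not part of cuDNN): applies ReLU on the host by
 * clamping negative entries to zero in place. Intended for validation
 * on small buffers, not as a performant replacement. */
void relu_in_place(float *y, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (y[i] < 0.0f)
            y[i] = 0.0f;
    }
}
```

If the count is nonzero, possible remedies include selecting a different forward algorithm or applying the activation in a separate cudnnActivationForward() pass. Both are assumptions about reasonable workarounds and should be validated against the target configuration.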