cuDNN Release Notes v7.5.0

Key Features and Enhancements

The following features and enhancements have been added to this release:

Fixed Issues

The following issues have been fixed in this release:

  • When the following is true for the cudnnConvolutionBackwardData() function:
    • used with CUDNN_CONVOLUTION_BWD_DATA_ALGO_FFT_TILING, and
    • convDesc's vertical stride is exactly 2, and
    • the vertical padding is a multiple of 2, and
    • the filter height is a multiple of 2
    OR
    • used with CUDNN_CONVOLUTION_BWD_DATA_ALGO_FFT_TILING, and
    • convDesc's horizontal stride is exactly 2, and
    • the horizontal padding is a multiple of 2, and
    • the filter width is a multiple of 2

    then the resulting output is incorrect. This issue was present in cuDNN 7.3.1 and later. This is fixed in cuDNN 7.5.0.

  • The mathPrec parameter in cudnnSetRNNDescriptor is reserved for controlling math precision in RNN, but was not checked or enforced. This parameter is now strictly enforced. As a result, the following applies:
    • For the input/output in FP16, the parameter mathPrec can be CUDNN_DATA_HALF or CUDNN_DATA_FLOAT.
    • For the input/output in FP32, the parameter mathPrec can only be CUDNN_DATA_FLOAT.
    • For the input/output in FP64 (double type), the parameter mathPrec can only be CUDNN_DATA_DOUBLE.
  • Users upgrading to cuDNN 7.4 may see values that are too small returned from the function cudnnGetConvolutionBackwardFilterWorkspaceSize() for dimensions 5 and greater, resulting in a CUDNN_STATUS_EXECUTION_FAILED error message. In cuDNN 7.4, the workaround for this issue is to calculate the workspace by using the formula below:

    Let M be the product of output tensor (gradDesc) dimensions starting at 1.
    Let N be the output tensor dimension 0.
    Let Mp = (M+31)/32
    Let Np = (N+31)/32
    W = 2 * M * N * sizeof(int) is the workspace that should be used.

    This is fixed.

  • In earlier cuDNN versions, when all the conditions below are true:
    • 3-D convolution
    • Batch size > 1
    • Algorithm is "CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1"
    • convDesc's dataType is CUDNN_DATA_HALF,

    then calls to cudnnConvolutionBackwardFilter() may produce incorrect (and non-deterministic) results. This is fixed in cuDNN 7.5.0.
  • In cuDNN 7.4.2, in some cases 3D convolution showed reduced performance on Turing GPUs compared to previous cuDNN releases. This is fixed.
  • For the int8x32 datatype, the function cudnnSetTensor4dDescriptorEx erroneously returned CUDNN_STATUS_BAD_PARAM. This is fixed in cuDNN 7.5; the function no longer returns CUDNN_STATUS_BAD_PARAM for this datatype.
  • In cuDNN 7.4.1 and 7.4.2, when cudnnBatchNormMode_t is set to CUDNN_BATCHNORM_SPATIAL_PERSISTENT and the input/output tensors are in NHWC format and of CUDNN_DATA_HALF datatype, then, on Windows only, the cudnnBatchNormalization*Ex functions are supported only with the device in TCC mode. See Tesla Compute Cluster Mode for Windows.

    Starting with cuDNN 7.5.0, the following checks are added for the driver mode on Windows. If on Windows and not in TCC mode:

    • The functions will fall back to a slower implementation if bnOps in the cudnnBatchNormalization*Ex function is set to CUDNN_BATCHNORM_OPS_BN.
    • If bnOps is set to CUDNN_BATCHNORM_OPS_BN_ACTIVATION or CUDNN_BATCHNORM_OPS_BN_ADD_ACTIVATION, CUDNN_STATUS_NOT_SUPPORTED is returned.
  • In cuDNN 7.4.2, in some cases the cudnnConvolutionBackwardData() function, when used with NHWC tensor format, resulted in the “disallowed mismatches” error. This is fixed.
  • In some cases, using cudnnConvolutionBiasActivationForward() with GroupCount() > 1 and xDesc's data type set to CUDNN_DATA_HALF produced incorrect results for all groups except the first. This is fixed.
  • When using cuDNN 7.3.1 on Quadro P4000, calling the cudnnConvolutionForward() function with the CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD_NONFUSED algorithm had a small chance of producing intermittent inaccurate results. This is fixed.
  • When cudnnConvolutionForward() was called with these settings: datatype is CUDNN_DATA_INT8x4, convolution is 2D, architecture is sm_61, and filter size is larger than 8x8, incorrect results and a potential illegal memory access error occurred. This is fixed.
  • For sm_72 and sm_75, the function cudnnConvolutionBiasActivationForward(), when used with INT8x32, failed to run. This is fixed.
  • In the function cudnnSetRNNDataDescriptor, if API logging is turned on, the seqLengthArray field in the log may not display the correct number of array elements. This is fixed.
  • For the batchNorm functions cudnnBatchNormalization{Backward|BackwardEx|ForwardInference|ForwardTraining|ForwardTrainingEx}, the value of epsilon was required to be greater than or equal to CUDNN_BN_MIN_EPSILON, which was defined in the cudnn.h file as 1e-5. This threshold value is now lowered to 0.0 to allow a wider range of epsilon values. However, users should still choose the epsilon value carefully, since too small a value of epsilon may cause batchNormalization to overflow when the input data's standard deviation is close to 0.
  • Some Grouped Convolutions (particularly those used in Depthwise-Separable convolutions) may return INTERNAL_ERROR if they have all inputs/outputs as NHWC-packed and do not match one of the following criteria:
    • filter_height = 1, filter_width = 1, vertical_conv_stride = 1, horizontal_conv_stride = 1
    • filter_height = 3, filter_width = 3, vertical_conv_stride = 1, horizontal_conv_stride = 1
    • filter_height = 3, filter_width = 3, vertical_conv_stride = 2, horizontal_conv_stride = 2
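
The three filter/stride combinations listed in the grouped-convolution item above can be checked ahead of time with a small helper. The following is a minimal sketch, not part of cuDNN: the function name and the constant are hypothetical, and the check only reports whether a configuration matches one of the listed criteria. The note says that non-matching NHWC-packed grouped convolutions "may" return INTERNAL_ERROR, so a False result does not mean an error is certain.

```python
# Hypothetical helper, not a cuDNN API: it only encodes the three
# (filter_height, filter_width, vertical_conv_stride, horizontal_conv_stride)
# combinations listed in the release note.
LISTED_CRITERIA = {
    (1, 1, 1, 1),
    (3, 3, 1, 1),
    (3, 3, 2, 2),
}


def matches_listed_criteria(filter_height, filter_width,
                            vertical_conv_stride, horizontal_conv_stride):
    """Return True if the configuration matches one of the listed criteria."""
    key = (filter_height, filter_width,
           vertical_conv_stride, horizontal_conv_stride)
    return key in LISTED_CRITERIA
```

For example, a 3x3 filter with stride 2 in both directions matches the third criterion, while a 5x5 filter with stride 1 matches none of them.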
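
The mathPrec rules in the cudnnSetRNNDescriptor item above amount to a small lookup table. The sketch below is illustrative only and assumes nothing about how cuDNN performs the check internally: the enumerator names are represented as plain strings so that it runs without cuDNN installed, and the table and function names are hypothetical.

```python
# Hypothetical stand-ins for the cudnnDataType_t enumerators named in the
# release note; strings are used so the sketch has no cuDNN dependency.
ALLOWED_MATH_PREC = {
    "CUDNN_DATA_HALF": {"CUDNN_DATA_HALF", "CUDNN_DATA_FLOAT"},  # FP16 I/O
    "CUDNN_DATA_FLOAT": {"CUDNN_DATA_FLOAT"},                    # FP32 I/O
    "CUDNN_DATA_DOUBLE": {"CUDNN_DATA_DOUBLE"},                  # FP64 I/O
}


def math_prec_allowed(io_type, math_prec):
    """Return True if math_prec is permitted for the given input/output type.

    Types that the release note does not mention are treated as not allowed.
    """
    return math_prec in ALLOWED_MATH_PREC.get(io_type, set())
```

A caller could run this check before cudnnSetRNNDescriptor to catch combinations that were silently accepted before this release.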
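
The cuDNN 7.4 workspace workaround described in the cudnnGetConvolutionBackwardFilterWorkspaceSize item above can be written out as follows. This is a sketch under stated assumptions: the function name is hypothetical, the division in Mp and Np is taken to be C-style integer division, and sizeof(int) is taken to be 4 bytes. It follows the formula exactly as the note states it, so Mp and Np are computed and returned but do not enter W.

```python
from math import prod


def workaround_workspace_bytes(grad_dims, sizeof_int=4):
    """Evaluate the release-note formula for the gradDesc dimensions.

    grad_dims  -- dimensions of the output tensor (gradDesc), dimension 0 first
    sizeof_int -- assumed size of a C int in bytes (4 on common platforms)

    Returns (W, Mp, Np). Mp and Np are defined in the note but are not used
    in the stated expression for W; they are returned for reference only.
    """
    if len(grad_dims) < 2:
        raise ValueError("expected at least two dimensions")
    n = grad_dims[0]            # N: output tensor dimension 0
    m = prod(grad_dims[1:])     # M: product of dimensions starting at 1
    mp = (m + 31) // 32         # Mp, assuming integer division
    np_ = (n + 31) // 32        # Np, assuming integer division
    w = 2 * m * n * sizeof_int  # W as written in the note
    return w, mp, np_
```

With the hypothetical 5-dimensional shape (64, 3, 3, 3, 3), N is 64 and M is 81, so the function returns W = 41472 bytes, Mp = 3, and Np = 2.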

Known Issues

The following issues and limitations exist in this release:

  • The RNN persist-static algorithm returns incorrect results for GRU problems in backwards mode when the hidden size is greater than 1024. Due to this, the RNN persist-static algorithm is disabled in cuDNN 7.5.0. Users with such GRU problems are advised to use the standard or persist-dynamic RNN algorithms. See cudnnRNNAlgo_t(). This note applies to all previous cuDNN 7 releases.
  • The function cudnnConvolutionBackwardFilter(), when used with CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1, returns the error "Uninitialized __global__ memory read of size 4".