cuDNN Release Notes v7.3.0

Key Features and Enhancements

The following enhancements have been added to this release:

  • Support is added to the following for the dilated convolution, for NCHW and NHWC filter formats:
    • cudnnConvolutionForward() for 2D, CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM,
    • cudnnConvolutionBackwardData() for 2D, CUDNN_CONVOLUTION_BWD_DATA_ALGO_1, and
    • cudnnConvolutionBackwardFilter() for 2D, CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1

    For these supported cases, the dilated convolution is expected to offer superior speed, compared to the existing dilated convolution with algo 0.

  • Grouped convolutions for depth-wise separable convolutions are optimized for the following NHWC formats: HHH (input: Half, compute: Half, output: Half), HSH, and SSS.
  • While using CUDNN_TENSOR_OP_MATH or CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION, with the tensor cores, the c and k dimensions of the tensors are now padded to multiples of 8 (as needed), to allow a tensor core kernel to run.
  • The CUDNN_BATCHNORM_SPATIAL_PERSISTENT algo is enhanced in cudnnBatchNormalizationForwardTraining() and cudnnBatchNormalizationBackward() to propagate NaN-s or Inf-s as in a pure floating point implementation (the "persistent" flavor of the batch normalization is optimized for speed and it uses integer atomics for inter thread-block reductions). In earlier versions of cuDNN we recommended invoking cudnnQueryRuntimeError() to ensure no overflow was encountered. When it happened, the best practice was to discard the results, and use CUDNN_BATCHNORM_SPATIAL instead, as some results generated by CUDNN_BATCHNORM_SPATIAL_PERSISTENT could be finite but invalid. This behavior is now corrected: NaN-s and/or Inf-s are consistently output when intermediate results are out of range. The refined implementation simulates math operations on special floating point values, for example, +Inf-Inf=NaN.

Known Issues and Limitations

Following issues and limitations exist in this release:

  • When tensor cores are enabled in cuDNN 7.3.0, the wgrad calculations will perform an illegal memory access when K and C values are both non-integral multiples of 8. This will not likely produce incorrect results, but may corrupt other memory depending on the user buffer locations. This issue is present on Volta & Turing architectures.
  • Using cudnnGetConvolution*_v7 routines with cudnnConvolutionDescriptor_t set to CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION leads to incorrect outputs. These incorrect outputs will consist only of CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION cases, instead of also returning the performance results for both DEFAULT_MATH and CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION cases.

Fixed Issues

The following issues have been fixed in this release:

  • Using cudnnConvolutionBackwardData() with CUDNN_CONVOLUTION_BWD_DATA_ALGO_WINOGRAD algorithm produced incorrect results due to an incorrect filter transform. This issue was present in cuDNN 7.2.1.
  • For INT8 type, with xDesc and yDesc of NHWC format, the cudnnGetConvolutionForwardAlgorithm_v7 function was incorrectly returning CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM as a valid algorithm. This is fixed.
  • cudnnConvolutionForward() using CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD intermittently produced incorrect results in cuDNN 7.2, due to a race condition. This issue is fixed.
  • When running cudnnConvolutionBackwardFilter() with NHWC filter format, when n, c, and k are all multiple of 8, and when the workSpace input is exactly as indicated by cudnnGetConvolutionBackwardFilterWorkspaceSize(), leads to error in cuDNN 7.2. This is fixed.
  • When the user runs cudnnRNNForward* or cudnnRNNBackward* with FP32 input/output on sm_70 or sm_72, with RNN descriptor's algo field set to CUDNN_RNN_ALGO_PERSIST_STATIC, and cudnnMathType_t type set to CUDNN_TENSOR_OP_MATH via cudnnSetRNNMatrixMathType, then the results were incorrect. This is fixed.
  • When the user runs cudnnRNNForward* or cudnnRNNBackward* with FP32 input/output on sm_70 or sm_72, with RNN descriptor's algo field set to CUDNN_RNN_ALGO_PERSIST_STATIC, and cudnnMathType_t type set to CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION via cudnnSetRNNMatrixMathType, then the resulting performance was suboptimal. This is fixed.
  • Convolution routines with filter format as NHWC require both input and output formats to be NHWC. However, in cuDNN 7.2 and earlier, this condition was not being checked for, as a result of which silent failures may have occurred. This is fixed in 7.3.0 to correctly return CUDNN_STATUS_NOT_SUPPORTED.