## cuDNN Release Notes v7.3.0

## Key Features and Enhancements

The following enhancements have been added to this release:

- Support is added for the dilated convolution, for the `NCHW` and `NHWC` filter formats, in the following 2D routines:
  - `cudnnConvolutionForward()` with CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM
  - `cudnnConvolutionBackwardData()` with CUDNN_CONVOLUTION_BWD_DATA_ALGO_1
  - `cudnnConvolutionBackwardFilter()` with CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1

  For these supported cases, the dilated convolution is expected to offer superior speed compared to the existing dilated convolution with algo 0.

- Grouped convolutions for depth-wise separable convolutions are optimized for the following NHWC formats: HHH (input: Half, compute: Half, output: Half), HSH (input: Half, compute: Single, output: Half), and SSS (input: Single, compute: Single, output: Single).
- When CUDNN_TENSOR_OP_MATH or CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION is used with the tensor cores, the `c` and `k` dimensions of the tensors are now padded to multiples of 8 (as needed) to allow a tensor core kernel to run.
- The CUDNN_BATCHNORM_SPATIAL_PERSISTENT algo is enhanced in `cudnnBatchNormalizationForwardTraining()` and `cudnnBatchNormalizationBackward()` to propagate NaNs and Infs as a pure floating-point implementation would. (The "persistent" flavor of batch normalization is optimized for speed and uses integer atomics for inter-thread-block reductions.) In earlier versions of cuDNN, we recommended invoking `cudnnQueryRuntimeError()` to ensure no overflow was encountered; when one was, the best practice was to discard the results and use CUDNN_BATCHNORM_SPATIAL instead, since some results generated by CUDNN_BATCHNORM_SPATIAL_PERSISTENT could be finite but invalid. This behavior is now corrected: NaNs and/or Infs are consistently output when intermediate results are out of range. The refined implementation simulates math operations on special floating-point values, for example, +Inf - Inf = NaN.

## Known Issues and Limitations

The following issues and limitations exist in this release:

- When tensor cores are enabled in cuDNN 7.3.0, the wgrad calculations will perform an illegal memory access when the K and C values are both non-integral multiples of 8. While this is unlikely to produce incorrect results, it may corrupt other memory depending on the user buffer locations. This issue is present on the Volta and Turing architectures.
- Using the `cudnnGetConvolution*_v7` routines with a `cudnnConvolutionDescriptor_t` set to CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION leads to incomplete outputs: the results consist only of the CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION cases, instead of also returning the performance results for the DEFAULT_MATH cases.

## Fixed Issues

The following issues have been fixed in this release:

- Using `cudnnConvolutionBackwardData()` with the CUDNN_CONVOLUTION_BWD_DATA_ALGO_WINOGRAD algorithm produced incorrect results due to an incorrect filter transform. This issue was present in cuDNN 7.2.1.
- For the INT8 type, with `xDesc` and `yDesc` in NHWC format, the `cudnnGetConvolutionForwardAlgorithm_v7` function incorrectly returned CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM as a valid algorithm. This is fixed.
- `cudnnConvolutionForward()` using CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD intermittently produced incorrect results in cuDNN 7.2 due to a race condition. This is fixed.
- Running `cudnnConvolutionBackwardFilter()` with the NHWC filter format, when `n`, `c`, and `k` are all multiples of 8 and the `workSpace` input is exactly the size indicated by `cudnnGetConvolutionBackwardFilterWorkspaceSize()`, led to an error in cuDNN 7.2. This is fixed.
- When `cudnnRNNForward*` or `cudnnRNNBackward*` was run with FP32 input/output on sm_70 or sm_72, with the RNN descriptor's `algo` field set to CUDNN_RNN_ALGO_PERSIST_STATIC and the `cudnnMathType_t` set to CUDNN_TENSOR_OP_MATH via `cudnnSetRNNMatrixMathType`, the results were incorrect. This is fixed.
- When `cudnnRNNForward*` or `cudnnRNNBackward*` was run with FP32 input/output on sm_70 or sm_72, with the RNN descriptor's `algo` field set to CUDNN_RNN_ALGO_PERSIST_STATIC and the `cudnnMathType_t` set to CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION via `cudnnSetRNNMatrixMathType`, the resulting performance was suboptimal. This is fixed.
- Convolution routines with the NHWC filter format require both the input and output formats to be NHWC. However, in cuDNN 7.2 and earlier, this condition was not checked, as a result of which silent failures may have occurred. This is fixed in 7.3.0 to correctly return CUDNN_STATUS_NOT_SUPPORTED.