Key Features and Enhancements
The following enhancements have been added to this release:
- Support is added for dilated convolution, for the NCHW and NHWC filter formats, in the following routines:
  - cudnnConvolutionForward() for 2D, with CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM
  - cudnnConvolutionBackwardData() for 2D, with CUDNN_CONVOLUTION_BWD_DATA_ALGO_1
  - cudnnConvolutionBackwardFilter() for 2D, with CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1

  For these supported cases, the dilated convolution is expected to offer superior speed compared to the existing dilated convolution with algo 0.
- Grouped convolutions for depth-wise separable convolutions are optimized for the following NHWC formats: HHH (input: Half, compute: Half, output: Half), HSH (input: Half, compute: Single, output: Half), and SSS (input: Single, compute: Single, output: Single).
- While using CUDNN_TENSOR_OP_MATH or CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION with the tensor cores, the c and k dimensions of the tensors are now padded to multiples of 8 (as needed) to allow a tensor core kernel to run.
- The CUDNN_BATCHNORM_SPATIAL_PERSISTENT algo is enhanced in cudnnBatchNormalizationForwardTraining() and cudnnBatchNormalizationBackward() to propagate NaNs and Infs as in a pure floating-point implementation. (The "persistent" flavor of batch normalization is optimized for speed, and it uses integer atomics for inter-thread-block reductions.) In earlier versions of cuDNN we recommended invoking cudnnQueryRuntimeError() to ensure no overflow was encountered; when one was, the best practice was to discard the results and use CUDNN_BATCHNORM_SPATIAL instead, as some results generated by CUDNN_BATCHNORM_SPATIAL_PERSISTENT could be finite but invalid. This behavior is now corrected: NaNs and/or Infs are consistently output when intermediate results are out of range. The refined implementation simulates math operations on special floating-point values, for example, +Inf - Inf = NaN.
Known Issues and Limitations
The following issues and limitations exist in this release:
- When tensor cores are enabled in cuDNN 7.3.0, the wgrad calculations will perform an illegal memory access when the K and C values are both non-integral multiples of 8. While this is not likely to produce incorrect results, it may corrupt other memory depending on the user buffer locations. This issue is present on the Volta and Turing architectures.
- Using the cudnnGetConvolution*_v7 routines with the cudnnConvolutionDescriptor_t set to CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION leads to incorrect outputs: the returned results consist only of the CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION cases, instead of also including the performance results for the DEFAULT_MATH cases.
Fixed Issues
The following issues have been fixed in this release:
- Using cudnnConvolutionBackwardData() with the CUDNN_CONVOLUTION_BWD_DATA_ALGO_WINOGRAD algorithm produced incorrect results due to an incorrect filter transform. This issue was present in cuDNN 7.2.1.
- For the INT8 type, with xDesc and yDesc in NHWC format, the cudnnGetConvolutionForwardAlgorithm_v7 function was incorrectly returning CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM as a valid algorithm. This is fixed.
- cudnnConvolutionForward() using CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD intermittently produced incorrect results in cuDNN 7.2 due to a race condition. This issue is fixed.
- Running cudnnConvolutionBackwardFilter() with the NHWC filter format, when n, c, and k are all multiples of 8 and the workSpace input is exactly the size indicated by cudnnGetConvolutionBackwardFilterWorkspaceSize(), led to an error in cuDNN 7.2. This is fixed.
- When the user runs cudnnRNNForward* or cudnnRNNBackward* with FP32 input/output on sm_70 or sm_72, with the RNN descriptor's algo field set to CUDNN_RNN_ALGO_PERSIST_STATIC, and the cudnnMathType_t type set to CUDNN_TENSOR_OP_MATH via cudnnSetRNNMatrixMathType(), the results were incorrect. This is fixed.
- When the user runs cudnnRNNForward* or cudnnRNNBackward* with FP32 input/output on sm_70 or sm_72, with the RNN descriptor's algo field set to CUDNN_RNN_ALGO_PERSIST_STATIC, and the cudnnMathType_t type set to CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION via cudnnSetRNNMatrixMathType(), the resulting performance was suboptimal. This is fixed.
- Convolution routines with the filter format set to NHWC require both the input and output formats to be NHWC. However, in cuDNN 7.2 and earlier this condition was not checked, and as a result silent failures may have occurred. This is fixed in 7.3.0 to correctly return CUDNN_STATUS_NOT_SUPPORTED.