cuDNN Release Notes v7.6.3
Key Features and Enhancements
The following features and enhancements have been added to this release:
- The cuDNN 7.6.3 library now supports auto-padding for the NHWC layout. The functional behavior and the benefits of auto-padding are as follows (see the sketch after this list):
- For use cases where the C and K dimensions of the input and filter tensors are not multiples of 8, the auto-padding feature increases the tensor size so that the tensor dimensions are multiples of 8.
- With auto-padding, the cuDNN library invokes faster kernels, thereby improving performance.
- With auto-padding, the performance with the NHWC data layout is now comparable to that of the NCHW layout.
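Auto-padding is applied inside the library and requires no API changes. Below is a minimal sketch, assuming a half-precision NHWC tensor whose channel count (C = 3) is not a multiple of 8; all dimensions are illustrative and error checking is omitted:

```c
#include <cudnn.h>

int main(void) {
    cudnnHandle_t handle;
    cudnnTensorDescriptor_t xDesc;

    cudnnCreate(&handle);
    cudnnCreateTensorDescriptor(&xDesc);

    /* C = 3 is not a multiple of 8. In cuDNN 7.6.3, NHWC tensors like
       this one are padded internally so that faster kernels can be
       selected; the caller sees only the original dimensions. */
    cudnnSetTensor4dDescriptor(xDesc, CUDNN_TENSOR_NHWC, CUDNN_DATA_HALF,
                               /*n=*/32, /*c=*/3, /*h=*/224, /*w=*/224);

    cudnnDestroyTensorDescriptor(xDesc);
    cudnnDestroy(handle);
    return 0;
}
```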
- Added support for dataType=CUDNN_DATA_HALF and computePrec=CUDNN_DATA_HALF in multi-head attention forward (cudnnMultiHeadAttnForward) and backward (gradient) (cudnnMultiHeadAttnBackwardData and cudnnMultiHeadAttnBackwardWeights) API functions.
- The multi-head attention API now supports bias after the projections on Q, K, V, and O in the cudnnMultiHeadAttnForward() call (the backward bias gradient is not yet supported). The new feature required a small API change in cudnnSetAttnDescriptor(): the cudnnAttnQueryMap_t queryMap argument is replaced with unsigned attnMode to pass various on and off options. This change is backward compatible with earlier API versions.
- Significantly improved the performance in typical multi-head attention use cases in forward inference and training, especially when the vector length of each head is a multiple of 32 up to 128.
- Tensor Core support is added for true-half and single-precision use cases in multi-head attention. Users can enable it by setting the mathType argument in cudnnSetAttnDescriptor() to CUDNN_TENSOR_OP_MATH or CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION (see the sketch below).
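The following is a minimal sketch combining the attention items above: true-half data and compute precision, projection biases enabled through attnMode, and Tensor Core math. The head count, sizes, and sequence lengths are illustrative assumptions; dropout descriptors are left NULL and error checking is omitted for brevity:

```c
#include <cudnn.h>
#include <stddef.h>

int main(void) {
    cudnnHandle_t handle;
    cudnnAttnDescriptor_t attnDesc;

    cudnnCreate(&handle);
    cudnnCreateAttnDescriptor(&attnDesc);

    /* attnMode replaces the former cudnnAttnQueryMap_t queryMap argument;
       CUDNN_ATTN_ENABLE_PROJ_BIASES enables the new Q/K/V/O biases. */
    unsigned attnMode = CUDNN_ATTN_QUERYMAP_ALL_TO_ONE |
                        CUDNN_ATTN_ENABLE_PROJ_BIASES;

    cudnnSetAttnDescriptor(
        attnDesc, attnMode,
        /*nHeads=*/8, /*smScaler=*/1.0,
        /*dataType=*/CUDNN_DATA_HALF,    /* true-half I/O */
        /*computePrec=*/CUDNN_DATA_HALF, /* true-half compute */
        /*mathType=*/CUDNN_TENSOR_OP_MATH,
        /*attnDropoutDesc=*/NULL,        /* no dropout in this sketch */
        /*postDropoutDesc=*/NULL,
        /*qSize=*/512, /*kSize=*/512, /*vSize=*/512,
        /*qProjSize=*/64, /*kProjSize=*/64, /*vProjSize=*/64,
        /* 64 per head is a multiple of 32 up to 128: the fast path
           noted above */
        /*oProjSize=*/512,
        /*qoMaxSeqLength=*/128, /*kvMaxSeqLength=*/128,
        /*maxBatchSize=*/32, /*maxBeamSize=*/1);

    cudnnDestroyAttnDescriptor(attnDesc);
    cudnnDestroy(handle);
    return 0;
}
```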
- A multiHeadAttention sample is added. The sample includes a compact NumPy/Autograd reference model of the multi-head attention block that computes the forward response and all first-order derivatives. The test code demonstrates how to use the multi-head attention API and how to access attention weights and sequence data.
- Improved the performance of depth-wise convolution for forward, dgrad, and wgrad under the following conditions (see the sketch after this list):
- Algorithm is algo1
- Tensor format for the filter is NCHW (wgrad also supports NHWC)
- Input and outputs are in FP16 and computation is in FP32
- Filter size: 1x1, 3x3, 5x5, 7x7 (dgrad supports stride 1 only)
- Math type is CUDNN_DEFAULT_MATH
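A minimal sketch of a matching configuration follows. A depth-wise convolution is expressed in cuDNN by setting the group count equal to the channel count; the tensor sizes and the 3x3 filter below are illustrative assumptions, and error checking is omitted. The grouped cudnnConvolutionBackwardFilter() configuration in the next item uses the same setup with a group count between 1 and C.

```c
#include <cudnn.h>

int main(void) {
    const int N = 32, C = 64, H = 56, W = 56;
    cudnnHandle_t handle;
    cudnnTensorDescriptor_t xDesc;
    cudnnFilterDescriptor_t wDesc;
    cudnnConvolutionDescriptor_t convDesc;

    cudnnCreate(&handle);
    cudnnCreateTensorDescriptor(&xDesc);
    cudnnCreateFilterDescriptor(&wDesc);
    cudnnCreateConvolutionDescriptor(&convDesc);

    /* FP16 input; NCHW filter layout; 3x3 filter with c = C/groupCount = 1. */
    cudnnSetTensor4dDescriptor(xDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_HALF,
                               N, C, H, W);
    cudnnSetFilter4dDescriptor(wDesc, CUDNN_DATA_HALF, CUDNN_TENSOR_NCHW,
                               /*k=*/C, /*c=*/1, /*h=*/3, /*w=*/3);

    /* FP32 accumulation (computeType) with the default math type. */
    cudnnSetConvolution2dDescriptor(convDesc, /*pad_h=*/1, /*pad_w=*/1,
                                    /*u=*/1, /*v=*/1,
                                    /*dilation_h=*/1, /*dilation_w=*/1,
                                    CUDNN_CROSS_CORRELATION, CUDNN_DATA_FLOAT);
    cudnnSetConvolutionMathType(convDesc, CUDNN_DEFAULT_MATH);

    /* groupCount == C makes the convolution depth-wise. With this setup,
       algo1 (e.g. CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM forward,
       CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1 for wgrad) reaches the improved
       kernels. */
    cudnnSetConvolutionGroupCount(convDesc, C);

    cudnnDestroyConvolutionDescriptor(convDesc);
    cudnnDestroyFilterDescriptor(wDesc);
    cudnnDestroyTensorDescriptor(xDesc);
    cudnnDestroy(handle);
    return 0;
}
```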
- Improved the performance of grouped convolution for cudnnConvolutionBackwardFilter() in the configuration below:
- Algorithm is CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1
- Math type is CUDNN_DEFAULT_MATH
- Tensor format for filter is NCHW
- Input and outputs are in FP16 and computation is in FP32
- Filter size: 1x1, 3x3, 5x5, 7x7
- Improved the performance of grouped convolution for cudnnConvolutionForward() in the configuration below (see the combined sketch after the next configuration):
- Algorithm is CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM
- Math type is CUDNN_TENSOR_OP_MATH or CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION
- Tensor format for filter is NHWC
- Input and outputs are in FP16 and computation is in FP16 or FP32
- Per group C & K == 4/8/16/32
- Filter size: 3x3
- Improved the performance of grouped convolution for cudnnConvolutionBackwardFilter() in the configuration below (see the combined sketch after this list):
- Algorithm is CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1
- Math type is CUDNN_TENSOR_OP_MATH or CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION
- Tensor format for filter is NHWC
- Input and outputs are in FP16 and computation is in FP32
- On NVIDIA Volta (compute capability 7.0)
- Per group C & K == 4/8/16/32
- Filter size: 1x1, 3x3
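The two Tensor Core grouped-convolution items above share most of their descriptor setup; the following is a combined minimal sketch, assuming per-group C and K of 32, a 3x3 filter, and illustrative tensor sizes (error checking omitted):

```c
#include <cudnn.h>

int main(void) {
    const int N = 32, C = 128, K = 128, GROUPS = 4; /* per-group C,K == 32 */
    cudnnHandle_t handle;
    cudnnTensorDescriptor_t xDesc;
    cudnnFilterDescriptor_t wDesc;
    cudnnConvolutionDescriptor_t convDesc;

    cudnnCreate(&handle);
    cudnnCreateTensorDescriptor(&xDesc);
    cudnnCreateFilterDescriptor(&wDesc);
    cudnnCreateConvolutionDescriptor(&convDesc);

    /* FP16 I/O with the NHWC filter layout; 3x3 filter
       (the backward-filter path also covers 1x1). */
    cudnnSetTensor4dDescriptor(xDesc, CUDNN_TENSOR_NHWC, CUDNN_DATA_HALF,
                               N, C, /*h=*/56, /*w=*/56);
    cudnnSetFilter4dDescriptor(wDesc, CUDNN_DATA_HALF, CUDNN_TENSOR_NHWC,
                               K, C / GROUPS, 3, 3);

    /* FP32 accumulation; the forward path also allows FP16 compute. */
    cudnnSetConvolution2dDescriptor(convDesc, 1, 1, 1, 1, 1, 1,
                                    CUDNN_CROSS_CORRELATION, CUDNN_DATA_FLOAT);
    cudnnSetConvolutionGroupCount(convDesc, GROUPS);

    /* Tensor Core math selects the improved kernels;
       CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION is equally valid here. */
    cudnnSetConvolutionMathType(convDesc, CUDNN_TENSOR_OP_MATH);

    /* Forward: CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM.
       Backward filter: CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1. */

    cudnnDestroyConvolutionDescriptor(convDesc);
    cudnnDestroyFilterDescriptor(wDesc);
    cudnnDestroyTensorDescriptor(xDesc);
    cudnnDestroy(handle);
    return 0;
}
```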
Fixed Issues
The following issues have been fixed in this release:
- Fixed an issue where cudnnMultiHeadAttnBackwardData() produced incorrect results when the K sequence length is longer than 32.
- Fixed a race condition in cudnnMultiHeadAttnBackwardData() that intermittently produced incorrect results.
- Fixed an issue where cudnnCTCLoss() produced incorrect gradient results for labels whose length is smaller than the maximal sequence length in the batch.