These are the cuDNN 7.6.3 release notes. This release includes fixes from the previous
cuDNN 7.x.x releases as well as the following additional changes. These release notes
apply to both cuDNN and JetPack users unless a change is specifically marked "(not
applicable for Jetson platforms)".
Key Features and Enhancements
The following features and enhancements have been added to this release:
-
The cuDNN 7.6.3 library now supports auto-padding for the NHWC layout. The functional
behavior and benefits of auto-padding are as follows (not applicable for Jetson
platforms):
- For use cases where C and K dimensions of input and filter Tensors
are not multiples of 8, the auto-padding feature increases the
Tensor size so that the Tensor dimensions are multiples of 8.
- With auto-padding, the cuDNN library invokes faster kernels, thereby
improving performance.
- With auto-padding, the performance with NHWC data layout is now
comparable to that of the NCHW layout.
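The padding arithmetic described above can be sketched as follows. This is an
illustration only; cuDNN performs the padding internally, and the helper function
below is not part of the cuDNN API:

```python
# Illustrative sketch only: cuDNN performs this padding internally.
# C and K dimensions that are not multiples of 8 are rounded up to the
# next multiple of 8, which allows faster kernels to be dispatched.

def pad_to_multiple(dim, multiple=8):
    """Round `dim` up to the nearest multiple of `multiple`."""
    return ((dim + multiple - 1) // multiple) * multiple

# Example: C=3 and K=62 are not multiples of 8, so they would be padded.
assert pad_to_multiple(3) == 8
assert pad_to_multiple(62) == 64
assert pad_to_multiple(32) == 32  # already aligned; unchanged
```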
-
Added support for dataType=CUDNN_DATA_HALF and
computePrec=CUDNN_DATA_HALF in multi-head attention
forward (cudnnMultiHeadAttnForward()) and
backward (gradient) (cudnnMultiHeadAttnBackwardData()
and cudnnMultiHeadAttnBackwardWeights()) API functions. (not
applicable for Jetson platforms)
-
The multi-head attention API now supports bias after the projections on Q, K,
V, and O in the cudnnMultiHeadAttnForward() call
(the backward bias gradient is not yet supported). (not applicable for Jetson
platforms)
The new feature required a small API change in cudnnSetAttnDescriptor(): the
cudnnAttnQueryMap_t queryMap argument is replaced with
unsigned attnMode, used to pass various on/off options.
This change is backward compatible with earlier API versions. (not
applicable for Jetson platforms)
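The unsigned attnMode argument packs these on/off options as bit flags. The sketch
below mirrors the flag names in the cuDNN 7.6 header, but the numeric values are
shown for illustration only and should be taken from cudnn.h in practice:

```python
# Illustrative sketch: bit-flag options for the `unsigned attnMode`
# argument of cudnnSetAttnDescriptor(). Values shown for illustration;
# consult cudnn.h for the authoritative definitions.

CUDNN_ATTN_QUERYMAP_ALL_TO_ONE = 0        # legacy queryMap behavior
CUDNN_ATTN_QUERYMAP_ONE_TO_ONE = 1 << 0
CUDNN_ATTN_DISABLE_PROJ_BIASES = 0
CUDNN_ATTN_ENABLE_PROJ_BIASES  = 1 << 1   # new: bias after Q, K, V, O projections

# Options are combined with bitwise OR before being passed to
# cudnnSetAttnDescriptor():
attn_mode = CUDNN_ATTN_QUERYMAP_ALL_TO_ONE | CUDNN_ATTN_ENABLE_PROJ_BIASES

assert attn_mode & CUDNN_ATTN_ENABLE_PROJ_BIASES  # projection biases enabled
```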
-
Significantly improved the performance of typical multi-head attention use cases in forward
inference and training, especially when the vector length of each head is a
multiple of 32, up to 128. (not applicable for Jetson platforms)
-
Added Tensor Core support for true half-precision and single-precision use cases in multi-head
attention. Users can enable it by setting the mathType
argument in cudnnSetAttnDescriptor() to
CUDNN_TENSOR_OP_MATH or
CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION. (not
applicable for Jetson platforms)
-
Added the multiHeadAttention sample code. The sample includes a
compact NumPy/Autograd reference model of the multi-head attention block
that computes the forward response and all first-order derivatives. The test
code demonstrates how to use the multi-head attention API and how to access
attention weights and sequence data. (not applicable for Jetson platforms)
-
Improved the performance of depth-wise convolution for forward,
dgrad, and
wgrad under the following conditions:
- Algorithm is algo1
- Tensor format for filter is NCHW (wgrad also supports
NHWC)
- Input and outputs are in FP16 and computation is in FP32
- Filter size: 1x1, 3x3, 5x5, 7x7 (dgrad only
supports stride 1)
- Math type is CUDNN_DEFAULT_MATH
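The conditions above can be summarized as a small predicate. The function name and
argument set below are hypothetical, for illustration only, and are not part of the
cuDNN API:

```python
# Hypothetical helper (not a cuDNN API): checks whether a depth-wise
# convolution matches the configuration that receives the improved kernels
# in cuDNN 7.6.3, per the conditions listed in the release notes.

def depthwise_improved_path(op, algo, filter_format, io_dtype,
                            compute_dtype, filter_size, stride, math_type):
    if algo != "algo1" or math_type != "CUDNN_DEFAULT_MATH":
        return False
    # NCHW filters are required; wgrad additionally accepts NHWC.
    if filter_format != "NCHW" and not (op == "wgrad" and filter_format == "NHWC"):
        return False
    if io_dtype != "FP16" or compute_dtype != "FP32":
        return False
    if filter_size not in {(1, 1), (3, 3), (5, 5), (7, 7)}:
        return False
    if op == "dgrad" and stride != 1:  # dgrad only supports stride 1
        return False
    return True
```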
-
Improved the performance of grouped convolution for
cudnnConvolutionBackwardFilter() in the configuration
below:
- Algorithm is
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1
- Math type is CUDNN_DEFAULT_MATH
- Tensor format for filter is NCHW
- Input and outputs are in FP16 and computation is in FP32
- Filter size: 1x1, 3x3, 5x5, 7x7
-
Improved the performance of grouped convolution for
cudnnConvolutionForward() in the configuration
below:
- Algorithm is
CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM
- Math type is CUDNN_TENSOR_OP_MATH or
CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION
- Tensor format for filter is NHWC
- Input and outputs are in FP16 and computation is in FP16/FP32
- Per-group C and K are 4, 8, 16, or 32
- Filter size: 3x3
-
Improved the performance of grouped convolution for
cudnnConvolutionBackwardFilter() in the configuration
below:
- Algorithm is
CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1
- Math type is CUDNN_TENSOR_OP_MATH or
CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION
- Tensor format for filter is NHWC
- Input and outputs are in FP16 and computation is in FP32
- On NVIDIA Volta (compute capability 7.0)
- Per-group C and K are 4, 8, 16, or 32
- Filter size: 1x1, 3x3
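The per-group channel constraint shared by the two grouped-convolution items above
can be sketched as a quick check. The helper below is hypothetical, for illustration
only, and not part of the cuDNN API:

```python
# Hypothetical helper (not a cuDNN API): checks the per-group channel
# constraint for the improved grouped-convolution paths, where the number
# of channels per group (C/groups or K/groups) must be 4, 8, 16, or 32.

def per_group_channels_ok(channels, groups):
    """True when channels/groups is exactly 4, 8, 16, or 32."""
    if channels % groups != 0:
        return False
    return channels // groups in {4, 8, 16, 32}

assert per_group_channels_ok(256, 8)       # 32 channels per group: supported
assert not per_group_channels_ok(256, 4)   # 64 channels per group: not supported
```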
Fixed Issues
The following issues have been
fixed in this release:
-
Fixed an issue where cudnnMultiHeadAttnBackwardData()
was producing incorrect results when the K sequence length is longer than 32.
-
Fixed a race condition in cudnnMultiHeadAttnBackwardData()
that was producing intermittent incorrect results.
-
The function cudnnCTCLoss() produced incorrect
gradients for labels whose length is smaller than the maximal sequence
length in the batch. This is fixed in cuDNN 7.6.3.