Key Features and Enhancements
The following features and enhancements have been added to this release:
- The cuDNN 7.6.3 library now supports auto-padding for the NHWC layout. The functional behavior
and the benefits of auto-padding are as follows:
- For use cases where the C and K dimensions of the input and filter tensors are not
multiples of 8, the auto-padding feature increases the tensor size so that the
tensor dimensions become multiples of 8.
- With auto-padding, the cuDNN library invokes faster kernels, thereby improving
performance.
- With auto-padding, the performance with NHWC data layout is now comparable to
that of the NCHW layout.
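The rounding rule described above can be sketched in a few lines of Python. This is an illustrative sketch of the arithmetic only (the helper name is ours), not a cuDNN API; the library applies the padding internally.

```python
# Sketch of the auto-padding arithmetic: round the C (input channels)
# and K (output filters) dimensions up to the next multiple of 8.

def pad_to_multiple(dim, multiple=8):
    """Return dim rounded up to the nearest multiple of `multiple`."""
    return ((dim + multiple - 1) // multiple) * multiple

# Example: an NHWC tensor with C=30 and a filter bank with K=17
c_padded = pad_to_multiple(30)   # 32
k_padded = pad_to_multiple(17)   # 24
```

Dimensions that are already multiples of 8 are left unchanged, so only the non-conforming cases pay the padding cost.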
- Added support for dataType=CUDNN_DATA_HALF and
computePrec=CUDNN_DATA_HALF in the multi-head attention forward
(cudnnMultiHeadAttnForward()) and backward gradient
(cudnnMultiHeadAttnBackwardData() and cudnnMultiHeadAttnBackwardWeights()) API
functions.
- The multi-head attention API now supports bias after the projections on Q, K, V, and O in the
cudnnMultiHeadAttnForward() call (the backward bias gradient is not yet supported).
This new feature required a small API change in cudnnSetAttnDescriptor(): the
cudnnAttnQueryMap_t queryMap argument is replaced with
unsigned attnMode to pass various on/off options. This
change is backward compatible with earlier API versions.
- Significantly improved the performance of typical multi-head attention use cases in
forward inference and training, especially when the vector length of each head is a
multiple of 32, up to 128.
- Added Tensor Core support for true-half and single precision use cases in
multi-head attention. Users can enable it by setting the mathType argument in cudnnSetAttnDescriptor() to
CUDNN_TENSOR_OP_MATH or CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION.
- Added the multiHeadAttention sample code. The sample includes a compact
NumPy/Autograd reference model of the multi-head attention block that computes the
forward response and all first-order derivatives. The test code demonstrates how to
use the multi-head attention API and how to access attention weights and sequence data.
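The core computation of such a reference model can be sketched in NumPy as follows. This is an illustrative sketch, not the shipped sample code: the function names, the single-sequence (no batch/beam) shape convention, and the omission of biases and dropout are our simplifications.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attn_forward(q, k, v, wq, wk, wv, wo, n_heads, sm_scaler):
    """Forward pass of a multi-head attention block (sketch).

    q: (len_q, embed_dim), k, v: (len_kv, embed_dim) sequence data
    wq, wk, wv: (embed_dim, proj_dim) input projection weights
    wo: (proj_dim, embed_dim) output projection weights
    sm_scaler: softmax scaling factor (typically 1/sqrt(head_dim))
    Returns the output sequence and the per-head attention weights.
    """
    proj_dim = wq.shape[1]
    head_dim = proj_dim // n_heads

    # Project inputs and split into heads: (n_heads, seq_len, head_dim)
    def project_and_split(x, w):
        p = x @ w
        return p.reshape(x.shape[0], n_heads, head_dim).transpose(1, 0, 2)

    qh = project_and_split(q, wq)
    kh = project_and_split(k, wk)
    vh = project_and_split(v, wv)

    # Scaled dot-product attention, computed independently per head
    scores = sm_scaler * (qh @ kh.transpose(0, 2, 1))   # (n_heads, len_q, len_kv)
    weights = softmax(scores, axis=-1)
    ctx = weights @ vh                                  # (n_heads, len_q, head_dim)

    # Merge heads and apply the output projection
    ctx = ctx.transpose(1, 0, 2).reshape(q.shape[0], proj_dim)
    return ctx @ wo, weights
```

A model like this is convenient for validating the cuDNN results because every intermediate (projections, scores, attention weights) is directly inspectable.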
Fixed Issues
The following issues have been fixed in this release:
- Fixed an issue where cudnnMultiHeadAttnBackwardData() produced incorrect
results when the K sequence length is longer than 32.
- Fixed a race condition in cudnnMultiHeadAttnBackwardData() that was
producing intermittent incorrect results.
- Fixed an issue where cudnnCTCLoss() produced incorrect gradient
results for labels whose length is smaller than the maximal sequence length in the
batch.