cuDNN Release Notes v7.6.3

Key Features and Enhancements

The following features and enhancements have been added to this release:

  • The cuDNN 7.6.3 library now supports auto-padding for the NHWC layout. The functional behavior and benefits of auto-padding are as follows:
    • For use cases where the C and K dimensions of the input and filter tensors are not multiples of 8, the auto-padding feature increases the tensor sizes so that these dimensions become multiples of 8.
    • With auto-padding, the cuDNN library invokes faster kernels, thereby improving performance.
    • With auto-padding, performance with the NHWC data layout is now comparable to that of the NCHW layout.
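    The dimension rule above can be sketched as follows. This is a minimal illustration of the round-up-to-a-multiple-of-8 arithmetic only; the `pad_to_multiple` helper is hypothetical and the actual padding happens inside the cuDNN library, not in user code.

    ```python
    def pad_to_multiple(dim, multiple=8):
        """Round a tensor dimension up to the next multiple of `multiple`.
        Illustrative only; cuDNN performs this padding internally when
        auto-padding applies."""
        return ((dim + multiple - 1) // multiple) * multiple

    # Example: an NHWC convolution with C=3 input channels and K=20 filters.
    c_padded = pad_to_multiple(3)   # -> 8
    k_padded = pad_to_multiple(20)  # -> 24
    ```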
  • Added support for dataType=CUDNN_DATA_HALF and computePrec=CUDNN_DATA_HALF in multi-head attention forward (cudnnMultiHeadAttnForward) and backward (gradient) (cudnnMultiHeadAttnBackwardData and cudnnMultiHeadAttnBackwardWeights) API functions.
  • Multi-head attention API now supports bias after the projections on Q, K, V, and O in the cudnnMultiHeadAttnForward() call (backward bias gradient is not yet supported).

    The new feature required a small API change in cudnnSetAttnDescriptor(): the cudnnAttnQueryMap_t queryMap argument is replaced with unsigned attnMode to pass various on and off options. This change is backward compatible with earlier API versions.
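    The shape of the new attnMode bitmask can be sketched as below. The flag values here are hypothetical placeholders chosen for illustration; the real CUDNN_ATTN_* constants are defined in cudnn.h and may have different values.

    ```python
    # Hypothetical flag values mirroring the cuDNN attnMode bitmask pattern;
    # the actual CUDNN_ATTN_* constants live in cudnn.h.
    CUDNN_ATTN_QUERYMAP_ALL_TO_ONE = 0       # each query attends over all keys (default)
    CUDNN_ATTN_QUERYMAP_ONE_TO_ONE = 1 << 0  # i-th query maps to the i-th key only
    CUDNN_ATTN_ENABLE_PROJ_BIASES  = 1 << 1  # add bias after the Q, K, V, O projections

    # Options are combined with bitwise OR into a single unsigned value,
    # which replaces the old cudnnAttnQueryMap_t queryMap argument.
    attn_mode = CUDNN_ATTN_QUERYMAP_ALL_TO_ONE | CUDNN_ATTN_ENABLE_PROJ_BIASES
    ```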

  • Significantly improved the performance of typical multi-head attention use cases in forward inference and training, especially when the vector length of each head is a multiple of 32, up to 128.
  • Tensor Core support is added for true half and single precision use cases in multi-head attention. Users can enable it by setting the mathType argument in cudnnSetAttnDescriptor() to CUDNN_TENSOR_OP_MATH or CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION.
  • Added the multiHeadAttention sample code. The sample includes a compact NumPy/Autograd reference model of the multi-head attention block that computes the forward response and all first-order derivatives. The test code demonstrates how to use the multi-head attention API and how to access attention weights and sequence data.
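    The core computation that such a reference model performs can be sketched in a few lines. This is a single-head, pure-Python illustration of scaled dot-product attention, softmax(Q Kᵀ / √d) V, written in the spirit of the NumPy/Autograd model mentioned above; it is not the shipped sample code.

    ```python
    import math

    def attention(Q, K, V):
        """Single-head scaled dot-product attention on plain Python lists.
        Q, K, V are lists of equal-length vectors; returns one output row
        per query vector."""
        d = len(Q[0])
        # Attention logits: one row per query, one column per key.
        scores = [[sum(q[i] * k[i] for i in range(d)) / math.sqrt(d) for k in K]
                  for q in Q]
        out = []
        for row in scores:
            m = max(row)                       # subtract max for numerical stability
            e = [math.exp(s - m) for s in row]
            z = sum(e)
            w = [x / z for x in e]             # attention weights (softmax)
            # Weighted sum of the value vectors.
            out.append([sum(w[j] * V[j][i] for j in range(len(V)))
                        for i in range(d)])
        return out
    ```

    With one-hot value vectors, each output row is a convex combination of the values, so its entries sum to 1 and the weight on the key most similar to the query dominates.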

Fixed Issues

The following issues have been fixed in this release:

  • Fixed an issue where cudnnMultiHeadAttnBackwardData produced incorrect results when the K (key) sequence length is longer than 32.
  • Fixed a race condition in cudnnMultiHeadAttnBackwardData that was producing intermittent incorrect results.
  • The function cudnnCTCLoss() produced incorrect gradient results for labels whose length is smaller than the maximal sequence length in the batch. This issue is fixed in cuDNN 7.6.3.