Key Features and Enhancements
The following features and enhancements have been added to this release:
- In cudnnConvolutionForward() for 2D convolutions, when wDesc is in NCHW format, the IMPLICIT_GEMM algorithm (algo 0) now supports the data type configurations INT8x4_CONFIG and INT8x4_EXT_CONFIG.
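A minimal sketch of what the INT8x4_CONFIG descriptor setup might look like (the tensor extents below are illustrative; INT8x4 data uses the CUDNN_TENSOR_NCHW_VECT_C layout, and the EXT configuration instead produces a CUDNN_DATA_FLOAT output tensor):
```c
#include <cudnn.h>

/* Sketch: descriptor setup for IMPLICIT_GEMM with INT8x4_CONFIG.
 * Sizes (n, c, h, w, k) are illustrative; error checks omitted. */
void setup_int8x4_conv(cudnnTensorDescriptor_t xDesc,
                       cudnnFilterDescriptor_t wDesc,
                       cudnnConvolutionDescriptor_t convDesc,
                       cudnnTensorDescriptor_t yDesc)
{
    /* INT8x4 tensors use the vectorized NCHW_VECT_C layout;
       channel counts must be multiples of 4. */
    cudnnSetTensor4dDescriptor(xDesc, CUDNN_TENSOR_NCHW_VECT_C,
                               CUDNN_DATA_INT8x4, 1, 32, 28, 28);
    cudnnSetFilter4dDescriptor(wDesc, CUDNN_DATA_INT8x4,
                               CUDNN_TENSOR_NCHW_VECT_C, 64, 32, 3, 3);
    /* INT8 configurations accumulate in INT32. */
    cudnnSetConvolution2dDescriptor(convDesc, 1, 1, 1, 1, 1, 1,
                                    CUDNN_CROSS_CORRELATION, CUDNN_DATA_INT32);
    /* INT8x4_CONFIG: INT8x4 output. For INT8x4_EXT_CONFIG the output
       tensor is CUDNN_DATA_FLOAT in NCHW instead. */
    cudnnSetTensor4dDescriptor(yDesc, CUDNN_TENSOR_NCHW_VECT_C,
                               CUDNN_DATA_INT8x4, 1, 64, 28, 28);
    /* Select CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM when calling
       cudnnConvolutionForward(). */
}
```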
- A new set of APIs is added to provide support for multi-head attention computation. The following is a list of the new functions and data types (a usage sketch follows the list):
Datatypes:
- cudnnSeqDataAxis_t
- cudnnMultiHeadAttnWeightKind_t
- cudnnSeqDataDescriptor_t
- cudnnWgradMode_t
- cudnnAttnQueryMap_t
- cudnnAttnDescriptor_t
Functions:
- cudnnCreateAttnDescriptor
- cudnnDestroyAttnDescriptor
- cudnnSetAttnDescriptor
- cudnnGetAttnDescriptor
- cudnnGetMultiHeadAttnBuffers
- cudnnGetMultiHeadAttnWeights
- cudnnMultiHeadAttnForward
- cudnnMultiHeadAttnBackwardData
- cudnnMultiHeadAttnBackwardWeights
- cudnnSetSeqDataDescriptor
- cudnnGetSeqDataDescriptor
- cudnnCreateSeqDataDescriptor
- cudnnDestroySeqDataDescriptor
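The sketch below shows the attention descriptor lifecycle, assuming illustrative head counts and embedding sizes; the exact parameter order follows the cudnnSetAttnDescriptor reference:
```c
#include <cudnn.h>

/* Sketch: create and configure an attention descriptor, then query
 * buffer sizes. All sizes are illustrative; error checks omitted. */
void setup_attention(cudnnHandle_t handle)
{
    cudnnAttnDescriptor_t attnDesc;
    cudnnCreateAttnDescriptor(&attnDesc);

    cudnnSetAttnDescriptor(attnDesc,
                           CUDNN_ATTN_QUERYMAP_ALL_TO_ONE, /* query-to-key mapping */
                           8,                   /* nHeads */
                           1.0,                 /* smScaler: softmax scaling */
                           CUDNN_DATA_FLOAT,    /* data type */
                           CUDNN_DATA_FLOAT,    /* compute precision */
                           CUDNN_DEFAULT_MATH,
                           NULL, NULL,          /* attn/post dropout descriptors (none here) */
                           256, 256, 256,       /* qSize, kSize, vSize */
                           64, 64, 64,          /* qProjSize, kProjSize, vProjSize */
                           256,                 /* oProjSize */
                           128, 128,            /* qoMaxSeqLength, kvMaxSeqLength */
                           32, 1);              /* maxBatchSize, maxBeamSize */

    size_t weightBytes, workBytes, reserveBytes;
    cudnnGetMultiHeadAttnBuffers(handle, attnDesc,
                                 &weightBytes, &workBytes, &reserveBytes);
    /* ... allocate the buffers, describe Q/K/V/O with
       cudnnCreateSeqDataDescriptor/cudnnSetSeqDataDescriptor, then call
       cudnnMultiHeadAttnForward ... */
    cudnnDestroyAttnDescriptor(attnDesc);
}
```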
- A new set of APIs for general tensor folding is introduced. The following is a list of the new functions and data types (a configuration sketch follows the list):
Datatypes:
- cudnnTensorTransformDescriptor_t
- cudnnFoldingDirection_t
Functions:
- cudnnTransformTensorEx
- cudnnCreateTensorTransformDescriptor
- cudnnDestroyTensorTransformDescriptor
- cudnnInitTransformDest
- cudnnSetTensorTransformDescriptor
- cudnnGetTensorTransformDescriptor
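The following is a minimal configuration sketch, not a definitive example: the padding and folding values are illustrative, and the exact array-length conventions for padBeforeA, padAfterA, and foldA should be taken from the cudnnSetTensorTransformDescriptor reference.
```c
#include <cudnn.h>
#include <stdint.h>

/* Sketch: build a folding transform descriptor (values illustrative). */
void setup_fold_transform(void)
{
    cudnnTensorTransformDescriptor_t transformDesc;
    cudnnCreateTensorTransformDescriptor(&transformDesc);

    const uint32_t nbDims       = 4;             /* NCHW tensor */
    const int32_t  padBefore[4] = {0, 0, 1, 1};  /* per-dimension padding */
    const int32_t  padAfter[4]  = {0, 0, 1, 1};
    const uint32_t fold[2]      = {2, 2};        /* per-spatial-dim folding (H, W) */

    cudnnSetTensorTransformDescriptor(transformDesc, nbDims, CUDNN_TENSOR_NCHW,
                                      padBefore, padAfter, fold,
                                      CUDNN_TRANSFORM_FOLD);

    /* cudnnInitTransformDest() derives the destination tensor descriptor,
       after which cudnnTransformTensorEx() performs the transform. */
    cudnnDestroyTensorTransformDescriptor(transformDesc);
}
```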
- A new set of APIs, and enhancements to existing APIs, are introduced for RNNs. The following is the list of the new and enhanced functions and data types:
Datatypes:
- cudnnRNNBiasMode_t (new)
- cudnnRNNMode_t (enhanced)
Functions:
- cudnnSetRNNBiasMode (new)
- cudnnGetRNNBiasMode (new)
- cudnnGetRNNLinLayerBiasParams (enhanced)
- All cudnnRNNForward/Backward* functions are enhanced to support FP16 math precision mode when both the input and the output are in FP16. To switch to FP16 math precision, set the mathPrec parameter in cudnnSetRNNDescriptor to CUDNN_DATA_HALF; to switch to FP32 math precision, set it to CUDNN_DATA_FLOAT. This feature is available only for CUDNN_RNN_ALGO_STANDARD and for compute capability 5.3 or higher (see the sketch following this list).
- Added support for the INT8x4 and INT8x32 data types in cudnnPoolingForward(). Using these data types provides improved performance over the scalar data types.
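As a sketch of the precision switch (the hidden size, layer count, and cell mode below are illustrative; cudnnSetRNNBiasMode is the new bias-control entry point added in this release):
```c
#include <cudnn.h>

/* Sketch: select FP16 math precision for an RNN. Sizes are illustrative. */
void setup_fp16_rnn(cudnnHandle_t handle, cudnnRNNDescriptor_t rnnDesc,
                    cudnnDropoutDescriptor_t dropoutDesc)
{
    /* With FP16 input/output, mathPrec may be CUDNN_DATA_HALF (FP16 math)
       or CUDNN_DATA_FLOAT (FP32 math). FP16 math requires
       CUDNN_RNN_ALGO_STANDARD and compute capability >= 5.3. */
    cudnnSetRNNDescriptor(handle, rnnDesc,
                          512,                    /* hiddenSize */
                          2,                      /* numLayers */
                          dropoutDesc,
                          CUDNN_LINEAR_INPUT,
                          CUDNN_UNIDIRECTIONAL,
                          CUDNN_LSTM,
                          CUDNN_RNN_ALGO_STANDARD,
                          CUDNN_DATA_HALF);       /* mathPrec */

    /* New in 7.5: bias mode control, e.g. a single recurrent bias. */
    cudnnSetRNNBiasMode(rnnDesc, CUDNN_RNN_SINGLE_REC_BIAS);
}
```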
Fixed Issues
The following issues have been fixed in this release:
- The mathPrec parameter in cudnnSetRNNDescriptor is reserved for controlling math precision in RNNs, but was not previously checked or enforced. This parameter is now strictly enforced. As a result, the following applies:
- For input/output in FP16, the mathPrec parameter can be CUDNN_DATA_HALF or CUDNN_DATA_FLOAT.
- For input/output in FP32, the mathPrec parameter can only be CUDNN_DATA_FLOAT.
- For input/output in FP64 (double type), the mathPrec parameter can only be CUDNN_DATA_DOUBLE.
- Users upgrading to cuDNN 7.4 may see insufficient (too small) values returned from the function cudnnGetConvolutionBackwardFilterWorkspaceSize() for dimensions 5 and greater, resulting in a CUDNN_STATUS_EXECUTION_FAILED error message. In cuDNN 7.4, the workaround for this issue is to calculate the workspace by using the formula below:
```
Let M be the product of output tensor (gradDesc) dimensions starting at 1.
Let N be the output tensor dimension 0.
Let Mp = (M+31)/32
Let Np = (N+31)/32
W = 2 * M * N * sizeof(int) is the workspace that should be used.
```
This is fixed.
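A sketch of that computation as code (the helper name is hypothetical, and the computation is only needed as a workaround on cuDNN 7.4.x):
```c
#include <stddef.h>

/* Sketch of the workaround computation described above, for a gradDesc
 * with dimensions dims[0..nbDims-1] (dims[0] = N). */
size_t bwd_filter_workspace_workaround(const int *dims, int nbDims)
{
    size_t M = 1;
    for (int i = 1; i < nbDims; i++)
        M *= (size_t)dims[i];        /* product of dimensions starting at 1 */
    size_t N = (size_t)dims[0];      /* dimension 0 */
    return 2 * M * N * sizeof(int);  /* W = 2 * M * N * sizeof(int) */
}
```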
- In earlier cuDNN versions, when all the conditions below are true:
- 3-D convolution
- Batch size > 1
- Algorithm is CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1
- convDesc's dataType is CUDNN_DATA_HALF
then calls to cudnnConvolutionBackwardFilter() may produce incorrect (and non-deterministic) results. This is fixed in cuDNN 7.5.0.
- In cuDNN 7.4.2, some 3D convolution cases showed reduced performance on Turing GPUs compared to previous cuDNN releases. This is fixed.
- For the INT8x32 data type, the function cudnnSetTensor4dDescriptorEx() erroneously returned CUDNN_STATUS_BAD_PARAM. This is fixed in cuDNN 7.5, so the error is no longer returned.
- In cuDNN 7.4.1 and 7.4.2, when cudnnBatchNormMode_t is set to CUDNN_BATCHNORM_SPATIAL_PERSISTENT and the input/output tensors are in NHWC format with the CUDNN_DATA_HALF data type, then, on Windows only, the cudnnBatchNormalization*Ex functions are supported only with the device in TCC mode. See Tesla Compute Cluster Mode for Windows. Starting with cuDNN 7.5.0, the following checks are added for the driver mode on Windows. If on Windows and not in TCC mode:
- The functions fall back to a slower implementation if bnOps in the cudnnBatchNormalization*Ex function is set to CUDNN_BATCHNORM_OPS_BN.
- If bnOps is set to CUDNN_BATCHNORM_OPS_BN_ACTIVATION or CUDNN_BATCHNORM_OPS_BN_ADD_ACTIVATION, CUDNN_STATUS_NOT_SUPPORTED is returned.
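A sketch of how an application might choose bnOps accordingly (the helper name is hypothetical; cudaDevAttrTccDriver reports whether a device is running the TCC driver):
```c
#include <cuda_runtime.h>
#include <cudnn.h>

/* Sketch: on Windows, prefer fused batch-norm ops only when the device
 * runs in TCC mode; otherwise fall back to CUDNN_BATCHNORM_OPS_BN
 * (slower path) to avoid CUDNN_STATUS_NOT_SUPPORTED. */
cudnnBatchNormOps_t choose_bn_ops(int device)
{
    int tcc = 0;
    cudaDeviceGetAttribute(&tcc, cudaDevAttrTccDriver, device);
    return tcc ? CUDNN_BATCHNORM_OPS_BN_ACTIVATION : CUDNN_BATCHNORM_OPS_BN;
}
```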
- In cuDNN 7.4.2, in some cases the cudnnConvolutionBackwardData() function, when used with NHWC tensor format, resulted in the “disallowed mismatches” error. This is fixed.
- In some cases, using cudnnConvolutionBiasActivationForward() with GroupCount() > 1 and xDesc's data type set to CUDNN_DATA_HALF produced incorrect results for all groups except the first. This is fixed.
- When using cuDNN 7.3.1 on Quadro P4000, calling the cudnnConvolutionForward() function with the CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD_NONFUSED algorithm had a small chance of producing intermittent inaccurate results. This is fixed.
- When cudnnConvolutionForward() is called with these settings: the data type is CUDNN_DATA_INT8x4, the convolution is 2D, the architecture is sm_61, and the filter size is larger than 8x8, then incorrect results and a potential illegal memory access error occur. This is fixed.
- For sm_72 and sm_75, the function cudnnConvolutionBiasActivationForward(), when used with INT8x32, failed to run. This is fixed.
- In the function cudnnSetRNNDataDescriptor, if API logging is turned on, the seqLengthArray field in the log may not display the correct number of array elements. This is fixed.
- For the batchNorm functions cudnnBatchNormalization{Backward|BackwardEx|ForwardInference|ForwardTraining|ForwardTrainingEx}, the value of epsilon was required to be greater than or equal to CUDNN_BN_MIN_EPSILON, defined in the cudnn.h file as 1e-5. This threshold is now lowered to 0.0 to allow a wider range of epsilon values. However, users should still choose the epsilon value carefully, since too small a value may cause batch normalization to overflow when the input data's standard deviation is close to 0.
- Some grouped convolutions (particularly those used in depthwise-separable convolutions) may return INTERNAL_ERROR if they have all inputs/outputs as NHWC-packed and do not match one of the following criteria (a validation sketch follows the list):
- filter_height = 1, filter_width = 1, vertical_conv_stride = 1, horizontal_conv_stride = 1
- filter_height = 3, filter_width = 3, vertical_conv_stride = 1, horizontal_conv_stride = 1
- filter_height = 3, filter_width = 3, vertical_conv_stride = 2, horizontal_conv_stride = 2
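A sketch of those criteria as a validation helper (the function name is hypothetical):
```c
#include <stdbool.h>

/* Sketch: check whether an NHWC-packed grouped convolution matches one of
 * the supported filter/stride combinations listed above. */
bool grouped_conv_nhwc_supported(int filter_h, int filter_w,
                                 int stride_v, int stride_h)
{
    return (filter_h == 1 && filter_w == 1 && stride_v == 1 && stride_h == 1) ||
           (filter_h == 3 && filter_w == 3 && stride_v == 1 && stride_h == 1) ||
           (filter_h == 3 && filter_w == 3 && stride_v == 2 && stride_h == 2);
}
```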
Known Issues
The following issues and limitations exist in this release:
- The RNN persist-static algorithm returns incorrect results for GRU problems in backward mode when the hidden size is greater than 1024. Because of this, the RNN persist-static algorithm is disabled in cuDNN 7.5.0. Users with such GRU problems are advised to use the standard or persist-dynamic RNN algorithms. See cudnnRNNAlgo_t(). This note applies to all previous cuDNN 7 releases.
- The function cudnnConvolutionBackwardFilter(), when used with CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1, returns the error "Uninitialized __global__ memory read of size 4".