cuDNN v7.0.1 is the first release to support the Volta GPU architecture. In addition, cuDNN v7.0.1 brings new layers, grouped convolutions, improved convolution algorithm finding, and an error query mechanism.
Key Features and Enhancements
This cuDNN release includes the following key features and enhancements.
- Tensor Cores
- Version 7.0.1 of cuDNN is the first to support Tensor Core operations in its implementation. Tensor Cores provide highly optimized matrix multiplication building blocks that have no numerically equivalent counterpart among the traditional instructions, so their numerical behavior is slightly different.
- cudnnSetConvolutionMathType, cudnnSetRNNMatrixMathType, and cudnnMathType_t
- The cudnnSetConvolutionMathType and cudnnSetRNNMatrixMathType functions let you choose whether to use Tensor Core operations in the convolution and RNN layers, respectively, by setting the math mode to either CUDNN_TENSOR_OP_MATH or CUDNN_DEFAULT_MATH.
  Tensor Core operations perform parallel floating point accumulation of multiple floating point products. Setting the math mode to CUDNN_TENSOR_OP_MATH indicates that the library will use Tensor Core operations.
  The default is CUDNN_DEFAULT_MATH, which indicates that the library will avoid Tensor Core operations. The default mode is a serialized operation, whereas the Tensor Core operation is parallelized, so the two might produce slightly different numerical results because of the different sequencing of operations.
  Note: The library falls back to the default math mode when Tensor Core operations are not supported or not permitted.
- cudnnSetConvolutionGroupCount
- A new interface that allows applications to perform grouped convolutions in the convolution layers in a single API call.
- cudnnCTCLoss
- cudnnCTCLoss provides a GPU implementation of the Connectionist Temporal Classification (CTC) loss function for RNNs. The CTC loss function is used for phoneme recognition in speech and handwriting recognition.
- CUDNN_BATCHNORM_SPATIAL_PERSISTENT
- CUDNN_BATCHNORM_SPATIAL_PERSISTENT is a new batch normalization mode for cudnnBatchNormalizationForwardTraining and cudnnBatchNormalizationBackward. This mode is similar to CUDNN_BATCHNORM_SPATIAL; however, it can be faster for some tasks.
- cudnnQueryRuntimeError
- The cudnnQueryRuntimeError function reports error codes written by GPU kernels when executing cudnnBatchNormalizationForwardTraining and cudnnBatchNormalizationBackward with the CUDNN_BATCHNORM_SPATIAL_PERSISTENT mode.
- cudnnGetConvolutionForwardAlgorithm_v7
- This new API returns all algorithms sorted by expected performance (using internal heuristics). The algorithms are output in the same form as by cudnnFindConvolutionForwardAlgorithm.
- cudnnGetConvolutionBackwardDataAlgorithm_v7
- This new API returns all algorithms sorted by expected performance (using internal heuristics). The algorithms are output in the same form as by cudnnFindConvolutionBackwardDataAlgorithm.
- cudnnGetConvolutionBackwardFilterAlgorithm_v7
- This new API returns all algorithms sorted by expected performance (using internal heuristics). The algorithms are output in the same form as by cudnnFindConvolutionBackwardFilterAlgorithm.
- CUDNN_REDUCE_TENSOR_MUL_NO_ZEROS
- The MUL_NO_ZEROS reduction is a multiplication reduction that ignores zeros in the data.
- CUDNN_OP_TENSOR_NOT
- The OP_TENSOR_NOT operation is a unary operation that takes the negative of (alpha*A).
- cudnnGetDropoutDescriptor
- The cudnnGetDropoutDescriptor function allows applications to get dropout values.
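The math-mode entry in the list above notes that serialized and parallelized accumulation can give slightly different results because of the different sequencing of operations. The following is a minimal sketch of that effect in plain Python, not a model of how Tensor Cores actually schedule their work: it emulates float32 rounding with the standard `struct` module and compares a left-to-right sum with a tree-shaped (pairwise) sum. The helper names `f32`, `serial_sum`, and `pairwise_sum` and the input values are made up for illustration.

```python
import struct


def f32(x):
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def serial_sum(values):
    """Accumulate left to right, rounding to float32 after every addition."""
    acc = 0.0
    for v in values:
        acc = f32(acc + f32(v))
    return acc


def pairwise_sum(values):
    """Accumulate as a balanced tree, rounding to float32 after every addition."""
    vals = [f32(v) for v in values]
    if not vals:
        return 0.0
    while len(vals) > 1:
        # Add neighbouring pairs; an odd element out is carried to the next level.
        nxt = [f32(vals[i] + vals[i + 1]) for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:
            nxt.append(vals[-1])
        vals = nxt
    return vals[0]


# Illustrative "products" whose exact sum is 2.0.
products = [1e8, 1.0, -1e8, 1.0]
print(serial_sum(products))    # 1.0
print(pairwise_sum(products))  # 0.0
```

Both orderings are legitimate float32 computations, yet they round at different points and so disagree with each other (and, in this contrived case, with the exact sum). Real workloads usually show much smaller differences, but results should not be expected to match bit for bit across math modes.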
Using cuDNN v7.0.1
Ensure you are familiar with the following notes when using this release.
- Multi-threading behavior has been modified. Multi-threading is allowed only when using different cuDNN handles in different threads.
- In cudnnConvolutionBackwardFilter, dilated convolution did not support cases where the product of all filter dimensions was odd for half precision floating point. These cases are now supported by CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1.
- Fixed a bug that produced a silent computation error when the batch size was larger than 65536 for CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM.
- In cudnnGetConvolutionForwardAlgorithm, an error was not correctly reported in v5 when the output size was larger than expected. In v6, CUDNN_STATUS_NOT_SUPPORTED was returned. In v7, the error returned is CUDNN_STATUS_BAD_PARAM.
- In cudnnConvolutionBackwardFilter, cuDNN now runs some exceptional cases correctly where it previously erroneously returned CUDNN_STATUS_NOT_SUPPORTED. This impacted the algorithms CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0 and CUDNN_CONVOLUTION_BWD_FILTER_ALGO_3.
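The multi-threading note in the list above (different cuDNN handles in different threads) amounts to a one-handle-per-thread pattern. The sketch below shows that pattern in plain Python with thread-local storage. It does not call cuDNN: `FakeHandle`, `get_handle`, and `run_workers` are hypothetical stand-ins, and a real application would create the handle with cudnnCreate and release it with cudnnDestroy through whatever binding it uses.

```python
import threading


class FakeHandle:
    """Hypothetical stand-in for a cuDNN handle; not a real cuDNN binding."""


_tls = threading.local()  # each thread sees its own attributes on this object


def get_handle():
    """Return the calling thread's handle, creating it on first use."""
    handle = getattr(_tls, "handle", None)
    if handle is None:
        handle = FakeHandle()  # real code would create a cuDNN handle here
        _tls.handle = handle
    return handle


def run_workers(n):
    """Run n threads and collect the handle that each one obtained."""
    seen = [None] * n

    def work(i):
        seen[i] = get_handle()

    threads = [threading.Thread(target=work, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


handles = run_workers(4)
print(len({id(h) for h in handles}))  # 4: no handle is shared between threads
```

Keeping the handle in thread-local storage means no code path can accidentally pass one thread's handle to another, which is the situation the note above rules out.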
Deprecated Features
The following routines have been removed:
cudnnSetConvolution2dDescriptor_v4
cudnnSetConvolution2dDescriptor_v5
cudnnGetConvolution2dDescriptor_v4
cudnnGetConvolution2dDescriptor_v5
Only the non-suffixed versions of these routines remain.
The following routines have been created and have the same API prototype as their non-suffixed equivalent from cuDNN v6:
- cudnnSetRNNDescriptor_v5
- The non-suffixed versions of the routines in cuDNN v7.0.1 are now mapped to their _v6 equivalents.
  Attention: It is strongly advised to use the non-suffixed versions, as the _v5 and _v6 routines will be removed in the next cuDNN release.
- cudnnGetConvolutionForwardAlgorithm, cudnnGetConvolutionBackwardDataAlgorithm, and cudnnGetConvolutionBackwardFilterAlgorithm
- A _v7 version of each of these routines has been created. For more information, see the Backward compatibility and deprecation policy chapter of the cuDNN documentation.
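Because the _v7 routines return every algorithm sorted by expected performance, the caller chooses from that list. A minimal sketch of one plausible selection rule follows, in plain Python rather than against the cuDNN API: take the first entry that reports success and fits a workspace budget. The `AlgoPerf` record, its string fields, the `pick_algo` helper, and all values are illustrative assumptions that only loosely mirror the per-algorithm results the library returns.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class AlgoPerf:
    """Illustrative stand-in for one entry of the returned results list."""
    algo: str     # algorithm name (made-up strings in this sketch)
    status: str   # "SUCCESS" if the algorithm can handle the problem
    memory: int   # workspace size in bytes that the algorithm would need


def pick_algo(results: Sequence[AlgoPerf], workspace_limit: int) -> Optional[AlgoPerf]:
    """Return the first usable entry; results are assumed sorted best-first."""
    for r in results:
        if r.status == "SUCCESS" and r.memory <= workspace_limit:
            return r
    return None


# Made-up results, already ordered from fastest to slowest expected performance.
results = [
    AlgoPerf("ALGO_A", "SUCCESS", 512 * 1024 * 1024),
    AlgoPerf("ALGO_B", "NOT_SUPPORTED", 0),
    AlgoPerf("ALGO_C", "SUCCESS", 8 * 1024 * 1024),
]
choice = pick_algo(results, workspace_limit=64 * 1024 * 1024)
print(choice.algo if choice else None)  # ALGO_C
```

Since the list is already ordered, a single pass is enough, and the same list can be rescanned with a different budget without querying the library again.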
Known Issues
- cuDNN pooling backward fails for pooling window sizes greater than 256.