cuDNN Release Notes v7.0.1

cuDNN v7.0.1 is the first release to support the Volta GPU architecture. In addition, cuDNN v7.0.1 brings new layers, grouped convolutions, and improved convolution find as error query mechanism.

Key Features and Enhancements

This cuDNN release includes the following key features and enhancements.

Tensor Cores
Version 7.0.1 of cuDNN is the first to support the Tensor Core operations in its implementation. Tensor Cores provide highly optimized matrix multiplication building blocks that do not have an equivalent numerical behavior in the traditional instructions, therefore, its numerical behavior is slightly different.

cudnnSetConvolutionMathType, cudnnSetRNNMatrixMathType, and cudnnMathType_t
The cudnnSetConvolutionMathType and cudnnSetRNNMatrixMathType functions enable you to choose whether or not to use Tensor Core operations in the convolution and RNN layers respectively by setting the math mode to either CUDNN_TENSOR_OP_MATH or CUDNN_DEFAULT_MATH.

Tensor Core operations perform parallel floating point accumulation of multiple floating point products.

Setting the math mode to CUDNN_TENSOR_OP_MATH indicates that the library will use Tensor Core operations.

The default is CUDNN_DEFAULT_MATH. This default indicates that the Tensor Core operations will be avoided by the library. The default mode is a serialized operation whereas, the Tensor Core is a parallelized operation, therefore, the two might result in slightly different numerical results due to the different sequencing of operations.
Note: The library falls back to the default math mode when Tensor Core operations are not supported or not permitted.

cudnnSetConvolutionGroupCount
A new interface that allows applications to perform convolution groups in the convolution layers in a single API call.

cudnnCTCLoss
cudnnCTCLoss provides a GPU implementation of the Connectionist Temporal Classification (CTC) loss function for RNNs. The CTC loss function is used for phoneme recognition in speech and handwriting recognition.

CUDNN_BATCHNORM_SPATIAL_PERSISTENT
The CUDNN_BATCHNORM_SPATIAL_PERSISTENT function is a new batch normalization mode for cudnnBatchNormalizationForwardTraining and cudnnBatchNormalizationBackward. This mode is similar to CUDNN_BATCHNORM_SPATIAL, however, it can be faster for some tasks.

cudnnQueryRuntimeError
The cudnnQueryRuntimeError function reports error codes written by GPU kernels when executing cudnnBatchNormalizationForwardTraining and cudnnBatchNormalizationBackward with the CUDNN_BATCHNORM_SPATIAL_PERSISTENT mode.

cudnnGetConvolutionForwardAlgorithm_v7
This new API returns all algorithms sorted by expected performance (using internal heuristics). These algorithms are output similarly to cudnnFindConvolutionForwardAlgorithm.

cudnnGetConvolutionBackwardDataAlgorithm_v7
This new API returns all algorithms sorted by expected performance (using internal heuristics). These algorithms are output similarly to cudnnFindConvolutionBackwardAlgorithm.

cudnnGetConvolutionBackwardFilterAlgorithm_v7
This new API returns all algorithms sorted by expected performance (using internal heuristics). These algorithms are output similarly to cudnnFindConvolutionBackwardFilterAlgorithm.

CUDNN_REDUCE_TENSOR_MUL_NO_ZEROS
The MUL_NO_ZEROS function is a multiplication reduction that ignores zeros in the data.

CUDNN_OP_TENSOR_NOT
The OP_TENSOR_NOT function is a unary operation that takes the negative of (alpha*A).

cudnnGetDropoutDescriptor
The cudnnGetDropoutDescriptor function allows applications to get dropout values.

Using cuDNN v7.0.1

Ensure you are familiar with the following notes when using this release.
  • Multi-threading behavior has been modified. Multi-threading is allowed only when using different cuDNN handles in different threads.
  • In cudnnConvolutionBackwardFilter, dilated convolution did not support cases where the product of all filter dimensions was odd for half precision floating point. These are now supported by CUDNN_CONVOLUTION_BWD_FILTER_ALGO1.
  • Fixed bug that produced a silent computation error for when a batch size was larger than 65536 for CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM.
  • In getConvolutionForwardAlgorithm, an error was not correctly reported in v5 when the output size was larger than expected. In v6 the CUDNN_STATUS_NOT_SUPPORTED, error message displayed. In v7, this error is modified to CUDNN_STATUS_BAD_PARAM.
  • In cudnnConvolutionBackwardFilter, cuDNN now runs some exceptional cases correctly where it previously erroneously returned CUDNN_STATUS_NOT_SUPPORTED. This impacted the algorithms CUDNN_CONVOLUTION_BWD_FILTER_ALGO0 and CUDNN_CONVOLUTION_BWD_FILTER_ALGO3.

Deprecated Features

The following routines have been removed:
  • cudnnSetConvolution2dDescriptor_v4
  • cudnnSetConvolution2dDescriptor_v5
  • cudnnGetConvolution2dDescriptor_v4
  • cudnnGetConvolution2dDescriptor_v5
Note: Only the non-suffixed versions of these routines remain.
The following routines have been created and have the same API prototype as their non-suffixed equivalent from cuDNN v6:
  • cudnnSetRNNDescriptor_v5 - The non-suffixed version of the routines in cuDNN v7.0.1 are now mapped to their _v6 equivalent.
    Attention: It is strongly advised to use the non-suffixed version as the _v5 and _v6 routines will be removed in the next cuDNN release.
  • cudnnGetConvolutionForwardAlgorithm, cudnnGetConvolutionBackwardDataAlgorithm, and cudnnGetConvolutionBackwardFilterAlgorithm - A _v7 version of this routine has been created. For more information, see the Backward compatibility and deprecation policy chapter of the cuDNN documentation for details.

Known Issues

  • cuDNN pooling backwards fails for pooling window size > 256.