cuDNN Release Notes v7.1.1

Key Features and Enhancements

The following enhancements have been added to this release:

  • Added the new APIs cudnnSetRNNProjectionLayers and cudnnGetRNNProjectionLayers to support a projection layer for the RNN LSTM cell. In this release, only the inference use case is supported. Bi-directional RNNs and the forward and backward passes for training are not supported in 7.1.1, but will be supported in the upcoming 7.1.2 release without API changes. For all unsupported cases in this release, CUDNN_STATUS_NOT_SUPPORTED is returned when the projection layer is set and the RNN is called.
  • The cudnnGetRNNLinLayerMatrixParams() function was enhanced and a bug was fixed without modifying its prototype. Specifically:
    • The cudnnGetRNNLinLayerMatrixParams() function was updated to support the RNN projection feature. An extra linLayerID value of 8 can be used to retrieve the address and the size of the “recurrent” projection weight matrix when "mode" in cudnnSetRNNDescriptor() is configured to CUDNN_LSTM and the recurrent projection is enabled via cudnnSetRNNProjectionLayers().
    • Instead of reporting the total number of elements in each weight matrix in the “linLayerMatDesc” filter descriptor, the cudnnGetRNNLinLayerMatrixParams() function returns the matrix size as two dimensions: rows and columns. This allows the user to easily print and initialize RNN weight matrices. Elements in each weight matrix are arranged in row-major order. For historical reasons, the minimum number of dimensions in the filter descriptor is three. In previous versions of the cuDNN library, cudnnGetRNNLinLayerMatrixParams() returned the total number of weights as follows: filterDimA[0]=total_size, filterDimA[1]=1, filterDimA[2]=1. In v7.1.1, the format was changed to: filterDimA[0]=1, filterDimA[1]=rows, filterDimA[2]=columns. In both cases, the "format" field of the filter descriptor should be ignored when retrieved by cudnnGetFilterNdDescriptor().
    • A bug in cudnnGetRNNLinLayerMatrixParams() was fixed to return a zeroed filter descriptor when the corresponding weight matrix does not exist. This occurs, for example, for linLayerID values of 0-3 when the first RNN layer is configured to exclude matrix multiplications applied to RNN input data (inputMode=CUDNN_SKIP_INPUT in cudnnSetRNNDescriptor() specifies implicit, fixed identity weight matrices for RNN input). Such cases in previous versions of the cuDNN library caused cudnnGetRNNLinLayerMatrixParams() to return corrupted filter descriptors with some entries from the previous call. A workaround was to create a new filter descriptor for every invocation of cudnnGetRNNLinLayerMatrixParams().
  • The cudnnGetRNNLinLayerBiasParams() function was updated to report the bias column vectors in "linLayerBiasDesc" in the same format as cudnnGetRNNLinLayerMatrixParams(). In previous versions of the cuDNN library, cudnnGetRNNLinLayerBiasParams() returned the total number of adjustable bias parameters as follows: filterDimA[0]=total_size, filterDimA[1]=1, filterDimA[2]=1. In v7.1.1, the format was changed to: filterDimA[0]=1, filterDimA[1]=rows, filterDimA[2]=1 (a bias is a single column vector). In both cases, the "format" field of the filter descriptor should be ignored when retrieved by cudnnGetFilterNdDescriptor(). The recurrent projection GEMM does not have a bias, so the range of valid inputs for the "linLayerID" argument remains the same.
  • Added support for use of Tensor Cores with CUDNN_RNN_ALGO_PERSIST_STATIC. This requires cuDNN v7.1 built with CUDA 9.1 and driver r387 or higher. It does not work with CUDA 9.0 and the r384 driver.
  • Added an RNN search API that allows the application to provide an RNN descriptor and get a list of possible algorithm choices with performance and memory usage, so that applications can choose between different implementations. For more information, refer to the documentation of: cudnnFindRNNForwardInferenceAlgorithmEx, cudnnFindRNNForwardTrainingAlgorithmEx, cudnnFindRNNBackwardDataAlgorithmEx, and cudnnFindRNNBackwardWeightsAlgorithmEx. In this release, the search operates only on the STANDARD algorithm and does not support the PERSISTENT RNN algorithms.
  • Added uint8 support for the input data of cudnnConvolutionBiasActivationForward and cudnnConvolutionForward. Currently, support is limited to Volta (sm_70) and later architectures. Support for older architectures will be added gradually in upcoming releases.
  • Support for CUDNN_ACTIVATION_IDENTITY has been added to cudnnConvolutionBiasActivationForward. This allows users to perform convolution and bias addition without an activation.
  • All API functions now support logging. Users can enable logging by setting the environment variables CUDNN_LOGINFO_DBG=1 and CUDNN_LOGDEST_DBG=<option>, where <option> (the output destination of the log) can be "stdout", "stderr", or a file path. Users may also use the new cudnnSetCallback/cudnnGetCallback functions to install a customized callback function. Log files can be attached to reported bugs or shared with us for analysis and future optimizations through partners.nvidia.com.
  • Improved performance of 3D convolution on Volta architecture.
  • The following algo-related functions have been added for this release: cudnnGetAlgorithmSpaceSize, cudnnSaveAlgorithm, cudnnRestoreAlgorithm, cudnnCreateAlgorithmDescriptor, cudnnSetAlgorithmDescriptor, cudnnGetAlgorithmDescriptor, cudnnDestroyAlgorithmDescriptor, cudnnCreateAlgorithmPerformance, cudnnSetAlgorithmPerformance, cudnnGetAlgorithmPerformance, cudnnDestroyAlgorithmPerformance.
  • All algorithms for convolutions now support groupCount > 1. This includes cudnnConvolutionForward(), cudnnConvolutionBackwardData(), and cudnnConvolutionBackwardFilter().
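
To illustrate the filter-descriptor format change described above for cudnnGetRNNLinLayerMatrixParams(), the sketch below contrasts the old and new filterDimA layouts using plain C arrays (no actual cuDNN calls; the 512 x 256 matrix size is a hypothetical example) and shows how row-major storage maps a (row, column) pair to a linear offset:

```c
#include <stdio.h>

int main(void) {
    /* Hypothetical LSTM gate weight matrix: 512 rows x 256 columns. */
    int rows = 512, cols = 256;

    /* cuDNN v7.0.x and earlier: total element count in dimension 0. */
    int oldDimA[3] = { rows * cols, 1, 1 };

    /* cuDNN v7.1.1: rows and columns reported as separate dimensions. */
    int newDimA[3] = { 1, rows, cols };

    /* Both layouts describe the same number of elements. */
    int oldTotal = oldDimA[0] * oldDimA[1] * oldDimA[2];
    int newTotal = newDimA[0] * newDimA[1] * newDimA[2];
    printf("old total = %d, new total = %d\n", oldTotal, newTotal);

    /* Elements are stored in row-major order, so element (r, c)
       lives at linear offset r * cols + c in the weight buffer. */
    int r = 3, c = 7;
    printf("offset of (%d,%d) = %d\n", r, c, r * newDimA[2] + c);
    return 0;
}
```

With the new layout, a caller can read the row and column counts directly from filterDimA[1] and filterDimA[2] instead of only knowing the total size.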
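The logging controls described above are plain environment variables, so they can be set before launching any cuDNN application; a minimal sketch (the log file name and application name here are only examples):

```shell
# Enable informational logging for all cuDNN API calls.
export CUDNN_LOGINFO_DBG=1
# Direct the log to a file; "stdout" or "stderr" also work.
export CUDNN_LOGDEST_DBG=cudnn_trace.log
# Then run the application as usual, e.g.:
# ./my_cudnn_app
```

Unsetting CUDNN_LOGINFO_DBG (or setting it to 0) disables logging again.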

Known Issues

The following are known issues in this release:

  • The RNN search API is restricted to the STANDARD algorithm.
  • The newly added projection layer is supported only for inference and only for uni-directional RNN cells.
  • uint8 input for convolution is restricted to Volta and later architectures.
  • cudnnGet may pick a slow algorithm that does not use Tensor Cores on Volta when inputs are FP16, even when a Tensor Core algorithm is available.
  • There may be a small performance regression on multi-layer RNNs using the STANDARD algorithm with Tensor Core math in this release compared to 7.0.5.

Fixed Issues

The following issues have been fixed in this release:

  • 3D convolution performance improvements for Volta.
  • Added support for Algorithm 0 data gradients to cover cases previously not supported.
  • Removed the requirement for a dropout descriptor in RNN inference. Previously, the application had to supply a non-NULL pointer for the dropout descriptor even though it was not used.
  • Use of CUDNN_TENSOR_NCHW_VECT_C with non-zero padding resulted in a return status of CUDNN_STATUS_INTERNAL_ERROR. This issue is now fixed.