Backend API Overview

The NVIDIA CUDA Deep Neural Network (cuDNN) library offers a context-based API that allows for easy multithreading and (optional) interoperability with CUDA streams. This API Reference lists the data types and API functions per sub-library.

The cuDNN version 9 library is reorganized into several sub-libraries. This structure separates three things: the legacy functionality (the imperative API from cuDNN version 7 and older, with its fixed set of operations and fusion patterns), the graph API (a declarative API introduced in cuDNN version 8), and the engine implementations. The graph API now resides in a single library, libcudnn_graph.so.

We recommend that users make API calls through the graph API. The older shim layer (libcudnn.so) and the legacy libraries (libcudnn_cnn.so, libcudnn_ops.so, and libcudnn_adv.so) can still be used directly. The engine libraries (libcudnn_engines_precompiled.so, libcudnn_heuristic.so, and libcudnn_engines_runtime_compiled.so) contain internal functionality needed to support these sub-libraries. The split is designed to give users more flexibility in which libraries they load. The mechanism for configuring this will be introduced in a later version of cuDNN.

Dynamic Library Dependency Structure of cuDNN on Linux

The API is split into the following libraries, each having corresponding header files in the include directory:

cudnn_graph

  • This entity contains the API functions for creating and destroying cuDNN contexts, a list of valid cuDNN backend descriptor types, a list of valid attributes, a subset of valid attribute values, and a full description of each backend descriptor type and its attributes. It is also responsible for loading the correct engine libraries.

  • cudnn_graph loads the engine and heuristic libraries (cudnn_engines_precompiled, cudnn_engines_runtime_compiled, and cudnn_heuristic) through dlopen. These libraries provide the implementations behind the cudnn_graph API. The engines are separated from each other so that a future version of cuDNN can let users load only the functionality they need, which saves memory and disk space.

cudnn_ops

  • This entity contains the legacy API functions for tensor descriptor management, tensor utility functions, and common machine learning algorithms such as batch normalization, softmax, and dropout. This library depends on cudnn_graph. It also depends indirectly on cudnn_engines_precompiled, because the kernels for some of its APIs reside in that library.

cudnn_cnn

  • This entity contains all legacy API functions related to convolutional neural networks. This library depends on cudnn_ops and cudnn_graph. The cudnn_engines_precompiled library must also be present for cudnn_cnn to load correctly.

cudnn_adv

  • This entity contains all other features and algorithms, including RNNs, CTC loss, and multihead attention. This library depends on cudnn_ops and cudnn_graph.

Added, Deprecated, and Removed APIs

API Changes for cuDNN 9.2.0

API Changes for cuDNN 9.1.0

  • Added Enums:

    • CUDNN_NUMERICAL_NOTE_STRICT_NAN_PROP

    • CUDNN_POINTWISE_ATAN2

    • CUDNN_POINTWISE_CBRT

    • CUDNN_POINTWISE_CLZ

    • CUDNN_POINTWISE_IS_FINITE

    • CUDNN_POINTWISE_POPC

    • CUDNN_POINTWISE_ROUND_NEAREST_AFZ

    • CUDNN_POINTWISE_ROUND_NEAREST_EVEN

    • CUDNN_POINTWISE_SHIFT_LEFT

    • CUDNN_POINTWISE_SHIFT_RIGHT_ARITHMETIC

    • CUDNN_POINTWISE_SHIFT_RIGHT_LOGICAL

    • CUDNN_POINTWISE_XOR

    • CUDNN_POINTWISE_AND

    • CUDNN_POINTWISE_OR

    • CUDNN_POINTWISE_NOT

API Changes for cuDNN 9.0.0

Removed 9.0.0 APIs

Functions Deprecated in v8, Removed in v9

Replacement Function

  • cudnnCopyAlgorithmDescriptor()

  • cudnnCreateAlgorithmDescriptor()

  • cudnnCreateAlgorithmPerformance()

  • cudnnCreatePersistentRNNPlan()

  • cudnnDestroyPersistentRNNPlan()

  • cudnnSetPersistentRNNPlan()

cudnnBuildRNNDynamic()

  • cudnnDestroyAlgorithmDescriptor()

  • cudnnDestroyAlgorithmPerformance()

  • cudnnGetRNNBiasMode()

  • cudnnGetRNNDescriptor_v6()

  • cudnnGetRNNMatrixMathType()

  • cudnnGetRNNPaddingMode()

  • cudnnGetRNNProjectionLayers()

cudnnGetRNNDescriptor_v8()

  • cudnnFindRNNBackwardDataAlgorithmEx()

  • cudnnFindRNNBackwardWeightsAlgorithmEx()

  • cudnnFindRNNForwardInferenceAlgorithmEx()

  • cudnnFindRNNForwardTrainingAlgorithmEx()

  • cudnnGetAlgorithmDescriptor()

  • cudnnGetAlgorithmPerformance()

  • cudnnGetAlgorithmSpaceSize()

  • cudnnGetRNNBackwardDataAlgorithmMaxCount()

  • cudnnGetRNNBackwardWeightsAlgorithmMaxCount()

  • cudnnGetRNNLinLayerBiasParams()

  • cudnnGetRNNLinLayerMatrixParams()

cudnnGetRNNWeightParams()

cudnnGetRNNParamsSize()

cudnnGetRNNWeightSpaceSize()

  • cudnnGetRNNForwardInferenceAlgorithmMaxCount()

  • cudnnGetRNNForwardTrainingAlgorithmMaxCount()

  • cudnnGetRNNTrainingReserveSize()

  • cudnnGetRNNWorkspaceSize()

cudnnGetRNNTempSpaceSizes()

cudnnRestoreAlgorithm()

  • cudnnRNNBackwardData()

  • cudnnRNNBackwardDataEx()

cudnnRNNBackwardData_v8()

  • cudnnRNNBackwardWeights()

  • cudnnRNNBackwardWeightsEx()

  • cudnnRNNForwardInference()

  • cudnnRNNForwardInferenceEx()

  • cudnnRNNForwardTraining()

  • cudnnRNNForwardTrainingEx()

cudnnRNNForward()

cudnnRNNGetClip()

cudnnRNNGetClip_v9()

cudnnRNNSetClip()

cudnnRNNSetClip_v9()

  • cudnnSaveAlgorithm()

  • cudnnSetAlgorithmDescriptor()

  • cudnnSetAlgorithmPerformance()

  • cudnnSetRNNAlgorithmDescriptor()

  • cudnnSetRNNBiasMode()

  • cudnnSetRNNDescriptor_v6()

  • cudnnSetRNNMatrixMathType()

  • cudnnSetRNNPaddingMode()

  • cudnnSetRNNProjectionLayers()

cudnnSetRNNDescriptor_v8()

  • Removed Types:

    • cudnnRNNPaddingMode_t

    • cudnnAttnQueryMap_t