Overview

NVIDIA cuDNN provides highly tuned implementations of operations arising frequently in DNN applications:

Convolution forward and backward, including cross-correlation

Matrix multiplication

Pooling forward and backward

Softmax forward and backward

Neuron activations forward and backward: relu, tanh, sigmoid, elu, gelu, softplus, swish

Arithmetic, mathematical, relational, and logical pointwise operations (including various flavors of forward and backward neuron activations)

Tensor transformation functions

LRN, LCN, batch normalization, instance normalization, and layer normalization forward and backward

Beyond just providing performant implementations of individual operations, the library also supports a flexible set of multi-operation fusion patterns for further optimization. The goal is to achieve the best available performance on NVIDIA GPUs for important deep learning use cases.

In cuDNN version 7 and older, the API was designed to support a fixed set of operations and fusion patterns. We informally call this the “legacy API”. Starting in cuDNN version 8, to address the quickly expanding set of popular fusion patterns, we added a Graph API, which allows the user to express a computation by defining an operation graph. This offers better flexibility versus the legacy API, and for most use cases, is the recommended way to use cuDNN.

Note that while the cuDNN library exposes a C API, we also provide an open source C++ layer which wraps the C API and is considered more convenient for most users. It is, however, limited to just the graph API, and does not support the legacy API.