NVIDIA cuDNN

The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. It provides highly tuned implementations of operations arising frequently in deep neural network (DNN) applications:

Scaled dot-product attention
Convolution, including cross-correlation
Matrix multiplication
Normalizations, softmax, and pooling
Arithmetic, mathematical, relational, and logical pointwise operations

In addition to high-performance implementations of individual operations, cuDNN supports a flexible set of multi-operation fusion patterns for further optimization. The goal is to achieve the best available performance on NVIDIA GPUs for important deep learning use cases.
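To make the idea of fusion concrete, the sketch below compares an unfused and a fused evaluation of a matrix multiplication followed by a bias add and a ReLU. This chain has the general shape of many fusion patterns: a main operation followed by pointwise operations. It is plain Python, not cuDNN code. The functions `matmul`, `unfused`, and `fused`, and the intermediate-element count, are hypothetical and exist only for illustration. Which patterns cuDNN can actually fuse is defined by the library's documentation, not by this example.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive matrix product a @ b (illustrative only)."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def unfused(x: Matrix, w: Matrix, bias: List[float]):
    """Three separate passes, each producing a full-size tensor."""
    t1 = matmul(x, w)                                              # pass 1: matmul
    t2 = [[v + bias[j] for j, v in enumerate(row)] for row in t1]  # pass 2: bias add
    y = [[max(v, 0.0) for v in row] for row in t2]                 # pass 3: ReLU
    intermediates = 2 * len(t1) * len(t1[0])                       # elements held in t1 and t2
    return y, intermediates


def fused(x: Matrix, w: Matrix, bias: List[float]) -> Matrix:
    """One pass: bias and ReLU are applied to each accumulator before it is stored."""
    n, k, m = len(x), len(w), len(w[0])
    y = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += x[i][p] * w[p][j]
            y[i][j] = max(acc + bias[j], 0.0)  # pointwise epilogue folded into the single store
    return y


x = [[1.0, -2.0], [3.0, 4.0]]
w = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.5, -1.0]
y_ref, n_intermediate = unfused(x, w, bias)
print(y_ref)                                       # [[1.5, 0.0], [3.5, 3.0]]
print(fused(x, w, bias) == y_ref, n_intermediate)  # True 8
```

Both paths produce the same values. On a GPU, the benefit of fusion comes from not writing the intermediates to memory and reading them back, and from launching fewer kernels. This CPU sketch only counts those intermediates; it does not measure any speedup.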

In cuDNN, both single-operation and multi-operation computations are expressed as operation graphs. The following API layers are available for constructing these graphs:

Python frontend API
C++ frontend API
C backend API
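The following is a minimal, library-free sketch of the operation-graph idea, assuming scalar tensors for brevity. Operations are nodes that consume and produce named tensors, and a single-operation computation is simply a graph with one node. The `Op` and `Graph` classes and their `add`/`execute` methods are hypothetical and are not part of any cuDNN API. In the real library, a graph is validated and turned into an execution plan before it runs; this toy just evaluates nodes in dependency order.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Op:
    """One node of a toy operation graph (hypothetical, not a cuDNN type)."""
    name: str
    fn: Callable[..., float]
    inputs: List[str]
    output: str


@dataclass
class Graph:
    """Toy graph: ops connected through named scalar tensors."""
    ops: List[Op] = field(default_factory=list)

    def add(self, name: str, fn: Callable[..., float],
            inputs: List[str], output: str) -> str:
        self.ops.append(Op(name, fn, inputs, output))
        return output  # return the output tensor name so later ops can consume it

    def execute(self, feeds: Dict[str, float]) -> Dict[str, float]:
        values = dict(feeds)  # graph inputs are bound first
        pending = list(self.ops)
        while pending:
            # An op is ready once every tensor it reads has a value.
            ready = [op for op in pending if all(t in values for t in op.inputs)]
            if not ready:
                raise ValueError("cycle or unbound input in graph")
            for op in ready:
                values[op.output] = op.fn(*(values[t] for t in op.inputs))
                pending.remove(op)
        return values


# A single-operation computation is a graph with one node.
single = Graph()
single.add("mul", lambda a, b: a * b, ["X", "W"], "Y")
print(single.execute({"X": 3.0, "W": 2.0})["Y"])  # 6.0

# A multi-operation computation chains nodes through intermediate tensors.
multi = Graph()
t = multi.add("mul", lambda a, b: a * b, ["X", "W"], "T")
u = multi.add("bias", lambda a, b: a + b, [t, "B"], "U")
multi.add("relu", lambda a: max(a, 0.0), [u], "Y")
print(multi.execute({"X": 3.0, "W": -2.0, "B": 1.0})["Y"])  # 0.0
```

In this sketch, one representation covers both cases. A library that receives the whole `multi` graph, rather than three separate calls, can see the complete chain and decide whether to run it as one fused kernel.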

The NVIDIA cuDNN frontend API provides a simplified programming model that is sufficient for most use cases.

Use the NVIDIA cuDNN backend API only if you need the legacy fixed-function routines, which are not graph-based and are not exposed by the frontend API layers, or if you need a C-only interface.

Figure: Block diagram showing the relationships between the cuDNN frontend and backend API layers and the intended audience for each layer.