NVIDIA Deep Learning Performance Documentation - Last updated April 1, 2020

NVIDIA Deep Learning Performance


Getting Started With Deep Learning Performance
This is the landing page for our deep learning performance documentation. This page gives a few broad recommendations that apply to most deep learning operations, and links to the other guides in the documentation with a short explanation of each guide's content and how these pages fit together.

Training


Training With Mixed Precision
This document introduces the concepts of mixed precision and automatic mixed precision, explains how to optimize with Tensor Cores, and looks at how each framework applies mixed precision to deep neural network training.
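A core ingredient of mixed precision training is loss (gradient) scaling: FP16 cannot represent very small gradient values, so they are scaled up before the FP16 cast and scaled back down in FP32. The sketch below illustrates the underflow problem and the fix using NumPy; it is a minimal numeric illustration, not the framework implementation (the scale value 1024 is an arbitrary example).

```python
import numpy as np

# IEEE half precision cannot represent magnitudes below ~6e-8, so a tiny
# FP32 gradient underflows to zero when cast to FP16.
grad = np.float32(1e-8)
assert np.float16(grad) == 0.0  # underflow: gradient information is lost

# Loss scaling: multiply by a large constant before the FP16 cast,
# then divide back out in FP32 afterwards.
scale = np.float32(1024.0)
scaled_fp16 = np.float16(grad * scale)      # 1.024e-5 is representable in FP16
recovered = np.float32(scaled_fp16) / scale  # close to the original 1e-8
```

In frameworks with automatic mixed precision, this scale factor is chosen and adjusted dynamically rather than fixed by hand.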

Optimizing Performance


Fully-Connected Layers User Guide
This guide provides tips for improving the performance of fully-connected (or linear) layers and an example of the impact of parameter choice with layers in the Transformer network.
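One broad recommendation from that guide is that fully-connected layer dimensions that are multiples of 8 (for FP16 data) map most efficiently onto Tensor Cores. A small hypothetical helper for rounding a layer size up to such a multiple might look like this (the function name and the choice to pad rather than resize are illustrative assumptions):

```python
def pad_to_multiple(n, multiple=8):
    # Round a layer dimension up to the next multiple of `multiple`
    # (8 is the common alignment for FP16 Tensor Core operations).
    return ((n + multiple - 1) // multiple) * multiple

aligned = pad_to_multiple(1001)    # 1001 -> 1008
unchanged = pad_to_multiple(1024)  # already aligned -> 1024
```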
Convolutional Layers User Guide
This guide provides tips for improving the performance of convolutional layers, with details on the impact of parameters including batch size, input and filter dimensions, stride, and dilation.
Recurrent Layers User Guide
This guide provides tips for improving the performance of recurrent layers and example use cases for persistence with layers in the GNMT system.
Memory-Limited Layers User Guide
This guide describes the performance of memory-limited layers including batch normalization, activations, and pooling, and provides tips for understanding and reducing the time spent on these layers within a network.

Performance Background


GPU Performance Background User Guide
This guide provides background on the structure of a GPU, how operations are executed, and common limitations with deep learning operations.
Matrix Multiplication Background User Guide
This guide provides background on matrix multiplications and their use in many deep learning operations; the trends described here form the basis of performance trends in fully-connected, convolutional, and recurrent layers, among others.
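As a rough illustration of the trend these background guides describe, consider the arithmetic intensity of a general matrix multiplication C (M x N) = A (M x K) x B (K x N): it performs 2*M*N*K floating-point operations while moving on the order of M*K + K*N + M*N elements. The sketch below computes this ratio under the simplifying assumption that each matrix is read or written exactly once (the function name is illustrative):

```python
def arithmetic_intensity(M, N, K, bytes_per_elem=2):
    # FLOPs for a GEMM: one multiply and one add per (m, n, k) triple.
    flops = 2 * M * N * K
    # Bytes moved, assuming each of A, B, and C crosses memory once
    # (bytes_per_elem=2 corresponds to FP16).
    bytes_moved = bytes_per_elem * (M * K + K * N + M * N)
    return flops / bytes_moved

large_gemm = arithmetic_intensity(4096, 4096, 4096)  # high ratio: math-limited
thin_gemm = arithmetic_intensity(1, 4096, 4096)      # ratio near 1: memory-limited
```

A large square GEMM reuses each loaded element many times and tends to be limited by math throughput, while a batch-1 fully-connected layer reuses almost nothing and tends to be limited by memory bandwidth; this is the distinction the GPU Performance Background guide develops.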