NVIDIA Deep Learning Performance Documentation - Last updated March 10, 2022

NVIDIA Deep Learning Performance


Get Started With Deep Learning Performance
This is the landing page for our deep learning performance documentation. It provides recommendations that apply to most deep learning operations. It also links to the other performance documents, briefly explains each one, and describes how these pages fit together.

Training


Train With Mixed Precision
Mixed precision methods combine the use of different numerical formats in one computational workload. This document describes the application of mixed precision to deep neural network training.

Recommendation Systems


Best Practices for Building and Deploying Recommender Systems
This document describes the best practices for building and deploying large-scale recommender systems using NVIDIA GPUs. These practices are the culmination of years of research and development in GPU-accelerated tools for recommender systems, as well as building recommender systems for our in-house products and top-performing solutions for international recommendation systems competitions.

Optimizing Performance


Linear/Fully-Connected Layers User's Guide
This guide provides tips for improving the performance of fully-connected (or linear) layers. It also provides an example of the impact of parameter choice, using layers from the Transformer network.
Convolutional Layers User's Guide
This guide provides tips for improving the performance of convolutional layers. It also provides details on the impact of parameters, including batch size, input and filter dimensions, stride, and dilation.
Recurrent Layers User's Guide
This guide provides tips for improving the performance of recurrent layers. It also provides examples of use cases for persistence, using layers from the GNMT system.
Memory-Limited Layers User's Guide
This guide describes the performance of memory-limited layers, including batch normalization, activations, and pooling. It also provides tips for understanding and reducing the time spent on these layers within a network.

Performance Background


GPU Performance Background User's Guide
This guide provides background on the structure of a GPU, how operations are executed, and common limitations with deep learning operations.
Matrix Multiplication Background User's Guide
This guide describes matrix multiplications and their use in many deep learning operations. The trends described here form the basis of performance trends in fully-connected, convolutional, and recurrent layers, among others.