NVIDIA Deep Learning Performance Documentation - Last updated April 1, 2020

NVIDIA Deep Learning Performance


Getting Started With Deep Learning Performance
This is the landing page for our deep learning performance documentation. This page gives a few broad recommendations that apply to most deep learning operations, and links to the other guides in the documentation with a short explanation of each guide's content and how these pages fit together.

Training


Training With Mixed Precision
This document introduces the concepts of mixed precision and automatic mixed precision, explains how to optimize with Tensor Cores, and looks at how each framework applies mixed precision to deep neural network training.
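A core ingredient of mixed precision training is loss (gradient) scaling: FP16 cannot represent very small gradient values, so they are scaled up before the FP16 cast and scaled back down in FP32. The sketch below illustrates the underflow problem and the fix using NumPy; it is a minimal numeric illustration, not the framework implementation (the scale value 1024 is an arbitrary example).

```python
import numpy as np

# IEEE half precision cannot represent magnitudes below ~6e-8, so a tiny
# FP32 gradient underflows to zero when cast to FP16.
grad = np.float32(1e-8)
assert np.float16(grad) == 0.0  # underflow: gradient information is lost

# Loss scaling: multiply by a large constant before the FP16 cast,
# then divide back out in FP32 afterwards.
scale = np.float32(1024.0)
scaled_fp16 = np.float16(grad * scale)      # 1.024e-5 is representable in FP16
recovered = np.float32(scaled_fp16) / scale  # close to the original 1e-8
```

In frameworks with automatic mixed precision, this scale factor is chosen and adjusted dynamically rather than fixed by hand.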

Optimizing Performance


Fully-Connected Layers User Guide
This guide provides tips for improving the performance of fully-connected (or linear) layers and an example of the impact of parameter choice with layers in the Transformer network.
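One broad recommendation from that guide is that fully-connected layer dimensions that are multiples of 8 (for FP16 data) map most efficiently onto Tensor Cores. A small hypothetical helper for rounding a layer size up to such a multiple might look like this (the function name and the choice to pad rather than resize are illustrative assumptions):

```python
def pad_to_multiple(n, multiple=8):
    # Round a layer dimension up to the next multiple of `multiple`
    # (8 is the common alignment for FP16 Tensor Core operations).
    return ((n + multiple - 1) // multiple) * multiple

aligned = pad_to_multiple(1001)    # 1001 -> 1008
unchanged = pad_to_multiple(1024)  # already aligned -> 1024
```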
Convolutional Layers User Guide
This guide provides tips for improving the performance of convolutional layers, with details on the impact of parameters including batch size, input and filter dimensions, stride, and dilation.
Recurrent Layers User Guide
This guide provides tips for improving the performance of recurrent layers and example use cases for persistence with layers in the GNMT system.
Memory-Limited Layers User Guide
This guide describes the performance of memory-limited layers including batch normalization, activations, and pooling, and provides tips for understanding and reducing the time spent on these layers within a network.

Performance Background


GPU Performance Background User Guide
This guide provides background on the structure of a GPU, how operations are executed, and common limitations with deep learning operations.
Matrix Multiplication Background User Guide
This guide provides background on matrix multiplications and their use in many deep learning operations; the trends described here form the basis of performance trends in fully-connected, convolutional, and recurrent layers, among others.
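As a rough illustration of the trend these background guides describe, consider the arithmetic intensity of a general matrix multiplication C (M x N) = A (M x K) x B (K x N): it performs 2*M*N*K floating-point operations while moving on the order of M*K + K*N + M*N elements. The sketch below computes this ratio under the simplifying assumption that each matrix is read or written exactly once (the function name is illustrative):

```python
def arithmetic_intensity(M, N, K, bytes_per_elem=2):
    # FLOPs for a GEMM: one multiply and one add per (m, n, k) triple.
    flops = 2 * M * N * K
    # Bytes moved, assuming each of A, B, and C crosses memory once
    # (bytes_per_elem=2 corresponds to FP16).
    bytes_moved = bytes_per_elem * (M * K + K * N + M * N)
    return flops / bytes_moved

large_gemm = arithmetic_intensity(4096, 4096, 4096)  # high ratio: math-limited
thin_gemm = arithmetic_intensity(1, 4096, 4096)      # ratio near 1: memory-limited
```

A large square GEMM reuses each loaded element many times and tends to be limited by math throughput, while a batch-1 fully-connected layer reuses almost nothing and tends to be limited by memory bandwidth; this is the distinction the GPU Performance Background guide develops.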