Deep Learning Performance

NVIDIA Deep Learning Performance

Deep learning differs from traditional machine learning techniques in that it can automatically learn representations from data such as images, video, or text, without hand-coded rules or human domain knowledge. Deep learning models have highly flexible architectures that learn directly from raw data, and their predictive accuracy can increase when they are given more data.

Deep learning is commonly used in computer vision, conversational AI, and recommendation system applications. Computer vision applications use deep learning to extract information from digital images and videos. Conversational AI applications help computers understand and communicate through natural language. Recommendation systems use images, language, and a user’s interests to offer meaningful and relevant search results and services.

Deep learning has led to many recent breakthroughs in AI, such as Google DeepMind’s AlphaGo, self-driving cars, and intelligent voice assistants. With NVIDIA GPU-accelerated deep learning frameworks, researchers and data scientists can significantly speed up deep learning training, reducing runs that could otherwise take days or weeks to just hours or days. When models are ready for deployment, developers can rely on GPU-accelerated inference platforms for the cloud, embedded devices, or self-driving cars to deliver high-performance, low-latency inference for the most computationally intensive deep neural networks.

Documentation Center
These documents provide information regarding NVIDIA deep learning performance.
Mixed precision methods combine the use of different numerical formats in one computational workload. This document describes the application of mixed precision to deep neural network training.
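As a rough illustration of why a workload might combine numerical formats, the sketch below emulates half-precision (FP16) and single-precision (FP32) rounding using only Python's standard library. The choice of FP16 and FP32, the helper names (`to_fp16`, `to_fp32`, `apply_updates`), and all numeric values are assumptions made for this example and are not taken from the guide itself.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (FP32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def apply_updates(weight: float, update: float, steps: int, cast) -> float:
    """Add `update` to `weight` `steps` times, rounding with `cast` after each add."""
    w = cast(weight)
    for _ in range(steps):
        w = cast(w + cast(update))
    return w


# Hypothetical values: a weight near 1.0 receiving many small updates.
# Near 1.0 the gap between adjacent FP16 values is about 0.00098, so an
# update of 0.0001 is rounded away on every step.
fp16_only = apply_updates(1.0, 1e-4, 1000, to_fp16)
fp32_master = apply_updates(1.0, 1e-4, 1000, to_fp32)

# A very small value underflows to zero in FP16 unless it is scaled up first.
tiny_gradient = 1e-8      # hypothetical gradient magnitude
scale = 1024.0            # hypothetical scaling factor
unscaled = to_fp16(tiny_gradient)
scaled = to_fp16(tiny_gradient * scale)

print(f"FP16-only weight after updates:   {fp16_only}")
print(f"FP32 master weight after updates: {fp32_master:.4f}")
print(f"FP16 gradient without scaling:    {unscaled}")
print(f"FP16 gradient with scaling:       {scaled:.3e}")
```

In this toy setting the FP16-only weight never moves, while the FP32 copy accumulates the updates, which is one reason to keep some quantities in a wider format while computing others in a narrower one.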
Recommendation Systems
This document describes the best practices for building and deploying large-scale recommender systems using NVIDIA GPUs. These practices are the culmination of years of research and development in GPU-accelerated tools for recommender systems, as well as building recommender systems for our in-house products and top-performing solutions for international recommendation systems competitions.
Optimizing Performance
This guide provides tips for improving the performance of fully-connected (or linear) layers. It also provides an example of how parameter choices affect the performance of layers in the Transformer network.
This guide provides tips for improving the performance of convolutional layers. It also provides details on the impact of parameters including batch size, input and filter dimensions, stride, and dilation.
This guide provides tips for improving the performance of recurrent layers. It also provides examples of use cases for persistence with layers in the GNMT system.
This guide describes the performance of memory-limited layers including batch normalization, activations, and pooling. It also provides tips for understanding and reducing the time spent on these layers within a network.
Performance Background
This guide provides background on the structure of a GPU, how operations are executed, and common performance limitations of deep learning operations.
This guide describes matrix multiplications and their use in many deep learning operations. The trends described here form the basis of performance trends in fully-connected, convolutional, and recurrent layers, among others.
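One way to reason about such trends is to compare a matrix multiplication's arithmetic intensity (floating-point operations per byte moved) with the ratio of a processor's peak math rate to its peak memory bandwidth. The sketch below is a simplified model, assuming each operand is read once and the output is written once. The function names and the peak throughput and bandwidth figures are hypothetical placeholders, not specifications of any particular GPU.

```python
def gemm_arithmetic_intensity(m: int, n: int, k: int, bytes_per_element: int = 2) -> float:
    """FLOPs per byte for C = A @ B, where A is m x k and B is k x n.

    Simplified model: A and B are each read once and C is written once.
    """
    flops = 2 * m * n * k  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_element * (m * k + k * n + m * n)
    return flops / bytes_moved


def is_math_limited(m: int, n: int, k: int,
                    peak_flops: float, peak_bytes_per_s: float,
                    bytes_per_element: int = 2) -> bool:
    """True if the modeled intensity exceeds the processor's ops-to-byte ratio."""
    intensity = gemm_arithmetic_intensity(m, n, k, bytes_per_element)
    return intensity > peak_flops / peak_bytes_per_s


# Hypothetical hardware figures, chosen only to make the comparison concrete.
PEAK_FLOPS = 100e12        # 100 TFLOP/s (assumed)
PEAK_BYTES_PER_S = 1e12    # 1 TB/s (assumed)

square = gemm_arithmetic_intensity(4096, 4096, 4096)
matrix_vector = gemm_arithmetic_intensity(4096, 1, 4096)

print(f"4096 x 4096 x 4096 intensity: {square:.1f} FLOPs/byte")
print(f"4096 x 1 x 4096 intensity:    {matrix_vector:.3f} FLOPs/byte")
print("square is math-limited:       ",
      is_math_limited(4096, 4096, 4096, PEAK_FLOPS, PEAK_BYTES_PER_S))
print("matrix-vector is math-limited:",
      is_math_limited(4096, 1, 4096, PEAK_FLOPS, PEAK_BYTES_PER_S))
```

Under these assumptions, the large square multiplication is limited by math throughput, while the matrix-vector case is limited by memory bandwidth. Real kernels reuse and re-read data in ways this model ignores, so treat the result as a first-order estimate.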