
Overview

Buffers are regions of memory allocated in CUDA device memory or in system memory. Bandwidth to device memory is higher than bandwidth to system memory, but the on-chip caches and shared memory provide much lower latency and higher bandwidth than either. Thus, the most efficient way to access device and system memory is to avoid re-reading the same buffer data multiple times when it can instead be staged in shared memory or cache. Because the initial accesses to device and system memory cannot be avoided, they are often a performance bottleneck, so it is important to monitor bandwidth to buffers.
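The reuse pattern described above can be sketched as a kernel that stages a tile of a buffer in shared memory. This is an illustrative example, not taken from this guide: the kernel name, the 1D three-point averaging filter, and the block size of 256 are all assumptions chosen for brevity.

```cuda
// Sketch: stage a reused tile in shared memory so the cost of a
// device-memory load is paid once per element, not once per use.
__global__ void blur1d(const float *in, float *out, int n)
{
    __shared__ float tile[256 + 2];          // one block's tile plus halo
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // One device-memory load per element into shared memory...
    if (i < n)
        tile[threadIdx.x + 1] = in[i];
    if (threadIdx.x == 0 && i > 0)
        tile[0] = in[i - 1];                 // left halo element
    if (threadIdx.x == blockDim.x - 1 && i < n - 1)
        tile[blockDim.x + 1] = in[i + 1];    // right halo element
    __syncthreads();

    // ...then three low-latency shared-memory reads per output element,
    // instead of three separate device-memory reads.
    if (i > 0 && i < n - 1)
        out[i] = (tile[threadIdx.x] + tile[threadIdx.x + 1] +
                  tile[threadIdx.x + 2]) / 3.0f;
}
```

Without the shared-memory tile, each input element would be loaded from device memory up to three times; with it, each element crosses the memory bus only once per block.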

Chart

Buffers

All accesses to buffers from the GPU are routed through the L2 write-back cache. System memory can also be accessed directly from a CUDA kernel using zero-copy, where the CUDA API is used to allocate system memory that is mapped into the CUDA device's address space. The Buffers chart shows the amount of data loaded and stored separately, and the sum of loads and stores, in the System Memory and Device Memory regions, along with the percentage of overall bandwidth utilization.
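The zero-copy allocation mentioned above can be sketched with the CUDA runtime API as follows. This is a minimal host-side example, assuming a device that supports mapped host memory (reported by the `canMapHostMemory` device property); the kernel name and sizes are illustrative.

```cuda
// Sketch: zero-copy access to system memory from a CUDA kernel.
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void scale(float *data, int n, float f)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= f;   // each access travels over the bus to system memory
}

int main(void)
{
    const int n = 1 << 20;
    float *hostPtr, *devPtr;

    // Allocate pinned system memory mapped into the device's address
    // space; no cudaMemcpy is required to make it visible to the GPU.
    cudaHostAlloc(&hostPtr, n * sizeof(float), cudaHostAllocMapped);
    cudaHostGetDevicePointer(&devPtr, hostPtr, 0);

    for (int i = 0; i < n; ++i)
        hostPtr[i] = 1.0f;

    scale<<<(n + 255) / 256, 256>>>(devPtr, n, 2.0f);
    cudaDeviceSynchronize();   // ensure the kernel's writes are visible

    printf("%f\n", hostPtr[0]);
    cudaFreeHost(hostPtr);
    return 0;
}
```

Accesses through the mapped pointer show up in the System Memory region of the Buffers chart, since each load and store goes to system memory rather than device memory.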

Analysis


NVIDIA® Nsight™ Development Platform, Visual Studio Edition User Guide Rev. 4.6.150311 ©2009-2015. NVIDIA Corporation. All Rights Reserved.
