NVIDIA® Nsight™ Development Platform, Visual Studio Edition 4.7 User Guide
In the following tutorial we look at how to use some of the basic features of the CUDA Debugger. For the purpose of this tutorial, we use a sample application called Matrix Multiply, but you can follow the same procedures, using your own source.
This tutorial covers how to debug an application locally. This means that you will need to have the NVIDIA Nsight host software running on a machine with Visual Studio, and have the Nsight Monitor also running on the same machine.
Make sure that the machine you use meets the system requirements. For more information, see System Requirements for Nsight Software.
Configuring a machine for local debugging is therefore the first task. This tutorial covers:
- EXERCISE 1: Open A Project And Build The Executable
- EXERCISE 2: Set Breakpoints
- EXERCISE 3: Run The CUDA Debugger And Inspect Variables
- EXERCISE 4: Run The Memory Checker
Let's open the sample project matrixMul. This is a simple CUDA-based application that multiplies 2 matrices. The algorithms in the source code are relatively simple, but will still give you a sense of how the CUDA Debugger works. The matrixMul application is included with the NVIDIA Nsight software.
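As a point of reference for what the sample computes, here is a plain CPU matrix multiply. This is a minimal sketch, not the sample's actual CUDA kernel; the parameter names hA, wA, and wB are assumptions for illustration:

```cpp
#include <vector>

// Naive CPU reference: C = A * B, where A is hA x wA and B is wA x wB.
// Matrices are stored row-major in flat vectors, as the CUDA sample does.
std::vector<float> matrixMulCPU(const std::vector<float>& A,
                                const std::vector<float>& B,
                                int hA, int wA, int wB) {
    std::vector<float> C(hA * wB, 0.0f);
    for (int row = 0; row < hA; ++row) {
        for (int col = 0; col < wB; ++col) {
            float sum = 0.0f;
            for (int k = 0; k < wA; ++k)
                sum += A[row * wA + k] * B[k * wB + col];
            C[row * wB + col] = sum;
        }
    }
    return C;
}
```

The CUDA kernel in matrixMul produces the same result, but partitions the work across thread blocks, which is what makes it a useful subject for the CUDA Debugger.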
Make sure that you understand the importance of using a CUDA Toolkit that works with NVIDIA Nsight.
Note: CUDA Toolkit: In order to use a project with the NVIDIA Nsight tools, we recommend that you use the compiler that ships with the tools. The default installation directory for this version of the compiler is:

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA

The compiler is in a subdirectory labeled by its version, such as:

...\<version_number>\bin\nvcc.exe

The NVIDIA Nsight tools work best with this version of the compiler. However, the tools also work with the standard toolkit. Whichever compiler you use, the CUDA Toolkit that you use to compile your CUDA C code must support the following switch for generating debug symbolics: -G0
Open the matrixMul_vc100.vcxproj project file in Visual Studio and build the matrixMul project. You have now successfully opened the project and built the matrixMul executable.
Before we run the matrixMul application, let's set some breakpoints at key places in the source code. This will cause the CUDA Debugger to pause execution of the target application at those points, and give us an opportunity to inspect the values of variables and the state of each thread.
Open matrixMul_kernel.cu and set a breakpoint at the statements:

int aBegin = wA * BLOCK_SIZE * by;
int aStep = BLOCK_SIZE;

Let's set another breakpoint at:

BS(ty, tx) = B[b + wB * ty + tx];

just before the __syncthreads() statement.

Let's start the CUDA Debugger and take a look at variables and memory at the breakpoints we set.
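To anticipate what the Locals window will show at the first breakpoint, the index arithmetic can be evaluated by hand. The sketch below mirrors the two statements on the CPU; the concrete values of wA, BLOCK_SIZE, and by used in the example are assumptions, not fixed by this guide:

```cpp
// Mirrors the index setup at the first breakpoint in matrixMul_kernel.cu,
// so the values the debugger displays can be checked by hand.
struct TileIndices {
    int aBegin;  // index of the first sub-matrix of A for this thread block
    int aStep;   // step between successive sub-matrices of A
};

TileIndices tileIndices(int wA, int blockSize, int by) {
    TileIndices t;
    t.aBegin = wA * blockSize * by;
    t.aStep  = blockSize;
    return t;
}
```

For example, with wA = 32, BLOCK_SIZE = 16, and block index by = 1, the debugger should show aBegin == 512 and aStep == 16 when execution pauses.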
The CUDA Debugger resumes execution of the matrixMul application, and pauses before executing the instruction on the line of source code at the next breakpoint. Before we continue execution, let's take a look at the values in memory.
When viewing memory in the __local__, __const__, or __shared__ memory spaces, make sure the Visual Studio Memory view is set to Re-evaluate automatically. This ensures that the memory shown is for the correct memory space; without it, the display can change to an address that defaults to global memory.
Note: You cannot change the value in GPU memory by editing the value in the Memory window.
The CUDA Memory Checker keeps track of all memory allocations to ensure that invalid memory locations are not accessed by the target application.
Writing to an out-of-bounds memory location in a CUDA kernel launch causes the GPU to terminate the launch and places the CUDA context in a permanent error state. This results in all CUDA API functions returning an error code, such as CUDA_ERROR_UNKNOWN. The coding errors that lead to invalid memory access can be difficult to debug without a memory checker.
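As an illustration of the kind of error the memory checker catches, consider an index expression with an off-by-one bound. This is a hypothetical CPU sketch of the indexing only; in practice the faulty access happens inside a CUDA kernel launch, and the Memory Checker reports the offending thread and address:

```cpp
// Checks whether the flat index a thread would use for a
// BLOCK_SIZE x BLOCK_SIZE shared-memory tile stays in bounds.
// A loop that lets ty reach blockSize (one past the last row) produces
// exactly the out-of-bounds access the CUDA Memory Checker flags.
bool indexInTile(int ty, int tx, int blockSize) {
    int flat = blockSize * ty + tx;
    return flat >= 0 && flat < blockSize * blockSize;
}
```

With a 16 x 16 tile, thread coordinates (15, 15) index the last valid element, while (16, 0) is already one row past the end of the tile.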
NVIDIA GameWorks Documentation Rev. 1.0.150630 ©2015. NVIDIA Corporation. All Rights Reserved.