NVIDIA® Nsight™ Application Development Environment for Heterogeneous Platforms, Visual Studio Edition 5.2 User Guide
In the following walkthrough we present some of the more common procedures that you might use to debug a CUDA-based application. We use a sample application called Matrix Multiply as an example. NVIDIA Nsight includes this sample application.
For the purpose of this walkthrough, we assume that the application is being debugged remotely: the NVIDIA Nsight host software runs on a machine with Visual Studio, and the Nsight Monitor runs on a separate target machine.
In matrixMul_kernel.cu, set a breakpoint at the statement:
int aStep = BLOCK_SIZE
Visual Studio marks the location of the breakpoint with a red circle. You can also use any of the other various methods that Visual Studio provides to set breakpoints.
In matrixMul_kernel.cu, set a second breakpoint at the statement that begins:
for (int a = aBegin, b = bBegin;
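For context, both breakpoint locations sit near the main loop of the matrixMul kernel. The kernel looks roughly like this (a sketch based on the standard CUDA matrixMul SDK sample; details may differ slightly from the version shipped with Nsight):

```cpp
// Sketch of the matrixMul kernel (CUDA SDK sample style).
// C = A * B, computed in BLOCK_SIZE x BLOCK_SIZE tiles.
__global__ void matrixMul(float *C, float *A, float *B, int wA, int wB)
{
    int bx = blockIdx.x,  by = blockIdx.y;
    int tx = threadIdx.x, ty = threadIdx.y;

    int aBegin = wA * BLOCK_SIZE * by;   // first sub-matrix of A for this block
    int aEnd   = aBegin + wA - 1;        // last sub-matrix of A
    int aStep  = BLOCK_SIZE;             // <-- first breakpoint lands here
    int bBegin = BLOCK_SIZE * bx;
    int bStep  = BLOCK_SIZE * wB;

    float Csub = 0;

    // <-- second breakpoint lands on this for statement
    for (int a = aBegin, b = bBegin; a <= aEnd; a += aStep, b += bStep) {
        __shared__ float As[BLOCK_SIZE][BLOCK_SIZE];
        __shared__ float Bs[BLOCK_SIZE][BLOCK_SIZE];

        As[ty][tx] = A[a + wA * ty + tx];
        Bs[ty][tx] = B[b + wB * ty + tx];
        __syncthreads();

        for (int k = 0; k < BLOCK_SIZE; ++k)
            Csub += As[ty][k] * Bs[k][tx];
        __syncthreads();
    }

    int c = wB * BLOCK_SIZE * by + BLOCK_SIZE * bx;
    C[c + wB * ty + tx] = Csub;
}
```

When the CUDA Debugger stops at either breakpoint, each GPU thread in the current warp is paused at that statement, which is why the Locals and Memory windows described later show per-thread values.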
In this section of the walkthrough, you opened the sample project and set breakpoints. Next we build the sample project and start the debugging session.
The Nsight Monitor starts. The Nsight Monitor icon appears in the system tray.
The User Settings window appears.
Replace localhost with the name of your target machine (the remote computer where the application to be debugged will run). This can be the IP address of the machine on your local network, or the machine name as recognized on your network.
IMPORTANT: Do not use a mapped drive to specify the hostname. For example:
WRONG: M:\
CORRECT: jsmith.mydomain.com
NOTE on the CUDA Data Stack feature: On newer architectures, each GPU thread has a private data stack. Normally the required data stack size is determined by the compiler, and the driver's default size is usually greater than what a kernel requires. However, if a kernel uses a recursive function, the compiler cannot statically determine the data stack size. In such cases the application must call cuCtxGetLimit() and cuCtxSetLimit() with CU_LIMIT_STACK_SIZE to ensure adequate stack space.

Setting CU_LIMIT_STACK_SIZE is normally the responsibility of the application for release-compiled kernels. Since debug-compiled kernels require extra stack space, the application would need different stack size settings for debug and release builds. As a convenience, and to avoid polluting application code with debug-kernel-specific code, the CUDA Debugger provides settings that automatically increase your stack size settings while debugging.
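The driver API calls named above can be used roughly as follows (a sketch, assuming a CUDA context is already current; the 64 KB figure and the helper name are illustrative, not from this guide, and real code should check every CUresult):

```cpp
#include <cuda.h>

// Hypothetical helper: grow the per-thread data stack for a recursive
// kernel whose stack needs the compiler cannot determine statically.
void ensureStackSize(size_t requiredBytes /* e.g. 64 * 1024 */)
{
    size_t current = 0;
    cuCtxGetLimit(&current, CU_LIMIT_STACK_SIZE);   // query the driver's current limit
    if (current < requiredBytes)
        cuCtxSetLimit(CU_LIMIT_STACK_SIZE, requiredBytes);  // raise it if too small
}
```

An application would typically call such a helper once, after creating its context and before launching the recursive kernel; the CUDA Debugger settings mentioned above make this unnecessary for the extra stack space that debug-compiled kernels need.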
The CUDA Toolkit that you use to compile your CUDA C code must support the following switch for generating debug symbolics: -G0
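For reference, a typical debug build invocation looks like this (a sketch; file names are illustrative, and in recent CUDA Toolkits the device-debug switch is spelled -G):

```shell
# -G  generates device debug symbolics (required by the CUDA Debugger)
# -g  generates host debug information
# -O0 disables host optimization so breakpoints bind predictably
nvcc -G -g -O0 -o matrixMul matrixMul.cu matrixMul_kernel.cu
```

When building inside Visual Studio, the equivalent option is set in the project's CUDA C/C++ property pages rather than on the command line.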
Alternatively, you can right-click the project and select Start CUDA Debugging.
On the host machine, notice that a pop-up message indicates that a connection has been made. You've started the debugging session. In the next section of this walkthrough, we'll look at some of the windows that you typically inspect during a debugging session.
In Visual Studio 2010, a dependency may fail because the properties of the .cu file are configured incorrectly. To work around this issue, use the following steps.
The Locals window opens, displaying the variables and their values in the current lexical scope.
NOTE: You cannot change the value in GPU memory by editing the value in the Locals window.
The Memory window opens, displaying the values at the address that corresponds to the variable (or pointer).
NOTE: When viewing variables in __local__, __const__, or __shared__ memory, make sure the Visual Studio Memory view is set to Re-evaluate automatically. This ensures that the memory shown is for the correct memory space; without it, the display can change to an address that defaults to global memory. You cannot change the value in GPU memory by editing the value in the Memory window.
NVIDIA® Nsight™ Application Development Environment for Heterogeneous Platforms, Visual Studio Edition User Guide Rev. 5.2.161206 ©2009-2016. NVIDIA Corporation. All Rights Reserved.