NVIDIA® Nsight™ Development Platform, Visual Studio Edition 4.7 User Guide
In the following walkthrough we present some of the more common procedures that you might use to debug a CUDA-based application. We use a sample application called Matrix Multiply as an example. NVIDIA Nsight includes this sample application.
For the purpose of this walkthrough, we assume that the application is debugged remotely: the NVIDIA Nsight host software runs on a machine with Visual Studio, and the Nsight Monitor runs on a separate machine.
Set two breakpoints in matrixMul_kernel.cu:
1. At the statement: int aStep = BLOCK_SIZE;
2. At the statement that begins: for (int a = aBegin, b = bBegin;
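For orientation, the two statements named above sit in the tile loop of the sample kernel. The sketch below is paraphrased from the CUDA SDK matrixMul sample; exact variable names and code vary by toolkit version:

```cpp
// Paraphrased excerpt from matrixMul_kernel.cu (CUDA SDK sample).
// BLOCK_SIZE, wB, aBegin, aEnd, and bBegin are defined earlier in the kernel.
int aStep = BLOCK_SIZE;               // first breakpoint: stride through matrix A
int bStep = BLOCK_SIZE * wB;          // stride through matrix B

float Csub = 0;
for (int a = aBegin, b = bBegin;      // second breakpoint: start of the tile loop
     a <= aEnd;
     a += aStep, b += bStep)
{
    // Each iteration loads one tile of A and one tile of B into shared
    // memory, multiplies them, and accumulates the result into Csub.
}
```

When the CUDA Debugger hits the second breakpoint, every active GPU thread stops at the top of this loop, which makes it a convenient place to inspect per-thread values.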
In this section of the walkthrough, you opened the sample project and set breakpoints. Next we build the sample project and start the debugging session.
Replace localhost with the name of your target machine (the remote computer where the application to be debugged will run). This can be the IP address of the machine on your local network, or the machine name as recognized on your network.
WRONG: M:\
CORRECT: jsmith.mydomain.com
NOTE on the CUDA data stack: On Fermi and later architectures, each GPU thread has a private data stack. Normally the required data stack size is determined by the compiler, and the driver's default size is usually greater than what a kernel requires. However, if a kernel uses a recursive function, the compiler cannot statically determine the data stack size. In that case the application must call cuCtxGetLimit() and cuCtxSetLimit() with CU_LIMIT_STACK_SIZE to ensure adequate stack space. Setting CU_LIMIT_STACK_SIZE is normally the application's responsibility for release-compiled kernels. Because debug-compiled kernels require extra stack space, the application would otherwise need different stack size settings for debug and release builds. As a convenience, and to avoid polluting application code with debug-specific logic, the CUDA Debugger provides settings that automatically increase the stack size while debugging.
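The driver-API calls named in the note can be used as sketched below. This is a minimal example, not the sample's code: error handling is omitted, it assumes a CUDA context is already current on the calling thread, and the 16 KB figure is an arbitrary illustration chosen for a hypothetical recursive kernel, not a recommendation.

```cpp
#include <cuda.h>    // CUDA Driver API
#include <stdio.h>

// Sketch: raise the per-thread data stack limit before launching a
// recursive kernel. Assumes a current CUDA context on this thread.
void ensureStackSpace(void)
{
    size_t currentSize = 0;
    cuCtxGetLimit(&currentSize, CU_LIMIT_STACK_SIZE);    // query current limit

    const size_t requiredSize = 16 * 1024;               // example value (bytes per thread)
    if (currentSize < requiredSize)
        cuCtxSetLimit(CU_LIMIT_STACK_SIZE, requiredSize);

    printf("per-thread data stack size: %zu bytes\n", (size_t)requiredSize);
}
```

Note that cuCtxSetLimit applies per context, so the new stack size affects all kernels subsequently launched in that context.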
NOTE: The CUDA Toolkit that you use to compile your CUDA C code must support the -G switch for generating debug symbolics.
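A typical command line for a debug build might look like the following. This is an illustrative sketch, not the sample project's actual build step: the file and output names are assumptions, and it presumes nvcc is on your PATH.

```shell
# -G  : generate debug info for device (GPU) code; disables most device optimizations
# -g  : generate debug info for host (CPU) code
nvcc -G -g -o matrixMul matrixMul.cu
```

In a Visual Studio project, the equivalent setting is enabled through the CUDA build customization's "Generate GPU Debug Information" property rather than a hand-written command line.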
On the host machine, notice that a pop-up message indicates that a connection has been made. You've started the debugging session. In the next section of this walkthrough, we'll look at some of the windows that you typically inspect during a debugging session.
In Visual Studio 2010, a dependency check may fail because the properties of the .cu file are configured incorrectly. To work around this issue, use the following steps.
NOTE: You cannot change the value in GPU memory by editing the value in the Locals window.
When viewing __local__, __const__, or __shared__ memory, make sure the Visual Studio Memory view is set to Re-evaluate automatically. This ensures that the memory shown is for the correct memory space. Without this setting, the display can change to an address that defaults to global memory.
NOTE: You cannot change the value in GPU memory by editing the value in the Memory window.
How Tos
How To: Launch the CUDA Debugger
How To: Set GPU Breakpoints
How To: Setup Local Headless GPU Debugging
How To: Specify Debugger Context
Reference
Tesla Compute Cluster (TCC)
NVIDIA GameWorks Documentation Rev. 1.0.150630 ©2015. NVIDIA Corporation. All Rights Reserved.