
Troubleshooting NVIDIA Nsight

NVIDIA® Nsight™ Development Platform, Visual Studio Edition 3.2 User Guide

Problem:

I get the following error when I perform analysis activity:

Nsight-NvEvents-Provider: Too few event buffers

The event system that captures the analysis data allocates a set of output buffers to communicate the captured data. Each OS thread that emits events to the analysis system must reserve one of these event buffers before it can output any data. If all of the buffers are already in use, any additional thread that emits events triggers this error message, and its event output is discarded.

Resolution:

You can configure the number of event buffers in the NvEvents controller options on the Activity document. To show these options, make sure that the Show Controller Options flag is set to True. You can set this flag from the Nsight menu: Nsight > Options > Analysis.

For optimal performance, the number of event buffers should be at least twice the number of threads that output events.
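As a quick worked example of this rule: if six threads in your application emit events, configure at least 12 event buffers. The host-side helpers below express the same check. They are a hypothetical sketch; the function names are illustrative and are not part of any Nsight API.

```cuda
// Hypothetical helpers illustrating the sizing guideline for NvEvents
// event buffers. These names are illustrative, not an Nsight API.

// Each OS thread that emits events reserves one buffer; the guideline is
// to provide at least twice as many buffers as emitting threads.
inline int min_event_buffers(int emitting_threads)
{
    return 2 * emitting_threads;
}

// Returns true if the configured buffer count meets the guideline.
inline bool has_enough_event_buffers(int configured_buffers, int emitting_threads)
{
    return configured_buffers >= min_event_buffers(emitting_threads);
}
```

For example, with six emitting threads, has_enough_event_buffers(10, 6) returns false, so the option should be raised to at least 12.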

Problem:

When I convert a project from Visual Studio 2008 to Visual Studio 2010, I get build errors.

Resolution:

For more information on how to convert a project, see the NVIDIA Developer Forums.

Problem:

How do I get diagnostic logs from the NVIDIA Nsight host and Monitor for troubleshooting purposes?

Resolution:

1. Close both Visual Studio and the Nsight Monitor.
2. On both the host and target machines, go to %AppData%\NVIDIA Corporation\Nsight\Vsip\1.0\Logs and %AppData%\NVIDIA Corporation\Nsight\Monitor\1.0\Logs and delete any existing files.
3. On both machines, open Nvda.Diagnostics.nlog for editing. The file is located in the following folder:

   On the host machine:

   For 32-bit OS: Program Files\NVIDIA Corporation\Nsight Visual Studio Edition 3.2\Common\Configurations
   For 64-bit OS: Program Files (x86)\NVIDIA Corporation\Nsight Visual Studio Edition 3.2\Common\Configurations

   On the target machine:

   For 32-bit OS: Program Files\NVIDIA Corporation\Nsight Visual Studio Edition Monitor 3.2\Common\Configurations
   For 64-bit OS: Program Files (x86)\NVIDIA Corporation\Nsight Visual Studio Edition Monitor 3.2\Common\Configurations

4. Go to the last logger at the bottom of the file: <logger name="*" minlevel="Error" writeTo="file-high-severity" />.
5. Change the minlevel attribute value from "Error" to "Trace".
6. Save the file.
7. Reproduce the problem, and send the following generated logs:

   %AppData%\NVIDIA Corporation\Nsight\Vsip\1.0\Logs
   %AppData%\NVIDIA Corporation\Nsight\Monitor\1.0\Logs

Problem:

When breakpoints are set in source code, the CUDA Debugger pauses execution at locations unrelated to the breakpoints.

This can happen when more than one __global__ function (kernel) in a single module calls a __device__ function, and both of the following are true:

- The __device__ function is not inlined.
- The different kernels call the exact same __device__ function.

Resolution:

There are two approaches you can take to work around this issue:

1. Force the __device__ function to be inlined by declaring it with the __forceinline__ keyword. Note that the inline keyword does not force inlining in debug builds.
2. Reorganize source code so that there is only one __global__ function for each instance of the __device__ function. This means that each .cu file that is compiled with the NVIDIA CUDA Compiler (nvcc.exe) should contain no more than one __global__ function. This works for both Driver API and CUDART applications. Keep the following in mind when taking this approach:
- Recommended: move commonly used __device__ functions to common header files. Use the #include statement to include the __device__ function in each .cu file containing a __global__ function.
- Potential issue: If your source code contains declarations of a global variable in the following style:
  
  __device__ int x;
  
  and that variable is used by multiple __global__ functions, then moving those __global__ functions into separate files is not a trivial workaround, because kernels in different files can no longer share a single instance of the variable. In this case, we recommend eliminating global variables declared in that style from the source code and passing their values as kernel parameters instead.
- Potential issue: Each __constant__ variable is associated with one CUDA module (a compiled .cu file).
  
  If multiple kernels depend on the same __constant__ variable and the host code of your application updates that variable dynamically, you will need to make broader changes to your source code:
  - For a CUDART application, when copying the __constant__ variable into each .cu file, give each copy a different name.
  - Any host code that was updating the previously single instance of the variable must now update all the instances.
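The workarounds above can be sketched in a single listing. This is a minimal, hypothetical example: the file names, kernels, and helper functions are illustrative, and in a real project each section marked by a comment would be a separate file (each kernel .cu file including the shared header). It shows a shared __forceinline__ __device__ function, a former __device__ global passed as a kernel parameter, and a __constant__ variable duplicated under a different name in each module, with host code updating every copy through cudaMemcpyToSymbol.

```cuda
#include <cuda_runtime.h>

// ---- common_math.cuh (hypothetical shared header) ----
// __forceinline__ forces this function to be inlined, so each kernel gets
// its own inlined copy instead of calling one shared body.
// __host__ is included only so the function can also be called from host code.
__host__ __device__ __forceinline__ float scale_value(float v, float factor)
{
    return v * factor;
}

// ---- kernel_a.cu (hypothetical; #includes common_math.cuh) ----
// "factor" was formerly a module-level "__device__ float g_factor;" shared
// by several kernels; it is now passed to each kernel as a parameter.
__constant__ float c_offset_a;   // this module's copy of a former shared __constant__

__global__ void kernel_a(float* data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = scale_value(data[i], factor) + c_offset_a;
}

// Host-side setter for this module's copy of the constant.
cudaError_t set_offset_a(float offset)
{
    return cudaMemcpyToSymbol(c_offset_a, &offset, sizeof(offset));
}

// ---- kernel_b.cu (hypothetical; #includes common_math.cuh) ----
__constant__ float c_offset_b;   // same role as c_offset_a, different name

__global__ void kernel_b(float* data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = scale_value(data[i], factor) - c_offset_b;
}

cudaError_t set_offset_b(float offset)
{
    return cudaMemcpyToSymbol(c_offset_b, &offset, sizeof(offset));
}

// ---- host code (hypothetical; declares set_offset_a/set_offset_b) ----
// Code that previously updated one shared __constant__ must now update
// every per-module copy.
cudaError_t set_offset_everywhere(float offset)
{
    cudaError_t err = set_offset_a(offset);
    if (err != cudaSuccess) return err;
    return set_offset_b(offset);
}
```

Keeping each setter in the same file as its __constant__ variable lets host code in other files update that module's copy without referencing the symbol directly.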

Problem:

I get warnings that 64-bit and/or 32-bit injection is not present.

Resolution:

The Nsight Monitor checks for the CUDA injection components, so you can get these warnings if the 64-bit and/or 32-bit injection is not present. If this happens, reinstall the tools.

Problem:

My machine hangs when I use the CUDA Debugger locally on a single machine with 2 GPUs on it.

Resolution:

There are several possible issues that can cause a machine to hang when locally debugging on two GPUs with the NVIDIA Nsight tools.

Make sure that your TDR settings have been configured correctly. For more information, see Timeout Detection and Recovery.

We recommend that the GPU on which you are debugging CUDA code has no display attached and no desktop running on it, because concurrent activity on a GPU can cause machine hangs. See How To: Setup Local Headless GPU Debugging for more information.

Problem:

The GPU debugger hangs when I also use the CPU debugger.

Resolution:

Never use the same Visual Studio instance to run both the CUDA Debugger and the CPU debugger.

In general, use either the CUDA Debugger or the CPU debugger, not both. Attaching the CPU debugger and hitting a CPU breakpoint during a CUDA debugging session causes the CUDA Debugger to hang until you resume the CPU process.

If you are careful, you can attach two separate Visual Studio instances (one for CUDA, one for the CPU). While you are stopped in CPU code, the CUDA Debugger will hang; once you resume the CPU code, the CUDA Debugger becomes responsive again.

Problem:

I am unable to set and hit a breakpoint in my CUDA code.

Resolution:

Make sure to use the driver version specified in the release notes. This is the most common reason that breakpoints do not work. The driver must be installed on the machine where your application code runs.

Also make sure your project uses a compatible CUDA toolkit (version 4.2, 4.1, or 4.0); NVIDIA Nsight includes these versions. A compatible CUDA toolkit generates the symbolic information that the CUDA Debugger needs to debug your code when you use the -G0 flag on the nvcc command line. If you are using the CUDA Driver API, make sure that there is a .cubin.elf.o file alongside each compiled .cubin file in your project's build output directory. Projects using the CUDA Runtime API have the symbolic information embedded in the object file itself.
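For Driver API projects, the .cubin.elf.o check can be sketched as a small host-side helper. This is a hypothetical illustration, not an Nsight tool: it assumes the companion file for kernels.cubin is named kernels.cubin.elf.o and that file-name case matches exactly. Collecting the directory listing (for example with std::filesystem) is left to the caller.

```cuda
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: given the file names in a Driver API project's build
// output directory, return every .cubin file that has no matching
// .cubin.elf.o file next to it. Assumes the companion name is formed by
// appending ".elf.o" to the .cubin file name.
std::vector<std::string> cubins_missing_symbolics(const std::vector<std::string>& files)
{
    const std::string cubin_ext = ".cubin";
    std::set<std::string> present(files.begin(), files.end());
    std::vector<std::string> missing;

    for (const std::string& f : files) {
        // Does the name end in ".cubin" (and have something before it)?
        bool is_cubin = f.size() > cubin_ext.size() &&
                        f.compare(f.size() - cubin_ext.size(), cubin_ext.size(), cubin_ext) == 0;
        if (is_cubin && present.count(f + ".elf.o") == 0)
            missing.push_back(f);
    }
    return missing;
}
```

Any name this returns points to a .cubin file for which the CUDA Debugger would lack symbolic information.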

Problem:

I get the following error message:

Local debugging failed. Nsight is incompatible with WPF acceleration.
Please see documentation about WPF acceleration. Run the
DisableWpfHardwareAcceleration.reg in your Nsight installation.

Resolution:

Disable WPF D3D acceleration. For more information, see Setup Local Debugging.

If any applications are already running with WPF hardware acceleration when you run the .reg file, you may still have issues until you restart them. For local debugging, this includes the Nsight Monitor: restart it as well, because it is also a WPF application.

Problem:

My program ignores breakpoints set in CPU code when I debug a program by choosing Start CUDA Debugging from the Nsight menu.

Resolution:

The CUDA Debugger ignores breakpoints set in CPU code as it does not currently support debugging x86 or other CPU code.

Problem:

When I hit a CUDA breakpoint, I only break once on thread (0, 0, 0) in my CUDA kernel. If I hit Continue (F5), it never breaks again and the entire launch completes.

Resolution:

The default behavior of the CUDA Debugger is to break unconditionally on the first thread of a kernel. After that, the breakpoints have an implicit conditional based on the CUDA Focus Picker. If you would like to break on a different thread, use the CUDA Focus Picker to switch focus to the desired thread or set a conditional breakpoint so that the debugger stops only on the thread you specify. For more information on setting the conditional breakpoint, see How To: Specify Debugger Context and How To: Set GPU Breakpoints. After you switch focus, the CUDA Debugger maintains the focus and breaks on breakpoints only in that thread for the duration of the kernel launch.
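To decide which thread to select in the CUDA Focus Picker, it helps to map a data element back to its block and thread coordinates. The sketch below assumes a hypothetical 1-D launch in which element i is handled by the thread with i = blockIdx.x * blockDim.x + threadIdx.x; the helper, struct, and kernel names are illustrative. The kernel also shows a common debugging idiom that is independent of Nsight: an ordinary breakpoint placed inside a branch that only one thread executes should behave much like a breakpoint conditioned on that thread.

```cuda
#include <cuda_runtime.h>

// Block and thread x-coordinates to enter in the CUDA Focus Picker
// (y and z are 0 for a 1-D launch).
struct FocusCoords {
    unsigned block_x;
    unsigned thread_x;
};

// Host helper: which block and thread process element `i` when the kernel
// uses i = blockIdx.x * blockDim.x + threadIdx.x? block_dim_x must be > 0.
inline FocusCoords focus_for_element(unsigned i, unsigned block_dim_x)
{
    return FocusCoords{ i / block_dim_x, i % block_dim_x };
}

// Hypothetical kernel used to illustrate a thread-specific stop point.
__global__ void scale_kernel(float* data, unsigned n, float factor)
{
    unsigned i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    if (i == 1000u) {
        // Put an ordinary breakpoint on the next line: only the thread that
        // handles element 1000 reaches it. volatile keeps the statement from
        // being optimized away.
        volatile int break_here = 0;
        (void)break_here;
    }
    data[i] *= factor;
}
```

For example, with 256 threads per block, element 1000 maps to block (3, 0, 0), thread (232, 0, 0).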
