Release Notes

NVIDIA® Nsight™ Development Platform, Visual Studio Edition 4.0 User Guide
Important information about the NVIDIA® Nsight™ Visual Studio Edition 4.0 release

Display Driver

You must install the NVIDIA display driver that supports the NVIDIA Nsight tools. If you have an NVIDIA graphics card installed on your target machine, you likely already have an NVIDIA display driver; however, NVIDIA Nsight requires a specific version of the driver in order to function properly. From the NVIDIA web site, download and install the following display driver (or newer):

Driver Release 332, Release 332.44 or newer

See below for more release information about:

CUDA Debugger

Graphics Debugger

Analysis Tools

CUDA Debugger

New in the 4.0 Release

Microsoft Visual Studio 2013 is now supported. (23594)
Note: In order to build CUDA applications, your project settings will need to be set up for the Visual Studio 2012 or prior toolset. However, you can still host the project in Visual Studio 2013 to use the CUDA Debugger and analysis tools.
Support for the CUDA 6.0 toolkit. (23610)
This includes new features such as being able to debug and analyze applications that use Unified Virtual Memory (UVM). (23718)
Support for the new Maxwell architecture (for example, found in the GeForce GTX 750 Ti and 750).
Added support for new GPUs, such as the GeForce GT 720M, 730M, 735M, and 740M. (18448)

Changed Features and Fixed Issues in the 4.0 Release of the CUDA Debugger

Support for the Tesla architecture GPUs with SM1 (for example, GeForce GTX 260, Tesla® C1060) has now been deprecated. (21574)
Visual Studio macros are now supported, so values such as $(TargetPath), $(ProjectDir), and $(LocalDebuggerCommandArguments) can be added on the NVIDIA Nsight properties page. (12107)

Known Issues in the 4.0 Release of the CUDA Debugger

Microsoft Visual Studio 2008 is being deprecated and will not be supported starting with the next release of NVIDIA® Nsight™ Visual Studio Edition.
Firewall and anti-intrusion software (e.g., McAfee Host Intrusion Prevention) will not allow remote debugger connections. Please disable or add an exclusion for Nsight Monitor. (22804)
When the address of a shared memory location is passed to a __global__ subroutine on a Maxwell GPU, a bug in the shared memory range checker will incorrectly flag those memory accesses inside the subroutine as out-of-range. (30497)
Fermi devices with an attached display can hang when stopping debugging while at a breakpoint. This occurs in hardware mode debugging. Please use preemption mode debugging, or switch to a GPU without an attached display. (18778)
In some cases, when the CUDA application is built with the "Generate Relocatable Device Code" option, and a CUDA kernel function is declared with both the __global__ and static qualifiers, the NVIDIA Nsight debugger might not be able to display local variables inside that function. To work around this issue, remove the static qualifier from the function. (21914)
On Tesla architecture GPUs, warps frozen at an exception or inline breakpoint may re-report the event when the debugger suspends for other reasons. (This has been fixed on Fermi and Kepler GPUs.) (16327)
You must enable Memory Checker before launching a process, and cannot change the setting while debugging. (18935, 18937)
When the CUDA Debugger is used to debug CUDA applications which share resources with DirectX 9 (such as the "simpleD3D9" sample program), the debugger may display incorrect values for memory locations in those shared resources. This may happen when the GPU device executing the application is Compute Capability 2.0 or higher. Incorrect values for the contents of memory may be displayed in any debug window (Autos, Locals, Watch, Warp Watch, or Memory). This issue does not affect applications using Direct3D 11. (13899)
On Fermi GPUs, after an MMU fault, the CUDA Debugger will not resume normally. The application may still be terminated. (13808)
When using the CUDA Debugger with NVIDIA Nsight, breakpoints will not be hit in source files whose full paths contain non-ASCII characters. Any path with a character code >= 128 is affected. (11429)
If you experience hangs or TDRs while locally debugging CUDA on a single GPU (or using the Software Preemption debugging mode in general), try disabling operating system features that use video hardware acceleration. For example, disable Aero on Windows 7, change to a high-contrast desktop theme on Windows 8, or disable WPF acceleration.
Variables do not appear for source code that is not executed. This occurs because the compiler aggressively optimizes code even if you have not specified any compiler optimizations. As a result, the compiler removes any code that will not be executed from the output executable.
Breakpoints will hit multiple times on lines that have more than one inline function call. For example, setting a breakpoint on:
x = cos() + sin()
will generate three breakpoints on that line: one for the evaluation of the expression, plus one for each function on the line.
Unloading modules does not refresh the state of breakpoints set in that module. This means that those breakpoints do not show their latest state in Visual Studio after the module has been unloaded.
The Visual Studio Breakpoint "Filter" option is not supported for CUDA GPU breakpoints.
The Visual Studio Breakpoint "Hit Count" option is not supported for CUDA GPU breakpoints.
When starting Graphics or CUDA Debugging in Visual Studio, the user cannot specify environment variables to be used in the environment block of the launched process.
The F5 hotkey (which is the default hotkey in Visual Studio for starting the CPU Debugger) does not start the CUDA Debugger.
To start the CUDA Debugger, you must either change the key bindings or use the menu command:
Nsight > Start CUDA Debugging.
There is no support for automatically performing a Build when launching the CUDA Debugger.
The Load Symbols option, or "Symbols settings," in the Modules view is not supported for CUDA debugging.
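
The non-ASCII path limitation listed above (11429) can be checked for before starting a debugging session. The following is a minimal sketch and is not part of NVIDIA Nsight: the helper name hasNonAsciiByte is hypothetical, and it assumes the path is available as a byte string (for example, UTF-8 or an ANSI code page), in which any byte value of 128 or higher marks a non-ASCII character.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (not an Nsight API): returns true if any byte of the
// path has a character code >= 128, which is the condition described in the
// known issue above.
// Assumption: the path is stored as a byte string (UTF-8 or ANSI code page).
bool hasNonAsciiByte(const std::string& path) {
    return std::any_of(path.begin(), path.end(), [](char c) {
        // Cast to unsigned so that bytes >= 128 are not seen as negative.
        return static_cast<unsigned char>(c) >= 128;
    });
}
```

For example, hasNonAsciiByte("C:\\projects\\reduce.cu") returns false, while a path that contains an accented letter returns true. The same test could be applied to the bytes of HLSL source text.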

Graphics Inspector and Graphics Debugger

New in the 4.0 Release of the Graphics Debugger

Microsoft Visual Studio 2013 is now supported. (23595)
Dynamic shader editing is now available for GLSL shaders. (20010)
Support for the OpenGL functions ARB_vertex_attrib_binding and ARB_multi_draw_indirect has been added. (26894, 23858)
Support for the new Maxwell architecture (for example, found in the GeForce GTX 750 Ti and 750).
Added support for new GPUs, such as the GeForce GT 720M, 730M, 735M, and 740M. (18448)
Byte code-level debugging is now available for HLSL shaders. (25390)

Changed Features and Fixed Issues in the 4.0 Release of the Graphics Debugger

Frame Profiler results can be exported to an Excel/CSV file. (21949)
The Geometry viewer now supports FP16 vertices. (25278)
Pixel history no longer requires shaders to be compiled with debug information. (24807)
Visual Studio can now visualize the Render Target/Back Buffer/Frame Buffer. (21464)
Visual Studio macros are now supported, so values such as $(TargetPath), $(ProjectDir), and $(LocalDebuggerCommandArguments) can be added on the NVIDIA Nsight properties page. (12107)
Support for the Tesla architecture GPUs (for example, GeForce 9600GT, GeForce GTX 260) has now been deprecated. (21574)

Known Issues in the 4.0 Release of the Graphics Debugger

Due to a last minute issue, we are not able to support setting breakpoints in GLSL shaders that are being edited with dynamic shader editing. This will be fixed in the next release candidate. (27304)
OpenGL applications that use multiple rendering contexts will not be able to resolve and hit breakpoints in GLSL shaders. This will be fixed, and multiple rendering contexts fully supported, in the next release candidate. (28739)
Applications should not assume or test that they are running at a specified frame rate in Nsight. The tool disables vsync, which may cause the application to run at a different frame rate. This is done in order to make the application as GPU bound as possible. (28662)
Microsoft Visual Studio 2008 is being deprecated and will not be supported starting with the next release of NVIDIA® Nsight™ Visual Studio Edition.
The rop_busy hardware counter has been removed from the list of available counters, due to a hardware bug that caused the value to be incorrect. If you reinstall NVIDIA Nsight, this may still be a default counter and will show unusually high values. You can either edit your graphs to remove the counter (via Nsight > Windows > Graphics HUD Configuration), or delete your persisted settings. To delete the settings, open Windows Explorer, navigate to %appdata%\NVIDIA Corporation, and delete the entire Nsight directory. (29203)
While pixel history will work many times with TDR enabled, there are times when processing the number of fragments can take an undue amount of time. If this occurs, either disable TDR or set your timeout value to a value of 1 minute or more. (27835)
Since the frame scrubber only allows navigation to actions, there may be times when you select a revision for a resource that is not on or near an action. In this case, an earlier revision is selected that is on/near the selected action. This limitation will be addressed in a future version, when the scrubber allows for navigating to any event, not just actions. (26229)
The saving and loading of Frame Profiler data for Direct3D applications was disabled for this build, due to a late bug. The bug will be fixed and the feature re-enabled in a future release. (24313)
If you experience a TDR on either your host or target system, it is recommended that you reboot your machine before trying to restart the debugging session. (22986)
When performing local GPU debugging, there is a chance that the GPU will be paused long enough for the operating system's Timeout Detection and Recovery (TDR) mechanism to be triggered. The Nsight Monitor has settings in the General page to allow you to change the delay time (we recommend 30 seconds) or disable TDR completely. (22733)
NVIDIA Nsight may not work correctly unless double buffering is enabled by the application, since GLUT or other implementations may skip the SwapBuffer calls for a single buffered window. (24590)
Debugging of HLSL Effects (anything compiled with an fx_N_M target) is not supported; only pure HLSL shaders can be debugged. (24891)
If you encounter rendering issues when running a serialized capture, it could be due to NVIDIA Nsight saving the debug compiled shaders. This is done so that both dynamic shader editing and shader debugging function properly. However, there may be bugs in the shaders generated by the Direct3D compiler when running in debug. To update to the latest compiler/runtime, you can run on Windows 8, or update your Windows 7 system to SP1 and install the platform update from http://support.microsoft.com/kb/2670838.
Firewall and anti-attack software (e.g., McAfee Host Intrusion Prevention) will not allow remote debugger connections. Please disable or add an exclusion for Nsight Monitor. (22804)
The following features are not available on GPUs other than Fermi and Kepler:
- Local shader debugging,
- GLSL shader debugging,
- Maximus (CUDA + Graphics) debugging.
The Pixel value in the tables for the OpenGL profiler represents shaded pixels (i.e., fragments that ran the fragment shader). If color writes are disabled and the fragment shader doesn't write Z, this value may be 0, even though the depth value for a fragment may be written. (22061)
Due to limitations of the HLSL and GLSL compilers, debugging of shaders that were concatenated from different source files using the #line directive to refer back to the original sources may not work as expected. (22067)
The frame scrubber does not support the nvtxRangeBegin and nvtxRangeEnd functions, only nvtxRangePush and nvtxRangePop. (22163)
Managed applications built with the AnyCpu configuration are not supported. The target application must be built using either the Win32 or x64 configurations.
NVIDIA Nsight for Graphics uses a setting in the driver that enables instrumentation. If the application does not close cleanly, this setting can remain enabled, which can cause some additional CPU overhead. The Nsight Monitor has an option to disable this, in case you see any issues running your application outside of NVIDIA Nsight. (16936, 16962)
VS tracepoints, enabled by the "When Hit" breakpoint option, do not work as expected when the "Continue Execution" option is also set. Only the first hit will be reported and the target application will hang afterwards. You will need to select Debug > Stop Debugging to resume the debug target. (10904)
Source code syntax highlighting and the population of the Autos window with your program's variables rely on a mapping from file extension to programming language. If these are not working, you can add the extension for your HLSL source code files to the list in the Tools > Options dialog under the Text Editor > File Extension section, and associate them with Microsoft Visual C++. (12094)
The Graphics Debugger's Autos window may not show all variables as expected. This may happen if a shader is compiled using preprocessor macros to conditionally include or exclude code lines, and those macro definitions may only be available at shader compile time. (12094)
Forcing the target application to close through the task manager while in the frame debugger may crash the target application.

DirectX Known Issues

If you are passing the D3D11_MAP_FLAG_DO_NOT_WAIT flag to a Map call on a Direct3D 11 Device Context, it is possible that the operation hasn't finished, so you will see a return code of 0x887A000A or DXGI_ERROR_WAS_STILL_DRAWING. This can sometimes happen when the capture is trying to restore a buffer to the frame start state and it is mapped early in the frame. Simply remove the D3D11_MAP_FLAG_DO_NOT_WAIT flag, and the call should function properly. (24846)
There are two DirectX shader compiler bugs that may cause incorrect stepping behavior.
1. The DX shader compiler will map "end-of-block" instructions to the beginning line number of the block in the HLSL source.
2. The DX shader compiler will map "implicit" returns to the beginning line number of the shader.
This issue can be resolved by always adding an "explicit" return at the end of your shader. (14656)
In some cases, very small vertex buffers cannot be retrieved from the GPU, so the Vertices3D view in the Graphics Focus Picker may not display the correct input vertices. (14192)
If a pipeline stage does not have an object bound, then the related state is not displayed on the host. For example, if there is no pixel shader bound, then no Shader Resource Views will be shown in the Pixel Shader page. (15394)
HLSL code cannot contain any non-ASCII characters. Any character with a character code >= 128 is not supported. (14760)
Applications that intercept DirectX devices or objects by use of a shim object are not supported. This interferes with an internal mechanism and therefore cannot be handled properly. (14470)
Debugging when running with Stereoscopic 3D is not supported. This will be fixed in a future version. In the meantime, please run your application with Stereoscopic 3D disabled when debugging with NVIDIA Nsight. (12618)
NVIDIA Nsight is incompatible with the debug runtime in all versions of Direct3D. While it may sometimes work, there are known incompatibilities that we are unable to support at this time.
The Graphics Debugger does not support the Reference Rasterizer (RefRast) tool, which is the CPU rasterizer provided by Microsoft. The Graphics Debugger will signal an error if the IDXGIFactory::CreateSoftwareAdapter function is used for device creation.
You may not be able to see all local or global variables in the Watch window. This can be due to optimizations performed by the HLSL compiler.
You may not be able to set a breakpoint on certain lines of source code. This can be due to optimizations performed by the HLSL compiler.
Expression evaluation and breakpoint conditions do not support HLSL built-in functions and vector and matrix expressions.

OpenGL Known Issues

OpenGL compute shader debugging is not supported.
Debugging of shaders in applications that use more than one OpenGL context may not work under some circumstances.
If you see a significant drop in frame rate when running NVIDIA Nsight, it may be due to some shader compilation optimizations that are disabled in the driver. This is a bug that will be fixed in a future driver version. In the meantime, it may also cause the profiler results to show the shader unit as more of a bottleneck than it truly is. (23163)
Some debugger windows, such as the Shaders List or Focus Picker, use Direct3D shader type names (e.g., hull shader) instead of GLSL shader type names.
There are times when the GUI may not refresh completely when debugging OpenGL programs. Please force the window to refresh by minimizing and restoring the window. (19869)
If you are connected to your target using VNC and attempting to debug an OpenGL program, make sure to disable any "Hook" or "Mirror" driver option in the VNC server settings. (20686)
We suggest that you disable any breakpoints in CUDA code before entering Frame Debugger (i.e., Pause and Capture Frame). There are some cases where hitting the breakpoint will cause the Frame Debugger to become unresponsive. (20721)
GLSL Shader debugging can be unstable when running with multiple displays. Please run with a single monitor when debugging GLSL shaders. (21402)
Visualization of Depth-Stencil formats in Visual Studio is limited in the OpenGL Graphics Debugger. The depth part of a DS format is displayed, but not the stencil. Note that this is not a limitation on the target application when running with NVIDIA Nsight. You can switch between Render Target, Depth, or Stencil through the HUD toolbar while the target application is being debugged.

C++ AMP Debugger Known Issues

C++ AMP debugging is only supported on Fermi and Kepler GPUs.
The Break for every thread (like CPU behavior) option is not supported.
Attaching to an already running process may crash the process with drivers prior to R319. (17633)
Editing of variable values may not work for variables that are not arrays or declared with the tile_static modifier.

Analysis Tools

New in the 4.0 Release

Microsoft Visual Studio 2013 is now supported. (23592)
Support for the CUDA 6.0 toolkit. (23608)
Support for the new Maxwell architecture (for example, found in the GeForce GTX 750 Ti and 750).
Added support for new GPUs, such as the GeForce GT 720M, 730M, 735M, and 740M. (18448)

Changed Features and Fixed Issues in the 4.0 Release of the Analysis Tools

Performance overhead when running an analysis trace has been improved. (20805)
Visual Studio macros are now supported, so values such as $(TargetPath), $(ProjectDir), and $(LocalDebuggerCommandArguments) can be added on the NVIDIA Nsight properties page. (12107)
Support for the Tesla architecture GPUs with SM1 (for example, GeForce GTX 260, Tesla® C1060) has now been deprecated. (21574)

Known Issues in the 4.0 Release of the Analysis Tools

Microsoft Visual Studio 2008 is being deprecated and will not be supported starting with the next release of NVIDIA® Nsight™ Visual Studio Edition.
In rare cases, the reported number of memory transactions can exceed the number of transactions caused by executing memory requests from the user's code. This mismatch occurs when the GPU or driver must add transactions that are not controllable by the user. (27520)
Firewall and anti-attack software (e.g., McAfee Host Intrusion Prevention) will not allow remote analysis connections. Please disable or add an exclusion for Nsight Monitor. (22804)
Users may find that applications which use CUDA Dynamic Parallelism (CDP) encounter an out-of-memory error with NVIDIA Nsight. This is often due to the high memory usage of the CUDA driver for CDP support. In particular, if an application calls cudaDeviceSetLimit(cudaLimitDevRuntimeSyncDepth, ...) with a large value, the driver may allocate many GBs of device memory, leaving little memory for the application itself. To work around this, select the lowest safely-usable value for cudaLimitDevRuntimeSyncDepth, which will leave more device memory available for both NVIDIA Nsight and the application itself to use. To see how much memory is being reserved for the CDP sync stack, run an Application Trace with Software Counters enabled, and check the Device Memory row underneath a call to cudaDeviceSetLimit(cudaLimitDevRuntimeSyncDepth, ...). (24592)
Do not start an analysis capture session when the CUDA Debugger is paused on a breakpoint. Doing so can cause the system to crash. (3203)

Analysis Activity Known Issues

Tracing the following APIs is not supported in managed processes:
NVTX
OpenCL
Direct3D
OpenGL
Launching a managed .exe for tracing with any of the aforementioned APIs enabled will result in an "Access Denied" pop-up message, and the analysis session will not start.
In Trace Process Tree mode, instrumentation for tracing the aforementioned APIs can only propagate to native child processes. If a managed child process is launched, neither it nor any child process it launches (managed or native) can be instrumented by NVIDIA Nsight. The analysis session will continue unaffected, and the user will not be notified of the problem; the report will not contain data from managed processes and their children.
System and CUDA tracing is fully supported in managed processes, and in Trace Process Tree mode, tracing support propagates to all child processes (native or managed).
Managed processes are fully supported in the Profile CUDA modes.

The stop collection timer is implemented in Visual Studio. The latency to communicate to the monitor and application can result in a longer duration than requested.

CPU Thread Trace
If the Windows Kernel Event Provider is already in use when a new capture session is launched, the collected data may produce unexpected results. For best results ensure that no other kernel providers are running during an analysis session.

CUDA Trace
CUDA trace does not show implicit memory transfers for graphics interop.
CUDA Runtime API trace does not capture the <<< >>> kernel launch syntax. Instead, the corresponding CUDA Runtime API calls are reported. Some of the CUDA Driver API calls that are executed by the CUDA Runtime may report errors, such as CUDA_ERROR_INVALID_CONTEXT, even though the usage of the CUDA Runtime API is valid. (6745)
When collecting trace information about CUDA kernels and memory transfers, sometimes the report file will not contain complete information about the kernels and memory transfers. This happens because retrieving the data interferes with the application and affects performance, so the tool only does it after these events:
a call to cuCtxSynchronize()/cudaDeviceSynchronize(),
a call to cuCtxDestroy()/cudaDeviceReset(),
a call to cuStreamDestroy()/cudaStreamDestroy(),
the application launches enough kernels or memory transfers to fill up NVIDIA Nsight's buffer, so NVIDIA Nsight forces a context synchronize in order to retrieve the data.

If your capture appears to be missing some or all kernel launch or memory transfer events, either force the data to flush by adding a call to cuCtxSynchronize()/cudaDeviceSynchronize() after all the CUDA work is finished, or (for an application that continuously launches kernels and memcpys) simply capture for longer, so that enough data is generated to fill NVIDIA Nsight's buffer and force a flush. (4812)

CUDA Profiler
On Tesla GPUs, branch counters include __syncthreads().
Profile Trigger increments by 1 per warp, not by 1 per active thread.
The NVIDIA Nsight CUDA Profiler cannot collect all necessary data in a single pass of the kernel, so the profiler replays the kernel as many times as necessary to collect all requisite data. Between replays of the kernel, the accessible memory is restored to the state it was in before the kernel ran, ensuring the kernel will execute the same code paths. However, the L2 cache state is not restored, so all passes after the first will execute with different data cached in L2. For kernels that access small amounts of global or local memory, this may cause the L2 cache to show hit rates better than it would achieve in normal execution. Kernels that access large amounts of memory that cannot fit entirely in L2 cache will show more accurate results.
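
The per-warp behavior of the Profile Trigger counter noted above can be turned into a quick sanity check on the counter values to expect. This sketch is illustrative only: the function names are hypothetical, and it assumes a warp size of 32 threads and that every warp of every block executes the trigger exactly once.

```cpp
#include <cstdint>

// Assumption: 32 threads per warp.
constexpr std::uint64_t kWarpSize = 32;

// Number of warps needed to hold threadsPerBlock threads, rounded up, since
// a partially filled warp still counts as one warp.
constexpr std::uint64_t warpsPerBlock(std::uint64_t threadsPerBlock) {
    return (threadsPerBlock + kWarpSize - 1) / kWarpSize;
}

// Expected Profile Trigger value when each warp hits the trigger once:
// one increment per warp, not one per active thread.
constexpr std::uint64_t expectedTriggerCount(std::uint64_t blocks,
                                             std::uint64_t threadsPerBlock) {
    return blocks * warpsPerBlock(threadsPerBlock);
}
```

Under these assumptions, a launch of 4 blocks of 256 threads would be expected to report 4 * 8 = 32 rather than 1024.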

OpenCL
The end timestamp can sometimes be recorded significantly after the completion of a command. If this occurs, adding a clFlush call after the specific command will fix the timestamp.
The start/end range for memory read and write commands includes both host and device time. CUDA start/end range only includes device time.
Viewing OpenCL Source or Binary code from the OpenCL Programming Builds or OpenCL Program Summary creates a temporary file in %TMP%. The temporary file is not deleted when the file is closed.
OpenCL reports occasionally do not contain device commands. This can occur if the OpenCL context/queue is not released or fewer than 512 events occurred during a capture.

DirectX/OpenGL Trace
Graphics workload information, such as draw calls and dispatches, is output in groups of 16384 workload events. As a consequence, a report will not contain any graphics workload information if an insufficient number of draw calls occurred during a capture. Increasing the capture duration will help to work around this limitation.
Some applications, such as Chrome, run in a sandbox environment. The effects on NVIDIA Nsight of such a sandbox are hard to predict, so if you have trouble, read the documentation for the target application and disable any sandbox when possible. For Chrome, the applicable launch flag is -no-sandbox. (16426)
When you are running analysis for DX apps on a multi-GPU system, you could see a hang. When running frame timings for DX apps on a multi-GPU system, you could see a timeout waiting for the results. One possible solution would be to connect the monitor to the other GPU. Failing that, you should run analysis with only one GPU plugged into the system.
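
For the 16384-event grouping of graphics workload information described above, a rough estimate of the minimum capture length can help when choosing a capture duration. This is a sketch under stated assumptions, not documented tool behavior: the function name is hypothetical, the application is assumed to issue a roughly constant number of workload events (draw calls and dispatches) per frame, and a group is assumed to reach the report only once it is full.

```cpp
#include <cstdint>

// Group size stated in the note above.
constexpr std::uint64_t kWorkloadGroupSize = 16384;

// Hypothetical helper: the smallest number of frames that must be captured
// before one full group of workload events has been generated.
// Assumption: eventsPerFrame is roughly constant across frames.
// Returns 0 when eventsPerFrame is 0, meaning a full group is never reached.
constexpr std::uint64_t framesForOneGroup(std::uint64_t eventsPerFrame) {
    return eventsPerFrame == 0
               ? 0
               : (kWorkloadGroupSize + eventsPerFrame - 1) / eventsPerFrame;
}
```

With 500 workload events per frame, for instance, this estimate gives 33 frames, because 32 frames produce only 16000 events.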

Analysis Report Known Issues

On the PM Counters report, you may encounter an error in which not all passes are displayed. (26301)

If two different host computers use the same remote target machine, it is possible that the two hosts could generate the same report directory. This would be confusing because reports from the two hosts would be mixed together. Although unlikely, this can occur when two different hosts analyze an application of the same name, because the NVIDIA Nsight analysis tools on the host machine create the directory name based on the name of the application.

Timeline Known Issues

There can be an error of approximately 1 microsecond between CPU events and GPU events.

Percentages displayed in the row labels and tool tips are based upon the full capture time.

The mouse forward and back buttons cannot be used to navigate the report page system.

CTRL+- toggles to the previous document instead of Zooming Out.

Double-clicking on a row containing a line/area graph that also has children will expand/collapse the row as opposed to increasing the height to 66% of the view.

Using VNC (virtual network computing) software to remotely open a Timeline Report can cause Visual Studio to crash. (7157)