Important information about the NVIDIA® Nsight™ Visual Studio Edition 5.1 release
You must install the NVIDIA display driver that supports the NVIDIA Nsight tools. If you have an NVIDIA graphics card installed on your target machine, you likely already have an NVIDIA display driver; however, NVIDIA Nsight requires a specific version of the driver in order to function properly. From the NVIDIA web site, download and install the following display driver (or newer):
Driver Release 361.96, Release 361 or newer
See below for more release information about:
Common Nsight Features
New in the 5.1 Release of NVIDIA Nsight
- NOTE: The 5.1.0 RC1 version was promoted as-is to 5.1.0 final.
New in the 5.1 Release
- Bug fixes and performance improvements.
Changed Features and Fixed Issues in the 5.1 Release of the CUDA Debugger
- 32-bit operating systems are no longer supported.
- Starting with the next release of NVIDIA Nsight, Fermi GPUs will no longer be supported.
Known Issues in the 5.1 Release of the CUDA Debugger
- Firewall and anti-intrusion software (e.g., McAfee Host Intrusion Prevention) will not allow remote debugger connections. Please disable or add an exclusion for the Nsight Monitor. (22804)
- Fermi devices with an attached display can hang when stopping debugging while at a breakpoint. This occurs in hardware mode debugging. Please use preemption mode debugging, or switch to a GPU without an attached display. (18778)
- In some cases, when the CUDA application is built with the "Generate Relocatable Device Code" option, and a CUDA kernel function is declared with the __global__ static attributes, the NVIDIA Nsight debugger might not be able to display local variables inside that function. Users can work around this issue by simply removing the static qualifier on the function. (21914)
- You must enable Memory Checker before launching a process, and cannot change the setting while debugging. (18935, 18937)
- When the CUDA Debugger is used to debug CUDA applications which share resources with DirectX 9 (such as the "simpleD3D9" sample program), the debugger may display incorrect values for memory locations in those shared resources. This may happen when the GPU device executing the application is Compute Capability 2.0 or higher. Incorrect values for the contents of memory may be displayed in any debug window (Autos, Locals, Watch, Warp Watch, or Memory). This issue does not affect applications using Direct3D 11. (13899)
- On Fermi GPUs, after an MMU fault, the CUDA Debugger will not resume normally. The application may still be terminated. (13808)
- When using the CUDA Debugger with NVIDIA Nsight, breakpoints will not be hit in source files whose full paths contain non-ASCII characters. Any path with a character code >= 128 is affected. (11429)
- If you experience hangs or TDRs while locally debugging CUDA on a single GPU (or using the Software Preemption debugging mode in general), try disabling operating system features that use video hardware acceleration. For example, disable Aero on Windows 7, change to a high-contrast desktop theme on Windows 8, or disable WPF acceleration.
- Variables do not appear for source code that is not executed. This occurs because the compiler aggressively optimizes code even if you have not specified any compiler optimizations. As a result, the compiler removes any code that will not be executed from the output executable.
- Breakpoints will hit multiple times on lines that have more than one inline function call. For example, setting a breakpoint on:
x = cos() + sin()
will generate three breakpoints on that line: one for the evaluation of the expression, plus one for each function on the line.
- Unloading modules does not refresh the state of breakpoints set in that module. This means that those breakpoints do not show their latest state in Visual Studio when they have been unloaded.
- The Visual Studio Breakpoint "Filter" option is not supported for CUDA GPU breakpoints.
- The Visual Studio Breakpoint "Hitcount" option is not supported for CUDA GPU breakpoints.
- When starting Graphics or CUDA Debugging in Visual Studio, the user cannot specify environment variables to be used in the environment block of the launched process.
- The F5 hotkey (which is the default hotkey in Visual Studio for starting the CPU Debugger) does not start the CUDA Debugger. To start the CUDA Debugger, you must either change the key bindings or use the menu command Nsight > Start CUDA Debugging.
- There is no support for automatically performing a Build when launching the CUDA Debugger.
- The Load Symbols option, or "Symbols settings," in the Modules view is not supported for CUDA debugging.
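The known issue above about breakpoints not being hit in source files whose full paths contain non-ASCII characters (11429) can be screened for before a debugging session. The following is a minimal sketch, not part of NVIDIA Nsight: the helper name is_ascii_only is hypothetical, and it assumes the path is available as a byte string (for example, UTF-8), in which any non-ASCII character produces at least one byte with a code of 128 or higher.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper (not an Nsight API): returns true when every byte of
// the path has a character code below 128, i.e. the path is pure ASCII.
// Assumes the path is stored as a byte string such as UTF-8.
bool is_ascii_only(const std::string& path) {
    return std::all_of(path.begin(), path.end(), [](unsigned char c) {
        return c < 128;  // codes >= 128 are the ones the release note flags
    });
}
```

A build script or pre-debug check could call this helper on the full path of each source file and warn when it returns false, so that the affected files can be moved to an ASCII-only location.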
Graphics Inspector and Graphics Debugger
New in the 5.1 Release of the Graphics Debugger
- The Frame Debugger has added the "Capture Next Frame" feature to help capture intermittent issues.
- OpenGL users can now retry capture if unsupported operations are seen.
- Shader editing for Direct3D 12 applications is now available.
- The graph configuration user interface has been improved.
- Improvements have been made to the scrubber for multi-threaded Direct3D 12 applications.
- The Geometry View supports more mapping of parameters to output visualization, and there's a new memory view that displays the structure and values of input vertices.
Changed Features and Fixed Issues in the 5.1 Release of the Graphics Debugger
- In the API Statistics view, some internal Nsight events, such as the end of a command list, no longer appear as "other" in the counts and table. (35647)
- The results in the Frame Timings view are now correct when you run the Frame Profiler after opening the Frame Timings view. (35512)
- An issue with analysis workload tracing and the frame timings feature in the Frame Debugger on some newer Maxwell GPUs has been resolved as of driver release 344.03. (31982)
Known Issues in the 5.1 Release of the Graphics Debugger
- If an application that uses NVTX fails to serialize a capture, ensure that the NVTX SDK has been installed. (42452)
- Some shader creation patterns can cause prolonged startup times. If you encounter these long startup times, you can disable shader debugging in order to avoid these delays. (40240)
- Using NVIDIA Nsight with the Oculus Dev Kit version 0.6 or earlier will potentially cause a crash due to a known incompatibility with the Oculus SDK. If you change to use extended mode, instead of direct mode, it should resolve the instability. This issue has been resolved in the Oculus SDK 0.7 release. (36060)
- When calculating the histogram for array or cube map resources, the tool only takes into account face 0. As a workaround, if you click on a different face, the histogram will be recalculated for that face. In a future version, this will be fixed to take into account all faces. (34403)
- If your application has a large number of textures and buffers (on the order of 50,000 or more alive), the capture time can increase to a minute or more. We will work on reducing this overhead in a future release.
- The various scrubbers have a white background, so seeing performance markers that are also colored white can be difficult. We will address this in a future version, but in the meantime please try to use a color for the markers that will contrast well with a white background. (31704)
- The rop_busy hardware counter has been removed from the list of available counters, due to a hardware bug that caused the value to be incorrect. If you reinstall NVIDIA Nsight, this may still be a default counter and will show unusually high values. You can either edit your graphs to remove the counter (via Nsight > Windows > Graphics HUD Configuration) or delete your persisted settings. To delete the settings, open Windows Explorer, navigate to %appdata%\NVIDIA Corporation, and delete the entire Nsight directory. (29203)
- If you see a significant drop in frame rate when running NVIDIA Nsight, it may be due to some shader compilation optimizations that are disabled in the driver. This is a bug that will be fixed in a future driver version. Until then, it may also cause the profiler results to show the shader unit as more of a bottleneck than it truly is. (23163)
- When performing local GPU debugging, there is a chance that the GPU will be paused long enough for the operating system's Timeout Detection and Recovery (TDR) mechanism to be triggered. The Nsight Monitor has settings in the General page to allow you to change the delay time (we recommend 30 seconds) or disable TDR completely. (22733)
- If you experience a TDR on either your host or target system, it is recommended that you reboot your machine before trying to restart the debugging session. (22986)
- While pixel history will work with TDR enabled, there are times when processing the number of fragments can take an undue amount of time. If this occurs, either disable TDR or set your timeout to 1 minute or more. (27835)
- Since the frame scrubber only allows navigation to actions, there may be times when you select a revision for a resource that is not on or near an action. In this case, an earlier revision is selected that is on/near the selected action. This limitation will be addressed in a future version, when the scrubber allows for navigating to any event, not just actions. (26229)
- NVIDIA Nsight may not work correctly unless double buffering is enabled by the application, since GLUT or other toolkits may skip the SwapBuffer calls for a single buffered window. (24590)
- Firewall and anti-attack software (e.g., McAfee Host Intrusion Prevention) will not allow remote debugger connections. Please disable or add an exclusion for the Nsight Monitor. (22804)
- Due to limitations of the HLSL compiler, debugging of shaders that were concatenated from different source files using the #line directive to refer back to the original sources may not work as expected. (22067)
- The frame scrubber does not support the nvtxRangeEnd functions, only
- Managed applications built with the AnyCpu configuration are not supported. The target application must be built using either the Win32 or x64 configurations.
- NVIDIA Nsight for Graphics uses a setting in the driver that enables instrumentation. If the application does not close cleanly, this setting can remain enabled, which can cause some additional CPU overhead. The Nsight Monitor has an option to disable this, in case you see any issues running your application outside of NVIDIA Nsight. (16936, 16962)
- VS tracepoints, enabled by the "When Hit" breakpoint option, do not work as expected when the "Continue Execution" option is also set. Only the first hit will be reported and the target application will hang afterwards. You will need to select Debug > Stop Debugging to resume the debug target. (10904)
- Source code syntax highlighting and the population of the Autos window with your program's variables both depend on a mapping from file extension to programming language. If these are not working, you can add the extension for your HLSL source code files to the list in the Tools > Options dialog under the Text Editor > File Extension section, and associate them with Microsoft Visual C++. (12094)
- The Graphics Debugger's Autos window may not show all variables as expected. This may happen if a shader is compiled using preprocessor macros to conditionally include or exclude code lines, and those macro definitions may only be available at shader compile time. (12094)
- Forcing the target application to close through the task manager while in the Frame Debugger may crash the target application.
DirectX Known Issues
- Due to driver issues, we were not able to finish testing the transform feedback page of the API Inspector for Direct3D 12 applications. Please contact email@example.com if you encounter any problems. (34886)
- The Tiled Texture Viewer does not currently support 3D tiled textures. This will be functional in the final version. (39384)
- If your D3D11 application uses multiple Present calls, specifically using Present(0, 1), the Frame Timings and Frame Profiler will not function properly. We suggest using a single call to Present(0, 0) to work around the problem. We are investigating a solution to handle this setup. (39687)
- On applications that use DirectCompute, you will see your dispatch calls in the Frame Timings view, but note that the time values will be close to 0 because the values are not collected when populating this view. The Profiler view has accurate timing information and other statistics for DirectCompute dispatches. (34033)
- Debugging of HLSL Effects (anything compiled with an fx_N_M target) is not supported, only pure HLSL shaders. (24891)
- If you encounter rendering issues when running a serialized capture, it could be due to NVIDIA Nsight saving debug compiled shaders. Debug shaders are used so that both dynamic shader editing and shader debugging function properly. However, there may be bugs in the shaders generated by the Direct3D compiler when running in debug mode. To update to the latest compiler/runtime, you can run on Windows 8, or update your Windows 7 system to SP1 and install the platform update from http://support.microsoft.com/kb/2670838.
- Because of driver restrictions, NVIDIA Nsight does not support creating a Direct3D 11 device with the Direct3D 9 feature set. This configuration may cause instability and application crashes. (32185)
- The Direct3D runtime documentation states that, "the return values of AddRef & Release may be unstable and should not be relied upon." The NVIDIA Nsight Frame Debugger will also take additional references on objects so any code that relies on an exact reference count at a particular time may fail. In general, users should not expect an exact reference count to be returned from the Direct3D runtime. For more information, see Microsoft's Rules for Managing Reference Counts. (30826)
- If you are passing the D3D11_MAP_FLAG_DO_NOT_WAIT flag to a Map call on a Direct3D 11 Device Context, it is possible that the operation hasn't finished, and that you will see a return code of DXGI_ERROR_WAS_STILL_DRAWING. This can happen when the capture is trying to restore a buffer to the frame start state and it is mapped early in the frame. Simply remove the D3D11_MAP_FLAG_DO_NOT_WAIT flag, and it should function properly. (24846)
- There are two DirectX shader compiler bugs that may cause incorrect stepping behavior.
- The DX shader compiler will map "end-of-block" instructions to the beginning line number of the block in the HLSL source.
- The DX shader compiler will map "implicit" returns to the beginning line number of the shader.
This issue can be resolved by always adding an "explicit" return at the end of your shader. (14656)
- In some cases, very small vertex buffers cannot be retrieved from the GPU, so the Vertices3D view in the Graphics Focus Picker may not display the correct input vertices. (14192)
- If a pipeline stage does not have an object bound, the related state will not be displayed on the host. For example, if there is no pixel shader bound, no Shader Resource Views will be shown in the Pixel Shader page. (15394)
- HLSL code cannot contain any non-ASCII characters. Any character with a character code >= 128 is not supported. (14760)
- Applications that intercept DirectX devices or objects by use of a shim object are not supported. This interferes with an internal mechanism and therefore cannot be handled properly. (14470)
- Debugging when running with Stereoscopic 3D is not supported. This will be fixed in a future version. In the meantime, please run your application with Stereoscopic 3D disabled when debugging with NVIDIA Nsight. (12618)
- NVIDIA Nsight is incompatible with the debug runtime in all versions of Direct3D. While it may sometimes work, there are known incompatibilities that we are unable to support at this time.
- The Graphics Debugger does not support the Reference Rasterizer (RefRast) tool, which is the CPU rasterizer provided by Microsoft. The Graphics Debugger will signal an error if the IDXGIFactory::CreateSoftwareAdapter function is used for device creation.
- You may not be able to see all local or global variables in the Watch window. This can be due to optimizations performed by the HLSL compiler.
- You may not be able to set a breakpoint on certain lines of source code. This can be due to optimizations performed by the HLSL compiler.
- Expression evaluation and breakpoint conditions do not support HLSL built-in functions and vector and matrix expressions.
OpenGL Known Issues
- OpenGL compute shader debugging is not supported.
- Debugging of shaders in applications that use more than one OpenGL context may not work under some circumstances.
- Using names that are generated by glGen* is advised for performance and correctness. Under normal operation, if an application uses a mix of generated and non-generated names (e.g. via a middleware product), there is the possibility of conflicts/aliases between names. When running an application under Nsight, Nsight will create its own resources that use names generated by the driver, further increasing the likelihood of conflicts. The recommended approach is to always use names that are generated by the driver via
- The Pixel value in the tables for the OpenGL profiler represents shaded pixels (i.e., fragments that ran the fragment shader). If color writes are disabled and the fragment shader doesn't write Z, this value may be 0, even though the depth value for a fragment may be written. (22061)
- Some debugger windows, such as the Shaders List or Focus Picker, use Direct3D shader type names (e.g., hull shader) instead of GLSL shader type names.
- If you are connected to your target using VNC and attempting to debug an OpenGL program, make sure to disable any "Hook" or "Mirror" driver option in the VNC server settings. (20686)
- We suggest that you disable any breakpoints in CUDA code before entering Frame Debugger (i.e., Pause and Capture Frame). There are some cases where hitting the breakpoint will cause the Frame Debugger to become unresponsive. (20721)
New in the 5.1 Release
- Bug fixes and performance improvements.
Changed Features and Fixed Issues in the 5.1 Release of the Analysis Tools
- 32-bit operating systems are no longer supported.
- Starting with the next release of NVIDIA Nsight, Fermi GPUs will no longer be supported.
Known Issues in the 5.1 Release of the Analysis Tools
- When using NVIDIA Nsight Performance Analysis to trace Direct3D apps on a multi-GPU system, GPU work may be attributed to the wrong GPU in the summary page. The other pages in the analysis report should attribute the GPU work correctly. This is fixed in the upcoming 358 driver, so the problem will not occur after updating to a 358 or newer driver (when available). (40334)
- In order to capture ETW events, the Nsight Monitor must be run with elevated privileges by right-clicking on the Nsight Monitor icon and selecting Run As Administrator. (Note that even if you are logged in with an Administrator account, you will have to explicitly run Nsight Monitor as administrator in order for ETW events to be captured.) (12193)
- In order to prevent Tesla AutoBoost from causing applications to have non-repeatable behavior in NVIDIA Nsight Analysis, the AutoBoost feature is disabled. This will cause apps running under NVIDIA Nsight Analysis to run slower than they would outside of NVIDIA Nsight on GPUs where AutoBoost defaults to enabled (for example, the Tesla K80). NVIDIA Nsight Analysis will not disable AutoBoost if the user sets the environment variable
1 in the Nsight Monitor process, which allows for higher performance at the cost of less repeatability for measurements. (34102)
- There may be some instability when trying to use Analysis Tracing on multi-GPU Quadro systems where the app uses gpu_affinity. This will be fixed in a future release. (22071)
- Analysis workload tracing on newer Maxwell GPUs requires an updated driver. Older drivers will display an error message in the analysis report summary. (31982)
- In rare cases, the reported number of memory transactions can exceed the number of transactions caused by executing memory requests from the user's code. This mismatch occurs when the GPU or driver must add transactions that are not controllable by the user. (27520)
- Firewall and anti-attack software (e.g., McAfee Host Intrusion Prevention) will not allow remote analysis connections. Please disable or add an exclusion for the Nsight Monitor. (22804)
- Do not start an analysis capture session when the CUDA Debugger is paused on a breakpoint. Doing so can cause the system to crash. (3203)
Analysis Activity Known Issues
Tracing the following APIs is not supported in managed processes:
Launching a managed .exe for tracing with any of the aforementioned APIs enabled will result in an "Access Denied" pop-up message, and the analysis session will not start.
In Trace Process Tree mode, instrumentation for tracing the aforementioned APIs can only propagate to native child processes. If a managed child process is launched, neither it nor any child process it launches (managed or native) can be instrumented by NVIDIA Nsight. The analysis session will continue unaffected, and the user will not be notified of the problem; the report will not contain data from managed processes and their children.
System and CUDA tracing is fully supported in managed processes, and in Trace Process Tree mode, tracing support propagates to all child processes (native or managed).
Managed processes are fully supported in the Profile CUDA modes.
- The stop collection timer is implemented in Visual Studio. The latency to communicate to the monitor and application can result in a longer duration than requested.
- CPU Thread Trace
If the Windows Kernel Event Provider is already in use when a new capture session is launched, the collected data may produce unexpected results. For best results ensure that no other kernel providers are running during an analysis session.
- CUDA Trace
- CUDA trace does not show implicit memory transfers for graphics interop.
- CUDA Runtime API trace does not capture the <<< >>> kernel launch syntax. Instead, the corresponding CUDA Runtime API calls are reported. Some of the CUDA Driver API calls that are executed by the CUDA Runtime may report errors, such as CUDA_ERROR_INVALID_CONTEXT, even though the usage of the CUDA Runtime API is valid. (6745)
- When collecting trace information about CUDA kernels and memory transfers, sometimes the report file will not contain complete information about the kernels and memory transfers. This happens because retrieving the data interferes with the application and affects performance, so the tool only does it after these events:
- a call to
- a call to
- a call to
- the application launches enough kernels or memory transfers to fill up NVIDIA Nsight's buffer, so Nsight forces a context synchronize in order to retrieve the data.
If your capture appears to be missing some or all kernel launch or memory transfer events, either force the data to flush by adding a call to cuCtxSynchronize()/cudaDeviceSynchronize() after all the CUDA work is finished, or (for an application that continuously launches kernels and memcpys) simply capture for more time and try to generate enough data to incur NVIDIA Nsight's flush for a full buffer. (4812)
- CUDA Profiler
- Profile Trigger increments by 1 per warp, not by 1 per active thread.
- The NVIDIA Nsight CUDA Profiler cannot collect all necessary data in a single pass of the kernel, so the profiler replays the kernel as many times as necessary to collect all requisite data. Between replays of the kernel, the accessible memory is restored to the state it was in before the kernel ran, ensuring the kernel will execute the same code paths. However, the L2 cache state is not restored, so all passes after the first will execute with different data cached in L2. For kernels that access small amounts of global or local memory, this may cause the L2 cache to show hit rates better than it would achieve in normal execution. Kernels that access large amounts of memory that cannot fit entirely in L2 cache will show more accurate results.
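Because Profile Trigger counts increment once per warp rather than once per active thread, the expected counter value for a launch can be estimated from the launch dimensions. The sketch below is illustrative only: the function names are hypothetical, and it assumes a warp size of 32 threads and that every warp in every block executes the trigger exactly once.

```cpp
#include <cassert>
#include <cstdint>

// Warp size assumed for this estimate (32 threads per warp).
constexpr std::uint64_t kWarpSize = 32;

// Number of warps needed to cover one thread block; a partially filled
// final warp still counts as a whole warp.
std::uint64_t warps_per_block(std::uint64_t threads_per_block) {
    return (threads_per_block + kWarpSize - 1) / kWarpSize;
}

// Hypothetical estimate of the trigger count for a launch, assuming each
// warp executes the trigger exactly once.
std::uint64_t expected_trigger_count(std::uint64_t blocks,
                                     std::uint64_t threads_per_block) {
    return blocks * warps_per_block(threads_per_block);
}
```

Under these assumptions, a launch of 10 blocks with 256 threads each would be expected to report 80 (10 blocks of 8 warps), not the 2560 that a per-thread count would give.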
- The end timestamp can sometimes be recorded significantly after the completion of a command. If this occurs, adding a clFlush after the specific command will fix the timestamp.
- The start/end range for memory read and write commands includes both host and device time. CUDA start/end range only includes device time.
- Viewing OpenCL Source or Binary code from the OpenCL Programming Builds or OpenCL Program Summary creates a temporary file in %TMP%. The temporary file is not deleted when the file is closed.
- OpenCL reports occasionally do not contain device commands. This can occur if the OpenCL context/queue is not released or fewer than 512 events occurred during a capture.
- DirectX/OpenGL Trace
- Graphics workload information, such as draw calls and dispatches, is output in groups of 16384 workload events. As a consequence, a report will not contain any graphics workload information if an insufficient number of draw calls occurred during a capture. Increasing the capture duration will help to work around this limitation.
- Some applications, such as Chrome, run in a sandbox environment. The effects on NVIDIA Nsight of such a sandbox are hard to predict, so if having trouble, a user should read the documentation for the target application, and disable any sandbox when possible. For Chrome, the applicable launch flag is
Analysis Report Known Issues
- On the Performance Counters report, you may encounter an error in which not all passes are displayed. (26301)
- If two different host computers use the same remote target machine, it is possible that the two machines could generate the same report directory. This would be confusing because reports from the two machines would be mixed together. Although unlikely, this can occur when two different machines analyze an application of the same name, because the NVIDIA Nsight analysis tools on the host machine create the directory name based on the name of the application.
Timeline Known Issues
- Skew between CPU events and GPU events will typically not exceed one microsecond.
- Percentages displayed in the row labels and tool tips are based upon the full capture time.
- The mouse forward and back buttons cannot be used to navigate the report page system.
- CTRL+- toggles to the previous document instead of Zooming Out.
- Double-clicking on a row containing a line/area graph that also has children will expand/collapse the row as opposed to increasing the height to 66% of the view.
NVIDIA® Nsight™ Development Platform, Visual Studio Edition User Guide Rev. 5.1.160713 ©2009-2016. NVIDIA Corporation. All Rights Reserved.