NVIDIA® Nsight™ Application Development Environment for Heterogeneous Platforms, Visual Studio Edition 5.2 User Guide
You must install an NVIDIA display driver that supports the NVIDIA Nsight tools. If you have an NVIDIA graphics card installed on your target machine, you likely already have an NVIDIA display driver; however, NVIDIA Nsight requires a minimum driver version in order to function properly. From the NVIDIA web site, download and install one of the following display drivers:
GeForce driver release 376.09 or newer
Quadro driver release 375.86 or newer
__global__ static attributes, the NVIDIA Nsight debugger might not be able to display local variables inside that function. Users can work around this issue by simply removing the static qualifier on the function. (21914)
x = cos() + sin()
glDrawVulkanImageNV).
0. As a workaround, if you click on a different face, the histogram will be recalculated for that face. In a future version, this will be fixed to take into account all faces. (34403)
The rop_busy hardware counter has been removed from the list of available counters, due to a hardware bug that caused the value to be incorrect. If you reinstall NVIDIA Nsight, this may still be a default counter and will show unusually high values. You can either edit your graphs to remove the counter (via Nsight > Windows > Graphics HUD Configuration), or delete your persisted settings. To do this, open Windows Explorer, navigate to %appdata%\NVIDIA Corporation, and delete the entire Nsight directory. (29203)
SwapBuffer calls for a single buffered window. (24590)
#line directive to refer back to the original sources may not work as expected. (22067)
nvtxRangeBegin and nvtxRangeEnd functions, only nvtxRangePush and nvtxRangePop. (21499)
Present(0, 1), the Frame Timings and Frame Profiler will not function properly. We suggest going with one call to Present(0, 0) to work around the problem. We are investigating a solution to handle setup. (39687)
fx_N_M target) is not supported, only pure HLSL shaders. (24891)
D3D11_MAP_FLAG_DO_NOT_WAIT to a Map call on a Direct3D 11 Device Context, it is possible that the operation hasn't finished, and that you will see a return code of 0x887A000A or DXGI_ERROR_WAS_STILL_DRAWING. This can happen when the capture is trying to restore a buffer to the frame start state and it is mapped early in the frame. Simply remove the D3D11_MAP_FLAG_DO_NOT_WAIT flag, and it should function properly. (24846)
This issue can be resolved by always adding an "explicit" return at the end of your shader. (14656)
RefRast) tool, which is the CPU rasterizer provided by Microsoft. The Graphics Debugger will signal an error if the IDXGIFactory::CreateSoftwareAdapter function is used for device creation.
glGen* is advised for performance and correctness. Under normal operation, if an application uses a mix of generated and non-generated names (e.g. via a middleware product), there is the possibility of conflicts/aliases between names. When running an application under Nsight, Nsight will create its own resources that use names generated by the driver, further increasing the likelihood of conflicts. The recommended approach is to always use names that are generated by the driver via glGen*. (30578)
Z, this value may be 0, even though the depth value for a fragment may be written. (22061)
CUDA_AUTO_BOOST to 1 in the Nsight Monitor process, which allows for higher performance at the cost of less repeatability for measurements. (34102)
gpu_affinity. This will be fixed in a future release. (22071)
Analysis Activity Known Issues
- Tracing the following APIs is not supported in managed processes:
- NVTX
- OpenCL
- Direct3D
- OpenGL
- Launching a managed .exe for tracing with any of the aforementioned APIs enabled will result in an "Access Denied" pop-up message, and the analysis session will not start.
- In Trace Process Tree mode, instrumentation for tracing the aforementioned APIs can only propagate to native child processes. If a managed child process is launched, neither it nor any child process it launches (managed or native) can be instrumented by NVIDIA Nsight. The analysis session will continue unaffected, and the user will not be notified of the problem; the report will not contain data from managed processes and their children.
- System and CUDA tracing is fully supported in managed processes, and in Trace Process Tree mode, tracing support propagates to all child processes (native or managed).
- Managed processes are fully supported in the Profile CUDA modes.
- The stop collection timer is implemented in Visual Studio. The latency to communicate to the monitor and application can result in a longer duration than requested.
- CPU Thread Trace
- If the Windows Kernel Event Provider is already in use when a new capture session is launched, the collected data may produce unexpected results. For best results, ensure that no other kernel providers are running during an analysis session.
- CUDA Trace
- CUDA trace does not show implicit memory transfers for graphics interop.
- CUDA Runtime API trace does not capture the <<< >>> kernel launch syntax. Instead, the corresponding CUDA Runtime API calls are reported. Some of the CUDA Driver API calls that are executed by the CUDA Runtime may report errors, such as CUDA_ERROR_INVALID_CONTEXT, even though the usage of the CUDA Runtime API is valid. (6745)
- When collecting trace information about CUDA kernels and memory transfers, sometimes the report file will not contain complete information about the kernels and memory transfers. This happens because retrieving the data interferes with the application and affects performance, so the tool only does it after these events:
- a call to cuCtxSynchronize()/cudaDeviceSynchronize(),
- a call to cuCtxDestroy()/cudaDeviceReset(),
- a call to cuStreamDestroy()/cudaStreamDestroy(),
- the application launches enough kernels or memory transfers to fill up NVIDIA Nsight's buffer, so Nsight forces a context synchronize in order to retrieve the data.
If your capture appears to be missing some or all kernel launch or memory transfer events, either force the data to flush by adding a call to cuCtxSynchronize()/cudaDeviceSynchronize() after all the CUDA work is finished, or (for an application that continuously launches kernels and memcpys) simply capture for more time and try to generate enough data to incur NVIDIA Nsight's flush for a full buffer. (4812)
- CUDA Profiler
- Profile Trigger increments by 1 per warp, not by 1 per active thread.
- The NVIDIA Nsight CUDA Profiler cannot collect all necessary data in a single pass of the kernel, so the profiler replays the kernel as many times as necessary to collect all requisite data. Between replays of the kernel, the accessible memory is restored to the state it was in before the kernel ran, ensuring the kernel will execute the same code paths. However, the L2 cache state is not restored, so all passes after the first will execute with different data cached in L2. For kernels that access small amounts of global or local memory, this may cause the L2 cache to show hit rates better than it would achieve in normal execution. Kernels that access large amounts of memory that cannot fit entirely in L2 cache will show more accurate results.
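To make the per-warp counting of Profile Trigger noted above concrete, the following sketch estimates the counter value for a launch in which every warp executes the trigger exactly once. It assumes a warp size of 32 threads and that no warp exits before reaching the trigger. The helper names `warpsPerBlock` and `expectedTriggerCount` are hypothetical and are not part of any NVIDIA API.

```cpp
#include <cstdint>

// Assumption: 32 threads per warp.
constexpr std::uint64_t kWarpSize = 32;

// Number of warps needed to hold threadsPerBlock threads.
// A partially filled last warp still counts as one warp.
constexpr std::uint64_t warpsPerBlock(std::uint64_t threadsPerBlock) {
    return (threadsPerBlock + kWarpSize - 1) / kWarpSize;
}

// Expected Profile Trigger value if every warp executes the trigger once.
constexpr std::uint64_t expectedTriggerCount(std::uint64_t blocks,
                                             std::uint64_t threadsPerBlock) {
    return blocks * warpsPerBlock(threadsPerBlock);
}

// 100 blocks of 256 threads: 25,600 threads, but only 800 warps.
static_assert(expectedTriggerCount(100, 256) == 800,
              "counter advances once per warp, not once per thread");
// 33 threads occupy two warps, so the counter advances by 2.
static_assert(expectedTriggerCount(1, 33) == 2,
              "a partial warp counts as a full warp");
```

The checks run at compile time, so compiling the file is enough to confirm the arithmetic. A measured value lower than this estimate would suggest that some warps never reached the trigger.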
- OpenCL
- The end timestamp can sometimes be recorded significantly after the completion of a command. If this occurs, adding a clFlush after the specific command will fix the timestamp.
- The start/end range for memory read and write commands includes both host and device time. CUDA start/end range only includes device time.
- Viewing OpenCL Source or Binary code from the OpenCL Programming Builds or OpenCL Program Summary creates a temporary file in %TMP%. The temporary file is not deleted when the file is closed.
- OpenCL reports occasionally do not contain device commands. This can occur if the OpenCL context/queue is not released or if fewer than 512 events occurred during a capture.
- DirectX/OpenGL Trace
- Graphics workload information, such as draw calls and dispatches, is output in groups of 16384 workload events. As a consequence, a report will not contain any graphics workload information if an insufficient number of draw calls occurred during a capture. Increasing the capture duration will help to work around this limitation.
- Some applications, such as Chrome, run in a sandbox environment. The effects of such a sandbox on NVIDIA Nsight are hard to predict, so if you have trouble, read the documentation for the target application and disable any sandbox when possible. For Chrome, the applicable launch flag is -no-sandbox. (16426)
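For the DirectX/OpenGL Trace limitation above, where graphics workload events are output in groups of 16384, the sketch below gives a rough way to judge whether a capture is long enough. It assumes that only complete groups reach the report and that the application issues a roughly constant number of workload events per frame. Both are simplifying assumptions, and the helper names are hypothetical rather than part of NVIDIA Nsight.

```cpp
#include <cstdint>

// Group size stated in the known issue above.
constexpr std::uint64_t kWorkloadGroupSize = 16384;

// Assumption: only complete groups are written to the report;
// a trailing partial group is dropped.
constexpr std::uint64_t reportedWorkloadEvents(std::uint64_t capturedEvents) {
    return (capturedEvents / kWorkloadGroupSize) * kWorkloadGroupSize;
}

// Smallest number of whole frames that produces at least one full group,
// given a constant number of draw calls and dispatches per frame.
// Returns 0 when the application issues no workload events at all.
constexpr std::uint64_t minFramesForOneGroup(std::uint64_t eventsPerFrame) {
    return eventsPerFrame == 0
               ? 0
               : (kWorkloadGroupSize + eventsPerFrame - 1) / eventsPerFrame;
}

// A capture with 10,000 events never fills a group, so nothing is reported.
static_assert(reportedWorkloadEvents(10000) == 0, "no complete group");
// 40,000 events yield two complete groups (2 * 16384 = 32768).
static_assert(reportedWorkloadEvents(40000) == 32768, "two complete groups");
// At 2,000 events per frame, 9 frames are needed (8 * 2000 = 16000 < 16384).
static_assert(minFramesForOneGroup(2000) == 9, "nine frames fill one group");
```

Dividing the frame count by the application's frame rate gives an approximate lower bound on the capture duration to request.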
Analysis Report Known Issues
- On the Performance Counters report, you may encounter an error in which not all passes are displayed. (26301)
- If two different host computers use the same remote target machine, both hosts could generate the same report directory, and their reports would then be mixed together. Although unlikely, this can occur when the two hosts analyze applications of the same name, because the NVIDIA Nsight analysis tools on the host machine create the directory name based on the name of the application.
Timeline Known Issues
- Skew between CPU events and GPU events will typically not exceed one microsecond.
- Percentages displayed in the row labels and tool tips are based upon the full capture time.
- The mouse forward and back buttons cannot be used to navigate the report page system.
- CTRL+- toggles to the previous document instead of Zooming Out.
- Double-clicking on a row containing a line/area graph that also has children will expand/collapse the row as opposed to increasing the height to 66% of the view.
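As a small worked example of the percentage note above, the sketch below computes a row's percentage against the full capture time. It is an illustration of the arithmetic only, under the assumption that zooming the timeline does not change the denominator. The function name `percentOfCapture` is hypothetical.

```cpp
// Share of the full capture taken by an activity, in percent.
// Both durations use the same unit (microseconds here).
constexpr double percentOfCapture(double busyUs, double captureUs) {
    return captureUs > 0.0 ? 100.0 * busyUs / captureUs : 0.0;
}

// 50 ms of activity in a 2 s capture is shown as 2.5%, even when the
// view is zoomed to a 100 ms window that the activity half fills.
static_assert(percentOfCapture(50000.0, 2000000.0) == 2.5,
              "percentage is relative to the full capture");
```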
NVIDIA® Nsight™ Application Development Environment for Heterogeneous Platforms, Visual Studio Edition User Guide Rev. 5.2.161206 ©2009-2016. NVIDIA Corporation. All Rights Reserved.