User Guide
The user guide for NVIDIA Nsight Graphics.
Introduction to NVIDIA Nsight Graphics
Nsight Graphics™ is a standalone application for the debugging, profiling, and analysis of graphics applications. Nsight Graphics supports applications built with Direct3D 11, Direct3D 12, Vulkan, OpenGL, and OpenXR.
This documentation is divided into sections that help you get started with the software, understand its activities, and look up details of the user interface.
- Getting Started: Offers a brief introduction on how to use the tools.
- Activities: Nsight Graphics supports multiple activities so that you can match the tool to the task you are working on at a given time. This section documents each of these activities in detail.
- User Interface Reference: Provides a deep view of all of the user interface elements and views that Nsight Graphics offers.
- Appendix: Contains a selection of topics not covered by any other section.
Getting Started
This section describes an approach to using the Nsight Graphics tools.
Expected Workflow
When debugging or profiling, it is important to narrow your investigation to the path that provides the most impactful and actionable data, so that you can draw conclusions and solve problems. Nsight Graphics provides a number of tools to fit each of these workflow scenarios.
When debugging a rendering problem, Nsight Graphics’s Frame Debugger is the tool of choice. This tool enables the inspection of events, API state, resource values, and dependencies to understand where your application might have issues. For more information on the Frame Debugger, see Frame Debugger.
When profiling a graphical application, the first step is to determine whether you are CPU bound or GPU bound. If you are CPU bound, you cannot issue enough work to the GPU to take advantage of its full processing power. If you are GPU bound, the GPU cannot process the work it is issued fast enough, and your engine may stall. One way to determine which aspect is limiting you is to use Nsight Systems™. Nsight Systems is a system-wide performance analysis tool designed to visualize an application’s algorithms, help you select the largest opportunities to optimize, and tune to scale efficiently across any quantity of CPUs and GPUs in your computer. NVIDIA also provides a system analysis and trace tool within Nsight Visual Studio Edition; for more information on that tool, see the Nsight Visual Studio Edition documentation.
If you have determined that you are CPU bound, you need to use a CPU profiling tool to discover how you can eliminate inefficiencies and issue work to the GPU faster. You may also want to look into the overhead of the API constructs you are using and determine whether lighter-weight constructs can offer the same effect at less cost. The Frame Debugger tool is an excellent resource while you are making these adjustments to your engine.
The GPU Trace activity within Nsight Graphics supports analysis of several GPU-bound scenarios. GPU Trace offers a deep analysis of SM performance by tracing the execution of your shaders on the SMs across a series of frames. Another key technique in optimizing performance is to take advantage of the GPU’s ability to process parallel work by using simultaneous compute and graphics (SCG), also known as async compute. GPU Trace allows you both to see opportunities for async compute and to confirm and measure the impact of async compute on your frame.
How to Launch and Connect to Your Application
To analyze an application, Nsight Graphics requires that the application be launched through its launch facilities. The sections below describe creating a project, launching the application, and connecting to it so that you can perform your analysis.
Upon starting Nsight Graphics, you are presented with the option to create a project. If you are using Nsight Graphics for the first time, skip project creation by selecting Continue under Start Activity.
 
Once selected, you are presented with a target-specific dialog that allows you to configure the application to launch. Browse and select the activity you wish to run and then proceed to the target-specific instructions below to configure the application to analyze.
 
The target-specific sections below describe how to launch and connect on each specific platform. While the process may differ between targets, there are many commonalities between all systems. In particular, once a process is launched, the Nsight Graphics host must attach to that process in order to analyze it. This logical separation of launch and attach facilities allows for complex use cases, including remote targets, launching through command lines, and reattaching to previous sessions. However, the Nsight Graphics host simplifies many common cases by supporting user-controlled automatic connection to processes that were just launched. The sections below cover these use cases and more, in turn.
Process Launch and Connection on Windows Targets
On the Windows platform, Nsight Graphics supports debugging and profiling of x64 native applications as well as x86 (32-bit) launchers (e.g., Steam, Origin) that start x64 (64-bit) applications.
Launching an Application with Automatic Attach
Nsight Graphics supports automatic attach to processes of interest. It accomplishes this by identifying the processes in a process hierarchy that perform graphics work, signaling that these are of interest.
To launch your application, perform the following steps:
- Set the application executable to the path of your application. This path may be any executable or batch file. 
- If your application requires a working directory that is different from your application’s directory, adjust it now. 
- Adjust the environment (if necessary). 
- Leave Automatically Connect as Yes. 
- Click Launch. 
Once launched, you are presented with a dialog that notes the launching and attaching of your application. After the launch completes, you are ready to begin your analysis.
Connecting to an Application with Manual Attach
There may be some cases where a manual attach to an application is desired. These situations include:
- You use the command line launcher to launch applications (see Process Launch from a Command Line).
- Automatic attach connects to an application other than the one you want.
- You want to reattach to an application that was previously detached from the analysis session.
If the application is already launched, perform the following steps:
- Click the Connect button. 
- Select the Attach tab. 
- Select the application you wish to analyze in the attach tab and click Attach. 
After the attach completes, you are ready to begin your analysis.
 
In the example image above, VRFunhouse.exe is a child process of the UE4Game.exe launcher. Selecting VRFunhouse.exe and clicking Attach would allow you to analyze the primary application.
Remote Launching
Nsight Graphics on Windows supports remote debugging through the Nsight Remote Monitor. This is a process that runs on a target machine to allow connections to be started on that machine.
To run the remote monitor, install Nsight Graphics on the target machine. Then, launch the remote monitor on that machine by selecting Start > NVIDIA Corporation > Nsight Remote Monitor.
Once the monitor is launched on the remote machine, you need to add the remote monitor as a connection in Nsight Graphics. By default, launches are done on the localhost machine. To add another machine, click the + button.
 
This brings up a dialog in which you can add a machine name or IP address.
 
Enter the machine name in IP/Host Name. Click Add to add the connection. The machine you just added is listed as the target connection at this time.
 
Any number of connections may be added; to remove a connection, select it and click -. You can switch between any of the added connections before launching or attaching. Connections are globally persisted and may be applied to any project once they are added.
Process Launch and Connection on Linux Targets
Remote debugging on Linux is supported through SSH connections. Enter your SSH information when establishing the connection to connect to the target machine.
Process Launch from a Command Line
Nsight Graphics offers a command line interface (CLI) to facilitate launching applications whose environment setup is too complex to transfer easily to the host application. It also currently provides a non-interactive way to generate captures (see Generate C++ Capture from a Command Line and Generate GPU Trace Capture from a Command Line).
This executable is located in the host application folder:
Windows
<install directory>/host/windows-desktop-nomad-x64/ngfx.exe
Linux
<install directory>/host/linux-desktop-nomad-x64/ngfx.bin
Note
The original command line launcher (nv-nsight-launcher) has been replaced by this CLI (ngfx), because this CLI extends the original launcher’s capabilities.
CLI Arguments Details
To understand how to launch, start by running the CLI with the --help argument. This displays the general options that the CLI offers.
Several of these arguments are optional, while some are required. The full argument list is the following:
| Option | Description |
|---|---|
| --help | Display all available general arguments. |
| | Display all available arguments, including all activity specific arguments. |
| --activity | Select the target activity to use. This argument is always required. Note: “activity name” must be the exact name in “Activity” section of connection dialog. The available activity names can also be found in the help message. |
| --platform | Select the target platform to use. This argument is recommended to be set, otherwise the local platform is used as a default. Note: “platform name” must be the exact name in “Target Platform” section of connection dialog. The available platform names can also be found in the help message. |
| --hostname | Select the device on which to launch the application. Default value is “localhost.” This argument is required if trying to launch the application in a machine with |
| --exe | Set the executable path on target device. Note: This argument is usually required, but can be implicitly deduced from project settings if a project has been loaded. |
| | Set the working directory of the application to be launched. |
| | Set the additional environments of the application to be launched. These arguments should be in the form of “FOO=1; BAR=0;”. |
| | Set the arguments passed to the application to be launched. |
| --project | Select an Nsight Graphics project to load. If a project has been successfully loaded, some arguments (e.g., If there’s a dedicated project for a certain application, and there are changed and saved options for adjusting the activity, it is preferred to run the CLI with this argument. |
| | Set the output folder to export/write data to. If not specified, the default document folder on the Nsight Graphics GUI is used. |
| | Enable verbose mode to display more messages. |
| | By default, operations (e.g., launch) are bound to proper timeouts; disable timeouts if some applications can take a long time to perform operations. Note: This argument is not used for simply launching the target application. |
| --launch-detached | Run the CLI as a command line launcher; the CLI exits after launching the target application. You may attach to the application with the Nsight Host after it has been launched. |
There are also activity-specific options beyond the general options above. For examples of launching with a specific activity, and for a reference on these activity-specific options, see the following sections:
- Generate C++ Capture activity options (see Generate C++ Capture from a Command Line) 
- GPU Trace activity options (see Generate GPU Trace Capture from a Command Line) 
To launch an application without automatically performing a capture, use a command of the following form:
ngfx.exe --launch-detached [general_options]
Examples:
- ngfx.exe --activity="Frame Debugger" --platform="Windows" --project="D:\Projects\Bloom.ngfx-proj" --launch-detached - Launch an application on the local host, using the launch options and activity options read from an Nsight Graphics project.
- ngfx.exe --activity="Frame Debugger" --platform="Windows" --hostname=192.168.1.10 --exe=D:\Bloom\bloom.exe --launch-detached - Launch an application on a remote machine. 
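The launch examples above can also be assembled from a script, for instance in a build or test harness. The following is a minimal Python sketch, not part of Nsight Graphics. The helper names `format_env` and `build_launch_command` are hypothetical, and only flags that appear in the examples above are used. `format_env` renders variables in the “FOO=1; BAR=0;” form described in the options table; the option that accepts that string is deliberately not named here, so check the `--help` output for it.

```python
def format_env(variables):
    """Render environment variables in the "FOO=1; BAR=0;" form."""
    if not variables:
        return ""
    pairs = (f"{name}={value}" for name, value in variables.items())
    return "; ".join(pairs) + ";"


def build_launch_command(ngfx_path, activity, platform, exe=None,
                         project=None, hostname=None):
    """Build an argument list for a detached launch through the ngfx CLI.

    Hypothetical helper: it only combines flags shown in the examples above.
    """
    # The executable is usually required, unless a loaded project supplies it.
    if exe is None and project is None:
        raise ValueError("pass exe, project, or both")
    command = [ngfx_path, f"--activity={activity}", f"--platform={platform}"]
    if hostname is not None:
        command.append(f"--hostname={hostname}")
    if project is not None:
        command.append(f"--project={project}")
    if exe is not None:
        command.append(f"--exe={exe}")
    command.append("--launch-detached")
    return command
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting problems with values that contain spaces, such as the activity name "Frame Debugger".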
Using Nsight Graphics With WebGL
Nsight Graphics can debug and profile WebGL applications running in a browser. However, some setup steps are needed to ensure compatibility and to provide as much API contextual information as possible. This is important because many browsers, especially on Microsoft Windows, use graphics APIs other than OpenGL for backend rendering. Because of this, when you attempt to debug your application, you may see DirectX calls that do not map easily to the WebGL calls you originally made. Using the correct settings forces an OpenGL backend, which looks more similar to what you expect and aids debuggability.
To get started, make sure you close out all browser windows. This is necessary due to the method that Nsight Graphics uses for injecting code into your application. Specifically, Chrome tends to launch a child process from one of the other, already running processes. If that initial process is not injected, then the child process is not either, even if you launched it from the Nsight Graphics Connection dialog. So, close out any browser windows and use Task Manager to ensure that no Google Chrome processes (or other browser processes) remain.
Next, you should set the browser to use the OpenGL backend. In Chrome, you need to:
- Type chrome://flags/ in the address bar to bring up the settings. 
- Search the settings for OpenGL. 
- For the ANGLE graphics backend, choose OpenGL. 
Finally, browsers typically need additional settings to ensure tool compatibility. For Google Chrome, you need to specify the following additional command line options when launching the browser:
| Option | Description | 
|---|---|
| --no-sandbox | Disable some of the Chrome security checks to allow process injection to work. |
| --disable-gpu-watchdog | Disable the Chrome GPU activity check. This allows for the application to be paused live and not have Chrome exit. |
| --disable-features=RendererCodeIntegrity | Disables the renderer code integrity feature that interferes with graphics overrides from Nsight Graphics. |
| --gpu-startup-dialog | Optional: This flag causes a dialog to display on launch of the graphics process, to help find the process you want to debug. Note that Nsight Graphics can typically find the process without resorting to manual intervention. |
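As an illustration, the options in the table can be collected into a single argument list. This is a minimal Python sketch with a hypothetical helper name, `chrome_arguments`; the switches themselves come from the table and are written with two ASCII hyphens, as Chrome expects. The URL in the usage note is a placeholder.

```python
# Switches from the table above that are needed for tool compatibility.
REQUIRED_SWITCHES = [
    "--no-sandbox",
    "--disable-gpu-watchdog",
    "--disable-features=RendererCodeIntegrity",
]

# Optional switch that shows a dialog when the graphics process starts.
STARTUP_DIALOG_SWITCH = "--gpu-startup-dialog"


def chrome_arguments(url=None, startup_dialog=False):
    """Return the Chrome arguments for an analysis session as a list."""
    arguments = list(REQUIRED_SWITCHES)
    if startup_dialog:
        arguments.append(STARTUP_DIALOG_SWITCH)
    if url is not None:
        arguments.append(url)
    return arguments
```

For example, `" ".join(chrome_arguments("https://example.com/webgl-demo"))` produces a single string that could be supplied as the command line arguments of the browser launch.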
Automatic Cleanup of Launcher Processes
Some processes can interfere with the launch of an analysis session of your application. These processes are typically long-lived application launchers; they often perform a coordinated launch between a child process and the parent launcher. In some cases, this coordinated launch can interfere with the process by which Nsight Graphics injects itself with the appropriate analysis settings.
To mitigate this problem, Nsight Graphics attempts to detect processes that are known to interfere and to offer you an opportunity (dialog shown below) to terminate the processes before launch, thereby allowing a launch without interference.
The buttons on the dialog perform the following actions:
- Yes — terminate the processes and continue launching. 
- No — do not terminate the processes, but continue to launch the application. 
- Abort — cancel the launch entirely. 
To edit the list of processes that are detected, add an entry to the list in Tools > Options > Injection.
After a Process is Connected
After a process is connected, it is ready to be analyzed. For many activities, a default set of windows opens that offers tools relevant to the activity. You can also add windows by selecting a view from the menu bar. See the User Interface Reference for a detailed discussion of each view and tool window.
For the Frame Debugger activity that was started above, there are both live analysis and capture utilities. When you capture from this activity, either through the Target application capture hotkey or “Capture for Live Analysis,” a number of views open. On the target application, the HUD appears with the toolbar and scrubber. This UI allows you to view extensive information on the state, resources, and synchronization of your application.
 
 
With such an expansive set of information available, debugging a rendering problem is made easier.
Choose Window To Debug
If your application has multiple windows, each with a graphics context, you can choose the window on which to focus your debugging and profiling efforts. For instance, if your application has three windows, each with its own view of the scene, you can select the window of interest to capture. When multiple windows are detected, a “Select Window” control appears on the top toolbar. The combo box to its right is populated with the detected windows, including each window’s name and size to help you determine the correct window.
 
Any time that you are not currently in an active Capture/Replay, you can use the control to specify the window to capture. If you select “Default,” the most recent window to present the back buffer will be selected as the window to capture. This option is available with the Frame Debugger and Generate C++ Capture activities.
Target application capture hotkey
Activities support triggering a capture from the Nsight Graphics UI or directly from the target application. The default capture hotkey is F11. It can also be configured in Tools > Options.
 
Configuring Your System for Optimal Analysis
In order for your system to work well with the analysis tools provided by Nsight Graphics, there are a number of details you should consider.
Configure Developer Mode
If using a Windows machine, we recommend running under Developer Mode.
Reasons to use Developer Mode include:
- Preview versions of the D3D12 Agility SDK require Developer Mode. 
- Some applications can only replay with the D3D12 preview Agility SDK. 
Please follow the instructions at https://learn.microsoft.com/en-us/windows/apps/get-started/developer-mode-features-and-debugging for setting up Developer Mode.
Configuring Your Application for Optimal Analysis
In order for your application to work well with the analysis tools provided by Nsight Graphics, there are a number of details you should consider when configuring your application.
Using Performance Markers
Performance markers are integral to nearly all workflows. We recommend that your application always run with perf markers when running under tools analysis.
Performance markers are most commonly used to delineate sections of events and note where in your application they begin and end. They can also be nested to show sub-sections of events. Perf markers are generally used to measure the amount of time that a portion of an algorithm takes.
Nsight Graphics supports several types of perf markers:
- D3D9 perf markers are supported for all D3D applications. 
- ID3DUserDefinedAnnotation may be used for D3D11 or D3D12 applications. See ID3DUserDefinedAnnotation interface on MSDN.
- Perf markers made available by Microsoft’s PIXBeginEvent/PIXEndEvent APIs are supported for D3D12. See https://devblogs.microsoft.com/pix/winpixeventruntime/.
- Vulkan applications may use either VK_EXT_debug_utils or VK_EXT_debug_marker.
- OpenGL applications use the KHR_debug extension’s glPushDebugGroup and glPopDebugGroup.
Shader Compilation
Nsight Graphics works best when you have the full shader source available for debugging. Follow the steps below to set up your application for optimal configuration.
D3D Configuration
Nsight Graphics works best with access to the original HLSL source code of your shaders. There are a few ways to accomplish this. The first is to precompile the shaders into binary format using one of the legacy D3DCompile functions or the latest IDxcCompiler interfaces, and to save the results to a file.
Alternatively, you can use the offline compiler, fxc.exe or dxc.exe, provided by the DirectX SDK.
For each of these methods, you need to specify some flags in order for the HLSL debug information to be embedded in the binary output, outlined below:
| Compile type | Required actions | 
|---|---|
| Shaders compiled offline via dxc.exe or fxc.exe | Add  For dynamic shader editing, you can retrieve a minimal set of debug info using the  | 
| Shaders compiled online via IDxcCompiler3::Compile | Add  For dynamic shader editing, you can retrieve a minimal set of debug info using the  Note that earlier versions of the Compile function, including  | 
| Shaders compiled online via D3DCompile, D3DCompile2, D3DCompileFromFile | Add the D3DCOMPILE_DEBUG flag to the Flags1 parameter. | 
Nsight Graphics also supports reading debug info from files that have been generated using the dxc.exe -Fd option. To load these external files, you need to set the appropriate path(s) in the Compiled Shader Symbol Paths section of the Search Paths.
Vulkan Configuration
Nsight Graphics works best with access to the original high-level source code of your shaders. To accomplish this, shaders need to be compiled with debug information in order for the original high-level source code to be embedded in the SPIR-V binary modules.
When using the glslangValidator tool, add the -g flag to the shader compilation command line. For example:
glslangValidator -V shader.vert -o shader.spv -g
When using the dxc tool, add the -Zi flag to the shader compilation command line. For example:
dxc -spirv -T ps_6_5 -E PSMain shader.frag -Fo shader.spv -Zi
Function Debug Information
To enable function debug information in SPIR-V, which Flame Graph, Top-Down Calls, and Bottom-Up Calls depend on, pass -gVS (instead of -g) to glslangValidator or -fspv-debug=vulkan-with-source to dxc. These options enable the SPIR-V NonSemantic Shader DebugInfo extension.
Note
A known issue in glslangValidator causes all DebugFunction instructions to reference the same file, which leads to incorrect correlation between function names and source line numbers. This bug is fixed in glslangValidator versions after 15.0.0.
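The flag choices described in this section can be captured in a small build-script helper. The following is a minimal Python sketch with hypothetical function names (`glslang_command`, `dxc_spirv_command`); the flags are the ones named above. Keeping -Zi alongside -fspv-debug=vulkan-with-source for dxc is an assumption of this sketch, since the text does not say whether -Zi is still needed in that case.

```python
def glslang_command(source, output, function_debug_info=False):
    """Build a glslangValidator command line with debug information."""
    # -gVS replaces -g when function debug information is wanted.
    debug_flag = "-gVS" if function_debug_info else "-g"
    return ["glslangValidator", "-V", source, "-o", output, debug_flag]


def dxc_spirv_command(source, output, profile, entry_point,
                      function_debug_info=False):
    """Build a dxc command line that emits SPIR-V with debug information."""
    command = ["dxc", "-spirv", "-T", profile, "-E", entry_point,
               source, "-Fo", output, "-Zi"]
    if function_debug_info:
        # Assumption: -Zi is kept and the extra option is appended.
        command.append("-fspv-debug=vulkan-with-source")
    return command
```

With the default arguments, both helpers reproduce the example command lines shown earlier in this section; passing `function_debug_info=True` switches to the options needed for function debug information.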
Naming Objects and Threads
Many of Nsight Graphics’ views and analyses benefit from named API objects and threads. Similar to perf markers, these names can offer increased context for your analysis. The tables below list the supported methods for naming objects and threads.
Naming Objects
- D3D11
- No programmatic method; use Nsight-generated names 
- OpenGL
| Platform | Method | 
|---|---|
| Windows | |
| Linux | Not yet supported | 
How to Set Up and Inspect GPU Crash Dumps
This section describes how to use the NVIDIA Nsight Aftermath Monitor to generate GPU crash dumps for applications using the Direct3D 12 or Vulkan API, and how to open and inspect those GPU crash dumps with the crash dump inspector plug-in in Nsight Graphics.
Alternatively, developers can also add GPU crash dump collection support into their graphics application using the NVIDIA Nsight Aftermath SDK.
Workflow
The general workflow for working with Nsight Aftermath GPU crash dumps is to:
- Run the NVIDIA Nsight Aftermath GPU Crash Dump Monitor. 
- Configure the desired GPU crash dump features. 
- Optional: to collect additional information via event markers, instrument the graphics application using the Nsight Aftermath SDK.
- Run the graphics application for which to capture GPU crash dumps and reproduce the GPU crash or hang, allowing the monitor to collect the GPU crash dump. 
- Open the GPU crash dump in Nsight Graphics. 
- Configure GPU Crash Dump Inspector settings. 
- Inspect the crash dump data using the Nsight Graphics crash dump inspector. 
See the sections below for details on each step of this process.
The GPU Crash Dump Monitor
The NVIDIA Nsight Aftermath Crash Dump Monitor provides the means to capture GPU crash dump files for GPU crashes or GPU hangs, and to modify the driver configuration settings related to crash dump generation.
Running the GPU Crash Dump Monitor
The NVIDIA Nsight Aftermath Monitor nv-aftermath-monitor.exe is installed to the Nsight Graphics host directory. Typically this is:
<install directory>\host\windows-desktop-nomad-x64
By default, the crash dump monitor application starts in the background. Its user interface is accessible through the NVIDIA Nsight Aftermath Monitor icon in the Microsoft Windows system notification area (system tray).
Configuring the GPU Crash Dump Monitor
All configuration options related to GPU crash dump creation are available through the GPU Crash Dump Monitor Settings dialog.
- Set up the directory where crash dump files are stored. 
- Set up the directory where shader debug information files are stored. 
- Enable Aftermath GPU Crash Dump collection. Either set Aftermath mode to Global to enable crash dumps for all applications using the D3D12 or Vulkan API, or selectively enable it for one or more applications by managing an application Whitelist. 
- Enable the desired Aftermath graphics driver features:
  - Generate Shader Debug Information to generate shader debug information (line tables for mapping from the shader IL passed to the NVIDIA graphics driver to the shader microcode executed by the GPU) for all shaders loaded by the application.
    The GPU Crash Dump Monitor stores the debug information in files with the .nvdbg extension in the Debug Info Dump Directory configured in the General Settings Tab of the GPU Crash Dump Monitor Settings.
    The shader debug information is required for mapping shader microcode instructions of active or faulted shader warps to shader IL or shader source lines. Shader debug information is identified by a unique shader debug information identifier embedded into the crash dump file.
    See also the section about Source Shader Debug Information for details on how to compile shader source with source-level debug information.
  - Enable Resource Tracking to enable additional driver-side tracking of live and recently destroyed resources.
    This allows Nsight Aftermath to identify resources related to GPU virtual addresses seen in the case of a crash due to a GPU page fault. The resource information being tracked includes details about the size of the resource, its format, and the current deletion status of the resource object. D3D12 developers may also consider instrumenting the application using the GFSDK_Aftermath_DX12_RegisterResource function to register the D3D12 resources the application creates. That allows Nsight Aftermath to track additional information, such as the resource debug names set by the application. For Vulkan applications, the resource debug names set via vkSetDebugUtilsObjectNameEXT are captured too. For more details on how to instrument an application with D3D12 resource tracking, see the Nsight Aftermath SDK documentation.
  - Enable Call Stack Capturing to enable automatic generation of Aftermath event markers for tracking the origin of all draw calls, compute and ray tracing dispatches, ray tracing acceleration structure build operations, or resource copies initiated by the application.
    The automatic event markers are added into the command stream right after the corresponding commands, with the CPU call stacks of the functions recording the commands as the data payloads.
    Note: Enabling this feature causes considerable driver overhead for gathering the necessary information.
    Note: When this feature is enabled, the GPU crash dump file may contain the file path for the crashing application’s executable as well as the file paths for all DLLs or DSOs it has loaded.
  - Enable Shader Error Reporting to enable a special mode that allows the GPU to report additional runtime shader errors. This may provide additional information when debugging hangs, crashes, or unexpected behavior related to shader execution.
    Enabling this feature may result in additional crash dumps reporting issues in shaders that exhibit undefined behavior or have hidden bugs, which so far went unnoticed because by default the hardware silently ignores them. The additional error checks that are enabled when using this option cause GPU exceptions in the following situations:
    - Accessing memory using misaligned addresses, such as reading or writing a byte address that is not a multiple of the access size.
    - Accessing memory out-of-bounds, such as reading or writing beyond the declared bounds of (group) shared or thread local memory or reading from an out-of-bounds constant buffer address.
    - Accessing a texture with incompatible format or memory layout.
    - Hitting call stack limits.
    Note: This feature is only supported with NVIDIA graphics driver R515 or later.
  Note: On Windows, modifying the Nsight Aftermath graphics driver settings requires Windows Administrator privileges. Therefore, when any of these settings are modified and applied, a User Account Control confirmation window may pop up asking for permission to modify system settings.
- Enable the desired Nsight Aftermath system-wide features:
  - Enable SM Register Data Collection to collect SM register values when faults happen inside SMs. This can provide additional information when debugging GPU crashes related to shader execution.
    Since this is a system-wide setting, modifying it might also affect other tools such as Nsight VSE CUDA debugger and may result in unexpected behavior. On Linux, this feature is always enabled and causes no incompatibility with other tools.
    Note: This feature is only supported for the D3D12 and Vulkan APIs with NVIDIA graphics driver R535 or later and requires Nsight Graphics Pro to visualize the data. Starting with the R550 driver series, the SM register data collection feature is enabled by default and the setting is no longer available.
  Note: On Windows, modifying the Nsight Aftermath system settings requires Windows Administrator privileges. Therefore, when any of these settings are modified and applied, a User Account Control confirmation window may pop up asking for permission to modify system settings.
The GPU Crash Dump Inspector
The NVIDIA Nsight Aftermath Crash Dump Inspector provides the means to open, inspect, and analyze GPU crash dump files created by the NVIDIA Nsight Aftermath Monitor or the Nsight Aftermath SDK.
Loading GPU Crash Dump Files
GPU crash dump files use the .nv-gpudmp file extension and can be loaded through File > Open File... This will bring up a GPU Crash Dump Inspector window displaying the crash dump file’s content.
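When several crashes have been captured, it can help to list the dump files before opening one. The following is a minimal Python sketch, not part of Nsight Graphics: the function name `find_crash_dumps` is hypothetical, the directory argument stands for whatever crash dump directory was configured in the monitor, and searching subdirectories recursively is a choice made for this sketch.

```python
from pathlib import Path


def find_crash_dumps(dump_directory):
    """Return the .nv-gpudmp files under dump_directory, newest first."""
    dumps = Path(dump_directory).rglob("*.nv-gpudmp")
    # Sort by last modification time so the most recent crash comes first.
    return sorted(dumps, key=lambda path: path.stat().st_mtime, reverse=True)
```

The first entry of the returned list is the most recently written dump, which is usually the one to open through File > Open File.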
Configuring the GPU Crash Dump Inspector
In order to use all functionality provided by the GPU crash dump inspector, the following configuration settings should be made in the Search Paths Settings.
- Add the directories where binary shader files (DXIL or SPIR-V shader files) are stored to Shader Binaries. If the binary shaders cannot be found, the Shader View is not able to display intermediate shader assembly code or shader source code. - For more information on how to generate these files, see Source Shader Debug Information. 
- Add the directories where the separate shader debug information files ( - .lldor- .pdbfiles generated by- dxc.exefor instance) are stored to Separate Shader Debug Information. If the shader debug information cannot be found, the Shader View is not able to map GPU PC addresses of active or faulted warps to intermediate shader assembly or shader source code locations.- For more information on how to generate these files, see Source Shader Debug Information. 
- Add the directories where the NVIDIA shader debug information files generated by the GPU crash dump monitor are stored to NVIDIA Shader Debug Information. If the NVIDIA shader debug information cannot be found, the Shader View is not able to map GPU PC addresses of active or faulted warps to intermediate shader assembly or shader source code locations. 
- Optionally, add the directories where shader source files are stored to Shader Source. Usually, this is not required as the shader debug information already includes the shader source. Only if a shader was compiled from source that contains references to other source files, for example via #line directives, may it be necessary to specify additional source directories so that the Shader View can find the correct shader source.
- Add the directories where to find the symbol files for the graphics application for which the GPU crash dump has been captured to Application Debug Information. This allows the Aftermath Marker Call Stack View to resolve addresses to functions and source locations. 
Inspecting GPU Crash Dump Files
Use the GPU Crash Dump Inspector to analyze crash reasons. This is not an exhaustive tutorial on how to analyze GPU crash dumps, because every crash or hang is different, but it should provide some hints to get started.
After loading a crash dump file, it is usually a good start to check the Exception Summary on the Dump Info tab. This shows a high-level fault reason, e.g., whether the graphics device was hung or an error like a page fault has occurred. If there was a page fault or shader fault, this section contains an analysis that mentions potential causes and provides links to relevant information in the Crash Info tab and Shader View.
In case of a hang, it makes sense to check if there is an Active Warps section on the Dump Info tab showing shader activity. This could point toward an issue with very long-running shader warps or shader warps being stuck in an infinite loop. In that case, the Shader View may help to root cause the problem.
If the device state indicates there was a memory fault, the next step would be to look for a Page Fault section on the Dump Info tab. This may help to pinpoint problems with out-of-bounds resource access or accessing an already deleted resource.
If the application was instrumented with Aftermath Event markers, an Aftermath Markers section should be available on the Dump Info tab. This may help to pinpoint the draw or dispatch call that caused problems.
If Call Stack Capturing was enabled when capturing the GPU crash dump, Call Stack links should be available in the Aftermath Markers section, pointing to the draw, dispatch, or copy call that may be related to the problem.
Last, the GPU State section on the Dump Info tab may provide some hints about which parts of the graphics pipeline were active or have faulted when the crash occurred.
Instrumenting Applications with the Aftermath API
The NVIDIA Nsight Aftermath SDK provides the Aftermath API that can be used by developers to instrument their applications. The latest version can be downloaded from https://developer.nvidia.com/nsight-aftermath.
By default, the latest version of the SDK package available at the time of an Nsight Graphics release is installed together with Nsight Graphics in:
<install directory>\SDKs\NsightAftermathSDK
Detailed information about the functionality provided by the library and how to use it in an application can be found in the Readme.md that comes with the SDK package and the header files.
Nsight Aftermath Event Markers
In D3D applications, the Aftermath event marker API (GFSDK_Aftermath_SetEventMarker) can be used to inject event markers with user-defined data directly into the graphics command stream. If the application is instrumented with event markers, information about the last event markers that were processed by the GPU for each command stream will be captured into the GPU crash dump, including the user provided event data. More information about Aftermath event markers and how to instrument an application to use them can be found in the Nsight Aftermath SDK documentation.
Note
Using event markers should be considered carefully. Injecting markers in high-frequency code paths can introduce high CPU overhead. Therefore, on some driver versions, the event marker feature is only available if the Nsight Aftermath GPU Crash Dump Monitor is running on the system. This requirement applies to R495 to R530 drivers for DX12 and R495+ drivers for DX11. No Aftermath configuration needs to be made in the Monitor. It serves only as a dongle to ensure Aftermath event markers do not impact application performance on end user systems.
Similar functionality is available for Vulkan applications with the VK_NV_device_diagnostic_checkpoints extension.
Source Shader Debug Information
For mapping shader instruction addresses for active or faulted shader warps to high-level shader source, shaders need to be compiled with debug information. Since shader compilation is a two-step process — compilation from shader source, such as HLSL or GLSL, to an intermediate shader language representation, such as DXIL or SPIR-V, and graphics driver-level compilation of the shader IL to the actual microcode executed by the GPU — there are two levels of debug information required to accomplish such a mapping. This section describes how to compile shader source code with debug information suitable for consumption by the Aftermath GPU Crash Dump Inspector using the Microsoft DirectX Shader Compiler or the Vulkan SDK toolchain for shader compilation.
The generation of shader debug information for the microcode level needs to be enabled either through the Nsight Aftermath GPU Crash Dump Monitor settings or Aftermath feature flags when using the Nsight Aftermath SDK. For more information, see the Nsight Aftermath SDK documentation.
To enable shader instruction mapping when analyzing crash dumps in Nsight Graphics, the debug information must be made available by setting the Search Path Settings as described in Configuring the GPU Crash Dump Inspector.
For D3D12, the following variants of compiling shaders with debug information using the Microsoft DirectX Shader Compiler (dxc.exe) are supported by Nsight Aftermath:
- Compile and use full shader blobs: Compile the shaders with debug information. Use the full (i.e., not stripped) shader binaries when running the application and make them accessible to Nsight Graphics when inspecting GPU crash dumps by adding the disk location where the compilation results are stored to the Shader Binaries search paths. - An example command line may look like this: - dxc -Zi [..] -Fo shader.bin shader.hlsl 
- Compile and strip: Compile the shaders with debug information, then strip off the debug information. Use the stripped shader binaries when running the application and make both stripped and not stripped files accessible to Nsight Graphics when inspecting GPU crash dumps. Add the disk location of the stripped files to the Shader Binaries search path and add the disk location of the not stripped files to the Separate Shader Debug Information search paths. - An example pair of command lines may look like this:
  dxc -Zi [..] -Fo full_shader.bin shader.hlsl
  dxc -dumpbin -Qstrip_debug -Fo shader.bin full_shader.bin
- Compile with separate debug information: Compile the shaders with debug information and instruct the compiler to store the debug meta data in separate shader debug information files (shader PDB files). Make both the shader binaries and the shader debug information files accessible to Nsight Graphics when inspecting GPU crash dumps. Add the disk location of the shader binaries to the Shader Binaries search path and add the disk location of the shader debug information files to the Separate Shader Debug Information search paths. - An example command line may look like this: - dxc -Zi [..] -Fo shader.bin -Fd debugInfo\ shader.hlsl 
If the application compiles shaders on-the-fly, it needs to store the shader binary blobs to disk in a similar fashion so that they are accessible to Nsight Graphics when inspecting GPU crash dumps.
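A build script can generate the three D3D12 variants above from one place. The following is a minimal sketch that only assembles the argument lists shown in the examples; the function name `dxc_commands` and the `variant` labels are hypothetical, and `extra_args` stands in for the `[..]` placeholder (target profile, entry point, and so on).

```python
def dxc_commands(source, variant, extra_args=(), pdb_dir="debugInfo\\"):
    """Return the dxc invocations (as argument lists) for one variant.

    variant is "full", "strip", or "separate", matching the three
    compilation variants described above.
    """
    extra = list(extra_args)
    if variant == "full":
        # Full blob: debug information stays inside shader.bin.
        return [["dxc", "-Zi", *extra, "-Fo", "shader.bin", source]]
    if variant == "strip":
        # Keep full_shader.bin for the inspector, ship stripped shader.bin.
        return [
            ["dxc", "-Zi", *extra, "-Fo", "full_shader.bin", source],
            ["dxc", "-dumpbin", "-Qstrip_debug",
             "-Fo", "shader.bin", "full_shader.bin"],
        ]
    if variant == "separate":
        # Debug information goes to a separate PDB file under pdb_dir.
        return [["dxc", "-Zi", *extra, "-Fo", "shader.bin",
                 "-Fd", pdb_dir, source]]
    raise ValueError(f"unknown variant: {variant!r}")
```

For example, `dxc_commands("shader.hlsl", "strip", ["-T", "ps_6_0", "-E", "main"])` returns two argument lists that can be passed to `subprocess.run` in order.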
Note
No IL-level or source-level shader mapping is supported for DX bytecode shaders generated by the legacy Microsoft DirectX fxc.exe shader compiler.
For Vulkan, the following variants of generating SPIR-V shader code with debug information are supported by Aftermath:
- Compilation using the glslangValidator tool of the Vulkan SDK’s shader compilation toolchain. An example command line may look like this: - glslangValidator -V -g -o shader.spv shader.vert 
- Compilation using the Microsoft DirectX Shader Compiler. An example command line may look like this: - dxc -spirv -Zi [..] -Fo shader.spv shader.hlsl 
Use the full (i.e., not stripped) SPIR-V shader binaries when running the application and make them accessible to Nsight Graphics when inspecting GPU crash dumps by adding the disk location where they are stored to the Shader Binaries search paths.
Note
No source-level shader mapping is supported for pairs of stripped and not stripped SPIR-V files. Users interested in shader source mapping for applications shipping with stripped SPIR-V shaders may use the GPU crash dump decoding functionality provided by the Nsight Aftermath SDK and implement their own crash dump decoding tool.
Nsight Aftermath Shader Hashes
There is no naming convention for shader files and developers can freely decide what file name and extension they use to store their DirectX shader binaries, separately stored “pdb” files, or SPIR-V shader files. Furthermore, the graphics drivers have no knowledge about those files. Therefore, Nsight Aftermath uses shader code hashes to identify the shader binaries loaded by the application.
When searching for the necessary information for showing DXIL/SPIR-V instructions or for source mapping information, the shader binaries found in the configured Shader Binaries search paths are compared against those hashes.
For developers who want to calculate the hashes for their files, the Nsight Aftermath SDK provides two APIs:
- For D3D12/DXIL shaders, use the GFSDK_Aftermath_GetShaderHash function.
- For Vulkan/SPIR-V shaders, use the GFSDK_Aftermath_GetShaderHashSpirv function.
Both functions and additional information can be found in the GFSDK_Aftermath_GpuCrashDumpDecoding.h header file or the Nsight Aftermath SDK documentation.
Nsight Aftermath Shader Debug Information Identifiers
Nsight Aftermath uses unique identifiers to identify the low-level shader debug information that the NVIDIA D3D12 or Vulkan driver generates for mapping shader microcode instructions to DXIL or SPIR-V instructions.
When searching for the necessary information for mapping microcode instructions to DXIL or SPIR-V instructions, the debug information files found in the configured NVIDIA Shader Debug Information search paths are compared against the shader debug information identifier.
The Nsight Aftermath GPU Crash Dump Monitor uses the shader debug information identifier to generate a unique base name and the .nvdbg extension for the debug information files it creates, e.g., A9B36BBAFFD79B51-000001BB689D5060-*.nvdbg. Developers using the Nsight Aftermath SDK can freely choose the naming convention for the files being created for the NVIDIA debug information retrieved by the application via the GFSDK_Aftermath_ShaderDebugInfoCb callback. However, you are encouraged to also include the shader debug information identifier in the file name convention you use. This may help to understand why debug information may not be found with the current search path settings.
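If debug information is not being found, it can be useful to see which identifiers are actually present in your search paths. The following is a minimal sketch, assuming the identifier is carried by the first two hyphen-separated fields of the file name, as in the Monitor example name above; that layout is an assumption drawn from the example, and the function name `index_nvdbg_files` is hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def index_nvdbg_files(search_paths):
    """Map identifier-like file name prefixes to matching .nvdbg files."""
    index = defaultdict(list)
    for root in search_paths:
        for path in sorted(Path(root).glob("*.nvdbg")):
            fields = path.stem.split("-")
            # Assumed layout: <field1>-<field2>-<rest>.nvdbg
            if len(fields) >= 2:
                key = "-".join(fields[:2]).upper()
                index[key].append(path)
    return dict(index)
```

Files written by your own GFSDK_Aftermath_ShaderDebugInfoCb callback only show up in this index if your naming convention follows the same pattern.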
The Nsight Aftermath SDK provides the GFSDK_Aftermath_GetShaderDebugInfoIdentifier API that can be used to calculate the shader debug information identifier for a memory buffer containing shader debug information data.
This function and additional information can be found in the GFSDK_Aftermath_GpuCrashDumpDecoding.h header file or the Nsight Aftermath SDK documentation.
Shader Debugger Setup
Note
The Shader Debugger must be run as Administrator on Windows or as superuser (e.g., via sudo) on Linux. When running locally, this means the Nsight Graphics host must be run with these privileges, and when running remotely, the Nsight Graphics Remote Monitor must have them. This is a requirement that we are working to remove in a future version.
Application Setup
Before trying to run the Shader Debugger, you want to make sure your application is compatible. One of the engine features that can interfere is any “watchdog” timers that monitor threads to ensure they are making progress. When stopped at a breakpoint, the rendering thread does not make any progress, and the watchdog may try and close the application. This can leave the GPU hung and require a hard reboot to restore functionality. We are looking at ways to make this easier to use, but similar to Frame Debugging, it is best to disable any watchdog timers.
Shader Setup
For any shader to be debugged at a source level, it must be compiled with debug information. Non-Semantic Debug Information is preferred, but not required. See Shader Compilation for more information on how to generate shaders with debug information.
A shader will undergo compilation multiple times in its lifetime. First, from high-level source (such as GLSL, HLSL, or Slang) to an intermediate binary format (e.g., SPIR-V), then again by the NVIDIA driver from the intermediate binary format to microcode that is executed by the GPU. A debug compilation of microcode depends on debug information included in or alongside the intermediate binary. The Shader Debugger provides multiple modes to control whether and when the NVIDIA driver compiles the shader as debug.
Shaders may be compiled to an intermediate binary format with optimizations, but the user should be aware that instruction step ordering or performance may be impacted. It is advised to compile shaders with optimizations disabled for this reason. However, the Shader Debugger activity supports a JIT (Just-In-Time) replacement mode, which allows the user to recompile the shader sources before replacement at the driver level. At this point, shaders may be compiled to skip any optimization steps that occurred in the original compilation.
System Setup
The Shader Debugger is supported in two different machine configurations: local debugging with two GPUs on one system or remote debugging using two systems, each with a single GPU. This is necessary because the hardware does not support instruction level preemption for non-CUDA workloads, so when a shader hits a breakpoint on the GPU, it stops all rendering, including the desktop.
Single System Debugging
In order to run the Shader Debugger, you need to have two or more GPUs installed in your system. One GPU must be reserved for the OS desktop environment and other GPU accelerated applications, including the Nsight Graphics host. This can be any GPU, NVIDIA or other vendor, so long as you can install an up-to-date driver on it. The GPU running your application must be at least NVIDIA’s Ampere microarchitecture or newer. The Nsight Graphics host eases the burden of GPU selection by allowing you to select which GPU you intend to assign as a host GPU, and which as the target GPU. Applications launched with the Shader Debugger will restrict the underlying graphics API (Vulkan) to only return the selected target GPU when enumerating devices.
Note
Nsight Graphics will not prevent any other applications, including the Nsight Graphics host, from running on your selected target GPU.
If you run into problems determining what GPU your application or the Nsight Graphics host is running on, you can use nvidia-smi from a command prompt to display what GPU a given process is running on.
 
The image above shows an example of a correct setup when looking at the nvidia-smi command output: Nsight Graphics (ngfx_ui.exe) is running on the same GPU as all other processes, including ShellHost.exe, while the target application, vulkansponza.exe in this case, is running by itself on the GPU being used for debugging.
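The same check can be scripted. The following is a minimal sketch that parses the process table printed by a plain `nvidia-smi` call and reports whether the target application is alone on its GPU. The table layout it expects (GPU, GI, CI, PID, Type, process name, memory) is an assumption that may not hold for every driver version, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# One row of the process table: GPU, GI, CI, PID, Type, name, memory.
_ROW = re.compile(
    r"^\|\s+(\d+)\s+\S+\s+\S+\s+(\d+)\s+(\S+)\s+([^|]+?)\s+(\S+)\s+\|$"
)


def processes_by_gpu(smi_text):
    """Group process names from the nvidia-smi process table by GPU index."""
    by_gpu = defaultdict(list)
    in_table = False
    for line in smi_text.splitlines():
        line = line.rstrip()
        if "Processes:" in line:
            in_table = True  # rows before this header describe GPUs
            continue
        match = _ROW.match(line) if in_table else None
        if match:
            by_gpu[int(match.group(1))].append(match.group(4))
    return dict(by_gpu)


def target_is_isolated(by_gpu, target_name):
    """True if target_name is the only listed process on exactly one GPU."""
    gpus = [gpu for gpu, names in by_gpu.items()
            if any(target_name in name for name in names)]
    return len(gpus) == 1 and len(by_gpu[gpus[0]]) == 1
```

The input text can be obtained with `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout`. Process names may be truncated in the table, so the sketch matches on a substring such as the executable name.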
Windows Setup
When configuring for single system, multi-GPU debugging, it is strongly advised that your target GPU is headless, or disabled for use by the OS desktop environment. Windows is known to become unstable when interacting with the desktop of a GPU that has been suspended.
When your system is configured with two or more NVIDIA discrete GPUs, and both have been selected as the host and target GPUs, Nsight Graphics will automatically configure your system to prepare for multi-GPU debugging at launch time. See Automatic Configuration (Windows Only) for more information.
As for the Nsight Graphics host, there are some features like the Focus Picker that require an OpenGL context. If machine settings are not set appropriately, the OpenGL context may utilize the same GPU that your application uses, causing the Nsight Graphics UI to freeze while stopped at a breakpoint. If you experience a freeze, you can confirm the GPU selected by Nsight Graphics by using the nvidia-smi tool mentioned above.
In order to change the GPU that the host is running on, you can use the Windows Settings applet, then select Display and Graphics.
 
Click the Browse button to select the ngfx-ui.exe host application based on where you installed it, and you have the option to select the GPU to run on.
 
When you have a system with 2 discrete NVIDIA GPUs, the Windows settings can sometimes fail to make the Nsight Graphics host run on the GPU you have selected (as shown via the nvidia-smi command documented above). One example of a view that uses graphics is the Focus Picker, and the graphics context does not initialize until the view is opened. In this case, you can use the NVIDIA Control Panel to try and force the correct GPU to be used. Under the 3D Settings branch, select Manage 3D settings. On the “Global Settings” tab, select the “OpenGL rendering GPU” entry in the settings and select the GPU you want to use for the host. Once you restart the host, you can use nvidia-smi to again check to determine if the host is running on the GPU you requested.
 
Linux Setup
In order to run on Linux, you need to run two separate X servers, one for each GPU. The following steps set this up using LightDM and Xfce:
- Using apt/apt-get, install lightdm and xfce. 
- Make sure you have 2 sets of keyboard, mouse, and display attached to the system. 
- Make lightdm the default DM (this is needed for DualX setup and also for x11vnc):
  sudo dpkg-reconfigure gdm3
 
- By default, all devices are attached to seat0. Use loginctl seat-status seat0 to see all devices using seat0.
- Attach a GPU, keyboard, and mouse to seat1 (or the corresponding USB ports):
  loginctl attach seat1 /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:10.0/0000:04:00.0/drm/card1
  loginctl attach seat1 /sys/devices/pci0000:00/0000:00:14.0/usb1/1-4
  loginctl attach seat1 /sys/devices/pci0000:00/0000:00:14.0/usb1/1-5
 
- Once you have done this, you need to reboot your system. 
 
Multi-System Debugging
Setting up multi-system debugging is similar on Windows and Linux.
- You need to run the Nsight Graphics Remote Monitor (nv-nsight-remote-monitor) as Administrator (Windows) or superuser (Linux) on the system you run your graphics application on. 
- Run the Nsight Graphics host (ngfx-ui) on the second system. 
- From the host, set up the remote connection, including the machine name or IP address to connect to. 
 
- Change the Application Executable field to point to the application you want to run. Note that this and the Working Directory field are relative to the target machine, not the local machine. 
- Update any command line arguments or environment variables you may need. 
- Select the Shader Debugger activity and then press the Launch Shader Debugger button in the lower-right corner of the dialog. 
The host should automatically attach to your target application once it has finished launching.
Automatic Configuration (Windows Only)
Nsight Graphics will require one-time configuration of a target machine when first launching an application with the Shader Debugger. Some actions, such as switching target GPUs, or changing TDR settings within Nsight Graphics may require a re-configuration to occur.
The following registry changes are made by the tool:
- Configuring Windows TDR Delay. - TDR Delay must be increased on the debug target system so that the application can remain stopped at a breakpoint for an extended amount of time. The increased timeout value can cause significant delays resetting the device when GPU exceptions occur, and when trying to collect a dump via Nsight Aftermath. It is advised to reset your TdrDelay setting to its default value (2s) after debugging is complete. 
- The Nsight Graphics default configured value is 600 seconds. This value can be changed by the user within the Nsight Graphics host settings.   
 
- Enabling the GPU Debugger 
- Configuring NVIDIA multi-GPU rendering (single system debugging only) 
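To check whether TdrDelay is still raised after a debugging session, you can read it back from the registry. The following is a minimal sketch; the registry location used here comes from Microsoft's TDR documentation rather than from this guide, and the function names are hypothetical. On non-Windows systems the read simply returns `None`.

```python
import sys

# Location documented by Microsoft for TDR settings (under HKEY_LOCAL_MACHINE).
TDR_KEY = r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers"
DEFAULT_TDR_DELAY = 2  # seconds, the default value mentioned above


def read_tdr_delay():
    """Return the TdrDelay registry value in seconds, or None if unset."""
    if sys.platform != "win32":
        return None  # TDR is a Windows mechanism
    import winreg
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TDR_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, "TdrDelay")
            return int(value)
    except FileNotFoundError:
        return None  # value absent, so Windows uses its default


def effective_tdr_delay(raw):
    """Translate a raw registry read into the delay Windows will use."""
    return DEFAULT_TDR_DELAY if raw is None else raw


def needs_reset(raw):
    """True if the delay differs from the Windows default."""
    return effective_tdr_delay(raw) != DEFAULT_TDR_DELAY
```

For example, `needs_reset(read_tdr_delay())` returns `True` while the Nsight Graphics default of 600 seconds is still configured.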
Samples
Nsight Graphics ships with a selection of samples that can help you understand how to use the tool. To access these samples, select Help > Samples and choose a sample to open. Upon selection, a sample project is created and the connection dialog comes up, allowing you to work with and examine the sample. Some samples also include example reports for you to study.
Note that samples are also an effective tool for disambiguating Nsight Graphics problems from system problems. If the samples do not run for you without issue, there is likely a system problem that needs diagnosing.
Activities
Nsight Graphics supports multiple activities to target your workload to the need of your work at a particular point in your development process.
- Graphics Capture — offers a set of tools for intercepting a user application and capturing the application’s graphical API calls and resources, from which you may debug that application. 
- CLI Graphics Capture — CLI Tools for working with Graphics Captures. 
- Frame Debugger — allows you to debug a frame by each draw call. You can view vertex shaders, pixel shaders, and pipeline states. 
- GPU Trace Profiler — Supports the analysis of GPU workloads by profiling GPU throughput and utilization with minimal overhead. This allows you to detect and analyze performance bottlenecks or areas where your application is under-utilizing the GPU. 
- Shader Debugger — allows you to debug the shaders running in your Vulkan application as they are executing on the GPU. 
- Generate C++ Capture — the C++ Capture activity allows you to export an application frame as C++ code to be compiled and run as a self-contained application for later analysis, debugging, profiling, regression testing, and edit-and-compile experimentation. 
- System Trace — connects to Nsight Systems™ to automatically populate a system trace activity on the external tool with settings from the Nsight Graphics connection dialog. 
Graphics Capture
The Graphics Capture activity offers a set of tools for intercepting a user application and capturing the application’s graphical API calls and resources. Once captured, those API calls can be replayed in a standalone manner, and can be used — alongside graphics debugging tools — to inspect events, API state, resource values, and dependencies to understand where your application might have issues.
When to Use the Graphics Capture Activity
The Graphics Capture Activity is an evolution of Nsight Graphics’s Frame Debugging tools. It offers a streamlined and persistent experience for saving application frames and debugging them at any time.
Use this activity when:
- You want to save a standalone version of your frame. 
- You have a render-accuracy issue. 
- You expect that you may have a synchronization issue. 
- You want to explore the performance of your application shaders (DX12 and Vulkan). 
The Graphics Capture activity supports D3D12 and Vulkan APIs.
Basic Workflow
To start this activity, select Graphics Capture from the connection dialog.
 
The basic workflow for the Graphics Capture activity is to capture an application into a capture file that is saved to disk. This capture file may be opened for debugging, at which time you could navigate the events, data, and resources that your application is submitting/using to identify your issue. The capture file that is saved with the activity may also be shared with others, easily allowing them to see the same application resources that you have access to.
How to Use a Capture
The Capture Document
The capture document is a record of a previous capture. It contains an image that represents the scene that was captured, as well as details about the capture in a “Capture Details” section. From this document, graphics debugging may be entered. Additionally, the capture may be connected to another activity for analysis or replayed on its own.
 
Starting Graphics Debugging
To start Graphics Debugging, open a capture document, select the “Graphics Debugger” option under the Run list, and click the “Start Graphics Debugging” button.
Once started, the Graphics Debugger enters “Offline Replay” Mode.
Replay Modes
The Graphics Debugger has two modes of inspection:
- Offline Replay 
- Live Replay 
These two modes of replay serve different purposes. Offline replay is fast to start and requires no device creation, but it does not have the ability to serve the full set of resources that Live replay mode can provide, as access to some resources may only be acquired via a full application replay.
Read more about each replay mode below.
Offline Replay
In offline replay, the capture’s resources are used in a way that doesn’t create a rendering device. Because of this, the resources that may be inspected are more limited. However, because a rendering device is not created, this mode allows the capture to be investigated on nearly any system, including those that would be unable to replay the captured application. Additionally, because a rendering device and application resources are not created, offline replay starts immediately and without delay, which may be useful for quick inspections of certain classes of data.
When in offline replay, a banner is shown to indicate information that is not available until Live Replay is entered.
Live Replay
Live Replay is a mode in which the full set of resources is available for inspection. This mode requires a user’s system to be able to replay the application in question. Note that while the Nsight Graphics binary replayer makes many attempts at portability, the captured application may have inherent requirements that the end user’s system must meet.
Live replay is able to gather the full set of data that the Nsight Graphics Capture Debugging user interface exposes.
If a Live replay session fails while gathering data, Nsight Graphics automatically restarts the live replay session to attempt to avoid interruption in the user’s debugging session.
CLI Graphics Capture
In addition to the UI, Graphics Captures may be created with CLI tools. Like the GUI
tools, the CLI tools can intercept a user application and capture the application’s graphical API
calls and application resources into a form that can then be replayed in a standalone manner. The
collection of these API calls and their resources is colloquially referred to as a “capture file.”
In the Nsight Graphics tool, this capture file has the ngfx-capture extension.
Important Features
- Capture a series of application calls and resources into a standalone replayable format. 
- Supports both single- and multi-frame capture; up to 60 frames. 
- The captured application can be replayed in a standalone manner. A best effort is made to replay on configurations that differ from the original capture configuration. 
How to Capture
The ngfx-capture tool is a command line executable used to launch a target application with the Nsight Graphics capture libraries injected so that graphics capture files can be generated.
To generate a capture, first launch the target application using the ngfx-capture tool by configuring the target application’s executable path, working directory, command line arguments, and any environment variables. These options are detailed in the Capture Command Line Argument and Options section.
Once launched, ngfx-capture should produce some console output indicating the target application was successfully launched.
> ngfx-capture.exe --exe "C:\VulkanSDK\1.3.246.0\Bin\vkcube.exe"
Launching C:\VulkanSDK\1.3.246.0\Bin\vkcube.exe ...
[vkcube.exe] Connection Established: 2023-08-09 11:08:05
Additionally, upon a successful launch the target application should display the Nsight Graphics HUD if a supported graphics API is being used.
When you would like to capture, either press the capture hotkey (default F11) or click the capture button on the Nsight Graphics HUD. Alternatively, the ngfx-capture tool can be configured to automatically capture after a given timeout or number of frames have been presented.
When the capture process begins, the application briefly pauses while initial data and state are collected. The target application then resumes and the frames are captured. When the capture completes, the capture file is written to disk along with capture statistics output to the console by ngfx-capture.
[vkcube.exe] STARTING CAPTURE: 2023-08-09 11:12:19
[vkcube.exe]   Initializing Capture ......................... 100% (58ms)
[vkcube.exe]   Capture Begin ................................ 100% (177ms)
[vkcube.exe]   Capturing 1 Frame ............................ 100% (40ms)
[vkcube.exe] ENDING CAPTURE 2023-08-09 11:12:19
[vkcube.exe]   Capture End .................................. 100% (40ms)
[vkcube.exe]   Generating Screenshot ........................ 100% (16ms)
[vkcube.exe]   Encoding Object Info ......................... 100% (37ms)
[vkcube.exe]   Capture Finalize ............................. 100% (22ms)
[vkcube.exe]   Finalizing Capture File ...................... 100% (4ms)
[vkcube.exe] CAPTURE STATS
[vkcube.exe]   Capture Time ................................. 400ms
[vkcube.exe]   Event Count .................................. 6
[vkcube.exe]   Resource Count ............................... 32
[vkcube.exe] FILE STATS
[vkcube.exe]   File Size .................................... 158 KiB
[vkcube.exe]   Data Chunk Count ............................. 15
[vkcube.exe]   File Write Speed ............................. 277.424 MiB/s
[vkcube.exe] COMPRESSION STATS
[vkcube.exe]   Compression Mode ............................. Normal (LZ4)
[vkcube.exe]   Uncompressed Size ............................ 2.67 MiB
[vkcube.exe]   Compression Speed ............................ 176.997 MiB/s
[vkcube.exe]   Compression Ratio ............................ 5.794%
[vkcube.exe] Saved to C:\Users\soandso\Documents\NVIDIA Nsight Graphics\GraphicsCaptures\vkcube_2023_08_09_11_12_19.ngfx-capture
By default the capture file is saved to ${MY_DOCUMENTS}\NVIDIA Nsight Graphics\GraphicsCaptures where the filename will be the target process name with a timestamp appended. This location and output file name can also be configured via the launch time options.
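For automated captures, the launch can be wrapped in a small script. The following is a minimal sketch that assembles an ngfx-capture argument list from the `--exe` option shown above and options listed in the tool's help output; the function name `ngfx_capture_command` and its parameter names are hypothetical, and the frame-count range check mirrors the range printed by the help text.

```python
def ngfx_capture_command(exe, working_dir=None, output_dir=None,
                         frame_count=None, capture_frame=None,
                         terminate_after_capture=False, app_args=()):
    """Build an ngfx-capture argument list from launch and capture options."""
    cmd = ["ngfx-capture", "--exe", exe]
    if working_dir is not None:
        cmd += ["--working-dir", working_dir]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    if frame_count is not None:
        if not 1 <= frame_count <= 600:
            raise ValueError("frame_count must be in [1, 600]")
        cmd += ["--frame-count", str(frame_count)]
    if capture_frame is not None:
        if capture_frame < 1:
            raise ValueError("capture_frame is 1-based")
        cmd += ["--capture-frame", str(capture_frame)]
    if terminate_after_capture:
        cmd.append("--terminate-after-capture")
    if app_args:
        # Placed last because --args accepts a variable number of values.
        cmd += ["--args", *app_args]
    return cmd
```

The resulting list can be passed to `subprocess.run`. Combining `capture_frame` with `terminate_after_capture=True` gives an unattended capture of one specific frame.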
Capture Command Line Argument and Options
For a detailed list of the command line options, pass -h or --help to the ngfx-capture
executable.
NVIDIA Nsight Graphics Capture CLI Tool
Usage: ngfx-capture [OPTIONS]
Options:
  -h,--help                             Print this help message and exit
  --version                             Display program version information and exit
Application Launch Options:
  --working-dir,--wd TEXT               Working directory.
  --args TEXT ...                       Arguments to pass to executable.
  --env TEXT ...                        Environment variables to inject into process.
  --no-hud                              Disable the Nsight Graphics Capture HUD.
  --hud-position TEXT                   Set the initial position of the Nsight Graphics Capture HUD. May also be used to make the HUD hidden. (default; "Top Left")
  --new-console                         Create a separate, new console from the one the ngfx-capture executable is launched in.
  --terminate-after-capture             Terminate the application after capture is complete.
Capture Output Options:
  -o,--output-file TEXT                 Output capture file name
  --output-dir TEXT                     Output directory
  -n,--frame-count :UINT in [1 - 600]   The number of frames to capture.
  --bundle-replayer                     Bundle the ngfx-replay replayer and its dependencies within the capture file (default).
  --no-bundle-replayer                  Do not bundle the ngfx-replay replayer and its dependencies within the capture file.
  --non-portable                        Disable portability of this capture. This will reduce capture size and may lead to increased performance for the system on which the application was captured. The capture may not be able to be replayed on a different system, however.
  --high-compression                    Higher Compression (using LZ4_HC). Captures may be generated more slowly but with reduced disk space.
  --no-compression                      Disable Compression. Captures may be generated more quickly but use more disk space.
Capture Triggers:
  Select between manual and programmatic capture triggers. 
  [At most 1 of the following options are allowed]
  Options:
    --capture-hotkey                    Capture by Hotkey (default; F11).
    --capture-frame :>=1                Capture a specific frame (1-based). The frame number must be greater than 1.
    --capture-countdown-timer           Capture after a given countdown timer (milliseconds).
Host Visible Video Memory (HVVM) Mode:
  HVVM (also known as GPU_UPLOAD heaps) does not support coherent buffer update monitoring, nor capture/replay memory. These options provide workarounds to accommodate these limitations. 
  [At most 1 of the following options are allowed]
  Options:
    --hvvm-demote                       Demote HVVM to system memory (default).
    --hvvm-disable                      Disable HVVM memory. Allocations using HVVM will fail at runtime.
    --hvvm-manual-tracking              Enable HVVM but require manual tracking of updates via API calls.
    --hvvm-cpu-hash                     Enable HVVM and track updates via CPU hash.
Ray Tracing Options:
  --use-rtas-serialize-api              Serialization acceleration structures using the serialization API.
  --max-sbt-size UINT                   Max shader binding table deep copy size in bytes. 0 indicates unbounded size.
D3D12 Options:
  --d3d12-indirect-sbt-buffer-size UINT Configure the size in bytes of the static buffer used to unroll arguments for ExecuteIndirect(DISPATCH_RAYS).
  --d3d12-spoof-resize-buffers-success  During capture, instead of executing ResizeBuffers it will be bypassed to return S_OK.
Troubleshooting Knobs:
  --passthrough                         Launch the target application without injecting capture code. This allows the application to run normally without any capture overhead or modifications. This is useful for disambiguating bad launch arguments from other problems.
  --ignore-incompatible                 If enabled, the frame will attempt to capture despite any incompatibilities. Possible outcomes of proceeding despite an incompatibility include a crash, hang, rendering errors, or incorrect data. Use this option only when you are certain that the incompatibility will not impact your analysis.
  --no-lazy-data-collection             Disables lazy collection of resource data and instead capture all data at the start of capture.
  --block-on-first-incompatibility      If enabled, a blocking message box will report the first incompatibility encountered by the application.
  --no-block-on-first-incompatibility   Disable blocking incompatibility warnings.
  --no-block-on-interfering-application Disable blocking on interfering application warnings.
  --no-internal-pipeline-caches         Disables the internal usage of pipeline caches.
  --no-uncached-memory-demotion         Disables demoting uncached write-combined memory into cached memory. Cached memory allows for increased capture performance but may impact application GPU performance if write-combined memory is heavily used.
  --no-streamline-capture               Disable the streamline informational capture feature that shows streamline calls as comments in the event list
  --no-vulkan-write-watch-memory        Disable the use of write watch to track host visible memory updates.
  --no-vulkan-capture-replay-memory     Disable overriding device memory allocation flags with VK_MEMORY_ALLOCATE_DEVICE_ADDRESS_BIT and VK_MEMORY_ALLOCATE_DEVICE_ADDRESS_CAPTURE_REPLAY_BIT. This is necessary if the application is later binding addressable buffers but incorrectly excluded the flags on the associated memory.
  --no-vulkan-private-data-lookups      Disable internal usage of private data objects through VK_EXT_private_data.
Troubleshooting Knobs:
  Resource Data Options
  Troubleshooting Knobs:
    --capture-full-gpu-allocs           Capture full memory allocations (i.e. ID3D12Heap or VkDeviceMemory) as opposed to individual resources (i.e. ID3D12Resource or VkBuffer). This will increase the memory consumption and overhead of capture, but may be necessary for addressing issues where applications read outside the bounds of their defined buffers. This will also reduce the portability of the capture.
Launch Type:
  Launch type. 
  [Exactly 1 of the following options is required]
  Application Launch Options:
    -e,--exe TEXT                       Executable path.
To learn more about Nsight Graphics's ngfx-capture utility, see the documentation at
http://devtools.nvidia.com/docs/Staging/devtools/Dev/Grfx/nsight-graphics/public/UserGuide/index.html#cli-graphics-capture.
For full documentation, see https://docs.nvidia.com/nsight-graphics/index.html.
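For scripted captures, it can help to assemble the argument list programmatically. The Python sketch below uses only options that appear in the help text above and mirrors the documented value ranges. The function name build_capture_command is hypothetical, and placing --args last is an assumption made because that option accepts a variable number of values.

```python
from typing import List, Optional, Sequence


def build_capture_command(
    exe: str,
    capture_frame: Optional[int] = None,
    frame_count: Optional[int] = None,
    output_dir: Optional[str] = None,
    terminate_after_capture: bool = False,
    app_args: Sequence[str] = (),
) -> List[str]:
    """Build an ngfx-capture argument list from the documented options."""
    cmd = ["ngfx-capture", "--exe", exe]
    if capture_frame is not None:
        # The help text declares this option as ":>=1" (1-based).
        if capture_frame < 1:
            raise ValueError("--capture-frame must be >= 1")
        cmd += ["--capture-frame", str(capture_frame)]
    if frame_count is not None:
        # The help text declares the range as [1 - 600].
        if not 1 <= frame_count <= 600:
            raise ValueError("--frame-count must be in [1, 600]")
        cmd += ["--frame-count", str(frame_count)]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    if terminate_after_capture:
        cmd.append("--terminate-after-capture")
    if app_args:
        # Assumption: kept last because --args takes multiple values.
        cmd += ["--args", *app_args]
    return cmd
```

The returned list can be passed to subprocess.run. If no capture trigger is given, the tool falls back to its default hotkey trigger as described in the help text.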
How to replay
The ngfx-replay tool is a command line executable used to launch a replay of a graphics capture, such as one created with the ngfx-capture command line executable.
To replay a capture, invoke ngfx-replay with the capture file path and any desired options. These options are detailed in the Replay Command Line Argument and Options section.
Once launched, ngfx-replay should display information about the capture process, display replay initialization steps, and finally produce some console output indicating the replay loop has begun.
> ngfx-replay.exe GravityMark_2023_08_16_08_24_52.ngfx-capture
NVIDIA Nsight Graphics Capture Replayer
Loading File ................................... 100% (20ms)
Capture Information:
> Process: MyGame
> Command Line: "MyGame.exe" -d3d12
> Time: 2023_08_16_08_24_52
> Nsight Version: 2023.4.0
> Operating System: Windows 11 (21H2)
> Primary GPU: NVIDIA GeForce RTX 3090
> Driver Vendor: NVIDIA
> Driver Version: 536.52
> Primary API Version: D3D12
Initializing Function Stream ................... 100% (2ms)
Creating Resources ............................. 100% (291ms)
Initializing Resources ......................... 100% (1145ms)
Decoding Function Stream ....................... 100% (1ms)
Function Stream Optimization ................... 100% (0ms)
Function Stream Pre-Pass ....................... 100% (0ms)
Initializing Resource Reset .................... 100% (259ms)
Initializing Execution Engine .................. 100% (2ms)
Replaying Capture:
>  141.19 FPS (Frame:  7.08 ms, Reset:  1.37 ms)
>  144.24 FPS (Frame:  6.93 ms, Reset:  1.37 ms)
Replay Expectations
Capture files contain a trace of all API calls that occurred between capture begin and capture end. In addition, they contain graphics data for mid-trace host-visible buffer updates, and enough information to recreate all objects used in the API function trace. During replay, the API calls and host-visible buffer updates are replayed in the order they occurred. By default, this replay occurs serially and as fast as possible. Also by default, the recording of command lists or command buffers that occurred during the frame is multithreaded. As a result, graphics replays are generally expected to be as GPU bound as the original application, or more so. See Replay Command Line Argument and Options for options that change replay multithreading and related behavior.
To correctly replay successive iterations of the replay loop, the replayer assumes by default that buffers, textures, and other objects need to reset their data before proceeding. This is typically not cheap in wall-clock time. To help you understand the cost of the captured workload, the command-line output of the tool reports the frame cost separately from the reset cost. External FPS tools and profilers typically do not distinguish between the two, so the replayer also injects API-specific markers that delimit the frame work and the reset work. Reset behavior is configurable; see Replay Command Line Argument and Options.
Note also that a wait-for-idle operation is performed after both the frame workload and the reset workload. This means that all pending work on all GPU queues is completed before continuing, which ensures that the frame and the reset do not conflict. Keeping the frame and reset work from overlapping is also preferable when profiling GPU activity.
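To make the distinction between frame cost and reset cost concrete, the Python sketch below parses status lines in the format shown in the sample replay output and estimates the rate an external FPS tool might observe if it counts frame and reset work together. The parsing pattern is inferred from that sample only. The names ReplaySample and parse_replay_samples are hypothetical, and the estimate ignores any wait-for-idle overhead.

```python
import re
from typing import List, NamedTuple


class ReplaySample(NamedTuple):
    fps: float       # FPS as printed by the replayer
    frame_ms: float  # cost of the captured frame work
    reset_ms: float  # cost of resetting object data for the next loop

    @property
    def loop_fps(self) -> float:
        """Approximate rate seen when frame and reset are counted together."""
        return 1000.0 / (self.frame_ms + self.reset_ms)


# Matches lines such as ">  141.19 FPS (Frame:  7.08 ms, Reset:  1.37 ms)".
_STATUS_LINE = re.compile(
    r">\s*([\d.]+) FPS \(Frame:\s*([\d.]+) ms, Reset:\s*([\d.]+) ms\)"
)


def parse_replay_samples(output: str) -> List[ReplaySample]:
    """Extract every FPS status line from captured ngfx-replay output."""
    return [
        ReplaySample(*(float(g) for g in m.groups()))
        for m in _STATUS_LINE.finditer(output)
    ]
```

For the first sample line above, the frame alone takes 7.08 ms while frame plus reset takes 8.45 ms, so a tool that cannot separate the two would see roughly 118 FPS rather than the roughly 141 FPS the replayer prints.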
Replay Compatibility
Generally, the replayer strives to be backwards compatible: a capture file from a previous version of the tool should continue to work on the latest replayer. Note that the reverse is not true: newer captures may contain data that an older replayer does not know how to support.
It is also assumed that a capture from a particular GPU, driver and OS should continue to replay on that GPU, driver and OS. The product strives to replay more broadly as well, i.e., on varying GPUs and drivers, but this portability is partly subject to the specific options used during capture. Consult the capture documentation for more info.
Finally, there are some conditions that make correct replay difficult or impossible: for instance, various alignments may differ especially between GPU vendors. The replayer makes a best effort to detect these conditions and exit gracefully.
Replay Command Line Argument and Options
For a detailed list of the command line options, pass -h or --help to the ngfx-replay executable.
NVIDIA Nsight Graphics Replayer CLI Tool
Usage: ngfx-replay [OPTIONS] filename
Positionals:
  filename TEXT REQUIRED                Graphics capture file
Options:
  -h,--help                             Print this help message and exit
  --version                             Display program version information and exit
  --quiet                               Quiet all console logging (verbosity level 0)
  -v,--verbose                          Verbose console logging (verbosity level 3)
  --verbosity-level UINT                Console logging verbosity level
  --no-seh                              Suppress catching and processing Win32 structured exceptions
Application Replay Options:
  -n,--loop-count UINT:>=1              Replay the specified number of loops
  --perf-report-dir TEXT                Collect replay performance information to the specified dir
  --fixed-timestamps                    Replay multi-frame captures no faster than the FPS they were captured at
  --temp-resource-dir TEXT              Create temporary resources files in the specified dir (default=system temp dir)
  --no-multithreaded-record             Suppress multithreaded record of queue work
  --no-multithreaded-pipeline-create    Suppress multithreaded creation of all pipelines
  --no-multithreaded-rt-pipeline-create Suppress multithreaded creation of ray tracing pipelines
  --no-multithreaded-init               Suppress multithreaded resource initialization
  --max-worker-threads UINT             Max number of worker threads used for initialization, command recording, and reset
  --reset                               Reset all dirty object data in replay loop (default)
  --no-reset                            Skip resetting all object data in replay loop
  --reset-force-all-regions             Reset all regions of objects even if they are not considered to be dirty
  --skip-mapped-memory-and-descriptor-updates-after-iteration-zero
                                        Skip CPU data and descriptor updates after iteration zero
  --max-vidmem-bytes-reset-allocation UINT
                                        Max bytes of local memory allowable for replayer's internal reset buffers, used for buffers, textures, heaps, device memory.  If exceeded the replayer will spill over to sysmem
  --no-object-reset-uid UINT ...        Skip resetting specific object data in replay loop (specified by uid)
  --no-initialized-in-frame-detection   Skip checks to detect if an object is fully initialized in frame and therefore can skip reset
  --no-internal-perf-markers            Hide Nsight internal perf marker usage (e.g. reset, blit-on-present, etc.)
  --inject-full-frame-perf-marker       Add an perf marker to wrap the non-internal frame work
  --no-sysmem-fallback                  Disable falling back to sysmem when video memory allocations fail for replay resources
Presentation Mode:
  Strategy for present to screen 
  [At most 1 of the following options are allowed]
  Options:
    --present-wb                        Force borderless window mode (default)
    --present-app                       Use application presentation mode
    --present-hidden                    Hide replay window
VSync Mode:
  Strategy for controlling vsync 
  [At most 1 of the following options are allowed]
  Options:
    --vsync-app                         Use application vsync mode (default)
    --vsync-off                         Force vsync off
    --vsync-on                          Force vsync on
Device Selection:
  Explicit override options of GPU device for replay.  If none are specified the best match for the capture device will be used. 
  [At most 1 of the following options are allowed]
  Device Selection:
    --device-name TEXT                  Select device by name regex
    --device-vendor TEXT                Select device by vendor name regex
    --device-index UINT                 Select device by system defined index
Replayer Bundling:
  --bundle-replayer                     Replay capture via bundled version of ngfx-replay executable and with bundled resources
  --bundle-replayer-no-rename Needs: --bundle-replayer
                                        Do not rename the ngfx-replay executable when replaying from bundle
  --bundle-replayer-dir TEXT Needs: --bundle-replayer
                                        Extract the bundle to the specified directory, creating the directory if it does not exist. By default, a temporary directory is used and cleaned after replay.
  --bundle-replayer-extract-only Needs: --bundle-replayer
                                        Extracts the replayer without issuing a replay
Metadata Output:
  Options to output some type of metadata and exit, rather than replaying 
  [At most 1 of the following options are allowed]
  Options:
    --metadata                          Print metadata and exit
    --metadata-screenshot TEXT          Save metadata screenshot (final present embedded in capture) to path and exit (*.png|tga|bmp|jpg supported)
    --metadata-functions                Print function stream and exit
Multibuffer Options:
  Multibuffering options 
  [At most 1 of the following options are allowed]
  Application Replay Options:
    --multibuffer                       Enable multi-buffering of the recording, syncs, descriptors, and memory to potentially minimize reset cost
    --multibuffer-record-and-sync       Enable multi-buffering of recording and syncs to potentially minimize reset cost
Troubleshooting Knobs:
  --no-internal-pipeline-caches         Disable internal usage of pipeline caches
  --no-pipeline-caches                  Disable all usage of pipeline caches
  --force-dx12-agility-preview          Force the usage of the DX12 preview agility runtime
  --force-trace-rays-dimensions-to-zero Force the trace rays dimensions to zero. This is useful for debugging purposes.
  --no-memory-mapped-file               Disable memory mapped file reader
  --no-aftermath-replay                 Disable replay of Aftermath calls from the original application
  --no-ngx-replay                       Disable initialization and replay of the NGX API calls from the original application
  --no-dstorage-replay                  Disable initialization and replay of DirectStorage API calls from the original application
  --no-crash-reporting                  Disable crash reporting
  --no-stack-in-crash-reporting         Disable showing the callstack in the crash report
  --no-block-on-incompatibility         Do not show popup for replay incompatibilities; continue automatically
  --no-bundled-dlss-plugins             Disable using DLSS plugins from the captured application.  Instead the plugins deployed by the replayer will be used
  --dlss-plugin-path TEXT               Load DLSS plugins from the specified path
To learn more about Nsight Graphics's ngfx-replay utility, see the documentation at
http://devtools.nvidia.com/docs/Staging/devtools/Dev/Grfx/nsight-graphics/public/UserGuide/index.html#cli-graphics-capture.
For full documentation, see https://docs.nvidia.com/nsight-graphics/index.html.
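Several option groups in the help text above are marked "At most 1 of the following options are allowed". When replay command lines are generated by scripts, a small pre-flight check can catch conflicts before the tool is launched. The Python sketch below copies the group membership from the help text. The function name find_conflicts is hypothetical, and the check assumes each flag is passed as its own token rather than in a --flag=value form.

```python
from typing import Dict, List, Sequence

# Groups listed in the ngfx-replay help text as mutually exclusive.
EXCLUSIVE_GROUPS: Dict[str, List[str]] = {
    "Presentation Mode": ["--present-wb", "--present-app", "--present-hidden"],
    "VSync Mode": ["--vsync-app", "--vsync-off", "--vsync-on"],
    "Device Selection": ["--device-name", "--device-vendor", "--device-index"],
    "Metadata Output": [
        "--metadata", "--metadata-screenshot", "--metadata-functions",
    ],
    "Multibuffer Options": ["--multibuffer", "--multibuffer-record-and-sync"],
}


def find_conflicts(args: Sequence[str]) -> Dict[str, List[str]]:
    """Return each exclusive group for which more than one flag is present."""
    conflicts: Dict[str, List[str]] = {}
    for group, flags in EXCLUSIVE_GROUPS.items():
        used = [flag for flag in flags if flag in args]
        if len(used) > 1:
            conflicts[group] = used
    return conflicts
```

An empty result means that none of the listed groups is over-specified; it does not validate option values or the Needs: dependencies of the bundling options.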
Frame Debugger
The Frame Debugger activity allows for:
- Real-time examination of rendering calls; 
- Interactive examination of GPU pipeline state, including visualization of bound textures, geometry and unordered access views; 
- Pixel History, which shows all operations that affect a given pixel; 
- Shader profiling to explore shader performance; 
- C++ Capture exports for offline collaboration and analysis. 
When to use the Frame Debugger Activity
The Frame Debugger activity offers a comprehensive set of tools for discovering problems with your application’s rendering or general operation. This activity enables the inspection of events, API state, resource values, and dependencies to understand where your application might have issues. Use this activity when:
- You have a render-accuracy issue. 
- You expect that you may have a synchronization issue. 
- You want to explore the performance of your application shaders (DX12 and Vulkan). 
The Frame Debugger activity supports all APIs that are generally supported by Nsight Graphics.
Basic Workflow
To start this activity, select Frame Debugger from the connection dialog.
 
The basic workflow for the Frame Debugger activity is to capture an application and then navigate the events, data, and resources that your application is submitting/using to identify your issue.
Whether you are debugging on the CPU or GPU, the first step of any debugging process is to narrow in on the set of data that you need to analyze to understand your problem. Generally, this means that you will want to scrub to a particular event of interest in either the Scrubber or the Event Viewer. Because Nsight Graphics™ shows you the rendering contribution of every draw call, looking at either the HUD or the Current Target View gives you an indication of where your rendering might be going wrong. Another alternative is to use the Pixel History experiment to automatically identify the draw calls that relate to a particular texture update.
From there, you want to use your knowledge of the graphics pipeline to try to understand what might be causing a problem. Some questions to ask yourself:
- Is this a geometry problem? If so, is it a pre-transform or post-transform problem? 
- Is this a blending problem? 
- Is this a synchronization problem? 
In some cases, several problems may combine to make a given symptom worse. Isolating the symptoms can be challenging, but an effective use of the tools can offer increased confidence that you are heading in the right direction.
GPU Trace Profiler
The GPU Trace Profiler activity runs a low-level profiler that developers can use to optimize applications for NVIDIA Turing and newer hardware. It runs on live applications and captures the utilization of GPU units throughout frame execution. The GPU Trace report may help detect bottlenecks in the GPU pipeline, as well as areas where your application is under-utilizing the GPU.
When to Use the GPU Trace Profiler Activity
The GPU Trace activity provides detailed performance information for various GPU Units.
Use this activity when:
- You wish to understand the GPU Units’ utilization and search for throughput bottlenecks. 
- You wish to understand how synchronization objects across queues are being executed. 
- You would like to search for opportunities where your application is under-utilizing the GPU. 
- You suspect your engine would benefit from asynchronous compute. 
The GPU Trace activity currently supports profiling applications on NVIDIA Turing architecture and above.
System Setup
The target system must be configured to allow performance metrics collection by GPU Trace. Please see documentation at this link for a guide on how to do this.
Memory Requirements
GPU Trace allocates memory for a variety of purposes to deliver highly detailed profiling reports. The total memory required for a given workload is determined by factors such as the complexity of the scene being traced, the duration of the trace, and the amount of data collected per unit of time. All of these factors are configurable from the activity window before you launch your application.
While the default settings are suitable for most workloads, GPU Trace will report if a trace exceeds the allocated memory for any category. If your system has additional memory resources available, you can increase the memory allocation for the relevant category and attempt the trace again. In situations where resources are limited, you can choose to either shorten the trace duration or reduce the amount of data collected for the same duration.
For a comprehensive breakdown of all the memory allocation categories in GPU Trace, refer to the sections below.
Timestamps
GPU Trace uses timestamps to construct a detailed timeline of GPU events, which allows you to see, for example, when each draw or dispatch event occurred and how long it took to complete. To record this data, GPU Trace allocates memory for each timestamp. By default, GPU Trace allocates enough memory to store 100,000 timestamps per API device, which is sufficient for most workloads. If a trace needs more timestamps than were allocated, GPU Trace will report that it ran out, allowing you to increase the number of timestamps and try again.
To learn more about how timestamps are used by GPU Trace, see Timeline: Frames Data and Per-Queue Events.
 
PC Samples and Metrics
NVIDIA GPUs allow GPU Trace to collect a detailed set of data with minimal overhead that can be used to understand how the GPU hardware is being utilized during the execution of the traced region. The memory that can be allocated to record this data has a fixed upper limit of ~4GB. Given this constraint, several settings in the activity window let you configure how this memory is used before you launch your application. For example, you can collect data at a higher rate over a short period of time, or collect less data over a longer period of time. A summary of the relevant settings is shown below:
- Max Duration (ms): The maximum trace duration, with the actual trace duration being the minimum of this and the ‘Limited To’ setting. 
- PM Bandwidth Limit (MB/s): The maximum amount of data per unit time given the max duration and memory constraints. 
- Warp State Samples Per PM Interval: The final sampling rates are based on the number of samples per SM. 
 
To learn more about these and other settings available before you launch your application, and about how GPU Trace uses this data, see Basic Workflow and Timeline: Metrics Graphs.
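The trade-off between data rate and trace duration under a fixed memory limit can be illustrated with a simple calculation. The Python sketch below is a simplified model and not the tool's actual allocation logic: it treats the "~4GB" limit as exactly 4096 MB and assumes the entire budget is available to a single data stream. The function name max_bandwidth_mb_per_s is hypothetical.

```python
# Assumption: the "~4GB" upper limit is modeled as exactly 4096 MB.
BUDGET_MB = 4096.0


def max_bandwidth_mb_per_s(duration_ms: float, budget_mb: float = BUDGET_MB) -> float:
    """Highest sustained data rate that fits the budget for a given duration."""
    if duration_ms <= 0:
        raise ValueError("duration_ms must be positive")
    return budget_mb / (duration_ms / 1000.0)


# Halving the duration doubles the rate that fits in the same budget.
for duration_ms in (500, 1000, 4000):
    rate = max_bandwidth_mb_per_s(duration_ms)
    print(f"{duration_ms} ms -> {rate:.0f} MB/s")
```

Under these assumptions, a 4000 ms trace can sustain a quarter of the data rate of a 1000 ms trace, which is the relationship the Max Duration and PM Bandwidth Limit settings expose.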
Hardware Events
Note
This feature is only available on the latest NVIDIA Blackwell GPUs and provides a new Compute timeline row.
Hardware events provide GPU Trace with more detailed timing information for various types of events, including more accurate compute start and stop timestamps. Enabling Hardware Events does increase the total amount of memory that needs to be allocated in order to record all the events that are collected. The specific allocation size can be adjusted under the ‘Additional GPU Settings’ section. If GPU Trace reports that it ran out of memory to record all events, you can try increasing this setting and try re-tracing.
See Compute Row for more details.
GPU Trace will report when it was not able to record all of the events due to running out of memory. As with the other settings, you can adjust the amount of memory allocated for hardware events and try re-tracing.
 
Shaders
In order to profile your shaders, GPU Trace needs to collect additional data from your shaders and pipeline state objects. In the activity window, under ‘Misc Settings’, the Collect Shader Pipelines and Collect External Shader Debug Info settings control whether GPU Trace collects this additional data.
The Collect Shader Pipelines option allows GPU Trace to populate additional views with information about your PSOs and shader source code.
Note
The Collect Shader Pipelines option is required when Real-Time Shader Profiler is enabled.
Additionally, the Collect External Shader Debug Info option allows GPU Trace to try to resolve debug information for shaders that do not contain embedded debug information. Any debug information that is found is then included in the trace report. You can also resolve missing debug info later when loading the trace report; see Search Paths to learn how.
See Shader Profiler for more information.
Basic Workflow
To start the activity, select GPU Trace from the connection dialog.
 
- Set up your application for connection (see How to Launch and Connect to Your Application for more information). 
 
- Specify a Start After condition. This parameter defines how and when the trace is started. The options are mutually exclusive.
  - Manual Trigger: Specifies that the trace is manually triggered by the user through the host application or the Target application trigger hotkey on the running application. 
  - Frame Count: The trace automatically starts after a select number of frames have elapsed. Frame boundaries are defined based on presents. If set to 0, tracing starts on the first present. 
  - Submit Count: The trace automatically starts after a select number of submits have been performed. If set to 0, tracing starts on the first submit. 
  - Elapsed Time: The trace automatically starts on the first present or submit call that occurs after the specified amount of time has elapsed on the CPU since the first present or submit call. If set to 0, tracing starts once data collection is ready. 
 
- Enable Real-Time Shader Profiler if you want source-level shader performance to be revealed per ~10 usec interval in the Top-Down Calls table and other shader profiler views. Shader performance is collected via a high-speed sampling profiler in the SM hardware, which incurs no shader execution overhead at runtime, and only consumes PCIe TX bandwidth (along with perf counters). When disabled, a more detailed list of SM and L1TEX perf counters is collected. This setting is compatible with Timeline Metrics with a ‘fire’ icon. Requires ‘Collect Shader Pipelines’ to be enabled. This feature is available on NVIDIA Ampere GA10x and newer architectures. 
- Specify a Max Duration for the trace. This parameter defines the maximum duration of the trace on the target application. 
- Specify a Limited To condition. This parameter specifies whether the trace should be limited by events other than the maximum duration. It also affects when the trace starts after the Start After condition has been met.
  - Max Frames: The trace is limited to a set number of frames in addition to the max duration. Specifying this causes the trace to contain frame delimiters on the timeline. The trace starts at the beginning of the next frame once the Start After condition has been met. 
  - Max Submits: The trace is limited to a set number of submits in addition to the max duration. The trace starts on the next submit once the Start After condition has been met. 
  - None: The trace is only limited by the max duration. The trace starts on the next submit once the Start After condition has been met. 
 
Note
GPU Trace consumes a lot of memory, especially in complex frames. When collecting a long trace, make sure there is enough memory to gather all of the data. See Memory Requirements to learn how GPU Trace allocates memory and how you can configure those allocations to suit your specific needs.
| Example: | Starting the trace after a manual trigger may be useful when profiling interactive applications. It makes it possible to interact with the app between traces in addition to allowing traces to be triggered at arbitrary points in time. | 
| Example: | Automatically starting the trace after a select number of submits may be useful when profiling non-interactive applications. The trace will be automatically triggered without any manual action required by the user. | 
- Launch or attach to your application. (See How to Launch and Connect to Your Application for more information.) 
- Once the application is running, various indicators will appear inside the application window, in the GPU Trace target HUD.
  - Data Collection: Indicates the current state of the data collection:
    - Initializing: The target is initializing. 
    - Not Available: Tracing is unavailable, possibly due to an error, or the application is not running on an NVIDIA GPU. 
    - Ready: The target is ready, waiting for the user to trigger it. 
    - Waiting: The target is ready, waiting for the start conditions to be met. 
    - Tracing: The target is in the process of collecting data. 
  - Independent Flip (Windows only): Indicates whether the optimal presentation path on Windows (I-Flip) is currently taken or not. I-Flip is available in full-screen-exclusive mode, or in windowed mode when the GPU+driver supports Multi-Plane Overlay (MPO). 
  - Cycle Position: Indicates the current hotkey that cycles the position of the HUD. Can be used to change the position of the HUD if it hides important parts of the application. 
  - Background Compiles: Indicates the current state of background shader compilation in the graphics driver. If you’re attempting to profile a stable scene (unmoving camera), waiting for background compiles to reach an Inactive state will yield more stable and representative performance numbers.
    - Inactive: No background shader compiles are in progress. 
    - Active: Background shader compilation is in progress. 
 
- If the application successfully connected, the process name will appear in the lower-right corner of the window. If you configured the trace to start after a Manual Trigger then you can collect a new trace by clicking the Collect GPU Trace button or by pressing the Target application trigger hotkey on the running application. If you configured the trace to start automatically after a given condition is met, then wait for the collection to complete. 
 
Note
GPU Trace collects all GPU activity. Therefore, it is preferred that you run the application on a remote machine and/or turn off all other applications while capturing.
Note
For best accuracy, it is recommended that you run your application in full-screen mode, wait for Independent Flip to be engaged (Windows only), and turn off V-Sync. You can turn V-Sync off from your application or set V-Sync Mode to Off in the activity dialog (this is the default option).
Note
By default, GPU Trace will lock the GPU clock to base before capturing. This methodology is recommended so consecutive reports will be comparable.
- After a trace is collected, a popup will appear that allows you to open the report and optionally terminate the application. 
 
Note
It is recommended that you close the application after collection, in order to free up your system’s memory while exploring the report.
How to Interpret a Report
When interpreting a report, refer to the GPU Trace UI section for details on each piece of information the report provides. Things to consider:
- Am I GPU bound? 
- Am I using asynchronous compute? 
- Do I have opportunities for asynchronous compute? 
- What workloads are taking the most time? 
- Is my occupancy low for these workloads? 
If you determine that you have opportunities for asynchronous compute and you are not currently using (or achieving) async compute, you may want to investigate your engine to understand where or how you can achieve it.
If you determine that you have expensive workloads with low occupancy, you should analyze your shader for opportunities to reduce work or reduce register/memory usage to allow for more occupancy.
Generate GPU Trace Capture from a Command Line
To learn how to generate a GPU Trace capture, start by launching the CLI with the --help-all argument, which displays all available options.
The CLI can launch an application for generating GPU Trace capture in the form:
ngfx.exe --activity="GPU Trace Profiler" [general_options] [GPU_Trace_activity_options]
See CLI Arguments Details for details on the general options. The GPU Trace activity options are:
| Option | Description | 
|---|---|
|  | Wait arg frames (where each frame is defined as a present) before starting the GPU trace. If set to 0, tracing starts immediately and does not wait for the first present to start counting. Mutually exclusive with other start options. |
|  | Wait arg submits before generating a GPU Trace capture. Mutually exclusive with other start options. |
|  | Wait arg milliseconds after the first submit before generating a GPU Trace capture. Mutually exclusive with other start options. |
|  | The trace is expected to be triggered by pressing the Target application capture hotkey on the running application. If enabled, the options for waiting a number of frames or an amount of time are ignored. |
|  | The maximum duration of the trace in milliseconds. |
|  | Trace a maximum of arg frames (also limited by duration). Mutually exclusive with other limit-to options. |
|  | Trace a maximum of arg submits (also limited by duration). Mutually exclusive with other limit-to options. |
|  | The amount of event buffer memory (kB) to allocate per API device. |
|  | The amount of memory (kB) to allocate for recording hardware events. |
|  | Automatically export metrics data after generating a GPU Trace capture. |
|  | How many frames (1-15) to capture (may be limited by memory availability). |
|  | Path to a JSON file that contains per-architecture configuration. The file should be structured as a top-level array containing objects that specify "architecture", "metric-set-name" or "metric-set-id", and "multi-pass-metrics", for example: [{"architecture": "Turing", "metric-set-name": "Throughput Metrics", "multi-pass-metrics": "true"}, {"architecture": "Ampere GA10x", "metric-set-id": "0"}] NOTE: The available architectures and metric sets can be found in the help message. |
|  | Select which architecture the arch-specific options configure. NOTE: The available architectures can be found in the help message. |
|  | Select the metric set index to use with the given architecture. NOTE: The available metric set indices (and the corresponding metric set names) can be found in the help message. |
|  | Select the metric set name to use with the given architecture. NOTE: The available metric set names (and the corresponding metric set indices) can be found in the help message. |
|  | Enable Multi-Pass Metrics, which will collect additional data over multiple passes. |
|  | Lock GPU clocks during trace. Available options: unaltered, base, boost. |
|  | By default, VK_PIPELINE_COMPILE_REQUIRED is forcefully returned for pipelines being created that reference shader modules through identifiers from VK_EXT_shader_module_identifier. Disabling this setting while using VK_EXT_shader_module_identifier may lead to shader source correlation being absent for shader modules using identifiers. |
|  | Forcefully disables the D3D12 debug layer, even if the application requested for it to be enabled. |
|  | Forcefully enables the D3D12 driver to perform background optimizations on shaders even if it detects that the additional CPU overhead may impact the application framerate. This is analogous to calling ID3D12Device6::SetBackgroundProcessingMode with D3D12_BACKGROUND_PROCESSING_MODE_ALLOW_INTRUSIVE_MEASUREMENTS. Enabling this setting will cause application calls to the API to be overridden. |
|  | Disables NVTX ranges. NVTX ranges allow the application to create markers around queue submits using the NVTX API; however, this may introduce additional overhead. |
|  | When enabled, individual actions will be timed separately instead of being coalesced with adjacent actions of the same kind. Enabling this option will result in a performance penalty. |
|  | When enabled, source-level shader performance will be revealed per ~10 usec interval in the Top-Down Calls table and other shader profiler views. Shader performance is collected via a high-speed sampling profiler in the SM hardware, which incurs no shader execution overhead at runtime, and only consumes PCIe TX bandwidth (along with perf counters). When disabled, a more detailed list of SM and L1TEX perf counters will be collected. |
|  | Disable collection of Shader Pipelines. If Shader Pipeline collection is enabled, the Shader Pipelines view will list all PSOs, shader code will be available for browsing, the Top-Down Calls tree will contain the static inlined call graph, and the Ray Tracing Live State view will be populated. When Shader Pipeline collection is disabled, the Shader Pipelines, Shader Source, and Top-Down Calls trees will be empty. |
|  | Disable collection of external shader debug info in the trace. If collection of external shader debug info is enabled, GPU Trace will try to resolve debug information for shaders without pre-embedded debug information on the target application. Any found debug information will be embedded in the trace report. |
|  | The sampling interval in cycles, for the hardware sampling profiler in the SM. This is referenced to the GPC clock frequency, which may run at the boost clock. If you see a warning message that the hardware dropped samples, try increasing this interval. Must be a power of 2; minimum 32. |
|  | The maximum background traffic incurred by PM Counters and Warp State Sampling. |
|  | Disable collection of shader hashes as part of the trace. |
|  | Enable keep going mode to collect another trace after the current trace is done. When enabled, traces continue to be collected until you manually terminate the process or request an exit (Ctrl+C). |
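The per-architecture configuration file described in the table can be generated rather than written by hand. The following is a minimal Python sketch, not part of Nsight Graphics: the function name `build_arch_config` is hypothetical, the keys and example values are copied from the option description, and the rule that each entry carries exactly one of "metric-set-name" or "metric-set-id" is an assumption drawn from the word "or" in that description.

```python
import json

# Keys named in the option description above.
ALLOWED_KEYS = {"architecture", "metric-set-name", "metric-set-id", "multi-pass-metrics"}


def build_arch_config(entries):
    """Check each per-architecture entry and return the JSON text for the file."""
    for entry in entries:
        unknown = set(entry) - ALLOWED_KEYS
        if unknown:
            raise ValueError(f"unexpected keys: {sorted(unknown)}")
        if "architecture" not in entry:
            raise ValueError("each entry needs an 'architecture'")
        # Assumption: exactly one of name/id per entry (the guide says "or").
        if ("metric-set-name" in entry) == ("metric-set-id" in entry):
            raise ValueError("give exactly one of 'metric-set-name' or 'metric-set-id'")
    # The file is a top-level array of objects.
    return json.dumps(entries, indent=4)


# Values mirror the example in the table; check them against the CLI help output.
config_text = build_arch_config([
    {"architecture": "Turing", "metric-set-name": "Throughput Metrics", "multi-pass-metrics": "true"},
    {"architecture": "Ampere GA10x", "metric-set-id": "0"},
])
```

Write `config_text` to a file of your choice and pass that path to the corresponding CLI option. The architecture and metric set names that are valid on your system are listed in the CLI help message.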
Examples:
- ngfx.exe --activity="GPU Trace Profiler" --platform="Windows" --exe=D:\Bloom\bloom.exe --start-after-ms=10000 --architecture=Ada --metric-set-id=1 - Launch an application and automatically generate a GPU Trace capture after waiting the specified number of milliseconds, using metric set index 1 on the Ada GPU architecture.
- ngfx.exe --activity="GPU Trace Profiler" --platform="Windows" --exe=D:\Bloom\bloom.exe --start-after-hotkey --auto-export - Launch an application and generate a GPU Trace capture when manually triggered. The CLI waits for the capture to be triggered from the target side (by pressing the Target application capture hotkey on the running application). After the capture is finished, the CLI also opens the generated GPU Trace capture and exports the metrics data.
- ngfx.exe --activity="GPU Trace Profiler" --platform="Windows" --project="D:\Projects\Bloom.ngfx-proj" --start-after-ms=10000 - Launch an application and automatically generate a GPU Trace capture after waiting the specified number of milliseconds, using the launch options and activity options read from an Nsight Graphics project.
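For scripted runs, the command lines above can be assembled as an argument list. The following Python sketch is illustrative only: the function name `gpu_trace_command` is hypothetical, and it uses only flags that appear in the examples above. It rejects combining a millisecond delay with the hotkey trigger, which is stricter than the CLI (the table states that the hotkey option causes the wait options to be ignored).

```python
def gpu_trace_command(exe, start_after_ms=None, start_after_hotkey=False,
                      architecture=None, metric_set_id=None, auto_export=False,
                      ngfx="ngfx.exe", platform="Windows"):
    """Return an argument list for a GPU Trace capture.

    Each list element is passed as one argument, so the activity name
    needs no shell quoting even though it contains spaces.
    """
    if start_after_ms is not None and start_after_hotkey:
        # Sketch-level policy: pick a single start condition.
        raise ValueError("choose either start_after_ms or start_after_hotkey")
    args = [ngfx, "--activity=GPU Trace Profiler", f"--platform={platform}", f"--exe={exe}"]
    if start_after_ms is not None:
        args.append(f"--start-after-ms={int(start_after_ms)}")
    if start_after_hotkey:
        args.append("--start-after-hotkey")
    if architecture is not None:
        args.append(f"--architecture={architecture}")
    if metric_set_id is not None:
        args.append(f"--metric-set-id={int(metric_set_id)}")
    if auto_export:
        args.append("--auto-export")
    return args


# Mirrors the first example above.
cmd = gpu_trace_command(r"D:\Bloom\bloom.exe", start_after_ms=10000,
                        architecture="Ada", metric_set_id=1)
```

The resulting list can be handed to `subprocess.run(cmd)` on a machine where `ngfx.exe` is on the path.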
Shader Debugger
Nsight Graphics provides a fully hardware accelerated Shader Debugger for debugging your shaders as they execute on the GPU. This can be used on Ampere and later hardware, and runs on your live application, not during a capture/replay scenario like in the Frame Debugger activity. The Shader Debugger functions similarly to how CPU source debuggers do, including the ability to put breakpoints in your source with optional conditionals based on variables, step in/over/out from the current PC, view shader program state via Locals and Watch windows, and select the warp/thread to debug via the Warp Info and Focus Picker windows.
Note
Shader debugging is currently only supported for the Vulkan API.
When to use the Shader Debugger Activity
Use the Shader Debugger activity whenever you need to better understand how your shader code is executing on the GPU. For example:
- You have written a new section of code and want to step through it in order to ensure your algorithm functions as you expect it to. 
- You see a rendering anomaly and want to better understand what may have caused it. 
Basic Workflow
To start this activity, select Shader Debugger from the Connection dialog.
 
The basic workflow for the Shader Debugger activity is to launch your application and then use the Shaders View to select the pipeline and shader you are interested in debugging. Once you have that selected, you can put breakpoints in the shader and step through the code, inspect variable values, etc.
Note
Shaders must be generated with debug information in order to debug at a source level. Please refer to the Shader Debugger Setup section for additional setup details.
Configuring The Shader Debugger Activity
The Shader Debugger enforces certain system configuration requirements, and will require the user to configure the system if those requirements are not met. Refer to Shader Debugger Setup to find out more about system requirements.
 
GPU Selection
The Shader Debugger activity allows the user to select the GPU that the application should run on. The launched application will only be able to create a graphics API device context on the selected GPU.
Note
When debugging on the local system, the selected target GPU must not be the same GPU that is running the Nsight Graphics host.
Debug Shaders
The debug compilation of shaders heavily impacts both overall shader compilation time and the application’s runtime performance. The Shader Debugger allows the user to control the strategy of when the NVIDIA driver will compile shaders as debug (full symbol information, no optimizations).
The supported modes are:
- Just-In-Time Replacement
  - Shaders will only be compiled as debug when the user creates a breakpoint in a shader’s source. When creating a breakpoint, a new shader/pipeline will be created that will replace the usage of the existing shader/pipeline.
- Just-In-Time Replacement + Source Recompilation
  - Same as Just-In-Time Replacement, but the shader’s source will first be recompiled for use by the replacement shader/pipeline.
- Always
  - All shaders will be compiled as debug by the NVIDIA driver at load time. This mode has the highest impact on load times and runtime performance. This mode is practical for debugging shaders that trigger exceptions on first use, or for users who wish to be able to readily inspect warp state for any user shaders that are currently executing, not just ones stopped at a breakpoint. This mode should be used with caution.
When either Just-In-Time Replacement or Just-In-Time Replacement + Source Recompilation is selected, the user may also select a Replacement Strategy. The Replacement Strategy determines the scope of pipeline/shader replacement that will occur when a breakpoint is created.
The supported strategies are:
- Selected Shader Only
  - Only the specific shader/pipeline will be replaced with a debug equivalent, even if the shader was used by multiple pipelines.
- All Usages of Selected Shader
  - All shaders/pipelines that utilize the selected shader will be replaced with a debug equivalent. A shader is equivalent if it shares the same UID.
- All Shaders
  - All shaders will be replaced with a debug equivalent once a breakpoint is created. This is functionally similar to the Always mode, but will not impact the application’s initial runtime performance.
Note
Detaching the host or removing all breakpoints reverts all shader replacements to their original shaders.
Debug Shader Overrides allows the user to control when a specific shader should be compiled as debug at load time. This field accepts a comma-delimited list of shader App Hashes or debug names. This activity setting can be managed by right-clicking shaders in the Shaders View to add or remove entries.
Generate C++ Capture
The C++ Capture activity allows you to export an application frame as C++ code to be compiled and run as a self-contained application for later analysis, debugging, profiling, regression testing, and edit-and-compile experimentation.
When to Use the Generate C++ Capture Activity
While C++ captures can be collected while Frame Debugging, the C++ capture activity provides a focused activity to streamline the creation of captures. Unnecessary analysis subsystems are turned off to allow for the quickest and most robust application capture. This activity is an excellent way to save a snapshot of your application, frozen in time. Use this activity when:
- You want to save a deterministic application for follow-up performance analysis. 
- You want to save a reference point for how your application is working. 
- You want to share a minimal reproducible case with the developer tools or driver teams at NVIDIA to facilitate bug reporting.
The Generate C++ Capture activity supports all APIs that are generally supported by Nsight Graphics.
Basic Workflow
To start this activity, select Generate C++ Capture from the connection dialog.
 
Once the application is running, the Generate C++ Capture button will be available on the main toolbar.
 
Once a capture is started, the target application will temporarily pause, and a progress dialog will be shown detailing the steps of the export to C++ process. When complete, the C++ project is written to the disk and the application will resume.
 
By default, the save directory is co-located beside the current project. If no project is currently loaded, the default save directory is used (see Options > Environment > Default Documents Folder).
In addition to the C++ project, the code generation process also produces an ngfx-cppcap file with additional information and utilities. These ngfx-cppcap files are automatically associated with the current project and can be reopened later.
 
The additional features of an ngfx-cppcap file include:
- Screenshot of the capture taken from the original application. 
- Information about the captured application and its original system. 
- Statistics about the captured API stream. 
- Utilities to build the C++ capture without opening the generated Visual Studio project. 
- Utilities to launch the compiled application:
  - The Execute button launches the compiled executable.
  - The Connect… button populates a new connection dialog that allows you to run a specific activity on the generated capture.
- User comments that are persisted within this file. 
Using a Saved Capture
- To use the saved capture, use Visual Studio’s Open Folder capability on the directory that was generated. After doing this, Visual Studio reads the CMakeLists.txt and allows you to build and run the executable. Alternatively, if you are using a version of Visual Studio that is earlier than 2017, and it does not support native CMake loading, you can use a standalone CMake tool to generate the projects for your version of Visual Studio.
- These solution files contain a number of generated source files.
  - Main.cpp — This is where all of the initialization code is called, resources are created, and each frame portion is called in a message loop.
  - ResourcesNN.cpp — Depending on the number of resources to be created, there are multiple ResourcesNN.cpp files, each with a CreateResourcesNN call in them, that construct all of the resources (device(s), textures, shaders, etc.) that are used in the scene. These are called in Main.cpp before replaying the frame in the message loop.
  - FrameSetup.cpp — This file contains all of the state setting calls to set the API state to the proper values for the beginning of the frame, including what buffers are bound, which shaders are enabled, etc.
  - FrameNPartMM.cpp — In Direct3D and single-threaded OpenGL captures, these files contain the API functions, each named RunFrameNPartMM(), to replay the frame. It is split into multiple files so generated code is easier to work with. These functions are called sequentially in the message loop in Main.cpp.
    Note: In this scenario, both N and MM are placeholders for numbers in the multiple files generated. FrameN will typically be Frame0 since only a single frame is captured, and PartMM will typically be in the 00-05 range, depending on how many API calls are in the frame.
  - ThreadLLFrameNPartMM.cpp — In multi-threaded OpenGL captures, these files contain the API functions, each named ThreadLLRunFrameNPartMM(), to replay the frame. The functions correspond to the work done by each thread during the frame. These functions are called by their respective threads and synchronized to replay the saved events in the same order as captured.
  - ReadOnlyDatabase.cpp — This is a helper class to access resource data that is stored in the data.bin file. It is accessed throughout the code via the GetResource() call.
  - Helpers.cpp — These functions are used throughout the replayer for various conversions and access to the ReadOnlyDatabase.
  - Threading.cpp — This file contains helper functions and classes to manage threads used in the project.
- Build and run the project.
Building a Saved Capture for a Different Platform
For Vulkan applications, Nsight Graphics currently supports building and running the saved capture for a different platform. For example, you may have a capture saved on Windows that you would like to run on Embedded Linux.
Nsight Graphics supports cross-platform conversion on a best-effort basis. However, there are platform-specific considerations and mechanisms outside Vulkan, and some platform-specific extensions and functions may not convert properly for a different platform. Accordingly, the runtime compatibility of a cross-platform converted capture cannot be guaranteed.
To use a capture saved on a different platform, open the ngfx-cppcap file in Nsight Graphics through File > Open File…, and click the Build button. Nsight Graphics automatically determines the default compile options for you.
Alternatively, you can manually build a saved capture from a command line, but you need to explicitly specify the compile options:
- NV_TARGET_PLATFORM: select target platform to build for 
- NV_WINSYS: select window system type to interact with graphics API 
- NV_SDKDIR: specify the path to SDK directory (only required when building for Embedded Linux) 
Examples:
- Build a saved capture for Embedded Linux and use x11 as the window system:
  mkdir -p int && cd int && cmake -G "Unix Makefiles" -DNV_INSTALL_FOLDER=bin -DNV_TARGET_PLATFORM=LINUX_EMBEDDED -DNV_WINSYS=x11 -DNV_SDKDIR=<SDK_PATH> .. && cmake --build . --config RelWithDebInfo --target install -- -j13
- Build a saved capture for Windows:
  MD int && CD int && cmake -G "Visual Studio 16 2019" -A x64 -DCMAKE_GENERATOR_INSTANCE="C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional" -DNV_INSTALL_FOLDER=bin .. && cmake --build . --config RelWithDebInfo --target install
- Build a saved capture for Linux Desktop and use xcb as the window system:
  mkdir -p int && cd int && cmake -G "Unix Makefiles" -DNV_INSTALL_FOLDER=bin -DNV_TARGET_PLATFORM=LINUX_DESKTOP -DNV_WINSYS=xcb .. && cmake --build . --config RelWithDebInfo --target install -- -j13
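The Linux examples above differ only in a few CMake definitions, so the configure step lends itself to scripting. The following Python sketch builds the configure argument list from the compile options listed earlier. The function name `cmake_configure_args` is hypothetical, the generator default is taken from the Linux examples, and the check on NV_SDKDIR reflects the statement that it is only required for Embedded Linux.

```python
def cmake_configure_args(target_platform=None, winsys=None, sdk_dir=None,
                         generator="Unix Makefiles", install_folder="bin",
                         source_dir=".."):
    """Return the cmake configure arguments for a saved capture."""
    if target_platform == "LINUX_EMBEDDED" and not sdk_dir:
        raise ValueError("NV_SDKDIR is required when building for Embedded Linux")
    args = ["cmake", "-G", generator, f"-DNV_INSTALL_FOLDER={install_folder}"]
    if target_platform:
        args.append(f"-DNV_TARGET_PLATFORM={target_platform}")
    if winsys:
        args.append(f"-DNV_WINSYS={winsys}")
    if sdk_dir:
        args.append(f"-DNV_SDKDIR={sdk_dir}")
    # The examples run cmake from a build subdirectory, so the source is "..".
    args.append(source_dir)
    return args


# Mirrors the Linux Desktop example above.
desktop = cmake_configure_args("LINUX_DESKTOP", "xcb")
```

Run the returned list from inside the build subdirectory (for example with `subprocess.run(desktop, cwd="int")`), then follow it with the `cmake --build` step shown in the examples.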
Changing a Resource
If you want to change a resource (for example, to swap in a different texture), look in the ResourcesNN.cpp files for the texture in question and change its construction parameters. Textures can be matched by size and/or format. Once you find the variable for the texture, look for that name in the FrameSetup.cpp file. This file contains source lines that lock the texture, call GetResource() to retrieve the data from the ReadOnlyDatabase, and then call memcpy(…) to copy the data into the texture. To load an alternate texture, you can replace the call to the ReadOnlyDatabase with a call that reads from a file of your choice.
Changing a Draw Call
If you want to change the state for a given draw call, you can locate the draw call by replaying the capture within Nsight Graphics and scrubbing to find the call you want to examine. Search in the FrameNPartMM.cpp files for Draw NN, where NN is the 0-based draw call index that Nsight Graphics displayed on the scrubber. Doing this brings you to the source line for that draw call, and from here, you can add any state changes before that call. Alternatively, you can also disable that specific call by commenting out the source call containing the draw call.
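In a large capture, the search for a draw call can be scripted. The following Python sketch scans the FrameNPartMM.cpp files for the text "Draw NN" described above. The helper names are hypothetical, and the exact form of the marker in the generated code is assumed to be the literal word "Draw" followed by the index. The pattern uses word boundaries so that a search for draw 1 does not also match draw 10.

```python
import re
from pathlib import Path


def find_draw_marker(sources, draw_index):
    """Return (file name, 1-based line number) pairs that mention 'Draw <index>'.

    `sources` maps a file name to that file's text.
    """
    pattern = re.compile(rf"\bDraw {int(draw_index)}\b")
    hits = []
    for name in sorted(sources):
        for line_no, line in enumerate(sources[name].splitlines(), start=1):
            if pattern.search(line):
                hits.append((name, line_no))
    return hits


def load_frame_sources(capture_dir):
    """Read every Frame*Part*.cpp file in the capture directory into a dict."""
    return {p.name: p.read_text(errors="replace")
            for p in Path(capture_dir).glob("Frame*Part*.cpp")}
```

A typical call is `find_draw_marker(load_frame_sources("path/to/capture"), 42)`, where the path is the generated capture directory. Multi-threaded OpenGL captures use ThreadLLFrameNPartMM.cpp files, which this glob pattern does not cover.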
Parameters
- -repeatN — This setting enables Nsight Graphics to use serialized captures in the normal arch workflow. The N setting indicates the number of times to repeat the entire capture; the default setting is -1, which keeps the capture running on an infinite loop.
- -noreset — This setting controls whether context state and all resources are reset to their beginning-of-frame values. When this setting is specified, all frame restoration operations are skipped, avoiding the performance cost associated with them. Note that this may introduce rendering errors if the rendered frame has a data dependency on the results of a previous frame. Additionally, while uncommon, skipping frame restoration can lead to application crashes.
Generate C++ Capture from a Command Line
To learn how to generate a C++ capture, start by launching the CLI with the --help-all argument, which displays all available options.
The CLI can launch an application for generating C++ capture in the form:
ngfx.exe --activity="Generate C++ Capture" [general_options] [Generate_C++_Capture_activity_options]
See CLI Arguments Details for details on the general options. The Generate C++ Capture activity options are:
| Option | Description | 
|---|---|
| --wait-seconds arg | Wait time (in seconds) before capturing a frame. |
| --wait-hotkey | The capture is expected to be triggered by pressing the Target application capture hotkey on the running application. If enabled, the option for waiting in seconds is ignored. |
Examples:
- ngfx.exe --activity="Generate C++ Capture" --platform="Windows" --exe=D:\Bloom\bloom.exe --wait-seconds=10 - Launch an application and automatically generate a C++ capture after waiting the specified number of seconds.
- ngfx.exe --activity="Generate C++ Capture" --platform="Windows" --exe=D:\Bloom\bloom.exe --wait-hotkey - Launch an application and generate a C++ capture when manually triggered. The CLI waits for the capture to be triggered from the target side (by pressing the Target application capture hotkey on the running application).
- ngfx.exe --activity="Generate C++ Capture" --platform="Windows" --project="D:\Projects\Bloom.ngfx-proj" --wait-seconds=10 --no-timeout - Launch an application and automatically generate a C++ capture after waiting the specified number of seconds, using the launch options and activity options read from an Nsight Graphics project. In addition, --no-timeout disables all timeouts in case the application takes a long time to launch or capture.
System Trace
To start this activity, select System Trace from the connection dialog.
 
The System Trace activity is a special activity that connects to the Nsight Systems™ tool. See the Expected Workflow section to understand how Nsight Systems can fit into your profiling and optimization workflow.
When a compatible version of Nsight Systems is present on the system, the System Trace activity offers a connection to Nsight Systems to automatically populate a system trace activity on the external tool with settings from the Nsight Graphics connection dialog. This allows for easy saving of project properties in a single location.
When a compatible version of Nsight Systems is not present on the system, the System Trace activity offers a convenient link to download the tool. Nsight Graphics must be restarted to discover any newly installed versions of Nsight Systems.
User Interface Reference
This section provides a deep view of all of the user interface elements and views that Nsight Graphics offers.
App Configuration and Activity Selection UI
Launch Tab
The Launch tab enables launching applications for analysis. This is where you add the basic process information to launch and subsequently connect to the application you wish to analyze.
This tab has the following controls:
- Application Executable — Specifies the root application to launch. Note that this may not be the final application that you wish to analyze. Reference this field using $(ApplicationExecutable), or its parent directory using $(ApplicationDir).
- Working Directory — The directory in which the application is launched. By default the working directory is set to the application directory. Reference this field using $(WorkingDir).
- Command Line Arguments — Specify the arguments to pass the application executable. 
- Environment — The environment variables to set in the launched application. 
- Automatically Connect — Specifies whether the launched application should be automatically connected to. If the launched application is a launcher that creates the process that you ultimately wish to analyze, set this to ‘No.’ 
The following variables can be used in any of the Launch tab fields:
- $(ProjectDir) — Refers to the directory in which the current project is saved.
- $(ApplicationExecutable) — Refers to the value in the Application Executable field.
- $(ApplicationDir) — Refers to the parent folder of the Application Executable.
- $(WorkingDir) — Refers to the value in the Working Directory field. 
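To make the substitution concrete, the following Python sketch mimics how these $(Name) variables could expand in a field value. It illustrates the documented meaning of each variable and is not Nsight Graphics' implementation. The function names and the sample paths are hypothetical, and leaving unknown variables untouched is an assumption.

```python
import re
from pathlib import PureWindowsPath


def launch_variables(project_file, application_executable, working_dir=None):
    """Compute the four Launch tab variables from the field values."""
    app = PureWindowsPath(application_executable)
    app_dir = str(app.parent)
    return {
        "ProjectDir": str(PureWindowsPath(project_file).parent),
        "ApplicationExecutable": str(app),
        "ApplicationDir": app_dir,
        # By default the working directory is the application directory.
        "WorkingDir": working_dir or app_dir,
    }


def expand(text, variables):
    """Replace each $(Name) with its value; unknown names are left as written."""
    return re.sub(r"\$\((\w+)\)",
                  lambda m: variables.get(m.group(1), m.group(0)),
                  text)


variables = launch_variables(r"D:\Projects\Bloom.ngfx-proj", r"D:\Bloom\bloom.exe")
arguments = expand(r"--config $(ApplicationDir)\settings.ini", variables)
```

Here `arguments` holds the command-line text with `$(ApplicationDir)` replaced by the parent folder of the executable.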
Note
Several fields have a selector to allow you to cycle through recently used entries. This is a useful capability for cycling through common configurations.
 
Attach Tab
To attach to an application, it must have previously been launched through the Launch tab. This page lists the launched application as well as any child processes that the application has launched.
Note
If the host disconnects for any reason, and the target happens to still be running, you can reattach to the previously launched or even captured application by using the attach facility. The process does not have to be newly relaunched.
Activities Options
Nsight Graphics allows for adjusting the activity with a large set of options. Options are available in the Connect window under the Additional Options section. These options are saved per-project, and per-activity, because the options for one activity may not relate to the other. Note that you may need to apply them to multiple activities if your needs for each activity are the same.
| Option | Description | 
|---|---|
| Capture Mode | Selects how a capture is triggered. |
| Enable Target HUD | Enables the HUD on the target application. |
| Force Repaint | Enables a periodic trigger of window invalidation, which causes applications that lazily present to repaint, such as many professional visualization applications. This is useful for providing a consistent stream of frames with which Nsight Graphics can perform its analysis. | 
| Option | Description | 
|---|---|
| Frame Delimiter | Select the API call used to delimit frame boundaries for OpenGL applications. This setting is useful for applications that do not necessarily present to a screen, such as offscreen rendering applications or benchmark applications. | 
| Option | Description | 
|---|---|
| Synchronous Shader Collection | Controls the extent of information that is collected for D3D11 shaders. Synchronous collection is necessary for some shader related statistics but may introduce increased application loading time. Synchronous collection also requires that the application has been started with administrative privileges. | 
| D3D12 Replay Fence Behavior | Choose the behavior when encountering a sync point during D3D12 replay. |
| DXGI SyncInterval | Controls the SyncInterval value passed to the DXGI Present method. The default is to disable V-Sync to allow the debugger to collect valid real-time counters. | 
| Enable Revision Zero Data Collection | Controls the collection of revision zero (e.g., pre-capture) data during capture. This is potentially an expensive operation, in both memory and processing time, and some applications can replay a single frame without explicitly storing these revisions. |
| Replay Captured ExecuteIndirect Buffer | When enabled, replays the application’s captured ExecuteIndirect buffer instead of a replay-generated buffer. Consider this option if your application has rendering issues in replay that derive from a non-deterministic ExecuteIndirect buffer; for example, one that is generated based off of atomic operations that can vary from frame-to-frame. | 
| Report Force-Failed Query Interfaces | Controls whether failed query interfaces are reported to a user with a blocking message box. Nsight Graphics is an API debugger, and there may be some APIs that it does not yet support or does not yet know about. When such an interface is queried, the interception will force the failure of the operation with an E_NOINTERFACE return code. If this operation interferes with normal operation, and otherwise would result in no issues, it may be disabled for the project. | 
| Report Unknown Objects | Controls whether unknown objects are reported to a user with a blocking message box. Some applications pass objects that are unknown to Nsight Graphics. These objects may be indicative of an application bug, lack of support in the product’s interception, or they may ultimately be innocuous. In many cases, such an unknown object may result in an analysis crash. To mitigate this issue, Nsight Graphics warns about this concern with a blocking message box. If this operation interferes with normal operation, and otherwise would result in no issues, it may be disabled for the project. | 
| Support Cached Pipeline State | Controls whether cached pipeline state is supported. By default, Nsight Graphics will reject calls to create or load a cached pipeline state object. Setting this option to true will enable support for these objects. | 
| Force Minimal Shader Bind Tables | Controls whether minimal shader bind tables are forced. By default, Nsight Graphics will create replay shader bind tables matching the size specified in the source application. If this option is enabled, it will use the last valid, non-null record in the SBT to override the replay buffer size. This is not universally safe since indexing null records is a valid usage. |
| Option | Description | 
|---|---|
| Force Validation | Force the Vulkan validation layers to be enabled. This requires the LunarG Vulkan SDK to be installed. | 
| Validation Layers | Layers used when force enabling validation. This option is only visible when ‘Force Validation’ is turned on. | 
| Device Address C++ Support | Enables buffer device address capture/replay support so that buffers with VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT have the same address between capture and C++ replay. Enabling this option may lead to instability due to different allocation methods used by the driver. This option is only needed when generating C++ captures of applications that indirectly access a buffer address within a shader. | 
| Force Capture/Replay Device Memory | Forces the VK_MEMORY_ALLOCATE_DEVICE_ADDRESS_BIT and VK_MEMORY_ALLOCATE_DEVICE_ADDRESS_CAPTURE_REPLAY_BIT bits to be set on all device memory allocations. This is necessary if the application later binds addressable buffers but incorrectly excluded the flag on the associated memory. | 
| Capture All Device Memory for C++ Captures | Includes the entire contents of device memory for C++ captures, as opposed to just the bound contents. This results in a larger capture but might address issues with out-of-bounds memory access. | 
| Enable Coherent Buffer Collection | Controls the monitoring and collection of mapped coherent buffer updates during capture. This is potentially an expensive operation and many applications can replay a single frame without actively monitoring these changes. Use this option if your capture takes a long time but you do not straddle frames with coherent updates. | 
| Enable Revision Zero Data Collection | Controls the collection of revision zero (e.g., pre-capture) data during capture. This is potentially an expensive operation, in both memory and processing time, and some applications can replay a single frame without explicitly storing these revisions. | 
| Allow Unsafe pNext Values | Allows the inspection of Vulkan structures with potentially dangerous pNext values. By default, structures with no known extensions are skipped. | 
| Use Safe Object Lookup | Controls how objects are stored internally by the tool. Safe lookups are slower but may improve stability when using an unsupported extension. | 
| C++ Capture Object Set | Controls which objects are exported as part of a Vulkan C++ capture. By default, the object set is limited to objects used in the capture, but in some cases you might want to see all objects used in the application. This typically isn’t necessary and can lead to a very large C++ project. It might also help work around a bug where the tool incorrectly prunes an object that it should have kept. | 
| Reserve Heap Space | Amount of physical device heap space (MB) to automatically reserve for the frame debugger. | 
| Unweave Threads | For multi-threaded applications, attempts to remove excessive context switching by grouping thread events together. May improve C++ capture replay performance of heavily threaded applications. | 
| Ignore DirectX/OpenGL Wrapper Libraries | To capture an application that uses wrapper libraries atop Vulkan, for example DXVK, set this setting to ‘Yes’ to ignore the wrapper library and capture the underlying Vulkan calls. When set to ‘Auto’, Nsight Graphics will attempt to auto-detect whether wrapper libraries should be ignored. | 
| Enable Vulkan SC Support | Set this setting to ‘Yes’ to launch a Vulkan SC application. | 
| Vulkan SC Reserve Command Buffers | For Vulkan SC applications, frame replay needs additional command buffers. The number is not fixed and must be greater than the draw/dispatch count in the captured frame. A default value is provided; increase it if frame capture fails. | 
| Option | Description | 
|---|---|
| Acceleration Structure Geometry Tracking | This option controls how geometry data is tracked for acceleration structures. There are trade-offs between performance, accuracy, and robustness of any given approach. The default setting of ‘Auto’ is most often implemented in terms of ‘Deep Geometry Copy’, which tries to match the most common application behavior whereby a deep copy is needed. For example, after building an acceleration structure, it is legal for an application to update or destroy the geometry buffers that were used in construction. In this situation, without deep copies of the original data, the tool cannot guarantee full function of the ray tracing inspector, or of C++ capture. If you know that your application does not update or destroy buffers after construction, consider a ‘Shallow Geometry Reference’ option. | 
| Track Acceleration Structure Refits | Controls whether acceleration structure refits should be tracked in addition to builds. This requires additional memory consumption but may result in a C++ capture with performance that is more representative of the original application. | 
| Report Shallow Report Warnings | Controls whether warnings are issued for possible shallow reference validity issues. If an expert user knows that the original acceleration structure input data remains undisturbed they may silence warnings with this setting. | 
| Geometry Collection Pool | Specifies which memory pool to collect geometry data in. Vidmem is generally more performant but may be more limited than sysmem. | 
| Force Ray Tracing Dimensions to Zero | Specifies that width/height/depth dimensions should be forced to zero for ray tracing calls. This allows the acceleration structures and SBTs to be viewed without actually performing the command. This is helpful in situations where replaying the command causes a crash or device lost issue. | 
| Ray Tracing Validation | Enable ray tracing validation layers. By default, the validation layers are off. You may set this option to collect 1) errors or 2) warnings and errors. Any collection information would be logged to the Output Messages window. | 
| Option | Description | 
|---|---|
| Enable Driver Instrumentation | Controls the enablement of capabilities that require driver support. This effectively disables: 
 Disabling this option is the first and best option to try if you run into capture errors as it disambiguates problems quickly given the number of subsystems it turns off. | 
| Collect Shader Reflection | Controls the collection of all information reflected from shader objects. This includes source code, disassembly, input attributes, resource associations, etc. Note: dynamic shader editing is not available when this option is disabled. This option is useful if you suspect an error or incompatibility with a shader reflection tool (such as D3DCompiler.dll or SPIRV-Cross). | 
| Collect SASS | Enable fetch of SASS shaders which can be used to collect shader performance stats. | 
| Collect Line Tables | Enable creation of shader-to-PC line tables used by the shader profiler for source correlation. | 
| Collect Hardware Performance Metrics | Enables the collection of performance metrics from the hardware. | 
| Ignore Incompatibilities | Nsight Graphics uses an incompatibility system to detect and report problems that are likely to interfere with the analysis of your application. By default, these incompatibilities are reported and the user is given the option of capturing despite them (with an associated warning of the possibility of issues). Some applications may have innocuous incompatibilities however, and having to view this warning every time might be undesired. When this option is enabled, the frame will attempt to capture despite any incompatibilities. Use this option only when you are certain that the incompatibility will not impact your analysis. | 
| Block on First Incompatibility | Nsight Graphics uses an incompatibility system to detect and report problems that are likely to interfere with the analysis of your application. In some cases, these incompatibilities may be the first sign of an impending failure. Accordingly, being able to block on such a reported failure may aid in triaging and understanding a crash when running under Nsight Graphics. This option defaults to ‘Auto’ so it only reports critical incompatibilities, allowing lesser incompatibilities so as not to interfere with expected operation. It may be useful to toggle to ‘Enable’ if you encounter an application crash under Nsight Graphics to force an opportunity to investigate the crash. | 
| Enable Crash Reporting | Enables the collection and reporting of crash data to help identify issues with the frame debugger. While a user is always prompted before a crash report is sent, this option is available to suppress these facilities entirely. | 
| Enable C/C++ Serialization | Enables the ability to serialize a capture to C/C++. By default, C++ capture is available for applications, but in some cases extra data is collected in support of this feature before it is invoked. This option allows that collection to be disabled entirely. | 
| Force Single-Threaded Capture | Controls whether capture proceeds with concurrent threads or with serialized threads. Use this option if you suspect your application’s multi-threading may be interfering with the capture process. | 
| Replay Thread Pause Strategy | Controls the strategy used in live analysis for pausing threads. | 
| Disable All Interception | Turns off the interception mechanisms that Nsight Graphics uses to analyze the application. This is used to disambiguate launch failures from failures caused by interception. It is useful for validating that the application settings are correct. When enabled, the associated activity is not expected to work in any way. Please disable this option once you have confirmed that your application launches successfully. | 
Graphics Capture
Overview
This section provides an overview of the tools that are available while debugging a graphics capture.
API Inspector
The API Inspector is a view that offers an exhaustive look at all of the state that is relevant to a particular event to which the capture analysis is scrubbed.
To access this view, go to Graphics Debugger > API Inspector.
 
With API Inspector pages, there is a search bar that offers a quick way of finding the information you need on a particular page. The bar indicates the number of matches in each page, and forward and back navigation buttons are provided for navigating between each match. The buttons also support keybindings, with F3 for next and Shift+F3 for previous.
 
Expand/Collapse
Within an API Inspector page, there are many sections that can be expanded or collapsed to help narrow the information that is displayed to only the information you wish to see at that point in time. While each section can be individually collapsed, the UI has buttons that allow for expanding or collapsing all elements in one click.
 
Export
Each page can be exported to structured data in JSON format. This JSON data includes key-value pairs of the data elements, as well as indirections that indicate the relationships between different kinds of data.
This data is useful in cases where you may want to export data for persistence, or perhaps to run a diff between the data of different events.
 
The view can also export data from all pages. The information in each page is exported to a large, combined file in structured JSON format. To accomplish this, use the Export All Pages to Json button.
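As a rough illustration of running a diff between the exported data of two events, the sketch below flattens two JSON documents into path/value pairs and reports the leaves that differ. The exported schema is not described in this guide, so the keys used here (`Rasterizer`, `CullMode`, `Viewports`) are hypothetical placeholders, and the helper names are illustrative only.

```python
import json


def flatten(node, prefix=""):
    """Flatten nested dicts/lists into a {path: leaf_value} mapping."""
    flat = {}
    if isinstance(node, dict):
        for key, value in node.items():
            flat.update(flatten(value, f"{prefix}/{key}"))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            flat.update(flatten(value, f"{prefix}/{index}"))
    else:
        flat[prefix or "/"] = node
    return flat


def diff_exports(before, after):
    """Return {path: (old, new)} for each leaf that differs.

    A path that is missing on one side is reported with None on that side.
    """
    a, b = flatten(before), flatten(after)
    return {
        path: (a.get(path), b.get(path))
        for path in sorted(a.keys() | b.keys())
        if a.get(path) != b.get(path)
    }


# Hypothetical stand-ins for the same page exported at two different events.
event_a = json.loads('{"Rasterizer": {"CullMode": "BACK", "Viewports": [{"Width": 1920}]}}')
event_b = json.loads('{"Rasterizer": {"CullMode": "NONE", "Viewports": [{"Width": 1920}]}}')

changes = diff_exports(event_a, event_b)
for path, (old, new) in changes.items():
    print(f"{path}: {old!r} -> {new!r}")
```

In practice the two documents would be loaded from exported files with `json.load`; a general-purpose text diff tool works as well if the files are pretty-printed with stable key ordering.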
 
API Statistics View
The API Statistics View is a high-level view of important API calls, and includes information to help you see where GPU and CPU time is spent.
To access this view, go to Graphics Debugger > API Statistics.
 
Current Target View
The Current Target view is used to show the currently bound output targets. This can be useful because it focuses in on the bound output resources, rather than having to search for them in the All Resources view.
To access this view, go to Graphics Debugger > Current Target.
Current Target displays thumbnails along the left pane for all currently bound color, depth, and stencil targets. This view changes as you scrub from event to event. All of the thumbnails on the left can be selected to show a larger image on the right. You can also click the link below each to open the target in the Resource Viewer.
 
Descriptors
The Descriptors view shows descriptors that are relevant to the current event.
To access this view, go to Graphics Debugger > Descriptors.
 
The Descriptors view shows API-appropriate Descriptor objects and the resources that they refer to. The properties of each descriptor (and the resources they refer to) are listed on the right.
Event Details
The Event Details view shows all parameters for the current event in a hierarchical tree structure that allows for searching.
To access this view, go to Graphics Debugger > Event Details.
 
Because this window shows parameters for the current event, it changes as you navigate the scene. If you wish to keep the parameters for comparison against another call, the view supports Clone and Lock capabilities.
| Column | Description | 
|---|---|
| Name | The name of the parameter or child of a parameter structure. | 
| Value | The value passed in to the API call for this parameter. | 
| Type | The type of the parameter in question. | 
For events that reference API objects, the Event Details view provides a link to examine more information about that object in the Object Browser.
Event Viewer
The Events view shows all API calls in a captured frame. It also displays both CPU and GPU activity, as a measurement of how much each call “costs.”
To access this view, go to Graphics Debugger > Events.
 
The visibility of columns can be toggled by right-clicking on the table’s header. By default some columns are hidden if they offer no unique data (e.g., single thread) for the captured frame.
| Column | Description | 
|---|---|
| (Indicator Column) | Points to the currently scrubbed event | 
| Event | The event ID of the API call, operation, or comment for this event. On the left-hand-side, when perfmarkers are used, a perfmarker stack is visually indicated and the ID indicates the range of events contained within this hierarchy level. Hovering over this cell shows the full perfmarker hierarchy. When the event in question is an action, the right-hand-side contains a link to open the event in the API Inspector. | 
| Description | Describes the API call, operation, or comment. This column hierarchically lists the event hierarchy as controlled by the usage of Marker APIs. | 
| Object | The API object (context, queue, resource, etc.) that the event operated on or was issued by. | 
| CPU ms | CPU timing for the event in question. When this is a perfmarker range, this indicates an aggregate summation. These timings are provided for reference, as they are not fully accurate due to the impact of capture operations on their timing. Note that tracing tools like Nsight Systems™ are targeted for higher resolution timing. | 
| Thread | The thread that performed the API call, operation, or comment. | 
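The aggregate CPU time shown for a perfmarker range can be pictured as a sum over everything nested inside that range. The sketch below assumes a simple hypothetical tree of events (the `name`, `cpu_ms`, and `children` keys are illustrative and are not an Nsight Graphics data format).

```python
def total_cpu_ms(node):
    """Sum a node's own CPU time and that of all nested events."""
    own = node.get("cpu_ms", 0.0)
    return own + sum(total_cpu_ms(child) for child in node.get("children", []))


# Hypothetical perfmarker range containing two draws and a nested range.
shadow_pass = {
    "name": "ShadowPass",
    "children": [
        {"name": "Draw 1", "cpu_ms": 0.5},
        {"name": "Draw 2", "cpu_ms": 0.25},
        {
            "name": "Cascade 1",
            "children": [{"name": "Draw 3", "cpu_ms": 0.125}],
        },
    ],
}

print(total_cpu_ms(shadow_pass))
```

As noted in the table, such timings are for reference only, since capture overhead affects them.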
Geometry View
The Geometry view takes the state of the Direct3D, OpenGL, or Vulkan machine, along with the parameters for the current draw call, and shows pre-transformed geometry.
To access this view, go to Graphics Debugger > Geometry.
There are two views into this data: a graphical view and a memory view.
Graphical Tab
 
Mouse Events
- Hover — Hint the elements of the hovered primitive. 
- Left Click — Select the primitive or reset the selection if clicking at nothing. When selecting in the graphical viewer, the correlated rows in the memory table are also selected at the same time. 
Attribute Options
- Position — Specifies the vertex attribute to use for positional geometry data. 
- Color — Specifies how to color the geometry. If Diffuse Color is selected, the selected diffuse color swatch is used for coloring. If a vertex attribute is selected, the selected attribute is used for per-vertex coloring.   
- Normal — Specifies the per-vertex normal. This selection applies when using a shade mode that specifies Normal Attribute or when rendering normal vectors. 
Rendering Options
Clicking Configure in the bottom right corner of the Geometry View opens up the rendering options menu.
 
- Reset Camera — Resets the camera to its default orientation. By default, the viewer bounds all geometry with a bounding sphere for optimal orientation. 
- Zoom To Selected — Zoom the camera to the selected primitive. 
- Render Mode — Determines how to render and raster geometry.
  - Solid: renders filled geometry.
  - Points: renders a vertex point cloud.
  - Wireframe: renders a wireframe of the geometry.
  - Wireframe + Solid: renders filled geometry with a wireframe on top of it.
 
- Shade Mode — Specifies the lighting mode of the rendered image.
  - Selected Color Attribute: Shades with the specified color attribute.
  - Flat Shading Using Generated Normals: Renders the geometry using flat shading with calculated normals.
  - Flat Shading Using Normal Attribute: Renders the geometry using flat shading with the specified Normal Attribute.
  - Smooth Shading Using Normal Attribute: Renders the geometry using smooth shading with the specified Normal Attribute.
 
- Render Normal Vectors — Renders the specified normal attribute as a vector pointing from each vertex. The vector may be colored by the Normal Color selection and may be scaled by the Normal Scale selection.   
Memory Tab
The Memory tab of the Geometry View shows the contents of the vertex buffer, as interpreted by the current vertex or input attribute specification. This view is useful for seeing the raw data of your draw call. It also highlights invalid or corrupt vertices to streamline finding problematic data. Another useful feature is the selection linkage to the graphical viewer: selecting a memory row also selects the associated primitive.
There are two modes of display for the geometry data:
- Index Buffer Order shows the vertices as indexed by the current index buffer and current draw call.   
- Vertex Buffer Order shows the vertices as linearly laid out from the start of the vertex buffer and draw call specification.   
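The difference between the two display modes can be sketched with a small quad. This is only a conceptual illustration: the function and parameter names (`first_index`, `base_vertex`, and so on) are assumptions modeled on common draw call parameters, not the tool's implementation.

```python
# Four vertices of a quad and the six indices of its two triangles.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
indices = [0, 1, 2, 0, 2, 3]


def vertex_buffer_order(vertices, first_vertex=0, vertex_count=None):
    """Rows laid out linearly from the start of the vertex buffer range."""
    end = len(vertices) if vertex_count is None else first_vertex + vertex_count
    return [(i, vertices[i]) for i in range(first_vertex, end)]


def index_buffer_order(vertices, indices, first_index=0, index_count=None, base_vertex=0):
    """Rows in the order the index buffer references the vertices."""
    end = len(indices) if index_count is None else first_index + index_count
    return [(i + base_vertex, vertices[i + base_vertex]) for i in indices[first_index:end]]


for row, (vertex_id, position) in enumerate(index_buffer_order(vertices, indices)):
    print(row, vertex_id, position)
```

Index Buffer Order repeats shared vertices (here vertices 0 and 2 appear twice), which makes it easier to follow primitives; Vertex Buffer Order lists each stored vertex once.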
Memory View
The Memory view shows the application's memory objects and the resources placed within them.
To access this view, go to Graphics Debugger > Memory.
 
The Memory View shows memory objects created by the application. It indicates their size and properties. For resources that bind other resources, a list of resources is shown on the right-hand side. On the bottom of the right-hand side, a linear indication of the layout and placement of these resources is shown. Resources may be sorted in the table to be able to pick out important features; for example, Heaps with the most resources or resources of the greatest size.
Each resource has a link that may be clicked to gather more information about that resource in the object browser.
Object Browser
The Object Browser view provides a list of all objects tracked for your frame, listed by name and by type. Beneath each object is a list of the properties and other metadata that Nsight Graphics tracks. This view is useful for finding objects that utilize a particular kind of property, for example a memory buffer with a particular flag.
To access this view, go to Graphics Debugger > Object Browser. This view is also a destination for links provided by the Event Details and Event Viewer views.
 
This view supports Clone capabilities. Note, however, that this view captures fixed properties and metadata for each object at the end of frame. For APIs with mutable object properties, such as OpenGL, those properties are not updated in coordination with scrubbing. As such, Lock capabilities are not applicable to this view.
This view provides two panes side-by-side. The left-hand Objects pane provides the object list as well as their properties; the right-hand pane is context-sensitive and provides additional information about the object that is selected on the left-hand side.
Objects Pane
The objects pane (left-hand side) provides several capabilities for filtering objects:
- Object names and properties can be filtered via the filter box. 
- Object types can be filtered via the types combo box (default is All Types). 
- Objects can be filtered as to whether they report usages (default is Any # of Usages). 
- The Filter Properties checkbox determines whether the properties of an object are considered when filtering items in the object tree view. 
Object Name
This section shows the name of the selected object.
When the selected object has a specific viewer for viewing additional information about that type, a link to that specific viewer is provided. For example, texture resources provide a link to opening the selected texture in the Resource Viewer.
Object Usage
This section lists a table for the events in which an object is used. Each event is tagged to indicate the Usage of that object (READ or WRITE).
Ray Tracing Inspector
The Ray Tracing Inspector shows the geometry that has been specified in build commands when running an application that uses ray tracing APIs. If the application does not use these APIs, the view is not available.
In ray tracing APIs, such as DXR and NVIDIA Vulkan Ray Tracing, an acceleration structure is a data structure that describes the full-scene geometry that is traced when performing the ray tracing operation. This data structure is described in detail in the following links: https://developer.nvidia.com/rtx/raytracing/dxr/DX12-Raytracing-tutorial-Part-1 and https://developer.nvidia.com/rtx/raytracing/vkray.
This data structure is purpose-built to allow for translation to application-specific data structures that perform well on modern GPUs. While constructing this data structure, the developer has the responsibility of constructing the structure correctly and using flags to identify the functional and performance characteristics within it. Needless to say, this can be an error-prone operation.
Nsight Graphics Ray Tracing Inspector allows you to view the structures you are creating, navigate through them, and see the flags that you are using. Additionally, you can filter and colorize the structure to highlight, at a bird’s-eye view, different kinds of geometry.
The Ray Tracing Inspector is opened through links from resource thumbnails or entries in many Nsight Graphics views, such as the API Inspector or All Resources View. For example, it can be opened from the API Inspector View when scrubbed to a build event or a trace rays call. When scrubbed to these events, the view presents a list of the active structures with a link to open each.
The view is multi-paned: it shows a hierarchical view of the acceleration structure on the left, a graphical view of the structure in the middle, and controls and options on the right. Additionally, a performance analysis section is present on the lower-left. With the hierarchy of the Acceleration Structure tree view, the top-level acceleration structure (TLAS), bottom-level acceleration structures (BLAS), child instances, child geometries, and memory sizes are presented. When a particular item is selected, the name, flags, and other metadata for this entry are listed in a section on the bottom left-hand side. Each item within the tree has a check box that allows the rendering of the selected geometry or hierarchy to be disabled. Double-clicking on an item jumps to the item in the rendering view and automatically adjusts the camera speed to be relative to the size of the selected object.
 
| Column | Description | 
|---|---|
| Name | An identifier for each row in the hierarchy. Click on the check box next to the name to show or hide the selected geometry or hierarchy. Double-click on this entry to jump to the item in the rendering view. | 
| # Prims | The number of primitives that make up this geometry. | 
| Surface Area | A calculation of the total surface area for the AABB that bounds the particular entry. | 
| Size | A calculation of the memory usage for this particular level. Hover over this entry to see a tooltip that includes a roll-up calculation of the aggregate memory usage of this hierarchical level and children below it. | 
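For reference, the surface area of an axis-aligned bounding box can be computed from its two corners with the standard box formula. The guide does not state the exact calculation the tool uses, so treat this as a sketch of the usual definition rather than the tool's implementation.

```python
def aabb_surface_area(min_corner, max_corner):
    """Total surface area of an axis-aligned box: 2 * (xy + yz + zx)."""
    dx, dy, dz = (hi - lo for lo, hi in zip(min_corner, max_corner))
    return 2.0 * (dx * dy + dy * dz + dz * dx)


print(aabb_surface_area((0.0, 0.0, 0.0), (2.0, 3.0, 4.0)))
```

A large surface area relative to the geometry a box actually contains usually indicates empty space inside the bounds.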
Performance analysis tools are accessible in the bottom left corner of the main view. These tools help identify potential performance problems that are outlined in the RTX Ray Tracing Best Practices Guide. They aim to give a broad picture of acceleration structures that may exhibit sub-optimal performance. To find the optimal solution, profiling and experimentation are recommended, but these tools may paint a better picture as to why one structure performs poorly compared to another.
| Action | Description | 
|---|---|
| Instance Overlaps | Identifies instance AABBs that overlap with other instances. Consider merging BLASes when instance world-space AABBs overlap significantly to potentially increase performance. | 
| Instance Heatmap | Enters a heatmap mode that shows the approximate number of instance AABBs that are hit by a ray cast from each pixel within the current viewport. This mode offers a convenient way to scan your scene for potentially problematic geometries. | 
Filtering and Highlight
The acceleration structure tree view supports geometry filtering as well as highlighting of data matching particular characteristics. The checkboxes next to each geometry allow individual toggling between full rendering, wireframe rendering, and no rendering. Combining this capability with search allows you to identify the geometry of interest (by name when the application has named its resources) and display only that geometry.
Geometry instances can also be selected by clicking on them in the main graphical view. Additionally, right-clicking in the main graphical view gives options to hide/show all geometry, hide the selected geometry, or hide all but the selected geometry.
 
Beyond filtering, the view also supports highlight-based identification of geometry specified with particular flags. Checking each Highlight option identifies those resources matching that flag, colorizing for easy identification. Clicking an entry in this section dims all geometry that does not meet the filter criteria, allowing items that do match the filter to stand out. Selecting multiple filters requires the passing geometry to meet all selected filters (i.e., AND logic). Additionally, the heading text is updated to reflect the number of items that meet the filter criteria.
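The AND behavior of multiple highlight filters can be sketched with a bitmask test: a geometry passes only if every selected flag is set on it. The flag names and the geometry list below are hypothetical and only loosely modeled on ray tracing geometry flags.

```python
from enum import Flag


class GeometryFlag(Flag):
    # Hypothetical flag set for illustration only.
    OPAQUE = 1
    NO_DUPLICATE_ANY_HIT = 2


def highlighted(geometries, selected):
    """Names of geometries whose flags contain every selected flag (AND logic)."""
    return [name for name, flags in geometries.items() if (flags & selected) == selected]


geometries = {
    "terrain": GeometryFlag.OPAQUE,
    "foliage": GeometryFlag(0),
    "glass": GeometryFlag.NO_DUPLICATE_ANY_HIT,
    "rock": GeometryFlag.OPAQUE | GeometryFlag.NO_DUPLICATE_ANY_HIT,
}

one_filter = highlighted(geometries, GeometryFlag.OPAQUE)
two_filters = highlighted(geometries, GeometryFlag.OPAQUE | GeometryFlag.NO_DUPLICATE_ANY_HIT)
print(len(one_filter), one_filter)
print(len(two_filters), two_filters)
```

Adding a second filter can only shrink the highlighted set, which is why the count in the heading drops as more filters are checked.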
 
Rendering Options
Under the highlight controls, additional rendering options are available. These include methods to control the geometry colors and the ability to toggle the drawing of AABBs.
 
Performance Analysis
The Ray Tracing Inspector has a number of tools that support performance analysis of your acceleration structures.
 
| Navigation | |
|---|---|
| | Disable the current performance analysis mode | 
| | Enable Instance AABB Overlap Heatmap (world space) | 
| | Enable Traversal Timing Heatmap | 
| | Enable Ray-Primitive Intersection Heatmap | 
| | Enable Instance AABB Overlap Table | 
| | Alternate between the current and previous performance analysis mode | 
Instance AABB Overlap Heatmap (world space)
This tool helps identify world-space AABB overlaps with the goal of helping a developer reduce expensive overlap.
 
This heatmap overlay shows the maximum number of instance AABBs that overlap in world space for any direction within the current viewport. The more instances hit, the hotter the color. The heatmap threshold can be controlled by “Heatmap Overlay Options -> AABB Threshold” and depends on the number of instance AABBs overlapping.
When world-space AABBs of instances overlap, unnecessary BLAS traversal may be required. If these overlaps are caused by empty space within a BLAS, consider splitting the BLAS such that this empty space is minimized. If these overlaps are caused by non-opaque geometries, minimize the area not marked as opaque to increase performance.
Traversal Timing Heatmap
This tool provides a heatmap of ray traversal time with the goal of identifying expensive geometry.
 
This heatmap overlay shows the number of GPU clock cycles required to trace a ray from the center of each pixel in the viewport to the closest geometry hit. The heatmap threshold can be controlled by “Heatmap Overlay Options -> Timing Threshold” and depends on the traversal speed of the current acceleration structure.
To determine how certain geometry affects the ray traversal speed, geometry can be hidden in which case it is ignored by this algorithm. Additionally, setting “Heatmap Overlay Options -> Opacity Override” to “Force Opaque” traverses the acceleration structure as if all geometry is opaque.
Ray-Primitive Intersection Heatmap
This tool provides a heatmap of ray-geometry intersections with the goal of identifying expensive geometry.
 
This heatmap overlay shows the number of surfaces hit by a ray traced into the scene from the center of each pixel. The more surfaces hit, the hotter the color. In this mode, opaque geometry terminates a ray. The heatmap threshold can be controlled by “Heatmap Overlay Options -> Intersection Threshold.” When geometry is not marked as opaque, a ray must be traced through all surfaces of that geometry until an opaque surface is hit. To minimize intersection tests, use the opaque geometry flag wherever possible, or use opacity micromaps to mask off opaque sections of non-opaque geometry.
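The per-pixel count described here can be pictured as walking the hits along a ray from nearest to farthest and stopping at the first opaque surface. The sketch below assumes the hits are already known as hypothetical `(distance, is_opaque)` pairs; it illustrates the counting rule only, not how the tool traverses the acceleration structure.

```python
def surfaces_hit(hits):
    """Count surfaces a ray passes through, up to and including the first opaque one.

    hits: iterable of (distance, is_opaque) pairs in any order.
    """
    count = 0
    for _distance, is_opaque in sorted(hits):
        count += 1
        if is_opaque:
            break  # opaque geometry terminates the ray
    return count


# Two non-opaque layers in front of an opaque wall; the surface behind the wall is never reached.
layered = [(5.0, True), (1.0, False), (2.0, False), (9.0, False)]
print(surfaces_hit(layered))

# Marking the nearest surface as opaque removes the extra intersection tests.
print(surfaces_hit([(1.0, True), (2.0, False), (5.0, True)]))
```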
Instance AABB Overlap Table
This table provides information on overlap to allow you to consider merging BLASes when instance world-space AABBs overlap significantly.
 
When world-space AABBs of instances overlap, the TLAS becomes non-optimal. A ray can then hit more than one instance in a volume in space. Traversing through BLASes of all those instances is then required to resolve the closest hit. Traversing through one merged BLAS would be more efficient. Tracing performance against a BLAS doesn’t depend on the number of geometries in it. Geometries merged into a single BLAS can still have unique materials.
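To get a feel for what counts as an overlap, the following sketch tests world-space instance AABBs pairwise and counts how many other instances each one overlaps. The instance names and bounds are made up, and the tool's own analysis may use a different method or metric.

```python
from itertools import combinations


def aabbs_overlap(a, b):
    """True if two AABBs, each given as (min_corner, max_corner), share volume."""
    (a_min, a_max), (b_min, b_max) = a, b
    return all(a_min[i] < b_max[i] and b_min[i] < a_max[i] for i in range(3))


def overlap_counts(instances):
    """Map each instance name to the number of other instances it overlaps."""
    counts = {name: 0 for name in instances}
    for (name_a, box_a), (name_b, box_b) in combinations(instances.items(), 2):
        if aabbs_overlap(box_a, box_b):
            counts[name_a] += 1
            counts[name_b] += 1
    return counts


# Hypothetical world-space instance bounds.
instances = {
    "crate": ((0, 0, 0), (2, 2, 2)),
    "barrel": ((1, 1, 1), (3, 3, 3)),
    "lamp": ((5, 5, 5), (6, 6, 6)),
}

counts = overlap_counts(instances)
print(counts)
```

Instances with high counts are the natural candidates for merging into a single BLAS.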
Export
Exporting the view, by clicking on the Save (disk) icon in the upper left of the view toolbar, allows for persisting the data you have collected beyond the immediate analysis session. This capability is particularly valuable for comparing different revisions of your geometry or sharing with others. Bookmarks are persisted as well. An example use case is identifying sub-optimal geometry, bookmarking it, and passing this document to a level designer or artist for correction.
Resource Viewer
The Resource Viewer allows you to see all of the available resources in the scene. This view is brought up by clicking resource links in any graphics debugging view.
The Resource Viewer is opened through links from resource thumbnails or entries in many Nsight Graphics views, for example from the API Inspector.
 
On the left-hand side, there is a Resource Info panel that shows properties of the resource, and below it a Revisions panel that shows revisions of the data and allows scrubbing between them.
On the right-hand side, there are two tabs available:
- Graphical 
- Memory 
Graphical Tab
The Graphical tab allows you to inspect the resource, pan using the left mouse button to click and drag, zoom using the mouse wheel, and inspect pixel values. Also, this is where you can save the resource to disk.
 
Within the Graphical tab, there is a panel on the right-hand side that contains the following sections: Visualization, Histogram, and Pixel Info.
Visualization
The Visualization panel contains controls to set the data format, color channels and blending strategy, as well as zoom, flip, and gamma.
 
Each of these controls allows you to adjust the visualization of the resource to best meet your needs. Additionally, the panel contains an ability to configure a custom shader to visualize your resource. At present, this custom shader only supports GLSL, but may receive future updates to support further shading languages.
Histogram
The Histogram panel contains a histogram of each of the color channels of the resource.
 
To modify the histogram view, the following options are available:
- You can set the minimum and maximum cutoff values via the sliders under the histograms, or by typing in values in the Minimum and Maximum boxes. 
- You can change the scale by using the Log button. 
- The Luminance button allows you to visualize luminance instead of color values. 
- The Clamp button can preset the minimum and maximum values to the extents of the data in the resource. 
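To make the histogram options above more concrete, here is a minimal sketch of what cutoff values, a log scale, luminance, and clamping to the data extents could mean for a small set of pixels. It assumes 8-bit RGB pixels and Rec. 709 luminance weights. Both are assumptions for illustration, and the exact computation used by the Histogram panel is not documented here.

```python
import math


def luminance(r, g, b):
    # Rec. 709 weights; an assumption, not necessarily what the tool uses.
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def histogram(values, lo, hi, bins):
    """Count values into equal-width bins over [lo, hi]; out-of-range values are skipped."""
    counts = [0] * bins
    span = hi - lo
    for v in values:
        if v < lo or v > hi:
            continue
        # Scale into [0, bins); the maximum value falls into the last bin.
        index = bins - 1 if span == 0 else min(int((v - lo) / span * bins), bins - 1)
        counts[index] += 1
    return counts


def log_scale(counts):
    # log1p keeps empty bins at zero while compressing tall bins.
    return [math.log1p(c) for c in counts]


pixels = [(0, 0, 0), (64, 64, 64), (255, 255, 255), (200, 10, 10)]
red = [p[0] for p in pixels]

# "Clamp"-style cutoffs: use the extents of the data as minimum and maximum.
lo, hi = min(red), max(red)
print(histogram(red, lo, hi, bins=4))
print(log_scale(histogram(red, lo, hi, bins=4)))
print(histogram([luminance(*p) for p in pixels], 0, 255, bins=4))
```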
Pixel Info
The Pixel Info panel contains a viewer that shows a selected pixel alongside its neighbors.
 
This panel is useful for investigating pixel values and their positions.
Pixel History
Pixel history enables the detection of the events that contributed to the change in a pixel’s value.
To run a pixel history test, right-click to select a pixel of interest. Then, click the Pixel History button. The Pixel History view comes up with a loading bar and presents the results once they are complete.
 
Memory Tab
The Memory tab shows a dump of the resource data.
 
You can use multiple options to configure how this memory is displayed:
- The Axis drop-down changes between address (memory offset) and index (array element) views. 
- The Offset entry limits the view to an offset within the given resource. 
- The Extent entry limits the view to a maximum extent within the given resource. 
- The Precision spin box controls the number of decimal places to show for floating point entries. 
- The Hex Display toggles between decimal (base-10) and hex (base-16) display formats. 
- Hash shows a hash value representative of the given memory resource within the current offset and extent. This is useful for comparing memory objects or sub-regions. 
- The Transpose button swaps the rows and columns of the data representation. 
- The Configure button opens the Structured Memory Configuration dialog. 
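The options above can be thought of as transformations over a raw byte buffer. The following is a minimal sketch, assuming the resource holds little-endian 32-bit floats and using SHA-1 as a stand-in hash. The helper names, the data layout, and the hash algorithm are all assumptions for illustration; the hash that Nsight Graphics actually computes is not specified here.

```python
import hashlib
import struct


def view_region(data, offset, extent):
    """Return the bytes selected by an Offset and an Extent."""
    return data[offset:offset + extent]


def as_floats(region, precision):
    """Decode little-endian 32-bit floats and format them with a fixed precision."""
    count = len(region) // 4
    values = struct.unpack(f"<{count}f", region[:count * 4])
    return [f"{v:.{precision}f}" for v in values]


def as_hex(region):
    """Format each byte in base 16."""
    return [f"{b:02X}" for b in region]


def region_hash(region):
    # SHA-1 is an arbitrary choice for this sketch.
    return hashlib.sha1(region).hexdigest()


def transpose(rows):
    """Swap the rows and columns of a rectangular table."""
    return [list(column) for column in zip(*rows)]


data = struct.pack("<4f", 1.0, 2.5, -3.0, 0.125)
region = view_region(data, offset=4, extent=8)  # selects 2.5 and -3.0
print(as_floats(region, precision=2))
print(as_hex(region))
print(region_hash(region) == region_hash(data[4:12]))
```

Comparing the hashes of two regions, as in the last line, is a quick way to tell whether two sub-regions hold identical data without inspecting every value.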
Additional Capabilities
There are a number of additional capabilities in this view. At the top of the viewer, there is a toolbar:
 
- Clone — makes a copy of the current view, so that you can open another instance. 
- Lock — freezes the current view so that changing the current event does not update this view. This is helpful when trying to compare the state of a resource at two different actions. 
- Mip — Change the viewed mip level of the resource. 
- Layer — Change the viewed layer of the resource. 
- Aspect — Change the viewed aspect of the resource. 
Root Parameters
The Root Parameters view displays all of the root parameters bound for the current event. Root parameters allow an application to quickly change what it is sampling from, its constants, and other descriptors in a more lightweight and faster way than in past APIs.
To access this view, go to Graphics Debugger > Root Parameters.
 
The root signature displays the structure definition of what’s bound at that moment. Root parameters fill in that structure with the values you’re sampling from and the constants you’re using.
The view can link out to other object-specific viewers, such as the descriptor view, the resource view, the object browser, or memory views.
Scrubber
The Nsight Graphics Capture Debugger Scrubber allows you to visually identify the characteristics of your frame(s) and select events to analyze.
To access this view, go to Graphics Debugger > Scrubber.
Understanding the Frame Scrubber
For the sake of discussion when it comes to graphics debugging, it helps to note some common terminology.
- An event is a single call to the API. It could be a triangle draw call, or backbuffer clear, or a less obvious call, like configuring buffers. A snapshot is a sequence of events. 
- An action is a subset of the event types. It can be one of the following: (1) Draw Call, (2) Clear, or (3) Dispatch. Actions are interesting since they explicitly change data which may result in visual changes. 
When you debug your graphics project, the Scrubber window shows the perf markers you implemented. When working with user-defined markers, the Scrubber window uses the color and label that you defined for the perf marker.
 
On the Scrubber, you can select one performance marker and it automatically creates a range of all of the draw calls that occurred within that time frame. Clicking on it again causes the scrubber to automatically zoom to that range of events. You can zoom in on a nested/child marker the same way.
To zoom out, click the parent performance marker, or use CTRL + mouse wheel.
Performance markers are also displayed on the HUD, color-coded the same way that they are on the Scrubber. However, on the HUD, the information is condensed, and you must hover your mouse over the selected performance marker to get its details.
The Scrubber has several visualization modes. To change the visualization mode, choose one of the following from the Mode dropdown menu:
- Unit Scale is the default view, which simply shows the actions and events on the timeline. 
 
- CPU Time Scale displays the CPU activity and how much each event or action cost the CPU. 
 
- GPU Time Scale displays the GPU activity and how much each event or action cost the GPU. This mode is only available when in Live Replay and event timings are collected. 
 
Using Hotkeys to Scrub Through a Frame
When the scrubber has focus, you can use the following hotkeys to move the scrubber cursor from one event to another.
| Navigation | |
|---|---|
| | Go to the first event. |
| | Go to the last event. |
| | Go to the previous event. |
| | Go to the next event. |
| | Expand the current event group (HUD only). |
| | Collapse the current event group (HUD only). |
| | Current event: show less information (HUD only). |
| | Current event: show more information (HUD only). |
| Zooming and Panning | |
| | Zoom in X-axis |
| | Zoom out X-axis |
| | Reset zoom |
| | Increase row height (all rows) |
| | Decrease row height (all rows) |
| | Pan |
| | View zoom window |
| Cursor and Selection | |
| | Set cursor (places the cursor at the closest point to the start of a range) |
| | Select row (the selected row is highlighted in orange) |
| | Make range selection |
| | Zoom to range |
| | Open API Inspector |
| CTRL + A | Select all events |
For the purpose of moving the scrubber cursor, the following are considered action events:
- Draw methods 
- Clear methods 
- Dispatch methods 
- Present methods 
For example, if you are looking for the next draw method that was called, you can press CTRL + RIGHT ARROW to skip over events that are not typically of interest and stop only on events that are considered action events.
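The skipping behavior described above can be sketched as a search over an event list. This is a minimal illustration, assuming a hypothetical event record with a `kind` field. The field name, the kind labels, and the sample events are assumptions and do not reflect the tool's internal data model.

```python
# Event kinds treated as action events for cursor movement, per the list above.
ACTION_KINDS = {"draw", "clear", "dispatch", "present"}


def next_action(events, cursor):
    """Return the index of the first action event after the cursor, or None."""
    for index in range(cursor + 1, len(events)):
        if events[index]["kind"] in ACTION_KINDS:
            return index
    return None


def previous_action(events, cursor):
    """Return the index of the last action event before the cursor, or None."""
    for index in range(cursor - 1, -1, -1):
        if events[index]["kind"] in ACTION_KINDS:
            return index
    return None


# Hypothetical captured events, for illustration only.
events = [
    {"name": "ClearRenderTargetView", "kind": "clear"},
    {"name": "IASetVertexBuffers", "kind": "state"},
    {"name": "PSSetShader", "kind": "state"},
    {"name": "DrawIndexed", "kind": "draw"},
    {"name": "Present", "kind": "present"},
]

cursor = next_action(events, 0)
print(cursor, events[cursor]["name"])
```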
Shader Browser
The Shader Browser lists all of the shaders in your application.
To access this view, go to Graphics Debugger > Shader Browser.
For programs or pipeline objects, you can view the individual shaders by pressing the ► button to the left of the program/pipeline name. When expanded, you can select the link to open a text view of the shader source (when available).
 
Name
This is the name of the shader. This name is either generated internally, or can be assigned by the user per API.
Type
The type of the shader: Vertex, Pixel, Compute, etc.
UID
Indicates a unique object ID for the associated pipeline or shader.
Context
Indicates which of the application’s contexts owns this shader. Shown only on multi-context OpenGL applications.
Status
This column displays the current status of the shader. The status includes Source or Binary to denote whether source code is available for this shader. If the status also includes the microcode text, driver-level binary code is available for the shader; this code is necessary for gathering shader performance metrics.
Debug Info
This column indicates the availability and load status of debug information for the associated shader.
File Name
Lists the file name from which the shader was compiled for those shaders with debug info that provide this information.
App Hash
This column displays a unique hash of application level shader information (e.g., bytecode).
Frame Debugging UI
The Frame Debugger activity is a capture-based activity. There are two classes of views in these activities — pre-capture views and post-capture views. Pre-capture views generally report real-time information on the application as it is running. Post-capture views show information related to the captured frame and are only available after an application has been captured for live analysis. For an example of how to capture, follow the example walkthrough in How to Launch and Connect to Your Application.
All Resources View
The All Resources View allows you to see all of the available resources in the scene.
To access this view, go to Frame Debugger > All Resources.
This view shows a grid of all of the resources used by the application. Graphical resources are displayed as images; for other resources, an icon denotes the type. When a resource is selected, a row of revisions is shown for that resource. Clicking on any revision changes the frame debugger event to the closest event that generated or had the potential of consuming that revision.
Clicking the link below a resource, or double-clicking on the resource thumbnail, will open a Resource Viewer or Ray Tracing Inspector on that resource.
There are a number of additional capabilities in this view. At the top of the All Resources view, you’ll find a toolbar:
- Clone — makes a copy of the current view, so that you can open another instance. 
- Lock — freezes the current view so that changing the current event does not update this view. This is helpful when trying to compare the state of a resource at two different actions. 
- Save — saves the captured resources to disk. 
- Red, Green, and Blue — toggles on and off specific colors. 
- Alpha — enables alpha visualization. In the neighboring drop-down, you can select one of the following two options:
  - Blend — blends the alpha with a checkerboard background. 
  - Grayscale — alpha values are displayed as grayscale. 
 
- Flip Image — inverts the image of the resource displayed. 
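The two Alpha display options in the toolbar list above can be illustrated with a small per-pixel sketch. It assumes straight (non-premultiplied) 8-bit RGBA values, an 8-pixel checker tile, and two gray background shades. All of these are assumptions chosen for the example rather than the values the tool uses.

```python
def checker_color(x, y, tile=8):
    """Return the checkerboard background color (light or dark gray) at a pixel."""
    light, dark = (204, 204, 204), (153, 153, 153)
    return light if ((x // tile) + (y // tile)) % 2 == 0 else dark


def blend_over_checkerboard(rgba, x, y):
    """Composite a straight-alpha RGBA pixel (0-255 per channel) over the checkerboard."""
    r, g, b, a = rgba
    alpha = a / 255.0
    background = checker_color(x, y)
    return tuple(
        round(c * alpha + bg * (1.0 - alpha)) for c, bg in zip((r, g, b), background)
    )


def alpha_as_grayscale(rgba):
    """Show only the alpha channel, replicated across red, green, and blue."""
    a = rgba[3]
    return (a, a, a)


print(blend_over_checkerboard((255, 0, 0, 255), 0, 0))  # opaque: background hidden
print(blend_over_checkerboard((255, 0, 0, 0), 0, 0))    # transparent: background only
print(alpha_as_grayscale((10, 20, 30, 128)))
```

Blending over a checkerboard makes partially transparent regions visible at a glance, while the grayscale mode makes it easier to read exact alpha values.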
Below the toolbar is a set of buttons for high-level filtering of the resources based on type. Next to that, there is a drop-down menu that allows you to select how you wish to view the resources: thumbnails, small thumbnails, tiles, or details.
 
If you select the Details view, you can sort the resources by the available column headings (type, name, size, etc.).
Filtering
There are three ways to filter the available resources.
- For high-level filtering, there are color coded buttons to filter based on resource type. All resource types are visible by default, and you can filter the resource list by de-selecting the button for the type you don’t want to see. For example, if you’d like to see only textures, you can click the other buttons to de-select them and remove them from the list of resources. 
 
- You can manually type in a search string to filter the list of resources. 
- You can choose from the drop-down of predefined filters to view only large resources, depth resources, unused resources, or resources that change in the frame. Selecting one of these fills in the JavaScript string necessary for the requested filter, which is also useful as a basis to construct custom filters. 
 
Application HUD
The Application HUD is a heads-up display which overlays directly on your application. You can use the HUD to capture a frame and subsequently scrub through its constituent draw calls on either the HUD or an attached host.
All actions that occur either in the HUD or on the host — such as capturing a frame or scrubbing to a specific draw call — are automatically synchronized between the HUD and the host, and thus you can switch between using the HUD and host UI seamlessly as needed.
The HUD has two modes:
- Running: Interact with your game or application normally, while the HUD shows an FPS counter. When you first start your application with Nsight Graphics, the HUD is in Running mode. This mode is most useful for viewing coarse GPU frame time in real-time while you run your application. 
- Frame Debugger: Once you have captured a frame, you can debug the frame directly in the Nsight Graphics HUD (as well as from the host). The HUD allows you to scrub through the constituent draw calls of a frame, to view render targets with panning and zooming, and to examine specific values in those render targets. 
Running Mode
In this mode, you can interact with your game or application normally, and the HUD shows frame time overlaid on the scene. When you first start your application with Nsight Graphics, the HUD is in Running mode.
 
Frame Debugger Mode
There are two different methods to pause the application, which causes it to enter Frame Debugger mode.
- Press the Target application capture hotkey, as mentioned above; or 
- Go to the main toolbar in the Nsight Graphics UI and select Pause and Capture Frame. 
Once you have captured a frame, you can debug the frame directly in the HUD. While you can also debug the frame on the host, the HUD allows you to scrub through the constituent draw calls of a frame, to view render targets with panning and zooming, and to examine specific values in those render targets.
 
The HUD scrubber can be clicked to navigate between events. Additionally, the view has several controls to aid in your resource investigation.
| Hot Keys | Action | 
|---|---|
| Left-click + drag on the scrubber bar | Navigate to a particular draw call in your frame. When the desired draw call is active, release the left mouse button. The geometry for the currently active draw call will be highlighted, as long as it is on screen. | 
| Home | Navigate to the first event | 
| End | Navigate to the last event | 
| CTRL + Left | Navigate to the previous event | 
| CTRL + Right | Navigate to the next event | 
| CTRL + Plus (+) | Zoom in | 
| CTRL + Minus (-) | Zoom out | 
| CTRL + Zero (0) | Makes the current texture fit to screen. | 
| Left-click + drag on a render target | Pans and zooms the currently displayed render target. Use the mouse wheel to zoom in to a particular portion of the render target. | 
| N | Cycles between the currently available render targets, depth targets, and stencil targets. | 
| W | Cycles between wireframe modes (off, red, animated). | 
To switch the display to another active render target:
- Click the Select Render Target button on the HUD toolbar. 
- A drop-down menu will appear, showing all valid choices for the current draw call. Select the desired render target. 
- Note that if a selected render target is no longer active for a different draw call, the display will automatically switch to an active render target. 
- You can also toggle between available render targets using the Ctrl+N hotkey. 
| HUD ICON | DEFINITION | 
|---|---|
|   | Navigate to the next event. | 
|   | Navigate to the previous event. | 
|   | Cycles between the currently available render targets, depth targets, and stencil targets. | 
|   | Cycles between wireframe modes (off, red, animated). | 
API Inspector
The API Inspector is a view common to all supported APIs. It offers an exhaustive look at all of the state that is relevant to the event to which the capture analysis is currently scrubbed.
To access this view, go to Frame Debugger > API Inspector.
While the view is common, the state within it is particular to each API. See the section below that relates to your API of interest.
Search
With API Inspector pages, there is a search bar that offers a quick way of finding the information you need on a particular page. The bar will indicate the number of matches in each page, and forward and back navigation buttons are provided for navigating between each match. The buttons also support keybindings, with F3 for next and Shift+F3 for previous.
 
Expand/Collapse
Within an API Inspector page, there are many sections that can be expanded or collapsed to help narrow the information that is displayed to only the information you wish to see at that point in time. While each section can be individually collapsed, the UI has buttons that allow for expanding or collapsing all elements in one click.
 
Export
Each page can be exported to structured data in JSON format. This JSON data includes key-value pairs of the data elements, as well as indirections that indicate the relationships between different kinds of data.
This data is useful in cases where you may want to export data for persistence, or perhaps to run a diff between the data of different events.
 
The view can also export data from all pages. The information in each page is exported to a single, combined file in structured JSON format. To accomplish this, use the Export All Pages to Json button.
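One way to use the exported data for the diffing use case mentioned above is to flatten each export into path/value pairs and compare them. The following is a minimal sketch. The key names (`Rasterizer`, `CullMode`, `FillMode`) are hypothetical, and the real export schema may be organized differently.

```python
import json


def flatten(node, prefix=""):
    """Flatten nested dicts and lists into {path: leaf_value}."""
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)
    else:
        return {prefix: node}
    flat = {}
    for key, value in items:
        path = f"{prefix}/{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat


def diff_exports(before, after):
    """Return {path: (old, new)} for every leaf value that differs."""
    a, b = flatten(before), flatten(after)
    return {
        path: (a.get(path), b.get(path))
        for path in sorted(a.keys() | b.keys())
        if a.get(path) != b.get(path)
    }


# Stand-ins for two exported pages; in practice these would be loaded from
# the exported files with json.load().
event_a = json.loads('{"Rasterizer": {"CullMode": "BACK", "FillMode": "SOLID"}}')
event_b = json.loads('{"Rasterizer": {"CullMode": "NONE", "FillMode": "SOLID"}}')
print(diff_exports(event_a, event_b))
```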
 
D3D11 API Inspector
The API Inspector view has an API-specific pipeline navigator that allows you to select a particular group of state within the GPU pipeline. From here, you can inspect the API state for each stage, including what textures and render targets are bound, which shaders are in use, and the related constants. Note that if a stage is not active (either there is nothing bound to that stage or it doesn’t apply for the current action) it will be grayed out, but you can still click on it to inspect the state.
Pipeline Stages
The following list shows the stages that are available for inspection:
- IA — The Input Assembler shows the layout of your vertex buffers and index buffers. 
- VS — Shows all of the shader resource views and constant buffers bound to the Vertex Shader stage, as well as links to the HLSL source code and other shader information. 
- HS — Shows all of the shader resource views and constant buffers bound to the Hull Shader stage, as well as links to the HLSL source code and other shader information. 
- DS — Shows all of the shader resource views and constant buffers bound to the Domain Shader stage, as well as links to the HLSL source code and other shader information. 
- GS — Shows all of the shader resource views and constant buffers bound to the Geometry Shader stage, as well as links to the HLSL source code and other shader information. 
- SO — Shows the resources bound for Stream Output. 
- RS — Shows the Rasterizer State parameters, including culling mode, scissor and viewport rectangles, etc. 
- PS — Shows all of the shader resource views, constant buffers, and render target views bound to the Pixel Shader stage, as well as links to the HLSL source code and other shader information. 
- OM — Shows the Output Merger parameters, including blending setup, depth, stencil, render target views, etc. 
- CS — Shows all of the shader resource views, unordered access views, and constant buffers bound to the Compute Shader stage, as well as links to the HLSL source code and other shader information. 
Input Assembler (IA)
The Input Assembler page shows the details of your vertex buffers and index buffers, and the input layout of the vertices.
 
Shaders (VS, HS, DS, GS, PS, CS)
The various shader pages display all of the constant buffers, shader resource views, and input/output parameters, as well as links to the HLSL source code and other shader information.
In the constant buffer list, you can expand the buffer to see which HLSL variables are mapped to each entry, as well as the current values.
 
To enable resolution of HLSL variables, you must enable debug info when compiling the shader. See Shader Compilation for a discussion of the parameters required to prepare your shaders for optimal usage within Nsight Graphics.
Rasterizer State (RS)
The Rasterizer State page displays parameters including culling mode, scissor and viewport rectangles, etc.
 
Output Merger (OM)
The Output Merger page shows parameters including blending setup, depth, stencil, currently bound render target views, etc.
 
D3D12 API Inspector
The API Inspector view has an API-specific pipeline navigator that allows you to select a particular group of state within the GPU pipeline. From here, you can inspect the API state for each stage, including what textures and render targets are bound, which shaders are in use, and the related constants. Note that if a stage is not active (either there is nothing bound to that stage or it doesn’t apply for the current action) it will be grayed out, but you can still click on it to inspect the state.
Pipeline Stages
The following list shows the stages that are available for inspection:
- IA — The Input Assembler shows the layout of your vertex buffers and index buffers. 
- VS — Shows all of the shader resource views and constant buffers bound to the Vertex Shader stage, as well as links to the HLSL source code and other shader information. 
- HS — Shows all of the shader resource views and constant buffers bound to the Hull Shader stage, as well as links to the HLSL source code and other shader information. 
- DS — Shows all of the shader resource views and constant buffers bound to the Domain Shader stage, as well as links to the HLSL source code and other shader information. 
- GS — Shows all of the shader resource views and constant buffers bound to the Geometry Shader stage, as well as links to the HLSL source code and other shader information. 
- SO — Shows the resources bound for Stream Output. 
- RS — Shows the Rasterizer State parameters, including culling mode, scissor and viewport rectangles, etc. 
- PS — Shows all of the shader resource views, constant buffers, and render target views bound to the Pixel Shader stage, as well as links to the HLSL source code and other shader information. 
- OM — Shows the Output Merger parameters, including blending setup, depth, stencil, render target views, etc. 
- CS — Shows all of the shader resource views, unordered access views, and constant buffers bound to the Compute Shader stage, as well as links to the HLSL source code and other shader information. 
Input Assembler (IA)
The Input Assembler page shows the layout of your vertex buffers and index buffers, as well as the vertex declaration information.
 
Shaders (VS, HS, DS, GS, PS, CS)
The various shader pages display all of the constant buffers, shader resource views, and input/output parameters, as well as links to the HLSL source code and other shader information.
In the constant buffer list, you can expand the buffer to see which HLSL variables are mapped to each entry, as well as the current values.
 
To enable resolution of HLSL variables, you must enable debug info when compiling the shader. See Shader Compilation for a discussion of the parameters required to prepare your shaders for optimal usage within Nsight Graphics.
Rasterizer State (RS)
The Rasterizer page displays render state settings, texture wrapping modes, and viewport information.
 
 
Output Merger (OM)
The Output Merger page displays parameters such as blending setup, depth, and stencil states.
 
Device
The Device page displays details about the architecture that was used.
 
Present
The Present page displays information about back buffers that were used.
 
OpenGL API Inspector
When using the Frame Debugger feature of Nsight Graphics, you may wish to do a deep dive into the specific draw calls in order to analyze your application further. There are three different categories of API Inspector navigation.
 
Pipeline Stages
The first category is laid out like a “virtual GPU pipeline.” This pipeline section of the API Inspector consists of the following:
- Vtx Spec (Vertex Specification) — State information associated with your vertex attributes, vertex array object state, element array buffer, and draw indirect buffer. 
- VS (Vertex Shader) — Vertex shader state, including attributes, samplers, uniforms, etc. 
- TCS (Tessellation Control Shader) — Tessellation control shader state, including attributes, samplers, uniforms, control state, etc. 
- TES (Tessellation Evaluation Shader) — Tessellation evaluation shader state, including attributes, samplers, uniforms, evaluation state, etc. 
- GS (Geometry Shader) — Geometry shader state, including attributes, samplers, uniforms, geometry state, etc. 
- XFB (Transform Feedback) — Transform feedback state, including object state and bound buffers. 
- Raster (Rasterizer) — Rasterizer state, including point, line, and polygon state, culling state, multisampling state, etc. 
- FS (Fragment Shader) — Fragment shader state, including attributes, samplers, uniforms, etc. 
- Pix Ops (Pixel Operations) — State information for pixel operations, including blend settings, depth and stencil state, etc. 
- FB (Framebuffer) — State of the currently drawn framebuffer, including the default framebuffer, read buffer, draw buffer, etc. 
Object and Pixel State Inspectors
The object and pixel state inspectors section of the API Inspector consists of the following:
- Textures — Details about all of the currently bound textures and samplers, including texture and sampler parameters. 
- Images — Details about all of the images currently bound to the image units. 
- Buffers — Details about all of the bound buffer objects, including size, usage, etc. 
- Program — Information about the currently bound program object and/or program pipeline object, including shaders, active uniforms, etc. 
- Pixels — Current settings for pixel pack and unpack state. 
Miscellaneous
The miscellaneous screen contains additional information such as shader limits, implementation dependent values, transform feedback limits, and various minimum/maximum values.
Vulkan API Inspector
The API Inspector view has an API-specific pipeline navigator that allows you to select a particular group of state within the GPU pipeline. From here, you can inspect the API state for each stage, including what textures and render targets are bound, which shaders are in use, and the related constants. Note that if a stage is not active (either there is nothing bound to that stage or it doesn’t apply for the current action) it will be grayed out, but you can still click on it to inspect the state.
Pipeline Stages
The following list shows the stages that are available for inspection:
- Pipeline — Shows information about the currently bound pipeline object. 
- Render Pass — Shows information about the current render pass object. 
- FBO — Shows information related to the Frame Buffer Object that is associated with the current render pass. 
- IA — The Input Assembler shows the layout of your vertex buffers and index buffers. 
- Viewport — Shows the current viewport and scissor information. 
- VS — Shows all of the shader resource views and constant buffers bound to the Vertex Shader stage. 
- TCS — Shows all of the shader resources associated with the Tessellation Control Shader stage. 
- TES — Shows all of the shader resources associated with the Tessellation Evaluation Shader stage. 
- GS — Shows all of the shader resource views and constant buffers bound to the Geometry Shader stage. 
- SO — Shows the resources bound for Stream Output. 
- Raster — Shows the Rasterizer State parameters, including culling mode, scissor and viewport rectangles, etc. 
- FS — Shows all of the shader resources associated with the Fragment Shader stage. 
- Pix Ops — Shows the Pixel Operations parameters, including depth/stencil, multi-sample, and blending states. 
- Compute — Shows all of the shader resource views, unordered access views, and constant buffers bound to the Compute Shader stage. 
- Misc — Shows miscellaneous information associated with the instance, physical devices, and logical devices. 
Pipeline
The Pipeline page shows information about the currently bound pipeline object including: create info, pipeline layout, and push constant ranges.
 
Render Pass
The Render Pass page shows information about the current render pass including: clear values, attachments operations, and sub-pass dependencies.
 
Input Assembler (IA)
The Input Assembler page shows the layout of your vertex buffers and index buffers, as well as the vertex bindings and attribute information.
 
Shaders (VS, TCS, TES, GS, FS, CS)
The various shader pages display all of the shader modules, including: creation information, human-readable SPIR-V source, current push constants, currently bound descriptor sets, associated buffers, associated images and samplers, and associated texel buffer views for this stage.
 
Raster
The Raster page shows all rasterization information associated with the pipeline object, including: polygon modes, cull modes, depth bias, and line widths.
 
Pixel Operations (Pix Ops)
The Pixel Operations page displays information associated with the current pixel state including: depth/stencil state, multi-sample state, and blending state.
 
Miscellaneous Information (Misc)
The Miscellaneous Information page shows information related to the instance, physical device(s), logical device(s), and queue(s).
 
API Statistics View
The API Statistics View is a high-level view of important API calls, and includes information to help you see where GPU and CPU time is spent.
To access this view, go to Frame Debugger > API Statistics.
 
Batch Histogram View
The Batch Histogram view provides an intuitive way to inspect the distribution of primitives across draws. The draws are divided into configurable buckets, and each can be disabled or enabled. This is useful when you want to know which draws are heavy and how they affect the render target.
To access this view, go to Frame Debugger > Batch Histogram.
Batch Histogram will display a histogram chart that contains divided buckets and can be configured by a few options.
Click Configure to open the configuration pane:
- Bucketing Mode — Determines how to divide the draws into buckets.
  - Logarithmic: Bucketing by logarithmic scale. 
  - Linear: Bucketing by linear scale. 
 
- Bucketing Min/Max — Specifies the range used for bucketing. Out-of-range events are put into bookend buckets. 
- Bucket Count — Specifies the bucket count. 
Click a bucket to show its events in the table view. You can disable or enable events by clicking Disable All or Enable All, by using the check-box on a table item, or by right-clicking it. The links in the table lead to the corresponding event in the Events List.
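The bucketing options described above can be sketched as a mapping from a draw's primitive count to a bucket index. This is a minimal illustration, assuming that the minimum and maximum are inclusive and that the two bookend buckets sit at index 0 and index count + 1. The function name, the index layout, and the sample draws are assumptions, and the tool's exact boundary handling may differ.

```python
import math


def bucket_index(value, lo, hi, count, mode="linear"):
    """Map a primitive count to a bucket index in [0, count + 1].

    Index 0 and index count + 1 are the bookend buckets for values that
    fall below lo or above hi. Logarithmic mode requires lo > 0.
    """
    if value < lo:
        return 0
    if value > hi:
        return count + 1
    if mode == "logarithmic":
        fraction = (math.log(value) - math.log(lo)) / (math.log(hi) - math.log(lo))
    else:
        fraction = (value - lo) / (hi - lo)
    # The maximum value itself belongs to the last regular bucket.
    return 1 + min(int(fraction * count), count - 1)


# Hypothetical draws and their primitive counts, for illustration only.
draws = {"draw 3": 5, "draw 7": 50, "draw 9": 5000, "draw 12": 20000}
for name, primitives in draws.items():
    linear = bucket_index(primitives, 1, 10000, 4, "linear")
    logarithmic = bucket_index(primitives, 1, 10000, 4, "logarithmic")
    print(name, linear, logarithmic)
```

With a linear scale, small draws tend to crowd into the first bucket; a logarithmic scale spreads them out, which is often more useful when primitive counts span several orders of magnitude.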
 
Current Target View
The Current Target view is used to show the currently bound output targets. This can be useful because it focuses in on the bound output resources, rather than having to search for them in the All Resources view.
To access this view, go to Frame Debugger > Current Target.
Current Target displays thumbnails along the left pane for all currently bound color, depth, and stencil targets. This view changes as you scrub from event to event. All of the thumbnails on the left can be selected to show a larger image on the right. You can also click the link below each to open the target in the Resource Viewer.
 
Event Viewer
The Events view shows all API calls in a captured frame. It also displays both CPU and GPU activity, as a measurement of how much each call “costs.”
To access this view, go to Frame Debugger > Events.
 
To add context to each API call, the thread ID and object/context that made that call are offered. Nsight Graphics also supports application-generated object and thread names in these columns; see Naming Objects and Threads for guidance on the supported methods for setting these names.
Clicking a hyperlink in the Events column will bring you to the API Inspector page for that draw call.
You can select whether to view the events in a hierarchical or flat view. If multiple performance marker types are used, you can select the correct one, as well as varying levels of verbosity for the call display (variable + value, value, or none). You can also sort the events by clicking on any of the available column headers.
The visibility of columns can be toggled by right-clicking on the table’s header. By default some columns will be hidden if they offer no unique data (e.g., single thread) for the captured frame.
| Column | Description | 
|---|---|
| (Indicator Column) | Points to the currently scrubbed event | 
| (Bookmarks column) | When bookmarks are used, this column will have a visual indication of whether the row in question is bookmarked. When no bookmarks are used, this column is hidden. | 
| Event | The event ID of the API call, operation, or comment for this event. On the left-hand-side, when perfmarkers are used, a perfmarker stack is visually indicated and the ID indicates the range of events contained within this hierarchy level. Hovering over this cell will show the full perfmarker hierarchy. When the event in question is an action, the right-hand-side will contain a link to open the event in the API Inspector. | 
| Description | Describes the API call, operation, or comment. This column hierarchically lists the event hierarchy as controlled by the usage of Marker APIs. | 
| Object | The API object (context, queue, resource, etc.) that the event operated on or was issued by. | 
| CPU ms | CPU timing for the event in question. When this is a perfmarker range, this indicates an aggregate summation. These timings are provided for reference, as they are not fully accurate due to the impact of capture operations on their timing. Note that tracing tools like Nsight Systems™ are targeted for higher resolution timing. | 
| GPU ms | GPU timing for the event in question. Not all events will have GPU timings. For perfmarker ranges, a delta time is indicated, noting that this is an approximated aggregation of the timing of the events within it, not an absolute timing. | 
| Thread | The thread that performed the API call, operation, or comment. | 
| Tag | A list of meta-data tags that relate to the call in question. These typically indicate special characteristics of the call that are not evident in their API name or parameters. When no columns have tags, this column is hidden. | 
| Issue | For APIs that support issue tracking (OpenGL), this column indicates whether a particular API call or operation has issues/warnings of interest. | 
Filtering Events
The events view can be filtered with both a quick filtering expression as well as a detailed configuration dialog.
The filter input box offers a quick, regex-based match against events to find events of interest. Once entered, the view is automatically updated to match against the specified filter.
The Configure button brings up a dialog for more advanced, as well as persistent, filtering of the events in the view.
Changes within this dialog take immediate effect. There are three major classes of filters:
- Event Type Filters — These filters allow filtering to happen on the classification of the event. For example, you may want to hide all non-action events to quickly filter to just draws, clears, and other actions. 
- Other Filters — These filters allow matching against events with particular characteristics. Additionally, the Advanced Filters tab allows for JavaScript-based filtering on particular columns. 
- Method Filters — These filters hide methods that match against method names or object types. To add method filters, right-click on an event that you wish to hide and select one of the hiding capabilities within the view. 
Filter Persistence
Filters set by the filter configuration dialog will persist from session to session. Additionally, if multiple filter configurations are desired, you may save different named versions and recall them quickly by name.
Filters entered into the main filter-input box are not persisted, as these filters are meant for quick filtering of the event data.
Regex Syntax
For entries that support regex syntax, the syntax is implemented with a Perl-compatible regular expression language. Here are some examples of common tasks and the expressions that achieve them:
| Task | Expression | 
|---|---|
| Search for a draw call | (or use the predefined filter) | 
| Match OpenGL binding calls | | 
| Match D3D AddRef or Release calls | | 
| Search for D3D methods that set constant buffers | | 
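The tasks in the table above can be tried outside the tool. The following is a minimal sketch using Python's `re` module, whose syntax agrees with Perl-compatible regular expressions for simple patterns like these. The event strings and the patterns are illustrative assumptions, not the tool's predefined filters, and matching anywhere in the text (search semantics) is also assumed.

```python
import re

# Hypothetical event descriptions standing in for rows of the Events view.
EVENTS = [
    "glBindTexture(GL_TEXTURE_2D, 3)",
    "glDrawElements(GL_TRIANGLES, 36, GL_UNSIGNED_INT, 0)",
    "ID3D11DeviceContext::VSSetConstantBuffers(0, 1, buffers)",
    "ID3D11Buffer::AddRef()",
    "ID3D11Buffer::Release()",
    "ID3D11DeviceContext::DrawIndexed(36, 0, 0)",
]

# Illustrative patterns for the tasks in the table (assumed, not taken from the tool).
PATTERNS = {
    "draw call": r"Draw",
    "OpenGL binding calls": r"^glBind",
    "D3D AddRef or Release": r"::(AddRef|Release)\(",
    "D3D constant buffer setters": r"SetConstantBuffers",
}


def filter_events(pattern, events=EVENTS):
    """Return the events whose text matches the pattern anywhere in the string."""
    regex = re.compile(pattern)
    return [event for event in events if regex.search(event)]
```

Calling `filter_events(PATTERNS["D3D AddRef or Release"])` keeps only the two reference-counting calls, which is the kind of narrowing the quick filter box is meant for.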
JavaScript Syntax
The Advanced Filters configuration dialog supports JavaScript syntax. This enables complex evaluation of filtering expressions. The basic approach for JavaScript expressions is to match a particular column of data against an expression. Columns are “accessed” via a $('ColumnName') expression. For example, a column titled “Description” is accessed via $('Description'). From there, you can perform mathematical, logical, and text-matching expressions. See some examples below to demonstrate the power and usage of these expressions:
| Task | Expression | 
|---|---|
| Match against the description column for draw | | 
| Find events with non-zero GPU time | | 
| Find odd events | | 
| Find non-draw events with non-zero GPU time | | 
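The dialog itself takes JavaScript, but the logic of the four tasks above can be modeled in a few lines. The sketch below is a Python analogy only: the `col` accessor plays the role of `$('ColumnName')`, and the rows, column values, and function names are hypothetical.

```python
# Hypothetical rows, keyed by column titles as they appear in the Events view.
ROWS = [
    {"Event": 1, "Description": "ClearRenderTargetView", "GPU ms": 0.02},
    {"Event": 2, "Description": "DrawIndexed", "GPU ms": 0.40},
    {"Event": 3, "Description": "SetPipelineState", "GPU ms": 0.0},
    {"Event": 4, "Description": "Dispatch", "GPU ms": 0.15},
]


def select(rows, predicate):
    """Return the event IDs of rows accepted by the predicate."""
    # row.__getitem__ looks a column up by title, like $('ColumnName').
    return [row["Event"] for row in rows if predicate(row.__getitem__)]


def is_draw(col):
    return "Draw" in col("Description")


def has_gpu_time(col):
    return col("GPU ms") > 0


def is_odd(col):
    return col("Event") % 2 == 1


def non_draw_with_gpu_time(col):
    return not is_draw(col) and has_gpu_time(col)
```

Composing small predicates, as in `non_draw_with_gpu_time`, mirrors how a single filter expression can combine text matching with logical and arithmetic tests.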
Bookmarking
While filtering, it is often desired to keep the context of certain items while you find others. To prevent an event from being filtered, right-click the event and select Toggle Bookmark.
Alternatively, you can double-click or use Ctrl + B to bookmark the currently selected event. To navigate between bookmarks, use Alt + Up and Alt + Down.
If you wish to see the filtered results on the scrubber, you can select the tag button to the right of the filter toolbar, and a new row will appear in the Scrubber that displays your filtered events, allowing you to navigate those events in isolation.
 
 
Perf Markers
On the Events page, you can use the hierarchical view to see a tree view of performance markers. The items listed in the drop-downs correspond with the nested child perf markers on the Scrubber.
If you use the flat view on the Events page, the perf marker is not nested, but you can hover your mouse over the color-coded field in the far left column, which allows you to view the details about that perf marker.
When an application uses multiple kinds of perf markers, the Marker API allows selecting the API to use for the display. This situation may arise if the application uses a middleware, for example, or mixes components with different marker strategies.
 
Hotkeys
| Navigation | |
|---|---|
| | Go to the first event in the list. | 
| | Go to the last event in the list. | 
| | Go to the previous action. | 
| | Go to the next action. | 
| | Collapse the current perfmarker level. | 
| | Expand the current perfmarker level. | 
| | Go to the next bookmark. | 
| | Go to the previous bookmark. | 
| | Go to the next perfmarker on the same level of the perfmarker stack. | 
| | Go to the previous perfmarker on the same level of the perfmarker stack. | 
| | Go to the next perfmarker that is one level (or greater) up the stack for the current selection. | 
| | Go to the previous perfmarker that is one level up the stack for the current selection. | 
| | Go down the perfmarker stack to the next perfmarker that is one level below the current selection. | 
| | Go down the perfmarker stack to the previous perfmarker that is one level below the current selection. | 
| | Focus the current event to the center of the view. | 
| | Focus the current event to the top of the view. | 
| | Focus the current event to the bottom of the view. | 
| | Focus the current selection by collapsing all other perfmarkers. If the current selection is a perfmarker itself, it is expanded. | 
| | Clear selection. | 
| | Select all events. | 
| | Copy selected events. | 
Event Details
The Event Details view shows all parameters for the current event in a hierarchical tree structure that allows for searching.
To access this view, go to Frame Debugger > Event Details.
 
Because this window shows parameters for the current event, it changes as you navigate the scene. If you wish to keep the parameters for comparison against another call, the view supports Clone and Lock capabilities.
| Column | Description | 
|---|---|
| Name | The name of the parameter or child of a parameter structure. | 
| Value | The value passed in to the API call for this parameter. | 
| Type | The type of the parameter in question. | 
For events that reference API objects, the Event Details view provides a link to examine more information about that object in the Object Browser.
 
Geometry View
The Geometry view takes the state of the Direct3D, OpenGL, or Vulkan machine, along with the parameters for the current draw call, and shows pre-transformed geometry.
To access this view, go to Frame Debugger > Geometry.
There are two views into this data: a graphical view and a memory view.
Graphical Tab
 
Mouse Events
- Hover — Hint the elements of the hovered primitive. 
- Left Click — Select the primitive, or clear the selection when clicking on empty space. Selecting a primitive in the graphical viewer also selects the correlated rows in the memory table. 
Attribute Options
- Position — Specifies the vertex attribute to use for positional geometry data. 
- Color — Specifies how to color the geometry. If Diffuse Color is selected, the selected diffuse color swatch will be used for coloring. If a vertex attribute is selected, the selected attribute will be used for per-vertex coloring.   
- Normal — Specifies the per-vertex normal. This selection applies when using a shade mode that specifies Normal Attribute or when rendering normal vectors. 
Rendering Options
Clicking Configure in the bottom right corner of the Geometry View will open up the rendering options menu.
 
- Reset Camera — Resets the camera to its default orientation. By default, the viewer bounds all geometry with a bounding sphere for optimal orientation. 
- Zoom To Selected — Zoom the camera to the selected primitive. 
- Render Mode — Determines how to render and rasterize geometry. 
  - Solid: renders filled geometry. 
  - Points: renders a vertex point cloud. 
  - Wireframe: renders a wireframe of the geometry. 
  - Wireframe + Solid: renders filled geometry with a wireframe on top of it. 
- Shade Mode — Specifies the lighting mode of the rendered image. 
  - Selected Color Attribute: shades with the specified color attribute. 
  - Flat Shading Using Generated Normals: renders the geometry using flat shading with calculated normals. 
  - Flat Shading Using Normal Attribute: renders the geometry using flat shading with the specified Normal Attribute. 
  - Smooth Shading Using Normal Attribute: renders the geometry using smooth shading with the specified Normal Attribute. 
- Render Normal Vectors — Renders the specified normal attribute as a vector pointing from each vertex. The vector may be colored by the Normal Color selection and may be scaled by the Normal Scale selection.   
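For flat shading with generated normals, a normal has to be derived from the triangle itself rather than read from a vertex attribute. How the viewer computes it is not documented here; the sketch below shows one common approach, the normalized cross product of two triangle edges, and assumes counter-clockwise winding. The function name is hypothetical.

```python
def face_normal(p0, p1, p2):
    """Unit normal of the triangle (p0, p1, p2), assuming counter-clockwise winding."""
    # Two edge vectors sharing the first vertex.
    ux, uy, uz = (p1[i] - p0[i] for i in range(3))
    vx, vy, vz = (p2[i] - p0[i] for i in range(3))
    # Cross product u x v is perpendicular to the triangle's plane.
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    length = (nx * nx + ny * ny + nz * nz) ** 0.5
    if length == 0.0:
        return (0.0, 0.0, 0.0)  # degenerate triangle: no well-defined normal
    return (nx / length, ny / length, nz / length)
```

Because every fragment of a triangle shares this single normal, the result looks faceted, which is what distinguishes flat shading from the smooth mode that interpolates a per-vertex normal attribute.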
Memory Tab
The Memory tab of the Geometry View shows the contents of the vertex buffer, as interpreted by the current vertex or input attribute specification. This view is useful for seeing the raw data of your draw call. It also highlights invalid or corrupt vertices to streamline finding problematic data. In addition, the selection is linked to the graphical viewer: selecting a memory row also selects the associated primitive.
There are two modes of display for the geometry data:
- Index Buffer Order shows the vertices as indexed by the current index buffer and current draw call.   
- Vertex Buffer Order shows the vertices as linearly laid out from the start of the vertex buffer and draw call specification.   
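The difference between the two display modes can be illustrated with a small quad. This is a sketch under assumed data: the vertex and index values and the function names are hypothetical, and the draw parameters (`first_index`, `base_vertex`, and so on) are modeled loosely on typical indexed draw calls.

```python
# Hypothetical vertex buffer (positions only) and index buffer for two triangles.
VERTICES = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
INDICES = [0, 1, 2, 0, 2, 3]


def vertex_buffer_order(vertices, first_vertex=0, count=None):
    """Rows as laid out linearly in the vertex buffer: (vertex index, data)."""
    end = len(vertices) if count is None else first_vertex + count
    return list(enumerate(vertices))[first_vertex:end]


def index_buffer_order(vertices, indices, first_index=0, count=None, base_vertex=0):
    """Rows in the order the draw call fetches them through the index buffer."""
    end = len(indices) if count is None else first_index + count
    return [(i, vertices[i + base_vertex]) for i in indices[first_index:end]]
```

With this data, vertex buffer order yields four rows, while index buffer order yields six rows in which vertices 0 and 2 each appear twice because both triangles share them.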
Object Browser
The Object Browser view provides a list of all objects tracked for your frame, listed by name and by type. Beneath each object is a list of the properties and other metadata that Nsight Graphics tracks. This view is useful for finding objects that utilize a particular kind of property, for example a memory buffer with a particular flag.
To access this view, go to Frame Debugger > Object Browser. This view is also a destination for links provided by the Event Details and Event Viewer views.
 
This view supports Clone capabilities. Note, however, that this view captures fixed properties and metadata for each object at the end of frame. For APIs with mutable object properties, such as OpenGL, those properties are not updated in coordination with scrubbing. As such, Lock capabilities are not applicable to this view.
This view provides two panes side-by-side. The left-hand Objects pane provides the object list as well as their properties; the right-hand pane is context-sensitive and provides additional information about the object that is selected on the left-hand side.
Objects Pane
The objects pane (left-hand side) provides several capabilities for filtering objects:
- Object names and properties can be filtered via the filter box. 
- Object types can be filtered via the types combo box (default is All Types). 
- Objects can be filtered as to whether they report usages (default is Any # of Usages). 
- The Filter Properties checkbox determines whether the properties of an object are considered when filtering items in the object tree view. 
Object Name
This section shows the name of the selected object.
When the selected object has a specific viewer for viewing additional information about that type, a link to that specific viewer is provided. For example, texture resources provide a link to opening the selected texture in the Resource Viewer.
Object Usage
This section lists a table for the events in which an object is used. Each event is tagged to indicate the Usage of that object (READ or WRITE).
Related Objects
Many API objects reference other objects. This section lists those objects, their type and relationship, as well as a link to more information on that related object.
Resource Viewer
The Resource Viewer allows you to see all of the available resources in the scene. It is opened through links from resource thumbnails or entries in many Nsight Graphics views, for example from the API Inspector or the All Resources view.
Once opened, there are two tabs available:
- Graphical 
- Memory 
Graphical Tab
The Graphical tab allows you to inspect the resource, pan using the left mouse button to click and drag, zoom using the mouse wheel, and inspect pixel values. Also, this is where you can save the resource to disk. If supported on your GPU and API, this is also where you can initiate a Pixel History session to get all of the contributing fragments for a given pixel.
 
When you have selected a buffer from the left pane, the Show Histogram button is available on the right side of the Graphical tab, which allows for remapping the color channels for the resource being viewed.
 
To modify the histogram view, the following options are available:
 
- You can set the minimum and maximum cutoff values via the sliders under the histograms, or by typing in values in the Minimum and Maximum boxes. 
- You can change the scale by using the Log button. 
- The Luminance button allows you to visualize luminance instead of color values. 
- The Normalize button can preset the minimum and maximum values to the extents of the data in the resource. 
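The effect of the minimum and maximum cutoffs, the Normalize preset, and the luminance option can be sketched as a simple value remapping. This is an assumed model rather than the tool's implementation; in particular the Rec. 709 luminance weights and the function names are assumptions.

```python
def luminance(r, g, b):
    """Relative luminance using Rec. 709 weights (an assumed choice of weights)."""
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def normalize_range(values):
    """Cutoffs preset to the extents of the data, as a Normalize action might do."""
    return min(values), max(values)


def remap(value, minimum, maximum):
    """Map a channel value from [minimum, maximum] to [0, 1], clamping outside."""
    if maximum <= minimum:
        return 0.0  # empty or inverted range: nothing sensible to display
    t = (value - minimum) / (maximum - minimum)
    return min(max(t, 0.0), 1.0)
```

Narrowing the cutoffs around the values of interest stretches a small data range across the full display range, which is useful for depth buffers or HDR targets whose values cluster tightly.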
Memory Tab
The Memory tab shows a dump of the resource data.
 
You can use multiple options to configure how this memory is displayed:
- The Axis drop-down changes between address (memory offset) and index (array element) views. 
- The Offset entry limits the view to an offset within the given resource. 
- The Extent entry limits the view to a maximum extent within the given resource. 
- The Precision spin box controls the number of decimal places to show for floating point entries. 
- The Hex Display toggles between decimal (base-10) and hex (base-16) display formats. 
- Hash shows a hash value representative of the given memory resource within the current offset and extent. This is useful for comparing memory objects or sub-regions. 
- The Transpose button swaps the rows and columns of the data representation. 
- The Configure button opens the Structured Memory Configuration dialog. 
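The idea behind comparing hashes over an offset and extent can be sketched as follows. The hash algorithm used by the tool is not stated here, so SHA-256 is an arbitrary stand-in, and the function name is hypothetical.

```python
import hashlib


def region_hash(data: bytes, offset: int = 0, extent=None) -> str:
    """Hash of data[offset : offset + extent]; SHA-256 is an assumed stand-in."""
    end = len(data) if extent is None else offset + extent
    return hashlib.sha256(data[offset:end]).hexdigest()
```

Two resources that differ only past byte 8 produce the same value for `region_hash(data, 0, 8)` and different values for the whole buffer, which is how a sub-region comparison can localize a difference.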
Additional Capabilities
There are a number of additional capabilities in this view. At the top of the viewer, you’ll find a toolbar:
 
- Clone — makes a copy of the current view, so that you can open another instance. 
- Lock — freezes the current view so that changing the current event does not update this view. This is helpful when trying to compare the state of a resource at two different actions. 
- Save — saves the captured resources to disk. 
- Red, Green, and Blue — toggles on and off specific colors. 
- Alpha — enables alpha visualization. In the neighboring drop-down, you can select one of the following two options: 
  - Blend — blends the alpha with a checkerboard background. 
  - Grayscale — alpha values are displayed as grayscale. 
- Flip Image — inverts the image of the resource displayed. 
Pixel History
Pixel history enables the automatic detection of the draw, clear, and data-update events that contributed to the change in a pixel’s value. In addition, pixel history can identify the fragments that failed to modify a particular texture target, allowing you to understand why a draw might be failing, such as whether you may have misconfigured API state in setting up your pipeline.
To run a pixel history test, click the pixel history button and select a pixel to run the experiment on. The Pixel History view opens with a loading bar and presents the results once they are complete.
 
Structured Memory Configuration
The Structured Memory Configuration dialog allows the user to specify a data layout to interpret the raw data backing the selected resource. For example, a texture may be represented by its color channels or a uniform buffer may be represented by the various types packed within that buffer.
 
Typing in a valid structure definition automatically updates the viewer to respect the configuration.
 
New columns can be created using a simple C-like syntax.
int;      // creates a column with an anonymous int
int x;    // creates a second column with an int named x
float y;  // creates a third column with a float named y
Where additional user types can be defined like the following:
struct MyType{ int x; float y;};
struct MyOtherType{ MyType z; double u; };
Many common sized, unsized, and normalized types are permitted as valid types. Vector and matrix types are provided in a similar syntax to HLSL and GLSL. The full list of supported types can be browsed and searched by clicking on the expandable “Defined Types” sub-section of the configuration dialog.
 
As some additional notes on the parser:
- Full C/C++ grammar is not supported. 
- Single line comments are accepted; C-style block comments (/* */) are not. 
- Macros are not currently supported. 
- Alignments are not considered; all types are considered packed. 
- To add explicit padding, use padN, where N is a multiple of 8. 
- Members can be selectively hidden as well, which can be useful for narrowing your data. 
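Because all types are treated as packed, a structure definition maps directly onto consecutive bytes. The sketch below uses Python's `struct` module to decode raw bytes with the `MyType` and `MyOtherType` layouts shown earlier. It assumes a 32-bit `int`, 32-bit `float`, 64-bit `double`, and little-endian data; none of these are stated by the dialog, and the helper name is hypothetical.

```python
import struct

# "<" selects little-endian with no alignment padding, mirroring the packed rule.
MY_TYPE = struct.Struct("<if")         # struct MyType { int x; float y; };
MY_OTHER_TYPE = struct.Struct("<ifd")  # struct MyOtherType { MyType z; double u; };


def decode_rows(raw: bytes, layout: struct.Struct):
    """Split raw bytes into consecutive records, ignoring a trailing partial record."""
    usable = len(raw) - len(raw) % layout.size
    return [layout.unpack_from(raw, offset) for offset in range(0, usable, layout.size)]
```

Under these assumptions `MyType` occupies 8 bytes per row. Packing matters most for mixed widths: an `int` followed by a `double` takes 12 bytes packed (`struct.calcsize("<id")`), so a buffer written with aligned 16-byte records would need explicit padding in the definition to line up.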
When clicking on a texture resource, the configuration is automatically populated to interpret the channels of that format.
 
Similarly, buffers are defaulted to a generic byte configuration. You can typically interpret this buffer data by examining the specific use case. For example, the layout of a vertex buffer can be seen in the Input Assembler section of the API Inspector view, or a uniform buffer can be interpreted by looking at the data layout specified within the shader source.
To persist a configuration, you can click on the Save… button to assign a name to this configuration.
 
Later, you can restore this configuration by clicking on the Load… button.
 
Scrubber
The Nsight Graphics Frame Debugger has two parts. One part appears as the Frame Debugger window on the host. The other part appears as a Heads-Up Display (HUD) on the target application.
To access this view, go to Frame Debugger > Scrubber.
The part of the Frame Debugger that appears as a HUD on the target machine is comprised of the following:
- HUD Toolbar — controls the frame capture, along with a number of other options (help, etc.). 
- Frame Scrubber — indicates the current draw event. There is a Scrubber view in the Frame Debugger on the host, as well as a frame scrubber on the HUD. The two scrubbers stay in sync: when you move the controls on one, the other follows. For example, if you move the frame scrubber on the HUD to highlight a new draw event, the Scrubber in the Frame Debugger moves to the same event. 
 
Understanding the Frame Scrubber
For the sake of discussion when it comes to graphics debugging, it helps to note some common terminology.
- An event is a single call to the API. It could be a triangle draw call, or backbuffer clear, or a less obvious call, like configuring buffers. A snapshot is a sequence of events. 
- An action is a subset of the event types. It can be one of the following: (1) Draw Call, (2) Clear, or (3) Dispatch. Actions are interesting since they explicitly change data which may result in visual changes. 

Note: For Direct3D frame debugging, the Direct3D runtime documentation states that, “the return values of AddRef & Release may be unstable and should not be relied upon.” The Nsight Graphics Frame Debugger also takes additional references on objects so any code that relies on an exact reference count at a particular time may fail. In general, users should not expect an exact reference count to be returned from the Direct3D runtime. For more information, see Microsoft’s Rules for Managing Reference Counts. 
When you debug your graphics project, the Scrubber window shows the perf markers you implemented. When working with user-defined markers, the Scrubber window uses the color and label that you defined for the perf marker.
 
On the Scrubber, you can select one performance marker and it automatically creates a range of all the draw calls that occurred within that time frame. Clicking on it again causes the Scrubber to automatically zoom to that range of events. You can zoom in on a nested/child marker the same way.
To zoom out, click the parent performance marker, or use CTRL + mouse wheel.
Performance markers are also displayed on the HUD, color-coded the same way that they are on the Scrubber. However, on the HUD, the information is condensed, and you must hover your mouse over the selected performance marker to get its details.
 
The default view shows the events in your application, in addition to any performance markers you have defined. Clicking the Add… button opens a dialog that allows you to select what type of range you want to add.
- Program Ranges — Actions that use the same shader program. 
- Viewport — Actions that render to the same viewport rectangle. 
- Alpha Blending Enabled — Actions that have alpha blending enabled. 
- Alpha Test Enabled — Actions that have alpha test enabled. 
- Back Face Cull Enabled — Actions that have back face cull enabled. 
- User — A range defined by you on the fly. Use SHIFT + left-click and drag the Scrubber on the created “User” row to create a new range. 
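A range row of this kind groups actions that share a property. As a rough model, the sketch below collapses consecutive actions that use the same shader program into ranges, the way a Program Ranges row might. Whether the tool merges only consecutive actions is an assumption, and the action data and function name are hypothetical.

```python
from itertools import groupby

# Hypothetical actions: (event ID, shader program name).
ACTIONS = [(10, "sky"), (11, "opaque"), (12, "opaque"), (15, "opaque"), (16, "ui")]


def ranges_by(actions, key=lambda action: action[1]):
    """Collapse consecutive actions sharing a key into (key, first ID, last ID) ranges."""
    ranges = []
    for value, group in groupby(actions, key=key):
        members = list(group)
        ranges.append((value, members[0][0], members[-1][0]))
    return ranges
```

Passing a different `key`, for example one that returns the viewport rectangle or the blend-enabled state, gives the corresponding grouping for the other range types.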
Right-clicking on a specific action in the Scrubber allows you to open the API Inspector for that action, change your view settings, or initiate a shader profile.
Scrubber View Options
 
From the Mode drop-down menu, choose one of the following:
- Unit Scale is the default view, which simply shows the actions and events on the timeline. 
- GPU Time Scale displays the GPU activity and how much each event or action cost the GPU. 
- CPU Time Scale displays the CPU activity and how much each event or action cost the CPU. 
- X by CPU, Y by GPU displays the CPU time scale on a horizontal X-axis, and the GPU time scale on a vertical Y-axis. 
Depending on which mode you select, you can also select whether you want to view the ruler relative to the capture, viewport, or cursor.
 
From the Hierarchy drop-down, Queue Centric sorts the events by queue, while Thread Centric sorts the events by the thread.
 
Using Hotkeys to Scrub Through a Frame
When the Scrubber has focus, you can use the following hotkeys to move the Scrubber cursor from one event to another.
| Navigation | |
|---|---|
| | Go to the first event. | 
| | Go to the last event. | 
| | Go to the previous event. | 
| | Go to the next event. | 
| | Expand the current event group (HUD only). | 
| | Collapse the current event group (HUD only). | 
| | Current event: show less information (HUD only). | 
| | Current event: show more information (HUD only). | 
| Zooming and Panning | |
| | Zoom in X-axis | 
| | Zoom out X-axis | 
| | Reset zoom | 
| | Increase row height (all rows) | 
| | Decrease row height (all rows) | 
| | Pan | 
| | View zoom window | 
| Cursor and Selection | |
| | Set cursor (Places cursor at closest point to the start of a range.) | 
| | Select row (The selected row is highlighted in orange.) | 
| | Make range selection | 
| | Zoom to range | 
| | Open API Inspector | 
| CTRL + A | Select all events | 
For the purpose of moving the Scrubber cursor, the following are considered action events:
- Draw methods 
- Clear methods 
- Dispatch methods 
- Present methods 
For example, if you are looking for the next draw method that was called, you can press the CTRL + RIGHT ARROW on the keyboard to skip over events that are not typically of interest, and only stop on events that are considered action events.
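The skipping behavior described above amounts to scanning forward or backward for the nearest event whose kind is an action. The following is a minimal sketch of that logic; the frame contents, the kind labels, and the choice to stay in place when no further action exists are all assumptions.

```python
# Event kinds treated as actions for cursor movement, per the list above.
ACTION_KINDS = {"draw", "clear", "dispatch", "present"}

# Hypothetical captured frame: (event ID, kind).
FRAME = [(0, "state"), (1, "clear"), (2, "state"), (3, "state"), (4, "draw"), (5, "present")]


def next_action(frame, cursor):
    """Index of the first action after cursor, or cursor itself if there is none."""
    for i in range(cursor + 1, len(frame)):
        if frame[i][1] in ACTION_KINDS:
            return i
    return cursor


def previous_action(frame, cursor):
    """Index of the last action before cursor, or cursor itself if there is none."""
    for i in range(cursor - 1, -1, -1):
        if frame[i][1] in ACTION_KINDS:
            return i
    return cursor
```

Starting from the clear at index 1, `next_action` jumps straight to the draw at index 4, passing over the two state-setting events in between.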
Shader Profiling
The Shader Profiler is a tool for analyzing the performance of SM-limited workloads. It helps you, as a developer, identify the reasons that your shader is stalling and thus lowering performance. With the data that the shader profiler provides, you can investigate, at both a high- and low-level, how to get more performance out of your shaders. See the Shader Profiler section for more information.
Linked Programs View
The Linked Programs View lists all of the shaders in your application.
To access this view, go to Frame Debugger > Linked Programs.
- If the shader (or its parent program or pipeline object) hasn’t been used by the application yet, it is marked with a symbol in the Status column. 
- If the shader has been used, selected statistics are presented for that shader. 
For programs or pipeline objects, you can view the individual shaders by pressing the ► button to the left of the program/pipeline name. When expanded, you can select the link to open a text view of the shader source (when available).
 
Name
This is the name of the shader. This name is either generated internally, or can be assigned by the user per API.
Type
The type of the shader: Vertex, Pixel, Compute, etc.
UID
Indicates a unique object ID for the associated pipeline or shader.
Context
Indicates which of the application’s contexts owns this shader. Shown only for multi-context OpenGL applications.
Status
This column displays the current status of the shader. The status includes Source or Binary, to denote whether or not source code is available for this shader. Also, if the SASS text is included, this means that we have driver level binary code that is necessary for gathering shader performance metrics.
Two status symbols may also appear in this column: one means that we are waiting for the shader to be bound by the application, and the other means that shader performance metrics are currently being computed.
Debug Info
This column indicates the availability and load status of debug information for the associated shader.
File Name
Lists the file name from which the shader was compiled for those shaders with debug info that provides this information.
App Hash
This column displays a unique hash of application level shader information (e.g., bytecode).
# Reg
This column gives the number of registers used by the program. Register count impacts occupancy/threads in flight. This may not be available for all shaders.
The following columns are only valid for compute shaders:
CTA Dim
Indicates the CTA dimensions used by the shader.
Smem
Indicates the shared memory allocated for the shader per warp/thread.
# Barrier
Indicates the number of barriers used by the shader.
Ray Tracing Inspector
The Ray Tracing Inspector shows the geometry that has been specified in build commands when running an application that uses ray tracing APIs. If the application does not use these APIs, the view is not available.
In Ray tracing APIs, such as DXR and NVIDIA Vulkan Ray Tracing, an acceleration structure is a data structure that describes the full-scene geometry that is traced when performing the ray tracing operation. This data structure is described in detail in the following links: https://developer.nvidia.com/rtx/raytracing/dxr/DX12-Raytracing-tutorial-Part-1 and https://developer.nvidia.com/rtx/raytracing/vkray.
This data structure is purpose-built to allow for translation to application-specific data structures that perform well on modern GPUs. While constructing this data structure, the developer has the responsibility of constructing the structure correctly and using flags to identify the functional and performance characteristics within it. Needless to say, this can be an error-prone operation.
Nsight Graphics Ray Tracing Inspector allows you to view the structures you are creating, navigate through them, and see the flags that you are using. Additionally, you can filter and colorize the structure to highlight, at a bird’s eye view, different kinds of geometry.
The Ray Tracing Inspector is opened through links from resource thumbnails or entries in many Nsight Graphics views, such as the API Inspector or All Resources View. For example, the Ray Tracing Inspector can be opened from the API Inspector View when scrubbed to a build event or a trace rays call. When scrubbed to these events, the view presents a list of the active structures with a link to open each.
The view is multi-paned — it shows a hierarchical view of the acceleration structure on the left, a graphical view of the structure in the middle, and controls and options on the right. Additionally, a performance analysis section is present on the lower-left. With the hierarchy of the Acceleration Structure tree view, the top-level acceleration structure (TLAS), bottom-level acceleration structures (BLAS), child instances, child geometries, and memory sizes are presented. When a particular item is selected, the name, flags, and other meta-data for this entry are listed in a section on the bottom left-hand side. Each item within the tree has a check box that allows the rendering of the selected geometry or hierarchy to be disabled. Double-clicking on an item jumps to the item in the rendering view and automatically adjusts the camera speed to be relative to the size of the selected object.
 
| Column | Description | 
|---|---|
| Name | An identifier for each row in the hierarchy. Click on the check box next to the name to show or hide the selected geometry or hierarchy. Double-click on this entry to jump to the item in the rendering view. | 
| # Prims | The number of primitives that make up this geometry. | 
| Surface Area | A calculation of the total surface area for the AABB that bounds the particular entry. | 
| Size | A calculation of the memory usage for this particular level. Hover over this entry to see a tooltip that includes a roll-up calculation of the aggregate memory usage of this hierarchical level and children below it. | 
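The Surface Area and Size columns can be understood through two small calculations: the surface area of an axis-aligned bounding box, and a recursive roll-up of memory usage over a hierarchy level and its children. The sketch below shows both under assumed data; the `Node` type, the sizes, and the function names are hypothetical and not the tool's internal representation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One row of the acceleration structure tree (hypothetical model)."""
    name: str
    size_bytes: int
    children: list = field(default_factory=list)


def rollup_size(node):
    """Memory used by this level plus everything below it."""
    return node.size_bytes + sum(rollup_size(child) for child in node.children)


def aabb_surface_area(lo, hi):
    """Total surface area of the box with corners lo and hi."""
    dx, dy, dz = (hi[i] - lo[i] for i in range(3))
    return 2.0 * (dx * dy + dy * dz + dz * dx)
```

For a TLAS of 100 bytes holding one BLAS of 40 bytes with a 10-byte geometry and a second BLAS of 50 bytes, the roll-up is 200 bytes, while each row's own Size value stays at its per-level figure.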
Performance analysis tools are accessible in the bottom left corner of the main view. These tools help identify potential performance problems that are outlined in the RTX Ray Tracing Best Practices Guide. They aim to give a broad picture of acceleration structures that may exhibit sub-optimal performance. To find the optimal solution, profiling and experimentation are recommended, but these tools may paint a better picture as to why one structure performs poorly compared to another.
| Action | Description | 
|---|---|
| Instance Overlaps | Identifies instance AABBs that overlap with other instances. Consider merging BLASes when instance world-space AABBs overlap significantly to potentially increase performance. | 
| Instance Heatmap | Enters a heatmap mode that shows the approximate number of instance AABBs that are hit by a ray cast from each pixel within the current viewport. This mode offers a convenient way to scan your scene for potentially problematic geometries. | 
Filtering and Highlight
The acceleration structure tree view supports geometry filtering as well as highlighting of data matching particular characteristics. The checkboxes next to each geometry allow individual toggling between full rendering, wireframe rendering, and no rendering. Combining this capability with search allows you to identify the geometry of interest (by name when the application has named its resources) and display just that geometry.
Geometry instances can also be selected by clicking on them in the main graphical view. Additionally, right-clicking in the main graphical view gives options to hide/show all geometry, hide the selected geometry, or hide all but the selected geometry.
 
Beyond filtering, the view also supports highlight-based identification of geometry specified with particular flags. Checking each Highlight option identifies those resources matching that flag, colorizing them for easy identification. Clicking an entry in this section dims all geometry that does not meet the filter criteria, allowing items that do match the filter to stand out. When multiple filters are selected, geometry must meet all of them to pass (i.e., AND logic). Additionally, the heading text is updated to reflect the number of items that meet the filter criteria.
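The AND logic described above can be sketched as a subset test. This is only an illustration: the flag names (`opaque`, `no_duplicate_any_hit`) and the `apply_highlight` helper are hypothetical and are not part of Nsight Graphics.

```python
def apply_highlight(geometries, selected_filters):
    """Split geometry names into (highlighted, dimmed).

    geometries: dict mapping a geometry name to the set of flags it carries.
    selected_filters: flags checked in the Highlight section (hypothetical names).
    A geometry is highlighted only when it carries every selected flag.
    """
    selected = set(selected_filters)
    passing = [name for name, flags in geometries.items() if selected <= set(flags)]
    dimmed = [name for name in geometries if name not in passing]
    return passing, dimmed


scene = {
    "rock": {"opaque"},
    "leaves": {"no_duplicate_any_hit"},
    "fence": {"opaque", "no_duplicate_any_hit"},
}
```

With both flags selected, only `fence` passes, and the count shown in the heading would correspond to `len(passing)`.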
 
Rendering Options
Under the highlight controls, additional rendering options are available. These include methods to control the geometry colors and the ability to toggle the drawing of AABBs.
 
Performance Analysis
The Ray Tracing Inspector has a number of tools that support performance analysis of your acceleration structures.
 
| Navigation | |
|---|---|
| (icon) | Disable the current performance analysis mode | 
| (icon) | Enable Instance AABB Overlap Heatmap (world space) | 
| (icon) | Enable Traversal Timing Heatmap | 
| (icon) | Enable Ray-Primitive Intersection Heatmap | 
| (icon) | Enable Instance AABB Overlap Table | 
| (icon) | Alternate between the current and previous performance analysis mode | 
Instance AABB Overlap Heatmap (world space)
This tool helps identify world-space AABB overlaps with the goal of helping a developer reduce expensive overlap.
 
This heatmap overlay shows the maximum number of instance AABBs that overlap in world space for any direction within the current viewport. The more instances hit, the hotter the color. The heatmap threshold can be controlled by “Heatmap Overlay Options -> AABB Threshold” and depends on the number of instance AABBs overlapping.
When world-space AABBs of instances overlap, unnecessary BLAS traversal may be required. If these overlaps are caused by empty space within a BLAS, consider splitting the BLAS such that this empty space is minimized. If these overlaps are caused by non-opaque geometries, minimize the area not marked as opaque to increase performance.
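To make the idea behind the heatmap concrete, the sketch below counts how many instance AABBs a single ray passes through, using the standard slab test. It is a simplified CPU illustration under assumed inputs (boxes as `(min, max)` tuples); the function names are hypothetical, and the tool's actual GPU implementation is not described in this guide.

```python
def ray_hits_aabb(origin, direction, lo, hi):
    """Slab test: True when the ray origin + t*direction (t >= 0) enters the box."""
    t_near, t_far = 0.0, float("inf")
    for o, d, l, h in zip(origin, direction, lo, hi):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < l or o > h:
                return False
            continue
        t0, t1 = (l - o) / d, (h - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def count_aabbs_hit(origin, direction, boxes):
    """Number of boxes pierced by the ray; a higher count means a hotter pixel."""
    return sum(ray_hits_aabb(origin, direction, lo, hi) for lo, hi in boxes)
```

Each box the ray pierces is a BLAS that may need to be traversed, which is why a high count suggests a region worth inspecting.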
Traversal Timing Heatmap
This tool provides a heatmap of ray traversal time with the goal of identifying expensive geometry.
 
This heatmap overlay shows the number of GPU clock cycles required to trace a ray from the center of each pixel in the viewport to the closest geometry hit. The heatmap threshold can be controlled by “Heatmap Overlay Options -> Timing Threshold” and depends on the traversal speed of the current acceleration structure.
To determine how certain geometry affects the ray traversal speed, geometry can be hidden in which case it is ignored by this algorithm. Additionally, setting “Heatmap Overlay Options -> Opacity Override” to “Force Opaque” traverses the acceleration structure as if all geometry is opaque.
Ray-Primitive Intersection Heatmap
This tool provides a heatmap of ray-geometry intersections with the goal of identifying expensive geometry.
 
This heatmap overlay shows the number of surfaces hit by a ray traced into the scene from the center of each pixel. The more surfaces hit, the hotter the color. In this mode, opaque geometry terminates a ray. The heatmap threshold can be controlled by “Heatmap Overlay Options -> Intersection Threshold.” When geometry is not marked as opaque, a ray must be traced through all surfaces of that geometry until an opaque surface is hit. To minimize intersection tests, use the opaque geometry flag wherever possible, or use opacity micromaps to mask off opaque sections of non-opaque geometry.
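The effect of the opaque flag on the intersection count can be sketched as follows. This is a simplified model, assuming that the opaque surface which terminates the ray is itself counted; the `surfaces_hit` helper is hypothetical.

```python
def surfaces_hit(hits):
    """Count surfaces a ray passes through before it is terminated.

    hits: iterable of (distance, is_opaque) pairs for every surface along the ray,
    in any order. Surfaces are visited front to back, and the first opaque
    surface ends the ray (it is included in the count, by assumption).
    """
    count = 0
    for _distance, is_opaque in sorted(hits):
        count += 1
        if is_opaque:
            break
    return count
```

In this model, marking the nearest surface as opaque reduces the count for that pixel to 1, which is the saving the opaque flag is meant to provide.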
Instance AABB Overlap Table
This table provides information on overlap to allow you to consider merging BLASes when instance world-space AABBs overlap significantly.
 
When world-space AABBs of instances overlap, the TLAS becomes non-optimal. A ray can then hit more than one instance in a volume in space. Traversing through BLASes of all those instances is then required to resolve the closest hit. Traversing through one merged BLAS would be more efficient. Tracing performance against a BLAS doesn’t depend on the number of geometries in it. Geometries merged into a single BLAS can still have unique materials.
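A minimal sketch of how overlapping instance pairs could be found is shown below. The box representation and function names are assumptions made for this example, and the brute-force pairwise loop is chosen for clarity rather than speed.

```python
from itertools import combinations


def aabbs_overlap(a, b):
    """True when two boxes, each given as (min_xyz, max_xyz), share volume on every axis."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return all(a_lo[i] < b_hi[i] and b_lo[i] < a_hi[i] for i in range(3))


def overlapping_pairs(instances):
    """instances: dict mapping an instance name to its world-space box.

    Returns the name pairs whose boxes overlap; boxes that only touch are not reported.
    """
    return [
        (name_a, name_b)
        for (name_a, box_a), (name_b, box_b) in combinations(instances.items(), 2)
        if aabbs_overlap(box_a, box_b)
    ]
```

Pairs reported this way are candidates for merging into a single BLAS, subject to profiling.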
Export
Exporting the view, by clicking on the Save (disk) icon in the upper left of the view toolbar, allows for persisting the data you have collected beyond the immediate analysis session. This capability is particularly valuable for comparing different revisions of your geometry or sharing with others. Bookmarks are persisted as well. An example use case is identifying sub-optimal geometry, bookmarking it, and passing this document to a level designer or artist for correction.
Shader Timing Heatmap View
The Shader Timing Heatmap View shows shader execution time on the GPU. It displays a heatmap image where every pixel represents the time of one shader execution unit. The warmer color (more red) means the shader takes more time to execute, while the colder color (more blue) means the shader takes less time. This display allows you to easily identify hot spots and plan optimizations for these areas.
To access Shader Timing Heatmap View, go to Frame Debugger > Shader Timing Heatmap.
If the current event is a DirectX ray tracing or a Vulkan ray tracing call, the view starts collecting shader times immediately and displays them after everything is loaded. Otherwise, it displays a note asking you to select a suitable event.
Note
At this time, Nsight Graphics only supports collecting the shader execution time of raygen shaders in DirectX ray tracing and Vulkan ray tracing applications.
The view contains multiple parts. It has API information, overlay controls, and a shader timings table on the left, and a heatmap viewer with its controls on the right.
 
Heatmap
The heatmap viewer allows you to select a rectangle using the left mouse button, pan using the middle mouse button, and zoom using the mouse wheel. The table shows timings in the selected rectangle if there is one.
Overlay
The overlay controls allow you to select an image to be an overlay on the heatmap.
 
When an overlay image is selected, you are able to adjust the opacity using the slider.
 
Color Mapping
On the right of the heatmap, there is a legend that shows the mapping from timing values to colors. If you want to change the mapping, click the right-most button on the heatmap and adjust the sliders.
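As an illustration of what such a mapping does, the sketch below converts a timing value to a color on a simple blue-to-red ramp. The two-color ramp, the `lo`/`hi` bounds standing in for the slider positions, and the function name are all assumptions; the actual legend used by the tool may differ.

```python
def timing_to_rgb(value, lo, hi):
    """Map a timing value to an (R, G, B) color: blue at lo, red at hi.

    Values outside [lo, hi] are clamped to the nearest end of the ramp.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    t = min(max((value - lo) / (hi - lo), 0.0), 1.0)
    return (round(255 * t), 0, round(255 * (1.0 - t)))
```

Narrowing the `[lo, hi]` window spreads the ramp over a smaller timing range, which makes small differences between shader executions easier to see.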
 
Export
Click the Save button on the top-left side to export the view. This view supports saving the session, the heatmap image, or saving the table contents.
D3D12 Specific Views
D3D12 Descriptor Heaps
The Descriptor Heaps view displays all of the descriptor heaps bound for the current event.
To access this view, go to Frame Debugger > Descriptor Heaps.
 
On the left are the descriptor heaps available, and on the right you can view the properties of each descriptor heap. Along the top of the details pane, you can see how populated the descriptor heap is, as well as the maximum contiguous valid and invalid ranges. These properties can help you dive into each descriptor heap, and use it as a diagnostic tool to find any potential bugs in your application.
Note that if you click the hyperlink in the Resources column, it will bring up the Resource Viewer.
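The population and contiguous-range figures can be thought of as simple statistics over the descriptor slots. The sketch below assumes a heap is represented as a list of booleans (one per slot, `True` for a valid descriptor); this representation and the `heap_stats` helper are hypothetical.

```python
from itertools import groupby


def heap_stats(valid):
    """Summarize a descriptor heap given one boolean per slot (True = valid)."""
    total = len(valid)
    longest = {True: 0, False: 0}
    for is_valid, run in groupby(bool(v) for v in valid):
        longest[is_valid] = max(longest[is_valid], sum(1 for _ in run))
    return {
        "populated_fraction": sum(bool(v) for v in valid) / total if total else 0.0,
        "max_contiguous_valid": longest[True],
        "max_contiguous_invalid": longest[False],
    }
```

A large invalid run inside a range that a shader indexes into is the kind of pattern worth checking against the application's descriptor table layout.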
D3D12 Heaps View
The Heaps view provides a list of all heaps created by the application, along with detailed information about the resources contained in each heap.
 
When you select a heap from the left pane, you will see one of two types of entries: Placed Resources or Tiles. Clicking the hyperlink in the Placed Resources box will take you to the Resources Graphical tab.
Tiles are used to populate sections of a tiled resource.
 
The right side of the Heaps view displays the memory data associated with the selected resource, which can also be seen on the Memory tab of the All Resources view.
Heap Map
The Heap Map shows a high-level layout of how the heap is currently being used. You can view the usage either by Type (for example, Buffer, Texture2D, etc.) or by the name of the Resource.
Type:
 
Resource:
 
The Heap Map shows any overlapping regions within the heap.
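Overlapping regions arise when two resources occupy intersecting byte ranges of the same heap. A minimal sketch of detecting them, assuming each resource is described by a hypothetical `(name, offset, size)` tuple:

```python
def overlapping_regions(resources):
    """Find byte ranges shared by two resources placed in the same heap.

    resources: list of (name, offset, size) tuples.
    Returns a list of (name_a, name_b, start, end) for every overlapping pair.
    """
    ordered = sorted(resources, key=lambda r: r[1])
    overlaps = []
    for i, (name_a, off_a, size_a) in enumerate(ordered):
        end_a = off_a + size_a
        for name_b, off_b, size_b in ordered[i + 1:]:
            if off_b >= end_a:
                break  # sorted by offset, so no later resource can overlap A
            overlaps.append((name_a, name_b, off_b, min(end_a, off_b + size_b)))
    return overlaps
```

An overlap is not necessarily a bug, since resources may alias heap memory intentionally, but an unexpected one is worth investigating.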
D3D12 Root Parameters
The Root Parameters view displays all of the root parameters bound for the current event. Root parameters allow you to quickly change what you’re sampling from, constants, and other descriptors in a more lightweight way than past APIs.
To access this view, go to Frame Debugger > Root Parameters.
 
The root signature displays the structure definition of what’s bound at that moment. Root parameters fill in that structure with the values you’re sampling from and the constants you’re using.
When you select a root parameter on the left, the root arguments for that parameter are displayed on the right. This shows residency information; any invalid descriptors are displayed in red. Using root parameters as a diagnostic tool can help prevent a GPU fault.
Note that if you click the hyperlink in the Resources column, it will bring up the Resource Viewer.
Vulkan Specific Views
Vulkan Descriptor Sets View
The Descriptor Set view displays all of the descriptor sets currently allocated and bound by the application at the current event.
To access this view, go to Frame Debugger > Descriptor Sets.
 
The left pane displays a selectable list of descriptor sets along with their layout, pool, consumption counts, and dynamic offsets.
When a set is selected, the right pane displays the resources currently associated with this descriptor set, as well as information related to the pool from which this descriptor set was allocated. In addition, clicking on a resource within the descriptor set displays more detailed information about that specific resource.
Note that if you click the hyperlink in the Preview column, it brings up the Resource Viewer associated with this image or buffer.
Vulkan Device Memory View
The Device Memory view provides a list of all device memory allocated by the application, along with detailed information about the resources contained in each memory region.
To access this view, go to Frame Debugger > Device Memory.
 
The left-most pane contains information about all device memory objects currently allocated. Once a device memory object is selected, the contained resources are listed in the middle pane, along with the resource layout map in the bottom left, and contained data on the right.
Vulkan Memory Pools
The Memory Pools view provides a list of all device memory allocated by the application, along with detailed information about the resources contained in each memory region.
 
The left-most pane contains information about all device memory objects currently allocated. Once a device memory object is selected, the contained resources are listed in the middle pane, along with the resource layout map in the bottom left, and contained data on the right.
Vulkan Texture and Sampler Pools
The Texture and Sampler Pools View provides a visualization of these different pool types. This can be useful for determining if a particular set of resources is in the resource pools it is expected to be in. The left-hand side allows you to select the pool you’re interested in, based on type. Included in the list are the relevant parameters describing how the pool was created. On the right side is a list of the resource descriptors, some information about the resource itself, and a thumbnail preview. There is a link below the thumbnail that allows you to open that resource in the Resource Viewer for deeper inspection.
To access this view, go to Frame Debugger > Texture and Sampler Pools.
 
GPU Trace UI
GPU Trace profiles live applications. Once a trace is complete, the data is saved in a trace file and can be analyzed offline on any computer where NVIDIA Nsight Graphics is installed, without the need to have the specific GPU installed or the profiled application running.
The GPU Trace window is comprised of up to 8 main sections:
- Tabs to switch between the timeline view and the shader source view. 
 
Timeline Control Scheme
The GPU Trace timeline view provides a variety of controls to select and interact with presented data.
| Binding | Command | 
|---|---|
| (image) | Select Event | 
| (image) | Select Event and Zoom | 
| (image) | Multi-Select Event | 
| (image) | New Timespan Selection | 
| (image) | Add Timespan Selection | 
| (image) | Erase Timespan Selection | 
| (image) | Zoom to Selection | 
| (image) | Horizontal Panning | 
| (image) | Horizontal Panning | 
| (image) | Vertical Scroll | 
| (image) | Zoom In/Out | 
| (image) | Event Row Header Context Menu | 
| (image) | Event Context Menu | 
| (image) | Metric Row Header Context Menu | 
| (image) | Move Row | 
Trace Toolbar
At the top left of the timeline view, there are 6 buttons that extend the timeline’s capabilities:
 
- Ruler Relative: Controls the zero point of the ruler. This can be:
  - Trace: Zero is when the trace begins. 
  - Viewport: In this mode, if you select a range and expand it, the beginning of the selected range is the zero point of the ruler. 
  - Cursor: Zero is where the mouse is. 
- Trace Analysis: See Trace Analysis. 
- Trace Compare: See Trace Compare. 
- Queue Rows Hierarchy Toggle: In modern graphics APIs, actions, commands, and markers can be executed on different queues. GPU Trace traces these events according to the queue they were executed on, and shows them by default according to this hierarchy. For better granularity, it is possible to toggle this view from hierarchy to flat mode. The flat mode can be used to pin, remove, and rearrange individual queue subrows. 
- Overlays: Toggle different overlays; see Barrier Overlay and Subchannel Switch Overlay. 
- Miscellaneous Options:
  - Aggregate Frames: This option is supported only when the report contains multiple frames, and turning it on activates aggregate mode. In this mode:
    - The timeline shows only the first frame. 
    - The metric values shown in the GPU Trace (metrics tab and timeline tooltip) are values averaged across all the frames (hovering over the values shows a tooltip displaying the original values used to compute the average). 
    - Values that have significant variation between the frames are shown in gray. The threshold for determining this is available in the settings. 
    This mode is useful when analyzing a report with multiple frames, to see averaged data and minimize the effect of frame variation. 
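The aggregate mode described above can be pictured as a per-metric average with a variation check. In the sketch below, the relative-spread measure and the default `variation_threshold` of 10% are assumptions made for illustration; the guide only states that a threshold exists in the settings, not how it is defined.

```python
from statistics import mean


def aggregate_metric(per_frame_values, variation_threshold=0.1):
    """Average one metric across frames and flag significant variation.

    Returns (average, varies). 'varies' is True when the spread between the
    smallest and largest frame value exceeds variation_threshold relative to
    the average (an assumed definition, not the tool's documented one).
    """
    avg = mean(per_frame_values)
    spread = max(per_frame_values) - min(per_frame_values)
    varies = avg != 0 and spread / abs(avg) > variation_threshold
    return avg, varies
```

A metric flagged this way corresponds to a value that would be shown in gray, signaling that the average should be read with caution.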
At the top right of the timeline view, there are buttons controlling the zoom level. These buttons may assist in navigating the timeline to the desired view.
- Start / End: Marks down the exact time for your start and end selection. 
- Reset Zoom: Resets the timeline zoom for the entire trace. 
- Zoom to Selection: Zooms to the selected range. Not available when the selection consists of multiple disjoint timespans. 
Timeline: Frames Data and Per-Queue Events
Frames Row
GPU Trace allows you to collect up to 15 consecutive frames in a single trace. The Frames row shows the frame execution boundary. Double-clicking on a frame automatically zooms in the timeline to the frame boundaries.
 
Context Row
Using multiple different APIs, such as D3D12 and CUDA, or multiple graphics queues, can generate multiple distinct contexts. The GR Engine (combining Graphics and Compute) can execute only one of these contexts at a time across the entire GPU.
 
In the GPU Trace timeline, the Context row shows the boundaries of all context ranges, indicating which context is active over time. In cases where contexts from other processes are detected, those ranges include the process name, if it can be determined.
 
Where possible, limit the number of contexts and, consequently, the number of context switches. Context switches incur some overhead, so reducing them can improve performance.
Per-Queue Events
NVIDIA GPUs contain multiple independent engines that provide specialized functionality. These engines (e.g., graphics, compute, and copy) can execute work in parallel, and work can be submitted to them in separate queues.
In the GPU Trace timeline, you can observe actions and events that occurred throughout the frame execution, according to the queue they were submitted on. The per-queue part of the timeline presents events, user markers, and actions.
Queue Synchronization Objects
Since work can be submitted in separate queues, graphics APIs support synchronization of work between queues. GPU Trace reveals when Wait and Signal commands are executed on each queue. Once such a synchronization object bar is selected, a line connecting to the relevant event is drawn. This makes it easy to understand when a wait event was triggered, when a signal event released it, and how much time a queue was in a ‘waiting’ state.
 
Resource Barriers
GPU Trace can capture resource barrier calls. These calls appear as additional events in the synchronization row, relevant to the queue they were triggered on.
 
Use the “Overlay Barriers” toggle button under the “Overlays” menu to see how the resource barrier event impacts the metrics graph data:
 
Subchannel switch overlay
When an application submits a sequence of different work types (e.g., Draw, then Dispatch) within a single queue, the hardware may insert an implicit barrier between them. This implicit barrier is called a Subchannel Switch; it involves a pipeline flush and wait-for-idle at the Front End, preventing parallelism across the barrier. To identify where these occurred on the timeline, under the “Overlays” menu, enable the “Subchannel Switches” checkbox. This feature is available on NVIDIA Ampere and Ada Lovelace Architecture GPUs. On NVIDIA Blackwell Architecture GPUs and newer, subchannel switches do not occur between 3D and compute workloads. The overlay is therefore unavailable for those architectures.
 
Use the “Overlay Subchannel Switches” toggle button under the “Overlays” menu to see how subchannel switches impact the metrics graph data:
 
User Markers
GPU Trace also collects any User Markers that exist in the application, and displays them on the queue they were executed on. This may help you understand the frame workflow. GPU Trace supports API-specific markers as well as NVTX markers generated through the NVIDIA Tools Extension SDK.
 
Actions Row
The Actions row shows work submission actions, such as draws and dispatches, placed according to the time they were executed and the queue they were executed on.
Each range in the actions row shows the incremental time cost of each successive API call.
 
Compute Row
On NVIDIA Blackwell Architecture GPUs and newer, GPU Trace can collect various hardware events, including start and end timing information for compute workloads from D3D12, Vulkan, and CUDA APIs.
The Compute row uses this timing information to construct ranges that indicate when your compute workloads start and end, and which compute workloads are executing in parallel.
 
Enable Hardware Events:
Check the “Hardware Event System” checkbox in the activity window settings under “Additional GPU Settings” to enable Hardware Event collection.
 
Timeline: Metrics Graphs
The Metrics Data Rows can track NVIDIA GPU hardware units’ activity using performance monitors. GPU Trace enables collecting this data and observing in detail the hardware utilization during frame execution.
Note
To learn more about which action items you can derive from this data, the following blog post is recommended:
The Peak-Performance-Percentage Analysis Method for Optimizing Any GPU Workload
https://devblogs.nvidia.com/the-peak-performance-analysis-method-for-optimizing-any-gpu-workload/
When hovering your mouse over the timeline, a tooltip appears that displays the average of the metrics data for the selected time. The data is sorted from high to low:
 
GPU Unit’s Metrics Data Rows
GPU Trace presents hardware units’ metric data collected throughout the frame execution. This data is presented in the timeline. Each counter’s data is presented in its own row, while some counters are grouped for convenience. Hovering over a metric’s name presents a tooltip with the counter description. Group rows can be expanded to view individual counters.
The tooltip shows the counter data for the specific time where your mouse is pointed, or the average counter value for the selected range.
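The range average can be sketched as a mean over the samples that fall inside the selection. The `(timestamp, value)` sample layout and the unweighted mean are assumptions for this example; the tool may weight samples differently.

```python
def average_in_range(samples, start, end):
    """Mean of counter values whose timestamp lies in [start, end].

    samples: list of (timestamp, value) pairs.
    Returns None when the range contains no samples, for example in a region
    that belongs to another process's context.
    """
    selected = [value for timestamp, value in samples if start <= timestamp <= end]
    return sum(selected) / len(selected) if selected else None
```

Returning `None` rather than zero keeps an empty region distinguishable from a region where the counter was genuinely idle.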
Note
Regions of the timeline that contain rows without any sampling data are clearly marked and crossed off to indicate the absence of samples. For example, this can occur in regions corresponding to another process’s context as a result of context switching.
 
Handling Rows in the Timeline
GPU Trace collects a lot of data. It is possible to arrange the timeline in a way that better meets your current needs and allows you to focus on your area of interest.
Hiding Rows
Focus your performance triage operation by hiding rows that are not the main concern by right-clicking the row of interest and then selecting the hide option.
 
Hiding a row removes it from the timeline, but does not delete the data from the database. You can add the row back to the timeline by clicking the green + square at the bottom of the timeline and then selecting the row you wish to add back.
 
Change Rows Location
You can change the Metrics Data Rows’ location by pressing Alt + Left Click and dragging the rows to the desired location.
Pinned Rows Option
The GPU Trace timeline allows you to pin rows. You can either click the pin icon on the right of the row or right-click a row and select the option to pin it. The row will maintain its position within the view or be anchored to the top or bottom depending on whether you scroll up or down.
This information is saved so when reopening the report, the settings remain. In the below example, the Top-Level Throughput row is pinned, and this allows you to keep this row visible:
When hovering over the row, a pin button pops up. If you click this button, the row automatically moves to the top of the timeline and remains anchored when scrolling down the other rows. You can choose more than one row to pin.
 
User Ranges
User Ranges are ranges that can be added to and edited in the GPU Trace report. They can be used as personal notes and can enhance performance triage capabilities.
To add a user range:
- Select a range in the User Ranges row with “SHIFT + Mouse Right-Click.” 
- In the dialog that pops up, add your label and description. 
- Press OK. 
- Next to the file name, there is an asterisk (*), which indicates that this report has been edited. 
- You can edit or remove the range by using the right-click menu. 
A user range acts like any other range and its data is reflected accordingly in the Summary and Metrics tab.
 
Event List
The GPU Trace Event List view shows a subset (relevant to GPU performance) of API calls made by your application. It allows you to more easily identify a specific API call and find its duration on the timeline. The event list is searchable and filterable through text edits at the top of the view.
 
Event Details
The GPU Trace Event Details view shows API parameters of selected events in the Event List view. Parameters are listed in the same order as the selected event in a hierarchical structure. The event details are filterable through text edits at the top of the view.
 
Instruction Mix
The Instruction Mix view provides a breakdown of the instruction types, input dependencies and output stall locations in your shader(s). It is context-sensitive to the selection on the Timeline and Source views. See Shader Profiler Instruction Mix for more information.
 
Real-Time Shader Profiler Tabs
The real-time shader profiler views provide high-level insight into shader performance and can be used to jump to source code in the source view. It is context-sensitive to the selection on the Timeline view. See Shader Profiler Sections for more information.
 
Thread Divergence in Shader Profiler Views
An Active Threads Per Warp histogram is available in the Shader Profiler views within GPU Trace. Use this to identify shaders, functions, and source lines that paid a performance penalty due to branch divergence or a suboptimal number of threads launched. Values on the right of the histogram (closer to 32) indicate more efficient instruction execution.
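To show what the histogram measures, the sketch below builds one from per-instruction active masks, where each set bit represents an active thread in a 32-thread warp. The input format and function names are hypothetical; GPU Trace collects this data from the hardware rather than from masks supplied by the user.

```python
from collections import Counter

WARP_SIZE = 32


def active_thread_count(mask):
    """Number of set bits in a 32-bit active mask."""
    return bin(mask & 0xFFFFFFFF).count("1")


def active_threads_histogram(active_masks):
    """Map 'active threads per warp' to the number of executed instructions."""
    return Counter(active_thread_count(mask) for mask in active_masks)


def average_efficiency(active_masks):
    """Fraction of thread slots that did useful work (1.0 means no divergence)."""
    masks = list(active_masks)
    if not masks:
        return 0.0
    return sum(active_thread_count(m) for m in masks) / (WARP_SIZE * len(masks))
```

Instructions issued with only half of the mask set, for example after a divergent branch, pull the histogram to the left and lower the efficiency figure.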
 
Shader Profiling
The Shader Profiler is a tool for analyzing the performance of SM-limited workloads. It helps you, as a developer, identify the reasons that your shader is stalling and thus lowering performance. With the data that the shader profiler provides, you can investigate, at both a high- and low-level, how to get more performance out of your shaders. See the Shader Profiler section for more information.
Shader Source
When Collect Shader Pipelines is enabled the timeline shows a second tab called Shader Source. Switching to this tab hides the timeline and instead shows a source-level view of your shaders. If Real-Time Shader Profiler was also enabled, the shader source contains additional profiling information. See the Shader Profiler section for more information.
 
When Real-Time Shader Profiler is enabled, the views below the timeline and shader source view are useful to navigate between and find optimization opportunities within your shaders. See the Shader Profiler Summary Tab section for more information.
 
Information Tabs
The Information Tabs section provides general information on the capture, and also provides an additional view on the metrics data that were captured.
It contains 3 tabs:
Summary Tab
The upper section on the Summary tab provides details for the selected range. If no selection has been made, the information is relevant to the entire visible range:
- Start: The start time of the selected range or the visible range. 
- End: The end time of the selected range or the visible range. 
- Duration: The duration of the selected range or the visible range. 
- Range: An indicator of whether the data applies to a selected range or the visible range. 
 
Unit Throughput Summary Table
In this table, you can easily see the average value of the throughput units for the selected range. You can sort values from high to low.
Warp Occupancy Table
In this table, you can easily see the average value of the warp occupancy counters.
Metrics Tab
The Metrics tab encapsulates all metrics data and shows the average value for the selected range. You can easily filter and search for the desired counter using the text search bar. To do so, simply type the counter name (or part of the name), and the table is filtered automatically.
The metrics are divided according to the GPU unit or role they represent.
Values in the metrics tab change according to the corresponding selected range in the metrics graph area.
 
Capture Information Tab
The Capture Information tab provides general information about the capture, such as the GPU model, CPU, and operating system that were used, and the executable and command line arguments that were run. This might be useful when trying to analyze workload behavior or reproduce issues.
 
Note that if any warnings or errors occurred while making this capture, they will appear in this tab.
Analysis View
GPU Trace provides a live analysis trace of the occupancy and throughput of the GPU’s various units. With the “Throughput Metrics” metric set, data for these metrics is collected in a single frame. This data gives a good overview of how the GPU was occupied over time, correlated with user markers, draw / dispatch commands, command list execution, and synchronization objects.
With “Multi-Pass Metrics,” data for more metrics is collected, but across multiple frames. In this mode, the trace contains not only throughput and occupancy data but also stall reasons, memory usage breakdown, and more. This mode helps you understand not only which unit is the limiter but also why. However, interpreting the data correctly requires an in-depth understanding of NVIDIA hardware.
The GPU Trace analysis tool aims to simplify this work by analyzing the data and automatically generating recommendations on where to look, which potential areas should be fixed, and how. The analysis mode contains more than 40 common and advanced use cases and ‘lessons learned’ provided by our top devtech personnel. The goal of the tool is to provide actionable insights into the quality of every range in the trace. The tool will evolve over time and will be updated with more annotations that reflect the experience we gain while doing performance triage.
Analysis Ranks and Concept
The analysis tool contains many rules and formulas (limiters) that analyze the trace. The rules are grouped according to the relevant GPU unit or role. Each group may contain one or more limiters. Each limiter consists of metrics data and has a “Projected range gain” and a “Projected frame gain.” The trace analysis ranks ranges according to the projected frame gain. The recommendation is to look at the leaf markers first, since it is easier to focus on a specific range. For the bigger ‘parent’ markers, keep in mind that the projected gain is larger simply because the range itself is larger, and so it naturally has a larger impact on the overall frame.
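The relationship between the two gains can be illustrated with a simple model. The formula below, which scales a range's gain by its share of the frame, is an assumption made for this sketch and is not the tool's documented calculation; the function names and the `(name, range_gain_pct, duration)` tuples are likewise hypothetical.

```python
def projected_frame_gain(range_gain_pct, range_duration, frame_duration):
    """Assumed model: a range's gain scaled by its share of the frame time."""
    return range_gain_pct * range_duration / frame_duration


def rank_ranges(ranges, frame_duration):
    """Sort ranges by projected frame gain, highest first.

    ranges: list of (name, range_gain_pct, duration) tuples.
    Returns a list of (name, frame_gain_pct).
    """
    scored = [
        (name, projected_frame_gain(gain, duration, frame_duration))
        for name, gain, duration in ranges
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Under this model, a long range with a modest range gain can outrank a short range with a large one, which is why parent markers tend to rank high.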
Prerequisites
To get the most out of the analysis report, be sure to generate a trace using the “Multi-Pass Metrics.” You can enable “Multi-Pass Metrics” in 2 ways:
Check the “Multi-Pass Metrics” checkbox in the GPU Trace project settings:
 
If a GPU Trace session is already running, there is an option to enable “Multi-Pass Metrics” in the toolbar:
 
Note
Since the “Multi-Pass Metrics” collects data from multiple frames, you can help get more accurate results by following these recommendations:
- Make sure no other applications are running. 
- Make sure the option “Lock Clocks to Base” is checked. 
- It is recommended to run on a C++ capture or pause the application engine if you can. 
- It is also recommended to trace 3 frames so the “Aggregate” mode is applicable. 
Activating the Analysis View
After you make a trace using “Multi-Pass Metrics” on the desired frame, open the trace file. Once opened, select aggregate mode if you traced more than a single frame. This can also be changed in the analysis view. Click the “Analyze” button, and the analysis view is opened in a new tab. (Note that this feature is currently available only on Windows.)
Understanding the Analysis View
The analysis view is divided into 3 main sections (and a toolbar): Markers tree, Timeline, and the Analysis view.
The Toolbar Area
Aggregate Frames
This option is only enabled if more than one frame was traced. This improves accuracy since the metric values are calculated as an average and it reduces the influence of noise.
Skip GPU-Idle gaps for frame gain
When this is checked, the frame gain is calculated out of the frame duration without the GPU-Idle gaps. This is especially handy when profiling a C++ Capture frame, where artificial CPU work ranges may cause GPU-Idle gaps that influence the projected gain calculation.
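The effect of this option can be illustrated with a small sketch. The guide does not give the exact formula, so the function below assumes that the gain is the projected time saving divided by the frame duration, and that skipping idle gaps subtracts them from that duration. The function name and parameters are hypothetical.

```python
def projected_frame_gain_pct(saved_ms, frame_ms, idle_ms=0.0, skip_idle=False):
    """Express a projected time saving as a percentage of the frame duration.

    Assumption (not taken from the tool): gain = saved time / frame duration,
    and skipping GPU-Idle gaps removes them from the denominator.
    """
    denominator = frame_ms - idle_ms if skip_idle else frame_ms
    if denominator <= 0:
        raise ValueError("frame duration must exceed the idle time")
    return 100.0 * saved_ms / denominator
```

Under these assumptions, a 1 ms saving in a 20 ms frame that contains 10 ms of idle gaps is a larger share of the frame once the gaps are excluded.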
Show analysis colors only for leaf markers
Usually when doing performance triage, it is better to focus on the leaves first, since it is easier to fix code that performs a specific task. This is why the analysis tool performs analysis on the leaf markers first. In addition, since markers are ranked according to the “projected frame gain,” the bigger markers may naturally get a higher rank. Uncheck “Show analysis colors only for leaf markers” if you want to perform a full analysis on all markers.
Icon legend for the different severity levels:
| Icon | Explanation | 
|   | Denotes ranges where the maximum projected frame gain is around 10% or higher | 
|   | Denotes ranges where the maximum projected frame gain is around N%, where N is the number appearing in the icon | 
|   | Denotes ranges where the maximum projected frame gain is less than 0.5% | 
|   | Denotes ranges where the maximum projected frame gain is 0, or not available | 
Note
The info annotations represent a collection of metrics with “normal” values. The annotation’s explanation and metrics values may still assist in understanding a potential issue.
The markers tree:
The markers tree shows all the performance markers according to their hierarchy. Each marker has the correlated icon, the duration, the % of the marker’s duration from the entire frame, and the frame gain.
Timeline:
This view shows the markers on the timeline. It helps you understand at a glance how the frame is executed over time. In this mode, the markers are color-coded by the potential frame gain. By default, only leaf nodes are marked, as we recommend starting the triage with the leaves.
Analysis view:
This is the main content of the analysis. Limiters and formulas are divided into multiple categories. Each category represents a certain unit or a collection of units in the GPU. Each tab in the analysis view represents a category such as those shown in the figure below:
 
The categories are sorted by severity, with the most severe on the left-hand side. Hovering over the name of a category shows an informative tooltip that explains the essence of that category.
Each category contains one or more annotations. An annotation is a set of metrics with some corresponding logic. Click the annotation to see the relevant explanation, suggestion (if applicable), and the potential frame gain.
You can also view the metrics which are taken into consideration while calculating this annotation, the metric value, and a short description.
The Overview Annotations Category:
The overview category is special and is always the first one to look at. It gives a good indication of the overall performance of the specific range and of which unit is the main one with sub-optimal throughput. It also follows the Peak-Performance-Percentage (P3) analysis method presented and explained in the blog post here: https://developer.nvidia.com/blog/optimizing-vk-vkr-and-dx12-dxr-applications-using-nsig
Trace Compare
The Trace Compare tool enables the GPU Trace user to easily analyze the effect of code changes on a specific frame. It displays a simplified version of the GPU Trace timeline for 2 frames. The frames are placed one on top of the other, with their start times aligned. Trace Compare can compare either 2 frames from 2 different GPU Trace reports or 2 frames within a single one.
Launch the Trace Compare Tool
Option 1: Project Explorer:
Select two capture files in the explorer tree, right click and choose trace compare:
 
Option 2: Click on the toolbar button.
Trace Compare Dialog
The Trace Compare dialog shows the selected files to compare. It also enables the user to choose the frame to compare from each capture in cases of multiple frames captures.
 
Using the Trace Compare Tool
Trace Compare displays the selected frames in a simplified version of the GPU Trace timeline, one on top of the other, aligning the frames’ start time.
 
Markers are correlated as well, so when you click on a certain marker on one frame, the matching marker on the other frame is chosen, if found.
Align to Marker
Sometimes it is easier to spot differences when the selected markers’ start times are aligned. Choose a specific marker and click the Align matching markers check box to activate automatic alignment of matching markers.
Align Selections
It is also possible to manually align any pair of selections. Select any marker, action, or other range in the top and bottom views. Then press the Align Selections button to align the views on the selections.
Metrics Table in Trace Compare Mode
The detailed Metrics Table appears in this mode and shows the metrics data for each frame, side by side, and the delta between the values.
Metrics information
The trace compare tool shows the metrics data for each trace and the ratio between those values.
Profiling Frameless Applications
GPU Trace is able to profile frameless workloads if different criteria are set for when the trace begins and ends.
Configuring GPU Trace:
To profile such applications, change the Start After and Limited To settings within the GPU Trace connection dialog.
- Change the Start After condition to one of the following options:
  - Manual Trigger: Specifies that the trace is manually triggered by the user through the host application or the Target application trigger hotkey on the running application. 
  - Submit Count: The trace automatically starts after a select number of submits have been performed. Specifying 0 traces all submits. 
  - Elapsed Time: The trace automatically starts after a select amount of time has elapsed. 
 
- Change the Limited To condition to one of the following options:
  - Max Submits: The trace is limited to a set number of submits in addition to the max duration. The trace starts on the next submit once the Start After condition has been met. 
  - None: The trace is only limited by the max duration. The trace starts on the next submit once the Start After condition has been met. 
 
Depending on the choice of the Limited To option, it may also be necessary to update the Max Duration setting.
Set the rest of the settings as you normally would, providing the executable file and path, working directory, command line arguments and environment variables.
Collecting a Trace:
To collect a trace, all you need to do is press the “Launch GPU Trace” button. A trace is automatically collected when GPU Trace detects the supported API is in flight.
Open a GPU Trace report:
Once the trace has been collected, simply open the generated report and analyze it as you normally would.
Things to keep in mind:
Allocated Timestamps
GPU Trace is a detailed profiler and it collects a lot of metrics data, hence it is limited in the profiling session duration. The Allocated Timestamps setting influences the size of the buffer that the GPU Trace allocates to keep track of the GPU events. If you get an error message in the Output Messages window saying you ran out of resources, you might want to try and increase the number of Allocated Timestamps.
Application disconnected:
GPU Trace host launches the target application and profiles it. In this mode, the target application may exit automatically. Upon application end, you may get a warning message saying that the communication to the target was lost even though the trace was collected correctly.
Additional Capture Options
The Nsight Graphics framework enables launching an application with a specific set of command line arguments and/or environment variables. This is done via the ‘Connect to Process’ dialog.
Below are special pre-defined environment variables:
Automatic capture after X number of frames
Set WARPVIZ_CAPTURE_ON_FRAME to trigger a capture automatically after X frames have elapsed.
For example:
WARPVIZ_CAPTURE_ON_FRAME=100 will trigger capture automatically, once, after 100 frames.
Repeat automatic capture for every X number of elapsed frames
Set WARPVIZ_CAPTURE_FRAME_INTERVAL to automatically trigger a capture each time X frames have elapsed.
For example:
WARPVIZ_CAPTURE_FRAME_INTERVAL=100 will trigger a capture every 100 frames.
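As a convenience, the two variables can be generated as NAME=value entries by a small helper. This is only a sketch: the helper name and its parameters are hypothetical, and the exact way the entries are typed into the dialog's environment field is not covered here.

```python
def capture_env_entries(on_frame=None, frame_interval=None):
    """Build NAME=value strings for the capture variables described above.

    Only the variable names come from the guide; this helper is illustrative.
    """
    entries = []
    if on_frame is not None:
        # One-shot capture after the given number of frames.
        entries.append(f"WARPVIZ_CAPTURE_ON_FRAME={int(on_frame)}")
    if frame_interval is not None:
        # Repeated capture every time this many frames have elapsed.
        entries.append(f"WARPVIZ_CAPTURE_FRAME_INTERVAL={int(frame_interval)}")
    return entries


if __name__ == "__main__":
    for entry in capture_env_entries(on_frame=100):
        print(entry)
```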
Lock Clocks to Base
For better consistency between different captures, GPU Trace runs the target applications with ‘Lock Clocks to Base’. This means that the application will not run at maximum speed, but will be more consistent between runs. Turn it off if profiling at maximum speed is required.
Lock Clocks to Boost
GPU Trace offers the option to ‘Lock Clocks to Boost,’ which attempts to lock to a higher frequency than base but, depending on thermal throttling, may still yield a lower actual clock frequency.
 
Trace with Multi-Pass Metrics
GPU Trace traces hardware throughput data on a single frame. This data is collected according to the metrics set defined when launching the application. It is now possible to configure the application to collect ‘Multi-Pass Metrics.’ In this mode, GPU Trace automatically collects many more counters on consecutive frames. At the end of the collection, you are able to view all this data presented as a single profiling session.
When using this mode, the traced application must have user markers, since GPU Trace matches frames according to the markers. It is also preferable that the marker execution order is consistent.
This mode provides additional counters that may explain “Why” there is low throughput.
Enable Multi-Pass Metrics:
Check the “Multi-Pass Metrics” checkbox in the project setting dialog:
 
Trace with Multi-Pass Metrics:
You can collect using the Target application trigger hotkey or the “Collect GPU Trace” button as in the regular mode. However, you might notice that the process takes a longer time. This is because in this mode much more data is being collected.
This mode relies on marker consistency across frames. If GPU Trace detects an inconsistency, the inconsistent markers are removed with a warning message:
 
Markers matching algorithm:
The current marker matching heuristics have the following goals:
- Generate valid timeline mapping for perfectly matching markers. 
- Deal with parameterized marker names (to some degree). 
- Put an emphasis on leaf marker matching (as those are typically used for performance analysis). 
- Mark markers which do not match as such, so they can be shown to the user as mismatched. 
The current implementation handles the marker hierarchy bottom-up (compared to the previous approach, which was top-down), and is done in two phases:
- Leaf markers matching 
- Parent markers matching 
In addition, if multiple frames are traced per pass, the user can optionally (on by default) use “best-frame matching” (see below) to select the best matching frames from each pass.
Markers name comparison:
When comparing marker names, the current heuristics trim any trailing numbers/spaces from the end of the marker names, and then perform the string comparison. This successfully deals with the Unreal Engine frame marker (“Frame N”) but fails for more complex cases.
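The trimming rule can be sketched in a few lines. This is an illustration of the described behavior, not the tool's actual code; the function names are hypothetical.

```python
def normalize_marker_name(name: str) -> str:
    """Drop trailing digits and spaces, as described for marker name comparison."""
    return name.rstrip("0123456789 ")


def names_match(a: str, b: str) -> bool:
    """Compare two marker names after trimming."""
    return normalize_marker_name(a) == normalize_marker_name(b)
```

With this rule, "Frame 12" and "Frame 13" compare equal, while a name whose parameter is not at the very end, such as "Pass 3 (shadow)" versus "Pass 4 (shadow)", still fails to match.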
Leaf Markers Matching:
For each queue, the algorithm collects a list of all the leaf markers (markers without child markers) of that queue, and tries to find a match for each one.
The algorithm considers leaf markers as matched if:
- They have the same name. 
- They have the same number of parents. 
- All parent names are the same. 
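The three criteria above can be sketched as follows. The `Marker` class is a hypothetical stand-in for the tool's internal data, and the sketch assumes that parent names are compared with the same trailing-number trimming as leaf names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Marker:
    name: str
    parent: Optional["Marker"] = None


def trimmed(name: str) -> str:
    """Strip trailing digits and spaces before comparing names."""
    return name.rstrip("0123456789 ")


def parent_names(marker: Marker) -> List[str]:
    """Collect trimmed ancestor names, nearest parent first."""
    names = []
    node = marker.parent
    while node is not None:
        names.append(trimmed(node.name))
        node = node.parent
    return names


def leaves_match(a: Marker, b: Marker) -> bool:
    """Same name, same number of parents, and all parent names equal."""
    if trimmed(a.name) != trimmed(b.name):
        return False
    pa, pb = parent_names(a), parent_names(b)
    return len(pa) == len(pb) and pa == pb
```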
Parent Markers Matching:
After the leaf marker matching, the algorithm tries to match parent markers from the bottom up.
Parent markers are considered as matched if:
- They have the same number of child markers. 
- All the child markers are matched. 
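A bottom-up check along these lines might look like the sketch below. The `Node` class is hypothetical, the leaf test is simplified to a trimmed-name comparison, and children are assumed to be paired in execution order, which the guide does not state explicitly.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)


def trimmed(name: str) -> str:
    """Strip trailing digits and spaces before comparing names."""
    return name.rstrip("0123456789 ")


def nodes_match(a: Node, b: Node) -> bool:
    """Leaves match by name; parents match when child counts are equal
    and every child pair matches (assumed to be paired in order)."""
    if not a.children and not b.children:
        return trimmed(a.name) == trimmed(b.name)
    if len(a.children) != len(b.children):
        return False
    return all(nodes_match(ca, cb) for ca, cb in zip(a.children, b.children))
```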
Best Frame Matching:
To handle target applications which have workloads alternating in multi-frame cycles (e.g., executing a specific workload once every 2 frames) and improve marker matching chances in general, the best-frame matching logic was introduced.
This works when tracing multiple frames per pass with Multi-Pass Metrics, and results in the user having a single frame to view, which represents frames that match the most.
The algorithm operates as follows:
- GPU Trace collects N>1 frames per pass with Multi-Pass Metrics. 
- When processing a pass, GPU Trace tries to match each frame from the first pass to each frame from the current pass and assigns a score to each frame-to-frame match, based on the total duration of matched markers (ending with NxN scores per pass). 
- The frame from the first pass that received the highest total score is selected, and used to build the timeline mapping by matching it with the frames from each pass which matched best against it. 
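The selection steps above can be sketched as follows. The data model is a simplification: each frame is a hypothetical mapping from marker name to duration, a marker counts as matched when its name appears in both frames, and the total score of a first-pass frame is assumed to be the sum of its best score in every other pass.

```python
from typing import Dict, List, Tuple

Frame = Dict[str, float]  # marker name -> duration (hypothetical model)


def match_score(reference: Frame, candidate: Frame) -> float:
    """Total duration of reference markers that also appear in the candidate."""
    return sum(d for name, d in reference.items() if name in candidate)


def select_best_frames(passes: List[List[Frame]]) -> Tuple[int, List[int]]:
    """Return the chosen first-pass frame index and, per pass, the index of
    the frame that matched it best."""
    best_total, best_ref, best_choice = -1.0, 0, []
    for i, ref in enumerate(passes[0]):
        choice, total = [i], 0.0  # the reference frame represents pass 0
        for frames in passes[1:]:
            scores = [match_score(ref, f) for f in frames]
            j = max(range(len(frames)), key=scores.__getitem__)
            choice.append(j)
            total += scores[j]
        if total > best_total:
            best_total, best_ref, best_choice = total, i, choice
    return best_ref, best_choice
```

For a workload that alternates every other frame, this picks the pair of frames that share the most marker time, even when the passes start on different phases of the cycle.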
To activate “Best Frame Matching,” go to Tools -> Options -> GPU Trace -> Multi-Pass Metrics.
Notes to be considered:
Note
The multi-pass metrics mode automatically traces consecutive frames. It is recommended to freeze the game if possible, or not move the scene throughout the entire process.
View Multi-Pass Metric data:
Note
To work with Multi-Pass Metrics, the target application should use user markers.
The additional counters which are being collected with Multi-Pass Metrics are presented in the summary and metrics tabs and the markers table.
Summary Tab:
 
Metrics Tab:
 
Profiling Applications with Multiple Windows
It is possible to profile applications with multiple windows. When GPU Trace detects that there are multiple windows in the attached application, it automatically shows a drop-down menu where you can choose the window you would like to profile. This enables profiling applications from within editors:
 
Detect Interfering Processes While Profiling
GPU Trace collects a performance trace of the GPU during a period that corresponds to the target application activity (whole frames for typical graphics applications). The assumption is that during that time, the GPU performance data represents work done solely on behalf of the target application. However, as the GPU is a shared device, the trace can contain workloads done on behalf of other processes, and this can affect the trace data and subsequent performance triage.
Some steps can be taken to minimize the chance of other processes interfering with the trace:
- Run the target application on a dedicated test machine and do a remote trace 
- Close all other applications which might interfere (e.g., Outlook) 
- Run the application in full-screen mode 
However, even with those steps, sometimes other processes can execute GPU workloads unexpectedly, such as the Windows Desktop Window Manager (DWM), and there is value in detecting these workloads and indicating them to the user in the trace.
GPU Trace tries to detect if there was another process that used the GPU while profiling. If such a process was detected, this region is shown in the timeline.
The process name is shown so that you can close that process; otherwise, you cannot rely on the metrics information in this range for performance analysis.
Notes to be considered:
Note
This feature is currently limited to DirectX 12 on Windows and you must ensure that ‘Hardware-Accelerated GPU Scheduling’ is enabled for it to work properly.
Shader Profiler
The Shader Profiler is a tool for analyzing the performance of SM-limited workloads. It helps you, as a developer, identify the reasons that your shader is stalling and thus lowering performance. With the data that the shader profiler provides, you can investigate, at both a high- and low-level, how to get more performance out of your shaders.
To access this view, go to Frame Debugger > Shader Profiler.
You can alternatively perform shader profiling through targeted actions through several controls within the UI:
- From the Linked Programs View, when right-clicking a shader. 
- From the API Inspector, when navigating shader pipeline state 
- From the Event List View, when right-clicking an event or range 
Note
Shader profiler reports can be exported and saved for future reference or sharing with colleagues. To save the report, click the save icon in the upper-left-hand side of the report.
Sections
The Shader profiler has the following tabbed sections:
- Summary: this section shows a top-level summary of information about the profiling run. It is the place from which you can understand how your shaders within the profiled range performed. 
- Source: this section reports a per-line breakdown of the performance of your shaders. It includes several visualization modes, including high-level source reports as well as side-by-side high-level source to lower-level correlation. 
- Session Info: reports information on the profiling session including the events that were profiled, how many passes were required, information about the application that was sampled, and what convergence was seen of the profiling session. 
Summary Tab
The summary tab shows a top-level summary of information about the profiling run. It contains the following sections:
- Shaders: this section provides a hierarchical breakdown of the shaders used by the range that was profiled. It is the primary jumping-off point for understanding how your shaders performed. 
- Top-Down Calls: this section provides a hierarchical breakdown of the high-level functions (HLSL / GLSL) and their called sub-functions. It is context-sensitive to the selection in the Shaders section. 
- Hot Spots: this section reports the source-lines for which the Top Stalls are discovered. It is context-sensitive to the selection in the Shaders section. 
- Ray Tracing Live State: this section reports the ray tracing live state information for ray tracing applications. It is context-sensitive to the selection in the Shaders section. 
- Events: this section reports the events for which the Top Stalls are discovered, sorted by GPU time. It is context-sensitive to the selection in the Shaders section. 
- Instruction Mix: this section provides a breakdown of the instruction types, input dependencies and output stall locations in your shader(s). It is context-sensitive to the selection in the Shaders and Source Tab views. 
Shaders
This section provides a hierarchical breakdown of the shaders used by the range that was profiled. It is the primary jumping-off point for understanding how your shaders performed.
 
This section presents a table of all of the shaders within a range alongside how each performed. At the root, there is a ‘Session’ element that represents all samples for the entire range. Below it, the shaders are grouped depending on the Group By setting of the view. Grouping by Shader flattens the tree; grouping by Pipeline hierarchically lays out all samples according to the pipeline to which they belong.
The left-hand side of the information panel presents information by which you can classify each shader.
- Type: indicates the kind of hierarchy for elements below. 
- Name: the name of the shader / pipeline for which results were collected. 
- Hash: displays a unique hash of application level shader information (e.g., bytecode). 
- # Warp: indicates the maximum theoretical warp occupancy. At the shader instance level, this is the maximum number of warps that can fit concurrently in an SM, if this shader was run in isolation. At the pipeline level, it shows the range of values across all shader stages’ shader instances. Hover over this cell for a tooltip that reports which resource(s) limited the theoretical warp occupancy. When dynamic conditions (such as the # of primitives) affect the calculation, a range of values is displayed for the two extreme cases. 
- # Reg: indicates the registers per thread needed by a shader. 
- Smem: indicates the bytes of attributes needed by a 3D shader per warp, or the shared memory allocated for a compute shader per thread group. 
- CTA Dim: indicates the thread group size used by a shader (i.e., HLSL numthreads, GLSL local_size, Hull / Tessellation Control patch_size). 
- Correlation: reports the success or failure of reading debug and correlation information for the shader that was profiled. Hover over this cell for a tooltip that reports detailed information about the operation, including the files in which debug information was read. 
- File Name: the file from which this shader was generated. 
The center-right side presents the performance of the row in question, including the top stalls for each row and all stall reasons.
See Stall Reasons for a full listing and description of stall reasons.
When sampling, some samples may be reported as Unattributed. Samples in this row either failed correlation, or represent internal operation of the GPU that this sampler does not report. These samples can generally be ignored, except in the case where the results are of a high quantity, in which case we recommend you save this report to communicate this issue to the Nsight Graphics team.
- Samples: reports the total summation of samples collected for this row. 
- Top Stall #1: reports the stall reason with the highest incidence for the row in question. 
- Top Stall #2: reports the stall reason with the 2nd highest incidence for the row in question. 
- Top Stall #3: reports the stall reason with the 3rd highest incidence for the row in question. 
The rightmost side presents the execution counters of the row in question, including instruction executed counts, thread-instruction executed counts and thread divergence.
- Instructions Executed: reports number of instructions executed. 
- Thread Instructions Executed: reports the sum of instructions executed across all threads (this is up to 32 * “Instructions Executed,” as each warp has 32 threads). 
- Thread Instructions Executed Pred On: same as “Thread Instructions Executed” but counts only the predicated-on threads (the predicated-on threads execute the current instruction while the predicated-off threads skip it). 
- Active Threads Per Warp: reports thread divergence as the average of predicated-on threads executed per instruction.
  - Active Threads Per Warp = Thread Instructions Executed Pred On / (Instructions Executed * 32) 
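The formula can be evaluated directly from the two counters. The sketch below follows the formula exactly as written above, so the result is a fraction of the 32 threads in a warp; the function name is hypothetical.

```python
WARP_SIZE = 32  # threads per warp, as stated above


def active_threads_per_warp(thread_inst_pred_on: int, inst_executed: int) -> float:
    """Thread Instructions Executed Pred On / (Instructions Executed * 32)."""
    if inst_executed <= 0:
        raise ValueError("Instructions Executed must be positive")
    return thread_inst_pred_on / (inst_executed * WARP_SIZE)
```

With this formula, a value of 1.0 means every thread was predicated on for every instruction, and lower values indicate more divergence.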
 
Note
By default some columns are hidden. The visibility of columns can be toggled by right-clicking on the table’s header.
Flame Graph
This section provides a visualization of software execution. It is based on function calls: the root functions are displayed as aligned color bars at the bottom, and the called functions are recursively stacked above them. It is only available when the input shader bytecode has debug information (e.g., by adding the “/Zi” option when compiling DXIL bytecode with dxc).
 
- Every color bar represents a function. The width of the bar is proportional to the number of samples / execution counters. 
- The y-axis shows the function call depth, ordered from root at the bottom to leaf at the top. 
- The x-axis shows the size of samples / execution counters, across different shaders. 
- Interact with Source, Top-Down Calls and Bottom-Up Calls by clicking a color bar or the Go To… actions in the context menu. 
- Focus on one function and show callers and callees by clicking the Show Butterfly View action in the context menu. 
- Focus on one function and aggregate samples from every instance into a single root node, by clicking Aggregate Across all Calls in the context menu. 
Top-Down Calls
This section provides a hierarchical breakdown of the high-level (HLSL / GLSL) functions and their called sub-functions. It is only available when the input shader bytecode has debug information (e.g., by adding the “/Zi” option when compiling DXIL bytecode with dxc). Samples / execution counters are aggregated to the function level. Links can be clicked to navigate to the source line of the function definition or to where a function is called within the Source section.
 
- Function: the name of the function. Expand to see other functions that are called by it. 
- Call Location: the file and line from which this function is called. 
Bottom-Up Calls
Similar to Top-Down Calls, this section provides a hierarchical breakdown of the high-level (HLSL / GLSL) functions and their caller functions. This is only available when the input shader bytecode has debug information (e.g., by adding the “/Zi” option when compiling DXIL bytecode with dxc). Samples / execution counters are aggregated to the function level. Links can be clicked to navigate to the source line of the function definition or to where a function is called within the Source section.
 
- Function: the name of the function. Expand to see in what functions it is called, and to find out the weights of samples / execution counters from different caller functions. 
- Call Location: the file and line from which this function is called. 
Hot Spots
This section reports the source-lines for which the Top Stalls are discovered. Lines within this table can be double-clicked to navigate to the source line in question within the Source section. The display can also be toggled between High-Level and Intermediate/Lower level by changing the Type selection.
 
- Shader: the shader to which this line belongs. 
- Source Location: the file and line of this hot spot. Expand to see the call path. 
- Function: the name of the function to which this hot spot belongs. 
- Source: the source for the particular hot spot. 
Ray Tracing Live State
This section reports the ray tracing live state information for ray tracing applications. Source/IL location links within this table can be clicked to navigate to the source/IL line in question within the Source section.
Variables initialized before an (HLSL) TraceRay or (GLSL) traceRayEXT call, and used after it, are Live State that need to be maintained across the call while invoking hit and miss shaders. For improved performance, we recommend trying to minimize the amount of live state.
 
- Name: shows the callsite / calling context / live value names (identifiers). 
- Source Location: reports the source location from which this live state information comes. 
- Live State Bytes: reports the size of this live state information. 
- Live State Values: reports the number of live values reloaded / defined at the particular location. 
- Samples: reports the cost of the trace ray call in question, including live state loads, but it does not include the latency penalty on the consumer of the live state loads. 
- Source Preview: shows the source preview for the particular live state information. 
- IL Preview: shows the IL preview for the particular live state information. 
Events
This section reports the events for which the Top Stalls are discovered, sorted by GPU time. It is context-sensitive to the selection in the Shaders section. Within each event row is a link to an event that relates to the selection. This allows you to make associations between selected shaders and the resources, state, and commands that utilized the shader in question.
 
Instruction Mix
The Instruction Mix view provides a breakdown of the instruction types, input dependencies and output stall locations in your shader(s). It is context-sensitive to the selection in the Shaders and Source views.
The Instruction Mix view is useful for navigating through instruction level dependencies. Variable latency instructions (such as memory accesses, transcendental math functions, and warp level primitives) are called scoreboard producers. Instructions that use the results of scoreboard producers, or that reuse the registers from scoreboard producers, are called scoreboard consumers. Stall reasons such as “long scoreboard” appear on the scoreboard consumer instructions, but are actually caused by the corresponding scoreboard producer instructions.
The section contains three tables:
- Self Instruction Mix: this table provides a breakdown of instruction types and number of samples attributed to each instruction type within the current selection. Each sub-item contains a hyperlink that can be clicked to jump to that location in the Source View. 
- Input Dependencies: this table provides a list of scoreboard producer instruction categories that the current selection is dependent upon. Each sub-item contains a hyperlink that can be clicked to jump to the location of the scoreboard producer in the Source View. 
- Output Stall Locations: this table provides a list of scoreboard producer instruction categories within the current selection. Each sub-item contains a hyperlink that can be clicked to jump to the consumer of the given scoreboard in the Source View. 
Each table has the same columns available:
- Pipe: hardware pipe used to process the instruction. 
- Family: type of work that the instruction performs. 
- Operation: type of operation of the given work type that the instruction performs. 
- Samples: number of samples attributed to the instruction category or location. 
- Instructions: number of instructions or instruction mix corresponding to the given category or location. 
The view provides controls to adjust how data is displayed:
- Filter by Summary Selection: if checked, results reflect the current selected item(s) on the Shaders view; otherwise the selection is ignored and results for all shaders are shown. This does not impact data shown while in the Source view. 
- Auto-Expand Items: if checked, all top-level cells automatically expand to reveal sub-items with hyperlinks. 
- Show All Source Locations: if checked, all source locations are listed as sub-items under each top-level cell; otherwise only the top 3 locations by number of samples are shown. 
 
Source Tab
This section reports a per-line breakdown of the performance of your shaders. It includes several visualization modes, including high-level source reports, side-by-side high-level source to lower-level correlation, and interleaved high-level source within lower-level source through a mixed mode view.
There are a few top-level controls that control which shader is viewed and how that shader is viewed.
 
Shader: select the shader that you would like to view.
Languages: change the source display to show high-level language, a lower-level language, or alternatively, high-level language alongside a lower-level language.
Source: because shaders can be compiled from a main file and several includes, this selector allows you to select which particular source file you wish to investigate.
Interleaving Mode: controls how higher-level source is interleaved within the view (only available for lower-level source views).
 
Once a shader and view is selected, you can use one of several navigation tools to navigate within the shader source.
 
- Find: enter a text string to find. This is useful for finding variables, methods, or register names. 
 : navigate to the row with the highest value in the corresponding stall reason column.
 : navigate to the row with the next higher value in the corresponding stall reason column.
 : navigate to the row with the next lower value in the corresponding stall reason column.
 : navigate to the row with the lowest value in the corresponding stall reason column.
For many use cases, you likely want to start by navigating to the highest value, and from there navigate to progressively lower values until you have planned your next action.
In addition to using the navigation buttons above, you may use the scrollbar with embedded heatmap to identify areas within the source file that are of interest given a high sample count.
To copy lines, select the lines in question and use the system shortcut to copy to the clipboard. Multiple successive lines can be selected with Shift+click; individual lines can be selected with Ctrl+click.
 
Source Navigation
- Go Backward / Forward: navigate to the previous or next positions in the source file. 
 
- Go To Definition: navigate to the definition of a function from where the function is called. The function call is displayed as a hyperlink. 
 
- Go To Top-Down / Bottom-Up Calls: right-click the hyperlink and you are able to go to the corresponding item in Top-Down Calls or Bottom-Up Calls. 
 
Session Info
This section reports information on the profiling session, including the events that were profiled, how many passes were required, and how well the session converged.
Collection Statistics report how the collection proceeded and how it converged.
- Pass Count: the number of passes for which samples were collected. 
- Total Samples: The total number of samples collected in this session. 
- Error (min): Standard error of the mean of % samples for a given stall type. This is used to gain confidence that enough passes have been performed. This value is the minimum error seen. 
- Error (max): Like Error (min), yet this reports the maximum error seen. 
- Error (average): Like Error (min), yet this reports the average error seen. 
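As a rough illustration of how such error figures can be derived, the sketch below computes the standard error of the mean of the per-pass sample percentage for each stall type, then takes the minimum, maximum, and average across stall types. The input layout, the function names, and the data are hypothetical, and the exact computation Nsight Graphics performs may differ.

```python
import math
import statistics


def standard_error(percentages):
    """Standard error of the mean for one stall type's per-pass % samples."""
    if len(percentages) < 2:
        raise ValueError("need at least two passes to estimate the error")
    # Sample standard deviation divided by sqrt(number of passes).
    return statistics.stdev(percentages) / math.sqrt(len(percentages))


def summarize_errors(per_pass_percentages):
    """per_pass_percentages maps a stall type to its % samples in each pass."""
    errors = [standard_error(v) for v in per_pass_percentages.values()]
    return {
        "min": min(errors),
        "max": max(errors),
        "average": sum(errors) / len(errors),
    }


# Hypothetical data: % of samples per stall type over four passes.
example = {
    "Long Scoreboard": [40.0, 42.0, 38.0, 40.0],
    "Wait": [10.0, 10.0, 10.0, 10.0],
}
summary = summarize_errors(example)
```

A small maximum error suggests that additional passes would change the reported percentages very little.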
The Configuration section reports key information about the session so that you can know from which application and configuration the session was collected. This section can be important to reference if you collect many reports from different ranges and application configurations.
The Events section lists all of the API events that were sampled within this range.
Shader Profiling for SM Limited Workloads
The Shader Profiler is a tool for analyzing the performance of SM-limited workloads. It helps you, as a developer, identify the reasons that your shader is stalling and thus lowering performance. With the data that the Shader Profiler provides, you can investigate, at both a high- and low-level, how to get more performance out of your shaders. The Shader Profiler currently supports D3D12 and Vulkan APIs.
 
How do I use it?
The Shader Profiler can be launched from several locations within the tool. Because the Shader Profiler targets the performance of your shaders, we recommend that you launch the Shader Profiler when GPU Trace has identified that a particular range is shader limited. Once identified, you can open the Shader Profiler via Frame Debugger > Shader Profiler to collect and present a Shader Profiler report.
How does it work?
The Shader Profiler works by repeatedly running your shader code in a replay and using dedicated hardware samplers to determine the reasons why your code is stalling. The repeated runs allow for statistically valid sampling, which ensures that you are getting a reliable, actionable analysis. Once the sampling experiment is completed, a report is generated that allows you to find and act on the key hot spots within your shader pipeline.
The report contains several sections, including a rollup of all of the shaders that were active within the range that was sampled, a high-level, selection-sensitive Sample Summary of the samples within that range, and a Hot Spots view that identifies the key lines that contributed the most samples in the overall range. The report also presents tabs that report on session and application information, as well as the Source tab that allows for mapping, on a line-by-line basis, where samples hit.
Key Concepts
The Shader Profiler should be used to optimize latency-bound shaders. These shaders often show the following signature:
- SM Activity and SM Occupancy are high. (If not, improve these first.) 
- SM Throughput is low. 
- Cache Throughputs (L1TEX, L1.5, L2) are low or middling. 
If SM Throughput is high, the shader is likely computationally-bound, and better solved through a GPU Trace workflow.
Average Warp Latency
The average warp latency is the number of cycles that an average warp was resident on the GPU. The Samples% indicates the percent of the average warp latency occupied by a given shader, function, or PC. Sorting by Samples reveals the regions of code with the highest contribution to latency. After identifying the top latency contributors, determine next steps by inspecting stall reasons.
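The sorting step can be pictured with a minimal sketch that turns raw sample counts into percentages and ranks locations from largest to smallest contributor. It assumes that a location's share of all samples approximates its share of the average warp latency. The location names and counts are hypothetical.

```python
def rank_by_samples(sample_counts):
    """Return (location, samples, percent) tuples, largest contributor first."""
    total = sum(sample_counts.values())
    if total == 0:
        return []
    ranked = sorted(sample_counts.items(), key=lambda item: item[1], reverse=True)
    return [(loc, n, 100.0 * n / total) for loc, n in ranked]


# Hypothetical sample counts per source line.
counts = {"shade.hlsl:12": 150, "shade.hlsl:40": 600, "shade.hlsl:57": 250}
ranking = rank_by_samples(counts)
```

The first entry of the ranking is the natural place to start inspecting stall reasons.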
Interpreting Sample Locations
Stalls are reported at the PC where a warp was unable to make progress. In many cases, this is due to an execution or data dependency on a prior instruction. For example, the following code may report a large number of samples on the line of code that consumes texResult, but the real culprit is the data producer g_MeshTexture.Sample().
float4 texResult = g_MeshTexture.Sample(MeshTextureSampler, In.TextureUV);
Output.RGBColor = texResult * In.Diffuse;
Note that samples can appear in the shadow of a taken branch — that is, on the instruction following a branch, even if that instruction is not executed — because the branch is still resolving at the time of the sampling.
if (constantBuffer.ConditionWeExpectToBeFalse)
{
     texResult = ...; // samples in the shadow of a branch
     output = dot(color, texResult);
}
else
{
     output = dot(color, constant);    // expect all samples to fall here
}
Stall Reasons
Stall reasons explain why a warp was unable to issue an instruction. Each stall reason is provoked by a distinct set of conditions or instructions; by eliminating those conditions or transforming code from one set of instructions to another, you can reduce stalls.
- Barrier: Compute warps are waiting for sibling warps at a GroupSync.
  - If the thread group size is 512 threads or greater, consider splitting it into smaller groups. This can increase eligible warps without affecting occupancy, unless shared memory becomes a new occupancy limiter.
  - Review whether all GroupSyncs are really necessary.
- Dispatch Stall: A pipeline interlock prevented instruction dispatch for a selected warp.
  - If dispatch stalls are higher than 5%, please file a bug with NVIDIA.
- Drain: Exited warp is waiting to drain memory writes and pixel export.
- LG Throttle: Input FIFO to the LSU pipe for local and global memory instructions is full.
  - Avoid using thread-local memory.
    - Are dynamically indexed arrays declared in local scope?
    - Does the shader have excess register pressure causing spills?
  - Eliminate redundant global memory accesses (UAV accesses).
  - Data organization: pack UAV or SRV data to allow 64-bit or 128-bit accesses in place of multiple 32-bit accesses.
- Long Scoreboard: Waiting on data dependency for local, global, texture, or surface load.
  - Find the instruction or line of code that produces the data being waited upon; that instruction is the culprit.
  - Consider transforming a lookup table into a calculation.
  - Consider transforming global reads in which all threads read the same address into constant buffer reads.
  - If L1 hit rate is low, try to improve spatial locality (coalesced accesses).
  - If VRAM Throughput is high, try to improve spatial locality (coalesced accesses).
- Math Pipe Throttle: A math pipe input FIFO is full (FMA, ALU, FP16+Tensor).
  - This stall reason implies being computationally bound. Use GPU Trace to best determine how to move computation to a different execution unit.
- Membar: Waiting for a memory barrier to return.
  - Memory barriers are issued by GroupMemoryBarrier, DeviceMemoryBarrier, AllMemoryBarrier, and their GroupSync variants.
  - Review whether the specified scope of each barrier in the shader is really needed. Group-level barriers resolve much faster than Device-level.
  - Review whether a memory barrier is needed at all. A compute shader where each thread writes to a unique UAV location does not require a memory barrier.
- MIO Throttle: The input FIFO to MIO is full.
  - May be triggered by local, global, shared, attribute, IPA, indexed constant loads (LDC), and decoupled math.
- Misc: A stall reason not covered elsewhere.
- Not Selected: Warp was eligible but not selected, because another warp was.
  - High “not selected” could indicate an opportunity to increase register or shared memory usage (lowering occupancy) without impacting performance. This opens the doors to greater shader complexity or improved quality.
- Selected: Warp issued an instruction. Technically not a stall.
- Short Scoreboard: Waiting for short latency MIO or RTCORE data dependency.
- TEX Throttle: The TEXIN input FIFO is full.
  - Try issuing fewer texture fetches, surface loads, surface stores, or decoupled math operations.
  - Check whether the shader is using decoupled math (usually to be avoided).
  - Consider converting texture lookups or surface loads into global memory lookups (UAVs). Texture can accept 4 threads’ requests per cycle, whereas global accepts 32 threads.
- Wait: Waiting for coupled math data dependency (FMA, ALU, FP16+Tensor).
 
Hot Spots
Hot spots identify the top locations that have the most hit samples. This listing presents an actionable way of identifying and jumping to high-impact areas of the given report.
 
Ray Tracing Live State
Ray Tracing Live State lists ray tracing live state information for ray tracing applications and presents an actionable way of identifying and jumping to high-impact areas of the given report.
Variables initialized before an (HLSL) TraceRay or (GLSL) traceRayEXT call, and used after it, are live state that needs to be maintained across the call while hit and miss shaders are invoked. For improved performance, we recommend minimizing the amount of live state.
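The definition above can be modeled as a simple set intersection: anything assigned before the trace call and read after it has to be preserved across the call. The sketch below is a deliberately simplified model that ignores control flow and reassignment, and the variable names are hypothetical.

```python
def live_state(assigned_before_call, used_after_call):
    """Variables that must be preserved across the trace call."""
    return set(assigned_before_call) & set(used_after_call)


# Hypothetical shader variables around a single trace call.
before = {"albedo", "normal", "rayDir", "throughput"}
after = {"throughput", "albedo"}
live = live_state(before, after)
```

In this model, moving the uses of albedo ahead of the call, or recomputing it afterwards, would shrink the live set to a single variable.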
 
Source Correlation
The Shader Profiler can correlate the samples that are gathered to source-level lines. This allows you to determine, on a line-by-line basis, how your code is running. Two types of correlation are supported: high-level shader language correlation and GPU shader assembly microcode correlation. High-level shader language correlation prepares a listing of your shader's source code and, alongside it, a chart of the samples that landed on each particular line. High-level correlation is effective at grounding you in the code you are most familiar with, which is the shader source itself. For users who have access to the Pro builds of Nsight Graphics, and who wish to dive into the lower-level shader assembly, a disassembly view is provided for individual instruction association of samples. For non-Pro users, the disassembly view still shows instruction-range correlation. Low-level source correlation views allow interleaving of higher-level source code. This can be configured through the Interleaving Mode dropdown menu.
 
Shader Debugger
Note
Shader debugging is currently only supported for the Vulkan API.
Viewing Shader Source
The Shader Debugger opens the shader source file(s) when you select the blue link from the Shaders View.
 
For the Shader Debugger to have access to your shaders, you need to compile them using the -g flag which ensures the original source code is included in the SPIR-V file. Some applications use include files in their shaders to aid in maintainability and code reuse. The Shader Debugger supports this by allowing the user to specify which file they want to display when debugging the shaders.
 
The Source Window has a combo box at the top that contains the names of the various files in the shader. Once selected, the contents of the window will switch to that source code.
Adding Breakpoints
You can add a breakpoint from the Shaders view and the Source Window. In the Shaders view, you can right-click on the shader of interest and select Set Breakpoint. This puts a breakpoint on the first instruction of the shader, and when it is hit the source window automatically opens.
 
Alternatively, you can also put a breakpoint directly in the source window. Simply move the cursor to the source line you want a breakpoint on and hit F9, or click in the area just to the right of the line number to create/delete any breakpoints.
 
Conditional Breakpoints
Once you create an unconditional breakpoint, you can narrow the focus of the breakpoint by adding a conditional expression. The conditions can include simple scalar expressions, similar to the Watch View, including in-scope variables and intrinsics like pixel location (see the intrinsic variables table in Using the Locals and Watch Window). For example, to construct a conditional for all fragments that are in the region from 100 to 150 in x and y, you would use:
- @pixel.x >= 100 && @pixel.x < 151 && @pixel.y >= 100 && @pixel.y < 151
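To make the bounds of that expression concrete, the following sketch restates it as a Python predicate and counts how many fragments of a render target would satisfy it. The 256x256 target size is a hypothetical choice for illustration; the predicate itself mirrors the expression above.

```python
def breakpoint_condition(pixel_x, pixel_y):
    """Python equivalent of the conditional expression above."""
    return 100 <= pixel_x < 151 and 100 <= pixel_y < 151


# Count the fragments of a hypothetical 256x256 render target that would stop.
hits = sum(
    breakpoint_condition(x, y) for y in range(256) for x in range(256)
)
```

Because the upper bound is written as `< 151`, pixels 100 through 150 inclusive match in each dimension.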
Disabling or Removing Breakpoints
Breakpoints can be removed from either the Breakpoints Window or the Source Window.
- Breakpoints Window
  - Hitting <Del> deletes the breakpoint in the row that is currently highlighted.
  - Right-clicking a row and selecting Enable toggles the current enabled state of the breakpoint, indicated by the check mark.
  - Right-clicking a row and selecting Delete deletes the corresponding breakpoint.
- Source Window
  - Left-clicking a breakpoint marker or hitting F9 when the cursor is on a line with a breakpoint deletes the corresponding breakpoint.
  - Right-clicking a breakpoint marker opens a context menu that allows you to delete or disable the breakpoint.
 
Using the Focus Picker
The Focus Picker View shows information about the GPU’s current state and the current workloads that are active when the GPU is stopped, for example, when a breakpoint was hit. This is similar to the “Threads” view of a CPU debugger. It provides a summary page that summarizes the current GPU state and how many GPU workloads for each of the shader types are currently in flight.
The possible values for GPU Status are:
- Running: The GPU is running, actively executing graphics workloads.
- Paused: The GPU is paused and the per-shader type summary pages are active and show the current in-flight GPU workloads.
- Stalled: The GPU is paused; the last focus warp might have exited during stepping, or it could be waiting on a barrier because other warps in the same group are frozen (see Stepping Modes).
  - Hit Pause to move back to Paused mode and to select a new focus thread to inspect or step another warp.
  - Or, hit Resume to resume the GPU to hit another breakpoint.
 
 
Each of the per-shader type focus picker pages shows the list of in-flight workloads for that particular pipeline stage.
Most importantly, the Focus Picker View allows you to change the currently focused workload — i.e., the ray, vertex, patch, primitive, fragment, or compute thread for which data is shown in other views — like Registers, Locals, Watch, etc. Furthermore, the current focus choice determines which workload shader source is stepped. The current focus workload is indicated with a yellow arrow marking the corresponding row in one of the per-shader tables. It can be changed by double-clicking any other table row. Clicking the Show Focus button on any of the focus picker pages opens the page and scrolls to the row that has the current focus.
 
In addition to the table views available for all shader stages, the Focus Picker View provides two graphical views: Vertices (3D) for the vertex shader stage and Fragments (2D) for the fragment shader stage.
The Vertices (3D) page shows point cloud representations of the (pre-transform) vertex data for the draw calls currently active in the vertex shader stage.
 
The currently active vertex shader workload is highlighted through colored points:
- A red point with a blue halo corresponds to a vertex shader thread that has the current debug focus. 
- Yellow points correspond to vertex shader threads stopped at a breakpoint. 
- Blue points correspond to any other active vertex shader thread in flight. 
The 3D view can be manipulated using mouse and keyboard:
- Holding the left mouse button while moving the mouse pointer rotates the object. 
- Holding the right mouse button while moving the mouse pointer moves/translates the object. 
- Holding the middle mouse button while moving the mouse pointer or using the mouse wheel zooms the view. 
Holding the Ctrl key down during any of the above operations enables a slow interaction mode for enhanced control. Similar to the Vertices table view, the Vertices (3D) view also allows changing the focus thread by clicking on any of the highlighted points. Since the view supports only showing one 3D object at a time, there are combo boxes that allow you to select which draw call’s data to show (if multiple draw calls are in flight), and which of the vertex attributes to use for the 3D positions of the points.
The Fragments (2D) page shows an image of the current state of the destination render targets for the fragment shader workloads currently in flight. Note: this is not the final render target image as in the Frame Debugger, but actual frame buffer content.
 
The currently active fragments are shown as a color overlay:
- A red pixel corresponds to the fragment shader thread that has the current debug focus. 
- Yellow pixels correspond to fragment shader threads stopped at a breakpoint. 
- Blue pixels correspond to any other active fragment shader thread in flight. 
It is possible to zoom in and out using the mouse wheel or to change the opacity of the overlay to better see the render target image in the background. Similar to the Fragments table view, the Fragments (2D) view also allows changing the focus thread by clicking on any of the highlighted pixels. You can select which render target to view using the combo boxes below the image view.
Using the Warp Info View
The Warp Info view provides the ability to see an “SM-centric” view of the warps and threads in flight. Each row represents a single warp running on an SM, and the colored boxes in the middle are the threads within the warp. A red box indicates a thread has hit the breakpoint. A dark red box indicates a thread that has hit an exception. Green boxes indicate other active threads. Light green boxes indicate threads waiting at a barrier. Light gray boxes indicate threads that are inactive due to control-flow divergence. And last, dark gray boxes indicate threads that are unused or have terminated. The other columns show data about the status of the warp, etc. Hovering the mouse pointer over a row shows a tooltip window with additional detail about the warp. Similarly, when hovering the mouse over a thread box a tooltip window shows details about this specific thread.
The Warp Info View also allows changing the current debug focus. Double-clicking a row changes the debug focus to that warp. Double-clicking a thread box changes the debug focus to that particular thread.
 
Stepping, Breaking, and Resuming Execution
In addition to breakpoints, the Shader Debugger allows stepping through the shader code and other means of controlling shader execution, such as resuming all active GPU warps or breaking into the execution. The supported commands are:
- Pause 
- Resume (F5) 
- Step In (F11) 
- Step Over (F10) 
- Step Out (Shift-F11) 
Due to hardware organization, stepping is always on a warp level. That means instead of individual GPU threads or workloads, a group of threads (up to 32) is always stepped.
The Shader Debugger UI does not support explicit per-thread or per-warp control over which warps are frozen during a step operation. Instead the debugger supports a number of stepping modes that control how non-focused warps behave during stepping. The supported modes are:
| Mode | Resume | Step In, Step Out, Step Over | 
|---|---|---|
| Resume All | Nothing is frozen | Nothing is frozen | 
| Resume Group (Same as Resume Warp, if not a compute shader) | All warps outside of the current (compute) group are frozen | All warps outside of the current (compute) group are frozen | 
| Resume Warp | All warps except the current are frozen | All warps except the current are frozen | 
| Step Group (Same as Step Warp, if not a compute shader) | Nothing is frozen | All warps outside of the current (compute) group are frozen | 
| Step Warp (default) | Nothing is frozen | All warps except the current are frozen | 
The currently active mode can be selected in the Scheduling drop-down in the shader debugging toolbar.
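The table above can be read as a lookup from scheduling mode and command to the set of warps that stay frozen. The sketch below transcribes it into a small helper. Identifying a warp by a (group_id, warp_id) pair and the command strings are hypothetical conventions for this example, not part of the tool.

```python
# Which warps stay frozen, per scheduling mode, for Resume versus the step
# commands (Step In, Step Out, Step Over), transcribed from the table above.
FROZEN = {
    "Resume All":   {"resume": "none",          "step": "none"},
    "Resume Group": {"resume": "outside_group", "step": "outside_group"},
    "Resume Warp":  {"resume": "outside_warp",  "step": "outside_warp"},
    "Step Group":   {"resume": "none",          "step": "outside_group"},
    "Step Warp":    {"resume": "none",          "step": "outside_warp"},
}


def is_frozen(mode, command, warp, focus_warp, is_compute=True):
    """warp and focus_warp are hypothetical (group_id, warp_id) pairs."""
    rule = FROZEN[mode]["resume" if command == "resume" else "step"]
    # Group modes behave like the warp modes outside of compute shaders.
    if rule == "outside_group" and not is_compute:
        rule = "outside_warp"
    if rule == "none":
        return False
    if rule == "outside_group":
        return warp[0] != focus_warp[0]
    return warp != focus_warp
```

For example, with the default Step Warp mode, `is_frozen("Step Warp", "step_over", (0, 2), (0, 1))` reports that a sibling warp is frozen during the step, while the same query with `"resume"` reports that it is not.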
Using the Locals and Watch Window
The Locals window lists the local variables defined in the current scope and their current values.
 
The Watch window allows evaluation of shader variables defined in the current scope and its parent scopes. That means, in addition to local variables, it is also possible to inspect global variables, like uniforms and varyings. Furthermore, it is possible to use expressions that use scalar operators, like arithmetic operators, type casts, or struct component and array element access operators. In addition to local and global variables defined in the debugged shader, a number of intrinsic variables can be used in the Watch window or in breakpoint conditions. The intrinsic variables supported differ by shader type:
| Shader Type | Supported Intrinsic Variables | 
|---|---|
| Vertex | @vertexId, @instanceId | 
| Tessellation Control | @controlPointId, @primitiveId | 
| Tessellation Evaluation | @domainLocation, @primitiveId | 
| Geometry | @instanceId, @primitiveId | 
| Fragment | @pixel, @sampleId, @primitiveId, @rtArrayId | 
| Compute | @groupThreadId, @groupId | 
 
Exiting Debug Mode
To exit shader debugging, remove any breakpoints you have set and resume the GPU. Then you can terminate your application as you normally would, or use the Terminate button on the toolbar.
Generate C++ Capture UI
Compiling and launching C++ captures
 
The additional features of an ngfx-cppcap file include:
- Screenshot of the capture taken from the original application 
- Information about the captured application and its original system 
- Statistics about the captured API stream 
- Utilities to build the C++ capture or open the associated CMake project 
- Utilities to launch the compiled application:
  - The Execute button launches the compiled executable.
  - The Connect… button populates a new connection dialog that allows you to run a specific activity on the generated capture.
 
- User comments that are persisted within this file. 
GPU Crash Dumps
GPU Crash Dump Monitor
GPU Crash Dump Monitor Settings
To configure the NVIDIA Nsight Aftermath Monitor settings, left-click the NVIDIA Nsight Aftermath Monitor icon in the Microsoft Windows system notification area (system tray) or right-click the icon and select the Settings option from the pop-up menu.
General Settings
The General Settings page allows you to configure the directory where GPU crash dumps are stored, the directory where NVIDIA shader debug information files are stored, and whether the NVIDIA Nsight Aftermath Monitor should prompt to open new crash dumps in Nsight Graphics.
 
Aftermath Settings
The Aftermath Settings page allows you to configure various options that control Nsight Aftermath graphics driver features and allows you to select for which applications GPU crash dumps are captured.
Note
On Windows, modifying Aftermath graphics driver settings requires Windows Administrator privileges. Therefore, when any of these settings are modified and applied, a User Account Control confirmation window may pop up asking for permission to modify system settings.
 
Supported Aftermath Modes are the following:
- Disabled disables all GPU crash dump creation. 
- Global enables crash dump creation for all applications using the D3D11/D3D12 or Vulkan APIs. 
- Whitelist allows you to limit the GPU crash dump creation to a specific set of applications on the whitelist. 
Generate Shader Debug Information enables the generation of shader debug information (line tables for mapping from the shader IL passed to the NVIDIA graphics driver to the shader microcode executed by the GPU) for all shaders loaded by the applications for which Aftermath crash dump creation is enabled.
The GPU Crash Dump Monitor stores the debug information into files with the .nvdbg extension in the Debug Info Dump Directory configured in the General Settings Tab.
The shader debug information is required for mapping shader microcode instructions of active or faulted shader warps to shader IL or shader source lines. Shader debug information is identified by a unique shader debug information identifier embedded into the crash dump file.
See also the section about Source Shader Debug Information for details on how to compile shader source with source-level debug information.
Note
Enabling this setting causes additional compilation overhead for generating the debug information and general driver overhead for handling the debug information during shader compilation.
Enable Resource Tracking enables driver side tracking of live and recently destroyed resources (textures, buffers, etc.) that are used to augment the GPU fault information in crash dumps.
This allows Aftermath to identify resources related to GPU virtual addresses seen in the case of a crash due to a GPU page fault. The resource information being tracked includes details about the size of the resource, its format, and the current deletion status of the resource object. D3D12 developers may also consider instrumenting their application using the GFSDK_Aftermath_DX12_RegisterResource function to register the D3D12 resources the application creates. That allows Aftermath to track additional information, such as the resources’ debug names set by the application. For Vulkan applications, the resources’ debug names set via vkSetDebugUtilsObjectNameEXT are also captured. For more detail on how to instrument an application with the Nsight Aftermath SDK, see the Nsight Aftermath SDK documentation.
Note
Enabling this feature causes additional driver overhead for tracking resource information.
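Conceptually, relating a faulting GPU virtual address to a tracked resource is an address-range lookup. The sketch below models that idea only; it is not the driver's implementation or an Nsight Aftermath SDK API, and the record fields, names, and addresses are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TrackedResource:
    name: str        # debug name, if the application set one
    base: int        # GPU virtual address of the first byte
    size: int        # size in bytes
    destroyed: bool  # True if the resource was recently destroyed


def find_resource(resources, fault_address):
    """Return the tracked resource whose address range contains the fault."""
    for res in resources:
        if res.base <= fault_address < res.base + res.size:
            return res
    return None


# Hypothetical tracked resources and a hypothetical faulting address.
tracked = [
    TrackedResource("SceneColor", 0x10000, 0x4000, destroyed=False),
    TrackedResource("ShadowMap", 0x20000, 0x1000, destroyed=True),
]
hit = find_resource(tracked, 0x20010)
```

In this model, a match on a resource flagged as destroyed would point toward a use-after-free style access, which is why the deletion status and a readable debug name are useful in a crash dump.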
Enable Call Stack Capturing enables the automatic generation of Aftermath event markers for tracking the origin of all draw calls, compute and ray tracing dispatches, ray tracing acceleration structure build operations, or resource copies initiated by the application. This data can augment the data collected via Aftermath user markers.
The automatic event markers are added into the command stream right after the corresponding commands with the CPU call stacks of the functions recording the commands as the marker data payloads.
Note
Enabling this feature causes considerable driver overhead for gathering the necessary information.
Note
When this feature is enabled, the GPU crash dump file may contain the file path for the crashing application’s executable as well as the file paths for all DLLs or DSOs it has loaded.
Enable Additional Shader Error Reporting puts the GPU in a special mode that allows the GPU to report additional runtime shader errors. This may provide additional information when debugging GPU hangs, GPU crashes, or unexpected behavior related to shader execution.
Enabling this feature may result in additional crash dumps reporting issues in shaders that exhibit undefined behavior or have hidden bugs, which so far went unnoticed because by default the hardware silently ignores them. The additional error checks that are enabled when using this option cause GPU exceptions for the following situations:
- Accessing memory using misaligned addresses, such as reading or writing a byte address that is not a multiple of the access size. 
- Accessing memory out-of-bounds, such as reading or writing beyond the declared bounds of (group) shared or thread local memory or reading from an out-of-bounds constant buffer address. 
- Accessing a texture with incompatible format or memory layout. 
- Hitting call stack limits. 
Note
This feature is only supported with NVIDIA graphics driver R515 or later.
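The misaligned and out-of-bounds conditions listed above can be expressed as two small predicates. This is a sketch of the conditions as described in the text, with hypothetical function names; it does not reflect how the hardware implements the checks.

```python
def is_misaligned(byte_address, access_size):
    """True if the address is not a multiple of the access size."""
    return byte_address % access_size != 0


def is_out_of_bounds(byte_offset, access_size, declared_size):
    """True if any byte of the access falls outside the declared range."""
    return byte_offset < 0 or byte_offset + access_size > declared_size
```

For instance, a 4-byte read at byte address 6 is misaligned, and a 4-byte read at offset 62 of a 64-byte shared memory array is out of bounds.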
System Settings
The System Settings page contains the systemwide Enable SM Register Data Collection setting. On Linux, the collection of SM registers is always enabled, so the System Settings page is not present in the monitor. On Windows, the collection of SM registers is always enabled on R550 and later drivers, so the controls for SM register collection are disabled and grayed out.
Note
On Windows, modifying Aftermath system settings requires Windows Administrator privileges. Therefore, when any of these settings are modified and applied, a User Account Control confirmation window may pop up asking for permission to modify system settings.
 
Enable SM Register Data Collection enables the collection of SM register values when faults happen inside SMs. This can provide additional information when debugging GPU crashes related to shader execution.
Since this is a system setting, modifying it might also affect other tools, such as the Nsight VSE CUDA debugger, and may result in unexpected behavior. On Linux, this feature is always enabled, with no incompatibility with other tools.
Note
This feature is only supported for the D3D12 and Vulkan APIs with NVIDIA graphics driver R535 or later and requires Nsight Graphics Pro to visualize the data. Register inspection of internal shaders is not supported.
Command Line Settings
All Aftermath GPU crash dump monitor settings can also be configured through command line parameters.
The available command line flags are:
- --help Print a help message with a list of the available options.
- --version Print the release version of the executable.
- --crashdump-dir arg Set the crash dump directory.
- --debuginfo-dir arg Set the debug information dump directory.
- --prompt-on-crash Prompt to open Nsight Graphics after a crash is generated.
- --hostname arg The host name of the machine on which to look for already running Nsight Graphics instances.
All Aftermath settings and System settings can also be configured through a separate command line tool installed next to the crash dump monitor executable. On Windows, that command line tool is nv-aftermath-control.exe. On Linux, the tool is called nv-aftermath-control.bin.
The command line flags supported by the configuration tool are:
- --mode arg Set Nsight Aftermath mode. Supported options for arg are: Disabled, Whitelist, or Global.
- --whitelist arg Add application to the Nsight Aftermath whitelist. arg must be of the following form:
  - ApplicationName MyApp ExecutableName myApp.exe
  - This option can be repeated to add multiple applications to the whitelist. This option also clears a previously set up whitelist.
- --debuginfo Generate NVIDIA shader debug information.
- --resource-tracking Enable resource tracking.
- --callstacks Enable automatic marker generation with call stack capturing.
- --shader-error-reporting Enable additional shader error reporting.
- --register-data-collection Enable SM register data collection. Windows only.
Note
On Windows, modifying Aftermath graphics driver settings requires Windows Administrator privileges. Therefore, when nv-aftermath-control.exe is executed, a User Account Control confirmation window may pop up asking for permission to modify system settings.
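When scripting the configuration tool, it can help to assemble the argument list in one place. The helper below is a hypothetical sketch that only builds the list and does not run anything. It assumes the flags listed above are spelled with a leading double hyphen and that the mode value is passed as a separate argument, so verify both against the tool's help output before relying on it.

```python
VALID_MODES = ("Disabled", "Whitelist", "Global")


def build_control_args(mode, debuginfo=False, resource_tracking=False,
                       callstacks=False, shader_error_reporting=False):
    """Assemble an argument list for the configuration tool (not executed)."""
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported mode: {mode}")
    # Assumption: Windows executable name; on Linux the tool ends in .bin.
    args = ["nv-aftermath-control.exe", "--mode", mode]
    if debuginfo:
        args.append("--debuginfo")
    if resource_tracking:
        args.append("--resource-tracking")
    if callstacks:
        args.append("--callstacks")
    if shader_error_reporting:
        args.append("--shader-error-reporting")
    return args


# Example: global crash dump creation with debug info and call stacks.
example_args = build_control_args("Global", debuginfo=True, callstacks=True)
```

The resulting list could be handed to a process launcher such as `subprocess.run`, which on Windows may trigger the User Account Control prompt described in the note above.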
New Crash Dump Notification Dialog
If the NVIDIA Nsight Aftermath Monitor is configured to prompt on new crash dumps, every time a new GPU crash dump file is stored to the crash dump directory, a notification dialog pops up indicating that a new GPU crash dump is available. This dialog shows the name generated for the new crash dump and also allows you to directly open it in a newly launched instance of Nsight Graphics or in an already running instance of Nsight Graphics.
 
 
GPU Crash Dump Inspector
The GPU Crash Dump Inspector window is comprised of two major views:
- In the left part of the window, there is a set of tabs that provide summary information for the open GPU crash dump file, as well as information about the captured crash. 
- In the right part of the window, there is a multi-purpose area that shows detailed information based on selections made in some of the sections of the left-side tabs. 
 
Dump Info
The Dump Info tab provides summary information for the open GPU crash dump file and the data contained in the dump. It is comprised of the following sections:
- The Dump Details section summarizes information about the GPU crash dump file, such as the file name, the date and time the dump was created, and the size of the file. 
- The Application section summarizes information about the application for which the GPU crash dump file was captured, like the name of the executable, the process identifier of the corresponding process, and which graphics API was used. 
- The Exception Summary section summarizes information about the reason for the GPU crash or GPU hang captured in the GPU crash dump file. The first section contains an analysis of any page faults or shader faults detected in the dump. This provides potential causes for these issues and includes links to any available resource information, shader locations, and relevant markers. Analysis for other crash reasons will be added in future updates. The second section shows what state the graphics adapter and D3D or Vulkan device were in when the device recovery was triggered (TDR). 
- The System Info section summarizes the information about the system on which the GPU crash dump file was captured. This includes information about the operating system, the graphics driver, and the GPU on which the crash happened.   
Crash Info
The Crash Info tab provides detailed information for data captured in the open GPU crash dump file. The available sections vary based on the type of the crash and what information was captured into the crash dump.
- The Active Warps section, if available, shows all active shader executions at the time of the crash or hang. Each row shows the summary for all the warps executing at a specific shader address, including the number of warps, the type of the shader, the shader hash, and the corresponding location within the shader (if shader debug information is available). Clicking a row in the table opens the corresponding Shader View.   
- The Faulted Warps section, if available, shows all shader executions that have hit errors. Each row shows a summary of the fault hit on a specific shader address, including the error type, the type of the shader, the shader hash, and the corresponding location within the source shader (if shader debug information is available). Clicking a row in the table opens the corresponding Shader View.   
- The Active/Faulted Warps section, if available, shows expandable rows each representing the group of all the warps that were executing at a specific shader address. Rows are marked as faulted if any of the warps hit a fault. The Active Warps column shows the number of warps executing at the shader address in the GPU PC Address column. The rows can be expanded to show details about the individual warps. Each entry shows the fault status of the warp, a unique identifier for the warp, the type of the shader, the shader hash, and the source or IL location within the shader (if shader debug information is available). Clicking a row in the table or the address link in the Faulted column opens the corresponding Shader View. The warp information is populated according to the selection and, if the warp faulted, additional information such as the name of the fault, its description, and the shader address at which the warp hit that fault is also shown. Clicking on a GPU PC Address or Shader Location link opens the Shader View.   
- The Page Fault section, if available, shows information about the GPU page fault that caused the crash. Besides the address of the page that could not be accessed, it shows the type of the fault, the type of the access, and the GPU unit from which the page was accessed.   
- The Page Fault Resource History section, if available, shows information about the resource that is mapped or was mapped at the GPU page fault address if the Aftermath resource tracking feature was enabled. Note: The full resource history feature is only available on R550 and later drivers. It also requires the application to integrate Aftermath SDK 2024.1 or later to properly track debug names and resource pointers. Otherwise, you may only have a debug name for the first resource and no resource pointer available. 
- The Resource Detail view shows detailed information about a resource selected by clicking a resource entry in the Page Fault Resource History section above. The view shows the resource’s debug name, the resource’s base GPU virtual address, the resource’s size, etc.   
- The Fault Info section, if available, shows extra information about the error that caused the crash. This section is shown instead of the Page Fault section for certain other error types.   
- The GPU State section shows a high-level summary of the state of various parts of the GPU. This can be helpful to track down which parts of the graphics pipeline were active or have faulted in the case of a crash.   
- The Aftermath Markers section, if available, shows a summary of the Aftermath event markers last processed by the GPU for each of the registered Aftermath contexts. For user event markers, clicking the Payload link in the table opens the corresponding Aftermath Markers View that allows you to inspect the user-provided marker payload. For automatic event markers, clicking the CallStack link opens an Aftermath Call Stack View showing the call stack of the function that recorded the corresponding graphics command into the D3D command list or Vulkan command buffer. See also the event marker documentation in GFSDK_Aftermath.h and the Nsight Aftermath SDK documentation for more detail.
Shader View
The Shader Source view shows the shader code related to the selection made in the Active Warps, Faulted Warps or the Active/Faulted Warps view. This requires that the appropriate information is made available by Configuring the GPU Crash Dump Inspector.
Depending on what information is available for the shader, the Language selection box provides the following options:
- If Source is selected, the view shows the high-level shader source of the shader corresponding to the row selected in the Active Warps, Faulted Warps or the Active/Faulted Warps view. If the shader was compiled from several source files, the File selection box allows you to switch between the source files. If a row in the Active Warps or the Active/Faulted Warps view is selected, the shader source line that was being executed by the selected warp when the crash dump data was captured is marked with a yellow arrow. If that warp has faulted, a red circle marks the location of the fault. If a row in the Faulted Warps view is selected, the shader source line that corresponds to the faulted instruction is marked with a red circle. The yellow arrow and red circle buttons jump directly to the corresponding marked instructions.   
- If IL is selected, the view shows the intermediate assembly of the shader (DXIL or SPIR-V) corresponding to the row selected in the Active Warps, Faulted Warps or Active/Faulted Warps view. If a row in the Active Warps or the Active/Faulted Warps view is selected, the intermediate language statement that was being executed by the selected warp when the crash dump data was captured is marked with a yellow arrow. If that warp has faulted, a red circle marks the location of the fault. If a row in the Faulted Warps view is selected, the intermediate language statement that corresponds to the faulted instruction is marked with a red circle. The yellow arrow and red circle buttons jump directly to the corresponding marked instructions.   
Aftermath Marker Data View
The Aftermath Marker Data view allows inspection of the Aftermath event marker data provided by the application. Since Aftermath event marker data is typeless, the marker data view supports different Data view modes for interpreting the raw data:
- As string interprets the event marker data as a zero-terminated UTF-8 character string.   
- As wide string interprets the event marker data as a zero-terminated wide character string.   
- Custom allows you to inspect the raw event marker byte data or to provide a custom interpretation of the data using a Structured Memory Configuration.   
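To make the two string modes concrete, the following is a minimal Python sketch of how a raw, typeless payload could be interpreted in each way. It is not the tool's implementation: the function names are hypothetical, and the wide character encoding is assumed to be UTF-16LE (as is typical on Windows), which the text above does not state.

```python
def decode_marker_as_string(payload: bytes) -> str:
    """Interpret the payload as a zero-terminated UTF-8 string."""
    end = payload.find(b"\x00")
    raw = payload if end < 0 else payload[:end]
    # Replace undecodable bytes instead of failing on arbitrary data.
    return raw.decode("utf-8", errors="replace")


def decode_marker_as_wide_string(payload: bytes) -> str:
    """Interpret the payload as a zero-terminated wide string.

    Assumption: wide characters are 2-byte UTF-16LE code units.
    """
    end = len(payload) - (len(payload) % 2)  # ignore a trailing odd byte
    # Walk whole code units so a zero byte inside a character is not
    # mistaken for the terminator.
    for i in range(0, end, 2):
        if payload[i:i + 2] == b"\x00\x00":
            end = i
            break
    return payload[:end].decode("utf-16-le", errors="replace")


if __name__ == "__main__":
    narrow = "Draw shadow pass".encode("utf-8") + b"\x00leftover"
    wide = "Draw shadow pass".encode("utf-16-le") + b"\x00\x00"
    print(decode_marker_as_string(narrow))
    print(decode_marker_as_wide_string(wide))
```

Data that is neither kind of string is where the Custom mode applies, since it exposes the raw bytes.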
Aftermath Marker Call Stack View
The Aftermath Marker Call Stack view shows the call stack for the last draw, dispatch, or copy call processed by the GPU. Resolving the call stack to source locations requires a properly set up symbol search path in the Search Paths Settings. Alternatively, clicking the Unknown Symbol link allows you to provide a symbol file for a specific call stack element.
 
Project Explorer
The Project Explorer offers a view of all data associated with the current project. It contains data files, sorted by the time of generation. Note that you may also include arbitrary links to other files as a useful aid in correlating data.
Items in the Project Explorer can be adjusted by right-clicking and selecting an option to interact with it. They may also be renamed by selecting the item and pressing F2.
In addition to navigating via the Project Explorer, you may wish to see the files that were recently generated. Load these through File > Recent Files, or File > Open File.
 
Options
The Options dialog, accessed via the Tools > Options… menu, allows you to configure Nsight Graphics in a number of different ways. Each section is detailed below. The options selected are persisted in user settings for the next time you run the tool.
Environment
 
On the Environment tab, select whether to use the light or dark theme, font selection, the default document folder for Nsight Graphics to use, and your preferred startup behavior.
GPU Trace
 
On the GPU Trace tab, you can change the time units and the time precision that are displayed in a GPU Trace report. You can change the grid density and the GPU bound threshold (which affects the GPU Bound calculation in the summary tab).
Shader Profiler
 
On the Shader Profiler tab, you can change the shader bytecode loading behavior, inactive shader displaying mode, and counter display mode.
- PDB: when set to Yes, shader debug information is loaded from the PDB file, even if the PDB file doesn’t match the bytecode.
- Inactive Shaders controls whether the shader source, live registers, or scoreboard dependencies are loaded for inactive shaders (shaders that have 0 PC samples). Set this to No for better performance when opening Shader Profiler reports.
- Counter Attribute Mode controls how counters are displayed.
  - Value Mode: Relative mode displays counters as percentages. Absolute mode displays counters as absolute values.
  - Precision Mode: Abbreviated mode displays counters as human-readable numbers. Full mode displays the raw counter values. This option is only valid if Value Mode is set to Absolute.
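As an illustration of how the two counter display options interact, here is a small Python sketch. The function name, the suffixes, and the number of decimal places are assumptions made for this example. The exact formatting used by the Shader Profiler is not specified above.

```python
def format_counter(value, total, value_mode="Relative", precision_mode="Abbreviated"):
    """Format a counter roughly the way the two display options describe.

    value_mode:     "Relative" (percentage of total) or "Absolute".
    precision_mode: "Abbreviated" or "Full"; only consulted in Absolute mode.
    """
    if value_mode == "Relative":
        share = 0.0 if total == 0 else 100.0 * value / total
        return f"{share:.1f}%"
    if precision_mode == "Full":
        return str(value)  # raw counter value
    # Abbreviated: scale to a human-readable magnitude (hypothetical suffixes).
    for threshold, suffix in ((1e9, "G"), (1e6, "M"), (1e3, "K")):
        if value >= threshold:
            return f"{value / threshold:.2f} {suffix}"
    return str(value)


if __name__ == "__main__":
    print(format_counter(250, 1000))
    print(format_counter(1234567, 0, "Absolute", "Abbreviated"))
    print(format_counter(1234567, 0, "Absolute", "Full"))
```

Note that the precision argument is ignored in Relative mode, mirroring the statement that Precision Mode is only valid when Value Mode is Absolute.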
 
Search Paths
On the Search Paths tab, you can configure search path settings for shader and application debug files used by the NVIDIA Nsight Aftermath GPU Crash Dump Inspector, NVIDIA Nsight Graphics Frame Debugger, and other tools.
 
- Shader Source specifies a list of directories where shader source files can be found. This option is used to associate the high-level shader (HLSL, GLSL, etc.) source files that are used in your application to the file names that are embedded in shader objects by your shader compiler. In many cases, source files are already embedded into the shader binaries, but in some cases, especially if the shader compilation tool-chain that is used has its own preprocessing steps, only source file information may be available. 
- Shader Binaries specifies a list of directories where pre-compiled binary shader objects (DirectX shader binary files generated by the HLSL compiler, SPIR-V shader binaries, etc.) can be found. For example, when opening an Nsight Aftermath GPU crash dump, those paths are searched for the shader binary matching the shader hash to access the DXIL/SPIR-V instructions or retrieve source mapping information. 
- Separate Shader Debug Information specifies a list of directories where shader debug information files separate from the shader binary can be found. These are the shader debug information files that may have been produced by your compiler toolchain when compiling the shader binaries (.lld or .pdb files generated by dxc.exe, for instance).
- NVIDIA Shader Debug Information specifies a list of directories where NVIDIA shader debug information files can be found. These are the shader debug information files generated by the Nsight Aftermath GPU Crash Dump Monitor (.nvdbg files) or the files created based on the data provided by the shader debug information callback for applications that are instrumented with the GPU crash dump collection feature of the Nsight Aftermath SDK. Shader debug information is identified by a unique shader debug information identifier embedded into the crash dump file. The shader debug information identifier is used to search for the shader debug information file in the configured search paths when it is required for mapping shader microcode instructions of active or faulted shader warps to shader IL instructions or shader source lines.
- Application Debug Information specifies a list of directories where debug information (e.g., PDB files or shared object files with debug information) for the application that is analyzed and the dynamic libraries it has loaded can be found. This is necessary to resolve application call stacks in several views. 
- For all of the above paths, it is possible to recursively search the configured directories by enabling the Search sub-directories option that is associated with each. 
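The lookup behavior described above, an ordered list of directories with an optional per-directory sub-directory search, can be sketched in Python as follows. This is only an illustration of the concept: the function name is hypothetical, and naming a debug information file after its identifier (for example 1234abcd.nvdbg) is an assumption for the example, not a documented convention.

```python
from pathlib import Path
from typing import Iterable, Optional, Tuple


def find_in_search_paths(
    file_name: str, search_paths: Iterable[Tuple[str, bool]]
) -> Optional[Path]:
    """Return the first file called file_name found in the search paths.

    search_paths is an ordered list of (directory, search_subdirectories)
    pairs, mirroring a path entry plus its "Search sub-directories" option.
    """
    for directory, search_subdirs in search_paths:
        root = Path(directory)
        if not root.is_dir():
            continue  # skip paths that do not exist on this machine
        if search_subdirs:
            matches = sorted(
                p for p in root.rglob("*") if p.is_file() and p.name == file_name
            )
        else:
            direct = root / file_name
            matches = [direct] if direct.is_file() else []
        if matches:
            return matches[0]
    return None


if __name__ == "__main__":
    # Hypothetical directories and file name, for illustration only.
    paths = [("./shaders/debug", False), ("./build", True)]
    print(find_in_search_paths("1234abcd.nvdbg", paths))
```

Recursive searching is convenient but can be slow on large trees, which is one reason to enable it per path rather than globally.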
Injection
 
On the Injection tab, select whether to enable or disable debugging of the Steam overlay.
Frame Debugger (Host)
 
On the Frame Debugger tab, you can configure the time unit and precision settings for the host display, settings for C++ Capture, and set the timeout for a Pixel History.
Feedback
 
On the Feedback tab, choose whether or not you wish to allow Nsight Graphics to collect usage and platform data.
Common View Capabilities
Nsight Graphics supports docking multiple windows within the main window. Any window may be moved, adjusted, tabbed, or pulled out from the docking system that it provides. Most default layouts have multiple documents already specified, but if you wish to adjust these documents you can do so at any time.
Beyond positioning, when frame debugging or profiling, there are buttons that are common across several frame debugger views.
 
- The Clone button makes a copy of the current view, so that you can compare different parts of the API Inspector (or other cloned views) for the current action. 
- The Lock button freezes the current view so that changing the current event does not update this view. This is helpful when trying to compare the same state on two different actions. 
Window Layouts
Nsight Graphics has an intuitive window management and docking system. Windows may be dragged and docked in your preferred locations.
Floating Windows
Windows may be dragged out of the main Nsight Graphics window so that they can be independently managed. Other windows can then dock onto that floating window. On operating systems that support it, such as Microsoft Windows, windows may also be managed from the taskbar.
Pinned Windows
 
Select the pin icon to collapse the associated window to the side of the main window.
 
Once pinned, the item can be clicked to open the window for quick reference; clicking elsewhere collapses the pinned window again.
User Named Layouts
Nsight Graphics allows you to customize the size and position of the views to create a layout that is targeted to the task at hand. For example, if you are focused on debugging a problem with API usage, you can put the Events View and API Inspector next to each other as you work your way through the frame, inspecting the API state at different points in the frame. The view locations are automatically saved when you exit the frame replay and restored when you capture a new frame.
However, different problem types may require unique layouts. To facilitate a smooth transition from one layout to another, you can save and restore activity-centric view arrangements via user named layouts.
You can access this save/load capability from the Window pull down in the main menu. The section pertaining to layouts includes entries to “Save Window Layout…”, “Restore Window Layout”, “Manage Window Layouts…”, and restore the “Default Window Layout.”
 
“Save Window Layout…” brings up a dialog that allows you to specify a name for the current layout. The layouts are saved to a Layouts folder in the documents directory as named .nvlayout files and can be shared with colleagues.
 
Once you have saved a layout (or two), you can restore them by using the “Restore Window Layout” menu entry. When you hover over it, a sub-menu pops out with all of your saved layouts. Simply select the entry you want and the views are restored to their original locations.
 
There may come a time when you want to clean up some unused layouts. When you select the “Manage Window Layouts” entry, a dialog comes up that allows you to delete or rename old layouts, etc.
 
Finally, the “Reset Window Layout” entry in this section allows you to restore the layout to the default one for the current activity.
Window Chooser
Nsight Graphics has a window chooser for fast enumeration and selection of opened documents and windows. To open this dialog, select Windows > Windows…
 
Once activated, a window chooser is brought up that contains all of the opened documents and windows.
 
Navigate with the mouse or keyboard to select an entry and press enter to activate the selected window or document. Alternatively you may double click an entry to activate it.
You may also enter filter expressions to filter to the window of interest.
 
Troubleshooting
Due to the complex nature of the underlying mechanisms that make arbitrary application analysis possible, there is the possibility of errors. Nsight Graphics offers a number of ways to discover and correct issues that you may encounter.
See the sections below for general tools as well as listings of common problems and possible solutions for them. Also, you may want to review known issues to determine if you are encountering an issue already known.
General Tools
This section provides troubleshooting tips for Nsight Graphics.
Output Messages
Throughout the operation of the tool, Nsight Graphics provides messages that inform on the status of operations, as well as if any issues are encountered. This could provide some assistance when trying to determine why your application may not run, connect, or capture correctly. Error messages are indicated by a red flag in the bottom right of the application window. This flag may be double-clicked to open the Output Messages window. Alternatively this window may be accessed via Tools > Output Messages.
 
Crash Reporting
When an application crashes, or hangs, a crash report can be one of the most valuable pieces of information in helping to fix the issue. Accordingly, if you have the ability to send a crash report, it is greatly appreciated.
Automatic Crash Reports
Nsight Graphics’ host and target are configured to automatically send crash reports when they encounter a crash. Submitting via the dialog is a good approach, but saving the minidump for explicit communication can be useful too.
Note
If you encounter a crash and do not have the option of sending a crash report, you may need to generate a crash report manually instead, as described below. One typical reason that crash reports might not be generated is if the application is configured with its own automated crash reporting that overrides the Nsight Graphics crash reporting mechanism.
Manual Crash Reports
Manual crash reports are an effective way to collect information when automatic crash reports are not triggering. A process dump is collected by attaching to the crashing process with a debugger and manually creating a dump when the crash occurs.
Windows
A crash dump can be created by Microsoft Visual Studio. To accomplish this:
- Start Visual Studio. 
- Follow the instructions for Debugging Your Application with a Debugger. 
- Start the application with Nsight Graphics. 
- Attach the Visual Studio debugger to it with “Debug > Attach To Process” 
- When you encounter a crash, use the Visual Studio “Debug > Save Dump As” menu option. 
Linux
A crash dump can be created by GDB, the GNU Debugger. To accomplish this:
- Start gdb. 
- Follow the instructions for Debugging Your Application with a Debugger. 
- Start the application with Nsight Graphics. 
- Attach gdb to it. 
- When you encounter a crash, use the “generate-core-file” command. 
- Next, while the process is still alive, use the core2md utility to translate the core file into a dump that can be consumed, by running:
  core2md <core dump> /proc/<crash process ID>/ <minidump>
  The core2md utility can be found in the Nsight Graphics installation directory under host/linux-desktop-nomad-x64.
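If you script this conversion, the core2md invocation from the last step can be assembled as in the Python sketch below. The helper name and the example file names are hypothetical. Only the argument order (core file, /proc/<process ID>/ directory, output minidump) comes from the step above.

```python
from typing import List


def core2md_command(
    core_file: str, pid: int, minidump: str, core2md: str = "core2md"
) -> List[str]:
    """Build the core2md argument list described in the step above.

    core2md may be a full path to the utility in the Nsight Graphics
    installation directory; the default assumes it is on PATH.
    """
    if pid <= 0:
        raise ValueError("pid must be a positive process ID")
    # The /proc/<pid>/ directory must still exist, so run the command
    # while the process is alive.
    return [core2md, core_file, f"/proc/{pid}/", minidump]


if __name__ == "__main__":
    # Hypothetical file names and process ID, for illustration only.
    cmd = core2md_command("core.4242", 4242, "crash.dmp")
    print(" ".join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list, rather than a single shell string, avoids quoting problems when paths contain spaces.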
 
Manual Hang Reports
If the application encounters a hang, a process dump can be one of the most effective ways to identify the source of the hang. A process dump is collected by attaching to the hanging process with a debugger and manually creating a dump by following the instructions below:
Windows
A crash dump can be created by Microsoft Visual Studio. To accomplish this:
- Start Visual Studio. 
- Attach the Visual Studio debugger to the hanging process with “Debug > Attach to Process” 
- Stop program execution by using the Visual Studio “Debug > Break All” command. 
- Generate a process dump using the Visual Studio “Debug > Save Dump As” command. 
Linux
A crash dump can be created by GDB, the GNU Debugger. To accomplish this:
- Start gdb. 
- Attach gdb to your process. 
- The process should stop after being attached to by GDB, otherwise press Ctrl + C to send a SIGINT to stop the process. 
- Generate a process dump using the “generate-core-file” command. 
- Next, while the process is still alive, use the core2md utility to translate the core file into a dump that can be consumed, by running:
  core2md <core dump> /proc/<hang process ID>/ <minidump>
  The core2md utility can be found in the Nsight Graphics installation directory under host/linux-desktop-nomad-x64.
 
Debugging Your Application with a Debugger
Although launching your application with Nsight Graphics might appear to be an alternative to CPU debugging, the application that is launched is still very much a debuggable application. This can be useful to determine if a problem you are encountering is in your own code by tracing the paths taken by your application.
To do this, set an environment variable of NVIDIA_PROCESS_INJECTION_ATTACH_DIALOG=1 and attach a debugger when you see a message box. Click OK to resume your application once you have set breakpoints that will allow you to inspect if your application is following the expected paths.
Collecting DirectX Debug Logging
Sometimes a device lost or other issue can be narrowed by observing what the DirectX debug layer has to say.
If you need to install the layer, it should be part of the OS in Windows 10:
Apps & Features > Manage Optional Features > Graphics Tools
Open dxcpl, which should look like the below. Make sure your installed application is in the Scope List, and set the Direct3D/DXGI Debug Layer to Force On.
 
There are two ways to see the debug output:
- You can see logging without attaching Visual Studio by just running DbgView.exe (https://docs.microsoft.com/en-us/sysinternals/downloads/debugview).
- Alternately, attach using Visual Studio. Logging will be in the Visual Studio Output window. 
Setting an environment variable
Occasionally, you might be asked to set an undocumented variable to help disambiguate problems.
Apply the environment variable in the connection dialog Environment setting when starting an application.
Common Problems
Problem: Application Fails to Launch
You’ve tried to launch your application, but it is failing to launch.
Possible Causes
- Incorrect command line arguments. 
- Incorrect working directory. 
- You’re trying to launch on a remote machine that does not have the Nsight Monitor running. 
Possible Solutions
Make sure that your command line arguments and working directory are as expected.
If you are trying to run on a remote machine, please ensure that the remote monitor is running and that the name of the machine is correct. See Remote Launching.
Disambiguate if the application is launching at all. Follow the instructions in Debugging Your Application with a Debugger. Check to see if your application is launched at all and if so, whether it is following its expected path. If the application doesn’t launch at all, please send an email to devtools-support@nvidia.com.
Problem: Application Crashes at Runtime
You’ve found that your application appears to launch, but it crashes during runtime.
Possible Causes
- Lack of API support by Nsight Graphics 
- Application not checking return codes from device/object creation, assuming it is successful 
- Interception-library crash 
- Internal-driver crash 
- D3D-debug runtime interaction 
Possible Solutions
Try disabling the following features:
For D3D apps, try running without the D3D debug runtime enabled, as the debug runtime occasionally differs in behavior when compared with the release runtime.
If none of the above works, please try to collect a crash dump if possible and send it to devtools-support@nvidia.com.
Problem: Application Hangs at Runtime
You’ve found that your application appears to launch, but it hangs during runtime.
Possible Causes
- Multi-threading issue 
- HUD Issue 
Possible Solutions
Try disabling the following features:
If none of the above works, please try to collect a process dump if possible and send it to devtools-support@nvidia.com.
Problem: Application Crashes during Capture
You’ve found that you’re able to run the application successfully, but upon trying to perform a live analysis, the application crashes.
Possible Causes
- Multi-threading issue 
- Out of memory 
- The application is tearing itself down due to a watchdog timeout 
Possible Solutions
Try disabling the following features:
If you suspect a multi-threading issue (D3D’s runtime sometimes indicates this), try disabling multi-threaded capture.
If Nsight Graphics reports out of memory, try reducing the requirements of the application or try running with a more capable GPU.
If the application exits without any clear sign of a crash, the application could be tearing itself down. Please contact devtools-support@nvidia.com with your concern and we will investigate if there is any opportunity for deactivating the thread.
Problem: Application Hangs during Capture
You’ve found that you’re able to run the application successfully, but upon trying to perform a live analysis, the application hangs. This hang sometimes appears as a white screen on the target application.
Possible Causes
- The application is lazily presenting frames, preventing progress 
- Multi-threading issue 
- App is running in fullscreen mode 
Possible Solutions
If the application is lazily presenting frames, it may prevent capture progress given that Nsight performs work on frame boundaries. If this is the case, try turning on the Force Repaint feature so as to force the application to present frames.
If you suspect a multi-threading issue, try changing the following feature to RenderOnly:
If none of the above works, please try to collect a process dump if possible and send it to devtools-support@nvidia.com.
Problem: Application Encounters an Incompatibility
This problem arises when the application you are running is using API methods or patterns that are not supported by Nsight Graphics.
Possible Causes
- An unsupported API method was used 
- An unsupported API pattern was used 
Possible Solutions
When encountering this issue, Nsight Graphics presents a list of the API methods or reasons that it has found to be incompatible. This list is shown alongside an explanation of the reasons why Nsight Graphics has prevented capture, which include application crashes, incorrect data, etc. Because Nsight Graphics is a replay-based debugger, the absence of methods may lead to critical issues as replay is attempted. In some cases, however, the missing methods are innocuous and replay may proceed correctly without them. When capturing through the host, Nsight Graphics offers the user an opportunity to capture despite these incompatibilities. From this point, it is up to the user to determine if the data is meaningful.
When encountering an incompatibility, we recommend that you communicate this incompatibility to devtools-support@nvidia.com so that the Nsight Graphics development team may track this issue and determine if it is something that will be supported in the future.
Note that if you wish to ignore all incompatibilities on every run, and wish to accept the possible errors that come with it, you may set the option of ‘Troubleshooting > Ignore Incompatibilities’ to accomplish that.
Problem: Application Captures Successfully, but Exits after a Time in Capture
This problem indicates that you have had some level of success, but even if the application is generally inactive, the application crashes.
Possible Causes
- Serving a host query leads to a crash 
- Memory leak 
- Watchdog timer 
Possible Solutions
When encountering this issue, take note of what you are doing when you encounter it. The first thing to try is doing nothing – does the application still crash when doing so? If there is nothing going on, this is either a memory leak or a watchdog timer.
- Look at the memory usage of the process – is it growing? It’s a memory leak, either from the application or the tool. 
- Set a stopwatch to count how long it takes to crash – is it a “round” number like 30 or 60 seconds? It’s probably a watchdog. 
- If this is a memory leak (uncommon but possible) please contact support to help identify the issue. 
If this is a watchdog issue, disable the watchdog in your application.
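The two checks above (is memory growing, and is the time to exit a "round" number) can be expressed as simple heuristics. The Python sketch below is one possible way to do so. The function names, the 30-second period, the 1-second tolerance, and the 50 MB growth threshold are all assumptions chosen for illustration and should be tuned to your application.

```python
from typing import Sequence


def looks_like_watchdog(
    seconds_to_exit: float, period: float = 30.0, tolerance: float = 1.0
) -> bool:
    """True if the time to exit is close to a multiple of period seconds."""
    if seconds_to_exit < period - tolerance:
        return False  # too early to match even the first multiple
    nearest = round(seconds_to_exit / period) * period
    return abs(seconds_to_exit - nearest) <= tolerance


def memory_is_growing(
    samples_mb: Sequence[float], min_growth_mb: float = 50.0
) -> bool:
    """True if memory samples never decrease and grow by min_growth_mb overall."""
    if len(samples_mb) < 2:
        return False
    rising = all(b >= a for a, b in zip(samples_mb, samples_mb[1:]))
    return rising and (samples_mb[-1] - samples_mb[0]) >= min_growth_mb


if __name__ == "__main__":
    # Hypothetical measurements taken while the application sat idle in capture.
    print(looks_like_watchdog(60.4))
    print(memory_is_growing([500.0, 560.0, 640.0]))
```

Neither check is conclusive on its own, so treat a positive result as a hint about where to look first.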
Problem: Application Runs Extremely Slowly
You’ve observed that the application runs at a significantly lower rate than normal operation.
Possible Causes
- Too much work is being done. 
- The application may be exercising uncommon paths. 
Possible Solutions
Try disabling optional features, such as collecting shader source, collecting native shaders, or collecting hardware performance metrics.
Problem: D3D12 Replayer Shows More CPU Overhead than Expected
If you encounter more overhead in your generated C++ capture, conservative synchronization may be the problem.
Possible Causes
- Nsight Graphics’ default fence syncing policy may be too conservative for this application. 
Possible Solutions
Try experimenting with replay fence behavior.
Problem: I Can’t Capture a Vulkan Application
If you find that the button to Capture for Live Analysis is disabled, or you do not see a message that your application has Nsight Graphics analysis enabled, the Nsight Graphics Vulkan layer may not be enabled. This symptom is often accompanied by an error in the Output Messages window, so look for errors in that window for an indication of the failure.
Possible Causes
- The Nsight Graphics Vulkan layer configuration has been removed from your system configuration. 
Possible Solutions
One workaround is to re-enable the Nsight Graphics Vulkan layer explicitly. To do this, follow the steps for your system:
Windows
- Run <install directory>/host/windows-desktop-nomad-x64/VK_LAYER_NV_nomad.bat 
Linux
- Check that the Vulkan manifest file (.json) exists under <install directory>/target/linux-desktop-nomad-x64/NomadVulkanLayer/vulkan/implicit_layer.d/. 
- Delete any dangling manifest of an uninstalled Nsight Graphics in ~/.local/share/vulkan/implicit_layer.d.
- If you still see the issue after the above steps, it could be an Nsight Graphics bug; please report it to us. Meanwhile, one possible workaround is to add an environment variable in the “Start Activity” dialog: XDG_DATA_DIRS=<install directory>/target/linux-desktop-nomad-x64/NomadVulkanLayer 
If, after repeating the steps, you find that your system still cannot capture, gather a log of the output of the vulkaninfo application from the Vulkan SDK and send it to devtools-support@nvidia.com.
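On Linux, the search for dangling manifests described above can be partly automated. The sketch below is an illustration, not part of Nsight Graphics: it assumes each manifest follows the Vulkan loader's layer manifest layout (a `layer` object or a `layers` array, each entry with a `library_path`), and the helper names are hypothetical. It flags manifests whose `library_path` points to a file that no longer exists. Entries that name a bare library file are skipped, because the dynamic loader resolves those through its own search path.

```python
import json
from pathlib import Path


def layer_entries(manifest):
    """Yield layer objects from a manifest using 'layer' or 'layers'."""
    if isinstance(manifest.get("layer"), dict):
        yield manifest["layer"]
    layers = manifest.get("layers", [])
    if isinstance(layers, list):
        for entry in layers:
            if isinstance(entry, dict):
                yield entry


def find_dangling_manifests(directory):
    """Return (manifest_path, library_path) pairs whose library is missing."""
    dangling = []
    directory = Path(directory).expanduser()
    if not directory.is_dir():
        return dangling
    for manifest_path in sorted(directory.glob("*.json")):
        try:
            manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or malformed file; inspect it by hand
        if not isinstance(manifest, dict):
            continue
        for layer in layer_entries(manifest):
            library = layer.get("library_path", "")
            if not isinstance(library, str):
                continue
            if "/" not in library and "\\" not in library:
                continue  # bare file name, resolved by the dynamic loader
            resolved = Path(library)
            if not resolved.is_absolute():
                resolved = manifest_path.parent / resolved
            if not resolved.exists():
                dangling.append((manifest_path, library))
    return dangling


if __name__ == "__main__":
    for path, library in find_dangling_manifests(
        "~/.local/share/vulkan/implicit_layer.d"
    ):
        print(f"{path}: missing {library}")
```

The script only reports candidates. Review each reported manifest before deleting it, since it may belong to another tool.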
Problem: I Can’t Attach to the Application
The application launches, but you are unable to attach to it with the Nsight Graphics host.
Possible Causes
- You launched a piece of the process hierarchy without Nsight Graphics. 
- You set the connection to automatically attach when the root application launches child processes that are the actual processes of interest. 
- The application is interfering with Nsight Graphics’ interception. 
- The application is using a software renderer. 
Possible Solutions
Nsight Graphics is essentially an in-process debugger, so it cannot attach to an application that wasn’t originally launched through Nsight Graphics. The attach feature is meant for attaching to applications that have been launched through other means (e.g., a command line launcher), and for allowing some recoverability in the case of a host issue, as it allows you to attach at a later time.
Make sure to kill any processes related to the process hierarchy of an application and try to launch it again.
Problem: The Host UI Crashes
The host UI crashes while you are analyzing an application.
Possible Causes
- UI Bug 
Possible Solutions
Try reducing the number of views that you have open when running to pinpoint which view causes the issue.
If at all possible, try to collect a crash dump of the UI application and send it to devtools-support@nvidia.com.
Try deleting the UI persistence data with Help > Reset Application Data.
Problem: The Target Window Blocks the Host Window
While running a live analysis, you find that the target window is blocking the host window and interfering with the analysis you wish to perform on the host. This is most often reported on machines that do not have access to multiple monitors.
Possible Causes
- The application has fullscreen settings. 
- The application has a topmost flag set to keep the application on top. 
Possible Solutions
We suggest running without fullscreen or topmost settings. If fullscreen-like behavior is desired, many applications support a borderless window mode.
If you must analyze an application with these characteristics, and you do not have access to a second monitor, the virtual desktop or workspace support in most modern operating system shells presents an effective path forward. Creating one desktop for the target application and one for the host often keeps the target from interfering. For more information on using these features, see one of the articles below.
Note
If you wish to suppress the dialog that reports replay window interference, set the environment variable NSIGHT_REPORT_REPLAY_WINDOW_INTERFERENCE=0.
Linux/Gnome: https://help.gnome.org/users/gnome-help/stable/shell-workspaces.html.en
Problem: Force-failed QueryInterface is Reported
An application may call QueryInterface for types that Nsight Graphics does not know about or understand. To avoid crashes, incorrect rendering, or bad data with these unknown types, Nsight Graphics reports a force-failed QueryInterface warning. After reporting this warning, Nsight Graphics nullifies the result of the QueryInterface call and returns E_NOINTERFACE to report that the interface is unsupported.
Possible Causes
- Using an older version of Nsight Graphics against an application that uses newer runtime capabilities. 
- Using multiple tools that intercept the application at one time. 
- Lack of API support by Nsight Graphics. 
Possible Solutions
When this issue is encountered, it is recommended that you first attempt to understand what the source of the incompatibility is. Nsight Graphics attempts to print out the source and target types in the QueryInterface call. When the target is unknown, however, this type is printed out as a GUID.
In some cases, the failure may be apparent, and you might be able to do a text search to determine where your application is making this problematic QueryInterface call. If the call is too difficult to find, you may also attach a debugger (see Debugging Your Application with a Debugger) and set a function breakpoint on MessageBoxA before running the application. When the breakpoint is hit, the call stack shows where Nsight Graphics issues the report.
If you are unable to work around the lack of support for this type, you may attempt to set an environment variable to suppress this force-failed query. Note that this is not guaranteed to fix all concerns, and may result in future unspecified failures, but it is available as a possibility for working around problems. The NSIGHT_PASSTHROUGH_UNKNOWN_GUIDS environment variable is a comma-delimited list of GUIDs that are allowed to pass through without a force-failure. GUIDs must be fully specified with brace syntax, as in NSIGHT_PASSTHROUGH_UNKNOWN_GUIDS={5b746c30-24e2-4385-81f6-39f7a068945b}.
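When several GUIDs need to pass through, building the value by hand is error-prone. The sketch below shows one way to assemble the comma-delimited, brace-wrapped list with Python's standard `uuid` module. The function name `format_passthrough_guids` is hypothetical, and the choice of lowercase output with no spaces after the commas is an assumption based on the example above.

```python
import uuid


def format_passthrough_guids(guids):
    """Return a value for NSIGHT_PASSTHROUGH_UNKNOWN_GUIDS.

    Each input may be written with or without braces; anything that is
    not a well-formed GUID raises ValueError.
    """
    formatted = []
    for guid in guids:
        parsed = uuid.UUID(guid)  # validates the text and normalizes case
        formatted.append("{" + str(parsed) + "}")
    return ",".join(formatted)


if __name__ == "__main__":
    value = format_passthrough_guids(["5B746C30-24E2-4385-81F6-39F7A068945B"])
    print(f"NSIGHT_PASSTHROUGH_UNKNOWN_GUIDS={value}")
```

Validating each GUID up front catches truncated or mistyped values before they reach the environment.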
If you suspect that the type being reported should be supported by Nsight, please send a report to devtools-support@nvidia.com to ask for assistance.
Appendix
Feature Support Matrix
| Feature | D3D11 | D3D12 | OpenGL | Vulkan | 
|---|---|---|---|---|
| Frame Capture and Live Analysis | Yes | Yes | Yes | Yes | 
| C++ Capture | Yes | Yes | Yes | Yes | 
| Shader Profiling | Yes | Yes | ||
| Pixel History | Yes | Yes | Yes | Yes | 
| Dynamic Shader Editing | Yes | Yes | Yes | Yes | 
| GPU Trace | Yes | Yes | Yes | |
| Ray Tracing Debugging | Yes | Yes | ||
| Nsight Aftermath GPU Crash Dumps | Yes | Yes | 
Supported OpenGL Functions
Nsight Graphics’ Frame Debugger supports the set of OpenGL operations defined by the OpenGL 4.5 core profile. Note that it is not necessary to create a core profile context to make use of the Frame Debugger. An application that uses a compatibility profile context, but restricts itself to using the OpenGL 4.5 core subset, also works. A few OpenGL 4.5 compatibility profile features, such as support for alpha testing and a default vertex array object, are also supported.
The Frame Debugger supports three classes of OpenGL extensions, described below.
1. OpenGL Core Context Support
The OpenGL extensions listed below are supported inasmuch as the extension has been adopted by the OpenGL 4.5 core profile. For example, EXT_subtexture is included as part of OpenGL 1.1. Calls to glTexSubImage2DEXT are supported and behave the same as calls to glTexSubImage2D. On the other hand, while EXT_vertex_array is also included as part of OpenGL 1.1, glColorPointerEXT is not supported by the Frame Debugger. The operation of glColorPointerEXT was modified when it was included as part of OpenGL 1.1. Additionally, glColorPointer is part of the compatibility subset, but not the core subset.
// GL 1.1
EXT_vertex_array
EXT_polygon_offset
EXT_blend_logic_op
EXT_texture
EXT_copy_texture
EXT_subtexture
EXT_texture_object
// GL 1.2
EXT_texture3D
EXT_bgra
EXT_packed_pixels
EXT_rescale_normal
EXT_separate_specular_color
SGIS_texture_edge_clamp
SGIS_texture_lod
EXT_draw_range_elements
EXT_color_table
EXT_color_subtable
EXT_convolution
HP_convolution_border_modes
SGI_color_matrix
EXT_histogram
EXT_blend_color
EXT_blend_minmax
EXT_blend_subtract
// GL 1.2.1
EXT_SGIS_multitexture
// GL 1.3
ARB_texture_compression
ARB_texture_cube_map
ARB_multisample
ARB_multitexture
ARB_texture_env_add
ARB_texture_env_combine
ARB_texture_env_dot3
ARB_texture_border_clamp
ARB_transpose_matrix
// GL 1.4
SGIS_generate_mipmap
NV_blend_square
ARB_depth_texture
ARB_shadow
EXT_fog_coord
EXT_multi_draw_arrays
ARB_point_parameters
EXT_secondary_color
EXT_blend_func_separate
EXT_stencil_wrap
EXT_texture_env_crossbar
EXT_texture_lod_bias
ARB_texture_mirrored_repeat
ARB_window_pos
// GL 1.5
ARB_vertex_buffer_object
ARB_occlusion_query
EXT_shadow_funcs
// GL 2.0
ARB_shader_objects
ARB_vertex_shader
ARB_fragment_shader
ARB_draw_buffers
ARB_texture_non_power_of_two
ARB_point_sprite
EXT_blend_equation_separate
ATI_separate_stencil
EXT_stencil_two_side
// GL 2.1
ARB_pixel_buffer_object
EXT_direct_state_access
EXT_texture_sRGB
// GL 3.0
EXT_gpu_shader4
NV_conditional_render
APPLE_flush_buffer_range
ARB_color_buffer_float
NV_depth_buffer_float
ARB_texture_float
EXT_packed_float
EXT_texture_shared_exponent
EXT_framebuffer_object
NV_half_float
ARB_half_float_pixel
EXT_framebuffer_multisample
EXT_framebuffer_blit
EXT_texture_integer
EXT_texture_array
EXT_packed_depth_stencil
EXT_draw_buffers2
EXT_texture_compression_rgtc
EXT_transform_feedback
APPLE_vertex_array_object
EXT_framebuffer_sRGB
// GL 3.1
EXT_draw_instanced
ARB_draw_instanced
ARB_copy_buffer
NV_primitive_restart
ARB_texture_buffer_object
ARB_texture_rectangle
ARB_uniform_buffer_object
// GL 3.2
ARB_vertex_array_bgra
ARB_draw_elements_base_vertex
ARB_fragment_coord_conventions
ARB_provoking_vertex
ARB_seamless_cube_map
ARB_texture_multisample
ARB_depth_clamp
ARB_geometry_shader4
ARB_sync
// GL 3.3
ARB_shader_bit_encoding
ARB_blend_func_extended
ARB_explicit_attrib_location
ARB_occlusion_query2
ARB_sampler_objects
ARB_texture_rgb10_a2ui
ARB_texture_swizzle
ARB_timer_query
ARB_instanced_arrays
ARB_vertex_type_2_10_10_10_rev
// GL 4.0
ARB_texture_query_lod
ARB_draw_buffers_blend
ARB_draw_indirect
ARB_gpu_shader5
ARB_gpu_shader_fp64
ARB_sample_shading
ARB_shader_subroutine
ARB_tessellation_shader
ARB_texture_buffer_object_rgb32
ARB_texture_cube_map_array
ARB_texture_gather
ARB_transform_feedback2
ARB_transform_feedback3
// GL 4.1
ARB_ES2_compatibility
ARB_get_program_binary
ARB_separate_shader_objects
ARB_shader_precision
ARB_vertex_attrib_64bit
ARB_viewport_array
// GL 4.2
ARB_texture_compression_bptc
ARB_compressed_texture_pixel_storage
ARB_shader_atomic_counters
ARB_texture_storage
ARB_transform_feedback_instanced
ARB_base_instance
ARB_shader_image_load_store
ARB_conservative_depth
ARB_shading_language_420pack
ARB_internalformat_query
ARB_map_buffer_alignment
// GL 4.3
ARB_multi_draw_indirect
ARB_program_interface_query
ARB_shader_storage_buffer_object
ARB_copy_image
ARB_vertex_attrib_binding
ARB_texture_view
ARB_invalidate_subdata
ARB_framebuffer_no_attachments
ARB_stencil_texturing
ARB_explicit_uniform_location
ARB_texture_storage_multisample
ARB_robust_buffer_access_behavior
ARB_ES3_compatibility
ARB_clear_buffer_object
ARB_internalformat_query2
ARB_texture_buffer_range
ARB_compute_shader
ARB_debug_group
ARB_debug_label
ARB_debug_output
// GL 4.4
ARB_query_buffer_object
ARB_enhanced_layouts
ARB_multi_bind
ARB_vertex_type_10f_11f_11f_rev
ARB_texture_mirror_clamp_to_edge
ARB_clear_texture
// GL 4.5
ARB_clip_control
ARB_cull_distance
ARB_conditional_render_inverted
GL_KHR_context_flush_control
ARB_get_texture_sub_image
GL_KHR_robustness
ARB_texture_barrier
ARB_ES3_1_compatibility
ARB_direct_state_access
ARB_shader_texture_image_samples
ARB_derivative_control
2. Other Supported OpenGL Extensions
The second class of OpenGL extensions is listed below. These extensions are not part of OpenGL 4.5 core or compatibility, but are fully supported by the Frame Debugger target. Context and object state added by these extensions may not be displayed by the host UI.
ARB_framebuffer_object
EXT_texture_filter_anisotropic
NV_buffer_store
ARB_vertex_attrib_binding
ARB_multi_draw_indirect
NV_gpu_multicast
ARB_parallel_shader_compile
ARB_seamless_cubemap_per_texture
NV_shader_buffer_load
NV_vertex_buffer_unified_memory
3. Partially Supported OpenGL Extensions
The third class of OpenGL extensions consists of those for which there is partial support. These extensions are listed below.
ARB_bindless_texture
WGL_ARB_extensions_string
WGL_ARB_pixel_format
WGL_EXT_extensions_string
WGL_EXT_swap_control
WGL_EXT_swap_control_tear
WGL_ARB_create_context
4. OpenGL Immediate Mode
Beyond the core functionality and extensions, a selection of immediate-mode functions is supported.
glBegin
glEnd
glVertex*
glColor*
glIndex*
glNormal*
glTexCoord*
glDrawElement
glEnableClientState
glDisableClientState
glVertexPointer
glColorPointer
glSecondaryColorPointer
glIndexPointer
glNormalPointer
Supported OpenXR Functions
Nsight Graphics™ 2025.4 frame debugging supports all of OpenXR 1.0.32.
Additionally, the following extensions to OpenXR 1.0.32 are supported:
XR_KHR_D3D12_enable
XR_KHR_opengl_enable
XR_KHR_vulkan_enable
XR_KHR_vulkan_enable2
The following extensions to OpenXR 1.0.32 are not currently supported. If your application uses these extensions, please send a feedback feature request to let the Nsight team know about your interest and needs.
XR_EXTX_overlay
XR_EXT_active_action_set_priority
XR_EXT_conformance_automation
XR_EXT_debug_utils
XR_EXT_dpad_binding
XR_EXT_eye_gaze_interaction
XR_EXT_hand_interaction
XR_EXT_hand_joints_motion_range
XR_EXT_hand_tracking
XR_EXT_hand_tracking_data_source
XR_EXT_hp_mixed_reality_controller
XR_EXT_local_floor
XR_EXT_palm_pose
XR_EXT_performance_settings
XR_EXT_plane_detection
XR_EXT_samsung_odyssey_controller
XR_EXT_thermal_query
XR_EXT_uuid
XR_EXT_view_configuration_depth_range
XR_EXT_win32_appcontainer_compatible
XR_KHR_D3D11_enable
XR_KHR_android_create_instance
XR_KHR_android_surface_swapchain
XR_KHR_android_thread_settings
XR_KHR_binding_modification
XR_KHR_composition_layer_color_scale_bias
XR_KHR_composition_layer_cube
XR_KHR_composition_layer_cylinder
XR_KHR_composition_layer_depth
XR_KHR_composition_layer_equirect
XR_KHR_composition_layer_equirect2
XR_KHR_convert_timespec_time
XR_KHR_loader_init
XR_KHR_loader_init_android
XR_KHR_opengl_es_enable
XR_KHR_swapchain_usage_input_attachment_bit
XR_KHR_visibility_mask
XR_KHR_vulkan_swapchain_format_list
XR_KHR_win32_convert_performance_counter_time
Supported Vulkan Functions
Nsight Graphics™ 2025.4 frame debugging supports all of Vulkan 1.4.323.0.
Additionally, the following extensions to Vulkan 1.4.323.0 are supported:
VK_EXT_4444_formats
VK_EXT_acquire_xlib_display
VK_EXT_astc_decode_mode
VK_EXT_attachment_feedback_loop_dynamic_state
VK_EXT_attachment_feedback_loop_layout
VK_EXT_blend_operation_advanced
VK_EXT_border_color_swizzle
VK_EXT_buffer_device_address
VK_EXT_calibrated_timestamps
VK_EXT_color_write_enable
VK_EXT_conditional_rendering
VK_EXT_conservative_rasterization
VK_EXT_custom_border_color
VK_EXT_debug_marker
VK_EXT_debug_report
VK_EXT_debug_utils
VK_EXT_depth_bias_control
VK_EXT_depth_clamp_zero_one
VK_EXT_depth_clip_control
VK_EXT_depth_clip_enable
VK_EXT_depth_range_unrestricted
VK_EXT_descriptor_buffer
VK_EXT_descriptor_indexing
VK_EXT_device_address_binding_report
VK_EXT_device_fault
VK_EXT_device_generated_commands
VK_EXT_device_memory_report
VK_EXT_direct_mode_display
VK_EXT_discard_rectangles
VK_EXT_display_surface_counter
VK_EXT_dynamic_rendering_unused_attachments
VK_EXT_extended_dynamic_state
VK_EXT_extended_dynamic_state2
VK_EXT_extended_dynamic_state3
VK_EXT_external_memory_host
VK_EXT_filter_cubic
VK_EXT_fragment_density_map
VK_EXT_fragment_shader_interlock
VK_EXT_frame_boundary
VK_EXT_full_screen_exclusive
VK_EXT_global_priority
VK_EXT_graphics_pipeline_library
VK_EXT_hdr_metadata
VK_EXT_headless_surface
VK_EXT_host_image_copy
VK_EXT_host_query_reset
VK_EXT_image_2d_view_of_3d
VK_EXT_image_compression_control
VK_EXT_image_compression_control_swapchain
VK_EXT_image_robustness
VK_EXT_image_sliced_view_of_3d
VK_EXT_image_view_min_lod
VK_EXT_index_type_uint8
VK_EXT_inline_uniform_block
VK_EXT_legacy_dithering
VK_EXT_legacy_vertex_attributes
VK_EXT_line_rasterization
VK_EXT_load_store_op_none
VK_EXT_map_memory_placed
VK_EXT_memory_budget
VK_EXT_memory_priority
VK_EXT_mesh_shader
VK_EXT_multi_draw
VK_EXT_multisampled_render_to_single_sampled
VK_EXT_mutable_descriptor_type
VK_EXT_nested_command_buffer
VK_EXT_non_seamless_cube_map
VK_EXT_opacity_micromap
VK_EXT_pageable_device_local_memory
VK_EXT_pci_bus_info
VK_EXT_physical_device_drm
VK_EXT_pipeline_creation_cache_control
VK_EXT_pipeline_creation_feedback
VK_EXT_pipeline_library_group_handles
VK_EXT_pipeline_protected_access
VK_EXT_pipeline_robustness
VK_EXT_post_depth_coverage
VK_EXT_present_mode_fifo_latest_ready
VK_EXT_primitive_topology_list_restart
VK_EXT_primitives_generated_query
VK_EXT_private_data
VK_EXT_provoking_vertex
VK_EXT_queue_family_foreign
VK_EXT_rasterization_order_attachment_access
VK_EXT_rgba10x6_formats
VK_EXT_robustness2
VK_EXT_sample_locations
VK_EXT_sampler_filter_minmax
VK_EXT_scalar_block_layout
VK_EXT_separate_stencil_usage
VK_EXT_shader_atomic_float
VK_EXT_shader_atomic_float2
VK_EXT_shader_demote_to_helper_invocation
VK_EXT_shader_image_atomic_int64
VK_EXT_shader_module_identifier
VK_EXT_shader_object
VK_EXT_shader_replicated_composites
VK_EXT_shader_stencil_export
VK_EXT_shader_subgroup_ballot
VK_EXT_shader_subgroup_vote
VK_EXT_shader_tile_image
VK_EXT_shader_viewport_index_layer
VK_EXT_subgroup_size_control
VK_EXT_subpass_merge_feedback
VK_EXT_surface_maintenance1
VK_EXT_swapchain_colorspace
VK_EXT_swapchain_maintenance1
VK_EXT_texel_buffer_alignment
VK_EXT_texture_compression_astc_hdr
VK_EXT_tooling_info
VK_EXT_transform_feedback
VK_EXT_validation_cache
VK_EXT_validation_features
VK_EXT_validation_flags
VK_EXT_vertex_attribute_divisor
VK_EXT_vertex_input_dynamic_state
VK_EXT_ycbcr_2plane_444_formats
VK_EXT_ycbcr_image_arrays
VK_KHR_16bit_storage
VK_KHR_8bit_storage
VK_KHR_acceleration_structure
VK_KHR_android_surface
VK_KHR_bind_memory2
VK_KHR_buffer_device_address
VK_KHR_calibrated_timestamps
VK_KHR_compute_shader_derivatives
VK_KHR_cooperative_matrix
VK_KHR_copy_commands2
VK_KHR_create_renderpass2
VK_KHR_dedicated_allocation
VK_KHR_deferred_host_operations
VK_KHR_depth_clamp_zero_one
VK_KHR_depth_stencil_resolve
VK_KHR_descriptor_update_template
VK_KHR_device_group
VK_KHR_device_group_creation
VK_KHR_display
VK_KHR_display_swapchain
VK_KHR_draw_indirect_count
VK_KHR_driver_properties
VK_KHR_dynamic_rendering
VK_KHR_dynamic_rendering_local_read
VK_KHR_external_fence
VK_KHR_external_fence_capabilities
VK_KHR_external_fence_fd
VK_KHR_external_fence_win32
VK_KHR_external_memory
VK_KHR_external_memory_capabilities
VK_KHR_external_memory_fd
VK_KHR_external_memory_win32
VK_KHR_external_semaphore
VK_KHR_external_semaphore_capabilities
VK_KHR_external_semaphore_fd
VK_KHR_external_semaphore_win32
VK_KHR_format_feature_flags2
VK_KHR_fragment_shader_barycentric
VK_KHR_fragment_shading_rate
VK_KHR_get_display_properties2
VK_KHR_get_memory_requirements2
VK_KHR_get_physical_device_properties2
VK_KHR_get_surface_capabilities2
VK_KHR_global_priority
VK_KHR_image_format_list
VK_KHR_imageless_framebuffer
VK_KHR_incremental_present
VK_KHR_index_type_uint8
VK_KHR_line_rasterization
VK_KHR_load_store_op_none
VK_KHR_maintenance1
VK_KHR_maintenance2
VK_KHR_maintenance3
VK_KHR_maintenance4
VK_KHR_maintenance5
VK_KHR_maintenance6
VK_KHR_maintenance7
VK_KHR_maintenance8
VK_KHR_map_memory2
VK_KHR_multiview
VK_KHR_pipeline_executable_properties
VK_KHR_pipeline_library
VK_KHR_portability_enumeration
VK_KHR_present_id
VK_KHR_present_wait
VK_KHR_push_descriptor
VK_KHR_ray_query
VK_KHR_ray_tracing_maintenance1
VK_KHR_ray_tracing_pipeline
VK_KHR_ray_tracing_position_fetch
VK_KHR_relaxed_block_layout
VK_KHR_sampler_mirror_clamp_to_edge
VK_KHR_sampler_ycbcr_conversion
VK_KHR_separate_depth_stencil_layouts
VK_KHR_shader_atomic_int64
VK_KHR_shader_clock
VK_KHR_shader_draw_parameters
VK_KHR_shader_expect_assume
VK_KHR_shader_float16_int8
VK_KHR_shader_float_controls
VK_KHR_shader_float_controls2
VK_KHR_shader_integer_dot_product
VK_KHR_shader_maximal_reconvergence
VK_KHR_shader_non_semantic_info
VK_KHR_shader_quad_control
VK_KHR_shader_relaxed_extended_instruction
VK_KHR_shader_subgroup_extended_types
VK_KHR_shader_subgroup_rotate
VK_KHR_shader_subgroup_uniform_control_flow
VK_KHR_shader_terminate_invocation
VK_KHR_shared_presentable_image
VK_KHR_spirv_1_4
VK_KHR_storage_buffer_storage_class
VK_KHR_surface
VK_KHR_surface_protected_capabilities
VK_KHR_swapchain
VK_KHR_swapchain_mutable_format
VK_KHR_synchronization2
VK_KHR_timeline_semaphore
VK_KHR_uniform_buffer_standard_layout
VK_KHR_variable_pointers
VK_KHR_vertex_attribute_divisor
VK_KHR_video_decode_av1
VK_KHR_video_decode_h264
VK_KHR_video_decode_h265
VK_KHR_video_decode_queue
VK_KHR_video_encode_av1
VK_KHR_video_encode_h264
VK_KHR_video_encode_h265
VK_KHR_video_encode_queue
VK_KHR_video_maintenance1
VK_KHR_video_maintenance2
VK_KHR_video_queue
VK_KHR_vulkan_memory_model
VK_KHR_wayland_surface
VK_KHR_win32_keyed_mutex
VK_KHR_win32_surface
VK_KHR_workgroup_memory_explicit_layout
VK_KHR_xcb_surface
VK_KHR_xlib_surface
VK_KHR_zero_initialize_workgroup_memory
VK_NVX_binary_import
VK_NVX_image_view_handle
VK_NVX_multiview_per_view_attributes
VK_NV_acquire_winrt_display
VK_NV_clip_space_w_scaling
VK_NV_cluster_acceleration_structure
VK_NV_compute_shader_derivatives
VK_NV_cooperative_matrix
VK_NV_cooperative_vector
VK_NV_copy_memory_indirect
VK_NV_corner_sampled_image
VK_NV_coverage_reduction_mode
VK_NV_dedicated_allocation
VK_NV_dedicated_allocation_image_aliasing
VK_NV_descriptor_pool_overallocation
VK_NV_device_diagnostic_checkpoints
VK_NV_device_diagnostics_config
VK_NV_device_generated_commands
VK_NV_device_generated_commands_compute
VK_NV_extended_sparse_address_space
VK_NV_external_compute_queue
VK_NV_external_memory
VK_NV_external_memory_capabilities
VK_NV_external_memory_win32
VK_NV_fill_rectangle
VK_NV_fragment_coverage_to_color
VK_NV_fragment_shader_barycentric
VK_NV_fragment_shading_rate_enums
VK_NV_framebuffer_mixed_samples
VK_NV_geometry_shader_passthrough
VK_NV_glsl_shader
VK_NV_inherited_viewport_scissor
VK_NV_linear_color_attachment
VK_NV_low_latency
VK_NV_low_latency2
VK_NV_memory_decompression
VK_NV_mesh_shader
VK_NV_optical_flow
VK_NV_partitioned_acceleration_structure
VK_NV_present_barrier
VK_NV_raw_access_chains
VK_NV_ray_tracing
VK_NV_ray_tracing_invocation_reorder
VK_NV_ray_tracing_linear_swept_spheres
VK_NV_ray_tracing_motion_blur
VK_NV_ray_tracing_validation
VK_NV_representative_fragment_test
VK_NV_sample_mask_override_coverage
VK_NV_scissor_exclusive
VK_NV_shader_atomic_float16_vector
VK_NV_shader_image_footprint
VK_NV_shader_sm_builtins
VK_NV_shader_subgroup_partitioned
VK_NV_shading_rate_image
VK_NV_viewport_array2
VK_NV_viewport_swizzle
The following extensions to Vulkan 1.4.323.0 are not currently supported. If your application uses these extensions, please send a feedback feature request to let the Nsight Graphics team know about your interest and needs.
VK_EXT_acquire_drm_display
VK_EXT_application_parameters
VK_EXT_depth_clamp_control
VK_EXT_directfb_surface
VK_EXT_display_control
VK_EXT_external_memory_acquire_unmodified
VK_EXT_external_memory_dma_buf
VK_EXT_external_memory_metal
VK_EXT_fragment_density_map2
VK_EXT_fragment_density_map_offset
VK_EXT_global_priority_query
VK_EXT_image_drm_format_modifier
VK_EXT_layer_settings
VK_EXT_metal_objects
VK_EXT_metal_surface
VK_EXT_pipeline_properties
VK_EXT_shader_float8
VK_EXT_vertex_attribute_robustness
VK_EXT_zero_initialize_device_memory
VK_KHR_maintenance9
VK_KHR_object_refresh
VK_KHR_performance_query
VK_KHR_pipeline_binary
VK_KHR_portability_subset
VK_KHR_present_id2
VK_KHR_present_mode_fifo_latest_ready
VK_KHR_present_wait2
VK_KHR_robustness2
VK_KHR_shader_bfloat16
VK_KHR_surface_maintenance1
VK_KHR_swapchain_maintenance1
VK_KHR_unified_image_layouts
VK_KHR_video_decode_vp9
VK_KHR_video_encode_intra_refresh
VK_KHR_video_encode_quantization_map
VK_NV_command_buffer_inheritance
VK_NV_cooperative_matrix2
VK_NV_cuda_kernel_launch
VK_NV_displacement_micromap
VK_NV_display_stereo
VK_NV_external_memory_rdma
VK_NV_external_memory_sci_buf
VK_NV_external_sci_sync
VK_NV_external_sci_sync2
VK_NV_per_stage_descriptor_set
VK_NV_present_metering
VK_NV_private_vendor_info
VK_NV_win32_keyed_mutex
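Before capturing, it can help to compare the extensions your application enables against the unsupported list above. The sketch below is illustrative only: `UNSUPPORTED_EXCERPT` holds a handful of names copied from that list (a real check would include all of them), and `enabled_extensions` is a hypothetical list of the names your application passes at instance and device creation.

```python
# A short excerpt of the unsupported list above, for illustration only.
UNSUPPORTED_EXCERPT = frozenset({
    "VK_EXT_image_drm_format_modifier",
    "VK_KHR_performance_query",
    "VK_KHR_pipeline_binary",
    "VK_KHR_portability_subset",
    "VK_NV_cooperative_matrix2",
})


def unsupported_in_use(enabled_extensions, unsupported=UNSUPPORTED_EXCERPT):
    """Return the sorted names that are both enabled and unsupported."""
    return sorted(set(enabled_extensions) & unsupported)


if __name__ == "__main__":
    enabled_extensions = [
        "VK_KHR_swapchain",
        "VK_KHR_pipeline_binary",
        "VK_KHR_performance_query",
    ]
    for name in unsupported_in_use(enabled_extensions):
        print(f"Not supported for frame debugging: {name}")
```

An empty result from the excerpt does not prove compatibility. It only means that none of the sampled names are in use.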
Supported NVAPI Functions
Nsight Graphics’ Frame Debugger supports a large set of NVAPI functions. The supported functions are:
NvAPI_GetErrorMessage
NvAPI_GetInterfaceVersionString
NvAPI_D3D11_AliasMSAATexture2DAsNonMSAA
NvAPI_D3D11_BeginUAVOverlap
NvAPI_D3D11_BeginUAVOverlapEx
NvAPI_D3D11_CreateCubinComputeShader
NvAPI_D3D11_CreateCubinComputeShaderWithName
NvAPI_D3D11_CreateDevice
NvAPI_D3D11_CreateDeviceAndSwapChain
NvAPI_D3D11_CreateDomainShaderEx
NvAPI_D3D11_CreateFastGeometryShader
NvAPI_D3D11_CreateFastGeometryShaderExplicit
NvAPI_D3D11_CreateGeometryShaderEx_2
NvAPI_D3D11_CreateHullShaderEx
NvAPI_D3D11_CreateMetaCommand
NvAPI_D3D11_CreatePixelShaderEx_2
NvAPI_D3D11_CreateRasterizerState
NvAPI_D3D11_CreateSamplerState
NvAPI_D3D11_CreateShadingRateResourceView
NvAPI_D3D11_CreateVertexShaderEx
NvAPI_D3D11_DecompressView
NvAPI_D3D11_EndUAVOverlap
NvAPI_D3D11_EnumerateMetaCommands
NvAPI_D3D11_ExecuteMetaCommand
NvAPI_D3D11_GetResourceHandle
NvAPI_D3D11_InitializeMetaCommand
NvAPI_D3D11_IsFatbinPTXSupported
NvAPI_D3D11_IsNvShaderExtnOpCodeSupported
NvAPI_D3D11_MultiDrawIndexedInstancedIndirect
NvAPI_D3D11_MultiDrawInstancedIndirect
NvAPI_D3D11_MultiGPU_GetCaps
NvAPI_D3D11_MultiGPU_Init
NvAPI_D3D11_RSGetPixelShadingRateSampleOrder
NvAPI_D3D11_RSSetExclusiveScissorRects
NvAPI_D3D11_RSSetPixelShadingRateSampleOrder
NvAPI_D3D11_RSSetShadingRateResourceView
NvAPI_D3D11_RSSetViewportsPixelShadingRates
NvAPI_D3D11_SetDepthBoundsTest
NvAPI_D3D11_SetNvShaderExtnSlot
NvAPI_D3D11_SetNvShaderExtnSlotLocalThread
NvAPI_D3D12_BuildRaytracingAccelerationStructureEx
NvAPI_D3D12_BuildRaytracingOpacityMicromapArray
NvAPI_D3D12_CopyTileMappings
NvAPI_D3D12_CreateCommittedResource
NvAPI_D3D12_CreateComputePipelineState
NvAPI_D3D12_CreateCubinComputeShader
NvAPI_D3D12_CreateCubinComputeShaderWithName
NvAPI_D3D12_CreateDDisplayPresentBarrierClient
NvAPI_D3D12_CreateGraphicsPipelineState
NvAPI_D3D12_CreateHeap
NvAPI_D3D12_CreateHeap2
NvAPI_D3D12_CreateMetaCommand
NvAPI_D3D12_CreatePresentBarrierClient
NvAPI_D3D12_CreateReservedResource
NvAPI_D3D12_EmitRaytracingOpacityMicromapArrayPostbuildInfo
NvAPI_D3D12_EnumerateMetaCommands
NvAPI_D3D12_ExecuteMetaCommand
NvAPI_D3D12_GetGraphicsCapabilities
NvAPI_D3D12_GetNeedsAppFPBlendClamping
NvAPI_D3D12_GetRaytracingCaps
NvAPI_D3D12_InitializeMetaCommand
NvAPI_D3D12_IsFatbinPTXSupported
NvAPI_D3D12_IsNvShaderExtnOpCodeSupported
NvAPI_D3D12_NotifyOutOfBandCommandQueue
NvAPI_D3D12_QueryCpuVisibleVidmem
NvAPI_D3D12_QueryModifiedWSupport
NvAPI_D3D12_QueryPresentBarrierSupport
NvAPI_D3D12_QuerySinglePassStereoSupport
NvAPI_D3D12_RegisterPresentBarrierResources
NvAPI_D3D12_ReservedResourceGetDesc
NvAPI_D3D12_ResourceAliasingBarrier
NvAPI_D3D12_SetAsyncFrameMarker
NvAPI_D3D12_SetCreatePipelineStateOptions
NvAPI_D3D12_SetDepthBoundsTestValues
NvAPI_D3D12_SetModifiedWMode
NvAPI_D3D12_SetNvShaderExtnSlotSpace
NvAPI_D3D12_SetNvShaderExtnSlotSpaceLocalThread
NvAPI_D3D12_SetSinglePassStereoMode
NvAPI_D3D12_UpdateTileMappings
NvAPI_D3D1x_BindSwapBarrier
NvAPI_D3D1x_DisableShaderDiskCache
NvAPI_D3D1x_GetGraphicsCapabilities
NvAPI_D3D1x_JoinSwapGroup
NvAPI_D3D1x_Present
NvAPI_D3D1x_QueryFrameCount
NvAPI_D3D1x_QueryMaxSwapGroup
NvAPI_D3D1x_QuerySwapGroup
NvAPI_D3D1x_ResetFrameCount
NvAPI_D3D_BeginResourceRendering
NvAPI_D3D_ConfigureAnsel
NvAPI_D3D_EndResourceRendering
NvAPI_D3D_GetCurrentSLIState
NvAPI_D3D_GetLatency
NvAPI_D3D_GetObjectHandleForResource
NvAPI_D3D_GetSleepStatus
NvAPI_D3D_ImplicitSLIControl
NvAPI_D3D_InitializeSMPAssist
NvAPI_D3D_IsGSyncActive
NvAPI_D3D_IsGSyncCapable
NvAPI_D3D_QueryModifiedWSupport
NvAPI_D3D_QueryMultiViewSupport
NvAPI_D3D_QuerySMPAssistSupport
NvAPI_D3D_QuerySinglePassStereoSupport
NvAPI_D3D_RegisterDevice
NvAPI_D3D_SetFPSIndicatorState
NvAPI_D3D_SetLatencyMarker
NvAPI_D3D_SetModifiedWMode
NvAPI_D3D_SetMultiViewMode
NvAPI_D3D_SetReflexSync
NvAPI_D3D_SetResourceHint
NvAPI_D3D_SetSinglePassStereoMode
NvAPI_D3D_SetSleepMode
NvAPI_D3D_SetVerticalSyncMode
NvAPI_D3D_Sleep
NvAPI_DestroyPresentBarrierClient
NvAPI_JoinPresentBarrier
NvAPI_LeavePresentBarrier
NvAPI_QueryPresentBarrierFrameStatistics
NvAPI_DISP_AcquireDedicatedDisplay
NvAPI_DISP_GetNvManagedDedicatedDisplays
NvAPI_DISP_ReleaseDedicatedDisplay
NvAPI_OGL_ExpertModeDefaultsGet
NvAPI_OGL_ExpertModeDefaultsSet
NvAPI_OGL_ExpertModeGet
NvAPI_OGL_ExpertModeSet
Unsupported Captures
Nsight Graphics maintains a list of the unsupported functions or operations that are used by the application. If an unsupported operation is encountered, an unsupported capture is reported. This unsupported capture guards against crashes or incorrect results coming from known limitations.
However, in some cases these unsupported operations might not impact any analysis that follows. Accordingly, after warning about the risks of an unsupported capture, Nsight Graphics offers the opportunity to proceed despite this warning. If you proceed, Nsight Graphics continues into capture on a best-effort basis.
If you determine that this unsupported operation is innocuous, and you wish to turn it off completely, you may suppress this warning via the Ignore Incompatibilities option. Note that this prevents you from being notified of future incompatibilities, so please use with caution.
Update Notification
Nsight Graphics can check for a new version and notify the user of any updates. Two options control this feature; both are found in the Environment tab of the Tools > Options view.
By default, Nsight Graphics checks for updates every time the app is started. This can be changed by selecting “No” for the “Check for updates at startup” option. With this option disabled, Nsight Graphics still checks for updates every 3 days.
Update notifications can be completely disabled by setting the “Show version update notifications” value to “No”.
If the automatic checking feature is disabled, the user can still check for updates by selecting the Help > Check for updates… menu option.
Microsoft Visual Studio Integration
NVIDIA Nsight Integration is a Visual Studio extension that allows you to access the power of Nsight Graphics from within Visual Studio.
When Nsight Graphics is installed along with NVIDIA Nsight Integration, Nsight Graphics activities appear under the Nsight menu in the Visual Studio menu bar. These activities launch Nsight Graphics with the current project settings and executable, allowing you to reuse all of the settings without manually copying any setting over. You can even set keybindings to launch sessions with a specified Nsight Graphics activity. When you use multiple Nsight tools, such as Nsight Systems or Nsight Compute, you see independent commands for each of them, greatly simplifying your workflow.
For more information about using Nsight Graphics from within Visual Studio, please visit:
 
 
 
 
 
