How To Set Up and Inspect GPU Crash Dumps#
This section describes how to use the NVIDIA Nsight Aftermath Monitor to generate GPU crash dumps for applications using the Direct3D 12 or Vulkan API, and how to open and inspect those GPU crash dumps with the crash dump inspector plug-in in Nsight Graphics.
Alternatively, developers can also add GPU crash dump collection support into their graphics application using the NVIDIA Nsight Aftermath SDK.
Workflow#
The general workflow for working with Nsight Aftermath GPU crash dumps is to:
Run the NVIDIA Nsight Aftermath GPU Crash Dump Monitor.
Configure the desired GPU crash dump features.
Optional: to collect additional information via event markers, instrument the graphics application using the Nsight Aftermath SDK.
Run the graphics application for which to capture GPU crash dumps and reproduce the GPU crash or hang, allowing the monitor to collect the GPU crash dump.
Open the GPU crash dump in Nsight Graphics.
Configure GPU Crash Dump Inspector settings.
Inspect the crash dump data using the Nsight Graphics crash dump inspector.
See the sections below for details on each step of this process.
The GPU Crash Dump Monitor#
The NVIDIA Nsight Aftermath Crash Dump Monitor provides the means to capture GPU crash dump files for GPU crashes or GPU hangs, and to modify the driver configuration settings related to crash dump generation.
Running the GPU Crash Dump Monitor#
The NVIDIA Nsight Aftermath Monitor (nv-aftermath-monitor.exe) is installed in the Nsight Graphics host directory. Typically this is:
<install directory>\host\windows-desktop-nomad-x64
By default, the crash dump monitor application starts in the background. Its user interface is accessible through the NVIDIA Nsight Aftermath Monitor icon in the Microsoft Windows system notification area (system tray).
Configuring the GPU Crash Dump Monitor#
All configuration options related to GPU crash dump creation are available through the GPU Crash Dump Monitor Settings dialog.
Set up the directory where crash dump files are stored.
Set up the directory where shader debug information files are stored.
Enable Aftermath GPU Crash Dump collection. Either set Aftermath mode to Global to enable crash dumps for all applications using the D3D12 or Vulkan API, or selectively enable it for one or more applications by managing an application Whitelist.
Enable the desired Aftermath graphics driver features:
Generate Shader Debug Information to generate shader debug information (line tables for mapping from the shader IL passed to the NVIDIA graphics driver to the shader microcode executed by the GPU) for all shaders loaded by the application.
The GPU Crash Dump Monitor stores the debug information in files with the .nvdbg extension in the Debug Info Dump Directory configured in the General Settings tab of the GPU Crash Dump Monitor Settings.
The shader debug information is required for mapping shader microcode instructions of active or faulted shader warps to shader IL or shader source lines. Shader debug information is identified by a unique shader debug information identifier embedded into the crash dump file.
See also the section about Source Shader Debug Information for details on how to compile shader source with source-level debug information.
Enable Resource Tracking to enable additional driver-side tracking of live and recently destroyed resources.
This allows Nsight Aftermath to identify resources related to GPU virtual addresses seen in the case of a crash due to a GPU page fault. The resource information being tracked includes details about the size of the resource, its format, and the current deletion status of the resource object. D3D12 developers may also consider instrumenting the application using the GFSDK_Aftermath_DX12_RegisterResource function to register the D3D12 resources the application creates. That allows Nsight Aftermath to track additional information, such as the resource debug names set by the application. For Vulkan applications, the resource debug names set via vkSetDebugUtilsObjectNameEXT are captured too. For more details on how to instrument an application with D3D12 resource tracking, see the Nsight Aftermath SDK documentation.
Enable Call Stack Capturing to enable automatic generation of Aftermath event markers for tracking the origin of all draw calls, compute and ray tracing dispatches, ray tracing acceleration structure build operations, or resource copies initiated by the application.
The automatic event markers are added into the command stream right after the corresponding commands with the CPU call stacks of the functions recording the commands as the data payloads.
Note
Enabling this feature causes considerable driver overhead for gathering the necessary information.
Note
When this feature is enabled, the GPU crash dump file may contain the file path for the crashing application’s executable as well as the file paths for all DLLs or DSOs it has loaded.
Enable Shader Error Reporting to enable a special mode that allows the GPU to report additional runtime shader errors. This may provide additional information when debugging hangs, crashes, or unexpected behavior related to shader execution.
Enabling this feature may result in additional crash dumps reporting issues in shaders that exhibit undefined behavior or have hidden bugs, which so far went unnoticed because by default the hardware silently ignores them. The additional error checks that are enabled when using this option will cause GPU exceptions for the following situations:
Accessing memory using misaligned addresses, such as reading or writing a byte address that is not a multiple of the access size.
Accessing memory out-of-bounds, such as reading or writing beyond the declared bounds of (group) shared or thread local memory or reading from an out-of-bounds constant buffer address.
Accessing a texture with incompatible format or memory layout.
Hitting call stack limits.
Note
This feature is only supported with NVIDIA graphics driver R515 or later.
Note
On Windows, modifying the Nsight Aftermath graphics driver settings requires Windows Administrator privileges. Therefore, when any of these settings are modified and applied, a User Account Control confirmation window may pop up asking for permission to modify system settings.
Enable the desired Nsight Aftermath system-wide features:
Enable SM Register Data Collection to collect SM register values when faults happen inside SMs. This can provide additional information when debugging GPU crashes related to shader execution.
Since this is a system-wide setting, modifying it might also affect other tools, such as the Nsight VSE CUDA debugger, and may result in unexpected behavior. On Linux, this feature is always enabled and is not incompatible with other tools.
Note
This feature is only supported for the D3D12 and Vulkan APIs with NVIDIA graphics driver R535 or later and requires Nsight Graphics Pro to visualize the data. Starting with the R550 driver series, the SM register data collection feature is enabled by default and the setting is no longer available.
Note
On Windows, modifying the Nsight Aftermath system settings requires Windows Administrator privileges. Therefore, when any of these settings are modified and applied, a User Account Control confirmation window may pop up asking for permission to modify system settings.
The GPU Crash Dump Inspector#
The NVIDIA Nsight Aftermath Crash Dump Inspector provides the means to open, inspect, and analyze GPU crash dump files created by the NVIDIA Nsight Aftermath Monitor or the Nsight Aftermath SDK.
Loading GPU Crash Dump Files#
GPU crash dump files use the .nv-gpudmp file extension and can be loaded through File > Open File... This will bring up a GPU Crash Dump Inspector window displaying the crash dump file’s content.
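When the monitor has written several dumps, it can help to find the most recent one before opening it through File > Open File. The following is a minimal Python sketch and not part of Nsight Graphics: the function name `newest_crash_dumps` and the directory passed to it are hypothetical, and the only fact taken from this section is the .nv-gpudmp file extension.

```python
from pathlib import Path


def newest_crash_dumps(dump_dir, limit=5):
    """Return up to `limit` .nv-gpudmp files in dump_dir, newest first."""
    dumps = [
        p for p in Path(dump_dir).iterdir()
        if p.is_file() and p.name.lower().endswith(".nv-gpudmp")
    ]
    # Sort by modification time so the most recent crash comes first.
    dumps.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return dumps[:limit]
```

The directory to pass is whichever crash dump directory was configured in the GPU Crash Dump Monitor Settings.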
Configuring the GPU Crash Dump Inspector#
In order to use all functionality provided by the GPU crash dump inspector, the following configuration settings should be made in the Search Paths Settings.
Add the directories where binary shader files (DXIL or SPIR-V shader files) are stored to Shader Binaries. If the binary shaders cannot be found, the Shader View is not able to display intermediate shader assembly code or shader source code.
For more information on how to generate these files, see Source Shader Debug Information.
Add the directories where the separate shader debug information files (.lld or .pdb files generated by dxc.exe, for instance) are stored to Separate Shader Debug Information. If the shader debug information cannot be found, the Shader View is not able to map GPU PC addresses of active or faulted warps to intermediate shader assembly or shader source code locations.
For more information on how to generate these files, see Source Shader Debug Information.
Add the directories where the NVIDIA shader debug information files generated by the GPU crash dump monitor are stored to NVIDIA Shader Debug Information. If the NVIDIA shader debug information cannot be found, the Shader View is not able to map GPU PC addresses of active or faulted warps to intermediate shader assembly or shader source code locations.
Optionally, add the directories where shader source files are stored to Shader Source. Usually, this is not required as the shader debug information already includes the shader source. Only if a shader was compiled from source that contains references to other source files, for example via #line directives, may it be necessary to specify additional source directories so that the Shader View can find the correct shader source.
Add the directories containing the symbol files for the graphics application for which the GPU crash dump was captured to Application Debug Information. This allows the Aftermath Marker Call Stack View to resolve addresses to functions and source locations.
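Before entering the search paths in the inspector, it may be worth checking that the directories you plan to use actually exist. The sketch below is one way to do that in Python. It is an illustration only: Nsight Graphics is configured through its UI, and both the `planned` input format and the function name `missing_search_dirs` are hypothetical. Only the five setting names come from this section.

```python
from pathlib import Path

# Setting names as listed in the Search Paths Settings described above.
SEARCH_PATH_SETTINGS = (
    "Shader Binaries",
    "Separate Shader Debug Information",
    "NVIDIA Shader Debug Information",
    "Shader Source",
    "Application Debug Information",
)


def missing_search_dirs(planned):
    """Return {setting: [directories that do not exist]}.

    `planned` maps a setting name to a list of directory paths
    (a hypothetical input format chosen for this sketch).
    """
    unknown = set(planned) - set(SEARCH_PATH_SETTINGS)
    if unknown:
        raise KeyError(f"unknown setting(s): {sorted(unknown)}")
    report = {}
    for setting, dirs in planned.items():
        missing = [d for d in dirs if not Path(d).is_dir()]
        if missing:
            report[setting] = missing
    return report
```

An empty result means every planned directory exists. It does not confirm that the directory contains matching shader or debug information files.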
Inspecting GPU Crash Dump Files#
Use the GPU Crash Dump Inspector to analyze crash reasons. This is not an exhaustive tutorial on how to analyze GPU crash dumps, because every crash or hang is different, but it should provide some hints to get started.
After loading a crash dump file, it is usually a good start to check the Exception Summary on the Dump Info tab. This shows a high-level fault reason, e.g., whether the graphics device was hung or an error like a page fault has occurred. If there was a page fault or shader fault, this section contains an analysis that mentions potential causes and provides links to relevant information in the Crash Info tab and Shader View.
In case of a hang, it makes sense to check if there is an Active Warps section on the Dump Info tab showing shader activity. This could point toward an issue with very long-running shader warps or shader warps being stuck in an infinite loop. In that case, the Shader View may help to root cause the problem.
If the device state indicates there was a memory fault, the next step would be to look for a Page Fault section on the Dump Info tab. This may help to pinpoint problems with out-of-bounds resource access or accessing an already deleted resource.
If the application was instrumented with Aftermath Event markers, an Aftermath Markers section should be available on the Dump Info tab. This may help to pinpoint the draw or dispatch call that caused problems.
If Call Stack Capturing was enabled when capturing the GPU crash dump, Call Stack links should be available in the Aftermath Markers section, pointing to the draw, dispatch, or copy call that may be related to the problem.
Last, the GPU State section on the Dump Info tab may provide some hints about which parts of the graphics pipeline were active or have faulted when the crash occurred.
Instrumenting Applications with the Aftermath API#
The NVIDIA Nsight Aftermath SDK provides the Aftermath API that can be used by developers to instrument their applications. The latest version can be downloaded from https://developer.nvidia.com/nsight-aftermath.
By default, the latest version of the SDK package available at the time of an Nsight Graphics release is installed together with Nsight Graphics in:
<install directory>\SDKs\NsightAftermathSDK
Detailed information about the functionality provided by the library and how to use it in an application can be found in the Readme.md that comes with the SDK package and the header files.
Nsight Aftermath Event Markers#
In D3D applications, the Aftermath event marker API (GFSDK_Aftermath_SetEventMarker) can be used to inject event markers with user-defined data directly into the graphics command stream. If the application is instrumented with event markers, information about the last event markers that were processed by the GPU for each command stream is captured into the GPU crash dump, including the user-provided event data. More information about Aftermath event markers and how to instrument an application to use them can be found in the Nsight Aftermath SDK documentation.
Note
Using event markers should be considered carefully. Injecting markers in high-frequency code paths can introduce high CPU overhead. Therefore, on some driver versions, the event marker feature is only available if the Nsight Aftermath GPU Crash Dump Monitor is running on the system. This requirement applies to R495 to R530 drivers for DX12 and R495+ drivers for DX11. No Aftermath configuration needs to be made in the Monitor. It serves only as a dongle to ensure Aftermath event markers do not impact application performance on end user systems.
Similar functionality is available for Vulkan applications with the VK_NV_device_diagnostic_checkpoints extension.
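The driver ranges in the note above can be captured in a small helper, for example to decide whether a test machine needs the monitor running before D3D event markers will appear in a dump. This is a sketch under stated assumptions: the function name is hypothetical, driver branches are represented as plain integers (525 for R525), the range "R495 to R530" is read as inclusive, and drivers older than R495 are treated as outside the requirement because the note lists no rule for them.

```python
def monitor_required_for_markers(api, driver_branch):
    """Return True if, per the note above, the monitor must be running.

    api: "DX11" or "DX12".
    driver_branch: integer release branch, e.g. 525 for R525 (assumed format).
    """
    if api == "DX12":
        # R495 to R530, assumed inclusive on both ends.
        return 495 <= driver_branch <= 530
    if api == "DX11":
        # R495 and later.
        return driver_branch >= 495
    raise ValueError(f"unsupported API: {api!r}")
```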
Source Shader Debug Information#
For mapping shader instruction addresses for active or faulted shader warps to high-level shader source, shaders need to be compiled with debug information. Since shader compilation is a two-step process — compilation from shader source, such as HLSL or GLSL, to an intermediate shader language representation, such as DXIL or SPIR-V, and graphics driver-level compilation of the shader IL to the actual microcode executed by the GPU — there are two levels of debug information required to accomplish such a mapping. This section describes how to compile shader source code with debug information suitable for consumption by the Aftermath GPU Crash Dump Inspector using the Microsoft DirectX Shader Compiler or the Vulkan SDK toolchain for shader compilation.
The generation of shader debug information for the microcode level needs to be enabled either through the Nsight Aftermath GPU Crash Dump Monitor settings or Aftermath feature flags when using the Nsight Aftermath SDK. For more information, see the Nsight Aftermath SDK documentation.
To enable shader instruction mapping when analyzing crash dumps in Nsight Graphics, the debug information must be made available by setting the Search Path Settings as described in Configuring the GPU Crash Dump Inspector.
For D3D12, the following variants of compiling shaders with debug information using the Microsoft DirectX Shader Compiler (dxc.exe) are supported by Nsight Aftermath:
Compile and use full shader blobs: Compile the shaders with debug information. Use the full (i.e., not stripped) shader binaries when running the application and make them accessible to Nsight Graphics when inspecting GPU crash dumps by adding the disk location where the compilation results are stored to the Shader Binaries search paths.
An example command line may look like this:
dxc -Zi [..] -Fo shader.bin shader.hlsl
Compile and strip: Compile the shaders with debug information, then strip off the debug information. Use the stripped shader binaries when running the application and make both stripped and not stripped files accessible to Nsight Graphics when inspecting GPU crash dumps. Add the disk location of the stripped files to the Shader Binaries search path and add the disk location of the not stripped files to the Separate Shader Debug Information search paths.
An example command line may look like this:
dxc -Zi [..] -Fo full_shader.bin shader.hlsl
dxc -dumpbin -Qstrip_debug -Fo shader.bin full_shader.bin
Compile with separate debug information: Compile the shaders with debug information and instruct the compiler to store the debug meta data in separate shader debug information files (shader PDB files). Make both the shader binaries and the shader debug information files accessible to Nsight Graphics when inspecting GPU crash dumps. Add the disk location of the shader binaries to the Shader Binaries search path and add the disk location of the shader debug information files to the Separate Shader Debug Information search paths.
An example command line may look like this:
dxc -Zi [..] -Fo shader.bin -Fd debugInfo\ shader.hlsl
If the application compiles shaders on-the-fly, it needs to store the shader binary blobs to disk in a similar fashion so that they are accessible to Nsight Graphics when inspecting GPU crash dumps.
Note
No IL-level or source-level shader mapping is supported for DX bytecode shaders generated by the legacy Microsoft DirectX fxc.exe shader compiler.
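In a build script, the three dxc variants described above could be generated from one helper. The Python sketch below only assembles argument lists from the example command lines in this section and does not run the compiler. The function name `dxc_commands` and the variant labels are hypothetical, and `extra_args` stands in for the elided "[..]" options, which this sketch does not try to guess.

```python
def dxc_commands(variant, source="shader.hlsl", extra_args=()):
    """Return the dxc argument lists for one of the three variants above.

    variant: "full", "strip", or "separate" (labels chosen for this sketch).
    extra_args: whatever options replace "[..]" in your own build.
    """
    extra = list(extra_args)
    if variant == "full":
        # Full shader blob with embedded debug information.
        return [["dxc", "-Zi", *extra, "-Fo", "shader.bin", source]]
    if variant == "strip":
        # Compile with debug information, then strip it in a second step.
        return [
            ["dxc", "-Zi", *extra, "-Fo", "full_shader.bin", source],
            ["dxc", "-dumpbin", "-Qstrip_debug", "-Fo", "shader.bin",
             "full_shader.bin"],
        ]
    if variant == "separate":
        # Debug information written to a separate directory.
        return [["dxc", "-Zi", *extra, "-Fo", "shader.bin",
                 "-Fd", "debugInfo\\", source]]
    raise ValueError(f"unknown variant: {variant!r}")
```

Each returned list can be handed to `subprocess.run(cmd, check=True)` on a machine where dxc is on the PATH. The output files then go to the Shader Binaries or Separate Shader Debug Information search paths as described for each variant.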
For Vulkan, the following variants of generating SPIR-V shader code with debug information are supported by Aftermath:
Compilation using the glslangValidator tool of the Vulkan SDK’s shader compilation toolchain. An example command line may look like this:
glslangValidator -V -g -o shader.spv shader.vert
Compilation using the Microsoft DirectX Shader Compiler. An example command line may look like this:
dxc -spirv -Zi [..] -Fo shader.spv shader.hlsl
Use the full (i.e., not stripped) SPIR-V shader binaries when running the application and make them accessible to Nsight Graphics when inspecting GPU crash dumps by adding the disk location where they are stored to the Shader Binaries search paths.
Note
No source-level shader mapping is supported for pairs of stripped and not stripped SPIR-V files. Users interested in shader source mapping for applications shipping with stripped SPIR-V shaders may use the GPU crash dump decoding functionality provided by the Nsight Aftermath SDK and implement their own crash dump decoding tool.
Nsight Aftermath Shader Hashes#
There is no naming convention for shader files and developers can freely decide what file name and extension they use to store their DirectX shader binaries, separately stored “pdb” files, or SPIR-V shader files. Furthermore, the graphics drivers have no knowledge about those files. Therefore, Nsight Aftermath uses shader code hashes to identify the shader binaries loaded by the application.
When searching for the necessary information for showing DXIL/SPIR-V instructions or for source mapping information, the shader binaries found in the configured Shader Binaries search paths are compared against those hashes.
For developers who want to calculate the hashes for their files, the Nsight Aftermath SDK provides two APIs:
For D3D12/DXIL shaders, use the GFSDK_Aftermath_GetShaderHash function.
For Vulkan/SPIR-V shaders, use the GFSDK_Aftermath_GetShaderHashSpirv function.
Both functions and additional information can be found in the GFSDK_Aftermath_GpuCrashDumpDecoding.h header file or the Nsight Aftermath SDK documentation.
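The lookup described above amounts to hashing every candidate file in the search paths and comparing the result with the hash recorded in the crash dump. The Python sketch below illustrates only that lookup structure. It uses SHA-256 as a stand-in and is NOT the Aftermath shader hash; a real tool would obtain hashes through GFSDK_Aftermath_GetShaderHash or GFSDK_Aftermath_GetShaderHashSpirv. The function names and the recursive directory walk are assumptions of this sketch.

```python
import hashlib
from pathlib import Path


def stand_in_hash(data):
    # Placeholder only: this is NOT the Aftermath shader hash.
    return hashlib.sha256(data).hexdigest()


def index_shader_binaries(search_paths, hash_fn=stand_in_hash):
    """Map hash -> file path for every file under the given directories."""
    index = {}
    for root in search_paths:
        # Recursive walk is an assumption; sorted for a stable result.
        for path in sorted(Path(root).rglob("*")):
            if path.is_file():
                # Keep the first file found for a given hash.
                index.setdefault(hash_fn(path.read_bytes()), path)
    return index
```

Because files are matched by content hash rather than by name, the lookup is independent of whatever naming convention the shader files follow.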
Nsight Aftermath Shader Debug Information Identifiers#
Nsight Aftermath uses unique identifiers to identify the low-level shader debug information the NVIDIA D3D12 or Vulkan driver generates for mapping shader microcode instructions to DXIL or SPIR-V instructions.
When searching for the necessary information for mapping microcode instructions to DXIL or SPIR-V instructions, the debug information files found in the configured NVIDIA Shader Debug Information search paths are compared against the shader debug information identifier.
The Nsight Aftermath GPU Crash Dump Monitor uses the shader debug information identifier to generate a unique base name and the .nvdbg extension for the debug information files it creates, e.g., A9B36BBAFFD79B51-000001BB689D5060-*.nvdbg. Developers using the Nsight Aftermath SDK can freely choose the naming convention for the files being created for the NVIDIA debug information retrieved by the application via the GFSDK_Aftermath_ShaderDebugInfoCb callback. However, you are encouraged to also include the shader debug information identifier in the file name convention you use. This may help to understand why debug information may not be found with the current search path settings.
The Nsight Aftermath SDK provides the GFSDK_Aftermath_GetShaderDebugInfoIdentifier API that can be used to calculate the shader debug information identifier for a memory buffer containing shader debug information data.
This function and additional information can be found in the GFSDK_Aftermath_GpuCrashDumpDecoding.h header file or the Nsight Aftermath SDK documentation.
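When debug information is not found, a quick check is whether any file in the search path carries the expected identifier in its name. The Python sketch below assumes the monitor's file names follow the pattern suggested by the single example above, that is, two groups of 16 hexadecimal digits joined by a hyphen, followed by a hyphen and a remainder, with the .nvdbg extension. That pattern is inferred from the example and is not a documented format; the function names are hypothetical.

```python
import re
from pathlib import Path

# Assumed layout, inferred from the example file name above:
# <16 hex digits>-<16 hex digits>-<anything>.nvdbg
_NVDBG_NAME = re.compile(r"^([0-9A-Fa-f]{16}-[0-9A-Fa-f]{16})-.*\.nvdbg$")


def debug_info_identifier(file_name):
    """Return the identifier prefix of a monitor-style .nvdbg name, or None."""
    match = _NVDBG_NAME.match(file_name)
    return match.group(1).upper() if match else None


def find_debug_info(search_dir, identifier):
    """List .nvdbg files in search_dir whose name carries the identifier."""
    wanted = identifier.upper()
    return sorted(
        p for p in Path(search_dir).glob("*.nvdbg")
        if debug_info_identifier(p.name) == wanted
    )
```

Files written by an application through the SDK callback only match if the application follows the same naming scheme, which is one reason to include the identifier in the file name.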