Nsight Graphics

The user guide for NVIDIA Nsight Graphics.

Introduction to NVIDIA Nsight Graphics

Nsight Graphics™ is a standalone application for the debugging, profiling, and analysis of graphics applications. Nsight Graphics supports applications built with DirectCompute, Direct3D (11, 12), OpenGL, Vulkan, Oculus SDK, and OpenVR.

This documentation is divided into sections that help you get started with the software, understand its activities, and look up details of the user interface.

  • Getting Started - Offers a brief introduction on how to use the tools.
  • Activities - Nsight Graphics supports multiple activities, each targeting the kind of analysis you need at a particular point in your work. This section documents each of these activities in detail.
  • User Interface Reference - Provides a deep view of all of the user interface elements and views that Nsight Graphics offers.
  • Appendix - Contains a selection of topics not covered by any other section.

Getting Started

This section describes an approach to using the Nsight Graphics tools.

Expected Workflow

When debugging or profiling, it is important to narrow your investigation to the path that provides the most impactful and actionable data, so that you can draw conclusions and solve problems. Nsight Graphics provides a number of tools to fit each of these workflow scenarios.

When debugging a rendering problem, Nsight Graphics's Frame Debugger is the tool of choice. This tool enables the inspection of events, API state, resource values, and dependencies to understand where your application might have issues. For more information on the Frame Debugger, see Frame Debugger.

When profiling a graphical application, the first step is to determine whether you are CPU bound or GPU bound. If you are CPU bound, you cannot issue work to the GPU fast enough to take advantage of its full processing power. If you are GPU bound, the GPU cannot process the work it is issued fast enough, and your engine may stall. One way to determine which side is limiting you is to use Nsight Systems™. Nsight Systems is a system-wide performance analysis tool designed to visualize an application’s algorithms, help you select the largest opportunities to optimize, and tune to scale efficiently across any quantity of CPUs and GPUs in your computer. NVIDIA also provides a system analysis and trace tool within Nsight Visual Studio Edition; for more information on that tool, see the Nsight Visual Studio Edition documentation.

If you have determined that you are CPU bound, you will want to use a CPU profiling tool to discover how you can eliminate inefficiencies and issue work to the GPU faster. You may also want to look into the overhead of the API constructs you are using and determine whether lighter-weight constructs can offer the same effect at less cost. The Frame Debugger tool is an excellent resource while you are making these adjustments to your engine.

When GPU bound, you will want to determine where and how you are limited by the GPU. Nsight Graphics's profiling tools can offer you this information. For a quick, high-level evaluation of your GPU performance, the real-time frame performance graphs that Nsight Graphics provides offer a continuous look at your performance as you navigate your scene. For a detailed, per-range analysis of your GPU performance, use the Range Profiler. This analysis can help you determine whether you need to optimize your shaders, render target and texture usage, or memory configuration.

The GPU Trace activity within Nsight Graphics allows for analysis of a few different GPU bound scenarios. GPU Trace offers a deep analysis of your SM's performance by tracing the execution of your shaders on the SM across a series of frames. Another key technique in optimizing performance is to take advantage of the GPU's ability to process parallel work by using simultaneous compute and graphics (SCG), also known as async compute. GPU Trace allows you both to see opportunities for async compute and to confirm and measure the impact of async compute on your frame.

How to Launch and Connect to Your Application

To analyze an application, Nsight Graphics requires that the application be launched through Nsight Graphics's own launching facilities.

Upon starting Nsight Graphics, you are presented with the option to create a project. If you are using Nsight Graphics for the first time, skip project creation by selecting Continue under Quick Launch.





Once selected, you will be presented with a target-specific dialog that allows you to configure the application to launch. Browse and select the activity you wish to run and then proceed to the target-specific instructions below to configure the application to analyze.

Process Launch and Connection on Windows Targets

Launching a Native, Single-Process Application

Launching a single-process application is simple. To launch a native, single-process application:

  1. Set the application executable to the path of your application
  2. If your application requires a working directory that is different from your application's directory, adjust it now.
  3. Adjust the environment (if necessary)
  4. Leave Automatically Connect as Yes
  5. Click Launch

Once launched, you will be presented with a dialog that notes the launching and attaching of your application. After the launch completes, you are ready to begin your analysis.

Launching a Native, Multi-Process Application

Launching a multi-process application requires that you select the child process that you wish to attach to. To launch a native, multi-process application:

  1. Set the application executable to the path of your application
  2. If your application requires a working directory that is different from your application's directory, adjust it now.
  3. Adjust the environment (if necessary)
  4. Set Automatically Connect to No
  5. Click Launch
  6. You will be presented with a dialog that notes the launching of your application. Once launched, go to the Attach tab.
  7. Select the application you wish to analyze in the Attach tab and click Attach.

After the launch completes, you are ready to begin your analysis.

For example, if VRFunhouse.exe is a child process of the UE4Game.exe launcher, selecting VRFunhouse.exe and clicking Attach allows you to analyze the primary application.

Launching a Batch File

Launching a batch file is supported by launching it through the Windows command prompt application. To launch a batch file:

  1. Set the application executable to C:\Windows\system32\cmd.exe
  2. Set the working directory to the one in which your batch file is expected to be launched
  3. Set the command line arguments to: /c fullpathto/yourscript.bat, where fullpathto/yourscript.bat is the full path to your script.
  4. Set Automatically Connect to No
  5. Click Launch
  6. You will be presented with a dialog that notes the launching of your batch file. Once launched, go to the Attach tab.
  7. Select the application you wish to analyze in the Attach tab and click Attach.

Launching a Steam Game

Launching a Steam game requires that Steam be launched through Nsight Graphics before the game is launched. To launch a Steam game:

  1. Ensure that all existing Steam processes are closed
  2. Set the application executable to C:\Program Files (x86)\Steam\Steam.exe
  3. Set Automatically Connect to No
  4. Click Launch
  5. You will be presented with a dialog that notes the launching of Steam. Once Steam is running, start your game from Steam, then go to the Attach tab.
  6. Select the application you wish to analyze in the Attach tab and click Attach.

After the launch completes, you are ready to begin your analysis.

Remote Launching

Remote debugging is supported by Nsight Graphics on Windows through the Nsight Remote Monitor, a process that runs on the target machine and allows connections to be started on that machine.

To run the remote monitor, install Nsight Graphics on the target machine. Then launch the remote monitor on that machine from Start > NVIDIA Corporation > Nsight Remote Monitor.

Once the monitor is launched on the remote machine, you need to add the remote monitor as a connection in Nsight Graphics. By default, launches will be done on the localhost machine. To add another machine, click the + button.

This brings up a dialog in which you can add a machine name or IP address.

Enter the machine name in IP/Host Name. Click Add to add the connection. The machine you just added will be listed as the target connection at this time.

Any number of connections may be added; a connection can be removed by selecting it and clicking -. You can switch between any of the added connections before launch or attach. Connections are globally persisted and may be applied to any project once they are added.

Process Launch and Connection on Linux Targets

Remote debugging on Linux is supported through SSH connections. Enter your SSH information when establishing the connection to connect to the target machine.

After a Process is Connected

After a process is connected, it is ready to be analyzed. For many activities, a default set of windows will come up that offers the tools most relevant to that activity. You can also open additional windows by selecting a view from the menu bar. See the User Interface Reference for a detailed discussion of each view and tool window.

For the Frame Debugger activity that was started above, there are both live analysis and capture utilities. After experimenting with the live analysis tools, proceed into capture by selecting Capture for Live Analysis from the main toolbar. In Nsight Graphics, a number of views will open. On the target application, the HUD will appear with the toolbar and scrubber. This UI allows you to view an exhaustive amount of information on the state, resources, and synchronization of your application.

With such an expansive set of information available, debugging a rendering problem is made easier.

Configuring Your Application for Optimal Analysis

In order for your application to work well with the analysis tools provided by Nsight Graphics, there are a number of details you should consider when configuring your application.

Using Perfmarkers

Perfmarkers are integral to nearly all workflows. We recommend that your application always run with perfmarkers enabled when it is being analyzed by the tools.

Perfmarkers (performance markers) are most commonly used to delineate sections of events and note where in your application they begin and end. They can also be nested to show sub-sections of events. Perfmarkers are generally used to measure the amount of time that an inner portion of an algorithm takes.

The following types of perfmarkers are supported in Nsight Graphics:

  1. D3D9 perfmarkers are supported.
  2. Direct3D 11.1 and higher applications support ID3DUserDefinedAnnotation. (For more information, see ID3DUserDefinedAnnotation interface on MSDN.)
  3. OpenGL applications use the KHR_debug group functions, glPushDebugGroup and glPopDebugGroup.
  4. NVTX (NVIDIA Tools Extension).
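
Because markers are pushed and popped in matched pairs, one common way to keep them balanced is a small scope guard. The sketch below is illustrative only: MarkerBackend, RecordingBackend, ScopedMarker, and RenderFrame are hypothetical names, not part of Nsight Graphics or any graphics API. In an OpenGL application, the backend's Push and Pop would forward to glPushDebugGroup and glPopDebugGroup; the recording backend stands in here so that the example builds without a graphics context.

```cpp
#include <string>
#include <vector>

// Hypothetical interface: a real backend would forward to the marker API in
// use, e.g. glPushDebugGroup/glPopDebugGroup for OpenGL with KHR_debug.
struct MarkerBackend {
    virtual ~MarkerBackend() = default;
    virtual void Push(const std::string& name) = 0;
    virtual void Pop() = 0;
};

// Stand-in backend that records marker events as indented text so that the
// nesting of ranges is visible.
struct RecordingBackend : MarkerBackend {
    std::vector<std::string> log;
    int depth = 0;
    void Push(const std::string& name) override {
        log.push_back(std::string(depth * 2, ' ') + "begin " + name);
        ++depth;
    }
    void Pop() override {
        --depth;
        log.push_back(std::string(depth * 2, ' ') + "end");
    }
};

// Scope guard: pushes a marker on construction and pops it on destruction,
// so every range is closed even on an early return.
class ScopedMarker {
public:
    ScopedMarker(MarkerBackend& backend, const std::string& name)
        : backend_(backend) { backend_.Push(name); }
    ~ScopedMarker() { backend_.Pop(); }
    ScopedMarker(const ScopedMarker&) = delete;
    ScopedMarker& operator=(const ScopedMarker&) = delete;
private:
    MarkerBackend& backend_;
};

// Example frame: a "Frame" range with nested "Shadows" and "Lighting" ranges.
inline void RenderFrame(MarkerBackend& backend) {
    ScopedMarker frame(backend, "Frame");
    {
        ScopedMarker shadows(backend, "Shadows");
        // ... shadow pass draw calls would go here ...
    }
    {
        ScopedMarker lighting(backend, "Lighting");
        // ... lighting pass draw calls would go here ...
    }
}
```

Naming ranges after rendering passes, as in this sketch, tends to make the ranges shown in the tools easier to map back to engine code.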

Shader Compilation

Nsight Graphics works best when you have the full shader source available for debugging. Follow the steps below to set up your application for optimal configuration.

D3D Configuration

Nsight Graphics works best with access to the original HLSL source code of your shaders. There are a few ways to accomplish this task. The first is to submit the HLSL source at startup time and compile your shaders, using D3DXCompileShader, etc. Nsight Graphics can intercept these calls to gain access to the HLSL source code and display it for debugging.

Alternatively, you can precompile the shaders into binary format using the same functions and save the results out to a file, or use the offline compiler, fxc.exe, provided by the DirectX SDK. With this method, however, you need to specify some flags so that the HLSL debug information is embedded in the binary output, as outlined below:

D3DXCompileShader flag            Compiler flag (using fxc.exe)
D3DXSHADER_PREFER_FLOW_CONTROL    /Gfp
D3DXSHADER_DEBUG                  /Zi
D3DXSHADER_SKIPOPTIMIZATION       /Od
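
For the offline path, the flags above translate directly into an fxc.exe command line. The following is a minimal sketch of assembling one, for example from a build script written in C++. ShaderDebugOptions and BuildFxcCommand are hypothetical helpers, and the file names, profile, and entry point used with them are placeholders. /T (target profile), /E (entry point), and /Fo (output object file) are standard fxc.exe switches.

```cpp
#include <string>

// Hypothetical options struct for this sketch; the field names are not from
// any SDK. Each bool corresponds to one row of the flag table.
struct ShaderDebugOptions {
    bool preferFlowControl = true;  // D3DXSHADER_PREFER_FLOW_CONTROL -> /Gfp
    bool debugInfo = true;          // D3DXSHADER_DEBUG               -> /Zi
    bool skipOptimization = true;   // D3DXSHADER_SKIPOPTIMIZATION    -> /Od
};

// Builds an fxc.exe command line that embeds HLSL debug information.
// Paths are wrapped in double quotes so that spaces are tolerated.
inline std::string BuildFxcCommand(const std::string& sourcePath,
                                   const std::string& profile,
                                   const std::string& entryPoint,
                                   const std::string& outputPath,
                                   const ShaderDebugOptions& opts = ShaderDebugOptions()) {
    std::string cmd = "fxc.exe /T " + profile + " /E " + entryPoint;
    if (opts.preferFlowControl) cmd += " /Gfp";
    if (opts.debugInfo) cmd += " /Zi";
    if (opts.skipOptimization) cmd += " /Od";
    cmd += " /Fo \"" + outputPath + "\" \"" + sourcePath + "\"";
    return cmd;
}
```

Calling BuildFxcCommand("shader.hlsl", "ps_5_0", "main", "shader.cso") with the default options yields a command line that contains all three debug-related switches.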

Naming Objects and Threads

Many of Nsight Graphics's views and analyses benefit from naming API objects and threads. Similar to perfmarkers, these names can offer increased context for your analysis. The tables below list the supported methods for naming objects and threads.

Table 1. Naming Objects
API       Method
D3D11     No programmatic method; use Nsight-generated names
D3D12     ID3D12Object::SetName
OpenGL    glObjectLabel
Vulkan    vkDebugMarkerSetObjectNameEXT or vkSetDebugUtilsObjectNameEXT

Table 2. Naming Threads
Platform  Method
Windows   SetThreadDescription
Linux     Not yet supported
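
The Win32 function for attaching a description to a thread is SetThreadDescription (available on Windows 10 version 1607 and later). The sketch below wraps it so that the same call compiles on other platforms, where it does nothing. NameCurrentThread and kThreadNamingAvailable are hypothetical names introduced for this example.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// True on platforms where this sketch can actually name a thread.
#ifdef _WIN32
constexpr bool kThreadNamingAvailable = true;
#else
constexpr bool kThreadNamingAvailable = false;
#endif

// Names the calling thread. Returns true if the name was applied.
inline bool NameCurrentThread(const wchar_t* name) {
#ifdef _WIN32
    // GetCurrentThread returns a pseudo-handle for the calling thread.
    return SUCCEEDED(SetThreadDescription(GetCurrentThread(), name));
#else
    (void)name;  // No-op in this sketch on non-Windows platforms.
    return false;
#endif
}
```

A typical use is to call NameCurrentThread(L"RenderThread") once at the start of each thread's entry function.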

Activities

Nsight Graphics supports multiple activities, each targeting the kind of analysis you need at a particular point in your development process.

  • Frame Debugger - allows you to debug a frame draw call by draw call. You can view vertex shaders, pixel shaders, and pipeline states.
  • Frame Profiler - provides a deep analysis of the performance of your application through several analysis features.
  • Generate C++ Capture - allows you to export an application frame as C++ code to be compiled and run as a self-contained application for later analysis, debugging, profiling, regression testing, and edit-and-compile experimentation.
  • GPU Trace - supports the analysis of SM workloads.

Frame Debugger

The Frame Debugger activity allows for:

  • Real-time examination of rendering calls;
  • Interactive examination of GPU pipeline state, including visualization of bound textures, geometry and unordered access views;
  • Pixel History shows all operations that affect a given pixel;
  • Range Profiler identifies performance bottlenecks and GPU utilization;
  • C++ Capture exports for offline collaboration and analysis.

When to use the Frame Debugger Activity

The Frame Debugger activity offers a comprehensive set of tools for discovering problems with your application's rendering or general operation. This activity enables the inspection of events, API state, resource values, and dependencies to understand where your application might have issues. Use this activity when:

  • You have a render-accuracy issue
  • You suspect that you have a synchronization issue

The Frame Debugger activity supports all APIs that are generally supported by Nsight Graphics.

Basic workflow

To start this activity, select Frame Debugger from the connection dialog.

The basic workflow for the Frame Debugger activity is to capture an application and then navigate the events, data, and resources that your application is submitting/using to identify your issue.

Whether you are debugging on the CPU or GPU, the first step of any debugging process is to narrow in on the set of data that you need to analyze to understand your problem. Generally, this means that you will want to scrub to a particular event of interest in either the Scrubber or the Event Viewer. Because Nsight Graphics™ shows you the rendering contribution of every draw call, looking at either the HUD or the Current Target View will give you an indication of where your rendering might be going wrong. Alternatively, use the Pixel History experiment to automatically identify the draw calls that relate to a particular texture update.

From there, you will want to use your knowledge of the graphics pipeline to try to understand what might be causing a problem. Some questions to ask yourself:

  • Is this a geometry problem? If so, is it a pre-transform or post-transform problem?
  • Is this a blending problem?
  • Is this a synchronization problem?

In some cases, several issues may combine to make a given problem worse. Isolating the symptoms can be challenging, but effective use of the tools can offer increased confidence that you are heading in the right direction.

Frame Profiler

The Frame Profiler activity provides a powerful set of tools to assess the performance of your application from many angles. The Frame Profiler activity allows for:

  • Optimizing the rendering of your application
  • Seeing detailed GPU utilization
  • Automatic determination of performance limiters

When to use the Frame Profiler activity

The profiling activity provides detailed performance information for all units of the GPU. Use this activity when:

  • You know that your application is GPU bound
  • You want to determine whether you are SM bound
  • You want to explore the performance of a functional unit of the GPU

The profiling activity currently supports profiling D3D11, D3D12, and OpenGL applications.

Basic workflow

To start this activity, select Frame Profiler from the connection dialog.

There are two modes of operation that are supported by this activity:

  • Real-time performance analysis
  • Detailed Frame Analysis

Real Time Analysis

Real-time analysis is supported by configuring signals to discover the performance of different units of the GPU hardware. A detailed description of the available signals is provided in the Performance Dashboard. Use the Performance Dashboard to identify areas that require further study; this is often a useful precursor to a more detailed frame analysis.

Because the analysis is real-time, you can navigate through different scenes or models in your application to determine areas that are of most impact.

Detailed Frame Analysis

Profiling is supported by a capture workflow similar to the one discussed in the Frame Debugger activity. Once you have captured your application, several views come up by default that are targeted at giving you the information you need to understand your application's performance on the GPU. Several views interact to make this possible, including performance timings in the Event Viewer, a graphical illustration of the timings in the Scrubber, and a detailed breakdown of each of your application's workloads in the Range Profiler.

Once you have opened the Range Profiler, you will notice that the calls and ranges are scaled by GPU time, similar to the Scrubber.

Like the Scrubber, you can use the Range Selector to find interesting sections in the scene to profile, as well as drill into a currently selected range.

The default view will show ranges based on the performance markers you have defined in your application. These can be defined via the NVTX library, KHR_debug, or any other range definition API supported by your graphics API of choice. Clicking the Add... button will open a dialog that allows you to select what type of range you want to add.

  • Program ranges — Actions that use the same shader program.
  • Viewport ranges — Actions that render to the same viewport rectangle.
  • User ranges — A range defined by you on the fly. Use SHIFT + left-click and drag the scrubber on the created "User" row to create a new range. This can be helpful to drill into a section of the scene, or to compare different frame sections that don’t already have ranges defined for them.
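
Conceptually, a program range groups actions that share a shader program. The sketch below shows one way such grouping could work on a flat list of actions. It assumes that only consecutive actions with the same program are merged into a range, which is an assumption of this example rather than documented Range Profiler behavior. Action, ProgramRange, and BuildProgramRanges are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One draw call or dispatch, with the shader program it used and its GPU time.
struct Action {
    std::string name;
    int programId;
    double gpuMs;
};

// A run of consecutive actions [first, last] that share a shader program.
struct ProgramRange {
    int programId;
    std::size_t first;
    std::size_t last;
    double gpuMs;  // Sum of the GPU time of the actions in the range.
};

// Merges consecutive actions that use the same program into ranges.
inline std::vector<ProgramRange> BuildProgramRanges(const std::vector<Action>& actions) {
    std::vector<ProgramRange> ranges;
    for (std::size_t i = 0; i < actions.size(); ++i) {
        const Action& a = actions[i];
        if (!ranges.empty() && ranges.back().programId == a.programId) {
            // Same program as the previous action: extend the current range.
            ranges.back().last = i;
            ranges.back().gpuMs += a.gpuMs;
        } else {
            // Program changed: start a new range at this action.
            ranges.push_back(ProgramRange{a.programId, i, i, a.gpuMs});
        }
    }
    return ranges;
}
```

Viewport ranges could be built the same way by comparing viewport rectangles instead of program identifiers.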

When you click on a range on the Scrubber portion, the other sections of the Range Profiler View will automatically update with that selected range's information. You can also click on a single action in the Scrubber to profile only that action.

What to Profile?

The first step for improving performance in a GPU bound application is to determine where you are spending GPU time in the rendering of the scene. This can be accomplished in a number of ways using the Frame Debugger. First, adjust the scaling of the Scrubber to be based on GPU time.

This will allow you to see at a glance where time is being spent in the frame. If your application defines debug ranges (perfmarkers), they will also show up in the Scrubber, scaled by the amount of time taken by the work executed within them. Finally, if you haven’t added debug ranges, you can use various criteria to create them on the fly in your debugging session, including render target sets, shader programs in use, etc.

Look for ranges that seem larger than expected, given what you are trying to accomplish in that section of the frame. Also, larger ranges/draw calls likely have more headroom for improvement, so they can be good places to start deeper investigation.

You can also see how much GPU time is spent on various actions and ranges in the Event Viewer. By sorting by GPU Time, you can quickly find the most expensive parts of the frame and begin your analysis from there.
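
The same sort-by-cost idea applies to any list of per-event GPU timings you have collected yourself. The sketch below is a minimal illustration; TimedEvent and MostExpensive are hypothetical names, and the timings would come from your own measurements.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// An event (draw call, dispatch, or range) with its measured GPU time.
struct TimedEvent {
    std::string name;
    double gpuMs;
};

// Returns up to `count` events ordered from most to least GPU time.
// stable_sort keeps events with equal times in their original order.
inline std::vector<TimedEvent> MostExpensive(std::vector<TimedEvent> events,
                                             std::size_t count) {
    std::stable_sort(events.begin(), events.end(),
                     [](const TimedEvent& a, const TimedEvent& b) {
                         return a.gpuMs > b.gpuMs;
                     });
    if (events.size() > count) {
        events.erase(events.begin() + static_cast<std::ptrdiff_t>(count), events.end());
    }
    return events;
}
```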

Once you find an area you are interested in profiling, use the right mouse button context menu to initiate the Range Profiler. This will open up the profiler focused on the range or call you determined to be interesting. (Alternatively, you can open the Range Profiler through Frame Debugger > Range Profiler.)

Range Profiler Cookbook

Is my program CPU or GPU bound?

Try hotkey experiments, such as minimum geometry and null scissor, to determine if you are GPU bound.
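
The usual reading of these experiments is that they sharply reduce GPU work while leaving CPU-side submission roughly unchanged, so a large drop in frame time points to the GPU as the limiter. The sketch below encodes that reasoning. ClassifyBound is a hypothetical helper, and the 10% default threshold is an arbitrary assumption of this example, not a value from Nsight Graphics.

```cpp
enum class Bound { Gpu, Cpu, Unknown };

// baselineMs:   frame time with normal rendering.
// experimentMs: frame time with a GPU-work-reducing experiment enabled.
// threshold:    fraction of frame time that must be saved before the frame
//               is considered GPU bound (assumed value; tune for your case).
inline Bound ClassifyBound(double baselineMs, double experimentMs,
                           double threshold = 0.10) {
    if (baselineMs <= 0.0 || experimentMs < 0.0) {
        return Bound::Unknown;  // Invalid measurements.
    }
    const double reduction = (baselineMs - experimentMs) / baselineMs;
    // If removing GPU work barely changes the frame time, the CPU is the
    // more likely limiter.
    return reduction > threshold ? Bound::Gpu : Bound::Cpu;
}
```

Averaging frame times over many frames before comparing them makes the result less sensitive to noise.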

What are the most expensive draw calls in my application?

Capture a frame, and then run the Range Profiler. Once the Range Profiler is done running experiments, the entire scene will be selected by default. This will allow you to see details about all of the draw calls and dispatches in the scene. If you select Action Details in the Range Info section, you will see details on each draw call, including the execution time. Sort the table by time to see the most expensive draw call.

How can I optimize a range of draw calls?

In the Pipeline section, select Range Details and you will see an image with a virtual GPU pipeline. The red bars indicate units in the GPU that are not being used as efficiently as they could, so look for the higher bars to indicate where you need to spend time optimizing. (See below for specific tips on optimizing your API inputs for a particular unit).

How do I see collections of draw calls which share common state (like pixel shaders and vertex shaders)?

The Range Profiler contains a powerful grouping capability that allows you to make new ranges based on common state. These include ranges based on the program/shaders being used, viewport, render targets, and even user ranges that can be declared on the fly.

How do I profile draw calls which are in a specific performance marker?

The scrubber at the top of the Range Profiler View shows all of the performance marker ranges defined by the application, along with the amount of time spent for each one. A good strategy would be to look for ranges with a large amount of time, then drill down to where you see a large amount of time being spent. Once you click on that range, you can look at the Pipeline section for details on how that selected range is utilizing the GPU.

Why does my application run at a different frame rate under Nsight Graphics?

The Nsight Graphics Frame Debugger disables VSYNC, so applications that have VSYNC enabled under normal circumstances may see a higher frame rate when the same application is run under the Frame Debugger. Nsight Graphics also has a small performance overhead, which may reduce the frame rate slightly.

Generate C++ Capture

The C++ Capture activity allows you to export an application frame as C++ code to be compiled and run as a self-contained application for later analysis, debugging, profiling, regression testing, and edit-and-compile experimentation.

When To use the Generate C++ Capture activity

While C++ captures can be collected while frame debugging, the Generate C++ Capture activity provides a focused workflow that streamlines the creation of captures. Unnecessary analysis subsystems are turned off to allow for the quickest and most robust application capture. This activity is an excellent way to save a snapshot of your application, frozen in time. Use this activity when:

  • You want to save a deterministic application for follow-up performance analysis.
  • You want to save a reference point for how your application is working.
  • You want to share a minimal reproducible case with the devtools or driver teams at NVIDIA to facilitate bug reporting.

The Generate C++ Capture activity supports all APIs that are generally supported by Nsight Graphics.

Basic workflow

To start this activity, select Generate C++ Capture from the connection dialog.

Once the application is running, the Generate C++ Capture button will be available on the main toolbar.

Once a capture is started, the target application will temporarily pause, and a progress dialog will be shown detailing the steps of the export-to-C++ process. When the export completes, the C++ project is written to disk and the application resumes.

By default, the save directory is located beside the current project. If no project is currently loaded, the default save directory is used (see Options > Environment > Default Documents Folder).

In addition to the C++ project, the code generation process also produces an nsight-gfxcppcap file with additional information and utilities. These nsight-gfxcppcap files are automatically associated with the current project and can be reopened later.

The additional features of an nsight-gfxcppcap file include:

  1. Screenshot of the capture taken from the original application
  2. Information about the captured application and its original system
  3. Statistics about the captured API stream
  4. Utilities to build the C++ capture without opening the generated Visual Studio project
  5. Utilities to launch the compiled application:
    1. The Execute button will launch the compiled executable.
    2. The Connect... button will populate a new connection dialog that allows you to run a specific activity on the generated capture.
  6. User comments that are persisted within this file.

Using a Saved Capture

  1. To use the saved capture, open the saved project from the Captures directory in Visual Studio. Saving a capture will generate the source code, as well as project and solution files for all supported Visual Studio versions.
    Versioned Visual Studio projects may be opened with Visual Studio versions that do not match the project version, but note that you may be asked to upgrade your project when opening it for the first time.
  2. These solution files contain a number of generated source files.
    1. Main.cpp — This is where all of the initialization code is called, resources are created, and each frame portion is called in a message loop.
    2. ResourcesNN.cpp — Depending on the number of resources to be created, there will be multiple ResourcesNN.cpp files, each with a CreateResourcesNN call in them, that will construct all of the resources (device(s), textures, shaders, etc.) that are used in the scene. These are called in Main.cpp before replaying the frame in the message loop.
    3. FrameSetup.cpp — This file contains all of the state setting calls to set the API state to the proper values for the beginning of the frame, including what buffers are bound, which shaders are enabled, etc.
    4. FrameNPartMM.cpp — In Direct3D and single-threaded OpenGL captures, these files contain the API functions, each named RunFrameNPartMM(), to replay the frame. It is split into multiple files so generated code is easier to work with. These functions are called sequentially in the message loop in Main.cpp.
      In this scenario, both N and MM are placeholders for numbers in the multiple files generated. FrameN will typically be Frame0 since only a single frame is captured, and PartMM will typically be in the 00-05 range, depending on how many API calls are in the frame.
    5. ThreadLLFrameNPartMM.cpp — In multi-threaded OpenGL captures, these files contain the API functions, each named ThreadLLRunFrameNPartMM(), to replay the frame. The functions correspond to the work done by each thread during the frame. These functions are called by their respective threads and synchronized to replay the saved events in the same order as captured.
    6. ReadOnlyDatabase.cpp — This is a helper class to access resource data that is stored in the data.bin file. It is accessed throughout the code via the GetResource() call.
    7. Helpers.cpp — These functions are used throughout the replayer for various conversions and access to the ReadOnlyDatabase.
    8. Threading.cpp — This file contains helper functions and classes to manage threads used in the project.
  3. Build and run the project.

Changing a Resource

If you want to change a resource (for example, to swap in a different texture), you can change the parameters for its construction by looking within the ResourcesNN.cpp files for the texture in question. Textures can be matched by size and/or format. Once you find the variable for the texture, look for that name in the FrameSetup.cpp file. This will contain source lines to lock the texture, call GetResource() to retrieve the data from the ReadOnlyDatabase, and then call memcpy(…) to copy the data into the texture. You can substitute the call to the ReadOnlyDatabase with a call that reads from a file of your choice to load the alternate texture.
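
As a rough illustration of that substitution, the sketch below reads raw bytes from a file and copies them into a destination buffer, which stands in for the pointer returned by locking the texture. LoadFileBytes and CopyIntoTexture are hypothetical helpers, not functions from the generated project. The sketch assumes the file already holds texel data in the same format and memory layout as the original resource, and it ignores row pitch.

```cpp
#include <cstddef>
#include <cstring>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Reads an entire file into memory. Returns an empty vector on failure.
inline std::vector<char> LoadFileBytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return {};
    }
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}

// Copies the loaded bytes into `dst`, which represents the locked texture
// memory. Refuses to copy if the data is missing or would overflow `dst`.
inline bool CopyIntoTexture(void* dst, std::size_t dstSize,
                            const std::vector<char>& src) {
    if (dst == nullptr || src.empty() || src.size() > dstSize) {
        return false;
    }
    std::memcpy(dst, src.data(), src.size());
    return true;
}
```

Checking the size before copying matters here because a replacement texture with different dimensions or format will generally not match the size of the original allocation.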

Changing a Draw Call

If you want to change the state for a given draw call, you can locate the draw call by replaying the capture within Nsight Graphics and scrubbing to find the call you want to examine. Search in the FrameNPartMM.cpp files for Draw NN, where NN is the 0-based draw call index that Nsight Graphics displayed on the scrubber. Doing this will bring you to the source line for that draw call, and from here, you can add any state changes before that call. Alternatively, you can also disable that specific call by commenting out the source call containing the draw call.

Parameters

  • -repeat N — This setting enables Nsight Graphics to use serialized captures in the normal arch workflow. The N setting indicates the number of times to repeat the entire capture; the default setting is -1, which keeps the capture running on an infinite loop.
  • -noreset — This setting controls whether context state and all resources are reset to their beginning of frame value. When this setting is specified, all frame restoration operations will be skipped, avoiding the performance cost associated with them. Note that this may introduce rendering errors if the rendered frame has a data dependency on the results of a previous frame.
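
To make the semantics of these two parameters concrete, the sketch below parses them and decides whether another replay iteration should run. It is not the parser used by the generated application; ReplayOptions, ParseReplayArgs, and ShouldRunAgain are hypothetical names. The only behavior taken from the descriptions above is the default of -1, meaning an infinite loop, and the meaning of -noreset.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

struct ReplayOptions {
    int repeat = -1;       // -1 keeps the capture running in an infinite loop.
    bool noReset = false;  // True when -noreset is given: skip frame restoration.
};

// Parses "-repeat N" and "-noreset" from a list of arguments.
// Unrecognized arguments are ignored in this sketch.
inline ReplayOptions ParseReplayArgs(const std::vector<std::string>& args) {
    ReplayOptions opts;
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (args[i] == "-repeat" && i + 1 < args.size()) {
            opts.repeat = std::atoi(args[++i].c_str());
        } else if (args[i] == "-noreset") {
            opts.noReset = true;
        }
    }
    return opts;
}

// Returns true if the capture should be replayed again after `completed`
// full passes have already run.
inline bool ShouldRunAgain(const ReplayOptions& opts, int completed) {
    return opts.repeat < 0 || completed < opts.repeat;
}
```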

GPU Trace

The GPU Trace activity provides detailed performance information for the SM units of the GPU and identifies asynchronous compute usage and opportunities. The activity creates a report with a detailed timeline view and data tables that you can use for your analysis.

When to use the GPU Trace activity

The GPU Trace activity provides detailed performance information for SM units of the GPU as well as identifying asynchronous compute usage and opportunities. Use this activity when:

  • You have confirmed that you are GPU bound
  • You suspect that your engine will benefit from asynchronous compute
  • The Frame Profiler activity has indicated that you are SM bound

The GPU Trace activity currently supports profiling D3D12 applications.

Basic workflow

To start this activity, select GPU Trace from the connection dialog.

  1. Set up your application for connection. (See How to Launch and Connect to Your Application for more information.)

  2. Set a Frame Count. This parameter defines how many frames will be captured. The maximum value is 5.
    Note: GPU Trace consumes a lot of memory, especially for complex frames. When capturing a large number of frames, make sure there is enough memory available to hold the capture.
  3. Launch or attach to your application. (See How to Launch and Connect to Your Application for more information.) 
  4. If the application connected successfully, the process name will appear in the lower right corner of the window.

    Note: Currently, it is only possible to run GPU Trace when the host and target are running on the same machine.
  5. Once launched and connected, click Generate GPU Trace Capture (or select it from the GPU Trace menu) to create a capture report file.

    Note: It is recommended that you close the application after capturing, in order to free up your system's memory while exploring the captured file.

How to Interpret a Report

When interpreting a report, reference the GPU Trace UI section for information on how to interpret each of the pieces of information that is provided. Things to consider:

  • Am I using asynchronous compute?
  • Do I have opportunities for asynchronous compute?
  • What workloads are taking the most time?
  • Is my occupancy low for these workloads?

If you determine that you have opportunities for asynchronous compute and you are not currently using (or achieving) async compute, you may want to investigate your engine to understand where or how you can achieve it.

If you determine that you have expensive workloads with low occupancy, you will want to analyze your shader for opportunities to reduce work or reduce register/memory usage to allow for more occupancy.
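
As a rough illustration of the register side of that trade-off, the sketch below estimates how many warps can be resident on one SM when registers are the only limit. The per-SM figures (65,536 registers, 64 resident warps, 32 threads per warp) are assumptions chosen for illustration and vary by GPU architecture. Real hardware also rounds register allocations and applies shared-memory and thread-block limits, which this model ignores.

```python
def register_limited_occupancy(regs_per_thread,
                               regs_per_sm=65536,     # assumed register file size per SM
                               max_warps_per_sm=64,   # assumed resident-warp limit per SM
                               warp_size=32):         # threads per warp
    """Estimate occupancy (0.0 to 1.0) when registers are the only limiting resource."""
    regs_per_warp = regs_per_thread * warp_size
    warps_that_fit = regs_per_sm // regs_per_warp
    resident_warps = min(warps_that_fit, max_warps_per_sm)
    return resident_warps / max_warps_per_sm


if __name__ == "__main__":
    for regs in (128, 64, 32):
        print(regs, register_limited_occupancy(regs))
```

Under these assumptions, a shader using 128 registers per thread fits 16 warps (25% occupancy), while trimming it to 64 registers doubles that to 32 warps (50%).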

User Interface Reference

This section provides a deep view of all of the user interface elements and views that Nsight Graphics offers.

App Configuration and Activity selection UI

Launch Tab

The Launch tab enables launching applications for analysis. This is where you will add the basic process information to launch and subsequently connect to the application you wish to analyze.

This tab has the following controls:

Application Executable - Specifies the root application to launch. Note that this may not be the final application that you wish to analyze; see this section on how to launch different types of applications.

Working Directory - Specifies the directory in which the application will be launched.

Command Line Arguments - Specifies the arguments to pass to the application executable.

Environment - Specifies the environment variables to set in the launched application.

Automatically Connect - specifies whether the launched application should be automatically connected to. If the launched application is a launcher that creates the process that you ultimately wish to analyze, set this to 'No'.

Note: Several fields have a selector to allow you to cycle through recently used entries. This is a useful capability for cycling through common configurations.

Attach Tab

To attach to an application, it must have previously been launched through the Launch tab. This page lists the launched application as well as any children that the application has launched.

Note: If the host disconnects for any reason, and the target happens to still be running, you can reattach to the previously launched or even captured application by using the attach facility. The process does not have to be newly relaunched.

Activities Options

Nsight Graphics allows for adjusting the activity with a large set of options.

Table 3. General Options
Option Description
Enable Target HUD Enables the HUD on the target application, which enables:
  • Capturing via Ctrl+Z
  • Real-time Hardware and Software Signals
  • Draw binning
Force Repaint Enables a periodic trigger of window invalidation, which causes applications that lazily present to repaint, such as many professional visualization applications. This is useful for providing a consistent stream of frames with which Nsight Graphics can perform its analysis.
Table 4. OpenGL Options
Option Description
Frame Delimiter Select the API call used to delimit frame boundaries for OpenGL applications. This setting is useful for applications that do not necessarily present to a screen, such as offscreen rendering applications or benchmark applications.
Table 5. D3D Options
Option Description
Synchronous Shader Collection Controls the extent of information that is collected for D3D11 shaders. Synchronous collection is necessary for some shader-related statistics but may increase application loading time. Synchronous collection also requires that the application has been started with administrative privileges.
D3D12 Replay Fence Behavior Choose the behavior when encountering a sync point during D3D12 replay.

Modern APIs, such as D3D12, give the application fine-grained control of synchronization. Tools must infer the application's expectations when identifying application syncs, and must do so in a way that allows for high performance while still respecting data hazards. There are several possible synchronization points, such as when the application calls GetCompletedValue, when an application calls a member of the WaitFor*Object family, when a Signal is observed to have been emitted, etc. This setting controls the approach that Nsight Graphics uses to reflect the application's synchronization behavior.

  • Default - synchronizes on GetCompletedValue and Wait events
  • Never Sync - never performs synchronization. This option instructs replay to be free running, potentially leading to the highest frame rate. Note that this is extremely likely to run into data hazards, so use with caution.
  • Always Sync - performs synchronization at every possible synchronization opportunity (see above list of synchronization points). This will lead to the lowest frame rate, but introduces the most safety in replay. Use this setting as a debugging option if you suspect that there are synchronization issues in the application replay. If turning this option on does restore render accuracy, please contact support to report this as a bug.
  • No sync on GetCompletedValue - applies all default settings, but turns off synchronization on GetCompletedValue. GetCompletedValue can be used as both a determination of what the current fence value is as well as an input into a control flow decision. Accordingly, because it may lead to control flow, it is synchronized on by default. You may use this setting if you are certain your application never uses GetCompletedValue as a control flow decision.
  • No Sync On Wait Corresponding To SetEventOnCompletion - This option turns off synchronization on Win32 Wait calls. Note that this is extremely likely to run into data hazards, so use with caution.
DXGI SyncInterval Controls the SyncInterval value passed to the DXGI Present method. The default is to disable V-Sync to allow the debugger to collect valid real-time counters.
Report Force-Failed Query Interfaces Controls whether failed query interfaces are reported to a user with a blocking message box.

Nsight Graphics is an API debugger, and there may be some APIs that it does not yet support or does not yet know about. When such an interface is queried, the interception will force the failure of the operation with an E_NOINTERFACE return code. While this is valid by the COM spec, there are many applications that do not check the results of their QueryInterface calls, and as such, the application may assume success and will end up crashing as it dereferences a null pointer. To combat this issue, Nsight Graphics will, by default, issue a blocking message box to inform the developer of the issue. This message box will offer the opportunity to understand issues that manifest at a later time or offer the indication that the application may need adjustment before a crash.

If this operation interferes with normal operation, and otherwise would result in no issues, it may be disabled for the project.

Report Unknown Objects Controls whether unknown objects are reported to a user with a blocking message box.

Some applications pass objects that are unknown to Nsight Graphics. These objects may be indicative of an application bug, lack of support in the product's interception, or they may ultimately be innocuous. In many cases, such an unknown object may result in an analysis crash. To mitigate this issue, Nsight Graphics warns about this concern with a blocking message box.

If this operation interferes with normal operation, and otherwise would result in no issues, it may be disabled for the project.
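
To make the D3D12 Replay Fence Behavior option in the table above more concrete, the sketch below models the two ways an application can consume a fence: polling the completed value to make a control flow decision, and blocking until a value is reached. It is a portable Python stand-in, not the D3D12 API: `ModelFence`, `choose_path`, and the path names are hypothetical.

```python
import threading


class ModelFence:
    """Hypothetical stand-in for a GPU fence; not the D3D12 interface."""

    def __init__(self):
        self._value = 0
        self._cond = threading.Condition()

    def signal(self, value):
        """Mark work up to 'value' as complete and wake any waiters."""
        with self._cond:
            self._value = value
            self._cond.notify_all()

    def get_completed_value(self):
        """Non-blocking query, analogous in spirit to GetCompletedValue."""
        with self._cond:
            return self._value

    def wait_for(self, value, timeout=None):
        """Blocking wait, analogous in spirit to an event wait on completion."""
        with self._cond:
            return self._cond.wait_for(lambda: self._value >= value, timeout)


def choose_path(fence, needed_value):
    """Control flow driven by a polled fence value: the result depends on timing."""
    if fence.get_completed_value() >= needed_value:
        return "use_new_data"
    return "reuse_old_data"


if __name__ == "__main__":
    fence = ModelFence()
    print(choose_path(fence, 1))   # work not finished yet
    fence.signal(1)
    print(choose_path(fence, 1))   # same call, different branch
```

Because `choose_path` takes a different branch depending on when the fence is signaled, a replay that does not reproduce the original timing of the polled value could follow a different path than the captured application did. This is the reason given above for synchronizing on GetCompletedValue by default.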

Table 6. Vulkan Options
Option Description
Force Validation Force the Vulkan validation layers to be enabled. This requires the LunarG Vulkan SDK to be installed.
Validation Layers Layers used when force enabling validation. This option is only visible when 'Force Validation' is turned on.
Enable Coherent Buffer Collection Controls the monitoring and collection of mapped coherent buffer updates during capture. This is potentially an expensive operation and many applications can replay a single frame without actively monitoring these changes. Use this option if your capture takes a long time but you do not straddle frames with coherent updates.
Allow Unsafe pNext Values Allows the inspection of Vulkan structures with potentially dangerous pNext values. By default structures with no known extensions are skipped.
Use Safe Object Lookup

Controls how objects are stored internally by the tool.

Safe lookups are slower but may improve stability when using an unsupported extension.

  • Auto - Fallback to safe mode when an unsupported extension is seen.
  • Enable - Always use safe object lookup.
  • Disable - Never use safe object lookup.
C++ Capture Object Set

This option controls which objects are exported as part of a Vulkan C++ capture.

By default we limit the object set to only objects used in the capture but in some cases a user might want to see all objects used in the application. This typically isn't necessary and can lead to a very large C++ project.

This might also help work around a bug where the tool incorrectly prunes an object it shouldn't have.

  • Only Active - Only include objects actively used in capture
  • All Resources - All active capture objects plus all buffers, images, pipelines, and shaders
  • Full - The entire object set
Reserve Heap Space Amount of physical device heap space (MB) to automatically reserve for the frame debugger.
Unweave Threads For multi-threaded applications, attempts to remove excessive context switching by grouping thread events together. May improve C++ capture replay performance of heavily threaded applications.
Table 7. Ray Tracing Options
Option Description
Copy Acceleration Structure Geometry After building an acceleration structure, it is legal to update or destroy the geometry buffers used in construction. Without deep copies of the original data, the tool cannot guarantee full function of the acceleration structure viewer, or of C++ capture. For the sake of performance some activities will skip deep collection but will issue warnings if one of these operations is attempted. If no deep data is available the original input buffers will be used in their current state.
Ignore Shallow Copy Warnings If an expert user knows that the original acceleration structure input data remains undisturbed they may silence warnings with this setting.
Collect Geometry In GPU Memory By default acceleration structure deep copy data is collected in system memory, for stability reasons. Performance may be somewhat better doing the collection into GPU memory, but this puts pressure on the application's video memory budget.
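
The difference between a shallow reference and a deep copy of acceleration structure inputs can be sketched in a few lines. This is a generic Python model, not Nsight Graphics code: `collect_geometry` and the list-based vertex buffer are hypothetical stand-ins.

```python
import copy


def collect_geometry(vertex_buffer, deep):
    """Return what a tool would hold on to after an acceleration structure build.

    deep=True  -> an independent snapshot of the input data
    deep=False -> only a reference to the application's live buffer
    """
    return copy.deepcopy(vertex_buffer) if deep else vertex_buffer


# Hypothetical triangle used as build input.
app_buffer = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
deep_copy = collect_geometry(app_buffer, deep=True)
shallow_copy = collect_geometry(app_buffer, deep=False)

# The application legally reuses the buffer after the build.
app_buffer[0] = [9.0, 9.0, 9.0]

print(deep_copy[0])     # still the vertex that was used for the build
print(shallow_copy[0])  # reflects the overwrite
```

Only the deep copy still matches what the acceleration structure was built from, which is why a shallow collection can produce the warnings described above.
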
Table 8. Troubleshooting Options
Option Description
Enable Driver Instrumentation Controls the enablement of capabilities that require driver support. Disabling this option effectively disables:
  • Hardware performance metrics
  • Native shaders collection
  • Other underlying mechanisms for timing

Disabling this option is the first and best option to try if you run into capture errors as it disambiguates problems quickly given the number of subsystems it turns off.

Collect Shader Source Controls the collection of shader source code associated with shader objects. This option is useful if you suspect an error or incompatibility with any of the shader processing libraries you use (such as D3DCompiler.dll).
Collect Shader Disassembly Controls the collection of shader disassembly associated with shader objects. This option is useful if you suspect an error or incompatibility with any of the shader processing libraries you use (such as D3DCompiler.dll).
Collect Shader Reflection Controls the collection of shader reflection associated with shader objects. This option is useful if you suspect an error or incompatibility with any of the shader processing libraries you use (such as D3DCompiler.dll).
Collect Native Shaders Enable fetch of hardware native shaders which can be used to collect shader performance stats.
Collect Hardware Performance Metrics Enables the collection of performance metrics from the hardware.
Ignore Incompatibilities

Nsight Graphics uses an incompatibility system to detect and report problems that are likely to interfere with the analysis of your application. By default, these incompatibilities are reported and the user is given the option of capturing despite them (with an associated warning of the possibility of issues). Some applications may have innocuous incompatibilities, however, and having to view this warning every time might be undesired.

When this option is enabled, the frame will attempt to capture despite any incompatibilities. Use this option only when you are certain that the incompatibility will not impact your analysis.
Block on First Incompatibility

Nsight Graphics uses an incompatibility system to detect and report problems that are likely to interfere with the analysis of your application. In some cases, these incompatibilities may be the first sign of an impending failure. Accordingly, being able to block on such a reported failure may aid in triaging and understanding a crash when running under Nsight Graphics. This option is disabled by default so as not to interfere with expected operation, but it may be useful to toggle if you encounter an application crash under Nsight Graphics.

Enable Crash Reporting Enables the collection and reporting of crash data to help identify issues with the frame debugger.

While a user is always prompted before a crash report is sent, this option is available to suppress these facilities entirely.

Enable C/C++ Serialization Enables the ability to serialize a capture to C/C++.

By default, applications are able to create a C++ capture, but in some cases extra data is collected in support of this feature before it is invoked. This option allows that collection to be disabled entirely.

Force Single-Threaded Capture Controls whether capture proceeds with concurrent threads or with serialized threads.

Use this option if you suspect your application's multi-threading may be interfering with the capture process.

Replay Thread Pause Strategy Controls the strategy used in live analysis for pausing threads.
  • Auto - Use the default strategy, which may use an Aggressive strategy for some applications.
  • Aggressive - Pause all non-Nsight threads.
  • RenderOnly - Only pause rendering threads.

Frame Debugging/Profiling UI

The Frame Debugger and Frame Profiler activities are capture-based activities. There are two classes of views in these activities – pre-capture views and post-capture views. Pre-capture views generally report real-time information on the application as it is running. Post-capture views show information related to the captured frame and are only available after an application has been captured for live analysis. For an example of how to capture, follow the example walkthrough in How to Launch and Connect to Your Application.

Real-time Analysis UI

Nsight Graphics offers several tools for real-time performance analysis and debugging.

Performance Dashboard

The Performance Dashboard provides a view of application performance, in relation to CPU and GPU activity. The Performance Tests View should be the first port of call when examining any application's performance. These tests facilitate a quick, high-level identification of performance issues, while the application is still running.

The following Performance Tests are available to use with Nsight Graphics.

General Performance

Minimal Geometry

This test reduces the amount of geometry submitted to one triangle per draw call. If the frame rate does not go up significantly, the application is likely CPU-bound. If the frame rate does increase, the application is GPU-bound.
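
The decision rule for this test can be written down as a small helper. This is a sketch only: the function name is hypothetical, and the 10% threshold for a "significant" increase is an assumption, since the guide does not define one.

```python
def classify_minimal_geometry(baseline_fps, test_fps, significant_gain=0.10):
    """Interpret the Minimal Geometry test from two frame-rate readings.

    baseline_fps     -- frame rate with normal rendering
    test_fps         -- frame rate with the test enabled
    significant_gain -- assumed relative increase that counts as significant
    """
    gain = (test_fps - baseline_fps) / baseline_fps
    if gain > significant_gain:
        return "GPU-bound"
    return "likely CPU-bound"


if __name__ == "__main__":
    print(classify_minimal_geometry(60.0, 61.0))   # barely moves
    print(classify_minimal_geometry(60.0, 120.0))  # doubles
```

Take both readings from the same scene and camera position so that the comparison reflects the test and not a change in content.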

Shaders and Texturing

2x2 Textures

All 2D and cubemap textures are instantly replaced with small 2x2 textures. If the frame rate goes up, then the texture unit is a likely bottleneck for the scene.

Significant response to this test usually indicates missing mipmaps, heavyweight texture formats, and/or expensive filter modes.

Fragment Bandwidth

Draw Wireframe

Displays a wireframe rendering of the application running on the target.

Signal Graph View

The Signal Graphs section of the Performance Dashboard displays tool windows that allow you to view various counters given by Nsight Graphics.

The default signal graphs are:

  1. FPS — Displays the frames per second in the application that is being debugged or profiled.
  2. Units Busy — Shows how busy different units in the GPU are, such as the Geometry, Shader, and Texture units.

To modify a Signal Graph, click the wrench icon, and the Signal Graph Configuration dialog will open.

You can modify how the signals are displayed by changing their color, deselecting those you don't want to see, or clicking Add Signal... to add new signals to the graph.

HUD

The HUD is a heads-up display which overlays directly on your application. You can use the HUD to view real-time GPU signals and performance counters, capture a frame, and scrub through its constituent draw calls.

All actions that occur either in the HUD or on the host — such as capturing a frame or scrubbing to a specific draw call — are automatically synchronized between the HUD and the host, and thus you can switch between using the HUD and host UI seamlessly as needed.

The HUD has three (3) modes:

Running: Interact with your game or application normally, while the HUD shows real-time GPU performance graphs overlaid on the scene. When you first start your application with Nsight Graphics, the HUD is in Running mode. This mode is most useful for viewing GPU performance information in real-time while you play your game.

Activated: Once activated (using the activation hot-key toggle), the Nsight Graphics HUD allows you to resize or reposition the signal graphs, and to pause and capture a frame from the running application.

Frame Debugger: Once you have captured a frame, you can debug the frame directly in the Nsight Graphics HUD (as well as from the host). The HUD allows you to scrub through the constituent draw calls of a frame, to view render targets with panning and zooming, and to examine specific values in those render targets.

Running Mode

In this mode, you can interact with your game or application normally, and the HUD shows real-time GPU performance graphs overlaid on the scene. When you first start your application with Nsight Graphics, the HUD is in Running mode.

To activate the HUD:

Make sure your graphics application has focus, and then enter the activation hot-key, CTRL+Z. The HUD is now in Activated mode.

Activated Mode

Once activated (using the activation hot-key toggle), the HUD allows you to toggle hotkey experiments. A toolbar containing common operations becomes visible in this mode.

Frame Debugger Mode

Once you have captured a frame, you can debug the frame directly in the HUD. While you can also debug the frame on the host, the HUD allows you to scrub through the constituent draw calls of a frame, to view render targets with panning and zooming, and to examine specific values in those render targets.

Hot Keys: Action
CTRL + Plus (+): Zooms in.
CTRL + Minus (-): Zooms out.
CTRL + Zero (0): Makes the current texture go to a 1:1 ratio so that 1 texel fills 1 pixel.
Left-click + drag on the scrubber at the bottom: Views a particular draw call in your frame. You can hold SHIFT when scrubbing for more scrub precision, which is especially useful when looking at frames with a large number of draw calls. When the desired draw call is active, release the left mouse button. The geometry for the currently active draw call will be highlighted, as long as it is on screen.
Left-click + drag on a render target: Pans and zooms the currently displayed render target. Use the mouse wheel to zoom in to a particular portion of the render target.
CTRL + mouse over a render target: Shows the value for the currently displayed render target. A small display window will show you a high-zoom view of the pixels in the area, and the value of the current pixel that the mouse is hovering over.

To switch the display to another active render target:

  • Click the Select Render Target button on the HUD toolbar.
  • A drop-down menu will appear, showing all valid choices for the current draw call. Select the desired render target.
  • Note that if a selected render target is no longer active for a different draw call, the display will automatically switch to an active render target.

When you start debugging your graphics application with Nsight Graphics, the target computer will begin running the application. You will notice several performance graphs and a HUD toolbar overlaid on top of your application. At this point, your application is considered to be in run mode.

The HUD can be used to run the following experiments while in run mode:

  • Depth complexity view: shows the amount of overdraw in the scene.
  • 2x2 textures: shows if the application is texture-bound.
  • Null scissor rectangle: helps to determine if the application is pixel-bound.
  • Minimum geometry: draws everything in the scene normally, but only draws the first triangle of each draw call. This test shows if the application is GPU or CPU bound. If the frame rate does not go up when going into minimum geometry, it's likely CPU-bound. If the frame rate does increase, the application is GPU-bound.

There are two different methods to pause the application, which causes it to enter Frame Debugger mode.

  • Press CTRL+Z and the spacebar on the target machine; or
  • Go to the main toolbar and select Pause and Capture Frame.

After you enter Replay Mode, you will see several features overlaid on top of the application, such as timelines of draw call events and performance markers. Perhaps the most notable of these features is the HUD Toolbar, which allows you to work with your application on the target computer itself.

HUD ICON DEFINITION
Hides the GUI so you can view more of your application.
Switches between a hardware and software cursor.
Displays the Help menu, showing all available commands.
Selects a view of event ranges.
Exits the frame debugger and resumes your application.
Saves a frame capture to a file. By default, files are saved to: Documents\Nsight Graphics\Captures
Changes the current object wireframe rendering method.
Changes the current render target display from the color buffer to depth or stencil.
Toggle the normalization view of the texture display.

API Inspector

The API inspector is a common view to all supported APIs that offers an exhaustive look at all of the state that is relevant to a particular event to which the capture analysis is scrubbed.

To access this view, go to Frame Debugger > API Inspector.

While the view is common, the state within it is particular to each API. See the section below that relates to your API of interest.

D3D11 API Inspector

The API Inspector view has an API-specific pipeline navigator that allows you to select a particular group of state within the GPU pipeline. From here, you can inspect the API state for each stage, including what textures and render targets are bound, or which shaders are in use in the related constants. Note that if a stage is not active (either there is nothing bound to that stage or it doesn’t apply for the current action) it will be greyed out, but you can still click on it to inspect the state.

Pipeline Stages

The following table shows the stages that are available for inspection:

  • IA — The Input Assembler shows the layout of your vertex buffers and index buffers.
  • VS — Shows all of the shader resource views and constant buffers bound to the Vertex Shader stage, as well as links to the HLSL source code and other shader information.
  • HS — This shows all of the shader resource views and constant buffers bound to the Hull Shader stage, as well as links to the HLSL source code and other shader information.
  • DS — This shows all of the shader resource views and constant buffers bound to the Domain Shader stage, as well as links to the HLSL source code and other shader information.
  • GS — Shows all of the shader resource views and constant buffers bound to the Geometry Shader stage, as well as links to the HLSL source code and other shader information.
  • SO — Shows the resources bound for Stream Output.
  • RS — Shows the Rasterizer State parameters, including culling mode, scissor and viewport rectangles, etc.
  • PS — Shows all of the shader resource views, constant buffers, and render target views bound to the Pixel Shader stage, as well as links to the HLSL source code and other shader information.
  • OM — Shows the Output Merger parameters, including blending setup, depth, stencil, render target views, etc.
  • CS — This shows all of the shader resource and unordered access views and constant buffers bound to the Compute Shader stage, as well as links to the HLSL source code and other shader information.

Input Assembler (IA)

The Input Assembler page shows the details of your vertex buffers and index buffers, and the input layout of the vertices.

Shaders (VS, HS, DS, GS, PS, CS)

The various shader pages display all of the constant buffers, shader resource views, and input/output parameters, as well as links to the HLSL source code and other shader information.

In the constant buffer list, you can expand the buffer to see which HLSL variables are mapped to each entry, as well as the current values.

To enable resolution of HLSL variables, you must enable debug info when compiling the shader. See Shader Compilation for a discussion of the parameters required to prepare your shaders for optimal usage within Nsight Graphics.

Rasterizer State (RS)

The Rasterizer State page displays parameters including culling mode, scissor and viewport rectangles, etc.

Output Merger (OM)

The Output Merger page shows parameters including blending setup, depth, stencil, currently bound render target views, etc.

D3D12 API Inspector

The API Inspector view has an API-specific pipeline navigator that allows you to select a particular group of state within the GPU pipeline. From here, you can inspect the API state for each stage, including what textures and render targets are bound, or which shaders are in use in the related constants. Note that if a stage is not active (either there is nothing bound to that stage or it doesn’t apply for the current action) it will be greyed out, but you can still click on it to inspect the state.

Pipeline Stages

The following table shows the stages that are available for inspection:

  • IA — The Input Assembler shows the layout of your vertex buffers and index buffers.
  • VS — Shows all of the shader resource views and constant buffers bound to the Vertex Shader stage, as well as links to the HLSL source code and other shader information.
  • HS — This shows all of the shader resource views and constant buffers bound to the Hull Shader stage, as well as links to the HLSL source code and other shader information.
  • DS — This shows all of the shader resource views and constant buffers bound to the Domain Shader stage, as well as links to the HLSL source code and other shader information.
  • GS — Shows all of the shader resource views and constant buffers bound to the Geometry Shader stage, as well as links to the HLSL source code and other shader information.
  • SO — Shows the resources bound for Stream Output.
  • RS — Shows the Rasterizer State parameters, including culling mode, scissor and viewport rectangles, etc.
  • PS — Shows all of the shader resource views, constant buffers, and render target views bound to the Pixel Shader stage, as well as links to the HLSL source code and other shader information.
  • OM — Shows the Output Merger parameters, including blending setup, depth, stencil, render target views, etc.
  • CS — This shows all of the shader resource and unordered access views and constant buffers bound to the Compute Shader stage, as well as links to the HLSL source code and other shader information.

Input Assembler (IA)

The Input Assembler page shows the layout of your vertex buffers and index buffers, as well as the vertex declaration information.

Shaders (VS, HS, DS, GS, PS, CS)

The various shader pages display all of the constant buffers, shader resource views, and input/output parameters, as well as links to the HLSL source code and other shader information.

In the constant buffer list, you can expand the buffer to see which HLSL variables are mapped to each entry, as well as the current values.

To enable resolution of HLSL variables, you must enable debug info when compiling the shader. See Shader Compilation for a discussion of the parameters required to prepare your shaders for optimal usage within Nsight Graphics.

Rasterizer State (RS)

The Rasterizer page displays render state settings, texture wrapping modes, and viewport information.

Output Merger (OM)

The Output Merger page displays parameters such as blending setup, depth, and stencil states.

Device

The Device page displays details about the architecture that was used.

Present

The Present page displays information about back buffers that were used.

OpenGL API Inspector

When using the Frame Debugger feature of Nsight Graphics, you may wish to do a deep dive into the specific draw calls in order to analyze your application further. There are three different categories of API Inspector navigation.

Pipeline Stages

The first category is laid out like a "virtual GPU pipeline." This pipeline section of the API Inspector consists of the following:

  • Vtx Spec (Vertex Specification) — State information associated with your vertex attributes, vertex array object state, element array buffer, and draw indirect buffer.
  • VS (Vertex Shader) — Vertex shader state, including attributes, samplers, uniforms, etc.
  • TCS (Tessellation Control Shader) — Tessellation control shader state, including attributes, samplers, uniforms, control state, etc.
  • TES (Tessellation Evaluation Shader) — Tessellation evaluation shader state, including attributes, samplers, uniforms, evaluation state, etc.
  • GS (Geometry Shader) — Geometry shader state, including attributes, samplers, uniforms, geometry state, etc.
  • XFB (Transform Feedback) — Transform feedback state, including object state and bound buffers.
  • Raster (Rasterizer) — Rasterizer state, including point, line, and polygon state, culling state, multisampling state, etc.
  • FS (Fragment Shader) — Fragment shader state, including attributes, samplers, uniforms, etc.
  • Pix Ops (Pixel Operations) — State information for pixel operations, including blend settings, depth and stencil state, etc.
  • FB (Framebuffer) — State of the currently drawn framebuffer, including the default framebuffer, read buffer, draw buffer, etc.

Object and Pixel State Inspectors

The object and pixel state inspectors section of the API Inspector consists of the following:

  • Textures — Details about all of the currently bound textures and samplers, including texture and sampler parameters.
  • Images — Details about all of the images currently bound to the image units.
  • Buffers — Details about all of the bound buffer objects, including size, usage, etc.
  • Program — Information about the currently bound program object and/or program pipeline object, including shaders, active uniforms, etc.
  • Pixels — Current settings for pixel pack and unpack state.

Miscellaneous

The miscellaneous screen contains additional information such as shader limits, implementation dependent values, transform feedback limits, and various minimum/maximum values.

Vulkan API Inspector

The API Inspector view has an API-specific pipeline navigator that allows you to select a particular group of state within the GPU pipeline. From here, you can inspect the API state for each stage, including what textures and render targets are bound, or which shaders are in use and their related constants. Note that if a stage is not active (either nothing is bound to that stage or it doesn’t apply to the current action), it will be greyed out, but you can still click on it to inspect the state.

Pipeline Stages

The following table shows the stages that are available for inspection:

  • Pipeline — Shows information about the currently bound pipeline object.
  • Render Pass — Shows information about the current render pass object.
  • FBO  — Shows information related to the Frame Buffer Object that is associated with the current render pass.
  • IA — The Input Assembler shows the layout of your vertex buffers and index buffers.
  • Viewport — Shows the current viewport and scissor information.
  • VS — Shows all of the shader resource views and constant buffers bound to the Vertex Shader stage.
  • TCS — Shows all of the shader resources associated with the Tessellation Control Shader stage.
  • TES — Shows all of the shader resources associated with the Tessellation Evaluation Shader stage.
  • GS — Shows all of the shader resource views and constant buffers bound to the Geometry Shader stage.
  • SO — Shows the resources bound for Stream Output.
  • Raster — Shows the Rasterizer State parameters, including culling mode, scissor and viewport rectangles, etc.
  • FS — Shows all of the shader resources associated with the Fragment Shader stage.
  • Pix Ops — Shows the Pixel Operations parameters, including depth/stencil, multi-sample, and blending states.
  • Compute — This shows all of the shader resource and unordered access views and constant buffers bound to the Compute Shader stage.
  • Misc - Shows miscellaneous information associated with the instance, physical devices, and logical devices.

Pipeline

The Pipeline page shows information about the currently bound pipeline object including: create info, pipeline layout, and push constant ranges.

Render Pass

The Render Pass page shows information about the current render pass including: clear values, attachments operations, and sub-pass dependencies.

Frame Buffer Object (FBO)

The Frame Buffer Object page shows information about the current frame buffer object including: the creation flags, image view attachments, and the current state of the associated textures.

Input Assembler (IA)

The Input Assembler page shows the layout of your vertex buffers and index buffers, as well as the vertex bindings and attribute information.

Shaders (VS, TCS, TES, GS, FS, CS)

The various shader pages display all of the shader modules, including: creation information, human-readable SPIR-V source, current push constants, currently bound descriptor sets, associated buffers, associated images and samplers, and associated texel buffer views for this stage.

Raster

The Raster page shows all rasterization information associated with the pipeline object, including: polygon modes, cull modes, depth bias, and line widths.

Pixel Operations (Pix Ops)

The Pixel Operations page displays information associated with the current pixel state including: depth/stencil state, multi-sample state, and blending state.

Miscellaneous Information (Misc)

The Miscellaneous Information page shows information related to the instance, physical device(s), logical device(s), and queue(s).

API Statistics View

The API Statistics View is a high-level view of important API calls, and includes information to help you see where GPU and CPU time is spent.

To access this view, go to Frame Debugger > API Statistics.

Current Target View

The Current Target view is used to show the currently bound output targets. This can be useful because it focuses in on the bound output resources, rather than having to search for them in the Resources view.

To access this view, go to Frame Debugger > Current Target.

Current Target will display thumbnails along the left pane for all currently bound color, depth, and stencil targets. This view will change as you scrub from event to event. All of the thumbnails on the left can be selected to show a larger image on the right. You can also click the link below each to open the target in the Resources View.

Event Viewer

The Events view shows all API calls in a captured frame. It also displays both CPU and GPU activity, as a measurement of how much each call "cost."

To access this view, go to Frame Debugger > Events.

To add context to each API call, the thread ID and object/context that made that call are offered. Nsight also supports application-generated object and thread names in these columns; see Naming Objects and Threads for guidance on the supported methods for setting these names.

Clicking a hyperlink in the Events column will bring you to the API Inspector page for that draw call.

Right-clicking on an event or a push/pop range in the Events column will allow you to profile that specific event or range with the Range Profiler.

You can select whether to view the events in a hierarchical or flat view. If multiple performance marker types are used, you can select the correct one, as well as varying levels of verbosity for the call display (variable + value, value, or none). You can also sort the events by clicking on any of the available column headers.

Filtering Events

There are two different ways to filter the events list.

  1. You can select one of the available predefined filters. These offer a set of valuable, built-in filters for events of interest. This is also a great way to learn about the various filtering expressions that are supported, as many of them demonstrate advanced filtering techniques.

  2. You can type in your own filter, which will narrow the list of events to those containing your search string. This filter may be plain text, a regular expression, or a JavaScript expression that does column-specific searches. Select a predefined filter to see examples of JavaScript expressions.

Regex Syntax

Regex filters use a Perl-compatible regular expression syntax. Here are some examples of common tasks and the expressions that achieve them:

Table 9. Example regex filtering expressions

  Task                                              Expression
  Search for a draw call                            Draw (or use the predefined filter)
  Match OpenGL binding calls                        glBind
  Match D3D AddRef or Release calls                 AddRef|Release
  Search for D3D methods that set constant buffers  [A-Z]{2,2}SetConstantBuffers
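
The expressions in Table 9 can be tried outside the tool. The following is a minimal sketch in Python, assuming that Python's re module behaves like the tool's Perl-compatible engine for these simple patterns and that a filter matches when the pattern occurs anywhere in the event text. The event strings are hypothetical examples, not captured output.

```python
import re

# Hypothetical event descriptions; a real capture will list different calls.
EVENTS = [
    "glBindTexture(GL_TEXTURE_2D, 3)",
    "glDrawArrays(GL_TRIANGLES, 0, 36)",
    "ID3D11DeviceContext::VSSetConstantBuffers(0, 1, buffers)",
    "ID3D11DeviceContext::PSSetConstantBuffers(0, 1, buffers)",
    "ID3D11Buffer::AddRef()",
    "ID3D11Buffer::Release()",
]

def filter_events(pattern, events=EVENTS):
    """Keep the events whose text contains a match for the pattern."""
    compiled = re.compile(pattern)
    return [event for event in events if compiled.search(event)]

if __name__ == "__main__":
    for pattern in ("Draw", "glBind", "AddRef|Release", "[A-Z]{2,2}SetConstantBuffers"):
        print(pattern, "->", filter_events(pattern))
```

Note that [A-Z]{2,2} matches exactly two uppercase letters, so it picks up the VS and PS prefixes of the constant buffer calls.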

JavaScript Syntax

JavaScript syntax enables complex evaluation of filtering expressions. The basic approach is to match a particular column of data against an expression. Columns are accessed via a $('ColumnName') expression; for example, a column titled "Description" is accessed via $('Description'). From there, you can build mathematical, logical, and text-matching expressions. The examples below demonstrate their usage:

Table 10. Example JavaScript filtering expressions

  Task                                            Expression
  Match against the description column for draw   /::Draw/.test($('Description'))
  Find events with non-zero GPU time              $('GPU ms') > 0
  Find odd events                                 ($('Event') % 2) == 1
  Find non-draw events with non-zero GPU time     /::Draw/.test($('Description')) != 1 && $('GPU ms') > 0
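
To make the evaluation model concrete, the sketch below mimics it in Python rather than in the tool's own JavaScript. Each event is a row, a hypothetical helper named col plays the role of $('ColumnName'), and a filter is a predicate evaluated once per row. The rows and their values are invented for illustration and do not come from a real capture.

```python
import re

# Hypothetical rows that use the column names from the examples above.
ROWS = [
    {"Event": 1, "Description": "ID3D11DeviceContext::Draw(36, 0)", "GPU ms": 0.42},
    {"Event": 2, "Description": "ID3D11DeviceContext::ClearRenderTargetView(rtv, color)", "GPU ms": 0.10},
    {"Event": 3, "Description": "ID3D11DeviceContext::IASetInputLayout(layout)", "GPU ms": 0.0},
    {"Event": 4, "Description": "ID3D11DeviceContext::DrawIndexed(6, 0, 0)", "GPU ms": 0.05},
]

def apply_filter(predicate, rows=ROWS):
    """Return the Event numbers of the rows for which the predicate holds."""
    kept = []
    for row in rows:
        col = row.__getitem__  # stand-in for the $('ColumnName') accessor
        if predicate(col):
            kept.append(row["Event"])
    return kept

def is_draw(col):
    return re.search(r"::Draw", col("Description")) is not None

def has_gpu_time(col):
    return col("GPU ms") > 0

def is_odd(col):
    return col("Event") % 2 == 1

def non_draw_with_gpu_time(col):
    return not is_draw(col) and has_gpu_time(col)
```

With these rows, apply_filter(non_draw_with_gpu_time) keeps only the clear call, because the state-setting call has no GPU time and the two draw calls are excluded by the first condition.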

Bookmarking

While filtering, you often want to keep certain items in view while you search for others. To prevent an event from being filtered out, right-click the event and select Toggle Bookmark.
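
The effect of a bookmark on a filtered list can be sketched as follows. This is an illustration only; it assumes that bookmarked events stay in capture order alongside the matching events, and the function and parameter names are hypothetical.

```python
def visible_events(events, matches_filter, bookmarked):
    """Return the events that remain listed while a filter is active.

    events: event ids in capture order.
    matches_filter: predicate that takes an event id.
    bookmarked: set of event ids that have a bookmark toggled on.
    """
    return [e for e in events if e in bookmarked or matches_filter(e)]
```

For example, with events 1 through 5, a filter that keeps even ids, and a bookmark on event 3, the visible list is 2, 3, 4.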

If you wish to see the filtered results on the scrubber, you can select the tag button to the right of the filter toolbar, and a new row will appear in the Scrubber that displays your filtered events, allowing you to navigate those events in isolation.

Perfmarkers

On the Events page, you can use the hierarchical view to see a tree view of perf markers. The items listed in the drop-downs correspond with the nested child perf markers on the Scrubber.

If you use the flat view on the Events page, the perf marker won't be nested, but you can hover your mouse over the color-coded field in the far left column, which allows you to view the details about that perf marker.

When an application uses multiple kinds of perf markers, the Marker API option lets you select which API's markers are displayed. This situation may arise if the application uses middleware, for example, or mixes components with different marker strategies.

Geometry View

The Geometry view takes the state of the Direct3D, OpenGL, or Vulkan machine, along with the parameters for the current draw call, and shows pre-transformed geometry.

To access this view, go to Frame Debugger > Geometry.

There are two views into this data: a graphical view and a memory view.

Graphical Tab

Attribute Options

  • Position — Specifies the vertex attribute to use for positional geometry data.
  • Color — Specifies how to color the geometry. If Diffuse Color is selected, the selected diffuse color swatch will be used for coloring. If a vertex attribute is selected, the selected attribute will be used for per-vertex coloring.
  • Normal — Specifies the per-vertex normal. This selection applies when using a shade mode that specifies Normal Attribute or when rendering normal vectors.

Rendering Options

Clicking Configure in the bottom right corner of the Geometry View will open up the rendering options menu.

  • Reset Camera — Resets the camera to its default orientation. By default, the viewer bounds all geometry with a bounding sphere for optimal orientation.
  • Render Mode — Determines how to render and rasterize geometry.
    • Solid: renders filled geometry.
    • Points: renders a vertex point cloud.
    • Wireframe: renders a wireframe of the geometry.
    • Wireframe + Solid: renders filled geometry with a wireframe on top of it.
  • Shade Mode — Specifies the lighting mode of the rendered image.
    • Selected Color Attribute: Shades with the specified color attribute.
    • Flat Shading Using Generated Normals: Renders the geometry using flat shading with calculated normals.
    • Flat Shading Using Normal Attribute: Renders the geometry using flat shading with the specified Normal Attribute.
    • Smooth Shading Using Normal Attribute: Renders the geometry using smooth shading with the specified Normal Attribute.
  • Render Normal Vectors — Renders the specified normal attribute as a vector pointing from each vertex. The vector may be colored by the Normal Color selection and may be scaled by the Normal Scale selection.

Memory Tab

The Memory tab of the Geometry View shows the contents of the vertex buffer, as interpreted by the current vertex or input attribute specification. This view is useful for seeing the raw data of your draw call. An additional capability of this view is that it highlights invalid or corrupt vertices to streamline finding problematic data.

There are two modes of display for the geometry data:

  1. Index Buffer Order shows the vertices as indexed by the current index buffer and current draw call.

  2. Vertex Buffer Order shows the vertices as linearly laid out from the start of the vertex buffer and draw call specification.
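
The difference between the two display orders can be sketched in Python. The buffers below are hypothetical, and the parameter names (first_vertex, first_index, base_vertex) are assumptions modeled on common draw call arguments rather than names taken from the tool.

```python
# Hypothetical vertex buffer: one (x, y, z) position per vertex.
VERTICES = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
# Hypothetical index buffer describing two triangles that share an edge.
INDICES = [0, 1, 2, 0, 2, 3]

def vertex_buffer_order(vertices, first_vertex=0, vertex_count=None):
    """Rows laid out linearly from the start of the draw's vertex range."""
    end = len(vertices) if vertex_count is None else first_vertex + vertex_count
    return vertices[first_vertex:end]

def index_buffer_order(vertices, indices, first_index=0, index_count=None, base_vertex=0):
    """Rows in the order the draw call fetches them through the index buffer."""
    end = len(indices) if index_count is None else first_index + index_count
    return [vertices[base_vertex + i] for i in indices[first_index:end]]
```

Vertex buffer order lists each of the four vertices once, while index buffer order lists six rows, repeating the vertices that the two triangles share.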

Range Profiler

The Range Profiler is a powerful tool that can help you determine how sections of your frame utilize the GPU, and give you direction to optimize the rendering of your application. Once you have captured a frame, the Range Profiler displays your frame broken down into a collection of ranges, or groups of contiguous actions. For each action, you can see the GPU execution time, as well as detailed GPU hardware statistics across all of the units in the GPU. The Range Profiler also includes data mining capabilities that allow you to group calls in the frame into ranges based on criteria that you choose.

To access this view, go to Frame Debugger > Range Profiler.

NOTE: Under certain conditions, the Range Profiler pane may be disabled and display one of the following messages.

Hardware signals are not supported in this configuration

This message could be due to one of the following reasons:

  1. You are running Nsight Graphics with a Kepler or older GPU.
  2. You are using a defunct or non-NVIDIA GPU.
  3. You are attempting to profile an application with a debug or validation layer enabled.

No hardware signals found for this API/GPU combination

This message is likely to occur when you are running Nsight Graphics on a non-MSHybrid laptop.

Sections

The Range Profiler is split up into 3 sections: Range Info, Pipeline, and Memory. Each of these sections has a combo box on the right side of the section header that allows you to choose the different visualizations available for displaying the data: Summary, Range Diagram, Range Table, Timeline, Action Table, and Action Chart (depending on the section). These will be explained further in the corresponding sections below.

Range Info

The Range Info section gives you basic information about the selected range, split up with the draw calls on the left-hand side, and the compute dispatches on the right-hand side. For the draw calls, there is the number of calls in the range as well as the number of primitives and pixels rendered, both total and average per draw call. On the compute side, there is similarly the number of calls, as well as thread and instruction counts, both total and average.

The combo box in this section has 5 entries: Summary, Action Table, Timeline, Range Chart, and Action Chart. The Action Table shows all of the values in the Summary, but in a table format and measured per action. The Timeline shows a graph of the GPU time for each action (draw call or dispatch) in the selected range. This will look similar to the Range Selector, but narrowed down to actions in the currently selected range. The Range and Action Charts allow you to take any 4 values and show them in a graphical chart display. The Range version shows a single value per metric for the current range, and the Action version shows the value measured for each action in the selected range.

Pipeline

The Pipeline section gives an overview of how the selected ranges utilized the GPU. It does this by calculating two metrics for each of the units in the GPU pipeline: Speed of Light (or SOL) and Busy percentages.

Speed of Light (SOL): This metric gives an idea of how close the workload came to the maximum throughput of one of the sub-units of the GPU unit in question. The idea is that, for the given amount of time the workload was in the GPU, there is a maximum amount of work that could be done in that unit. These values can include attributes fetched, fragments rasterized, pixels blended, etc. Any value less than 100% indicates that the unit did not process the maximum amount of work possible.

Busy: The busy value gives an idea of what percentage of time the given unit was actively working. While waiting for work to come down the pipeline, or after the work is processed, the unit may be idle, resulting in a busy value less than 100%.

The general rule when looking at GPU performance is to look for units that have a low SOL (work not being performed at peak efficiency) and a medium to high Busy percentage (a lot of work to do). In this case, you will want to look at the work being done to see if it can be accomplished more efficiently. Examples of this on the shader unit include time spent waiting for memory or the texture unit to return, or overall low occupancy because of high local memory or register usage. Another case is low SOL and low Busy. In this situation, the unit was not busy, so it doesn’t matter as much whether the work was done as efficiently as possible. In fact, the workload may not have been sufficient to fully populate that GPU sub-unit. Finally, a high SOL with a high Busy should be viewed a little differently. Unlike the first case, the task is not to make the unit more efficient, since the work was done very efficiently. In this instance, you want to try to reduce the amount of work being done, in order to increase overall pipeline throughput.
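
The three cases described above can be written as a small decision function. This is a sketch only: the text gives no numeric cutoffs for "low", "medium", or "high", so the two thresholds below are illustrative assumptions and not values used by the tool.

```python
def triage_unit(sol_pct, busy_pct, high_sol=80.0, busy_floor=50.0):
    """Map a unit's SOL and Busy percentages to one of the cases in the text.

    high_sol and busy_floor are assumed cutoffs chosen for illustration.
    """
    busy = busy_pct >= busy_floor
    if busy:
        if sol_pct >= high_sol:
            return "reduce work"         # efficient and busy: do less work in this unit
        return "improve efficiency"      # busy, but far from peak throughput
    if sol_pct < high_sol:
        return "low priority"            # mostly idle: efficiency matters less here
    return "unclassified"                # high SOL with low Busy is not covered by the rule
```

For example, a unit at 30% SOL and 90% Busy falls into the "improve efficiency" case, while a unit at 95% SOL and 95% Busy falls into the "reduce work" case.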

The Pipeline section has 4 visualization settings.

  1. Summary: This shows the 4 GPU sub-units for the selected range with the lowest Speed of Light values. This gives a quick idea of where the opportunities for optimization may be.
  2. Range Diagram: The Range Diagram shows all GPU sub-units, laid out in pipeline order, with an indicator of the inefficiency factor. If the factor gets over 70%, there will be a yellow outline on the unit, and a red one will appear if the factor gets over 90%.
  3. Range Table: This table puts profiler values for the range in a spreadsheet layout, including all of the constituent metrics for values like the inefficiency factors.
  4. Action Table: Similar to the Range Table, this visualization shows the same values, but measured per draw call and dispatch. This can be helpful to dig from the range into the calls that make it up, looking at the details of the call and how it utilizes the various units of the GPU.
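
Two of the rules in this list are simple enough to sketch in Python: the Summary picks the sub-units with the lowest SOL values, and the Range Diagram outlines a unit according to its inefficiency factor. The unit names used with these functions are placeholders, and how the tool breaks ties or computes the inefficiency factor is not specified here.

```python
def summary_units(sol_by_unit, count=4):
    """Names of the sub-units with the lowest Speed of Light values."""
    return sorted(sol_by_unit, key=sol_by_unit.get)[:count]

def outline_color(inefficiency_pct):
    """Outline drawn around a unit for a given inefficiency factor."""
    if inefficiency_pct > 90:
        return "red"
    if inefficiency_pct > 70:
        return "yellow"
    return None  # no outline at or below 70%
```

A factor of exactly 90% still yields a yellow outline in this sketch, since the text describes the red outline as appearing once the factor gets over 90%.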

Memory

The Memory section displays information about the L2 cache and Frame Buffer or memory unit. Each interface has a maximum throughput for a given amount of time. The memory section shows the percentage of the subsystem interfaces utilized for the current range.

The Memory section has 4 visualization settings.

  1. Summary: The summary focuses on the L2 cache, since L2 utilization and cache hit percentage are important for many of the units in the GPU. Memory locality, especially in the shader and texture units, is essential for reducing latency. Reading from and writing to the memory subsystem, whether for shader calculations or for blending fragment values into the frame buffer, consumes this limited resource.
  2. Range Diagram: The range diagram shows all the GPU units that consume memory, how they are connected to each other in the unit-cache-frame buffer hierarchy, and the percentage utilized of each interface. The L2 unit also has the hit rate overlaid.
  3. Range Table: Like the range table in the Pipeline section, this displays a spreadsheet with all values in tabular form.
  4. Action Table: The action table section measures and displays the interface utilization for every draw call and dispatch in the range. This can be used to look for particularly inefficient users of the L2 cache, for instance.

Scrubber

The Nsight Graphics Frame Debugger has two parts. One part appears as the Frame Debugger window on the host. The other part appears as a Heads-Up Display (HUD) on the target application.

To access this view, go to Frame Debugger > Scrubber.

The part of the Frame Debugger that appears as a HUD on the target machine consists of the following:

  • HUD Toolbar — controls the frame capture, along with a number of other options (help, etc.).
  • Frame Scrubber — indicates the current draw event. There is a scrubber view in the Frame Debugger on the host, as well as a frame scrubber on the HUD. The frame scrubber controls stay in sync with each other, meaning that when you move the controls on one, it affects the other. For example, if you move the frame scrubber on the HUD to highlight a new draw event, the scrubber on the Frame Debugger moves in sync to do likewise.

Understanding the Frame Scrubber

For the sake of discussion when it comes to graphics debugging, it helps to note some common terminology.

  • An event is a single call to the API. It could be a triangle draw call, or backbuffer clear, or a less obvious call, like configuring buffers. A snapshot is a sequence of events.
  • An action is a subset of the event types. It can be one of the following: (1) Draw Call, (2) Clear, or (3) Dispatch. Actions are interesting since they explicitly change data which may result in visual changes.

NOTE for Direct3D frame debugging: The Direct3D runtime documentation states that, "the return values of AddRef & Release may be unstable and should not be relied upon." The Nsight Graphics Frame Debugger will also take additional references on objects so any code that relies on an exact reference count at a particular time may fail. In general, users should not expect an exact reference count to be returned from the Direct3D runtime. For more information, see Microsoft's Rules for Managing Reference Counts.

When you debug your graphics project, the Scrubber window shows the perf markers you implemented. When working with user-defined markers, the Scrubber window will use the color and label that you defined for the perf marker.

On the Scrubber, you can select one performance marker and it will automatically create a range of all of the draw calls that occurred within that time frame. Clicking on it again will cause the scrubber to automatically zoom to that range of events. You can zoom in on a nested/child marker the same way.

To zoom out, click the parent performance marker, or use CTRL + mouse wheel.

Performance markers are also displayed on the HUD, color-coded the same way that they are on the Scrubber. However, on the HUD, the information is condensed, and you must hover your mouse over the selected performance marker to get its details.

The default view will show the events in your application, in addition to any performance markers you have defined. Clicking the Add... button will open a dialog that allows you to select what type of range you want to add.

  • Program Ranges — Actions that use the same shader program.
  • Viewport — Actions that render to the same viewport rectangle.
  • Alpha Blending Enabled — Actions that have alpha blending enabled.
  • Alpha Test Enabled — Actions that have alpha test enabled.
  • Back Face Cull Enabled — Actions that have back face cull enabled.
  • User — A range defined by you on the fly. Use SHIFT + left-click and drag the scrubber on the created "User" row to create a new range.
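
The range types above all group actions by a shared piece of state. A minimal Python sketch of that grouping follows, assuming that a range covers consecutive actions with the same value, in line with the description of ranges as groups of contiguous actions. The field names and the action records are hypothetical.

```python
from itertools import groupby

def contiguous_ranges(actions, field):
    """Group consecutive actions that share the same value of a state field.

    actions: list of dicts in capture order, each with an "event" id.
    field: name of the state to group by, for example "program".
    Returns (value, first event id, last event id) tuples.
    """
    ranges = []
    for value, group in groupby(actions, key=lambda action: action[field]):
        members = list(group)
        ranges.append((value, members[0]["event"], members[-1]["event"]))
    return ranges
```

Because only consecutive actions are merged, a shader program that is used, replaced, and then used again produces two separate ranges in this sketch.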

Right-clicking on a specific action in the Scrubber will allow you to open the API Inspector for that action, change your view settings, or initiate a profile session with the Range Profiler.

Scrubber View Options

From the Mode drop-down menu, choose one of the following:

  • Event ID -- Unit Scale is the default view, which simply shows the actions and events on the timeline.
  • Sequence ID -- Unit Scale shows the sequence of events on the timeline.
  • Event ID -- GPU Time Scale displays the GPU activity and how much each event or action cost the GPU.
  • Event ID -- CPU Time Scale displays the CPU activity and how much each event or action cost the CPU.
  • Event ID -- X by CPU, Y by GPU displays the CPU time scale on a horizontal X-axis, and the GPU time scale on a vertical Y-axis.

Depending on which mode you select, you can also select whether you want to view the ruler relative to the capture, viewport, or cursor.

From the Hierarchy drop-down, Queue Centric sorts the events by queue, while Thread Centric sorts the events by the thread.

Using Hotkeys to Scrub Through a Frame

When the scrubber has focus, you can use the following hotkeys to move the scrubber cursor from one event to another.

For the purpose of moving the scrubber cursor, the following are considered action events:

  • Draw methods
  • Clear methods
  • Dispatch methods
  • Present methods

For example, if you are looking for the next draw method that was called, you can press the CTRL + RIGHT ARROW on the keyboard to skip over events that are not typically of interest, and only stop on events that are considered action events.
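
The skipping behavior described above can be sketched as a search for the next action event. The event records and the kind labels below are hypothetical; only the four categories of action events come from the list above.

```python
# Kinds of events that the scrubber cursor stops on when skipping.
ACTION_KINDS = {"draw", "clear", "dispatch", "present"}

def next_action(events, cursor):
    """Index of the first action event after cursor, or None if there is none.

    events: list of (name, kind) tuples in capture order, where kind is a
    hypothetical label such as "draw" or "state".
    """
    for index in range(cursor + 1, len(events)):
        if events[index][1] in ACTION_KINDS:
            return index
    return None
```

Starting from a draw call, the search passes over any state-setting events and stops at the following clear, dispatch, draw, or present.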

Resources View

The Resources View allows you to see all of the available resources in the scene.

To access this view, go to Frame Debugger > Resources.

There are two tabs available here:

  1. Graphical
  2. Memory

At the top of the Resources view, you'll find a toolbar:

  • Clone — makes a copy of the current view, so that you can open another instance.
  • Lock — freezes the current view so that changing the current event does not update this view. This is helpful when trying to compare the state of a resource at two different actions.
  • Save — saves the captured resources to disk.
  • Red, Green, and Blue — toggles on and off specific colors.
  • Alpha — enables alpha visualization. In the neighboring drop-down, you can select one of the following two options:
    • Blend — blends the alpha with a checkerboard background.
    • Grayscale — alpha values are displayed as grayscale.
  • Flip Image — inverts the image of the resource displayed.

Below the toolbar is a set of buttons, described below, for high-level filtering of the resources based on type. Next to that, there is a drop-down menu that allows you to select how you wish to view the resources: thumbnails, small thumbnails, tiles, or details.

If you select the Details view, you can sort the resources by the available column headings (type, name, size, etc.).

The Graphical tab allows you to inspect the resource, pan using the left mouse button to click and drag, zoom using the mouse wheel, and inspect pixel values. Also, this is where you can save the resource to disk. If supported on your GPU and API, this is also where you can initiate a Pixel History session to get all of the contributing fragments for a given pixel.

When you have selected a buffer from the left pane, the Show Histogram button will be available on the right side of the Graphical tab, which allows for remapping the color channels for the resource being viewed.

To modify the histogram view, the following options are available:

  • You can set the minimum and maximum cutoff values via the sliders under the histograms, or by typing in values in the Minimum and Maximum boxes.
  • You can change the scale by using the Log button.
  • The Luminance button allows you to visualize luminance instead of color values.
  • The Normalize button can preset the minimum and maximum values to the extents of the data in the resource.
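
The effect of the minimum and maximum cutoffs can be sketched as a clamp followed by a rescale. This assumes a linear mapping to the range 0 to 1; the tool's actual mapping, in particular with the Log option enabled, may differ.

```python
def remap(values, minimum, maximum):
    """Clamp each value to [minimum, maximum] and rescale it to [0, 1]."""
    span = maximum - minimum
    if span <= 0:
        raise ValueError("maximum must be greater than minimum")
    return [(min(max(v, minimum), maximum) - minimum) / span for v in values]

def normalize_cutoffs(values):
    """Cutoffs that cover exactly the extents of the data."""
    return min(values), max(values)
```

Using normalize_cutoffs first and then remap spreads the data across the full displayable range, which is the purpose of the Normalize button as described above.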

The Memory tab shows a dump of the resource data.

You can use multiple options to configure how this memory is displayed:

  • The Axis drop-down changes between address (memory offset) and index (array element) views.
  • The Offset entry limits the view to an offset within the given resource.
  • The Extent entry limits the view to a maximum extent within the given resource.
  • The Precision spin box controls the number of decimal places to show for floating point entries.
  • The Hex Display toggles between decimal (base-10) and hex (base-16) display formats.
  • Hash shows a hash value representative of the given memory resource within the current offset and extent. This is useful for comparing memory objects or sub-regions.
  • The Transpose button swaps the rows and columns of the data representation.
  • The Configure button opens the Structured Memory Configuration dialog.
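
The Offset, Extent, Hash, and Transpose options can be sketched in Python as operations on a byte string. The hash algorithm used by the tool is not stated here, so SHA-256 is only a stand-in, and the function names are hypothetical.

```python
import hashlib

def view_window(data, offset=0, extent=None):
    """Bytes shown when the view is limited by an offset and an extent."""
    end = len(data) if extent is None else offset + extent
    return data[offset:end]

def window_hash(data, offset=0, extent=None):
    """Digest of the visible window (SHA-256 is an assumed stand-in)."""
    return hashlib.sha256(view_window(data, offset, extent)).hexdigest()

def transpose(rows):
    """Swap the rows and columns of a rectangular table."""
    return [list(column) for column in zip(*rows)]
```

Comparing window_hash values for the same offset and extent in two resources indicates whether those sub-regions hold identical bytes, even when the rest of the resources differ.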

Filtering

There are three ways to filter the available resources.

  1. For high-level filtering, there are color coded buttons to filter based on resource type. All resource types are visible by default, and you can filter the resource list by de-selecting the button for the type you don't want to see. For example, if you'd like to see only textures, you can click the other buttons to de-select them and remove them from the list of resources.

  2. You can manually type in a search string to filter the list of resources.
  3. You can choose from the drop-down of predefined filters to view only large resources, depth resources, unused resources, or resources that change in the frame. Selecting one of these will fill in the JavaScript string necessary for the requested filter, which is also useful as a basis to construct custom filters.

Pixel History

Pixel history enables the automatic detection of the draw, clear, and data-update events that contributed to the change in a pixel's value. In addition, pixel history can identify the fragments that failed to modify a particular texture target, allowing you to understand why a draw might be failing, such as whether you may have misconfigured API state in setting up your pipeline.

To run a pixel history test, click the button and select a pixel to run the experiment on. The Pixel History view will come up with a loading bar and present the results once they are complete.

Structured Memory Configuration

The Structured Memory Configuration dialog allows the user to specify a data layout to interpret the raw data backing the selected resource. For example, a texture may be represented by its color channels or a uniform buffer may be represented by the various types packed within that buffer.

Typing in a valid structure definition will automatically update the viewer to respect the configuration.

New columns can be created using a simple C-like syntax.

            int;      // creates a column with an anonymous int
            int x;    // creates a second column with an int named x
            float y;  // creates a third column with a float named y

Additional user types can be defined as follows:

            struct MyType{ int x; float y;};
            struct MyOtherType{ MyType z; double u; };

Many common sized, unsized, and normalized types are permitted as valid types. Vector and matrix types are provided in a similar syntax to HLSL and GLSL. The full list of supported types can be browsed and searched by clicking on the expandable "Defined Types" sub-section of the configuration dialog.

Some additional notes on the parser:

  • Full C/C++ grammar is not supported.
  • Single-line comments are accepted; C-style block comments (/* */) are not.
  • Macros are not currently supported.
  • Alignments are not considered; all types are considered packed.
  • To add explicit padding, use padN where N is a multiple of 8.
  • Members can be selectively hidden as well, which can be useful for narrowing your data.
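
Because all types are treated as packed, a structure definition maps directly onto consecutive bytes. The sketch below uses Python's struct module to show that interpretation for the "int x; float y;" columns. It assumes a 32-bit int, a 32-bit float, a 64-bit double, and little-endian byte order; none of these are stated in this section.

```python
import struct

# Packed layout for the columns "int x; float y;": 4 + 4 bytes, no padding.
# The "<" prefix selects little-endian order with no alignment (an assumption).
ROW = struct.Struct("<if")

# struct MyOtherType{ MyType z; double u; } packs to 4 + 4 + 8 = 16 bytes.
OTHER = struct.Struct("<ifd")

def decode_rows(raw):
    """Interpret raw bytes as consecutive packed (x, y) rows.

    Trailing bytes that do not fill a whole row are ignored.
    """
    usable = len(raw) - len(raw) % ROW.size
    return [ROW.unpack_from(raw, offset) for offset in range(0, usable, ROW.size)]
```

Each decoded tuple corresponds to one row of the viewer, with one column per member of the definition.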

When clicking on a texture resource, the configuration is automatically populated to interpret the channels of that format.

Similarly, buffers are defaulted to a generic byte configuration. A user can typically interpret this buffer data by examining the specific use case. For example, the layout of a vertex buffer can be seen in the Input Assembler section of the API Inspector view, or a uniform buffer can be interpreted by looking at the data layout specified within the shader source.

To persist a configuration, you can click on the Save... button to assign a name to this configuration.

Later, you can restore this configuration by clicking on the Load... button.

Linked Programs View

The Linked Programs View lists all of the shaders in your application.

To access this view, go to Frame Debugger > Linked Programs.

  • If the shader (or its parent program or pipeline object) hasn’t been used by the application yet, it will show up with the symbol in the Status column.
  • If the shader has been used, but the statistics are being calculated, the symbol will be displayed in the Status column.

For programs or pipeline objects, you can view the individual shaders by pressing the ► button to the left of the program/pipeline name. The list also contains a number of statistics:

Name

This is the name of the shader. This name is either generated internally, or can be assigned by the user per API.

Status

This column displays the current status of the shader. The status includes Source or Binary, to denote whether or not source code is available for this shader. Also, if the µCode text is included, this means that we have driver level binary code that is necessary for gathering shader performance metrics.

One symbol means that we are waiting for the shader to be bound by the application.

Another symbol means that shader performance metrics are currently being computed.

Cycles

This value is the absolute cost for a single primitive (vertex, tessellation control point, fragment, etc.) to execute through the shader. The value takes into account latency for memory accesses, but it does not take branching or loops into account. The values are summed up at the program level to show the absolute cycle count for a single fragment to be rendered on the screen.

Avg Cycles

Since primitives are submitted in large groups, this gives an average cycle cost for a single primitive, assuming it is submitted as part of a larger block of work (for instance, many fragments from the same object with the same shader). The values are summed up at the program level.

ALU/TEX Inst Ratio

This gives the ratio of ALU to texture instructions for the shader. The values are averaged at the program level.

ALU/TEX Cycle Ratio

Since not all ALU calls are the same cost in terms of cycles, this value gives the ratio of cycles spent in ALU versus texture instructions. The values are averaged at the program level.

Regs

This column gives the number of registers used by the program. Register count impacts occupancy/threads in flight so if the value gets too high you will get closer to the Cycles value than the Avg Cycles value.

LMem (Bytes)

This is the number of bytes of local memory used by the shader. Similar to registers, this can impact occupancy and contribute to a lower overall throughput of primitives running this shader.

NOTE: Shader µCode, and thus shader performance metrics, are only supported for Direct3D 11, Direct3D 12, and OpenGL. Vulkan support will be added in a future release.
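
The program-level roll-up described for Cycles, Avg Cycles, and the two ALU/TEX ratios can be sketched as follows. The per-shader numbers are made up, the dictionary keys are hypothetical, and the "average" is assumed to be a simple arithmetic mean, which the text above does not specify.

```python
from statistics import mean

def program_totals(shaders):
    """Combine per-shader statistics into program-level values.

    Cycle counts are summed; ALU/TEX ratios are averaged (assumed to be
    an unweighted arithmetic mean).
    """
    return {
        "cycles": sum(s["cycles"] for s in shaders),
        "avg_cycles": sum(s["avg_cycles"] for s in shaders),
        "alu_tex_inst_ratio": mean(s["alu_tex_inst_ratio"] for s in shaders),
        "alu_tex_cycle_ratio": mean(s["alu_tex_cycle_ratio"] for s in shaders),
    }

# Made-up statistics for a vertex and a fragment shader in one program.
totals = program_totals([
    {"cycles": 40, "avg_cycles": 10, "alu_tex_inst_ratio": 4.0, "alu_tex_cycle_ratio": 2.0},
    {"cycles": 120, "avg_cycles": 30, "alu_tex_inst_ratio": 2.0, "alu_tex_cycle_ratio": 1.0},
])
```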

Selected Program(s) & Shader(s)

This pane will show extended shader performance information about the currently selected shader. If multiple shaders are selected, then you can view and compare the statistics between the selected shaders.

Acceleration Structure View

The Acceleration Structure View shows the geometry that has been specified in build commands when running an application that uses ray tracing APIs. If the application does not use these APIs, the view will not be available.

To access this view, go to Frame Debugger > Acceleration Structure.

The view is dual-paned: it shows a hierarchical view of the acceleration structure on the left, with a graphical view of the structure on the right.

Within the hierarchy of the acceleration structure view, the top-level acceleration structure (TLAS) and bottom-level acceleration structure (BLAS) are presented. When a particular TLAS or BLAS is selected, the name, flags, and other metadata for this structure are listed in a section on the bottom left-hand side. Each item within the tree has a checkbox that allows for cycling through one of full rendering, wireframe rendering, or no rendering of that level of the hierarchy.

Below the rendering pane, information on the camera position and direction is presented. Each of these controls is editable to navigate the scene. The view uses WASD or up, down, left, right keys to change the position. Holding Shift while navigating increases the navigation speed. Clicking with the mouse and dragging allows for additional navigation. Additionally, items in the hierarchy can be double-clicked to place that item in view. To vertically flip the camera, select the double-arrow button. To reset the camera at any time, click Reset Camera.

VR Inspector View

The VR Inspector view allows you to inspect how your application is using VR APIs. It will be available when an application is captured with a supported API. Supported APIs include Oculus (LibOVR) and OpenVR.

To access this view, go to Frame Debugger > VR Inspector.

Once opened, this view is context-specific to the VR API in use. See the sections below for a discussion on each API.

Oculus (LibOVR)

With the Oculus API, the sections of the VR Inspector view include the following:

  • Swap Chains — Lists all swap chains and their associated texture resources and description fields, with links to the Resources View for inspection
  • Mirror Textures — Lists all mirror textures and description fields with Resources View links for the associated texture(s)
  • Render Desc Queries — Shows all of the calls to ovr_GetRenderDesc, along with the parameters, to confirm that the proper eyes, FOV values, etc. are correct
  • HMD Description — Gives details on the actual HMD device connected to the machine and all of the limits for that device

OpenVR

When using OpenVR, you will see the following in the VR Inspector view:

  • Show API Usage — Brings up the Events List view filtered by OpenVR calls
  • OpenVR Version — In the top left, under Show API Usage, the minimally compatible version of OpenVR you are using will be displayed. This may be lower than the version your application has targeted, due to the fact that it may not be using any features of later API versions.
  • Mirror Textures — Lists all mirror textures and description fields with Resources View links for the associated texture(s)

The following sections return interface dependent information:

  • VRSystem — displays the render target size
  • VRSystem Tracked Devices — displays information for each tracked device currently connected
  • VRSettings — displays all of the VRSettings properties
  • VRChaperone — displays the play area information
  • VRCompositor — displays rendering and compositing statistics

D3D12 Specific Views

D3D12 Descriptor Heaps

The Descriptor Heaps view displays all of the descriptor heaps bound for the current event.

To access this view, go to Frame Debugger > Descriptor Heaps.

On the left are the descriptor heaps available, and on the right you can view the properties of each descriptor heap. Along the top of the details pane, you can see how populated the descriptor heap is, as well as the maximum contiguous valid and invalid ranges. These properties can help you dive into each descriptor heap, and use it as a diagnostic tool to find any potential bugs in your application.
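
The population and contiguous-range properties mentioned above can be illustrated with a short Python sketch. It assumes that a heap can be modeled as one boolean per descriptor slot (valid or invalid); this model and the function name are illustrative only and do not reflect how the tool computes the values.

```python
from itertools import groupby

def heap_summary(valid_flags):
    """Summarize a descriptor heap given one boolean per descriptor slot."""
    total = len(valid_flags)
    populated = sum(1 for v in valid_flags if v)
    longest = {True: 0, False: 0}
    # groupby yields consecutive runs of equal values (valid or invalid).
    for is_valid, run in groupby(valid_flags):
        longest[is_valid] = max(longest[is_valid], sum(1 for _ in run))
    return {
        "populated_fraction": populated / total if total else 0.0,
        "max_contiguous_valid": longest[True],
        "max_contiguous_invalid": longest[False],
    }

summary = heap_summary([True, True, False, True, True, True, False, False])
```

A heap with a low populated fraction but a long contiguous invalid range may point to descriptors that were never written, which is the kind of issue these properties help surface.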

Note that if you click the hyperlink in the Resources column, it will bring up the Resources view.

D3D12 Heaps View

The Heaps view provides a list of all heaps created by the application, along with detailed information about the resources contained in each heap.

To access this view, go to Frame Debugger > Heaps.

When you select a heap from the left pane, you will see one of two types of entries: Placed Resources or Tiles. Clicking the hyperlink in the Placed Resources box will take you to the Resources Graphical tab.

Tiles are used to populate sections of a tiled resource.

The right side of the Heaps view displays the memory data associated with the selected resource, which can also be seen on the Memory tab of the Resources view.

Heap Map

The Heap Map shows a high-level layout of how the heap is currently being used. You can view the usage either by Type (for example, Buffer, Texture2D, etc.) or by the name of the Resource.

The Heap Map shows any overlapping regions within the heap.
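
As a sketch of what an overlapping region means, the following Python example finds placed resources whose byte ranges intersect within a heap. The resource names, offsets, and sizes are hypothetical.

```python
def find_overlaps(resources):
    """Return pairs of resource names whose byte ranges overlap.

    `resources` is a list of (name, offset, size) tuples.
    """
    ordered = sorted(resources, key=lambda r: r[1])
    overlaps = []
    for i, (name_a, off_a, size_a) in enumerate(ordered):
        end_a = off_a + size_a
        for name_b, off_b, _ in ordered[i + 1:]:
            if off_b >= end_a:
                break  # sorted by offset, so no later resource can overlap
            overlaps.append((name_a, name_b))
    return overlaps

# Hypothetical placed resources: (name, heap offset in bytes, size in bytes).
placed = [
    ("ShadowMap", 0, 65536),
    ("GBuffer0", 65536, 131072),
    ("ScratchBuffer", 98304, 65536),
]
aliased = find_overlaps(placed)
```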

D3D12 Root Parameters

The Root Parameters view displays all of the root parameters bound for the current event. This allows you to quickly change the state of what you're sampling from, constants, and other descriptors at a lightweight, faster rate than past APIs.

To access this view, go to Frame Debugger > Root Parameters.

The root signature displays the structure definition of what's bound at that moment. Root parameters fill in that structure with the values you're sampling from and the constants you're using.

When you select a root parameter on the left, the root arguments for that parameter are displayed on the right. This shows residency information; any invalid descriptors are displayed in red. Using root parameters as a diagnostic tool can help prevent a GPU fault.

Note that if you click the hyperlink in the Resources column, it will bring up the Resources view.

Vulkan Specific Views

Vulkan Descriptor Sets View

The Descriptor Sets view displays all of the descriptor sets currently allocated and bound by the application at the current event.

To access this view, go to Frame Debugger > Descriptor Sets.

The left pane displays a selectable list of descriptor sets along with their layout, pool, consumption counts, and dynamic offsets.

When a set is selected, the right pane will display the resources currently associated with this descriptor set, as well as information related to the pool from which this descriptor set was allocated. In addition, clicking on a resource within the descriptor set will display more detailed information about that specific resource.

Note that if you click the hyperlink in the Preview column, it will bring up the Resources view associated with this image or buffer.

Vulkan Device Memory View

The Device Memory view provides a list of all device memory allocated by the application, along with detailed information about the resources contained in each memory region.

To access this view, go to Frame Debugger > Device Memory.

The left-most pane contains information about all device memory objects currently allocated. Once a device memory object is selected, the contained resources will be listed in the middle pane, along with the resource layout map in the bottom left, and contained data on the right.

Vulkan Memory Pools

Vulkan Texture and Sampler Pools

The Texture and Sampler Pools View provides a visualization of these different pool types. This can be useful for determining if a particular set of resources is in the resource pools it is expected to be in. The left-hand side allows you to select the pool you're interested in, based on type. Included in the list are appropriate parameters about how the pool was created. On the right side is a list of the resource descriptors, some information about the resource itself, and a thumbnail preview. There is a link below the thumbnail that allows you to open that resource in the Resources View for deeper inspection.

To access this view, go to Frame Debugger > Texture and Sampler Pools.

Generate C++ Capture UI

Compiling and launching C++ captures

The additional features of an nsight-gfxcppcap file include:

  1. Screenshot of the capture taken from the original application
  2. Information about the captured application and its original system
  3. Statistics about the captured API stream
  4. Utilities to build the C++ capture without opening the generated Visual Studio project
  5. Utilities to launch the compiled application:
    1. The Execute button will launch the compiled executable.
    2. The Connect... button will populate a new connection dialog that allows you to run a specific activity on the generated capture.
  6. User comments that are persisted within this file.

GPU Trace UI

GPU Trace Capture File

The GPU Trace window is comprised of 3 sections:

  1. Scrubber
  2. Information Tabs
  3. Events table

GPU Trace Scrubber

The GPU Trace scrubber is the main component in which you can observe the captured frame data.

GPU Occupancy

The GPU Occupancy row shows the occupancy of the hardware stages, in terms of warps. This shows the total warps' execution on the GPU. The warps may be grouped and colored according to stages, markers, actions, or Command Lists. By default, the warps' color is determined by stage (e.g., Vertex, Geometry, Hull, Domain, Fragment, Pixel, Compute, and Async Compute shaders).

Note: A compute shader running on an asynchronous queue will display from top to bottom (the orange in the below picture). All other shaders, which run on the graphics queue, will display from bottom to top.

When hovering your mouse over the scrubber, a tooltip will appear that displays the percentage of the warps' occupancy by stage at that point in time.

GPCxTPC

The GPCxTPC row is a semantic grouping of the graphics processing units. Each GPU has multiple GPCs (Graphics Processing Clusters). Each GPC has numerous logical Texture Processing Clusters, or TPCs. The TPCs are numbered.

TPC#N blocks from each GPC in the GPU serve as a logical group for warp executions.

For example: if you have 1 GPU with 2 GPCs, and 3 TPCs per GPC, you’ll get the following logical groupings:

  • GPCxTPC0 holds GPC0TPC0, GPC1TPC0
  • GPCxTPC1 holds GPC0TPC1, GPC1TPC1
  • GPCxTPC2 holds GPC0TPC2, GPC1TPC2
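
The grouping in the example above can be reproduced with a few lines of Python. The function name is hypothetical; it only mirrors the naming scheme shown in the list.

```python
def gpcxtpc_groups(num_gpcs, tpcs_per_gpc):
    """Map each logical GPCxTPC group to the physical GPC/TPC pairs it holds."""
    return {
        f"GPCxTPC{tpc}": [f"GPC{gpc}TPC{tpc}" for gpc in range(num_gpcs)]
        for tpc in range(tpcs_per_gpc)
    }

# One GPU with 2 GPCs and 3 TPCs per GPC, as in the example above.
groups = gpcxtpc_groups(num_gpcs=2, tpcs_per_gpc=3)
```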

SM Active

The SM Active row shows a "flattened" view of the warps, in terms of total occupancy for each SM. If an individual SM has at least one warp in it, then it is counted as active. If all SMs have at least one active warp, then this graph shows a value at 100% of the height of the row. All graphics operations are shown in green, while all compute operations are shown in orange or pink.
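
As an illustration of how such a value could be derived, the sketch below counts an SM as active when it holds at least one warp and reports the active share as a percentage. The input format (one warp count per SM at a single point in time) is an assumption made for the example.

```python
def sm_active_percent(warps_per_sm):
    """Percentage of SMs that hold at least one warp."""
    if not warps_per_sm:
        return 0.0
    active = sum(1 for warps in warps_per_sm if warps > 0)
    return 100.0 * active / len(warps_per_sm)

# Four SMs, two of which have resident warps.
sample = sm_active_percent([4, 0, 1, 0])
```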

Graphics/Compute Idle

The Graphics/Compute Idle metric represents the percentage of cycles spent idle in the Graphics Engine of the GPU, meaning the GPU is not executing any warps for graphics or compute. The Graphics Engine services 3D, 2D, I2M, and Compute processing hardware. The GPU also contains several other engines, including Copy, Video, Display, and Security; this metric does not account for any GPU work performed on those engines. Idle Graphics/Compute time may indicate:

  1. The workload is CPU-bound. This could occur when the CPU is not feeding commands fast enough to the GPU Front End (FE), so the FE has no work to process.
  2. DX12 Wait calls on fences in the Graphics/Compute engine (see Synchronization and Multi-Engine).

Frames

The Frames row helps detect warps per frame. When selecting a frame, only the warps related to the selected frame will be highlighted. When hovering your mouse over the frame, a tooltip summarizing the warps' activity will appear.

Command Lists

This row shows the captured Command List per queue. The Command List ID is displayed on the Command List bar. By hovering with the mouse, a tooltip summarizing the warp's activity will appear. Clicking on a Command List will highlight all of the warps which are part of that Command List.

Incremental Actions

Incremental Actions represent actions by how much extra time is required to execute them on a GPU. If an action can execute inside the time of another action (i.e., perfect parallelism), then it would have no width on this line.
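
One way to picture this is the sketch below, which attributes to each action only the time not already covered by actions that started earlier. This is a simplified interpretation for illustration; the exact attribution used by GPU Trace is not specified here, and the action names and timings are made up.

```python
def incremental_durations(actions):
    """Return the extra time each action adds beyond earlier actions.

    `actions` is a list of (name, start, end) tuples on one time axis.
    """
    covered_until = float("-inf")
    result = {}
    for name, start, end in sorted(actions, key=lambda a: a[1]):
        # Only the part of the action past the already-covered time counts.
        extra = max(0.0, end - max(start, covered_until))
        result[name] = extra
        covered_until = max(covered_until, end)
    return result

timings = incremental_durations([
    ("Draw A", 0.0, 10.0),
    ("Draw B", 2.0, 8.0),      # runs entirely inside Draw A
    ("Dispatch C", 9.0, 14.0),
])
```

Here Draw B executes entirely within Draw A, so it contributes no incremental time, while Dispatch C contributes only the portion that extends past the end of Draw A.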

Actions

An action is a subset of the event types. It can be one of the following:

  1. Draw call
  2. Clear
  3. Dispatch

Actions are of interest because they explicitly change data, which may result in visual changes on the scrubber.

The Actions row shows the captured actions per queue. By hovering with the mouse, a tooltip summarizing the warp's activity will appear, as well as the action's ID. Clicking on the Action will highlight all of the warps which are part of that Action.

User Markers

GPU Trace also captures any user markers that exist in the application. This may help understand the frame workflow.

Scrubber Toolbar

At the top of the Scrubber view, there are 4 buttons that extend the Scrubber's capabilities.

View Options

The View button on the Scrubber toolbar allows you to change the way the data is presented, from Grouped by GPCxTPC to per SM and vice versa. Viewing the warp's occupancy by SM may give more information as to how the GPU is being utilized. This is how the view appears when sorted per SM:

Color Options

Color by Markers

The Direct3D 12 API added the ability to add User markers. This helps us to understand the frame execution and debug it. GPU Trace captures user markers, and also allows you to color the warps according to the marker execution.

Color by Action

Choosing this option will color the warps according to which Action they belong to.

Color by Command List

Choosing this option will color the warps according to which Command List they belong to.

Export

GPU Trace allows you to export the warps data into a file in CSV format. This provides some flexibility in further calculation, if desired.
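
As an example of such further calculation, the following Python sketch totals a numeric column per group from CSV text. The column names Stage and DurationNs and the sample rows are hypothetical; check the header row of your exported file and substitute the real names.

```python
import csv
import io
from collections import defaultdict

def total_by_column(csv_text, group_column, value_column):
    """Sum `value_column` for each distinct value of `group_column`."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[group_column]] += float(row[value_column])
    return dict(totals)

# Hypothetical export contents; a real file's header row may differ.
sample_csv = "Stage,DurationNs\nVertex,120\nPixel,300\nVertex,80\n"
per_stage = total_by_column(sample_csv, "Stage", "DurationNs")
```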

Zoom to Fit

Clicking this button will restore the original window zoom.

Annotations

While analyzing the captured data, you may want to add comments in certain locations. This can later serve as a reminder of where to look, or it may help if you wish to send the file to another user for further analysis.

To add an annotation, do the following:

  1. Select a range within the Annotations row.
  2. In the information section, the annotation tab will be opened.
  3. The annotation has 2 sections which you can edit:
    1. Label
    2. Description

The annotation section behaves like any other component in the scrubber; you can select, hover to review the corresponding tooltip, or zoom.

Once you create an annotation, an asterisk will appear by the name of the file in the window tab:

You can use the File menu to Save (or Save As...) the file for future reference.

Using the Scrubber

For more information on using the Scrubber, see the Scrubber view.

Information Tabs

Summary Tab

The Summary tab shows important information on the captured data and the area that was selected in the Scrubber.

The main table shows information per shader type. You can uncheck the box next to a certain shader type to filter out those warps in the Scrubber view, and thus make it easier to understand the warp's occupancy.

  • Range — Shows whether the data shown reflects the entire captured data or a user selection.
  • Duration — The duration of the range selected.
  • Start and End — Start and end times of the range selected.
  • Num Warps — The number of warps in the range.
  • Num GPCxTPC — The number of GPCxTPCs participating in the range's execution.
  • Num SM — The number of SMs participating in the range's execution.
  • Warp Active Time — The table shows time, percentage, and number of warps per stage. In this table, you can view the color scheme for each stage. The tables also allow you to select and deselect warps, according to which stage they belong to.

  • SM Active time — This table shows how many SMs were active, in percentage, according to Graphics, Compute, and Async Compute for the selected range. This view can quickly show whether the GPU was idle longer than expected for this range, which could indicate the application is CPU-bound.

Capture Information Tab

The Capture Information tab gives general information about the captured file. This might be useful when trying to analyze workload behavior or reproduce issues.

  • Session Info — The Session Info section includes the process file name and location, as well as any command line arguments that were used when the application was run.
  • System Info — This section lists the computer name, operating system, operating system build, and processor that was used.
  • GPU Device — The GPU that was used when running the session.
  • User Comments — This field can be changed or edited by the user. It is saved in the captured file for future reference. This can be very useful when collaborating with others on an application.

Annotations Tab

The Annotations tab is used for getting the data for a newly created annotation. For specifications, see the Annotations section.

Events Table

The Events Table summarizes the captured events according to types:

  • Performance markers
  • Draw Calls
  • Command Lists
  • Dispatches

The Events Table allows you to browse the various events, and sort them according to name, queue, duration, Incremental Cost, or frame.

The Events table correlates with the Scrubber. Selecting one of the events will automatically select the correlating event in the Scrubber, making it easy to find.

Project Explorer

The Project Explorer offers a view of all of the data that is associated with the current project. It will contain data files, sorted by the time of generation. Note that you may also include arbitrary links to other files as a useful aid in correlating data.

In addition to navigating via the Project Explorer, you may wish to see the files that were recently generated. Load these through File > Recent Files, or File > Open File.

Options

The Options dialog, accessed via the Tools > Options... menu, allows you to configure Nsight Graphics in a number of different ways. Each section is detailed below. The options selected are persisted in user settings for the next time you run the tool.

Environment

On the Environment tab, select whether to use the light or dark theme, the default document folder for Nsight Graphics to use, and your preferred startup behavior.

GPU Trace

On the GPU Trace tab, you can change the time units and the time precision that are displayed in a GPU Trace.

Injection

On the Injection tab, select whether to enable or disable debugging of the Steam overlay.

Frame Debugger (Host)

On the Frame Debugger tab, you can configure the time unit and precision settings for the host display, settings for C++ Capture, and set the timeout for a Pixel History.

Feedback

On the Feedback tab, choose whether or not you wish to allow Nsight Graphics to collect usage and platform data.

Common Capabilities

Nsight Graphics supports docking multiple windows within the main window. Any window may be moved, adjusted, tabbed, or pulled out from the docking system that it provides. Most default layouts have multiple documents already specified, but if you wish to adjust these documents you can do so at any time.

Beyond positioning, when frame debugging or profiling, there are buttons that are common across several frame debugger views.

  • The Clone button makes a copy of the current view, so that you can compare different parts of the API Inspector (or other cloned views) for the current action.
  • The Lock button freezes the current view so that changing the current event does not update this view. This is helpful when trying to compare the same state on two different actions.

Troubleshooting

Due to the complex nature of the underlying mechanisms that make arbitrary application analysis possible, there is the possibility of errors. Nsight Graphics offers a significant number of ways where you can discover opportunities to correct issues that you may encounter.

See the sections below for general tools as well as listings of common problems and possible solutions for them.

General Tools

This section provides troubleshooting tips for Nsight Graphics.

Output Messages

Throughout the operation of the tool, Nsight Graphics provides messages that inform on the status of operations as well as if any issues are encountered. This could provide some assistance when trying to determine why your application may not run, connect, or capture correctly. Error messages are indicated by a red flag in the bottom right of the application window. This flag may be double-clicked to open the Output Messages window. Alternatively this window may be accessed via Tools > Output Messages.

Crash Reporting

When an application crashes, a crash report can be one of the most valuable pieces of information in helping to fix the issue. Accordingly, if you have the ability to send a crash report, it would be greatly appreciated.

Automatic Crash Reports

The Nsight Graphics processes (host and target) are configured to automatically send crash reports when they encounter an error. Submitting via the dialog is a good approach, but saving the minidump for explicit communication can be useful too. If you encounter a crash and do not have the option of sending a crash report, please let us know so that we can investigate why the dialog is not being generated.

Manual Crash Reports

Manual crash reports can always be collected by attaching to the crashing process with a debugger and manually creating a dump in the case of a crash. In Visual Studio, this can be accomplished by:

  1. Start Visual Studio
  2. Follow the instructions for Debugging your application with a debugger
  3. Start the application with Nsight Graphics
  4. Attach the Visual Studio debugger to it
  5. When you encounter the crash, use the Visual Studio "Debug > Save Dump As" menu option

Debugging your application with a debugger

Although launching your application with Nsight Graphics might appear to be an alternative to CPU debugging, the application that is launched is still very much a debuggable application. This can be useful to determine if a problem you are encountering is in your own code by tracing the paths taken by your application.

To do this, set an environment variable of NVIDIA_PROCESS_INJECTION_ATTACH_DIALOG=1 and attach a debugger when you see a message box. Click ok to resume your application once you have set breakpoints that will allow you to inspect if your application is following the expected paths.

Collecting DirectX debug logging

Sometimes a device-lost error or other issue can be narrowed down by observing the output of the DirectX debug layer.

If you need to install the layer, it should be part of the OS in Windows 10:

Apps & features -> Manage optional features -> Graphics Tools

Then open dxcpl. Make sure your installed application is in the Scope List and force on debugging.

There are two ways to see the debug output:

  1. You can see logging without attaching Visual Studio by just running DbgView.exe. https://docs.microsoft.com/en-us/sysinternals/downloads/debugview
  2. Alternatively, attach using Visual Studio. Logging will be in the VS Output window. See section below to enable attaching to the launched process.

Setting an environment variable

There are occasionally times where you might be asked to set an undocumented variable to help disambiguate problems.

Apply the environment variable in the connection dialog 'Environment' setting when starting an application.

Common problems

Problem - The application fails to launch

You've tried to launch your application, but it is failing to launch.

Possible Causes

  1. Incorrect command line arguments
  2. Incorrect working directory
  3. You're trying to launch on a remote machine that does not have a monitor running

Possible Solutions

Make sure that your command line arguments and working directory are as expected.

If you are trying to run on a remote machine, please ensure that the remote monitor is running and that the name of the machine is correct. See Remote Launching.

Determine whether the application is launching at all. Follow the instructions in Debugging your application with a debugger. Check to see if your application is launched at all and if so, whether it is following its expected path. If the application doesn't launch at all, please send an email to devtools-support@nvidia.com.

Problem - The application crashes at runtime

You've found that your application appears to launch, but it crashes during runtime.

Possible Causes

  1. Lack of API support by Nsight
  2. Application not checking return codes from device/object creation, assuming it has succeeded
  3. Interception-library crash
  4. Internal-driver crash
  5. D3D-debug runtime interaction

Possible Solutions

Try disabling the following features:

For D3D apps, try running without the D3D debug runtime enabled, as the debug runtime occasionally differs in behavior when compared with the release runtime.

If none of the above works, please try to collect a crash dump if possible and send it to devtools-support@nvidia.com.

Problem - The application hangs at runtime

You've found that your application appears to launch, but it hangs during runtime.

Possible Causes

  1. Multi-threading issue
  2. HUD Issue

Possible Solutions

Try disabling the following features:

If none of the above works, please try to collect a crash dump if possible and send it to devtools-support@nvidia.com.

Problem - The application crashes during capture

You've found that you're able to run the application successfully, but upon trying to perform a live analysis, the application crashes.

Possible Causes

  1. Multi-threading issue
  2. Out of memory
  3. The application is tearing itself down due to a watchdog timeout

Possible Solutions

Try disabling the following features:

If you suspect a multi-threading issue (D3D's runtime sometimes indicates this), try disabling multi-threaded capture.

If Nsight Graphics reports out of memory, try reducing the requirements of the application or try running with a more capable GPU.

If the application exits without any clear sign of a crash, the application could be tearing itself down. Please contact devtools-support@nvidia.com with your concern and we will investigate if there is any opportunity for deactivating the thread.

Problem - The application captures successfully, but exits after a time in capture

This problem indicates that you have had some level of success, but even when the application is generally inactive, it crashes.

Possible Causes

  1. Serving a host query leads to a crash
  2. Memory leak
  3. Watchdog timer

Possible Solutions

When encountering this issue, take note of what you are doing when you encounter it. The first thing to try is doing nothing – does the application still crash when doing so? If there is nothing going on, this is either a memory leak or a watchdog timer.

  1. Look at the memory usage of the process – is it growing? It's a memory leak, either from the application or the tool.
  2. Set a stopwatch to count how long it takes to crash – is it a "round" number like 30 or 60 seconds? It's probably a watchdog.
  3. If this is a memory leak (uncommon but possible) please contact support to help identify the issue.

If this is a watchdog issue, disable the watchdog in your application.
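
The two checks above can be summarized in a small Python sketch. The thresholds (50 MB of growth, multiples of 30 seconds, one second of tolerance) are arbitrary assumptions chosen for illustration, and the function is not part of Nsight Graphics; you would gather the memory samples and the time to exit yourself.

```python
def triage_idle_exit(memory_samples_mb, seconds_to_exit,
                     growth_threshold_mb=50, round_to=30):
    """Suggest a likely cause for an exit that happens while idle."""
    hints = []
    # Steady growth between the first and last sample suggests a leak.
    if memory_samples_mb and (
        memory_samples_mb[-1] - memory_samples_mb[0] >= growth_threshold_mb
    ):
        hints.append("memory leak")
    # Within one second of a multiple of `round_to` counts as a "round" number.
    remainder = seconds_to_exit % round_to
    if seconds_to_exit >= round_to and min(remainder, round_to - remainder) <= 1:
        hints.append("watchdog")
    return hints or ["unknown"]

watchdog_case = triage_idle_exit([500, 505, 503], 60.4)
leak_case = triage_idle_exit([500, 800, 1200], 47)
```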

Problem - The application runs extremely slowly

You've observed that the application runs at a significantly lower rate than normal operation.

Possible Causes

  1. Too much work is being done
  2. The application may be exercising uncommon paths

Possible Solutions

Try disabling optional features, such as collecting shader source, collecting native shaders, or collecting hardware performance metrics.

Problem - The D3D12 replayer shows more CPU overhead than expected

If you encounter more overhead in your range profiling session or generated C++ capture, conservative synchronization may be the problem.

Possible Causes

  1. Nsight's default fence syncing policy may be too conservative for this application

Possible Solutions

Try experimenting with replay fence behavior.

Problem - I can't attach to the application

The application launches, but you are unable to attach to it with the Nsight Graphics host.

Possible Causes

  1. You launched a piece of the process hierarchy without Nsight Graphics
  2. You set the connection to automatically attach when the root application launches child processes that are the actual processes of interest.
  3. The application is interfering with the interception of Nsight Graphics, preventing it from intercepting.
  4. The application is using a software renderer

Possible Solutions

Nsight Graphics is essentially an in-process debugger, so it cannot attach to an application that wasn’t originally launched through Nsight Graphics. The attach feature is meant to be used to attach to applications that have been launched through other means (e.g. a command line launcher) as well as to allow for some recoverability in the case of a host issue, as it allows you to attach at a later time.

Make sure to kill any processes related to the process hierarchy of an application and try to launch it again.

Problem - The Host UI Crashes

The host UI crashes while you are analyzing an application.

Possible Causes

  1. UI Bug

Possible Solutions

Try reducing the number of views that you have open when running to pinpoint which view causes the issue.

If at all possible, try to collect a crash dump of the UI application and send it to devtools-support@nvidia.com.

Try deleting the UI persistence data with Help > Reset Application Data.

Appendix

Feature Support Matrix

Nsight Graphics feature matrix.

Table 11.
Feature                                    D3D11  D3D12  OpenGL  Vulkan
Frame Capture and Live Analysis            Yes    Yes    Yes     Yes
Range Profiling and Performance Counters   Yes    Yes    Yes     -
Real-time Performance Signals              Yes    Yes    Yes     -
Real-time Performance Experiments          Yes    Yes    Yes     -
C++ Capture                                Yes    Yes    Yes     Yes
Shader Performance Analysis                Yes    Yes    Yes     Yes
Pixel History                              Yes    Yes    Yes     Yes
Dynamic Shader Editing                     Yes    Yes    Yes     -
GPU Trace                                  -      Yes    -       -
Ray Tracing Debugging                      -      Yes    -       Yes

Supported OpenGL Functions

Nsight Graphics's Frame Debugger supports the set of OpenGL operations defined by the OpenGL 4.5 core profile. Note that it is not necessary to create a core profile context to use the Frame Debugger. An application that uses a compatibility profile context, but restricts itself to the OpenGL 4.5 core subset, will also work. A few OpenGL 4.5 compatibility profile features, such as support for alpha testing and a default vertex array object, are also supported.

The Frame Debugger supports three classes of OpenGL extensions, described below.

1. OpenGL Core Context Support

The OpenGL extensions listed below are supported insofar as the extension has been adopted by the OpenGL 4.5 core profile. For example, EXT_subtexture is included as part of OpenGL 1.1, so calls to glTexSubImage2DEXT are supported and behave the same as calls to glTexSubImage2D. On the other hand, while EXT_vertex_array is also included as part of OpenGL 1.1, glColorPointerEXT is not supported by the Frame Debugger: the operation of glColorPointerEXT was modified when it was included as part of OpenGL 1.1, and glColorPointer is part of the compatibility subset, but not the core subset.

// GL 1.1
EXT_vertex_array
EXT_polygon_offset
EXT_blend_logic_op
EXT_texture
EXT_copy_texture
EXT_subtexture
EXT_texture_object
// GL 1.2
EXT_texture3D
EXT_bgra
EXT_packed_pixels
EXT_rescale_normal
EXT_separate_specular_color
SGIS_texture_edge_clamp
SGIS_texture_lod
EXT_draw_range_elements
EXT_color_table
EXT_color_subtable
EXT_convolution
HP_convolution_border_modes
SGI_color_matrix
EXT_histogram
EXT_blend_color
EXT_blend_minmax
EXT_blend_subtract
// GL 1.2.1
EXT_SGIS_multitexture
// GL 1.3
ARB_texture_compression
ARB_texture_cube_map
ARB_multisample
ARB_multitexture
ARB_texture_env_add
ARB_texture_env_combine
ARB_texture_env_dot3
ARB_texture_border_clamp
ARB_transpose_matrix
// GL 1.4
SGIS_generate_mipmap
NV_blend_square
ARB_depth_texture
ARB_shadow
EXT_fog_coord
EXT_multi_draw_arrays
ARB_point_parameters
EXT_secondary_color
EXT_blend_func_separate
EXT_stencil_wrap
EXT_texture_env_crossbar
EXT_texture_lod_bias
ARB_texture_mirrored_repeat
ARB_window_pos
// GL 1.5
ARB_vertex_buffer_object
ARB_occlusion_query
EXT_shadow_funcs
// GL 2.0
ARB_shader_objects
ARB_vertex_shader
ARB_fragment_shader
ARB_draw_buffers
ARB_texture_non_power_of_two
ARB_point_sprite
EXT_blend_equation_separate
ATI_separate_stencil
EXT_stencil_two_side
// GL 2.1
ARB_pixel_buffer_object
EXT_direct_state_access
EXT_texture_sRGB
// GL 3.0
EXT_gpu_shader4
NV_conditional_render
APPLE_flush_buffer_range
ARB_color_buffer_float
NV_depth_buffer_float
ARB_texture_float
EXT_packed_float
EXT_texture_shared_exponent
EXT_framebuffer_object
NV_half_float
ARB_half_float_pixel
EXT_framebuffer_multisample
EXT_framebuffer_blit
EXT_texture_integer
EXT_texture_array
EXT_packed_depth_stencil
EXT_draw_buffers2
EXT_texture_compression_rgtc
EXT_transform_feedback
APPLE_vertex_array_object
EXT_framebuffer_sRGB
// GL 3.1
EXT_draw_instanced
ARB_draw_instanced
ARB_copy_buffer
NV_primitive_restart
ARB_texture_buffer_object
ARB_texture_rectangle
ARB_uniform_buffer_object
// GL 3.2
ARB_vertex_array_bgra
ARB_draw_elements_base_vertex
ARB_fragment_coord_conventions
ARB_provoking_vertex
ARB_seamless_cube_map
ARB_texture_multisample
ARB_depth_clamp
ARB_geometry_shader4
ARB_sync
// GL 3.3
ARB_shader_bit_encoding
ARB_blend_func_extended
ARB_explicit_attrib_location
ARB_occlusion_query2
ARB_sampler_objects
ARB_texture_rgb10_a2ui
ARB_texture_swizzle
ARB_timer_query
ARB_instanced_arrays
ARB_vertex_type_2_10_10_10_rev
// GL 4.0
ARB_texture_query_lod
ARB_draw_buffers_blend
ARB_draw_indirect
ARB_gpu_shader5
ARB_gpu_shader_fp64
ARB_sample_shading
ARB_shader_subroutine
ARB_tessellation_shader
ARB_texture_buffer_object_rgb32
ARB_texture_cube_map_array
ARB_texture_gather
ARB_transform_feedback2
ARB_transform_feedback3
// GL 4.1
ARB_ES2_compatibility
ARB_get_program_binary
ARB_separate_shader_objects
ARB_shader_precision
ARB_vertex_attrib_64bit
ARB_viewport_array
// GL 4.2
ARB_texture_compression_bptc
ARB_compressed_texture_pixel_storage
ARB_shader_atomic_counters
ARB_texture_storage
ARB_transform_feedback_instanced
ARB_base_instance
ARB_shader_image_load_store
ARB_conservative_depth
ARB_shading_language_420pack
ARB_internalformat_query
ARB_map_buffer_alignment
// GL 4.3
ARB_multi_draw_indirect
ARB_program_interface_query
ARB_shader_storage_buffer_object
ARB_copy_image
ARB_vertex_attrib_binding
ARB_texture_view
ARB_invalidate_subdata
ARB_framebuffer_no_attachments
ARB_stencil_texturing
ARB_explicit_uniform_location
ARB_texture_storage_multisample
ARB_robust_buffer_access_behavior
ARB_ES3_compatibility
ARB_clear_buffer_object
ARB_internalformat_query2
ARB_texture_buffer_range
ARB_compute_shader
ARB_debug_group
ARB_debug_label
ARB_debug_output
// GL 4.4
ARB_query_buffer_object
ARB_enhanced_layouts
ARB_multi_bind
ARB_vertex_type_10f_11f_11f_rev
ARB_texture_mirror_clamp_to_edge
ARB_clear_texture
// GL 4.5
ARB_clip_control
ARB_cull_distance
ARB_conditional_render_inverted
GL_KHR_context_flush_control
ARB_get_texture_sub_image
GL_KHR_robustness
ARB_texture_barrier
ARB_ES3_1_compatibility
ARB_direct_state_access
ARB_shader_texture_image_samples
ARB_derivative_control

2. Other Supported OpenGL Extensions

The second class of OpenGL extensions is listed below. These extensions are not part of OpenGL 4.5 core or compatibility, but are fully supported by the frame debugger target. Context and object state added by these extensions may not be displayed by the host UI.

ARB_framebuffer_object
EXT_texture_filter_anisotropic
NV_buffer_store
ARB_vertex_attrib_binding
ARB_multi_draw_indirect
NV_gpu_multicast
ARB_parallel_shader_compile
ARB_seamless_cubemap_per_texture
NV_shader_buffer_load
NV_vertex_buffer_unified_memory

3. Partially Supported OpenGL Extensions

The third class of OpenGL extensions consists of those that are only partially supported. These extensions are listed below.

ARB_bindless_texture
WGL_ARB_extensions_string
WGL_ARB_pixel_format
WGL_EXT_extensions_string
WGL_EXT_swap_control
WGL_EXT_swap_control_tear
WGL_ARB_create_context
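
The three classes above can be treated as a simple lookup when auditing which extensions an application relies on. The following is a minimal C++ sketch, not part of Nsight Graphics: the helper name classifyGlExtension and the enum are hypothetical, and each set holds only a few sample entries from the lists above. Note that membership in the first class does not guarantee that every entry point of the extension is supported (see the glColorPointerEXT example earlier).

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Support classes described in this section, plus a catch-all for names
// that do not appear in any of the three lists.
enum class GlExtensionSupport { CoreAdopted, FullySupported, PartiallySupported, Unlisted };

// Hypothetical helper: classifies an extension name against abbreviated
// samples of the three lists. Extend the sets with the full lists as needed.
GlExtensionSupport classifyGlExtension(std::string name)
{
    static const std::unordered_set<std::string> coreAdopted = {
        "EXT_subtexture", "ARB_vertex_buffer_object", "ARB_direct_state_access"};
    static const std::unordered_set<std::string> fullySupported = {
        "EXT_texture_filter_anisotropic", "NV_gpu_multicast", "ARB_parallel_shader_compile"};
    static const std::unordered_set<std::string> partiallySupported = {
        "ARB_bindless_texture", "WGL_EXT_swap_control"};

    assert(!name.empty());

    // Drivers report names with a "GL_" prefix (e.g. "GL_ARB_sync"), while
    // the sample entries here omit it, so strip the prefix before the lookup.
    if (name.compare(0, 3, "GL_") == 0) {
        name.erase(0, 3);
    }

    if (coreAdopted.count(name) != 0) return GlExtensionSupport::CoreAdopted;
    if (fullySupported.count(name) != 0) return GlExtensionSupport::FullySupported;
    if (partiallySupported.count(name) != 0) return GlExtensionSupport::PartiallySupported;
    return GlExtensionSupport::Unlisted;
}
```

An application could run each name it obtains from the driver's extension string through this helper and log any that come back as PartiallySupported or Unlisted before starting a capture.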

Supported Vulkan Functions

Nsight Graphics™ 2018.7 frame debugging supports all of Vulkan 1.1.86, with the exception of functions and resources associated with sparse textures.

NOTE: Sparse texture support will be added to a future version of Nsight Graphics.

Additionally, the following extensions are supported:

VK_AMD_gcn_shader
VK_AMD_gpu_shader_half_float
VK_AMD_negative_viewport_height
VK_AMD_rasterization_order
VK_AMD_shader_ballot
VK_AMD_shader_explicit_vertex_parameter
VK_AMD_shader_trinary_minmax
VK_EXT_blend_operation_advanced
VK_EXT_conditional_rendering
VK_EXT_conservative_rasterization
VK_EXT_debug_marker
VK_EXT_debug_report
VK_EXT_debug_utils
VK_EXT_depth_range_unrestricted
VK_EXT_descriptor_indexing
VK_EXT_discard_rectangles
VK_EXT_global_priority
VK_EXT_hdr_metadata
VK_EXT_post_depth_coverage
VK_EXT_queue_family_foreign
VK_EXT_sample_locations
VK_EXT_sampler_filter_minmax
VK_EXT_shader_stencil_export
VK_EXT_shader_subgroup_ballot
VK_EXT_shader_subgroup_vote
VK_EXT_shader_viewport_index_layer
VK_EXT_swapchain_colorspace
VK_EXT_validation_cache
VK_EXT_validation_flags
VK_EXT_vertex_attribute_divisor
VK_IMG_filter_cubic
VK_IMG_format_pvrtc
VK_KHR_16bit_storage
VK_KHR_8bit_storage
VK_KHR_android_surface
VK_KHR_bind_memory2
VK_KHR_create_renderpass2
VK_KHR_dedicated_allocation
VK_KHR_descriptor_update_template
VK_KHR_device_group
VK_KHR_device_group_creation
VK_KHR_display
VK_KHR_display_swapchain
VK_KHR_draw_indirect_count
VK_KHR_external_fence
VK_KHR_external_fence_capabilities
VK_KHR_external_memory
VK_KHR_external_memory_capabilities
VK_KHR_external_memory_fd
VK_KHR_external_semaphore
VK_KHR_external_semaphore_capabilities
VK_KHR_external_semaphore_fd
VK_KHR_external_memory_win32
VK_KHR_external_semaphore_win32
VK_KHR_get_memory_requirements2
VK_KHR_get_physical_device_properties2
VK_KHR_get_surface_capabilities2
VK_KHR_image_format_list
VK_KHR_incremental_present
VK_KHR_maintenance1
VK_KHR_maintenance2
VK_KHR_maintenance3
VK_KHR_multiview
VK_KHR_push_descriptor
VK_KHR_relaxed_block_layout
VK_KHR_sampler_mirror_clamp_to_edge
VK_KHR_sampler_ycbcr_conversion
VK_KHR_shader_draw_parameters
VK_KHR_shared_presentable_image
VK_KHR_storage_buffer_storage_class
VK_KHR_surface
VK_KHR_swapchain
VK_KHR_variable_pointers
VK_KHR_vulkan_memory_model
VK_KHR_wayland_surface
VK_KHR_win32_keyed_mutex
VK_KHR_win32_surface
VK_KHR_xcb_surface
VK_KHR_xlib_surface
VK_NV_clip_space_w_scaling
VK_NV_dedicated_allocation
VK_NV_device_diagnostic_checkpoints
VK_NV_external_memory_capabilities
VK_NV_fill_rectangle
VK_NV_fragment_coverage_to_color
VK_NV_framebuffer_mixed_samples
VK_NV_geometry_shader_passthrough
VK_NV_glsl_shader
VK_NV_sample_mask_override_coverage
VK_NV_shader_subgroup_partitioned
VK_NV_viewport_array2
VK_NV_viewport_swizzle
VK_NVX_raytracing
VK_NV_compute_shader_derivatives
VK_NV_corner_sampled_image
VK_NV_fragment_shader_barycentric
VK_NV_representative_fragment_test
VK_NV_scissor_exclusive
VK_NV_shader_image_footprint
VK_NV_shading_rate_image
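
Before capturing, it can be useful to compare the extensions an application enables (for example, the names it passes in ppEnabledExtensionNames when calling vkCreateDevice) against the list above. The following C++ sketch shows one way to do that. The function name findUnlistedExtensions is hypothetical, the caller supplies the listed set, and the code deliberately avoids the Vulkan headers so that it stays self-contained.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns, in their original order, the names in
// `enabled` that do not appear in `listed`.
std::vector<std::string> findUnlistedExtensions(
    const std::vector<std::string>& enabled,
    const std::unordered_set<std::string>& listed)
{
    std::vector<std::string> unlisted;
    for (const std::string& name : enabled) {
        if (listed.find(name) == listed.end()) {
            unlisted.push_back(name);
        }
    }
    return unlisted;
}
```

An extension missing from the list does not by itself prove that a capture will fail. It only flags a name worth checking if an unsupported capture is reported.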

Unsupported Captures

Nsight Graphics maintains a list of the unsupported functions or operations that are used by the application. If an unsupported operation is encountered, an unsupported capture will be reported. This unsupported capture guards against crashes or incorrect results coming from known limitations.

In some cases, however, these unsupported operations might not impact any analysis that follows. Accordingly, after warning about the risks of an unsupported capture, Nsight Graphics will offer the opportunity to proceed despite this warning. If the user proceeds, Nsight Graphics will continue into capture on a best-effort basis.

If you determine that this unsupported operation is innocuous, and you wish to turn it off completely, you may suppress this warning via the Ignore Incompatibilities option. Note that this will prevent you from being notified of future incompatibilities, however, so please use with caution.

Update Notification

Nsight Graphics can check for a new version and notify the user of any updates. Two options control this feature; both are found in the Environment tab of the Tools > Options view.

By default, Nsight Graphics checks for updates every time the app is started. This can be changed by selecting “No” for the “Check for updates at startup” option. With this option disabled, Nsight Graphics will still check for updates every 3 days.

Update notifications can be completely disabled by setting the “Show version update notifications” value to “No”.

If the automatic checking feature is disabled, the user can still check for updates by selecting the Help > Check for updates… menu option.
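
The interaction between the two options can be summarized as a small decision function. This C++ sketch is only an illustration of the policy described above, not Nsight Graphics code: the struct, field, and function names are hypothetical, the default of "Yes" for "Show version update notifications" is assumed, and the sketch assumes that no automatic check happens when notifications are disabled and that the 3-day fallback is evaluated when the application starts.

```cpp
#include <cassert>

// Hypothetical mirror of the two options in Tools > Options > Environment.
struct UpdateSettings {
    bool checkAtStartup = true;     // "Check for updates at startup"
    bool showNotifications = true;  // "Show version update notifications" (default assumed)
};

// Interval used when the startup check is turned off, as stated in the text.
constexpr int kFallbackIntervalDays = 3;

// Returns true if an automatic update check would run at this startup.
bool shouldCheckForUpdates(const UpdateSettings& settings, int daysSinceLastCheck)
{
    assert(daysSinceLastCheck >= 0);

    // Assumption: disabling notifications also disables the automatic check.
    if (!settings.showNotifications) {
        return false;
    }
    // Default behavior: check on every startup.
    if (settings.checkAtStartup) {
        return true;
    }
    // Startup check disabled: still check once the fallback interval has elapsed.
    return daysSinceLastCheck >= kFallbackIntervalDays;
}
```

The manual Help > Check for updates… menu option is outside this function, since it remains available regardless of either setting.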