NVIDIA® Nsight™ Application Development Environment for Heterogeneous Platforms, Visual Studio Edition 5.3 User Guide
The Range Profiler helps you determine how sections of your frame utilize the GPU, and gives you direction for optimizing the rendering of your application. Once you have captured a frame, the Range Profiler displays the frame broken down into a collection of ranges, or groups of contiguous actions. For each action, you can see the GPU execution time, as well as detailed GPU hardware statistics across all of the units in the GPU.
The Range Profiler also includes data mining capabilities that allow you to group the calls in the frame into ranges based on criteria that you choose.
To profile a graphics frame:
NOTE: Under certain conditions, the Range Profiler pane may be disabled and display one of the following messages.
"Hardware signals are not supported in this configuration": This message could be due to one of the following reasons:
"No hardware signals found for this API/GPU combination": This message could be due to one of the following reasons:
The Range Profiler View has four main parts: the Scrubber, which shows the ranges of actions, and the Range Info, Pipeline Overview, and Memory sections, which show the performance information for the currently selected range in the Scrubber.
On the right side of the header for each section, there is a drop-down menu that allows you to select different views for that section. The Summary highlights important aspects of the selected range, Range Details shows deeper data for the range, and Action Details shows statistics for each action in the selected range.
The top of the Range Profiler View includes a Scrubber, which mirrors the Scrubber View.
The default view shows ranges based on the performance markers you have defined in your application. These can be defined via the NVTX library, KHR_debug, or any other range definition API supported by your graphics API of choice. Clicking the Add... button opens a dialog that allows you to select what type of range you want to add.
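To make the idea of marker-defined ranges concrete, here is a minimal Python sketch of how paired push/pop markers can partition a frame's actions into nested ranges. It is only a model: the event tuples, the Range class, and build_ranges are hypothetical names invented for this illustration, and they are not part of NVTX, KHR_debug, or NVIDIA Nsight.

```python
from dataclasses import dataclass, field


@dataclass
class Range:
    name: str
    first_action: int                 # index of the first action in the range
    last_action: int = -1             # index of the last action, set on pop
    children: list = field(default_factory=list)


def build_ranges(events):
    """Turn a flat event stream into a tree of nested ranges.

    `events` holds ("push", name), ("pop",) and ("action", label) tuples,
    a hypothetical stand-in for marker calls and draw/dispatch calls.
    """
    root = Range("frame", 0)
    stack = [root]
    action_index = 0
    for event in events:
        kind = event[0]
        if kind == "push":
            new_range = Range(event[1], action_index)
            stack[-1].children.append(new_range)
            stack.append(new_range)
        elif kind == "pop":
            if len(stack) == 1:
                raise ValueError("pop without a matching push")
            stack.pop().last_action = action_index - 1
        elif kind == "action":
            action_index += 1
        else:
            raise ValueError("unknown event kind: " + str(kind))
    if len(stack) != 1:
        raise ValueError("unclosed range: " + stack[-1].name)
    root.last_action = action_index - 1
    return root


# Illustrative frame: two top-level ranges, one of them with a nested range.
events = [
    ("push", "Shadows"), ("action", "draw"), ("action", "draw"), ("pop",),
    ("push", "Lighting"),
    ("push", "Opaque"), ("action", "draw"), ("pop",),
    ("action", "dispatch"),
    ("pop",),
]
frame = build_ranges(events)
```

In this model every range covers a contiguous run of action indices, which matches the description of ranges as groups of contiguous actions. Unbalanced markers raise an error here; how the tool itself treats them is not covered by this sketch.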
When you click on a range on the Scrubber portion, the other sections of the Range Profiler View will automatically update with that selected range's information. You can also click on a single action in the Scrubber to profile only that action.
The Range Info section shows the Summary view by default, which is a draw call summary on the left, and a summary of compute or dispatch calls on the right.
From the drop-down menu on the right, you can select either the Action Details or Timeline view.
The Action Details display shows all of the statistics from the Summary, for every draw and dispatch call, in a table format.
The Timeline shows a real-time display of the actions in the range that was profiled. Hovering your cursor over an action will show its details.
When displaying the Summary, the Pipeline Overview section shows the top four "inefficiently" utilized units in the GPU pipeline. These show you, at a glance, where the most inefficient units are in the pipeline for the currently selected range.
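As a rough illustration of what "top four" means here, the following Python sketch picks the four highest values from a set of per-unit inefficiency percentages. The unit names and numbers are made up for the example, and top_inefficient_units is a hypothetical helper, not something the Range Profiler exposes.

```python
def top_inefficient_units(inefficiency_pct, count=4):
    """Return the `count` units with the highest inefficiency, worst first."""
    ranked = sorted(inefficiency_pct.items(),
                    key=lambda item: item[1],
                    reverse=True)
    return ranked[:count]


# Illustrative values only; real unit names and numbers come from the tool.
sample = {
    "input_assembler": 12.0,
    "shader": 61.5,
    "texture": 48.0,
    "l2": 33.0,
    "frame_buffer": 27.5,
    "raster": 8.0,
}
worst = top_inefficient_units(sample)
```

With these sample numbers, the four units returned are shader, texture, l2, and frame_buffer, in that order.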
The drop-down menu on the right shows three other views: Range Details, Range Table, and Action Details.
The Range Details of the Pipeline Overview section displays an "inefficiency" percentage for each GPU unit for the currently selected range. The results are laid out in the same way that data flows through the GPU pipeline. It starts from the Input Assembler, where the vertex attributes are gathered, loops multiple times through the shader unit, samples from the texture units, reads from L2, and so on, including the impact those operations have on the memory subsystem. Note that this graph may vary based on the hardware being used, as well as the way your application utilizes the graphics API and GPU.
The Range Table of the Pipeline Overview takes all of the information that was in the Range Details graphic and puts it into a table format, allowing you to drill down and find where the performance of your app can be optimized.
The Action Details display is essentially a raw output of the data you are profiling for each draw call/dispatch call in the currently selected range. The rows list the actions in the selected range, and the columns show the pipeline details for each event. Hovering your cursor over a given cell will show more detailed information for that action.
The Memory section shows a Summary display by default, with L2 utilization information on the left, and frame buffer information on the right. From the drop-down menu, you can select the Range Details or the Range Table display.
The Range Details display of the Memory section displays percentage utilization of the links between the various units that consume the memory subsystem resources. For example, the shader unit makes requests to the texture unit. These requests result in L2 cache queries, which may miss and have to load the data from the GPU memory.
The Range Table display of the Memory section takes all of the information that was in the Range Details graphic and puts it into a table format, allowing you to drill down and find where the performance of your app can be optimized.
Watch the gr__busy_pct signal on the signal graphs once you launch the application. The larger your gr__busy_pct percentage, the higher the likelihood you are GPU-bound.
Capture a frame, and then run the Range Profiler. Once the Range Profiler is done running experiments, the entire scene will be selected by default. This will allow you to see details about all of the draw calls and dispatches in the scene. If you select Action Details in the Range Info section, you will see details on each draw call, including the execution time. Sort the table by time to see the most expensive draw call.
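The sort-by-time step can be pictured with a short Python sketch. It assumes the Action Details rows have been copied by hand into a list of dictionaries; the field names (action, gpu_ms) and the timings are hypothetical, not an export format defined by the tool.

```python
def rank_by_gpu_time(actions):
    """Sort actions from most to least expensive and add each one's share
    of the total GPU time in the selected range."""
    total_ms = sum(a["gpu_ms"] for a in actions)
    ranked = sorted(actions, key=lambda a: a["gpu_ms"], reverse=True)
    return [
        {**a, "share_pct": 100.0 * a["gpu_ms"] / total_ms if total_ms else 0.0}
        for a in ranked
    ]


# Illustrative rows only; real values come from the Action Details table.
rows = [
    {"action": "Draw 17", "gpu_ms": 0.5},
    {"action": "Draw 42", "gpu_ms": 2.0},
    {"action": "Dispatch 3", "gpu_ms": 1.5},
]
ranked_rows = rank_by_gpu_time(rows)
most_expensive = ranked_rows[0]
```

Adding the share of total time makes it easier to judge whether the most expensive action dominates the range or is only slightly ahead of the rest.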
In the Pipeline Overview section, select Range Details and you will see an image with a virtual GPU pipeline. The red bars indicate units in the GPU that are not being used as efficiently as they could be, so look for the higher bars to see where you need to spend time optimizing. (See below for specific tips on optimizing your API inputs for a particular unit.)
The Range Profiler contains a powerful grouping capability that allows you to make new ranges based on common state. These include ranges based on the program/shaders being used, viewport, render targets, and even user ranges that can be declared on the fly.
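One way to picture grouping by common state is the Python sketch below, which splits a list of actions into contiguous runs that share the same value for a chosen state key. The action records and key names (program, render_target) are hypothetical. Whether the tool also merges non-contiguous actions that share state is not stated here, so this sketch assumes contiguous runs only.

```python
from itertools import groupby


def group_by_state(actions, key):
    """Split `actions` into contiguous ranges sharing the same `key` value."""
    ranges = []
    index = 0
    for state, run in groupby(actions, key=lambda a: a[key]):
        length = len(list(run))
        ranges.append({"state": state,
                       "first": index,
                       "last": index + length - 1})
        index += length
    return ranges


# Illustrative actions only.
actions = [
    {"program": "shadow", "render_target": "shadow_map"},
    {"program": "shadow", "render_target": "shadow_map"},
    {"program": "pbr", "render_target": "hdr"},
    {"program": "pbr", "render_target": "hdr"},
    {"program": "sky", "render_target": "hdr"},
]
by_program = group_by_state(actions, "program")
by_target = group_by_state(actions, "render_target")
```

Grouping the same five actions by program gives three ranges, while grouping by render target gives two, which shows how the choice of criterion changes the granularity of what you profile.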
The Scrubber at the top of the Range Profiler View shows all of the performance marker ranges defined by the application, along with the amount of time spent in each one. A good strategy is to look for ranges that take a large amount of time, then drill down into them to find where that time is being spent. Once you click on such a range, you can look at the Pipeline Overview section for details on how the selected range is utilizing the GPU.
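The drill-down strategy amounts to repeatedly following the most expensive child range. The Python sketch below shows that idea on a small hand-written tree; the dictionary layout, the range names, and the timings are all hypothetical.

```python
def hottest_path(range_node):
    """Follow the most expensive child at each level and return the names
    along the way, from the outermost range to the innermost one."""
    path = [range_node["name"]]
    while range_node.get("children"):
        range_node = max(range_node["children"], key=lambda c: c["gpu_ms"])
        path.append(range_node["name"])
    return path


# Illustrative range tree only.
frame = {
    "name": "frame", "gpu_ms": 10.0, "children": [
        {"name": "Shadows", "gpu_ms": 2.0, "children": []},
        {"name": "Lighting", "gpu_ms": 6.5, "children": [
            {"name": "Opaque", "gpu_ms": 4.0, "children": []},
            {"name": "Transparent", "gpu_ms": 2.5, "children": []},
        ]},
        {"name": "PostFX", "gpu_ms": 1.5, "children": []},
    ],
}
path = hottest_path(frame)
```

Here the path ends at the Opaque range, which would be the first one to select and inspect.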
The NVIDIA Nsight Frame Debugger disables VSYNC, so applications that have VSYNC enabled under normal circumstances may see a higher frame rate when the same application is run under the Frame Debugger. NVIDIA Nsight also has a small performance overhead, which may reduce the frame rate slightly.
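The effect of disabling VSYNC on the reported frame rate can be estimated with a simple model. The Python sketch below assumes a fixed refresh rate and that, with VSYNC on, each frame is held until the next refresh boundary; both are simplifying assumptions for illustration. The function names are hypothetical, and the model ignores the profiling overhead mentioned above.

```python
import math


def fps_with_vsync(frame_time_s, refresh_hz=60.0):
    """Frame rate when every frame waits for the next refresh boundary."""
    intervals = max(math.ceil(frame_time_s * refresh_hz), 1)
    return refresh_hz / intervals


def fps_without_vsync(frame_time_s):
    """Frame rate when frames are presented as soon as they are finished."""
    return 1.0 / frame_time_s


# A frame that takes 10 ms to render on a 60 Hz display.
capped = fps_with_vsync(0.010)       # limited by the display refresh
uncapped = fps_without_vsync(0.010)  # limited only by the render time
```

Under these assumptions a 10 ms frame is capped at the refresh rate with VSYNC on but runs noticeably faster with it off, which is why the frame rate seen under the Frame Debugger can be higher than in a normal run.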
NVIDIA® Nsight™ Application Development Environment for Heterogeneous Platforms, Visual Studio Edition User Guide Rev. 5.3.170616 ©2009-2017. NVIDIA Corporation. All Rights Reserved.