Beyond the dashboard, PerfHUD ES captures and processes two distinct sets of data: Frame Debugger and Frame Profiler. These data sets are complementary, and both are required to take full advantage of PerfHUD ES.
The Frame Debugger data is a complete capture of a single rendered frame (delimited by eglSwapBuffers by default, or some other user-specified function if necessary). PerfHUD ES allows the user to scrub through the scene and examine state (geometry, texture, GL state, framebuffer contents, and shaders) for each submitted draw call. This data set does not include any timing information.
The Frame Profiler data contains detailed timing information captured from the running application.
In order to grab the Frame Profiler data, PerfHUD ES needs to repeatedly re-submit the current frame, instrumenting various aspects of the scene each time.
In order for this process to succeed, and for the data captured to match the Frame Debugger data, the application must be capable of entering some state such that it will re-render exactly the same scene every frame.
To facilitate this, we provide an EGL extension that presents Time Update APIs to the application. If an application uses this extension to drive its internal update, PerfHUD ES can adjust the time presented to the application to effectively slow down, speed up, or fully pause the application. Generally, this allows PerfHUD ES to gather the many timing data sets needed by the Frame Profiler.
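The idea of driving the application's update from a time value that a tool can scale or freeze can be sketched as follows. This is only an illustration of the pattern: TimeSource, ControllableClock, and Scene are hypothetical names, and none of them is the actual extension API. The point is that when scene state is derived solely from the presented time, a paused clock makes every re-submitted frame identical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical interface standing in for whatever time query the
// application uses. It is NOT the actual PerfHUD ES / EGL extension API.
struct TimeSource {
    virtual ~TimeSource() = default;
    virtual std::uint64_t nowMicros() const = 0;
};

// A time source that an external tool could slow down, speed up, or pause.
class ControllableClock : public TimeSource {
public:
    // Real time elapsed; presented time moves by the scaled amount only.
    void advanceRealTime(std::uint64_t micros) {
        presented_ += static_cast<std::uint64_t>(micros * scale_);
    }

    // 1.0 = normal speed, 0.0 = fully paused.
    void setScale(double scale) {
        assert(scale >= 0.0);
        scale_ = scale;
    }

    std::uint64_t nowMicros() const override { return presented_; }

private:
    std::uint64_t presented_ = 0;
    double scale_ = 1.0;
};

// Scene state is a pure function of presented time, so a frozen clock
// produces exactly the same scene on every frame.
struct Scene {
    double objectX = 0.0;

    void update(const TimeSource& clock) {
        const double seconds = clock.nowMicros() / 1e6;
        objectX = 2.0 * seconds;  // moves 2 units per second along X
    }
};
```

An application that instead reads a wall-clock timer directly, or accumulates per-frame deltas from one, would keep animating while the frame is being re-submitted, and the profiled frame would drift away from the debugged one.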
Currently, the process of identifying many of these issues involves manually scrubbing through the scene in the Frame Debugger while monitoring an appropriate view of the rendered data. Future plans are to automate much of this manual effort by processing the scene and reporting a status summary for each item (possibly along with recommendations to address specific problems).
The Bucket Definition view can show you how many unique shaders (programs) the application uses. Check only the Shader State checkbox to view this list of buckets (in the central view).
Certain render states (for example, alpha-blending) can result in fragment shader recompilation (i.e., each application-provided fragment shader is actually an umbrella for the set of fragment-shaders generated from that particular shader, in combination with render-state). Check the Shader State and Raster State options to view these buckets.
The third option, Framebuffer Object, allows further breaking down of buckets based on the render target to which each draw call was submitted. Be aware, though, that this feature does not handle the case where a single FBO handle is dynamically re-attached to multiple sets of surfaces.
Whichever checkboxes are marked, selecting a bucket in the Bucket-List view displays the list of draw calls contributing to that bucket in the right-hand Draw Call view.
Both buckets and draw calls can be ordered by duration, allowing easy identification of key bottlenecks. Use this view to focus and drive your optimization effort.
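The bucketing described above can be approximated offline with a small grouping routine. The sketch below is an assumption about how such grouping might work, not a description of PerfHUD ES internals; DrawCall, Bucket, and buildBuckets are hypothetical names, and the three flags mirror the Shader State, Raster State, and Framebuffer Object checkboxes.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical record of one profiled draw call; the field names are
// illustrative and are not taken from PerfHUD ES.
struct DrawCall {
    int id;
    unsigned program;      // shader program handle
    unsigned rasterState;  // hash of blend/depth/cull state
    unsigned fbo;          // render target handle
    double micros;         // measured duration
};

struct Bucket {
    std::tuple<unsigned, unsigned, unsigned> key;
    double totalMicros = 0.0;
    std::vector<int> drawIds;
};

// Groups draw calls by the selected criteria, then orders the buckets by
// total duration, longest first.
std::vector<Bucket> buildBuckets(const std::vector<DrawCall>& draws,
                                 bool byShader, bool byRaster, bool byFbo) {
    std::map<std::tuple<unsigned, unsigned, unsigned>, Bucket> grouped;
    for (const DrawCall& d : draws) {
        // Unselected criteria collapse to 0 so they do not split buckets.
        auto key = std::make_tuple(byShader ? d.program : 0u,
                                   byRaster ? d.rasterState : 0u,
                                   byFbo ? d.fbo : 0u);
        Bucket& b = grouped[key];
        b.key = key;
        b.totalMicros += d.micros;
        b.drawIds.push_back(d.id);
    }

    std::vector<Bucket> result;
    for (auto& entry : grouped) {
        result.push_back(std::move(entry.second));
    }
    std::stable_sort(result.begin(), result.end(),
                     [](const Bucket& a, const Bucket& b) {
                         return a.totalMicros > b.totalMicros;
                     });
    return result;
}
```

Calling buildBuckets(draws, true, false, false) corresponds to checking only Shader State; enabling further flags splits each bucket more finely, and the first element of the result is the most expensive bucket.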
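For depth-buffered scenes, a front-to-back draw order is generally the most efficient, because fragments hidden behind already-drawn geometry can be rejected by the depth test before they are shaded.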
Go to Frame Debugger -> Frame Scrubber.
Scrub through all draw calls in the scene. Verify that, for each rendered frame buffer, draw calls are dispatched in approximately front-to-back order, and that depth buffering is enabled (i.e., the depth buffer is updated as the scene is drawn).
The requirements to batch efficiently and to submit draw calls in order of increasing depth conflict with each other.
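One common compromise between batching and depth ordering, which is not taken from this guide, is to sort opaque draws by a coarse depth bin first and by shader program within each bin. The sketch below assumes hypothetical names (OpaqueDraw, sortKey, sortOpaque) and a caller-chosen bin size; the right bin size depends on the scene.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical per-draw record for opaque geometry.
struct OpaqueDraw {
    int id;
    float viewDepth;   // distance from the camera, >= 0
    unsigned program;  // shader program handle
};

// Coarse depth bin in the high bits, program in the low bits: draws are
// ordered roughly front to back, and draws that fall into the same bin
// are grouped by program so they can be batched.
std::uint64_t sortKey(const OpaqueDraw& d, float binSize) {
    assert(binSize > 0.0f && d.viewDepth >= 0.0f);
    const auto bin =
        static_cast<std::uint64_t>(std::floor(d.viewDepth / binSize));
    return (bin << 32) | d.program;
}

void sortOpaque(std::vector<OpaqueDraw>& draws, float binSize) {
    std::stable_sort(draws.begin(), draws.end(),
                     [binSize](const OpaqueDraw& a, const OpaqueDraw& b) {
                         return sortKey(a, binSize) < sortKey(b, binSize);
                     });
}
```

A larger bin size favors batching at the cost of a less strict depth order; a smaller one does the opposite. The resulting order can be checked in the Frame Scrubber as described above.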
Are the off-screen and visible frame buffers sensibly sized?
Don't clear surfaces redundantly. In particular, don't clear the color-buffer if every pixel will be written.
The most reliable way to identify glClear calls, currently, is the following:
glClear(
Explicitly disable the frame buffers that are not needed for the active render. For example, do not write stencil/depth unless the written content is required and meaningful.
Keep track of GLES state in the application and only make the minimum GLES2 function calls required. This helps to reduce the intrusiveness of PerfHUD ES, and also reduces redundant application overhead.
Texture state is cached by the driver (filter mode, wrap mode). It isn't necessary to redundantly set this state for every draw call. This also applies to shader state; uniforms are cached per-shader once they are set.
Realistically, all applications will tend to submit some redundant state, and the overhead for this is negligible. However, the most common patterns of abuse are trivial to identify, and often involve very little engineering effort to resolve.
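A minimal sketch of application-side state tracking is shown below, assuming a hypothetical CapabilityCache class. The Cap type stands in for GLenum and the apply callback stands in for a thin wrapper around glEnable/glDisable; both are assumptions made so that the example compiles without a GL context.

```cpp
#include <cassert>
#include <functional>
#include <unordered_map>
#include <utility>

// Filters redundant enable/disable requests before they reach the driver.
// `Cap` stands in for GLenum and `apply` for a wrapper around
// glEnable/glDisable; both are assumptions of this sketch.
class CapabilityCache {
public:
    using Cap = unsigned;
    using ApplyFn = std::function<void(Cap, bool)>;

    explicit CapabilityCache(ApplyFn apply) : apply_(std::move(apply)) {
        assert(apply_);
    }

    // Forwards the call only when the requested value differs from the
    // last value sent, or when nothing has been sent for this capability.
    void set(Cap cap, bool enabled) {
        auto it = known_.find(cap);
        if (it != known_.end() && it->second == enabled) {
            ++skipped_;
            return;
        }
        known_[cap] = enabled;
        apply_(cap, enabled);
    }

    // Call after code outside the cache has changed GL state.
    void invalidate() { known_.clear(); }

    unsigned skipped() const { return skipped_; }

private:
    ApplyFn apply_;
    std::unordered_map<Cap, bool> known_;
    unsigned skipped_ = 0;
};
```

In a real renderer the callback would call glEnable(cap) or glDisable(cap), and the same pattern extends to texture parameters and uniforms. The skipped() counter gives a rough measure of how much redundant state the application was submitting.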
Alpha blending causes a read-modify-write frame buffer access pattern, which reduces memory efficiency and overall fragment throughput. It's recommended that you use alpha blending sparingly.
Discard (primarily used for alpha-test) disables our (efficient) early depth/stencil buffer write mechanism, reducing memory efficiency and overall fragment throughput. It's recommended that you use discard sparingly.
.glslf file extension.
Backface culling helps to reduce overdraw significantly. It's recommended that you enable backface culling consistently, for as much of the scene as possible.
NVIDIA® GameWorks™ Documentation Rev. 1.0.211026 ©2014-2021. NVIDIA Corporation and affiliates. All Rights Reserved.