Geometry

Tegra has a very capable vertex processor, so you shouldn't automatically assume that a large triangle count is the reason for poor performance. However, to achieve high geometry throughput, several resource constraints must be managed.

Vertex Shader Complexity

Typically, 20-30 cycles per vertex is reasonable for most assets, but peak throughput is achieved at around 10 cycles per vertex.

Go to the Frame Profiler.
Select the Shader State checkbox only (vertex shader buckets are not fragmented based on other state).
Order the Bucket List by duration. For each significantly contributing bucket, do the following:
- Select the bucket.
- Go to Frame Debugger -> Shader Viewer.
- Select the Vertex Shader tab.
- Examine (in the lower-left view) the stats for the selected shader.

Post-transform Vertex Reuse

Improving post-transform vertex reuse is the single easiest way to increase geometry throughput on Tegra.

Go to Frame Profiler -> Vertex Cache Hits. This shows a post-transform cache efficiency rating for every draw call in the scene.
Scan the displayed data and verify that vertex reuse is reasonable. Well-optimized mesh data should show post-transform cache hit rates of around 70%.
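The cache-hit figure can also be estimated offline while optimizing a mesh. The following Python sketch assumes a FIFO post-transform cache with a hypothetical 16 entries. The actual size and replacement policy of Tegra's cache are not given here, so treat the result as a relative measure for comparing index orderings rather than a prediction of what the profiler will report.

```python
from collections import deque


def post_transform_hit_rate(indices, cache_size=16):
    """Estimate the fraction of indices that hit a post-transform cache.

    Assumes a simple FIFO cache of `cache_size` entries (hypothetical
    model); the real Tegra cache size and policy may differ.
    """
    if not indices:
        return 0.0
    fifo = deque()
    resident = set()
    hits = 0
    for index in indices:
        if index in resident:
            hits += 1  # already transformed: the vertex shader is not re-run
            continue
        # Miss: the vertex shader runs and the result enters the cache.
        fifo.append(index)
        resident.add(index)
        if len(fifo) > cache_size:
            resident.discard(fifo.popleft())  # evict the oldest entry
    return hits / len(indices)


# Two triangles sharing an edge: 6 indices, 4 unique vertices.
quad = [0, 1, 2, 2, 1, 3]
print(f"{post_transform_hit_rate(quad):.0%}")  # 33%
```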

To maximize post-transform reuse, it is recommended that you use indexed triangle lists for all geometry. Indexed triangle lists allow easy construction of optimized primitives without the degenerate triangles needed to link strips together.
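If existing assets are authored as strips, they can be converted to indexed triangle lists at load time. This Python sketch shows the standard conversion and is not a PerfHUD feature. It follows the OpenGL strip rule, under which every odd-numbered triangle has its first two vertices swapped to keep the winding consistent. It also drops the degenerate link triangles, because a list does not need them.

```python
def strip_to_list(strip):
    """Convert triangle-strip indices into triangle-list indices.

    Triangle i of a strip uses indices i, i+1, i+2; odd triangles swap the
    first two so every triangle keeps the winding of the first one.
    Degenerate triangles (any repeated index), used to stitch strips
    together, are dropped.
    """
    triangles = []
    for i in range(len(strip) - 2):
        a, b, c = strip[i], strip[i + 1], strip[i + 2]
        if a == b or b == c or a == c:
            continue  # degenerate link triangle: nothing to draw
        if i % 2 == 1:
            a, b = b, a  # restore consistent winding for odd triangles
        triangles.extend((a, b, c))
    return triangles


print(strip_to_list([0, 1, 2, 3]))  # [0, 1, 2, 2, 1, 3]
```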

Go to Frame Debugger -> Geometry Viewer.
Scrub through all draw calls, paying attention to:
- Type of the draw call: non-indexed (DrawArrays) or indexed (DrawElements). DrawElements is recommended.
- Primitive mode: TRIANGLE_STRIP, TRIANGLE_FAN, or TRIANGLES. TRIANGLES is recommended.

Place All Geometry In VBOs

Efficiently organized vertex/index data is key to reducing load on the memory system. It's recommended that you place all geometry in VBOs, and interleave vertex attribute data whenever possible (i.e., use the "array of structures" vertex layout).

Go to Frame Debugger -> Geometry Viewer.
Scrub through all draw calls.
- Any index data NOT placed in a buffer object will be flagged in red: Indices are not in a VBO.
- Any vertex data NOT placed in buffer objects will be flagged in red: Not using VBO.

Dynamically Modified VBOs

NOTE: Dynamically modified VBOs are identified by the Dashboard's dynamic vertex buffer object indicator (status bar, lower right).

Dynamically modified vertex data should also be placed in buffer objects. However, modifying a buffer that has already been submitted for rendering (with glMapBufferOES, glBufferData, or glBufferSubData) stalls the CPU until the GPU has completely finished processing that buffer.

Buffer renaming is currently not supported. It is therefore recommended that you double-buffer dynamic VBO handles, and only update the handle that is not referenced by the scene currently in flight.
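The double-buffering bookkeeping is small. The Python sketch below models only that logic. The class name is hypothetical, buffer handles are plain integers (a real renderer would get them from glGenBuffers), and the `upload` callback stands in for binding the handle and calling glBufferSubData.

```python
class DoubleBufferedVBO:
    """Alternate updates between two buffer-object handles, one per frame.

    Hypothetical helper: while the GPU may still be reading the handle used
    by the previous frame, the CPU writes only the other handle, so the
    update does not have to wait for the GPU.
    """

    def __init__(self, handles, upload):
        if len(handles) != 2:
            raise ValueError("expected exactly two buffer handles")
        self._handles = list(handles)
        self._upload = upload  # stand-in for glBindBuffer + glBufferSubData
        self._frame = 0

    def update(self, data):
        """Write `data` into the handle the previous frame did not use; return it."""
        handle = self._handles[self._frame % 2]
        self._upload(handle, data)
        self._frame += 1
        return handle


uploads = []
vbo = DoubleBufferedVBO([7, 8], lambda handle, data: uploads.append((handle, data)))
for frame_data in (b"a", b"b", b"c"):
    print(vbo.update(frame_data))  # prints 7, then 8, then 7
```

Two handles assume the GPU trails the CPU by at most one frame. If it can fall further behind, a longer ring of handles follows the same pattern.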

Find a completely static scene (so that the call trace contains identical content each time). Then repeat the following steps half a dozen times or so, using a new number n for each export:

Go to the Dashboard, and ensure PerfHUD is connected and the timing data is being updated.
Grab a Frame Debugger frame.
Go to Frame Debugger -> Call Trace.
Click Export, and save the call trace as trace_n.txt.

Once all the traces are exported:

Load all the trace files into a text editor.
For each dynamic buffer update call in the frame (glMapBufferOES/glBufferData/glBufferSubData), find that call in one of the trace files.
Identify the buffer object bound when the update is called.
Find the same update in the other trace files. In at least one of them, the buffer id should be different.

As an example, some of this can be automated with grep:

grep -B1 glMapBufferOES trace_?.txt

The -B1 option prints the one line before each match, which should capture the preceding buffer binding as long as the bind call immediately precedes the update.
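The whole comparison can also be scripted. The format of the exported call trace is not specified here, so this Python sketch assumes one call per line, written like `glBindBuffer(GL_ARRAY_BUFFER, 3)`, with the target as the first argument of each update call. Adjust the two regular expressions to match what Export actually writes. For each trace, the sketch records the buffer id bound to the target at every update, in call order. It then reports the updates that touch the same id in every trace, which are the updates that are never double-buffered.

```python
import re

# Assumed trace syntax (hypothetical): one GL call per line.
BIND = re.compile(r"glBindBuffer\(\s*(\w+)\s*,\s*(\d+)\s*\)")
UPDATE = re.compile(r"\b(glMapBufferOES|glBufferData|glBufferSubData)\(\s*(\w+)")


def updated_buffer_ids(trace_text):
    """Return the buffer id bound to the target of each update call, in order."""
    bound = {}  # target name -> currently bound buffer id
    ids = []
    for line in trace_text.splitlines():
        bind = BIND.search(line)
        if bind:
            bound[bind.group(1)] = int(bind.group(2))
            continue
        update = UPDATE.search(line)
        if update:
            ids.append(bound.get(update.group(2)))  # None if nothing bound
    return ids


def never_renamed(traces):
    """Positions of update calls that hit the same buffer id in every trace."""
    per_trace = [updated_buffer_ids(t) for t in traces]
    return [i for i, ids in enumerate(zip(*per_trace)) if len(set(ids)) == 1]


frame_a = "glBindBuffer(GL_ARRAY_BUFFER, 3)\nglBufferSubData(GL_ARRAY_BUFFER, 0, 64, 0x1)\n"
frame_b = "glBindBuffer(GL_ARRAY_BUFFER, 4)\nglBufferSubData(GL_ARRAY_BUFFER, 0, 64, 0x1)\n"
print(never_renamed([frame_a, frame_b]))  # [] because the update alternates between ids 3 and 4
```

With the real exports, pass something like `[open(f"trace_{n}.txt").read() for n in range(1, 7)]`. The comparison needs at least two traces to mean anything.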

Compressed Vertex Attributes

It's recommended that you reduce vertex fetch memory bandwidth as much as possible, rather than relying on fp32 for everything. Tegra supports half-precision floats, as well as signed and unsigned byte and short formats.

Go to Frame Debugger -> Geometry Viewer.
Scrub through all draw calls, paying attention to:
- The type of each attribute. Compressed formats should be used wherever possible.
- Don't pad vertex attribute data unnecessarily. For example, don't fetch a 4-component position with a constant w = 1.0 for every vertex. Supply three components and rely on the GLES2 default values, which set a missing w to 1.0.
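To estimate the savings, compare per-vertex sizes. This Python sketch uses the standard `struct` module, whose `e` code is an IEEE 754 half-precision float. The position/normal/texcoord layout and the `pack_compressed` helper are illustrative choices, not a format prescribed here. The position is stored with three components, so w comes from the GLES2 default.

```python
import struct

# Per-vertex layouts for a position + normal + texcoord vertex.
# struct codes: 'f' = fp32, 'e' = IEEE 754 half float, 'b' = signed byte.
# '<' selects little-endian with no implicit padding between fields.
FP32_LAYOUT = "<3f3f2f"        # position, normal, texcoord all fp32
COMPRESSED_LAYOUT = "<3e3b2e"  # half position, byte normal, half texcoord


def pack_compressed(position, normal, texcoord):
    """Pack one vertex with a half-float position/texcoord and a byte normal.

    The normal is scaled and clamped to [-127, 127]; fetched with
    normalized = GL_TRUE, the shader sees it back in approximately [-1, 1].
    """
    byte_normal = [max(-127, min(127, round(n * 127))) for n in normal]
    return struct.pack(COMPRESSED_LAYOUT, *position, *byte_normal, *texcoord)


print(struct.calcsize(FP32_LAYOUT), struct.calcsize(COMPRESSED_LAYOUT))  # 32 13
```

Half floats keep 11 significant bits, so coordinates far from the origin lose precision. Model-space positions of a single mesh usually fit better than world-space ones.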

Interleaved Vertex Attributes

The memory controller in a Tegra device is much more efficient when fetching from spatially coherent addresses. Storing vertex attributes as an array of structures keeps all the attributes of a given vertex close together in memory, which results in an efficient memory access pattern. Not following this guidance can severely impact performance.

Go to Frame Debugger -> Geometry Viewer.
Scrub through all draw calls and verify, by examining the buffer and attribute specs, that:
- All vertex attributes are fetched from the same VBO.
- The stride of the bound VBO (if applicable) should generally equal the sum of the sizes of the attributes referenced, so that no unused bytes are fetched.
- The individual attributes are interleaved within the VBO; i.e., each attribute has the same stride and a different (typically sub-stride) offset within the buffer.
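The three checks above can be written down as a small validator over the values passed to glVertexAttribPointer. The `Attribute` record and `interleaving_problems` function are hypothetical names used only for this Python sketch, and all sizes are in bytes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    """One vertex attribute as set up with glVertexAttribPointer (hypothetical record).

    All sizes are in bytes: `size` is components * bytes per component.
    """

    name: str
    buffer: int  # VBO bound to GL_ARRAY_BUFFER when the pointer was set
    size: int
    stride: int
    offset: int


def interleaving_problems(attributes):
    """List the ways an attribute setup departs from a tight interleaved layout."""
    problems = []
    if len({a.buffer for a in attributes}) > 1:
        problems.append("attributes are fetched from more than one VBO")
    strides = {a.stride for a in attributes}
    if len(strides) > 1:
        problems.append("attributes do not share a single stride")
    if len({a.offset for a in attributes}) < len(attributes):
        problems.append("two or more attributes share an offset")
    for a in attributes:
        if a.offset + a.size > a.stride:
            problems.append(f"{a.name} extends past the end of its stride")
    total = sum(a.size for a in attributes)
    for stride in sorted(strides):
        if stride != total:
            problems.append(f"stride {stride} differs from {total} bytes of attribute data")
    return problems


good = [
    Attribute("position", 1, 12, 32, 0),
    Attribute("normal", 1, 12, 32, 12),
    Attribute("texcoord", 1, 8, 32, 24),
]
print(interleaving_problems(good))  # []
```

GL treats a stride of 0 as tightly packed, so substitute the attribute's own size for a 0 stride before running the check.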