Vertex Shader Performance


The Tegra vertex shader unit is powerful and flexible. It is a super-scalar, dual-issue unit that pairs a 4-wide vector floating-point pipeline with a scalar floating-point "multi-function unit" pipeline, which implements common transcendental functions (sin, sqrt, log2, etc.).

It fully supports conditional operations and looping, and can transform more than 100 million vertices per second. Pre-transform and post-transform caches help it sustain high throughput without loading the back-end memory system excessively.

Vertex Shader Guidelines and Optimizations

Reduce the memory footprint and optimize the layout of vertex data to minimize load on the memory system.
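One way to shrink per-vertex data is to store each attribute in the smallest type that keeps the precision it needs, and to interleave the attributes in a single struct. The sketch below is illustrative only: the attribute set, the `FatVertex` and `PackedVertex` names, and the chosen precisions are assumptions, not a layout prescribed by this guide.

```c
#include <stddef.h>
#include <stdint.h>

/* Baseline layout: every attribute stored as 32-bit floats (48 bytes). */
typedef struct {
    float position[3];
    float normal[3];
    float texcoord[2];
    float color[4];
} FatVertex;

/* Hypothetical compact interleaved layout (24 bytes).
 * Every attribute starts on a 4-byte boundary. */
typedef struct {
    float    position[3];  /* GL_FLOAT x3 */
    int8_t   normal[4];    /* GL_BYTE x4, normalized; 4th byte is padding */
    uint16_t texcoord[2];  /* GL_UNSIGNED_SHORT x2, normalized, assumes UVs in [0, 1] */
    uint8_t  color[4];     /* GL_UNSIGNED_BYTE x4, normalized */
} PackedVertex;

static float clampf(float v, float lo, float hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Map [-1, 1] to a signed byte. The exact decode mapping for signed
 * normalized bytes differs between API versions, so a shader would
 * normally renormalize the unpacked normal. */
int8_t pack_snorm8(float v)
{
    v = clampf(v, -1.0f, 1.0f);
    return (int8_t)(v * 127.0f + (v >= 0.0f ? 0.5f : -0.5f));
}

/* Map [0, 1] to an unsigned 16-bit value. */
uint16_t pack_unorm16(float v)
{
    return (uint16_t)(clampf(v, 0.0f, 1.0f) * 65535.0f + 0.5f);
}

/* Map [0, 1] to an unsigned byte. */
uint8_t pack_unorm8(float v)
{
    return (uint8_t)(clampf(v, 0.0f, 1.0f) * 255.0f + 0.5f);
}

/* Convert one full-precision vertex to the compact layout. */
PackedVertex pack_vertex(const FatVertex *in)
{
    PackedVertex out;
    int i;
    for (i = 0; i < 3; ++i) {
        out.position[i] = in->position[i];
        out.normal[i]   = pack_snorm8(in->normal[i]);
    }
    out.normal[3] = 0;
    for (i = 0; i < 2; ++i) out.texcoord[i] = pack_unorm16(in->texcoord[i]);
    for (i = 0; i < 4; ++i) out.color[i]    = pack_unorm8(in->color[i]);
    return out;
}
```

Under these assumptions the packed vertex occupies 24 bytes instead of 48. Each field would then be described with `glVertexAttribPointer`, using `sizeof(PackedVertex)` as the stride and setting `normalized` to `GL_TRUE` for the byte and short attributes.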

Take full advantage of Tegra’s sophisticated post-transform cache.
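How well an index buffer uses a post-transform cache can be estimated offline by counting how many indices miss a simulated cache. The sketch below assumes a simple FIFO cache; the replacement policy, the `cache_size` parameter, and the function names are illustrative assumptions, since this guide does not state the actual cache size or policy.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_CACHE_SIZE 64  /* upper bound for the simulated cache (assumption) */

/* Count how many indices would require a vertex shader run, assuming a
 * FIFO post-transform cache with cache_size entries. */
size_t count_cache_misses(const uint16_t *indices, size_t index_count,
                          size_t cache_size)
{
    int32_t cache[MAX_CACHE_SIZE];
    size_t next = 0, misses = 0, i, j;

    /* Treat an unsupported size as "no cache": every index is a miss. */
    if (cache_size == 0 || cache_size > MAX_CACHE_SIZE)
        return index_count;

    for (i = 0; i < cache_size; ++i)
        cache[i] = -1;  /* -1 marks an empty slot */

    for (i = 0; i < index_count; ++i) {
        int hit = 0;
        for (j = 0; j < cache_size; ++j) {
            if (cache[j] == (int32_t)indices[i]) { hit = 1; break; }
        }
        if (!hit) {
            cache[next] = (int32_t)indices[i];  /* overwrite the oldest entry */
            next = (next + 1) % cache_size;
            ++misses;
        }
    }
    return misses;
}

/* Average cache miss ratio: shaded vertices per triangle in a triangle list. */
double average_cache_miss_ratio(const uint16_t *indices, size_t index_count,
                                size_t cache_size)
{
    size_t triangles = index_count / 3;
    if (triangles == 0)
        return 0.0;
    return (double)count_cache_misses(indices, index_count, cache_size)
           / (double)triangles;
}
```

A lower ratio means more vertices are reused from the cache. Comparing the ratio before and after reordering a mesh's triangles gives a rough indication of whether the reordering helps, within the limits of the assumed cache model.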

Use vertex buffer objects (VBOs) to store ALL geometry (vertex and index) data.

Avoid updating dynamic VBO content for a buffer already in use by the GPU.
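A common way to avoid writing into a buffer the GPU may still be reading is to rotate through several VBOs and reuse one only after enough frames have passed. The sketch below shows only the bookkeeping; `VboRing`, `RING_SLOTS`, and the `frames_in_flight` latency are hypothetical, and the assumption that the GPU lags the CPU by a bounded number of frames must be validated per platform.

```c
#include <stdint.h>

#define RING_SLOTS 3  /* number of VBOs to rotate through (assumption) */

typedef struct {
    uint32_t buffer[RING_SLOTS];           /* buffer names from glGenBuffers */
    uint32_t last_used_frame[RING_SLOTS];  /* frame of the most recent write */
    int      ever_used[RING_SLOTS];        /* 0 until the slot is first handed out */
} VboRing;

/* Store the buffer names and mark every slot as free. */
void vbo_ring_init(VboRing *ring, const uint32_t names[RING_SLOTS])
{
    int i;
    for (i = 0; i < RING_SLOTS; ++i) {
        ring->buffer[i] = names[i];
        ring->last_used_frame[i] = 0;
        ring->ever_used[i] = 0;
    }
}

/* Return the index of a slot that is assumed idle, or -1 if every slot was
 * written within the last frames_in_flight frames. A slot counts as idle if
 * it was never used or its last write is at least frames_in_flight frames
 * old. The returned slot is marked as written in the current frame. */
int vbo_ring_acquire(VboRing *ring, uint32_t frame, uint32_t frames_in_flight)
{
    int i;
    for (i = 0; i < RING_SLOTS; ++i) {
        if (!ring->ever_used[i] ||
            frame - ring->last_used_frame[i] >= frames_in_flight) {
            ring->ever_used[i] = 1;
            ring->last_used_frame[i] = frame;
            return i;
        }
    }
    return -1;
}
```

The application would bind `ring->buffer[slot]` and upload the new data into it. When `-1` is returned, one option is to re-specify the buffer store with `glBufferData` and a `NULL` data pointer instead of overwriting contents that may still be in use.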

Minimize unnecessary load on the memory system when updating VBO content dynamically.
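One way to limit the bytes transferred on a dynamic update is to track which vertices changed and upload only that span. The sketch below is a minimal dirty-range tracker; the `DirtyRange` type and its functions are hypothetical helpers, not part of OpenGL ES.

```c
#include <stddef.h>
#include <stdint.h>

/* Half-open range of modified vertices [first, end); empty when first >= end. */
typedef struct {
    size_t first;
    size_t end;
} DirtyRange;

/* Mark the range as empty. */
void dirty_reset(DirtyRange *r)
{
    r->first = SIZE_MAX;
    r->end = 0;
}

/* Grow the range so that it covers count vertices starting at first_vertex. */
void dirty_mark(DirtyRange *r, size_t first_vertex, size_t count)
{
    if (count == 0)
        return;
    if (first_vertex < r->first)
        r->first = first_vertex;
    if (first_vertex + count > r->end)
        r->end = first_vertex + count;
}

/* Compute the byte offset and size of the span to upload for a given vertex
 * stride. Returns 0 (and leaves the outputs untouched) if nothing is dirty. */
int dirty_upload_span(const DirtyRange *r, size_t stride,
                      size_t *offset_bytes, size_t *size_bytes)
{
    if (r->first >= r->end)
        return 0;
    *offset_bytes = r->first * stride;
    *size_bytes   = (r->end - r->first) * stride;
    return 1;
}
```

The resulting offset and size can be passed to `glBufferSubData` together with a pointer to the same offset in the CPU-side copy. A single merged range may include unchanged vertices that lie between two edits; whether one larger upload or several smaller ones is cheaper depends on the workload.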

Character Skinning and the Vertex Unit

Moving character skinning from the CPU to the GPU is an effective way to offload the CPU and reduce memory bandwidth. OpenGL ES 2.0 makes dynamic character skinning possible on the GPU, even when the skinning method does not fit the "basic bone-palette" limitations. More complex skinning (e.g., bone skinning and morph deformations) can also be done on the GPU. Moving all skinning to the GPU also avoids dynamic vertex buffers, since all of the source data (except matrices) can be static. However, a few recommendations apply to character skinning on the GPU.
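To make the basic bone-palette method mentioned above concrete, the blend that a skinning vertex shader performs per vertex can be written as a CPU reference in C. This is only a sketch: the four-bones-per-vertex limit, the 3x4 row-major matrix layout, and the `BoneMatrix` and `skin_position` names are assumptions chosen for illustration.

```c
#include <stddef.h>

#define BONES_PER_VERTEX 4  /* influences per vertex (assumption) */

/* 3x4 row-major bone matrix: columns 0..2 hold rotation/scale,
 * column 3 holds the translation. */
typedef struct {
    float m[3][4];
} BoneMatrix;

/* Linear blend skinning for one position: transform the input by each
 * referenced bone and sum the results scaled by the vertex weights.
 * The weights are expected to sum to 1. */
void skin_position(const BoneMatrix *palette,
                   const unsigned char bone_index[BONES_PER_VERTEX],
                   const float weight[BONES_PER_VERTEX],
                   const float in_pos[3], float out_pos[3])
{
    int b, r;
    out_pos[0] = out_pos[1] = out_pos[2] = 0.0f;

    for (b = 0; b < BONES_PER_VERTEX; ++b) {
        const BoneMatrix *bone = &palette[bone_index[b]];
        for (r = 0; r < 3; ++r) {
            float transformed = bone->m[r][0] * in_pos[0]
                              + bone->m[r][1] * in_pos[1]
                              + bone->m[r][2] * in_pos[2]
                              + bone->m[r][3];
            out_pos[r] += weight[b] * transformed;
        }
    }
}
```

In a GPU version, the palette would typically be a uniform array updated per draw, while the bone indices and weights are ordinary vertex attributes. That matches the point above that everything except the matrices can stay in static buffers.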

NVIDIA® GameWorks™ Documentation Rev. 1.0.200608 ©2014-2020. NVIDIA Corporation. All Rights Reserved.