How To: Optimize OpenGL ES 2.0 Performance for Tegra


NVIDIA’s Tegra mobile system on a chip (SOC) series includes an extremely powerful and flexible 3D GPU whose capabilities are well matched to the OpenGL ES 2.0 APIs. For optimal content rendering, there are some basic guidelines and several tips that can help developers reach their goals. This document details these recommendations, along with a few warnings about features and choices that can limit performance in 3D-centric applications.

The 3D GPU in all Tegra series SOCs contains a programmable vertex shading unit and a programmable fragment shading unit, each of which is accessible via OpenGL ES 2.0’s GLSL-ES shading language. Tegra also includes a high-performance multi-core ARM CPU and a high-bandwidth memory controller (MC) to round out the components of 3D rendering.

Optimal performance is achieved by:

  1. Maximizing the efficient use of the fragment shading unit and vertex shading unit via smart shader programming.
  2. Minimizing the use of the CPU by avoiding redundant and ill-optimized rendering methods.
  3. Optimizing the use of memory bandwidth across the fragment unit, vertex unit and display systems.

This document covers aspects of all three of these elements. Note that all quoted numbers are relative to clock settings on the Tegra 3-based “Cardhu” development kit. Numbers on other Tegra variants will differ.

Basic Performance Notes

In real-world applications, the most common performance bottlenecks are:

  1. Fragment fill rate for applications using long shaders and/or lots of overdraw.
  2. Memory bandwidth on devices with large screens or when using large/deep textures without mipmapped (mip-selecting) filter modes.
  3. Lack of CPU/GPU parallelism for applications that use redundant or GPU-unfriendly OpenGL ES code.


NVIDIA® GameWorks™ Documentation Rev. 1.0.220830 ©2014-2022. NVIDIA Corporation and affiliates. All Rights Reserved.