SIMD

NVIDIA DGX Spark implements two vector single-instruction-multiple-data (SIMD) instruction extensions:

  • Advanced SIMD Instructions (NEON)

  • Arm Scalable Vector Extension (SVE)

Arm Advanced SIMD Instructions (or NEON) is the most common SIMD ISA for Arm64. It is a fixed-length SIMD ISA that supports 128-bit vectors. The first Arm-based supercomputer to appear on the Top500 Supercomputers list (Astra) used NEON to accelerate linear algebra, and many applications and libraries are already taking advantage of NEON.
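To make "fixed-length" concrete, here is a minimal sketch of an element-wise add written with NEON intrinsics from `arm_neon.h`. The function name `add_f32_neon` is hypothetical and used only for illustration. The stride of four floats is hard-coded because a NEON register is always 128 bits wide, and a scalar loop handles the leftover elements. A scalar fallback is included so the sketch still compiles where `__ARM_NEON` is not defined.

```c
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: c[i] = a[i] + b[i] for i in [0, n). */
void add_f32_neon(const float *a, const float *b, float *c, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    /* One 128-bit NEON register holds exactly four 32-bit floats. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(c + i, vaddq_f32(va, vb));
    }
#endif
    /* Scalar tail: the last n % 4 elements, or every element without NEON. */
    for (; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}
```

In practice, compilers often auto-vectorize a plain loop like this one, so hand-written intrinsics are typically reserved for kernels where auto-vectorization falls short.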

More recently, Arm64 CPUs have started supporting the Arm Scalable Vector Extension (SVE), a length-agnostic SIMD ISA that supports more data types than NEON (for example, FP16), offers more powerful instructions (for example, gather/scatter), and allows vector lengths greater than 128 bits. SVE is currently found in NVIDIA DGX Spark, NVIDIA Grace, AWS Graviton3, Fujitsu A64FX, and others. SVE is not a new version of NEON, but an entirely new SIMD ISA.
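The length-agnostic style can be sketched with the SVE intrinsics from `arm_sve.h`. In this illustrative example (the function name `add_f32_sve` is hypothetical), the loop never names a vector width: `svcntw()` reports how many 32-bit lanes the hardware provides, and the predicate from `svwhilelt_b32_u64` masks off lanes past the end of the array, so no scalar tail loop is needed. The SVE path is compiled only when the compiler defines `__ARM_FEATURE_SVE` (for example, with an SVE-enabled `-march` setting); otherwise the sketch falls back to a scalar loop.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Hypothetical helper: c[i] = a[i] + b[i] for i in [0, n). */
void add_f32_sve(const float *a, const float *b, float *c, size_t n)
{
#if defined(__ARM_FEATURE_SVE)
    /* svcntw() is the number of 32-bit lanes per vector on this CPU. */
    for (uint64_t i = 0; i < n; i += svcntw()) {
        /* Lanes with index i + lane < n are active; the rest are masked. */
        svbool_t pg = svwhilelt_b32_u64(i, (uint64_t)n);
        svfloat32_t va = svld1_f32(pg, a + i);
        svfloat32_t vb = svld1_f32(pg, b + i);
        svst1_f32(pg, c + i, svadd_f32_x(pg, va, vb));
    }
#else
    /* Scalar fallback when SVE is not enabled at compile time. */
    for (size_t i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
#endif
}
```

The same binary should run unchanged on SVE implementations with different vector lengths, because the lane count is queried at run time rather than fixed in the source.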

NVIDIA DGX Spark can retire six 128-bit NEON operations or six 128-bit SVE2 operations on P-cores, and two 128-bit NEON operations or two 128-bit SVE2 operations on E-cores. Although the theoretical peak performance of SVE and NEON is the same on these CPUs, SVE (and especially SVE2) is a more capable SIMD ISA, with support for complex data types and advanced features that enable the vectorization of complicated code. In practice, some kernels that cannot be vectorized with NEON can be vectorized with SVE. So although SVE will not beat NEON in a performance drag race, it can dramatically improve the overall performance of an application by vectorizing loops that would otherwise have executed with scalar instructions.
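As one illustration of a loop that is awkward in NEON but maps directly onto SVE, consider an indexed lookup `out[i] = table[idx[i]]`. NEON has no gather load, so each element has to be fetched individually, whereas SVE can load a whole vector of indexed elements with a single gather instruction. The sketch below assumes 32-bit unsigned indices and uses a hypothetical function name, `gather_f32`. Whether a given compiler vectorizes such a loop automatically, and how fast the gather runs on a particular core, are not claims made here.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Hypothetical helper: out[i] = table[idx[i]] for i in [0, n). */
void gather_f32(const float *table, const uint32_t *idx, float *out, size_t n)
{
#if defined(__ARM_FEATURE_SVE)
    for (uint64_t i = 0; i < n; i += svcntw()) {
        /* Predicate covers only the elements that remain. */
        svbool_t pg = svwhilelt_b32_u64(i, (uint64_t)n);
        /* Load a vector of indices, then gather table[index] per lane. */
        svuint32_t vidx = svld1_u32(pg, idx + i);
        svfloat32_t v = svld1_gather_u32index_f32(pg, table, vidx);
        svst1_f32(pg, out + i, v);
    }
#else
    /* Scalar fallback when SVE is not enabled at compile time. */
    for (size_t i = 0; i < n; ++i) {
        out[i] = table[idx[i]];
    }
#endif
}
```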