Overview

Added in version 3.1.

NVIDIA AI Enterprise includes the RAPIDS Accelerator for Apache Spark, which leverages GPUs to accelerate processing via the RAPIDS libraries.

The RAPIDS Accelerator for Apache Spark, together with NVIDIA GPUs, transparently accelerates Spark DataFrame workloads with no code changes. It works as a plugin that integrates with Spark’s query planner. Operations that cannot be accelerated continue to run on the CPU with Spark’s built-in implementations.

The RAPIDS Accelerator is delivered as a jar file. It intercepts DataFrame and SQL operations and evaluates those that can be accelerated with implementations that execute on the GPU.
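As a rough illustration of how a jar-delivered plugin is typically switched on, the sketch below assembles a `spark-submit` command in Python. The configuration keys `spark.plugins` (with the value `com.nvidia.spark.SQLPlugin`), `spark.rapids.sql.enabled`, and `spark.rapids.sql.explain` come from the open-source RAPIDS Accelerator configuration. The jar path, the application name `etl_job.py`, and the helper `rapids_submit_command` are hypothetical placeholders, and the platform-specific sections of this guide remain the authoritative setup instructions.

```python
import shlex

# Hypothetical path: the real jar name and version depend on your download.
RAPIDS_JAR = "/opt/sparkRapidsPlugin/rapids-4-spark.jar"


def rapids_submit_command(app, jar=RAPIDS_JAR, explain="NOT_ON_GPU"):
    """Build a spark-submit argument list that enables the RAPIDS Accelerator."""
    conf = {
        # Registers the accelerator with Spark's plugin mechanism.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        # Turns GPU acceleration of SQL and DataFrame operations on.
        "spark.rapids.sql.enabled": "true",
        # Controls logging of operations that stay on the CPU.
        "spark.rapids.sql.explain": explain,
    }
    args = ["spark-submit", "--jars", jar]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)  # the unchanged Spark application
    return args


command = rapids_submit_command("etl_job.py")
print(shlex.join(command))
```

The application itself is passed through unchanged, which is the sense in which no code changes are needed. With `spark.rapids.sql.explain` set to `NOT_ON_GPU`, the driver log should report the operations that fall back to Spark’s CPU implementations.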

The Accelerated Spark stack consists of three main components, each of which helps Spark users accelerate their ETL, DL, or ML applications.

Spark 3.0 core engine: Spark 3.0 core provides two critical capabilities that allow the RAPIDS Accelerator to execute Spark operations on the GPU: GPU scheduling and columnar processing. The plugin supports SQL and DataFrame operations, which are commonly used for data processing and are highlighted by the green outline in the Spark components layer.

RAPIDS software: an open-source collection of libraries that aims to democratize data science on GPUs.

NVIDIA GPU-accelerated infrastructure.
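To make the GPU scheduling capability of the first component more concrete, here is a minimal sketch of the Spark 3.x resource settings involved. `spark.executor.resource.gpu.amount` and `spark.task.resource.gpu.amount` are standard Spark configuration keys. The helper `gpu_scheduling_conf`, the assumption of one GPU per executor, and the choice of giving each task a `1 / cores` share of that GPU are illustrative only, not settings prescribed by this guide.

```python
from decimal import Decimal, ROUND_DOWN


def gpu_scheduling_conf(executor_cores):
    """Return Spark GPU scheduling settings for one GPU per executor.

    Assumption: each task gets an equal fraction of the executor's GPU,
    so all of the executor's cores can run tasks concurrently.
    """
    if executor_cores < 1:
        raise ValueError("executor_cores must be at least 1")
    # Round down so the scheduler never sees a share slightly above 1/cores,
    # which would reduce the number of tasks allowed to share the GPU.
    share = (Decimal(1) / Decimal(executor_cores)).quantize(
        Decimal("0.0001"), rounding=ROUND_DOWN
    )
    return {
        "spark.executor.cores": str(executor_cores),
        "spark.executor.resource.gpu.amount": "1",
        "spark.task.resource.gpu.amount": str(share),
    }


for key, value in gpu_scheduling_conf(8).items():
    print(f"--conf {key}={value}")
```

Each printed line can be appended to a `spark-submit` command. Whether a fractional share per task is the right choice depends on the workload and the platform, so treat the values as a starting point.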

NVIDIA AI Enterprise includes support for running the RAPIDS Accelerator for Apache Spark on three leading Spark platforms:

Google Cloud Dataproc

Databricks (on Azure and AWS)

Amazon EMR

The key benefits of using the RAPIDS Accelerator are:

No Code Changes Required

Transparent GPU acceleration with a plugin that works on all major Apache Spark platforms, including Google Cloud Dataproc, Amazon EMR, and Databricks.

Full Stack Acceleration

Run existing Apache Spark 3.x jobs up to 5x faster than on equivalent CPU-only systems.

Enterprise Support

Mission-critical support, bug fixes, and professional services are available through NVIDIA AI Enterprise. The RAPIDS Accelerator for Apache Spark with NVIDIA AI Enterprise is licensed on a bring-your-own-license (BYOL) basis. For further details, refer to the NVIDIA AI Enterprise Packaging, Pricing, and Licensing Guide.

The rest of this document outlines the steps for getting a GPU-accelerated cluster up and running on each of the leading Spark platforms. The first section explains how to access the jar file used to accelerate the cluster. The following sections give platform-specific details on setting up the prerequisites for accessing the Spark platform, along with the steps for spinning up a cluster that runs the RAPIDS Accelerator jar file.

Tip

The following sections of this guide provide quick-start instructions for implementing the RAPIDS Accelerator on each of the supported Spark platforms.