Overview#

Apache Spark 3.3+ lets users provide a plugin that can replace the backend for SQL and DataFrame operations. This requires no API changes from the user. The plugin replaces the SQL operations it supports with GPU-accelerated versions. If an operation isn’t supported, it falls back to the Spark CPU version. The plugin can’t accelerate operations that manipulate RDDs directly.

The accelerator library also provides an implementation of Spark’s shuffle that can leverage UCX to optimize GPU data transfers. It keeps as much data on the GPU as possible and bypasses the CPU to do GPU-to-GPU transfers.

The GPU-accelerated processing plugin doesn’t require the accelerated shuffle implementation. However, if accelerated SQL processing isn’t enabled, the shuffle implementation falls back to the default SortShuffleManager.

To enable GPU processing acceleration you will need:

Apache Spark 3.3+
A Spark cluster configured with GPUs that comply with the requirements for RAPIDS.
- One GPU per executor.
The cuDF for Apache Spark plugin jar.
The config spark.plugins set to com.nvidia.spark.SQLPlugin.
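
As a rough sketch of how these requirements translate into a launch command, the snippet below assembles spark-submit arguments that ship the plugin jar and set spark.plugins. The jar path and application file are hypothetical placeholders, and passing the jar with --jars is an assumption; your deployment may put the jar on the classpath another way.

```python
import shlex

# Plugin class named in the requirements list above.
PLUGIN_CLASS = "com.nvidia.spark.SQLPlugin"


def plugin_submit_args(plugin_jar, app):
    """Return spark-submit arguments that load the plugin jar and enable it."""
    return [
        "spark-submit",
        "--jars", plugin_jar,                       # assumption: jar shipped via --jars
        "--conf", f"spark.plugins={PLUGIN_CLASS}",  # enables the SQL plugin
        app,
    ]


# Hypothetical paths, for illustration only.
args = plugin_submit_args("/path/to/plugin.jar", "my_app.py")
print(shlex.join(args))
```

The function only builds the argument list; it doesn't launch Spark, so it can be inspected or logged before use.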

cuDF for Apache Spark Compatibility Overview#

Refer to the cuDF for Apache Spark download page for information on supported platforms, hardware and software prerequisites, and release notes.

The following compatibility summary mirrors the cuDF for Apache Spark 26.04 release metadata:

OS: cuDF for Apache Spark is compatible with any Linux distribution with glibc >= 2.28 (check the output of ldd --version). glibc 2.28 was released on August 1, 2018.
NVIDIA Driver*: R525+
Runtime: Scala 2.12, 2.13
- Scala 2.12: Spark 3.3.0 through 3.5.8
- Scala 2.13: Spark 3.5.0 through 3.5.8, and Spark 4.0.0, 4.0.1, 4.0.2, and 4.1.1
Databricks 13.3 ML LTS (GPU, Scala 2.12, Spark 3.4.1)
Databricks 14.3 ML LTS (GPU, Scala 2.12, Spark 3.5.0)
Databricks 17.3 ML LTS (GPU, Scala 2.13, Spark 4.0.0)
GCP Dataproc 2.1
GCP Dataproc 2.2
GCP Dataproc 2.3
Spark runtime 1.1 LTS
Spark runtime 1.2
Spark runtime 2.0
Spark runtime 2.1
Spark runtime 2.2
The above packages are built against CUDA 12.9 or CUDA 13.1. They are tested on V100, T4, A10, A100, L4, H100 and GB100 GPUs.
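
The Scala/Spark pairs and the glibc floor above can be checked mechanically. The following is a minimal sketch, not part of the plugin: the function names are hypothetical, the "through" ranges are treated as inclusive, and it doesn't verify that every patch release inside a range actually exists.

```python
def parse_version(text):
    """Turn '3.5.8' into (3, 5, 8) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# Ranges and explicit versions copied from the compatibility summary above.
SUPPORTED_SPARK = {
    "2.12": {"ranges": [("3.3.0", "3.5.8")], "exact": []},
    "2.13": {
        "ranges": [("3.5.0", "3.5.8")],
        "exact": ["4.0.0", "4.0.1", "4.0.2", "4.1.1"],
    },
}
MIN_GLIBC = "2.28"


def spark_supported(scala, spark):
    """True if the Scala/Spark pair appears in the summary (inclusive ranges)."""
    entry = SUPPORTED_SPARK.get(scala)
    if entry is None:
        return False
    version = parse_version(spark)
    in_range = any(
        parse_version(low) <= version <= parse_version(high)
        for low, high in entry["ranges"]
    )
    return in_range or spark in entry["exact"]


def glibc_supported(version):
    """True if the glibc version meets the 2.28 minimum."""
    return parse_version(version) >= parse_version(MIN_GLIBC)


print(spark_supported("2.12", "3.4.1"), spark_supported("2.13", "3.4.1"))
print(glibc_supported("2.31"), glibc_supported("2.17"))
```

Comparing integer tuples rather than strings keeps versions such as 2.9 and 2.28 in the right order.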

Spark GPU Scheduling Overview#

Apache Spark 3.x supports GPU scheduling, provided your cluster manager supports it. Spark can request GPUs and assign them to tasks. The exact configs you use will vary depending on your cluster manager. Here are some example configs:

Request your executor to have GPUs:
- --conf spark.executor.resource.gpu.amount=1
Specify the number of GPUs per task:
- --conf spark.task.resource.gpu.amount=0.125 allows up to eight concurrent tasks per executor. The recommended value is 1/{executor core count} for the best performance.
Specify a GPU discovery script (required on YARN and K8S):
- --conf spark.executor.resource.gpu.discoveryScript=./getGpusResources.sh
Explain why operations in a query were or weren’t placed on the GPU:
- --conf spark.rapids.sql.explain=ALL displays whether each operation is placed on the GPU.
- --conf spark.rapids.sql.explain=NOT_ON_GPU displays only the parts that didn’t go on the GPU. This is the default setting.
- --conf spark.rapids.sql.explain=NONE disables the explain log output.
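
The example configs above can be gathered in one place. This sketch uses a hypothetical helper, gpu_scheduling_flags, that builds the --conf flags from the listed keys and rejects explain values other than the three shown; the defaults are taken from the examples, not from any official recommendation.

```python
# Explain modes listed above.
EXPLAIN_MODES = ("ALL", "NOT_ON_GPU", "NONE")


def gpu_scheduling_flags(executor_gpus=1, task_gpus=0.125,
                         discovery_script=None, explain="NOT_ON_GPU"):
    """Return a flat list of --conf flags for the GPU scheduling settings."""
    if explain not in EXPLAIN_MODES:
        raise ValueError(f"explain must be one of {EXPLAIN_MODES}, got {explain!r}")
    confs = {
        "spark.executor.resource.gpu.amount": executor_gpus,
        "spark.task.resource.gpu.amount": task_gpus,
        "spark.rapids.sql.explain": explain,
    }
    # The discovery script is only required on some cluster managers (YARN, K8S).
    if discovery_script is not None:
        confs["spark.executor.resource.gpu.discoveryScript"] = discovery_script
    flags = []
    for key, value in confs.items():
        flags.extend(["--conf", f"{key}={value}"])
    return flags


flags = gpu_scheduling_flags(discovery_script="./getGpusResources.sh", explain="ALL")
print(" ".join(flags))
```

Validating the explain mode up front surfaces a typo such as NOT_ON_GPUS before the job is submitted.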

Refer to the deployment-specific sections for more details and restrictions. spark.task.resource.gpu.amount can be a decimal amount, so to run multiple tasks on an executor at the same time on the same GPU, set it to a value less than 1. This setting should correspond to the spark.executor.cores setting. For instance, if spark.executor.cores=2, which allows 2 tasks to run on each executor, and you want those 2 tasks to share the same GPU, set spark.task.resource.gpu.amount=0.5. Refer to the Tuning Guide for more details on controlling the task concurrency for each executor.
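
The arithmetic behind the 1/{executor core count} recommendation can be sketched as follows. The helper names are hypothetical, the code assumes spark.task.cpus=1 (so CPU slots equal spark.executor.cores), and it illustrates the relationship only; it isn't Spark's internal scheduling logic.

```python
def task_gpu_amount(executor_cores):
    """Recommended spark.task.resource.gpu.amount: 1 / executor core count."""
    if executor_cores < 1:
        raise ValueError("executor_cores must be at least 1")
    return 1 / executor_cores


def concurrent_tasks(executor_cores, gpu_amount_per_task, gpus_per_executor=1):
    """Upper bound on tasks running at once on one executor.

    Assumes spark.task.cpus=1, so CPU slots equal spark.executor.cores.
    """
    # Round before truncating to absorb floating-point noise such as 1 / (1 / 3).
    gpu_slots = int(round(gpus_per_executor / gpu_amount_per_task, 9))
    return min(executor_cores, gpu_slots)


for cores in (2, 8):
    amount = task_gpu_amount(cores)
    print(f"spark.executor.cores={cores} -> spark.task.resource.gpu.amount={amount}")
```

Whichever limit is smaller wins: with 8 cores and an amount of 0.5, only 2 tasks can hold the GPU at once, so the remaining cores sit idle for GPU work.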

You can also refer to the official Apache Spark documentation.

Spark workload qualification#

If you plan to migrate existing Spark workloads from CPU to GPU, refer to the Spark workload qualification documentation to check whether your Spark applications are a good fit for cuDF for Apache Spark.

Spark benchmark#

To compare the performance of CPU and GPU Spark jobs, visit the spark-rapids-benchmarks repo for benchmark tests that use cuDF for Apache Spark.