- Overview
- Quickstart
- Output Details
- Benchmark Environments
- Overview
- On-prem Cluster or Local Mode
- Spark Deployment Methods
- Apache Spark Setup for GPU
- Install Spark
- Download the RAPIDS Accelerator jar
- Install the GPU Discovery Script
- Local Mode
- Spark Standalone Cluster
- Running on YARN
- Running on Kubernetes
- Configuration and Tuning
- Example Join Operation
- Enabling RAPIDS Shuffle Manager
- Advanced Configuration
- Monitoring
- Debugging
- Out of GPU Memory
- AWS EMR
- Leveraging RAPIDS Accelerator User Tools for Qualification and Bootstrap
- Qualify CPU Workloads for GPU Acceleration
- Configure and Launch AWS EMR with GPU Nodes
- Launch an EMR Cluster using AWS Console (GUI)
- Launch an EMR Cluster using AWS CLI
- Running the RAPIDS Accelerator User Tools Bootstrap for Optimal Cluster Spark Settings
- Running an Example Join Operation Using Spark Shell
- Submit Spark jobs to an EMR Cluster Accelerated by GPUs
- Running GPU Accelerated Mortgage ETL Example using EMR Notebook
- Launch an EMR Cluster using AWS Console (GUI)
- Databricks
- GCP Dataproc
- Create a Dataproc Cluster Accelerated by GPUs
- Run Python or Scala Spark Notebook on a Dataproc Cluster Accelerated by GPUs
- Submit Spark jobs to a Dataproc Cluster Accelerated by GPUs
- Diagnosing a GPU Cluster
- Bootstrap GPU Cluster with Optimized Settings
- Qualify CPU Workloads for GPU Acceleration
- Tune Applications on GPU Cluster
- Create a Dataproc Cluster Accelerated by GPUs
- Dataproc Serverless
- Azure Synapse Analytics
- Kubernetes
- Spark Workload Qualification
- Oracle Cloud Infrastructure
- Spark3 GPU Configuration Guide on Yarn 3.2.1
- Tuning Guide
- Best Practices
- Workload Qualification
- Performance Tuning
- How to handle GPU OOM issues
- Reduce the number of concurrent tasks per GPU
- Install CUDA 11.5 or a later version
- Identify the SQL, job, and stage involved in the error
- Increase the number of tasks/partitions based on the type of the problematic stage
- Reduce columnar batch size and file reader batch size
- File an issue or ask a question on the GitHub repo
- RAPIDS Accelerator for Apache Spark ML Library Integration
- RAPIDS Shuffle Manager
- Apache Iceberg Support
- Delta Lake Support
- RAPIDS Accelerator File Cache
- Frequently Asked Questions
- What versions of Apache Spark does the RAPIDS Accelerator for Apache Spark support?
- Which distributions are supported?
- What CUDA versions are supported?
- What hardware is supported?
- How can I check if the RAPIDS Accelerator is installed, and which version is running?
- What parts of Apache Spark are accelerated?
- Is the Spark Dataset API supported?
- What is the road-map like?
- How much faster will my query run?
- What operators are best suited for the GPU?
- Are there initialization costs?
- How long does it take to translate a query to run on the GPU?
- How can I tell what will run on the GPU and what won’t run on it?
- Why does the plan for the GPU query look different from the CPU query?
- Why does explain() show that the GPU will be used even after setting spark.rapids.sql.enabled to false?
- How are failures handled?
- How does the Spark scheduler decide what to do on the GPU vs the CPU?
- Is Dynamic Partition Pruning (DPP) Supported?
- Is Adaptive Query Execution (AQE) Supported?
- Why does my query show as not on the GPU when Adaptive Query Execution is enabled?
- Does the RAPIDS Shuffle Manager support External Shuffle Service (ESS)?
- Are cache and persist supported?
- Can I cache data into GPU memory?
- Is PySpark supported?
- Are the R APIs for Spark supported?
- Are the Java APIs for Spark supported?
- Are the Scala APIs for Spark supported?
- Is the GPU needed on the driver? Are there any benefits to having a GPU on the driver?
- Are table layout formats supported?
- How many tasks can I run per executor? How many should I run per executor?
- Why are multiple GPUs per executor not supported?
- Why are multiple executors per GPU not supported?
- Is Multi-Instance GPU (MIG) supported?
- How can I run custom expressions/UDFs on the GPU?
- Why is the size of my output Parquet/ORC file different?
- Why am I getting the error Failed to open the timezone file?
- Why am I getting an error when trying to use pinned memory?
- Why am I getting a buffer overflow error when using the KryoSerializer?
- Why am I getting “Unable to acquire buffer” or “Trying to free an invalid buffer”?
- Is speculative execution supported?
- Why is my query in GPU mode slower than CPU mode?
- Why is the Avro library not found by RAPIDS?
- What is the default RMM pool allocator?
- What are RetryOOM and SplitAndRetryOOM exceptions?
- Encryption Support
- Can the RAPIDS Accelerator work with Spark on Ray (RayDP)?
- Why can I not allocate the entire GPU memory on the Grace Hopper architecture?
- I have more questions, where do I go?
- Qualification Tool - Jar Usage
- Profiling Tool - Jar Usage
- Examples
- Glossary
- Contact Us