RAPIDS Accelerator for Apache Spark Configuration#
The following is the list of options that rapids-plugin-4-spark supports.
On startup use: --conf [conf key]=[conf value]. For example:
${SPARK_HOME}/bin/spark-shell --jars rapids-4-spark_2.12-23.02.0-cuda11.jar \
--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--conf spark.rapids.sql.concurrentGpuTasks=2
At runtime use: spark.conf.set("[conf key]", [conf value]). For example:
scala> spark.conf.set("spark.rapids.sql.concurrentGpuTasks", 2)
All configs can be set on startup, but some, especially those for shuffle, will not take effect if they are set at runtime. Check the "Applicable at" column to see when each config can be set: "Startup" means the config is only valid at startup, while "Runtime" means it is valid at both startup and runtime.
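For example, a "Startup" config such as spark.rapids.memory.pinnedPool.size must be supplied when the application is launched, while a "Runtime" config such as spark.rapids.sql.concurrentGpuTasks can also be changed later in the session. The following sketch combines both; the values are illustrative only:
${SPARK_HOME}/bin/spark-shell --jars rapids-4-spark_2.12-23.02.0-cuda11.jar \
--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--conf spark.rapids.memory.pinnedPool.size=1073741824 \
--conf spark.rapids.sql.concurrentGpuTasks=2
scala> spark.conf.set("spark.rapids.sql.concurrentGpuTasks", 4)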
General Configuration#
Name |
Description |
Default Value |
Applicable at |
---|---|---|---|
spark.rapids.alluxio.automount.enabled |
Enable the feature of auto mounting the cloud storage to Alluxio. It requires the Alluxio master to be on the same node as the Spark driver node. When it is true, an environment variable ALLUXIO_HOME must be set properly. The default value of ALLUXIO_HOME is "/opt/alluxio-2.8.0". You can set it as an environment variable when running spark-submit, or you can use spark.yarn.appMasterEnv.ALLUXIO_HOME to set it on Yarn. The Alluxio master's host and port will be read from alluxio.master.hostname and alluxio.master.rpc.port (default: 19998) in ALLUXIO_HOME/conf/alluxio-site.properties. A cloud path that matches spark.rapids.alluxio.bucket.regex, such as "s3://bar/b.csv", is then replaced with "alluxio://0.1.2.3:19998/bar/b.csv", and the bucket "s3://bar" will be mounted to "/bar" in Alluxio automatically. |
false |
Runtime |
spark.rapids.alluxio.bucket.regex |
A regex to decide which buckets should be auto-mounted to Alluxio. E.g. when set to "^s3://bucket.*", buckets that start with "s3://bucket" will be mounted to Alluxio and a path like "s3://bucket-foo/a.csv" will be replaced with "alluxio://0.1.2.3:19998/bucket-foo/a.csv". It is only valid when spark.rapids.alluxio.automount.enabled=true. The default value matches all buckets in the "s3://" or "s3a://" scheme. |
^s3a{0,1}://.* |
Runtime |
spark.rapids.alluxio.large.file.threshold |
The threshold used to identify whether the average size of files is large when reading from S3. If reading large files from S3 and the disks used by Alluxio are slow, reading directly from S3 is better than reading caches from Alluxio, because S3 network bandwidth is faster than the local disk. This improvement takes effect when spark.rapids.alluxio.slow.disk is enabled. |
67108864 |
Runtime |
spark.rapids.alluxio.pathsToReplace |
List of paths to be replaced with the corresponding Alluxio scheme. E.g. when the config is set to "s3://foo->alluxio://0.1.2.3:19998/foo,gs://bar->alluxio://0.1.2.3:19998/bar", it means "s3://foo/a.csv" will be replaced with "alluxio://0.1.2.3:19998/foo/a.csv" and "gs://bar/b.csv" will be replaced with "alluxio://0.1.2.3:19998/bar/b.csv". To use this config you have to mount the buckets to Alluxio yourself. If you set this config, spark.rapids.alluxio.automount.enabled won't be valid. |
None |
Startup |
spark.rapids.alluxio.replacement.algo |
The algorithm used when replacing the UFS path with the Alluxio path. CONVERT_TIME and TASK_TIME are the valid options. CONVERT_TIME indicates that the replacement is done when the plan is converted to a GPU file read; this has the extra overhead of creating an entirely new file index, which requires listing the files and getting all new file info from Alluxio. TASK_TIME replaces the path as late as possible, inside the task. By waiting and replacing it at task time, it just replaces the path without fetching the file information again; this is faster but doesn't update locality information, which may have a big impact on performance. |
TASK_TIME |
Runtime |
spark.rapids.alluxio.slow.disk |
Indicates whether the disks used by Alluxio are slow. If true and reading large S3 files, the RAPIDS Accelerator reads from S3 directly instead of from Alluxio caches. Refer to spark.rapids.alluxio.large.file.threshold, which defines the threshold for identifying whether files are large. Typically disks are considered slow if their speed is less than 300M/second. If using the convert-time spark.rapids.alluxio.replacement.algo, this may not apply to all file types, such as Delta files. |
true |
Runtime |
spark.rapids.alluxio.user |
The Alluxio user set on the Alluxio client, which is used to mount buckets or get information. By default it should be the user that runs the Alluxio processes. The default value is ubuntu. |
ubuntu |
Runtime |
spark.rapids.cloudSchemes |
Comma separated list of additional URI schemes that are to be considered cloud based filesystems. Schemes already included: abfs, abfss, dbfs, gs, s3, s3a, s3n, wasbs. Cloud based stores generally would be totally separate from the executors and likely have a higher I/O read cost. Many times the cloud filesystems also get better throughput when you have multiple readers in parallel. This is used with spark.rapids.sql.format.parquet.reader.type. |
None |
Runtime |
spark.rapids.gpu.resourceName |
The name of the Spark resource that represents a GPU that you want the plugin to use if using custom resources with Spark. |
gpu |
Startup |
spark.rapids.memory.gpu.allocFraction |
The fraction of available (free) GPU memory that should be allocated for pooled memory. This must be less than or equal to the maximum limit configured via spark.rapids.memory.gpu.maxAllocFraction, and greater than or equal to the minimum limit configured via spark.rapids.memory.gpu.minAllocFraction. |
1.0 |
Startup |
spark.rapids.memory.gpu.debug |
Provides a log of GPU memory allocations and frees. If set to STDOUT or STDERR the logging will go there. Setting it to NONE disables logging. All other values are reserved for possible future expansion and in the meantime will disable logging. |
NONE |
Startup |
spark.rapids.memory.gpu.direct.storage.spill.batchWriteBuffer.size |
The size of the GPU memory buffer used to batch small buffers when spilling to GDS. Note that this buffer is mapped to the PCI Base Address Register (BAR) space, which may be very limited on some GPUs (e.g. the NVIDIA T4 only has 256 MiB), and it is also used by UCX bounce buffers. |
8388608 |
Startup |
spark.rapids.memory.gpu.direct.storage.spill.enabled |
Should GPUDirect Storage (GDS) be used to spill GPU memory buffers directly to disk. GDS must be enabled and the directory spark.local.dir must support GDS. |
false |
Startup |
spark.rapids.memory.gpu.maxAllocFraction |
The fraction of total GPU memory that limits the maximum size of the RMM pool. The value must be greater than or equal to the setting for spark.rapids.memory.gpu.allocFraction. Note that this limit will be reduced by the reserve memory configured in spark.rapids.memory.gpu.reserve. |
1.0 |
Startup |
spark.rapids.memory.gpu.minAllocFraction |
The fraction of total GPU memory that limits the minimum size of the RMM pool. The value must be less than or equal to the setting for spark.rapids.memory.gpu.allocFraction. |
0.25 |
Startup |
spark.rapids.memory.gpu.oomDumpDir |
The path to a local directory where a heap dump will be created if the GPU encounters an unrecoverable out-of-memory (OOM) error. The filename will be of the form "gpu-oom-<pid>-<dumpId>.hprof", where <pid> is the process ID and <dumpId> is a sequence number to disambiguate multiple heap dumps per process lifecycle. |
None |
Startup |
spark.rapids.memory.gpu.pool |
Select the RMM pooling allocator to use. Valid values are "DEFAULT", "ARENA", "ASYNC", and "NONE". With "DEFAULT", the RMM pool allocator is used; with "ARENA", the RMM arena allocator is used; with "ASYNC", the new CUDA stream-ordered memory allocator in CUDA 11.2+ is used. If set to "NONE", pooling is disabled and RMM just passes through to CUDA memory allocation directly. |
ASYNC |
Startup |
spark.rapids.memory.gpu.pooling.enabled |
Should RMM act as a pooling allocator for GPU memory, or should it just pass through to CUDA memory allocation directly. DEPRECATED: please use spark.rapids.memory.gpu.pool instead. |
true |
Startup |
spark.rapids.memory.gpu.reserve |
The amount of GPU memory that should remain unallocated by RMM and left for system use, such as memory needed for kernels and kernel launches. |
671088640 |
Startup |
spark.rapids.memory.gpu.unspill.enabled |
When a spilled GPU buffer is needed again, should it be unspilled, or only copied back into GPU memory temporarily. Unspilling may be useful for GPU buffers that are needed frequently, for example broadcast variables; however, it may also increase GPU memory usage. |
false |
Startup |
spark.rapids.memory.host.pageablePool.size |
The size of the pageable memory pool in bytes unless otherwise specified. Use 0 to disable the pool. |
1073741824 |
Startup |
spark.rapids.memory.host.spillStorageSize |
Amount of off-heap host memory to use for buffering spilled GPU data before spilling to local disk. Use -1 to set the amount to the combined size of the pinned and pageable memory pools. |
-1 |
Startup |
spark.rapids.memory.pinnedPool.size |
The size of the pinned memory pool in bytes unless otherwise specified. Use 0 to disable the pool. |
0 |
Startup |
spark.rapids.python.concurrentPythonWorkers |
Set the number of Python worker processes that can execute concurrently per GPU. Python worker processes may temporarily block when the number of concurrent Python worker processes started by the same executor exceeds this amount. Allowing too many concurrent tasks on the same GPU may lead to GPU out of memory errors. >0 means enabled, while <=0 means unlimited. |
0 |
Runtime |
spark.rapids.python.memory.gpu.allocFraction |
The fraction of total GPU memory that should be initially allocated for pooled memory for all the Python workers. It is supposed to be less than (1 - $(spark.rapids.memory.gpu.allocFraction)), since the executor will share the GPU with its owning Python workers. Half of the rest will be used if not specified. |
None |
Runtime |
spark.rapids.python.memory.gpu.maxAllocFraction |
The fraction of total GPU memory that limits the maximum size of the RMM pool for all the Python workers. It is supposed to be less than (1 - $(spark.rapids.memory.gpu.maxAllocFraction)), since the executor will share the GPU with its owning Python workers. When set to 0 it means no limit. |
0.0 |
Runtime |
spark.rapids.python.memory.gpu.pooling.enabled |
Should RMM in Python workers act as a pooling allocator for GPU memory, or should it just pass through to CUDA memory allocation directly. When not specified, it will honor the value of the config spark.rapids.memory.gpu.pooling.enabled. |
None |
Runtime |
spark.rapids.shuffle.enabled |
Enable or disable the RAPIDS Shuffle Manager at runtime. The RAPIDS Shuffle Manager must already be configured. When set to false, the built-in Spark shuffle manager will be used. |
true |
Runtime |
spark.rapids.shuffle.mode |
RAPIDS Shuffle Manager mode. "MULTITHREADED": shuffle file writes and reads are parallelized using a thread pool. "UCX": (requires UCX installation) uses accelerated transports for transferring shuffle blocks. "CACHE_ONLY": use when running a single executor, for short-circuit cached shuffle (for testing purposes). |
MULTITHREADED |
Startup |
spark.rapids.shuffle.multiThreaded.maxBytesInFlight |
The size limit, in bytes, that the RAPIDS shuffle manager configured in "MULTITHREADED" mode will allow to be deserialized concurrently per task. This is also the maximum amount of memory that will be used per task. This should ideally be at least the same size as the batch size so we don't have to wait to process a single batch. |
2147483647 |
Startup |
spark.rapids.shuffle.multiThreaded.reader.threads |
The number of threads to use for reading shuffle blocks per executor in the RAPIDS shuffle manager configured in "MULTITHREADED" mode. There are two special values: 0 = feature is disabled, falls back to the Spark built-in shuffle reader; 1 = our implementation of Spark's built-in shuffle reader with extra metrics. |
20 |
Startup |
spark.rapids.shuffle.multiThreaded.writer.threads |
The number of threads to use for writing shuffle blocks per executor in the RAPIDS shuffle manager configured in "MULTITHREADED" mode. There are two special values: 0 = feature is disabled, falls back to the Spark built-in shuffle writer; 1 = our implementation of Spark's built-in shuffle writer with extra metrics. |
20 |
Startup |
spark.rapids.shuffle.transport.earlyStart |
Enable early connection establishment for the RAPIDS Shuffle |
true |
Startup |
spark.rapids.shuffle.transport.earlyStart.heartbeatInterval |
Shuffle early start heartbeat interval (milliseconds). Executors will send a heartbeat RPC message to the driver at this interval. |
5000 |
Startup |
spark.rapids.shuffle.transport.earlyStart.heartbeatTimeout |
Shuffle early start heartbeat timeout (milliseconds). Executors that don't heartbeat within this timeout will be considered stale. This timeout must be higher than the value for spark.rapids.shuffle.transport.earlyStart.heartbeatInterval. |
10000 |
Startup |
spark.rapids.shuffle.transport.maxReceiveInflightBytes |
Maximum aggregate amount of bytes that can be fetched at any given time from peers during shuffle. |
1073741824 |
Startup |
spark.rapids.shuffle.ucx.activeMessages.forceRndv |
Set to true to force 'rndv' mode for all UCX Active Messages. This should only be required with UCX 1.10.x. UCX 1.11.x deployments should set to false. |
false |
Startup |
spark.rapids.shuffle.ucx.managementServerHost |
The host to be used to start the management server |
null |
Startup |
spark.rapids.shuffle.ucx.useWakeup |
When set to true, use UCX's event-based progress (epoll) in order to wake up the progress thread when needed, instead of a hot loop. |
true |
Startup |
spark.rapids.sql.batchSizeBytes |
Set the target number of bytes for a GPU batch. Split sizes for input data are covered by separate configs. The maximum setting is 2 GB to avoid exceeding the cudf row count limit of a column. |
2147483647 |
Runtime |
spark.rapids.sql.castDecimalToFloat.enabled |
Casting from decimal to floating point types on the GPU returns results that have a tiny difference compared to results returned from the CPU. |
true |
Runtime |
spark.rapids.sql.castDecimalToString.enabled |
When set to true, casting from decimal to string is supported on the GPU. The GPU does NOT produce the exact same string as Spark produces, but strings which are semantically equal. For instance, given the input BigDecimal(123, -2), the GPU produces "12300", where Spark produces "1.23E+4". |
false |
Runtime |
spark.rapids.sql.castFloatToDecimal.enabled |
Casting from floating point types to decimal on the GPU returns results that have a tiny difference compared to results returned from the CPU. |
true |
Runtime |
spark.rapids.sql.castFloatToIntegralTypes.enabled |
Casting from floating point types to integral types on the GPU supports a slightly different range of values when using Spark 3.1.0 or later. Refer to the CAST documentation for more details. |
true |
Runtime |
spark.rapids.sql.castFloatToString.enabled |
Casting from floating point types to string on the GPU returns results that have a different precision than the default results of Spark. |
true |
Runtime |
spark.rapids.sql.castStringToFloat.enabled |
When set to true, enables casting from strings to float types (float, double) on the GPU. Currently hex values aren't supported on the GPU. Also note that casting from string to float types on the GPU returns incorrect results when the string represents any number "1.7976931348623158E308" <= x < "1.7976931348623159E308" and "-1.7976931348623158E308" >= x > "-1.7976931348623159E308"; in both these cases the GPU returns Double.MaxValue while the CPU returns "+Infinity" and "-Infinity" respectively. |
true |
Runtime |
spark.rapids.sql.castStringToTimestamp.enabled |
When set to true, casting from string to timestamp is supported on the GPU. The GPU only supports a subset of formats when casting strings to timestamps. Refer to the CAST documentation for more details. |
false |
Runtime |
spark.rapids.sql.coalescing.reader.numFilterParallel |
This controls the number of files the coalescing reader will run in each thread when it filters blocks for reading. If this value is greater than zero the files will be filtered in a multithreaded manner where each thread filters the number of files set by this config. If this is set to zero the files are filtered serially. This uses the same thread pool as the multithreaded reader, see spark.rapids.sql.multiThreadedRead.numThreads. Note that multithreaded filtering is useful with Alluxio. |
0 |
Runtime |
spark.rapids.sql.concurrentGpuTasks |
Set the number of tasks that can execute concurrently per GPU. Tasks may temporarily block when the number of concurrent tasks in the executor exceeds this amount. Allowing too many concurrent tasks on the same GPU may lead to GPU out of memory errors. |
1 |
Runtime |
spark.rapids.sql.concurrentWriterPartitionFlushSize |
The flush size of the concurrent writer cache in bytes for each partition. If spark.sql.maxConcurrentOutputFileWriters is specified, the concurrent writer is used to write data. The concurrent writer first caches data for each partition and begins to flush the data once it finds one partition with a size that is greater than or equal to this config. The default value is 0, which will try to select a size based off of file-type-specific configs. |
0 |
Runtime |
spark.rapids.sql.csv.read.decimal.enabled |
CSV reading is not 100% compatible when reading decimals. |
false |
Runtime |
spark.rapids.sql.csv.read.double.enabled |
CSV reading is not 100% compatible when reading doubles. |
true |
Runtime |
spark.rapids.sql.csv.read.float.enabled |
CSV reading is not 100% compatible when reading floats. |
true |
Runtime |
spark.rapids.sql.decimalOverflowGuarantees |
FOR TESTING ONLY. DO NOT USE IN PRODUCTION. Please see the decimal section of the compatibility documents for more information on this config. |
true |
Runtime |
spark.rapids.sql.detectDeltaCheckpointQueries |
Queries against Delta Lake _delta_log checkpoint Parquet files are not efficient on the GPU. When this option is enabled, the plugin will attempt to detect these queries and fall back to the CPU. |
true |
Runtime |
spark.rapids.sql.detectDeltaLogQueries |
Queries against Delta Lake _delta_log JSON files are not efficient on the GPU. When this option is enabled, the plugin will attempt to detect these queries and fall back to the CPU. |
true |
Runtime |
spark.rapids.sql.enabled |
Enable (true) or disable (false) SQL operations on the GPU |
true |
Runtime |
spark.rapids.sql.explain |
Explain why parts of a query were or were not placed on the GPU. Possible values are ALL: print everything, NONE: print nothing, NOT_ON_GPU: print only the parts of a query that did not go on the GPU |
NOT_ON_GPU |
Runtime |
spark.rapids.sql.fast.sample |
Option to turn on fast sample. If enabled, results are inconsistent with the CPU sample because the GPU sample algorithm is inconsistent with the CPU's. |
false |
Runtime |
spark.ra pids.sql.forma t.avro.enabled |
When set to true enables all avro input and output acceleration. (only input is currently supported anyways) |
false |
Runtime |
spark.rapi ds.sql.format. avro.multiThre adedRead.maxNu mFilesParallel |
A limit on the maximum number of files per task processed in parallel on the CPU side before the file is sent to the GPU. This affects the amount of host memory used when reading the files in parallel. Used with MULTITHREADED reader, see spark.rapids. sql.format.avr o.reader.type. |
2147483647 |
Runtime |
s park.rapids.sq l.format.avro. multiThreadedR ead.numThreads |
The maximum number of threads, on one executor, to use for reading small Avro files in parallel. This can not be changed at runtime after the executor has started. Used with MULTITHREADED reader, see spark.rapids. sql.format.avr o.reader.type. DEPRECATED: use spa rk.rapids.sql. multiThreadedR ead.numThreads |
None |
Startup |
spark.rapids. sql.format.avr o.read.enabled |
When set to true enables avro input acceleration |
false |
Runtime |
spark.rapids .sql.format.av ro.reader.type |
Sets the Avro reader type. We support different types that are optimized for different environments. The original Spark style reader can be selected by setting this to PERFILE which individually reads and copies files to the GPU. Loading many small files individually has high overhead, and using either COALESCING or MULTITHREADED is recommended instead. The COALESCING reader is good when using a local file system where the executors are on the same nodes or close to the nodes the data is being read on. This reader coalesces all the files assigned to a task into a single host buffer before sending it down to the GPU. It copies blocks from a single file into a host buffer in separate threads in parallel, see spar k.rapids.sql.m ultiThreadedRe ad.numThreads. MULTITHREADED is good for cloud environments where you are reading from a blobstore that is totally separate and likely has a higher I/O read cost. Many times the cloud environments also get better throughput when you have multiple readers in parallel. This reader uses multiple threads to read each file in parallel and each file is sent to the GPU separately. This allows the CPU to keep reading while GPU is also doing work. See spa rk.rapids.sql. multiThreadedR ead.numThreads and spark.rapi ds.sql.format. avro.multiThre adedRead.maxNu mFilesParallel to control the number of threads and amount of memory used. By default this is set to AUTO so we select the reader we think is best. This will either be the COALESCING or the MULTITHREADED based on whether we think the file is in the cloud. See spark.rapids .cloudSchemes. |
AUTO |
Runtime |
spark.r apids.sql.form at.csv.enabled |
When set to false disables all csv input and output acceleration. (only input is currently supported anyways) |
true |
Runtime |
spark.rapids .sql.format.cs v.read.enabled |
When set to false disables csv input acceleration |
true |
Runtime |
s park.rapids.sq l.format.delta .write.enabled |
When set to false disables Delta Lake output acceleration. |
true |
Runtime |
spark.rapids. sql.format.hiv e.text.enabled |
When set to false disables Hive text table acceleration |
true |
Runtime |
spark.rapids .sql.format.hi ve.text.read.d ecimal.enabled |
Hive text file reading is not 100% compatible when reading decimals. Hive has more limitations on what is valid compared to the GPU implementation in some corner cases. See https:// github.com/NVI DIA/spark-rapi ds/issues/7246 |
true |
Runtime |
spark.rapid s.sql.format.h ive.text.read. double.enabled |
Hive text file reading is not 100% compatible when reading doubles. |
true |
Runtime |
spar k.rapids.sql.f ormat.hive.tex t.read.enabled |
When set to false disables Hive text table read acceleration |
true |
Runtime |
spark.rapi ds.sql.format. hive.text.read .float.enabled |
Hive text file reading is not 100% compatible when reading floats. |
true |
Runtime |
spark .rapids.sql.fo rmat.hive.text .write.enabled |
When set to false disables Hive text table write acceleration |
false |
Runtime |
spark.rapid s.sql.format.i ceberg.enabled |
When set to false disables all Iceberg acceleration |
true |
Runtime |
sp ark.rapids.sql .format.iceber g.read.enabled |
When set to false disables Iceberg input acceleration |
true |
Runtime |
spark.ra pids.sql.forma t.json.enabled |
When set to true enables all json input and output acceleration. (only input is currently supported anyways) |
false |
Runtime |
spark.rapids. sql.format.jso n.read.enabled |
When set to true enables json input acceleration |
false |
Runtime |
spark.r apids.sql.form at.orc.enabled |
When set to false disables all orc input and output acceleration |
true |
Runtime |
spark.rapids.sql.format.orc.floatTypesToString.enable |
When reading an ORC file, the source data schemas (schemas of the ORC file) may differ from the target schemas (schemas of the reader), so we need to handle the castings from source type to target type. Since float/double numbers on the GPU have different precision from the CPU, when casting float/double to string the result of the GPU is different from the result of CPU Spark. Its default value is true. |
true |
Runtime |
spark.rap ids.sql.format .orc.multiThre adedRead.maxNu mFilesParallel |
A limit on the maximum number of files per task processed in parallel on the CPU side before the file is sent to the GPU. This affects the amount of host memory used when reading the files in parallel. Used with MULTITHREADED reader, see spark.rapids .sql.format.or c.reader.type. |
2147483647 |
Runtime |
spark.rapids.s ql.format.orc. multiThreadedR ead.numThreads |
The maximum number of threads, on the executor, to use for reading small ORC files in parallel. This can not be changed at runtime after the executor has started. Used with MULTITHREADED reader, see spark.rapids .sql.format.or c.reader.type. DEPRECATED: use spa rk.rapids.sql. multiThreadedR ead.numThreads |
None |
Startup |
spark.rapids .sql.format.or c.read.enabled |
When set to false disables orc input acceleration |
true |
Runtime |
spark.rapid s.sql.format.o rc.reader.type |
Sets the ORC reader type. We support different types that are optimized for different environments. The original Spark style reader can be selected by setting this to PERFILE which individually reads and copies files to the GPU. Loading many small files individually has high overhead, and using either COALESCING or MULTITHREADED is recommended instead. The COALESCING reader is good when using a local file system where the executors are on the same nodes or close to the nodes the data is being read on. This reader coalesces all the files assigned to a task into a single host buffer before sending it down to the GPU. It copies blocks from a single file into a host buffer in separate threads in parallel, see spar k.rapids.sql.m ultiThreadedRe ad.numThreads. MULTITHREADED is good for cloud environments where you are reading from a blobstore that is totally separate and likely has a higher I/O read cost. Many times the cloud environments also get better throughput when you have multiple readers in parallel. This reader uses multiple threads to read each file in parallel and each file is sent to the GPU separately. This allows the CPU to keep reading while GPU is also doing work. See spa rk.rapids.sql. multiThreadedR ead.numThreads and spark.rap ids.sql.format .orc.multiThre adedRead.maxNu mFilesParallel to control the number of threads and amount of memory used. By default this is set to AUTO so we select the reader we think is best. This will either be the COALESCING or the MULTITHREADED based on whether we think the file is in the cloud. See spark.rapids .cloudSchemes. |
AUTO |
Runtime |
spark.rapids. sql.format.orc .write.enabled |
When set to false disables orc output acceleration |
true |
Runtime |
spark.rapid s.sql.format.p arquet.enabled |
When set to false disables all parquet input and output acceleration |
true |
Runtime |
spark.rapids. sql.format.par quet.multiThre adedRead.maxNu mFilesParallel |
A limit on the maximum number of files per task processed in parallel on the CPU side before the file is sent to the GPU. This affects the amount of host memory used when reading the files in parallel. Used with MULTITHREADED reader, see sp ark.rapids.sql .format.parque t.reader.type. |
2147483647 |
Runtime |
spar k.rapids.sql.f ormat.parquet. multiThreadedR ead.numThreads |
The maximum number of threads, on the executor, to use for reading small Parquet files in parallel. This can not be changed at runtime after the executor has started. Used with COALESCING and MULTITHREADED reader, see sp ark.rapids.sql .format.parque t.reader.type. DEPRECATED: use spa rk.rapids.sql. multiThreadedR ead.numThreads |
None |
Startup |
spark.r apids.sql.form at.parquet.mul tithreaded.com bine.sizeBytes |
The target size in bytes to combine multiple small files together when using the MULTITHREADED parquet reader. With combine disabled, the MULTITHREADED reader reads the files in parallel and sends individual files down to the GPU, but that can be inefficient for small files. When combine is enabled, files that are ready within spark. rapids.sql.for mat.parquet.mu ltithreaded.co mbine.waitTime together, up to this threshold size, are combined before sending down to GPU. This can be disabled by setting it to 0. Note that combine also will not go over the spark.rap ids.sql.reader .batchSizeRows or spark.rapi ds.sql.reader. batchSizeBytes limits. |
67108864 |
Runtime |
spark. rapids.sql.for mat.parquet.mu ltithreaded.co mbine.waitTime |
When using the multithreaded parquet reader with combine mode, how long to wait, in milliseconds, for more files to finish if haven’t met the size threshold. Note that this will wait this amount of time from when the last file was available, so total wait time could be larger then this. |
200 |
Runtime |
spar k.rapids.sql.f ormat.parquet. multithreaded. read.keepOrder |
When using the MULTITHREADED reader, if this is set to true we read the files in the same order Spark does, otherwise the order may not be the same. |
true |
Runtime |
sp ark.rapids.sql .format.parque t.read.enabled |
When set to false disables parquet input acceleration |
true |
Runtime |
spark.ra pids.sql.forma t.parquet.read er.footer.type |
In some cases reading the footer of the file is very expensive. Typically this happens when there are a large number of columns and relatively few of them are being read on a large number of files. This provides the ability to use a different path to parse and filter the footer. AUTO is the default and decides which path to take using a heuristic. JAVA follows closely with what Apache Spark does. NATIVE will parse and filter the footer using C++. |
AUTO |
Runtime |
s park.rapids.sq l.format.parqu et.reader.type |
Sets the Parquet reader type. We support different types that are optimized for different environments. The original Spark style reader can be selected by setting this to PERFILE which individually reads and copies files to the GPU. Loading many small files individually has high overhead, and using either COALESCING or MULTITHREADED is recommended instead. The COALESCING reader is good when using a local file system where the executors are on the same nodes or close to the nodes the data is being read on. This reader coalesces all the files assigned to a task into a single host buffer before sending it down to the GPU. It copies blocks from a single file into a host buffer in separate threads in parallel, see spar k.rapids.sql.m ultiThreadedRe ad.numThreads. MULTITHREADED is good for cloud environments where you are reading from a blobstore that is totally separate and likely has a higher I/O read cost. Many times the cloud environments also get better throughput when you have multiple readers in parallel. This reader uses multiple threads to read each file in parallel and each file is sent to the GPU separately. This allows the CPU to keep reading while GPU is also doing work. See spa rk.rapids.sql. multiThreadedR ead.numThreads and spark.rapids. sql.format.par quet.multiThre adedRead.maxNu mFilesParallel to control the number of threads and amount of memory used. By default this is set to AUTO so we select the reader we think is best. This will either be the COALESCING or the MULTITHREADED based on whether we think the file is in the cloud. See spark.rapids .cloudSchemes. |
AUTO |
Runtime |
spa rk.rapids.sql. format.parquet .write.enabled |
When set to false disables parquet output acceleration |
true |
Runtime |
spark.rapi ds.sql.format. parquet.writer .int96.enabled |
When set to false, disables accelerated parquet write if the spark.sql .parquet.outpu tTimestampType is set to INT96 |
true |
Runtime |
spark.rapi ds.sql.hasExte ndedYearValues |
Spark 3.2.0+ extended parsing of years in dates and timestamps to support the full range of possible values. Prior to this it was limited to a positive 4 digit year. The Accelerator does not support the extended range yet. This config indicates if your data includes this extended range or not, or if you don’t care about getting the correct values on values with the extended range. |
true |
Runtime |
spark.rapids. sql.hashOptimi zeSort.enabled |
Whether sorts should be inserted after some hashed operations to improve output ordering. This can improve output file sizes when saving to columnar formats. |
false |
Runtime |
spark.rapids. sql.improvedFl oatOps.enabled |
For some floating point operations spark uses one way to compute the value and the underlying cudf implementation can use an improved algorithm. In some cases this can result in cudf producing an answer when spark overflows. |
true |
Runtime |
spark.rapids .sql.improvedT imeOps.enabled |
When set to true, some operators will avoid overflowing by converting epoch days directly to seconds without first converting to microseconds |
false |
Runtime |
spark. rapids.sql.inc ompatibleDateF ormats.enabled |
When parsing strings as dates and timestamps in functions like u nix_timestamp, some formats are fully supported on the GPU and some are unsupported and will fall back to the CPU. Some formats behave differently on the GPU than the CPU. Spark on the CPU interprets date formats with unsupported trailing characters as nulls, while Spark on the GPU will parse the date with invalid trailing characters. More detail can be found at parsing strings as dates or times tamps. |
false |
Runtime |
spark.rapids .sql.incompati bleOps.enabled |
For operations that work, but are not 100% compatible with the Spark equivalent set if they should be enabled by default or disabled by default. |
true |
Runtime |
spark.r apids.sql.join .cross.enabled |
When set to true cross joins are enabled on the GPU |
true |
Runtime |
spark.rapid s.sql.join.exi stence.enabled |
When set to true existence joins are enabled on the GPU |
true |
Runtime |
spark.rapid s.sql.join.ful lOuter.enabled |
When set to true full outer joins are enabled on the GPU |
true |
Runtime |
spark.r apids.sql.join .inner.enabled |
When set to true inner joins are enabled on the GPU |
true |
Runtime |
spark.rapi ds.sql.join.le ftAnti.enabled |
When set to true left anti joins are enabled on the GPU |
true |
Runtime |
spark.rapid s.sql.join.lef tOuter.enabled |
When set to true left outer joins are enabled on the GPU |
true |
Runtime |
spark.rapi ds.sql.join.le ftSemi.enabled |
When set to true left semi joins are enabled on the GPU |
true |
Runtime |
spark.rapids .sql.join.righ tOuter.enabled |
When set to true right outer joins are enabled on the GPU |
true |
Runtime |
spark.rapids.s ql.json.read.d ecimal.enabled |
JSON reading is not 100% compatible when reading decimals. |
false |
Runtime |
spark.rapids. sql.json.read. double.enabled |
JSON reading is not 100% compatible when reading doubles. |
true |
Runtime |
spark.rapids .sql.json.read .float.enabled |
JSON reading is not 100% compatible when reading floats. |
true |
Runtime |
spark.rapids.sql.metrics.level |
GPU plans can produce a lot more metrics than CPU plans do. In very large queries this can sometimes result in going over the max result size limit for the driver. Supported values include DEBUG, which will enable all metrics supported and typically only needs to be enabled when debugging the plugin; MODERATE, which should output enough metrics to understand how long each part of the query is taking and how much data is going to each part of the query; and ESSENTIAL, which disables most metrics except those Apache Spark CPU plans will also report or their equivalents. |
MODERATE |
Runtime |
spark.rapids.sql.mode |
Set the mode for the RAPIDS Accelerator. The supported modes are explainOnly and executeOnGPU. This config can not be changed at runtime; you must restart the application for it to take effect. The default mode is executeOnGPU, which means the RAPIDS Accelerator plugin converts the Spark operations and executes them on the GPU when possible. The explainOnly mode allows running queries on the CPU while the RAPIDS Accelerator evaluates the queries as if it were going to run on the GPU. The explanations of what would have run on the GPU and why are output in log messages. When using explainOnly mode, the default explain output is ALL; this can be changed by setting spark.rapids.sql.explain. See that config for more details. |
executeongpu |
Startup |
spark.rapids.sql.multiThreadedRead.numThreads |
The maximum number of threads on each executor to use for reading small files in parallel. This can not be changed at runtime after the executor has started. Used with the COALESCING and MULTITHREADED readers, see spark.rapids.sql.format.parquet.reader.type, spark.rapids.sql.format.orc.reader.type, or spark.rapids.sql.format.avro.reader.type for a discussion of reader types. If it is not set explicitly and spark.executor.cores is set, the value is derived from spark.executor.cores. |
20 |
Startup |
spark.rapids.sql.python.gpu.enabled |
This is an experimental feature and is likely to change in the future. Enable (true) or disable (false) support for scheduling Python Pandas UDFs with GPU resources. When enabled, pandas UDFs are assumed to share the same GPU that the RAPIDS Accelerator uses and will honor the Python GPU configs. |
false |
Runtime |
spark.rapids.sql.reader.batchSizeBytes |
Soft limit on the maximum number of bytes the reader reads per batch. The readers will read chunks of data until this limit is met or exceeded. Note that the reader may estimate the number of bytes that will be used on the GPU in some cases based on the schema and number of rows in each batch. |
2147483647 |
Runtime |
spark.rapids.sql.reader.batchSizeRows |
Soft limit on the maximum number of rows the reader will read per batch. The ORC and Parquet readers will read row groups until this limit is met or exceeded. The limit is respected by the CSV reader. |
2147483647 |
Runtime |
spark.rapids.sql.reader.chunked |
Enable a chunked reader where possible. A chunked reader allows reading highly compressed data that could not be read otherwise, but at the expense of more GPU memory, and in some cases more GPU computation. |
true |
Runtime |
spark.rapids.sql.regexp.enabled |
Specifies whether supported regular expressions will be evaluated on the GPU. Unsupported expressions will fall back to the CPU. However, there are some known edge cases that will still execute on the GPU and produce incorrect results, and these are documented in the compatibility guide. Setting this config to false will make all regular expressions run on the CPU instead. |
true |
Runtime |
spark.rapids.sql.regexp.maxStateMemoryBytes |
Specifies the maximum memory on the GPU to be used for regular expressions. The memory usage is an estimate based on an upper-bound approximation of the complexity of the regular expression. Note that the actual memory usage may still be higher than this estimate depending on the number of rows in the data column and the input strings themselves. It is recommended not to set this to more than 3 times spark.rapids.sql.batchSizeBytes. |
2147483647 |
Runtime |
spa rk.rapids.sql. replaceSortMer geJoin.enabled |
Allow replacing sortMergeJoin with HashJoin |
true |
Runtime |
spark.ra pids.sql.rowBa sedUDF.enabled |
When set to true, optimizes a row-based UDF in a GPU operation by transferring only the data it needs between GPU and CPU inside a query operation, instead of falling this operation back to CPU. This is an experimental feature, and this config might be removed in the future. |
false |
Runtime |
spark.rap ids.sql.shuffl e.spillThreads |
Number of threads used to spill shuffle data to disk in the background. |
6 |
Runtime |
spark.r apids.sql.stab leSort.enabled |
Enable or disable stable sorting. Apache Spark’s sorting is typically a stable sort, but sort stability cannot be guaranteed in distributed work loads because the order in which upstream data arrives to a task is not guaranteed. Sort stability then only matters when reading and sorting data from a file using a single t ask/partition. Because of limitations in the plugin when you enable stable sorting all of the data for a single task will be combined into a single batch before sorting. This currently disables spilling from GPU memory if the data size is too large. |
false |
Runtime |
spark.rapids .sql.suppressP lanningFailure |
Option to fallback an individual query to CPU if an unexpected condition prevents the query plan from being converted to a GPU-enabled one. Note this is different from a normal CPU fallback for a yet-t o-be-supported Spark SQL feature. If this happens the error should be reported and investigated as a GitHub issue. |
false |
Runtime |
spark.ra pids.sql.udfCo mpiler.enabled |
When set to true, Scala UDFs will be considered for compilation as Catalyst expressions |
false |
Runtime |
spark.rapids. sql.variableFl oatAgg.enabled |
Spark assumes that all operations produce the exact same result each time. This is not true for some floating point aggregations, which can produce slightly different results on the GPU as the aggregation is done in parallel. This can enable those operations if you know the query is only computing it once. |
true |
Runtime |
spark.rapids.s ql.window.rang e.byte.enabled |
When the order-by column of a range based window is byte type and the range boundary calculated for a value has overflow, CPU and GPU will get the different results. When set to false disables the range window acceleration for the byte type order-by column |
false |
Runtime |
spa rk.rapids.sql. window.range.d ecimal.enabled |
When set to false, this disables the range window acceleration for the DECIMAL type order-by column |
true |
Runtime |
spark.rapids. sql.window.ran ge.int.enabled |
When the order-by column of a range based window is int type and the range boundary calculated for a value has overflow, CPU and GPU will get the different results. When set to false disables the range window acceleration for the int type order-by column |
true |
Runtime |
spark.rapids.s ql.window.rang e.long.enabled |
When the order-by column of a range based window is long type and the range boundary calculated for a value has overflow, CPU and GPU will get the different results. When set to false disables the range window acceleration for the long type order-by column |
true |
Runtime |
s park.rapids.sq l.window.range .short.enabled |
When the order-by column of a range based window is short type and the range boundary calculated for a value has overflow, CPU and GPU will get the different results. When set to false disables the range window acceleration for the short type order-by column |
false |
Runtime |
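The sketches below show a few common combinations of the configs listed above; all paths, bucket names, sizes, and thread counts are illustrative only, not recommendations.
To auto-mount S3 buckets to an existing Alluxio deployment (assuming the Alluxio master runs on the Spark driver node, ALLUXIO_HOME is exported in the environment, and the bucket name is hypothetical):
${SPARK_HOME}/bin/spark-shell --jars rapids-4-spark_2.12-23.02.0-cuda11.jar \
--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--conf spark.rapids.alluxio.automount.enabled=true \
--conf spark.rapids.alluxio.bucket.regex='^s3a{0,1}://my-bucket.*'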
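To size the RMM pool and the pinned memory pool explicitly at startup (the fractions must respect the min/max constraints described above):
${SPARK_HOME}/bin/spark-shell --jars rapids-4-spark_2.12-23.02.0-cuda11.jar \
--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--conf spark.rapids.memory.gpu.pool=ASYNC \
--conf spark.rapids.memory.gpu.allocFraction=0.8 \
--conf spark.rapids.memory.gpu.maxAllocFraction=0.9 \
--conf spark.rapids.memory.gpu.reserve=671088640 \
--conf spark.rapids.memory.pinnedPool.size=1073741824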
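To run the RAPIDS Shuffle Manager in MULTITHREADED mode, the shuffle manager class must also be set at startup; the class name is specific to the Spark version in use, so <spark version> below is a placeholder:
${SPARK_HOME}/bin/spark-shell --jars rapids-4-spark_2.12-23.02.0-cuda11.jar \
--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--conf spark.shuffle.manager=com.nvidia.spark.rapids.<spark version>.RapidsShuffleManager \
--conf spark.rapids.shuffle.mode=MULTITHREADED \
--conf spark.rapids.shuffle.multiThreaded.reader.threads=32 \
--conf spark.rapids.shuffle.multiThreaded.writer.threads=32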
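To evaluate an existing CPU workload without running anything on the GPU, the plugin can be started in explainOnly mode (the explain level shown is already the default for this mode and is included only for clarity):
${SPARK_HOME}/bin/spark-shell --jars rapids-4-spark_2.12-23.02.0-cuda11.jar \
--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--conf spark.rapids.sql.mode=explainOnly \
--conf spark.rapids.sql.explain=ALL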
Supported GPU Operators and Fine Tuning#
The RAPIDS Accelerator for Apache Spark can be configured to enable or disable specific GPU accelerated expressions. Enabled expressions are candidates for GPU execution. If the expression is configured as disabled, the accelerator plugin will not attempt replacement, and it will run on the CPU.
Please leverage the spark.rapids.sql.explain
setting to get feedback from the plugin as to why parts of a query may not be executing on the GPU.
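For example, to print the full plan analysis and force a single expression back onto the CPU for the current session (Abs is used here purely as an illustration; any config name from the table below works the same way):
scala> spark.conf.set("spark.rapids.sql.explain", "ALL")
scala> spark.conf.set("spark.rapids.sql.expression.Abs", false)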
Note
Setting spark.rapids.sql.incompatibleOps.enabled=true
will enable all the settings in the table below which are not enabled by default due to incompatibilities.
Expressions#
Name |
SQL Function(s) |
Description |
Default Value |
Notes |
---|---|---|---|---|
spark.rap ids.sql.exp ression.Abs |
|
Absolute value |
true |
None |
spark.rapi ds.sql.expr ession.Acos |
|
Inverse cosine |
true |
None |
spark.rapid s.sql.expre ssion.Acosh |
|
Inverse hyperbolic cosine |
true |
None |
spark.rap ids.sql.exp ression.Add |
|
Addition |
true |
None |
spark.rapid s.sql.expre ssion.Alias |
Gives a column a name |
true |
None |
|
spark.rap ids.sql.exp ression.And |
|
Logical AND |
true |
None |
spa rk.rapids.s ql.expressi on.AnsiCast |
Convert a column of one type of data into another type |
true |
None |
|
spark.ra pids.sql.ex pression.Ar rayContains |
|
Returns a boolean if the array contains the passed in key |
true |
None |
spark. rapids.sql. expression. ArrayExcept |
|
Returns an array of the elements in array1 but not in array2, without duplicates |
true |
This is not 100% compatible with the Spark version because the GPU imp lementation treats -0.0 and 0.0 as equal, but the CPU imp lementation currently does not (see SP ARK-39845). Also, Apache Spark 3.1.3 fixed issue SPARK-36741 where NaNs in these set like operators were not treated as being equal. We have chosen to break with co mpatibility for the older versions of Spark in this instance and handle NaNs the same as 3.1.3+ |
spark. rapids.sql. expression. ArrayExists |
|
Return true if any element satisfies the predicate Lam bdaFunction |
true |
None |
spark.rap ids.sql.exp ression.Arr ayIntersect |
|
Returns an array of the elements in the i ntersection of array1 and array2, without duplicates |
true |
This is not 100% compatible with the Spark version because the GPU imp lementation treats -0.0 and 0.0 as equal, but the CPU imp lementation currently does not (see SP ARK-39845). Also, Apache Spark 3.1.3 fixed issue SPARK-36741 where NaNs in these set like operators were not treated as being equal. We have chosen to break with co mpatibility for the older versions of Spark in this instance and handle NaNs the same as 3.1.3+ |
spa rk.rapids.s ql.expressi on.ArrayMax |
`` array_max`` |
Returns the maximum value in the array |
true |
None |
spa rk.rapids.s ql.expressi on.ArrayMin |
`` array_min`` |
Returns the minimum value in the array |
true |
None |
spark. rapids.sql. expression. ArrayRemove |
|
Returns the array after removing all elements that equal to the input element (right) from the input array (left) |
true |
None |
spark. rapids.sql. expression. ArrayRepeat |
|
Returns the array containing the given input value (left) count (right) times |
true |
None |
spark.rapids.sql.expression.ArrayTransform |
transform |
Transform elements in an array using the transform function. This is similar to a map in functional programming. |
true |
None |
spark .rapids.sql .expression .ArrayUnion |
|
Returns an array of the elements in the union of array1 and array2, without duplicates. |
true |
This is not 100% compatible with the Spark version because the GPU imp lementation treats -0.0 and 0.0 as equal, but the CPU imp lementation currently does not (see SP ARK-39845). Also, Apache Spark 3.1.3 fixed issue SPARK-36741 where NaNs in these set like operators were not treated as being equal. We have chosen to break with co mpatibility for the older versions of Spark in this instance and handle NaNs the same as 3.1.3+ |
spark.ra pids.sql.ex pression.Ar raysOverlap |
|
Returns true if a1 contains at least a non-null element present also in a2. If the arrays have no common element and they are both non-empty and either of them contains a null element null is returned, false otherwise. |
true |
This is not 100% compatible with the Spark version because the GPU imp lementation treats -0.0 and 0.0 as equal, but the CPU imp lementation currently does not (see SP ARK-39845). Also, Apache Spark 3.1.3 fixed issue SPARK-36741 where NaNs in these set like operators were not treated as being equal. We have chosen to break with co mpatibility for the older versions of Spark in this instance and handle NaNs the same as 3.1.3+ |
spar k.rapids.sq l.expressio n.ArraysZip |
|
Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays. |
true |
None |
spark.rapi ds.sql.expr ession.Asin |
|
Inverse sine |
true |
None |
spark.rapid s.sql.expre ssion.Asinh |
|
Inverse hyperbolic sine |
true |
None |
spark.rapid s.sql.expre ssion.AtLea stNNonNulls |
Checks if number of non null/Nan values is greater than a given value |
true |
None |
|
spark.rapi ds.sql.expr ession.Atan |
|
Inverse tangent |
true |
None |
spark.rapid s.sql.expre ssion.Atanh |
|
Inverse hyperbolic tangent |
true |
None |
sp ark.rapids. sql.express ion.Attribu teReference |
References an input column |
true |
None |
|
s park.rapids .sql.expres sion.BRound |
|
Round an expression to d decimal places using HALF_EVEN rounding mode |
true |
None |
spar k.rapids.sq l.expressio n.BitLength |
|
The bit length of string data |
true |
None |
spark .rapids.sql .expression .BitwiseAnd |
|
Returns the bitwise AND of the operands |
true |
None |
spark .rapids.sql .expression .BitwiseNot |
|
Returns the bitwise NOT of the operands |
true |
None |
spar k.rapids.sq l.expressio n.BitwiseOr |
|
Returns the bitwise OR of the operands |
true |
None |
spark .rapids.sql .expression .BitwiseXor |
|
Returns the bitwise XOR of the operands |
true |
None |
spa rk.rapids.s ql.expressi on.CaseWhen |
|
CASE WHEN expression |
true |
None |
spark.rapi ds.sql.expr ession.Cast |
|
Convert a column of one type of data into another type |
true |
None |
spark.rapi ds.sql.expr ession.Cbrt |
|
Cube root |
true |
None |
spark.rapids.sql.expression.Ceil |
ceiling, ceil |
Ceiling of a number |
true |
None |
spark.ra pids.sql.ex pression.Ch eckOverflow |
Ch eckOverflow after arithmetic operations between DecimalType data |
true |
None |
|
spa rk.rapids.s ql.expressi on.Coalesce |
` coalesce` |
Returns the first non-null argument if exists. Otherwise, null |
true |
None |
s park.rapids .sql.expres sion.Concat |
|
List/String concatenate |
true |
None |
spa rk.rapids.s ql.expressi on.ConcatWs |
`` concat_ws`` |
C oncatenates multiple input strings or array of strings into a single string using a given separator |
true |
None |
spa rk.rapids.s ql.expressi on.Contains |
Contains |
true |
None |
|
spark.rap ids.sql.exp ression.Cos |
|
Cosine |
true |
None |
spark.rapi ds.sql.expr ession.Cosh |
|
Hyperbolic cosine |
true |
None |
spark.rap ids.sql.exp ression.Cot |
|
Cotangent |
true |
None |
spark. rapids.sql. expression. CreateArray |
|
Returns an array with the given elements |
true |
None |
spar k.rapids.sq l.expressio n.CreateMap |
|
Create a map |
true |
None |
s park.rapids .sql.expres sion.Create NamedStruct |
|
Creates a struct with the given field names and values |
true |
None |
spark. rapids.sql. expression. CurrentRow$ |
Special boundary for a window frame, indicating stopping at the current row |
true |
None |
|
sp ark.rapids. sql.express ion.DateAdd |
` date_add` |
Returns the date that is num_days after start_date |
true |
None |
spark.rapi ds.sql.expr ession.Date AddInterval |
Adds interval to date |
true |
None |
|
spa rk.rapids.s ql.expressi on.DateDiff |
` datediff` |
Returns the number of days from startDate to endDate |
true |
None |
spark.rapi ds.sql.expr ession.Date FormatClass |
|
Converts timestamp to a value of string in the format specified by the date format |
true |
None |
sp ark.rapids. sql.express ion.DateSub |
` date_sub` |
Returns the date that is num_days before start_date |
true |
None |
spark .rapids.sql .expression .DayOfMonth |
|
Returns the day of the month from a date or timestamp |
true |
None |
spar k.rapids.sq l.expressio n.DayOfWeek |
`` dayofweek`` |
Returns the day of the week (1 = Sunday… 7=Saturday) |
true |
None |
spar k.rapids.sq l.expressio n.DayOfYear |
`` dayofyear`` |
Returns the day of the year from a date or timestamp |
true |
None |
spar k.rapids.sq l.expressio n.DenseRank |
|
Window function that returns the dense rank value within the aggregation window |
true |
None |
s park.rapids .sql.expres sion.Divide |
|
Division |
true |
None |
spar k.rapids.sq l.expressio n.ElementAt |
|
Returns element of array at giv en(1-based) index in value if column is array. Returns value for the given key in value if column is map. |
true |
None |
spa rk.rapids.s ql.expressi on.EndsWith |
Ends with |
true |
None |
|
spark.ra pids.sql.ex pression.Eq ualNullSafe |
|
Check if the values are equal including nulls <=> |
true |
None |
sp ark.rapids. sql.express ion.EqualTo |
|
Check if the values are equal |
true |
None |
spark.rap ids.sql.exp ression.Exp |
|
Euler’s number e raised to a power |
true |
None |
spark.rapids.sql.expression.Explode |
explode, explode_outer |
Given an input array produces a sequence of rows for each value in the array |
true |
None |
spark.rapid s.sql.expre ssion.Expm1 |
|
Euler’s number e raised to a power minus 1 |
true |
None |
spark.rapid s.sql.expre ssion.Floor |
|
Floor of a number |
true |
None |
spark.rapid s.sql.expre ssion.FromU TCTimestamp |
|
Render the input UTC timestamp in the input timezone |
true |
None |
spark.r apids.sql.e xpression.F romUnixTime |
|
Get the string from a unix timestamp |
true |
None |
spark.rapids.sql.expression.GetArrayItem |
Gets the field at the given ordinal in the Array |
true |
None |
|
spark.rapids.sql.expression.GetArrayStructFields |
Extracts the ordinal-th fields of all array elements |
true |
None |
|
spark.ra pids.sql.ex pression.Ge tJsonObject |
|
Extracts a json object from path |
true |
None |
spark. rapids.sql. expression. GetMapValue |
Gets Value from a Map based on a key |
true |
None |
|
spark.rap ids.sql.exp ression.Get StructField |
Gets the named field of the struct |
true |
None |
|
spark.r apids.sql.e xpression.G etTimestamp |
Gets timestamps from strings using given pattern. |
true |
None |
|
spark. rapids.sql. expression. GreaterThan |
|
> operator |
true |
None |
sp ark.rapids. sql.express ion.Greater ThanOrEqual |
|
>= operator |
true |
None |
spa rk.rapids.s ql.expressi on.Greatest |
` greatest` |
Returns the greatest value of all parameters, skipping null values |
true |
None |
spark.rapi ds.sql.expr ession.Hour |
|
Returns the hour component of the strin g/timestamp |
true |
None |
spark.rapid s.sql.expre ssion.Hypot |
|
Pythagorean addition ( Hypotenuse) of real numbers |
true |
None |
spark.ra pids.sql.ex pression.If |
|
IF expression |
true |
None |
spark.ra pids.sql.ex pression.In |
|
IN operator |
true |
None |
spark.rapid s.sql.expre ssion.InSet |
INSET operator |
true |
None |
|
sp ark.rapids. sql.express ion.InitCap |
|
Returns str with the first letter of each word in uppercase. All other letters are in lowercase |
true |
This is not 100% compatible with the Spark version because the Unicode version used by cuDF and the JVM may differ, resulting in some corner-case characters not changing case correctly. |
spar k.rapids.sq l.expressio n.InputFile BlockLength |
|
Returns the length of the block being read, or -1 if not available |
true |
None |
spa rk.rapids.s ql.expressi on.InputFil eBlockStart |
|
Returns the start offset of the block being read, or -1 if not available |
true |
None |
spark.ra pids.sql.ex pression.In putFileName |
|
Returns the name of the file being read, or empty string if not available |
true |
None |
spark.rap ids.sql.exp ression.Int egralDivide |
|
Division with a integer result |
true |
None |
spark.rapid s.sql.expre ssion.IsNaN |
|
Checks if a value is NaN |
true |
None |
spar k.rapids.sq l.expressio n.IsNotNull |
`` isnotnull`` |
Checks if a value is not null |
true |
None |
s park.rapids .sql.expres sion.IsNull |
|
Checks if a value is null |
true |
None |
spark.rapids.sql.expression.JsonToStructs |
from_json |
Returns a struct value with the given jsonStr and schema |
false |
This is disabled by default because parsing JSON from a column has a large number of issues and should be considered beta quality right now. |
spar k.rapids.sq l.expressio n.JsonTuple |
|
Returns a tuple like the function get_j son_object, but it takes multiple names. All the input parameters and output column types are string. |
true |
None |
s park.rapids .sql.expres sion.KnownF loatingPoin tNormalized |
Tag to prevent redundant no rmalization |
true |
None |
|
spark.rapids.sql.expression.KnownNotNull |
Tag an expression as known to not be null |
true |
None |
|
spark.rapids.sql.expression.Lag |
|
Window function that returns N entries behind this one |
true |
None |
spark.rapids.sql.expression.LambdaFunction |
Holds a higher order SQL function |
true |
None |
|
spark.rapids.sql.expression.LastDay |
`last_day` |
Returns the last day of the month which the date belongs to |
true |
None |
spark.rapids.sql.expression.Lead |
|
Window function that returns N entries ahead of this one |
true |
None |
spark.rapids.sql.expression.Least |
|
Returns the least value of all parameters, skipping null values |
true |
None |
spark.rapids.sql.expression.Length |
|
String character length or binary byte length |
true |
None |
spark.rapids.sql.expression.LessThan |
|
< operator |
true |
None |
spark.rapids.sql.expression.LessThanOrEqual |
|
<= operator |
true |
None |
spark.rapids.sql.expression.Like |
|
Like |
true |
None |
spark.rapids.sql.expression.Literal |
Holds a static value from the query |
true |
None |
|
spark.rapids.sql.expression.Log |
|
Natural log |
true |
None |
spark.rapids.sql.expression.Log10 |
|
Log base 10 |
true |
None |
spark.rapids.sql.expression.Log1p |
|
Natural log 1 + expr |
true |
None |
spark.rapids.sql.expression.Log2 |
|
Log base 2 |
true |
None |
spark.rapids.sql.expression.Logarithm |
|
Log variable base |
true |
None |
spark.rapids.sql.expression.Lower |
|
String lowercase operator |
true |
This is not 100% compatible with the Spark version because the Unicode version used by cuDF and the JVM may differ, resulting in some corner-case characters not changing case correctly. |
spark.rapids.sql.expression.MakeDecimal |
Create a Decimal from an unscaled long value for some aggregation optimizations |
true |
None |
|
spark.rapids.sql.expression.MapConcat |
|
Returns the union of all the given maps |
true |
None |
spark.rapids.sql.expression.MapEntries |
|
Returns an unordered array of all entries in the given map |
true |
None |
spark.rapids.sql.expression.MapFilter |
|
Filters entries in a map using the function |
true |
None |
spark.rapids.sql.expression.MapKeys |
`map_keys` |
Returns an unordered array containing the keys of the map |
true |
None |
spark.rapids.sql.expression.MapValues |
|
Returns an unordered array containing the values of the map |
true |
None |
spark.rapids.sql.expression.Md5 |
|
MD5 hash operator |
true |
None |
spark.rapids.sql.expression.Minute |
|
Returns the minute component of the string/timestamp |
true |
None |
spark.rapids.sql.expression.MonotonicallyIncreasingID |
|
Returns monotonically increasing 64-bit integers |
true |
None |
spark.rapids.sql.expression.Month |
|
Returns the month from a date or timestamp |
true |
None |
spark.rapids.sql.expression.Multiply |
|
Multiplication |
true |
None |
spark.rapids.sql.expression.Murmur3Hash |
|
Murmur3 hash operator |
true |
None |
spark.rapids.sql.expression.NaNvl |
|
Evaluates to `left` iff left is not NaN, `right` otherwise |
true |
None |
spark.rapids.sql.expression.NamedLambdaVariable |
A parameter to a higher order SQL function |
true |
None |
|
spark.rapids.sql.expression.Not |
|
Boolean not operator |
true |
None |
spark.rapids.sql.expression.NthValue |
`nth_value` |
nth window operator |
true |
None |
spark.rapids.sql.expression.OctetLength |
|
The byte length of string data |
true |
None |
spark.rapids.sql.expression.Or |
|
Logical OR |
true |
None |
spark.rapids.sql.expression.PercentRank |
|
Window function that returns the percent rank value within the aggregation window |
true |
None |
spark.rapids.sql.expression.Pmod |
|
Pmod |
true |
None |
spark.rapids.sql.expression.PosExplode |
|
Given an input array produces a sequence of rows for each value in the array |
true |
None |
spark.rapids.sql.expression.Pow |
|
lhs ^ rhs |
true |
None |
spark.rapids.sql.expression.PreciseTimestampConversion |
Expression used internally to convert the TimestampType to Long and back without losing precision, i.e. in microseconds. Used in time windowing |
true |
None |
|
spark.rapids.sql.expression.PromotePrecision |
PromotePrecision before arithmetic operations between DecimalType data |
true |
None |
|
spark.rapids.sql.expression.PythonUDF |
UDF run in an external Python process. Does not actually run on the GPU, but the transfer of data to/from it can be accelerated |
true |
None |
|
spark.rapids.sql.expression.Quarter |
|
Returns the quarter of the year for date, in the range 1 to 4 |
true |
None |
spark.rapids.sql.expression.RLike |
|
Regular expression version of Like |
true |
None |
spark.rapids.sql.expression.RaiseError |
|
Throw an exception |
true |
None |
spark.rapids.sql.expression.Rand |
|
Generate a random column with i.i.d. uniformly distributed values in [0, 1) |
true |
None |
spark.rapids.sql.expression.Rank |
|
Window function that returns the rank value within the aggregation window |
true |
None |
spark.rapids.sql.expression.RegExpExtract |
|
Extract a specific group identified by a regular expression |
true |
None |
spark.rapids.sql.expression.RegExpExtractAll |
|
Extract all strings matching a regular expression corresponding to the regex group index |
true |
None |
spark.rapids.sql.expression.RegExpReplace |
|
String replace using a regular expression pattern |
true |
None |
spark.rapids.sql.expression.Remainder |
|
Remainder or modulo |
true |
None |
spark.rapids.sql.expression.ReplicateRows |
Given an input row replicates the row N times |
true |
None |
|
spark.rapids.sql.expression.Reverse |
|
Returns a reversed string or an array with reverse order of elements |
true |
None |
spark.rapids.sql.expression.Rint |
|
Rounds a double value to the nearest double equal to an integer |
true |
None |
spark.rapids.sql.expression.Round |
|
Round an expression to d decimal places using HALF_UP rounding mode |
true |
None |
spark.rapids.sql.expression.RowNumber |
|
Window function that returns the index for the row within the aggregation window |
true |
None |
spark.rapids.sql.expression.ScalaUDF |
User Defined Function, the UDF can choose to implement a RAPIDS accelerated interface to get better performance. |
true |
None |
|
spark.rapids.sql.expression.Second |
|
Returns the second component of the string/timestamp |
true |
None |
spark.rapids.sql.expression.Sequence |
`sequence` |
Sequence |
true |
None |
spark.rapids.sql.expression.ShiftLeft |
`shiftleft` |
Bitwise shift left (<<) |
true |
None |
spark.rapids.sql.expression.ShiftRight |
|
Bitwise shift right (>>) |
true |
None |
spark.rapids.sql.expression.ShiftRightUnsigned |
|
Bitwise unsigned shift right (>>>) |
true |
None |
spark.rapids.sql.expression.Signum |
|
Returns -1.0, 0.0 or 1.0 as expr is negative, 0 or positive |
true |
None |
spark.rapids.sql.expression.Sin |
|
Sine |
true |
None |
spark.rapids.sql.expression.Sinh |
|
Hyperbolic sine |
true |
None |
spark.rapids.sql.expression.Size |
|
The size of an array or a map |
true |
None |
spark.rapids.sql.expression.SortArray |
|
Returns a sorted array with the input array and the ascending / descending order |
true |
None |
spark.rapids.sql.expression.SortOrder |
Sort order |
true |
None |
|
spark.rapids.sql.expression.SparkPartitionID |
|
Returns the current partition id |
true |
None |
spark.rapids.sql.expression.SpecifiedWindowFrame |
Specification of the width of the group (or “frame”) of input rows around which a window function is evaluated |
true |
None |
|
spark.rapids.sql.expression.Sqrt |
|
Square root |
true |
None |
spark.rapids.sql.expression.StartsWith |
Starts with |
true |
None |
|
spark.rapids.sql.expression.StringInstr |
|
Instr string operator |
true |
None |
spark.rapids.sql.expression.StringLPad |
|
Pad a string on the left |
true |
None |
spark.rapids.sql.expression.StringLocate |
`position` |
Substring search operator |
true |
None |
spark.rapids.sql.expression.StringRPad |
|
Pad a string on the right |
true |
None |
spark.rapids.sql.expression.StringRepeat |
|
StringRepeat operator that repeats the given strings the number of times given by repeatTimes |
true |
None |
spark.rapids.sql.expression.StringReplace |
|
StringReplace operator |
true |
None |
spark.rapids.sql.expression.StringSplit |
|
Splits `str` around occurrences that match `regex` |
true |
None |
spark.rapids.sql.expression.StringToMap |
|
Creates a map after splitting the input string into pairs of key-value strings |
true |
None |
spark.rapids.sql.expression.StringTrim |
|
StringTrim operator |
true |
None |
spark.rapids.sql.expression.StringTrimLeft |
|
StringTrimLeft operator |
true |
None |
spark.rapids.sql.expression.StringTrimRight |
|
StringTrimRight operator |
true |
None |
spark.rapids.sql.expression.Substring |
|
Substring operator |
true |
None |
spark.rapids.sql.expression.SubstringIndex |
|
substring_index operator |
true |
None |
spark.rapids.sql.expression.Subtract |
|
Subtraction |
true |
None |
spark.rapids.sql.expression.Tan |
|
Tangent |
true |
None |
spark.rapids.sql.expression.Tanh |
|
Hyperbolic tangent |
true |
None |
spark.rapids.sql.expression.TimeAdd |
Adds interval to timestamp |
true |
None |
|
spark.rapids.sql.expression.ToDegrees |
|
Converts radians to degrees |
true |
None |
spark.rapids.sql.expression.ToRadians |
|
Converts degrees to radians |
true |
None |
spark.rapids.sql.expression.ToUnixTimestamp |
|
Returns the UNIX timestamp of the given time |
true |
None |
spark.rapids.sql.expression.TransformKeys |
|
Transform keys in a map using a transform function |
true |
None |
spark.rapids.sql.expression.TransformValues |
|
Transform values in a map using a transform function |
true |
None |
spark.rapids.sql.expression.UnaryMinus |
`negative` |
Negate a numeric value |
true |
None |
spark.rapids.sql.expression.UnaryPositive |
`positive` |
A numeric value with a + in front of it |
true |
None |
spark.rapids.sql.expression.UnboundedFollowing$ |
Special boundary for a window frame, indicating all rows following the current row |
true |
None |
|
spark.rapids.sql.expression.UnboundedPreceding$ |
Special boundary for a window frame, indicating all rows preceding the current row |
true |
None |
|
spark.rapids.sql.expression.UnixTimestamp |
|
Returns the UNIX timestamp of current or specified time |
true |
None |
spark.rapids.sql.expression.UnscaledValue |
Convert a Decimal to an unscaled long value for some aggregation optimizations |
true |
None |
|
spark.rapids.sql.expression.Upper |
|
String uppercase operator |
true |
This is not 100% compatible with the Spark version because the Unicode version used by cuDF and the JVM may differ, resulting in some corner-case characters not changing case correctly. |
spark.rapids.sql.expression.WeekDay |
|
Returns the day of the week (0 = Monday … 6 = Sunday) |
true |
None |
spark.rapids.sql.expression.WindowExpression |
Calculates a return value for every input row of a table based on a group (or “window”) of rows |
true |
None |
|
spark.rapids.sql.expression.WindowSpecDefinition |
Specification of a window function, indicating the partitioning-expression, the row ordering, and the width of the window |
true |
None |
|
spark.rapids.sql.expression.Year |
|
Returns the year from a date or timestamp |
true |
None |
spark.rapids.sql.expression.AggregateExpression |
Aggregate expression |
true |
None |
|
spark.rapids.sql.expression.ApproximatePercentile |
|
Approximate percentile |
true |
This is not 100% compatible with the Spark version because the GPU implementation of approx_percentile is not bit-for-bit compatible with Apache Spark |
spark.rapids.sql.expression.Average |
|
Average aggregate operator |
true |
None |
spark.rapids.sql.expression.CollectList |
|
Collect a list of non-unique elements, not supported in reduction |
true |
None |
spark.rapids.sql.expression.CollectSet |
|
Collect a set of unique elements, not supported in reduction |
true |
None |
spark.rapids.sql.expression.Count |
|
Count aggregate operator |
true |
None |
spark.rapids.sql.expression.First |
|
first aggregate operator |
true |
None |
spark.rapids.sql.expression.Last |
|
last aggregate operator |
true |
None |
spark.rapids.sql.expression.Max |
|
Max aggregate operator |
true |
None |
spark.rapids.sql.expression.Min |
|
Min aggregate operator |
true |
None |
spark.rapids.sql.expression.PivotFirst |
PivotFirst operator |
true |
None |
|
spark.rapids.sql.expression.StddevPop |
|
Aggregation computing population standard deviation |
true |
None |
spark.rapids.sql.expression.StddevSamp |
|
Aggregation computing sample standard deviation |
true |
None |
spark.rapids.sql.expression.Sum |
|
Sum aggregate operator |
true |
None |
spark.rapids.sql.expression.VariancePop |
|
Aggregation computing population variance |
true |
None |
spark.rapids.sql.expression.VarianceSamp |
`var_samp`, `variance` |
Aggregation computing sample variance |
true |
None |
spark.rapids.sql.expression.NormalizeNaNAndZero |
Normalize NaN and zero |
true |
None |
|
spark.rapids.sql.expression.ScalarSubquery |
Subquery that will return only one row and one column |
true |
None |
|
spark.rapids.sql.expression.HiveGenericUDF |
Hive Generic UDF, the UDF can choose to implement a RAPIDS accelerated interface to get better performance |
true |
None |
|
spark.rapids.sql.expression.HiveSimpleUDF |
Hive UDF, the UDF can choose to implement a RAPIDS accelerated interface to get better performance |
true |
None |
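Expression configs such as the ones above can be toggled at runtime with spark.conf.set. As a minimal sketch for the JsonToStructs entry (disabled by default), the snippet below enables it and runs a small query that uses from_json; the sample data, column name, and schema are hypothetical, and a spark-shell session is assumed so that spark.implicits._ is already in scope.
scala> spark.conf.set("spark.rapids.sql.expression.JsonToStructs", true)
scala> import org.apache.spark.sql.functions.from_json
scala> import org.apache.spark.sql.types.StructType
scala> val schema = StructType.fromDDL("a INT, b STRING")  // hypothetical schema for the example JSON
scala> Seq("""{"a":1,"b":"x"}""").toDF("json_col").select(from_json($"json_col", schema)).show()
If an enabled expression is still unsupported for a particular input type, the plugin simply falls back to the CPU for that expression.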
Execution#
| Name | Description | Default Value | Notes |
|---|---|---|---|
| spark.rapids.sql.exec.CoalesceExec | The backend for the dataframe coalesce method | true | None |
| spark.rapids.sql.exec.CollectLimitExec | Reduce to single partition and apply limit | false | This is disabled by default because CollectLimit replacement can be slower on the GPU; if there is a huge number of rows in a batch, it could help by limiting the number of rows transferred from GPU to CPU (see the example after this table) |
| spark.rapids.sql.exec.ExpandExec | The backend for the expand operator | true | None |
| spark.rapids.sql.exec.FileSourceScanExec | Reading data from files, often from Hive tables | true | None |
| spark.rapids.sql.exec.FilterExec | The backend for most filter statements | true | None |
| spark.rapids.sql.exec.GenerateExec | The backend for operations that generate more output rows than input rows like explode | true | None |
| spark.rapids.sql.exec.GlobalLimitExec | Limiting of results across partitions | true | None |
| spark.rapids.sql.exec.LocalLimitExec | Per-partition limiting of results | true | None |
| spark.rapids.sql.exec.ProjectExec | The backend for most select, withColumn and dropColumn statements | true | None |
| spark.rapids.sql.exec.RangeExec | The backend for the range operator | true | None |
| spark.rapids.sql.exec.SampleExec | The backend for the sample operator | true | None |
| spark.rapids.sql.exec.SortExec | The backend for the sort operator | true | None |
| spark.rapids.sql.exec.SubqueryBroadcastExec | Plan to collect and transform the broadcast key values | true | None |
| spark.rapids.sql.exec.TakeOrderedAndProjectExec | Take the first limit elements as defined by the sortOrder, and do projection if needed | true | None |
| spark.rapids.sql.exec.UnionExec | The backend for the union operator | true | None |
| spark.rapids.sql.exec.CustomShuffleReaderExec | A wrapper of shuffle query stage | true | None |
| spark.rapids.sql.exec.HashAggregateExec | The backend for hash based aggregations | true | None |
| spark.rapids.sql.exec.ObjectHashAggregateExec | The backend for hash based aggregations supporting TypedImperativeAggregate functions | true | None |
| spark.rapids.sql.exec.SortAggregateExec | The backend for sort based aggregations | true | None |
| spark.rapids.sql.exec.InMemoryTableScanExec | Implementation of InMemoryTableScanExec to use GPU accelerated caching | true | None |
| spark.rapids.sql.exec.DataWritingCommandExec | Writing data | true | None |
| spark.rapids.sql.exec.ExecutedCommandExec | Eagerly executed commands | true | None |
| spark.rapids.sql.exec.BatchScanExec | The backend for most file input | true | None |
| spark.rapids.sql.exec.BroadcastExchangeExec | The backend for broadcast exchange of data | true | None |
| spark.rapids.sql.exec.ShuffleExchangeExec | The backend for most data being exchanged between processes | true | None |
| spark.rapids.sql.exec.BroadcastHashJoinExec | Implementation of join using broadcast data | true | None |
| spark.rapids.sql.exec.BroadcastNestedLoopJoinExec | Implementation of join using brute force. Full outer joins and joins where the broadcast side matches the join side (e.g. LeftOuter with left broadcast) are not supported | true | None |
| spark.rapids.sql.exec.CartesianProductExec | Implementation of join using brute force | true | None |
| spark.rapids.sql.exec.ShuffledHashJoinExec | Implementation of join using hashed shuffled data | true | None |
| spark.rapids.sql.exec.SortMergeJoinExec | Sort merge join, replacing with shuffled hash join | true | None |
| spark.rapids.sql.exec.AggregateInPandasExec | The backend for an Aggregation Pandas UDF; this accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled. | true | None |
| spark.rapids.sql.exec.ArrowEvalPythonExec | The backend of the Scalar Pandas UDFs. Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled. | true | None |
| spark.rapids.sql.exec.FlatMapCoGroupsInPandasExec | The backend for CoGrouped Aggregation Pandas UDF. Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled. | false | This is disabled by default because performance is not ideal with many small groups |
| spark.rapids.sql.exec.FlatMapGroupsInPandasExec | The backend for Flat Map Groups Pandas UDF. Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled. | true | None |
| spark.rapids.sql.exec.MapInPandasExec | The backend for Map Pandas Iterator UDF. Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled. | true | None |
| spark.rapids.sql.exec.WindowInPandasExec | The backend for Window Aggregation Pandas UDF. Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled. For now it only supports row-based window frames. | false | This is disabled by default because it only supports row-based window frames for now |
| spark.rapids.sql.exec.WindowExec | Window-operator backend | true | None |
| spark.rapids.sql.exec.HiveTableScanExec | Scan Exec to read Hive delimited text tables | true | None |
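Exec configs follow the same runtime pattern. As a sketch (assuming a spark-shell session; the query below is only there to produce a plan that typically contains a CollectLimit), enabling the CollectLimitExec replacement mentioned above looks like this:
scala> spark.conf.set("spark.rapids.sql.exec.CollectLimitExec", true)
scala> spark.range(1000000).limit(5).explain()  // inspect whether the limit is planned on the GPU
Whether the GPU version actually helps depends on how many rows per batch are transferred back to the host, as the note in the table explains.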
Commands#
| Name | Description | Default Value | Notes |
|---|---|---|---|
| spark.rapids.sql.command.SaveIntoDataSourceCommand | Write to a data source | true | None |
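The command config can also be inspected and toggled at runtime. A minimal sketch (the second argument to spark.conf.get is only a fallback default, not a claim about the current session):
scala> spark.conf.get("spark.rapids.sql.command.SaveIntoDataSourceCommand", "true")
scala> spark.conf.set("spark.rapids.sql.command.SaveIntoDataSourceCommand", false)
Turning it off keeps any write that would be planned through SaveIntoDataSourceCommand on the CPU, while the rest of the query can still run on the GPU.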
Scans#
| Name | Description | Default Value | Notes |
|---|---|---|---|
| spark.rapids.sql.input.CSVScan | CSV parsing | true | None |
| spark.rapids.sql.input.JsonScan | JSON parsing | true | None |
| spark.rapids.sql.input.OrcScan | ORC parsing | true | None |
| spark.rapids.sql.input.ParquetScan | Parquet parsing | true | None |
| spark.rapids.sql.input.AvroScan | Avro parsing | true | None |
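Scan configs control which file formats are read on the GPU. As a sketch (the input path is hypothetical), disabling the JSON scan forces JSON reads back to the CPU while leaving the other formats GPU-enabled:
scala> spark.conf.set("spark.rapids.sql.input.JsonScan", false)
scala> val df = spark.read.json("/tmp/example.json")  // hypothetical path; this scan now runs on the CPU
scala> df.explain()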
Partitioning#
| Name | Description | Default Value | Notes |
|---|---|---|---|
| spark.rapids.sql.partitioning.HashPartitioning | Hash based partitioning | true | None |
| spark.rapids.sql.partitioning.RangePartitioning | Range partitioning | true | None |
| spark.rapids.sql.partitioning.RoundRobinPartitioning | Round robin partitioning | true | None |
| spark.rapids.sql.partitioning.SinglePartition$ | Single partitioning | true | None |
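Partitioning configs gate which shuffle partitioning schemes run on the GPU. A minimal sketch (assuming a spark-shell session; the aggregation below is only there to force a hash-partitioned shuffle):
scala> spark.conf.set("spark.rapids.sql.partitioning.HashPartitioning", false)
scala> spark.range(100).groupBy(($"id" % 10).as("bucket")).count().explain()  // the exchange now falls back to the CPU
With the config off, the hash-partitioned exchange is planned on the CPU, while operators before and after it can still be placed on the GPU.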