Supported Operators

Apache Spark supports processing various types of data. Not all expressions support all data types, and the RAPIDS Accelerator for Apache Spark places further restrictions on which types it can process. This document describes which operations are supported and which data types each operation supports. Apache Spark is under active development, and this document was generated against Spark 3.1.1. Most of it should still apply to other versions of Spark, but there may be slight differences.

General Limitations

Decimal

The Decimal type in Spark supports a precision of up to 38 digits (128 bits). The RAPIDS Accelerator supports 128-bit decimals starting from version 21.12, and decimals are enabled by default. Please check Decimal Support for more details.
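As a rough sanity check on these limits, the sketch below compares the largest unscaled value a decimal of a given precision must hold (10^p - 1) with the largest signed 64-bit and 128-bit integers. It is plain Python written for illustration; `storage_bits` is a hypothetical helper, not part of Spark or the accelerator.

```python
def max_unscaled(precision: int) -> int:
    """Largest unscaled integer a decimal with this many digits must hold."""
    return 10 ** precision - 1


def fits_signed(value: int, bits: int) -> bool:
    """True if value fits in a two's-complement signed integer of `bits` bits."""
    return value <= 2 ** (bits - 1) - 1


def storage_bits(precision: int) -> int:
    """Hypothetical helper: smallest of 64 or 128 bits that holds `precision` digits."""
    for bits in (64, 128):
        if fits_signed(max_unscaled(precision), bits):
            return bits
    raise ValueError(f"precision {precision} does not fit in 128 bits")


for p in (18, 19, 38):
    print(p, storage_bits(p))  # 18 -> 64, 19 -> 128, 38 -> 128
```

Precision 19 is the first that no longer fits in a signed 64-bit integer, and 38 is the largest that still fits in 128 bits.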

Decimal precision and scale follow the same rules as CPU mode in Apache Spark. If expressions e1 and e2 have precision/scale p1/s1 and p2/s2 respectively, the following operations produce the following precision and scale:

  Operation    Result Precision                        Result Scale
  ------------------------------------------------------------------------
  e1 + e2      max(s1, s2) + max(p1-s1, p2-s2) + 1     max(s1, s2)
  e1 - e2      max(s1, s2) + max(p1-s1, p2-s2) + 1     max(s1, s2)
  e1 * e2      p1 + p2 + 1                             s1 + s2
  e1 / e2      p1 - s1 + s2 + max(6, s1 + p2 + 1)      max(6, s1 + p2 + 1)
  e1 % e2      min(p1-s1, p2-s2) + max(s1, s2)         max(s1, s2)
  e1 union e2  max(s1, s2) + max(p1-s1, p2-s2)         max(s1, s2)

However, Spark inserts PromotePrecision to CAST both sides to the same type, so GPU mode may fall back to the CPU even if the result Decimal precision is within 18 digits. For example, Decimal(8,2) x Decimal(6,3) produces Decimal(15,5) but runs on the CPU: PromotePrecision first widens both sides to Decimal(9,3), so GPU mode assumes the result is Decimal(19,6). There are even extreme cases where Spark can temporarily return a Decimal value larger than what can be stored in 128 bits and then use the CheckOverflow operator to round it to the desired precision and scale. This means that even though the accelerator supports 128-bit decimals, it might not be able to support every operation that Spark supports.
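The rules in the table and the PromotePrecision widening can be checked with a few lines of arithmetic. The sketch below is plain Python, not Spark code; the function names are hypothetical, `wider_type` follows our reading of Spark's wider-decimal rule (maximum scale plus maximum integer digits), and Spark's cap and adjustment at 38 digits is deliberately left out.

```python
from typing import Tuple

DecType = Tuple[int, int]  # (precision, scale)


def add_type(a: DecType, b: DecType) -> DecType:
    """Result type of e1 + e2 (and e1 - e2)."""
    (p1, s1), (p2, s2) = a, b
    scale = max(s1, s2)
    return scale + max(p1 - s1, p2 - s2) + 1, scale


def multiply_type(a: DecType, b: DecType) -> DecType:
    """Result type of e1 * e2."""
    (p1, s1), (p2, s2) = a, b
    return p1 + p2 + 1, s1 + s2


def divide_type(a: DecType, b: DecType) -> DecType:
    """Result type of e1 / e2."""
    (p1, s1), (p2, s2) = a, b
    scale = max(6, s1 + p2 + 1)
    return p1 - s1 + s2 + scale, scale


def wider_type(a: DecType, b: DecType) -> DecType:
    """Common type both operands are cast to before the operation."""
    (p1, s1), (p2, s2) = a, b
    scale = max(s1, s2)
    return max(p1 - s1, p2 - s2) + scale, scale


left, right = (8, 2), (6, 3)
print(multiply_type(left, right))         # (15, 5): the type Spark reports
promoted = wider_type(left, right)        # (9, 3): both sides after the cast
print(multiply_type(promoted, promoted))  # (19, 6): needs more than 18 digits
```

The gap between (15, 5) and (19, 6) is exactly the example in the paragraph above: the declared result fits in 18 digits, but the promoted intermediate does not.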

Timestamp

Timestamps in Spark are converted to the local time zone before processing and are often converted to UTC before being stored, for example in Parquet or ORC. The RAPIDS Accelerator only supports UTC as the time zone for timestamps.

CalendarInterval

In Spark, a CalendarInterval stores three values: months, days, and microseconds. Support for this type is still very limited in the accelerator. In some cases only a subset of the type is supported; for example, window ranges currently only support days.
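The three fields are kept separate because a month has no fixed number of days. A minimal sketch of the "days only" subset, assuming a hypothetical Python mirror of the interval (this is not Spark's `CalendarInterval` class), looks like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CalendarInterval:
    """Hypothetical mirror of the three values Spark stores for an interval."""
    months: int
    days: int
    microseconds: int


def is_days_only(interval: CalendarInterval) -> bool:
    """True when only the days field is set, the subset window ranges support."""
    return interval.months == 0 and interval.microseconds == 0


print(is_days_only(CalendarInterval(0, 7, 0)))              # True
print(is_days_only(CalendarInterval(1, 0, 0)))              # False: has months
print(is_days_only(CalendarInterval(0, 1, 3_600_000_000)))  # False: has an hour
```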

Configuration

Many different configuration values can affect whether an operation is supported. Some of these are part of the RAPIDS Accelerator and control the level of compatibility with Apache Spark; those are covered here. Others are part of Apache Spark itself and are harder to document. Work on updating this document to cover them is still ongoing.

In general, if you have a question about why an operation is not running on the GPU, set spark.rapids.sql.explain to ALL. The accelerator will then try to report all of the reasons why each operator or expression was placed on the CPU or the GPU.
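One way to pass this setting is on the spark-submit command line. The sketch below builds the `--conf key=value` arguments from a Python dict; `conf_args` is a hypothetical helper, and `spark.plugins=com.nvidia.spark.SQLPlugin` is included on the assumption that the accelerator is being enabled in the same launch.

```python
def conf_args(conf: dict) -> list:
    """Turn a dict of Spark settings into spark-submit --conf arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


rapids_conf = {
    "spark.plugins": "com.nvidia.spark.SQLPlugin",
    # ALL explains every operator and expression; NOT_ON_GPU limits the
    # output to the parts of the query that stay on the CPU.
    "spark.rapids.sql.explain": "ALL",
}
print(" ".join(conf_args(rapids_conf)))
```

NOT_ON_GPU is often the more practical value once a job is large, because it omits everything that already runs on the GPU.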

Key

Types

Type Name   Type Description
------------------------------------------------------------------------
BOOLEAN     Holds true or false values.
BYTE        Signed 8-bit integer value.
SHORT       Signed 16-bit integer value.
INT         Signed 32-bit integer value.
LONG        Signed 64-bit integer value.
FLOAT       32-bit floating point value.
DOUBLE      64-bit floating point value.
DATE        A date with no time component. Stored as a 32-bit integer with days since Jan 1, 1970.
TIMESTAMP   A date and time. Stored as a 64-bit integer with microseconds since Jan 1, 1970 UTC, interpreted in the current session time zone.
STRING      A text string. Stored as UTF-8 encoded bytes.
DECIMAL     A fixed point decimal value with configurable precision and scale.
NULL        Only stores null values and is typically only used when no other type can be determined from the SQL.
BINARY      An array of non-nullable bytes.
CALENDAR    Represents a period of time. Stored as months, days and microseconds.
ARRAY       A sequence of elements.
MAP         A set of key-value pairs; the keys cannot be null.
STRUCT      A series of named fields.
UDT         User-defined types and Java objects. These are not standard SQL types.
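To make the DATE and TIMESTAMP encodings concrete, the sketch below computes the integer values Spark stores for a few dates and instants using only Python's standard `datetime` module. Spark's internal timestamp value counts microseconds from the UTC epoch; the session time zone only changes how the value is interpreted and displayed. The helper names are illustrative, not Spark APIs.

```python
from datetime import date, datetime, timedelta, timezone

EPOCH_DATE = date(1970, 1, 1)
EPOCH_TS = datetime(1970, 1, 1, tzinfo=timezone.utc)


def date_to_days(d: date) -> int:
    """DATE: signed day count relative to 1970-01-01 (fits in 32 bits)."""
    return (d - EPOCH_DATE).days


def timestamp_to_micros(ts: datetime) -> int:
    """TIMESTAMP: microseconds since 1970-01-01 00:00:00 UTC (64 bits).

    `ts` must be timezone-aware; integer arithmetic avoids float rounding.
    """
    delta = ts - EPOCH_TS
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


utc_plus_one = timezone(timedelta(hours=1))
print(date_to_days(date(1970, 1, 2)))                                           # 1
print(timestamp_to_micros(datetime(1970, 1, 1, 0, 0, 1, tzinfo=timezone.utc)))  # 1000000
# The same instant written in a UTC+1 zone encodes to the same value.
print(timestamp_to_micros(datetime(1970, 1, 1, 1, 0, 1, tzinfo=utc_plus_one)))  # 1000000
```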

Support

Value     Description
------------------------------------------------------------------------
S         (Supported) Both Apache Spark and the RAPIDS Accelerator support this type fully.
(blank)   (Not Applicable) Neither Spark nor the RAPIDS Accelerator supports this type in this situation.
PS        (Partial Support) Apache Spark supports this type, but the RAPIDS Accelerator only partially supports it. An explanation of what is missing is included with the entry.
NS        (Not Supported) Apache Spark supports this type, but the RAPIDS Accelerator does not.
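When scanning the tables in this document, it can help to split a cell into its support value and its note. The following is a hypothetical helper for that, assuming cells look like "PS <note>" and that unsupported child types are listed after the phrase "unsupported child types", as they are in the tables here.

```python
import re
from dataclasses import dataclass, field
from typing import List

LEVELS = {
    "S": "Supported",
    "PS": "Partial Support",
    "NS": "Not Supported",
    "": "Not Applicable",  # a blank cell
}


@dataclass
class SupportCell:
    level: str
    note: str = ""
    unsupported_children: List[str] = field(default_factory=list)


def parse_cell(text: str) -> SupportCell:
    """Split a cell such as 'PS <note>' into its level, note and child types."""
    code, _, note = text.strip().partition(" ")
    if code not in LEVELS:
        raise ValueError(f"unknown support value: {code!r}")
    match = re.search(r"unsupported child types ([A-Z, ]+)", note)
    children = [t.strip() for t in match.group(1).split(",")] if match else []
    return SupportCell(LEVELS[code], note, children)


cell = parse_cell("PS UTC is only supported TZ for child TIMESTAMP; "
                  "unsupported child types BINARY, CALENDAR, UDT")
print(cell.level, cell.unsupported_children)
```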

SparkPlan or Executor Nodes

Apache Spark builds a query as a Directed Acyclic Graph (DAG) of processing steps. The nodes in this graph are instances of SparkPlan and represent high-level operations such as a filter or a project. The operations that the RAPIDS Accelerator supports are described below.
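Conceptually, deciding whether an operator can run on the GPU starts with looking each plan node up in a table like the ones in this section. The sketch below walks a toy plan and reports nodes missing from a given set; `PlanNode`, `cpu_fallbacks` and `SomeCustomExec` are hypothetical, and the real plugin also checks data types and expressions, which this sketch ignores.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class PlanNode:
    """Hypothetical stand-in for a SparkPlan node."""
    name: str
    children: List["PlanNode"] = field(default_factory=list)


def cpu_fallbacks(node: PlanNode, supported: Set[str]) -> List[str]:
    """Depth-first walk returning operator names that are not in `supported`."""
    missing = [] if node.name in supported else [node.name]
    for child in node.children:
        missing += cpu_fallbacks(child, supported)
    return missing


plan = PlanNode("ProjectExec", [PlanNode("FilterExec", [PlanNode("SomeCustomExec")])])
print(cpu_fallbacks(plan, {"ProjectExec", "FilterExec", "FileSourceScanExec"}))
```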

Executor

Description

Notes

Param(s)

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

CoalesceExec

The backend for the dataframe coalesce method

None

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

NS

CollectLimitExec

Reduce to single partition and apply limit

This is disabled by default because Collect Limit replacement can be slower on the GPU. If a batch contains a huge number of rows, it could help by limiting the number of rows transferred from the GPU to the CPU

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

ExpandExec

The backend for the expand operator

None

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

FileSourceScanExec

Reading data from files, often from Hive tables

None

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

NS

FilterExec

The backend for most filter statements

None

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

NS

GenerateExec

The backend for operations that generate more output rows than input rows like explode

None

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

NS

GlobalLimitExec

Limiting of results across partitions

None

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

LocalLimitExec

Per-partition limiting of results

None

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

ProjectExec

The backend for most select, withColumn and dropColumn statements

None

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

NS

RangeExec

The backend for the range operator

None

Input/Output

S

SampleExec

The backend for the sample operator

None

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

SortExec

The backend for the sort operator

None

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

NS

SubqueryBroadcastExec

Plan to collect and transform the broadcast key values

None

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

S

PS UTC is only supported TZ for child TIMESTAMP

PS UTC is only supported TZ for child TIMESTAMP

PS UTC is only supported TZ for child TIMESTAMP

S

TakeOrderedAndProjectExec

Take the first limit elements as defined by the sortOrder, and do projection if needed

None

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

UnionExec

The backend for the union operator

None

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS unionByName will not optionally impute nulls for missing struct fields when the column is a struct and there are non-overlapping fields; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

CustomShuffleReaderExec

A wrapper of shuffle query stage

None

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

NS

HashAggregateExec

The backend for hash based aggregations

None

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS not allowed for grouping expressions; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS not allowed for grouping expressions; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS not allowed for grouping expressions if containing Array or Map as child; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

ObjectHashAggregateExec

The backend for hash based aggregations supporting TypedImperativeAggregate functions

None

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

PS only allowed when aggregate buffers can be converted between CPU and GPU

NS

PS not allowed for grouping expressions; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS not allowed for grouping expressions; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS not allowed for grouping expressions if containing Array or Map as child; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

NS

SortAggregateExec

The backend for sort based aggregations

None

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

PS only allowed when aggregate buffers can be converted between CPU and GPU

NS

PS not allowed for grouping expressions; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS not allowed for grouping expressions; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS not allowed for grouping expressions if containing Array or Map as child; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

NS

InMemoryTableScanExec

Implementation of InMemoryTableScanExec to use GPU accelerated caching

None

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, BINARY, CALENDAR, UDT

NS

DataWritingCommandExec

Writing data

None

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

PS 128-bit decimal is only supported for ORC and Parquet

NS

PS Only supported for Parquet

NS

PS Only supported for Parquet; UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, CALENDAR, UDT

PS Only supported for Parquet; UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, CALENDAR, UDT

PS Only supported for Parquet; UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, CALENDAR, UDT

NS

ExecutedCommandExec

Eagerly executed commands

None

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

S

PS UTC is only supported TZ for child TIMESTAMP

PS UTC is only supported TZ for child TIMESTAMP

PS UTC is only supported TZ for child TIMESTAMP

S

BatchScanExec

The backend for most file input

None

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

NS

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, CALENDAR, UDT

NS

BroadcastExchangeExec

The backend for broadcast exchange of data

None

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT

NS

ShuffleExchangeExec

The backend for most data being exchanged between processes

None

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

NS

PS Round-robin partitioning is not supported if spark.sql.execution.sortBeforeRepartition is true; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS Round-robin partitioning is not supported if spark.sql.execution.sortBeforeRepartition is true; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS Round-robin partitioning is not supported for nested structs if spark.sql.execution.sortBeforeRepartition is true; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

NS

BroadcastHashJoinExec

Implementation of join using broadcast data

None

leftKeys

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT

NS

rightKeys

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT

NS

condition

S

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

NS

BroadcastNestedLoopJoinExec

Implementation of join using brute force. Full outer joins and joins where the broadcast side matches the join side (for example, LeftOuter with left broadcast) are not supported

None

condition (a non-inner join is only supported if the condition expression can be converted to a GPU AST expression)

S

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

NS

CartesianProductExec

Implementation of join using brute force

None

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

NS

ShuffledHashJoinExec

Implementation of join using hashed shuffled data

None

leftKeys

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT

NS

rightKeys

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT

NS

condition

S

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

NS

SortMergeJoinExec

Sort merge join, replacing with shuffled hash join

None

leftKeys

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT

NS

rightKeys

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT

NS

condition

S

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

NS

AggregateInPandasExec

The backend for an Aggregation Pandas UDF. This accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled.

None

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

NS

NS

NS

NS

NS

NS

NS

NS

ArrowEvalPythonExec

The backend for Scalar Pandas UDFs. Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled.

None

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

NS

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT

NS

FlatMapCoGroupsInPandasExec

The backend for CoGrouped Aggregation Pandas UDF. Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled.

This is disabled by default because performance is not ideal with many small groups

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

NS

NS

NS

NS

NS

NS

NS

NS

FlatMapGroupsInPandasExec

The backend for Flat Map Groups Pandas UDF. Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled.

None

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

NS

NS

NS

NS

NS

NS

NS

NS

MapInPandasExec

The backend for Map Pandas Iterator UDF. Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled.

None

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

NS

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT

NS

WindowInPandasExec

The backend for Window Aggregation Pandas UDF. Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled. For now it only supports row-based window frames.

This is disabled by default because it only supports row-based frames for now

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

NS

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT

NS

NS

NS

WindowExec

Window-operator backend

None

partitionSpec

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT

NS

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

HiveTableScanExec

Scan Exec to read Hive delimited text tables

None

Input/Output

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

NS

NS

NS

NS

NS

NS

NS

Expression and SQL Functions

Inside each node in the DAG there can be one or more trees of expressions that describe the processing that happens in that part of the plan, such as adding two numbers together or checking for null. An expression can have multiple input parameters and one output value. Expressions can also run in different contexts, and because of how the accelerator works, different contexts have different levels of support.

The most common expression context is project. In this context, values from a single input row go through the expression, and the result is used to produce a value in the same row. Be aware that even for aggregation and window operations, most of the processing is still done in the project context, either before or after the other processing happens.

Aggregation operations like count or sum can take place in the aggregation, reduction, or window context. The aggregation context is used when the operation is done while grouping the data by one or more keys. The reduction context is used when there is no group by and there is a single result for an entire column. The window context is used for window operations.
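The difference between the three contexts is mostly about the shape of the output. The plain-Python analogy below (not Spark or accelerator code) computes a sum in each context over the same rows: one value per key, one value overall, and one value per input row. The window version assumes the rows are already in the desired order, which a real window would get from its ORDER BY.

```python
from collections import defaultdict

rows = [("a", 1), ("b", 2), ("a", 3), ("b", 4)]  # (key, value)


def aggregate_sum(rows):
    """aggregation context: GROUP BY key, one result per group."""
    sums = defaultdict(int)
    for key, value in rows:
        sums[key] += value
    return dict(sums)


def reduce_sum(rows):
    """reduction context: no GROUP BY, one result for the whole column."""
    return sum(value for _, value in rows)


def window_running_sum(rows):
    """window context: one result per input row (running sum per key)."""
    running = defaultdict(int)
    out = []
    for key, value in rows:
        running[key] += value
        out.append(running[key])
    return out


print(aggregate_sum(rows))       # {'a': 4, 'b': 6}
print(reduce_sum(rows))          # 10
print(window_running_sum(rows))  # [1, 2, 4, 6]
```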

The final expression context is AST, or Abstract Syntax Tree. Before explaining AST, it helps to explain in detail how project context operations work. Generally, for a project context operation, the plan Spark developed is read on the CPU and an appropriate set of GPU kernels is selected to do those operations. For example, a >= b + 1 would result in calling a GPU kernel to add 1 to b, followed by another kernel that compares a to that result. The interpretation happens on the CPU, and the GPU does the processing.

For AST, the interpretation cannot happen on the CPU and instead must be done in the GPU kernel itself. An example of this is conditional joins. If you want to join on A.a >= B.b + 1, where A and B are separate tables or data frames, the + and >= operations cannot run as separate independent kernels because they operate on combinations of rows from both A and B. Instead, part of the plan that Spark developed is turned into an abstract syntax tree and sent to the GPU, where it is interpreted. The number and types of operations supported in this context are limited.
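The contrast between the two styles can be sketched on the CPU in plain Python. In the project style each operator is one pass over whole columns (standing in for one kernel launch); in the AST style the expression is a small tree interpreted for every (left row, right row) pair. The tuple-based tree format and all names here are hypothetical and are not the cuDF or accelerator AST API; on the GPU the interpretation happens inside the kernel.

```python
import operator

# Project style: one whole-column pass per operator.
def project_ge_plus_one(a_col, b_col):
    b_plus_1 = [b + 1 for b in b_col]                # "kernel" 1: b + 1
    return [a >= t for a, t in zip(a_col, b_plus_1)]  # "kernel" 2: a >= result

# AST style: A.a >= B.b + 1 as a tree, evaluated per pair of rows.
A_GE_B_PLUS_1 = (">=", ("col", "left", "a"), ("+", ("col", "right", "b"), ("lit", 1)))
OPS = {">=": operator.ge, "+": operator.add}

def eval_ast(node, left, right):
    kind = node[0]
    if kind == "lit":
        return node[1]
    if kind == "col":
        row = left if node[1] == "left" else right
        return row[node[2]]
    return OPS[kind](eval_ast(node[1], left, right), eval_ast(node[2], left, right))

def conditional_join(left_rows, right_rows, condition):
    """Brute-force join keeping the pairs for which the condition holds."""
    return [(l, r) for l in left_rows for r in right_rows if eval_ast(condition, l, r)]

left_rows = [{"a": 3}, {"a": 1}]
right_rows = [{"b": 2}, {"b": 0}]
print(project_ge_plus_one([3, 1], [2, 2]))  # [True, False]
print(conditional_join(left_rows, right_rows, A_GE_B_PLUS_1))
```

The project version only works because a and b line up row by row; the join needs the tree because every left row must be paired with every right row before the comparison can be made.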

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

Abs

abs

Absolute value

None

project

input

S

S

S

S

S

S

S

result

S

S

S

S

S

S

S

AST

input

NS

NS

S

S

S

S

NS

result

NS

NS

S

S

S

S

NS

Acos

acos

Inverse cosine

None

project

input

S

result

S

AST

input

S

result

S

Acosh

acosh

Inverse hyperbolic cosine

None

project

input

S

result

S

AST

input

S

result

S

Add

+

Addition

None

project

lhs

S

S

S

S

S

S

S

NS

rhs

S

S

S

S

S

S

S

NS

result

S

S

S

S

S

S

S

NS

AST

lhs

NS

NS

S

S

S

S

NS

NS

rhs

NS

NS

S

S

S

S

NS

NS

result

NS

NS

S

S

S

S

NS

NS

Alias

Gives a column a name

None

project

input

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

NS

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

NS

AST

input

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

NS

NS

NS

NS

NS

NS

NS

NS

NS

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

NS

NS

NS

NS

NS

NS

NS

NS

NS

And

and

Logical AND

None

project

lhs

S

rhs

S

result

S

AST

lhs

S

rhs

S

result

S

ArrayContains

array_contains

Returns a boolean indicating whether the array contains the passed-in key

None

project

array

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT

key

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

NS

NS

NS

NS

NS

NS

NS

NS

result

S

ArrayExcept

array_except

Returns an array of the elements in array1 but not in array2, without duplicates

This is not 100% compatible with the Spark version because the GPU implementation treats -0.0 and 0.0 as equal, but the CPU implementation currently does not (see SPARK-39845). Also, Apache Spark 3.1.3 fixed issue SPARK-36741, where NaNs in these set-like operators were not treated as equal. We have chosen to break compatibility with older versions of Spark in this instance and handle NaNs the same way as 3.1.3+

project

array1

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT

array2

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT

result

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT

ArrayExists

exists

Returns true if any element satisfies the predicate LambdaFunction

None

project

argument

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

function

S

result

S

ArrayIntersect

array_intersect

Returns an array of the elements in the intersection of array1 and array2, without duplicates

This is not 100% compatible with the Spark version because the GPU implementation treats -0.0 and 0.0 as equal, but the CPU implementation currently does not (see SPARK-39845). Also, Apache Spark 3.1.3 fixed issue SPARK-36741, where NaNs in these set-like operators were not treated as equal. We have chosen to break compatibility with older versions of Spark in this instance and handle NaNs the same way as 3.1.3+

project

array1

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT

array2

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT

result

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT

ArrayMax

array_max

Returns the maximum value in the array

None

project

input

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, STRUCT, UDT

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

NS

NS

ArrayMin

array_min

Returns the minimum value in the array

None

project

input

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, STRUCT, UDT

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

NS

NS

ArrayRemove

array_remove

Returns the array after removing all elements that are equal to the input element (right) from the input array (left)

None

project

array

NS

NS

NS

NS

NS

NS

NS

NS

NS

NS

NS

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

NS

NS

element

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

result

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

ArrayRepeat

array_repeat

Returns the array containing the given input value (left) count (right) times

None

project

left

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

right

S

S

S

S

result

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT
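The Python sketch below spells out the null handling of array_remove and array_repeat as we understand Spark's CPU behavior:

- For array_remove, a null array or a null element makes the result null, and null entries inside the array are kept.
- For array_repeat, a null count gives null, and a count of zero or less gives an empty array.

This is an assumption-labeled model, not the plugin's code. It does not attempt the -0.0/NaN comparisons discussed for the set-like operators.

```python
from typing import Any, List, Optional


def array_remove(array: Optional[List[Any]], element: Any) -> Optional[List[Any]]:
    """Drop every entry equal to element; a null array or null element gives null."""
    if array is None or element is None:
        return None
    # Null entries never compare equal to a non-null element, so they are kept.
    return [v for v in array if v is None or v != element]


def array_repeat(value: Any, count: Optional[int]) -> Optional[List[Any]]:
    """Repeat value count times; a null count gives null, count <= 0 gives []."""
    if count is None:
        return None
    return [value] * max(count, 0)
```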

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

ArrayTransform

transform

Transforms elements in an array using the transform function. This is similar to map in functional programming

None

project

argument

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

function

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

result

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

ArrayUnion

array_union

Returns an array of the elements in the union of array1 and array2, without duplicates.

This is not 100% compatible with the Spark version because the GPU implementation treats -0.0 and 0.0 as equal, but the CPU implementation currently does not (see SPARK-39845). Also, Apache Spark 3.1.3 fixed issue SPARK-36741, where NaNs in these set-like operators were not treated as equal. We have chosen to break compatibility with older versions of Spark in this instance and handle NaNs the same way as Spark 3.1.3+.

project

array1

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT

array2

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT

result

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT

ArraysOverlap

arrays_overlap

Returns true if a1 contains at least one non-null element that is also present in a2. If the arrays have no common element, both are non-empty, and either of them contains a null element, null is returned; otherwise, false is returned.

This is not 100% compatible with the Spark version because the GPU implementation treats -0.0 and 0.0 as equal, but the CPU implementation currently does not (see SPARK-39845). Also, Apache Spark 3.1.3 fixed issue SPARK-36741, where NaNs in these set-like operators were not treated as equal. We have chosen to break compatibility with older versions of Spark in this instance and handle NaNs the same way as Spark 3.1.3+.

project

array1

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT

array2

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT

result

S

ArraysZip

arrays_zip

Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays.

None

project

children

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

result

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

Asin

asin

Inverse sine

None

project

input

S

result

S

AST

input

S

result

S
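The three-valued result of arrays_overlap is easiest to see in code. The Python sketch below follows the arrays_overlap description in the table. Two simplifications apply:

- Returning null when either array argument is null is our assumption about Spark's behavior.
- Plain Python equality is used, so the -0.0/NaN rules from the compatibility note are not modeled.

```python
from typing import Any, List, Optional


def arrays_overlap(a1: Optional[List[Any]], a2: Optional[List[Any]]) -> Optional[bool]:
    """True / False / None (SQL null) following the arrays_overlap description."""
    if a1 is None or a2 is None:
        return None  # assumption: a null array argument yields null
    non_null_2 = {v for v in a2 if v is not None}
    if any(v is not None and v in non_null_2 for v in a1):
        return True  # a common non-null element decides the answer
    both_non_empty = len(a1) > 0 and len(a2) > 0
    has_null = any(v is None for v in a1) or any(v is None for v in a2)
    if both_non_empty and has_null:
        return None  # a null might have matched, so the answer is unknown
    return False
```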

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

Asinh

asinh

Inverse hyperbolic sine

None

project

input

S

result

S

AST

input

S

result

S

AtLeastNNonNulls

Checks if the number of non-null, non-NaN values is at least a given value

None

project

input

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

NS

result

S

Atan

atan

Inverse tangent

None

project

input

S

result

S

AST

input

S

result

S

Atanh

atanh

Inverse hyperbolic tangent

None

project

input

S

result

S

AST

input

S

result

S

AttributeReference

References an input column

None

project

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

NS

AST

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

NS

NS

NS

NS

NS

NS

NS

NS

NS
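AtLeastNNonNulls has no SQL function name. In Spark, it is the predicate behind the DataFrame na.drop (dropna) call with a minimum non-null threshold. The minimal Python sketch below rests on two assumptions:

- NaN is excluded only for floating-point values.
- The comparison is inclusive, as the name AtLeastNNonNulls suggests.

```python
import math
from typing import Any, Sequence


def at_least_n_non_nulls(n: int, values: Sequence[Any]) -> bool:
    """True when at least n of the values are neither null nor floating-point NaN."""
    count = 0
    for v in values:
        if v is None:
            continue
        if isinstance(v, float) and math.isnan(v):
            continue  # NaN is not counted, just like null
        count += 1
    return count >= n
```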

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

BRound

bround

Round an expression to d decimal places using HALF_EVEN rounding mode

None

project

value

S

S

S

S

PS result may round slightly differently

PS result may round slightly differently

S

scale

S

result

S

S

S

S

S

S

S

BitLength

bit_length

The bit length of string data

None

project

input

S

NS

result

S

BitwiseAnd

&

Returns the bitwise AND of the operands

None

project

lhs

S

S

S

S

rhs

S

S

S

S

result

S

S

S

S

AST

lhs

NS

NS

S

S

rhs

NS

NS

S

S

result

NS

NS

S

S

BitwiseNot

~

Returns the bitwise NOT of the operand

None

project

input

S

S

S

S

result

S

S

S

S

AST

input

NS

NS

S

S

result

NS

NS

S

S
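bround uses HALF_EVEN ("banker's") rounding: a value exactly halfway between two candidates rounds to the one with an even last digit. Spark's round uses HALF_UP instead. The sketch below uses Python's decimal module so that halfway cases are represented exactly, and it is illustrative only. The table marks FLOAT and DOUBLE inputs as partially supported because the result may round slightly differently.

```python
from decimal import ROUND_HALF_EVEN, ROUND_HALF_UP, Decimal


def bround(value: Decimal, d: int) -> Decimal:
    """Round to d decimal places with HALF_EVEN; a negative d rounds left of the point."""
    exponent = Decimal(1).scaleb(-d)  # d=2 -> 0.01, d=-1 -> 1E+1
    return value.quantize(exponent, rounding=ROUND_HALF_EVEN)


def round_half_up(value: Decimal, d: int) -> Decimal:
    """For comparison: HALF_UP, the mode used by Spark's round."""
    return value.quantize(Decimal(1).scaleb(-d), rounding=ROUND_HALF_UP)
```

With these definitions, bround(Decimal("2.5"), 0) gives 2 while round_half_up gives 3, and bround(Decimal("25"), -1) gives 20.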

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

BitwiseOr

|

Returns the bitwise OR of the operands

None

project

lhs

S

S

S

S

rhs

S

S

S

S

result

S

S

S

S

AST

lhs

NS

NS

S

S

rhs

NS

NS

S

S

result

NS

NS

S

S

BitwiseXor

^

Returns the bitwise XOR of the operands

None

project

lhs

S

S

S

S

rhs

S

S

S

S

result

S

S

S

S

AST

lhs

NS

NS

S

S

rhs

NS

NS

S

S

result

NS

NS

S

S

CaseWhen

when

CASE WHEN expression

None

project

predicate

S

value

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

NS

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

NS
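CASE WHEN returns the value of the first branch whose predicate is true. A null predicate is treated as not true, and with no ELSE clause the result is null. The minimal Python sketch below shows that selection rule. Branch values are passed in pre-computed here, whereas Spark only evaluates the value of the branch it selects.

```python
from typing import Any, Optional, Sequence, Tuple


def case_when(branches: Sequence[Tuple[Optional[bool], Any]], else_value: Any = None) -> Any:
    """Value of the first branch whose predicate is True; None (SQL null) is not True."""
    for predicate, value in branches:
        if predicate is True:
            return value
    return else_value  # None models a missing ELSE clause
```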

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

Cbrt

cbrt

Cube root

None

project

input

S

result

S

AST

input

S

result

S

Ceil

ceiling, ceil

Ceiling of a number

None

project

input

S

S

S

result

S

S

S

CheckOverflow

Checks for overflow after arithmetic operations on DecimalType data

None

project

input

S

result

S

Coalesce

coalesce

Returns the first non-null argument if one exists; otherwise, null

None

project

param

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

NS

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

NS

Concat

concat

Concatenates strings or arrays

None

project

input

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

result

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

ConcatWs

concat_ws

Concatenates multiple input strings or array of strings into a single string using a given separator

None

project

input

S

S

result

S

Contains

Contains

None

project

src

S

search

PS Literal value only

result

S
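concat and concat_ws differ in how they treat nulls:

- concat returns null if any input is null.
- concat_ws skips null strings, including nulls inside array arguments, and returns null only when the separator itself is null.

The Python sketch below models the string forms only; array concatenation via concat is not shown. The function names mirror the SQL functions but are our own.

```python
from typing import List, Optional, Union


def concat(*inputs: Optional[str]) -> Optional[str]:
    """String concat: any null input makes the result null."""
    if any(s is None for s in inputs):
        return None
    return "".join(inputs)


def concat_ws(sep: Optional[str], *inputs: Union[None, str, List[Optional[str]]]) -> Optional[str]:
    """Join strings and arrays of strings with sep, skipping nulls; a null separator gives null."""
    if sep is None:
        return None
    parts: List[str] = []
    for item in inputs:
        if item is None:
            continue  # null arguments are skipped
        if isinstance(item, list):
            parts.extend(s for s in item if s is not None)  # nulls inside arrays are skipped too
        else:
            parts.append(item)
    return sep.join(parts)
```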

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

Cos

cos

Cosine

None

project

input

S

result

S

AST

input

S

result

S

Cosh

cosh

Hyperbolic cosine

None

project

input

S

result

S

AST

input

S

result

S

Cot

cot

Cotangent

None

project

input

S

result

S

AST

input

S

result

S

CreateArray

array

Returns an array with the given elements

None

project

arg

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, MAP, UDT

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, MAP, UDT

NS

result

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, MAP, UDT

CreateMap

map

Create a map

None

project

key

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

PS UTC is only supported TZ for child TIMESTAMP

PS UTC is only supported TZ for child TIMESTAMP

value

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

PS UTC is only supported TZ for child TIMESTAMP

PS UTC is only supported TZ for child TIMESTAMP

PS UTC is only supported TZ for child TIMESTAMP

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

CreateNamedStruct

named_struct, struct

Creates a struct with the given field names and values

None

project

name

S

value

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

NS

result

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

CurrentRow$

Special boundary for a window frame, indicating stopping at the current row

None

project

result

S

DateAdd

date_add

Returns the date that is num_days after start_date

None

project

startDate

S

days

S

S

S

result

S

DateAddInterval

Adds interval to date

None

project

start

S

interval

PS month intervals are not supported; Literal value only

result

S

DateDiff

datediff

Returns the number of days from startDate to endDate

None

project

lhs

S

rhs

S

result

S

DateFormatClass

date_format

Converts a timestamp to a string in the format specified by the date format

None

project

timestamp

PS UTC is only supported TZ for TIMESTAMP

strfmt

PS A limited number of formats are supported; Literal value only

result

S
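date_add and datediff are plain day arithmetic on calendar dates. The Python sketch below uses the standard datetime module. Spark's datediff takes the end date first, datediff(endDate, startDate), and the sketch keeps that argument order.

```python
from datetime import date, timedelta


def date_add(start_date: date, num_days: int) -> date:
    """The date num_days after start_date (negative num_days moves backwards)."""
    return start_date + timedelta(days=num_days)


def datediff(end_date: date, start_date: date) -> int:
    """Number of days from start_date to end_date; negative if end_date is earlier."""
    return (end_date - start_date).days
```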

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

DateSub

date_sub

Returns the date that is num_days before start_date

None

project

startDate

S

days

S

S

S

result

S

DayOfMonth

dayofmonth, day

Returns the day of the month from a date or timestamp

None

project

input

S

result

S

DayOfWeek

dayofweek

Returns the day of the week (1 = Sunday, ..., 7 = Saturday)

None

project

input

S

result

S

DayOfYear

dayofyear

Returns the day of the year from a date or timestamp

None

project

input

S

result

S

DenseRank

dense_rank

Window function that returns the dense rank value within the aggregation window

None

window

ordering

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

NS

NS

NS

result

S

Divide

/

Division

None

project

lhs

S

S

rhs

S

S

result

S

S

ElementAt

element_at

Returns the element of an array at the given (1-based) index if the column is an array, or the value for the given key if the column is a map.

None

project

array/map

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS If it is a map, only primitive key types are supported; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

index/key

PS Unsupported as array index.

PS Unsupported as array index.

PS Unsupported as array index.

S

PS Unsupported as array index.

PS Unsupported as array index.

PS Unsupported as array index.

PS Unsupported as array index.

PS Unsupported as array index.; UTC is only supported TZ for TIMESTAMP

PS Unsupported as array index.

PS Unsupported as array index.

NS

NS

NS

NS

NS

NS

NS

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

NS
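Two conventions in this group are easy to get wrong:

- dayofweek numbers days from 1 = Sunday to 7 = Saturday.
- element_at on arrays is 1-based, and negative indexes count from the end.

The Python sketch below assumes non-ANSI mode (spark.sql.ansi.enabled=false), where an out-of-range index returns null. It models index 0 as an error and covers arrays only, not map lookups.

```python
from datetime import date
from typing import Any, List, Optional


def dayofweek(d: date) -> int:
    """Spark numbering: 1 = Sunday ... 7 = Saturday."""
    # date.isoweekday() uses 1 = Monday ... 7 = Sunday, so shift by one day.
    return d.isoweekday() % 7 + 1


def element_at(array: Optional[List[Any]], index: int) -> Any:
    """1-based lookup; negative indexes count from the end; out of range gives None."""
    if array is None:
        return None
    if index == 0:
        raise ValueError("SQL array indices start at 1")
    position = index - 1 if index > 0 else len(array) + index
    if position < 0 or position >= len(array):
        return None
    return array[position]
```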

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

EndsWith

Ends with

None

project

src

S

search

PS Literal value only

result

S

EqualNullSafe

<=>

Checks if the values are equal, including nulls: two nulls are equal, and a null is never equal to a non-null value (<=>)

None

project

lhs

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT

NS

rhs

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT

NS

result

S

EqualTo

=, ==

Check if the values are equal

None

project

lhs

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT

NS

rhs

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT

NS

result

S

AST

lhs

S

S

S

S

S

NS

NS

S

PS UTC is only supported TZ for TIMESTAMP

NS

NS

NS

NS

NS

NS

NS

NS

rhs

S

S

S

S

S

NS

NS

S

PS UTC is only supported TZ for TIMESTAMP

NS

NS

NS

NS

NS

NS

NS

NS

result

S

Exp

exp

Euler’s number e raised to a power

None

project

input

S

result

S

AST

input

S

result

S
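EqualTo (=) and EqualNullSafe (<=>) differ only in null handling. The Python sketch below uses None for SQL null. It does not model Spark's floating-point rule that NaN = NaN is true.

```python
from typing import Any, Optional


def equal_to(lhs: Any, rhs: Any) -> Optional[bool]:
    """SQL '=': null if either side is null."""
    if lhs is None or rhs is None:
        return None
    return lhs == rhs


def equal_null_safe(lhs: Any, rhs: Any) -> bool:
    """SQL '<=>': never null; two nulls are equal, one null is unequal."""
    if lhs is None and rhs is None:
        return True
    if lhs is None or rhs is None:
        return False
    return lhs == rhs
```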

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

Explode

explode, explode_outer

Given an input array, produces a sequence of rows, one for each value in the array

None

project

input

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

result

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

Expm1

expm1

Euler’s number e raised to a power minus 1

None

project

input

S

result

S

AST

input

S

result

S

Floor

floor

Floor of a number

None

project

input

S

S

S

result

S

S

S

FromUTCTimestamp

from_utc_timestamp

Render the input UTC timestamp in the input timezone

None

project

timestamp

PS UTC is only supported TZ for TIMESTAMP

timezone

PS Only timezones equivalent to UTC are supported

result

PS UTC is only supported TZ for TIMESTAMP

FromUnixTime

from_unixtime

Get the string from a unix timestamp

None

project

sec

S

format

PS Only a limited number of formats are supported; Literal value only

result

S

GetArrayItem

Gets the field at ordinal in the Array

None

project

array

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

ordinal

S

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

NS
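explode and explode_outer differ only for rows whose array is null or empty. explode drops such rows, while explode_outer keeps them with a null element. The Python sketch below models this for arrays over (key, array) pairs. The row shape is a hypothetical simplification of a DataFrame row.

```python
from typing import Any, Iterator, List, Optional, Tuple


def explode(rows: List[Tuple[Any, Optional[List[Any]]]], outer: bool = False) -> Iterator[Tuple[Any, Any]]:
    """Yield (key, element) per array element; outer=True models explode_outer."""
    for key, values in rows:
        if values:  # non-null and non-empty
            for v in values:
                yield key, v
        elif outer:
            yield key, None  # explode_outer keeps the row with a null element
```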

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

GetArrayStructFields

Extracts the ordinal-th fields of all array elements for the data with the type of array of struct

None

project

input

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

result

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

GetJsonObject

get_json_object

Extracts a JSON object from a path

None

project

json

S

path

PS Literal value only

result

S

GetMapValue

Gets a value from a map based on a key

None

project

map

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

key

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

NS

NS

NS

NS

NS

NS

NS

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

NS

GetStructField

Gets the named field of the struct

None

project

input

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

NS

GetTimestamp

Gets timestamps from strings using given pattern.

None

project

timeExp

S

PS UTC is only supported TZ for TIMESTAMP

S

format

PS A limited number of formats are supported; Literal value only

result

PS UTC is only supported TZ for TIMESTAMP

GreaterThan

>

> operator

None

project

lhs

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT

NS

rhs

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT

NS

result

S

AST

lhs

S

S

S

S

S

NS

NS

S

PS UTC is only supported TZ for TIMESTAMP

NS

NS

NS

NS

NS

NS

NS

NS

rhs

S

S

S

S

S

NS

NS

S

PS UTC is only supported TZ for TIMESTAMP

NS

NS

NS

NS

NS

NS

NS

NS

result

S
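get_json_object takes a JSONPath-like path such as $.a.b. The Python sketch below handles only dotted object keys. Spark's path syntax also supports array subscripts and wildcards, which are omitted here. The sketch's output behavior reflects our understanding of Spark's CPU behavior and is not guaranteed to match it character for character:

- Malformed JSON or a missing path returns null.
- Strings are returned unquoted.
- Objects and arrays are returned as compact JSON text.

```python
import json
from typing import Any, Optional


def get_json_object(json_str: Optional[str], path: str) -> Optional[str]:
    """Follow a '$.a.b'-style path; scalars come back as text, objects/arrays as compact JSON."""
    if json_str is None or not path.startswith("$"):
        return None
    try:
        node: Any = json.loads(json_str)
    except json.JSONDecodeError:
        return None  # malformed JSON yields null
    for key in filter(None, path[1:].split(".")):
        if not isinstance(node, dict) or key not in node:
            return None  # missing path yields null
        node = node[key]
    if node is None:
        return None
    if isinstance(node, str):
        return node
    return json.dumps(node, separators=(",", ":"))
```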

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

GreaterThanOrEqual

>=

>= operator

None

project

lhs

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT

NS

rhs

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT

NS

result

S

AST

lhs

S

S

S

S

S

NS

NS

S

PS UTC is only supported TZ for TIMESTAMP

NS

NS

NS

NS

NS

NS

NS

NS

rhs

S

S

S

S

S

NS

NS

S

PS UTC is only supported TZ for TIMESTAMP

NS

NS

NS

NS

NS

NS

NS

NS

result

S

Greatest

greatest

Returns the greatest value of all parameters, skipping null values

None

project

param

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

NS

NS

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

NS

NS

Hour

hour

Returns the hour component of the string/timestamp

None

project

input

PS UTC is only supported TZ for TIMESTAMP

result

S

Hypot

hypot

Pythagorean addition (Hypotenuse) of real numbers

None

project

lhs

S

rhs

S

result

S

If

if

IF expression

None

project

predicate

S

trueValue

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

NS

falseValue

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

NS

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

NS
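greatest skips null arguments and returns null only when every argument is null. The Python sketch below also assumes Spark's ordering, in which NaN is larger than any other floating-point value, so a NaN argument wins.

```python
import math
from typing import Any, Optional


def _order_key(v: Any):
    # Assumption: Spark's ordering places NaN above every other number.
    is_nan = isinstance(v, float) and math.isnan(v)
    return (is_nan, 0.0 if is_nan else v)


def greatest(*params: Any) -> Optional[Any]:
    """Largest non-null argument; None (SQL null) only if every argument is null."""
    candidates = [p for p in params if p is not None]
    if not candidates:
        return None
    return max(candidates, key=_order_key)
```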

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

In

in

IN operator

None

project

value

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

NS

NS

list

PS Literal value only

PS Literal value only

PS Literal value only

PS Literal value only

PS Literal value only

PS Literal value only

PS Literal value only

PS Literal value only

PS UTC is only supported TZ for TIMESTAMP; Literal value only

PS Literal value only

PS Literal value only

NS

NS

NS

NS

NS

NS

result

S

InSet

INSET operator

None

project

input

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

NS

NS

result

S

InitCap

initcap

Returns str with the first letter of each word in uppercase. All other letters are in lowercase

This is not 100% compatible with the Spark version because the Unicode version used by cuDF and the JVM may differ, resulting in some corner-case characters not changing case correctly.

project

input

S

result

S

InputFileBlockLength

input_file_block_length

Returns the length of the block being read, or -1 if not available

None

project

result

S

InputFileBlockStart

input_file_block_start

Returns the start offset of the block being read, or -1 if not available

None

project

result

S

InputFileName

input_file_name

Returns the name of the file being read, or empty string if not available

None

project

result

S

IntegralDivide

div

Division with an integer result

None

project

lhs

S

S

rhs

S

S

result

S

IsNaN

isnan

Checks if a value is NaN

None

project

input

S

S

result

S
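div truncates the quotient toward zero, like integer division in Java. This differs from Python's floor-dividing // operator for negative operands. The Python sketch below assumes non-ANSI mode, where division by zero returns null.

```python
from typing import Optional


def integral_divide(lhs: Optional[int], rhs: Optional[int]) -> Optional[int]:
    """SQL 'div': quotient truncated toward zero; None on null input or division by zero."""
    if lhs is None or rhs is None or rhs == 0:
        return None
    quotient = abs(lhs) // abs(rhs)
    # Restore the sign afterwards so the result truncates rather than floors.
    return quotient if (lhs < 0) == (rhs < 0) else -quotient
```

For example, integral_divide(-7, 2) returns -3, whereas -7 // 2 in Python evaluates to -4.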

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

IsNotNull

isnotnull

Checks if a value is not null

None

project

input

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

NS

result

S

IsNull

isnull

Checks if a value is null

None

project

input

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

NS

result

S

JsonToStructs

from_json

Returns a struct value with the given jsonStr and schema

This is disabled by default because parsing JSON from a column has a large number of issues and should be considered beta quality right now.

project

jsonStr

S

result

NS

PS unsupported child types BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP, DECIMAL, NULL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT

NS

JsonTuple

json_tuple

Returns a tuple like the function get_json_object, but it takes multiple names. All the input parameters and output column types are string.

None

project

json

S

field

PS Literal value only

result

S

KnownFloatingPointNormalized

Tag to prevent redundant normalization

None

project

input

S

S

result

S

S

KnownNotNull

Tag an expression as known to not be null

None

project

input

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

NS

S

S

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, UDT

NS

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

NS

S

S

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, UDT

NS

Lag

lag

Window function that returns N entries behind this one

None

window

input

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT

NS

offset

S

default

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT

NS

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT

NS
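lag looks offset rows back within the window's ordered partition and falls back to default when that row does not exist. lead is the mirror image. The Python sketch below models a single partition that is already sorted by the window's ORDER BY clause. It is an illustration, not the plugin's windowing code.

```python
from typing import Any, List


def lag(values: List[Any], offset: int = 1, default: Any = None) -> List[Any]:
    """For each row, the value offset rows earlier in the partition, else default."""
    out = []
    for i in range(len(values)):
        j = i - offset
        out.append(values[j] if 0 <= j < len(values) else default)
    return out
```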

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

LambdaFunction

Holds a higher-order SQL function

None

project

function

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

arguments

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

LastDay

last_day

Returns the last day of the month which the date belongs to

None

project

input

S

result

S

Lead

lead

Window function that returns N entries ahead of this one

None

window

input

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT

NS

offset

S

default

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT

NS

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT

NS

Least

least

Returns the least value of all parameters, skipping null values

None

project

param

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

NS

NS

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

NS

NS

Length

length, character_length, char_length

String character length or binary byte length

None

project

input

S

NS

result

S

LessThan

<

< operator

None

project

lhs

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT

NS

rhs

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT

NS

result

S

AST

lhs

S

S

S

S

S

NS

NS

S

PS UTC is only supported TZ for TIMESTAMP

NS

NS

NS

NS

NS

NS

NS

NS

rhs

S

S

S

S

S

NS

NS

S

PS UTC is only supported TZ for TIMESTAMP

NS

NS

NS

NS

NS

NS

NS

NS

result

S

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

LessThanOrEqual

<=

<= operator

None

project

lhs

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT

NS

rhs

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT

NS

result

S

AST

lhs

S

S

S

S

S

NS

NS

S

PS UTC is only supported TZ for TIMESTAMP

NS

NS

NS

NS

NS

NS

NS

NS

rhs

S

S

S

S

S

NS

NS

S

PS UTC is only supported TZ for TIMESTAMP

NS

NS

NS

NS

NS

NS

NS

NS

result

S

Like

like

Returns true if the string matches the SQL LIKE pattern

None

project

src

S

search

PS Literal value only

result

S

Literal

Holds a static value from the query

None

project

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

S

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

NS

AST

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

NS

NS

NS

NS

NS

NS

NS

NS

NS

Log

ln

Natural log

None

project

input

S

result

S

Log10

log10

Log base 10

None

project

input

S

result

S

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

Log1p

log1p

Natural log of 1 + expr

None

project

input

S

result

S

Log2

log2

Log base 2

None

project

input

S

result

S

Logarithm

log

Log variable base

None

project

value

S

base

S

result

S

Lower

lower, lcase

String lowercase operator

This is not 100% compatible with the Spark version because the Unicode version used by cuDF and the JVM may differ, resulting in some corner-case characters not changing case correctly.

project

input

S

result

S

MakeDecimal

Create a Decimal from an unscaled long value for some aggregation optimizations

None

project

input

S

result

PS max DECIMAL precision of 18

MapConcat

map_concat

Returns the union of all the given maps

None

project

input

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

result

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

MapEntries

map_entries

Returns an unordered array of all entries in the given map

None

project

input

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

result

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

MapFilter

map_filter

Filters entries in a map using the function

None

project

argument

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

function

S

result

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

MapKeys

map_keys

Returns an unordered array containing the keys of the map

None

project

input

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

result

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

MapValues

map_values

Returns an unordered array containing the values of the map

None

project

input

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

result

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

Md5

md5

MD5 hash operator

None

project

input

S

result

S

Minute

minute

Returns the minute component of the string/timestamp

None

project

input

PS UTC is only supported TZ for TIMESTAMP

result

S

MonotonicallyIncreasingID

monotonically_increasing_id

Returns monotonically increasing 64-bit integers

None

project

result

S

Month

month

Returns the month from a date or timestamp

None

project

input

S

result

S

Multiply

*

Multiplication

None

project

lhs

S

S

S

S

S

S

S

rhs

S

S

S

S

S

S

S

result

S

S

S

S

S

S

S

AST

lhs

NS

NS

S

S

S

S

NS

rhs

NS

NS

S

S

S

S

NS

result

NS

NS

S

S

S

S

NS

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

Murmur3Hash

hash

Murmur3 hash operator

None

project

input

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT

NS

result

S

NaNvl

nanvl

Evaluates to left iff left is not NaN, right otherwise

None

project

lhs

S

S

rhs

S

S

result

S

S

NamedLambdaVariable

A parameter to a higher order SQL function

None

project

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

Not

!, not

Boolean not operator

None

project

input

S

result

S

AST

input

S

result

S

NthValue

nth_value

Window function that returns the value at the offset-th row from the beginning of the window frame

None

window

input

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

offset

S

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

OctetLength

octet_length

The byte length of string data

None

project

input

S

NS

result

S

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

Or

or

Logical OR

None

project

lhs

S

rhs

S

result

S

AST

lhs

S

rhs

S

result

S

PercentRank

percent_rank

Window function that returns the percent rank value within the aggregation window

None

window

ordering

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

NS

NS

NS

result

S

Pmod

pmod

Positive modulo: returns the positive value of lhs mod rhs

None

project

lhs

S

S

S

S

S

S

PS decimals with precision 38 are not supported; max DECIMAL precision of 18

rhs

S

S

S

S

S

S

NS

result

S

S

S

S

S

S

NS

PosExplode

posexplode_outer, posexplode

Given an input array, produces a sequence of rows for each value in the array, along with each value's position

None

project

input

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

result

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

Pow

pow, power

lhs ^ rhs

None

project

lhs

S

rhs

S

result

S

AST

lhs

S

rhs

S

result

S

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

PreciseTimestampConversion

Expression used internally to convert the TimestampType to Long and back without losing precision, i.e. in microseconds. Used in time windowing

None

project

input

S

PS UTC is only supported TZ for TIMESTAMP

result

S

PS UTC is only supported TZ for TIMESTAMP

PromotePrecision

PromotePrecision before arithmetic operations between DecimalType data

None

project

input

S

result

S

PythonUDF

UDF run in an external Python process. Does not actually run on the GPU, but the transfer of data to/from it can be accelerated

None

aggregation

param

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

NS

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT

NS

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, MAP

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, MAP

reduction

param

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

NS

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT

NS

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, MAP

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, MAP

window

param

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

NS

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT

NS

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, MAP

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, MAP

project

param

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

NS

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT

NS

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, MAP

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, MAP

Quarter

quarter

Returns the quarter of the year for date, in the range 1 to 4

None

project

input

S

result

S

RLike

rlike

Regular expression version of Like

None

project

str

S

regexp

PS Literal value only

result

S

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

RaiseError

raise_error

Throw an exception

None

project

input

S

result

S

Rand

random, rand

Generate a random column with i.i.d. uniformly distributed values in [0, 1)

None

project

seed

S

S

result

S

Rank

rank

Window function that returns the rank value within the aggregation window

None

window

ordering

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

NS

NS

NS

result

S

RegExpExtract

regexp_extract

Extract a specific group identified by a regular expression

None

project

str

S

regexp

PS Literal value only

idx

S

result

S

RegExpExtractAll

regexp_extract_all

Extract all strings matching a regular expression corresponding to the regex group index

None

project

str

S

regexp

PS Literal value only

idx

PS Literal value only

result

S

RegExpReplace

regexp_replace

String replace using a regular expression pattern

None

project

regex

PS Literal value only

result

S

pos

PS only a value of 1 is supported

str

S

rep

PS Literal value only

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

Remainder

%, mod

Remainder or modulo

None

project

lhs

S

S

S

S

S

S

S

rhs

S

S

S

S

S

S

S

result

S

S

S

S

S

S

S

ReplicateRows

Given an input row, replicates the row N times

None

project

input

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT

NS

result

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT

Reverse

reverse

Returns a reversed string or an array with reverse order of elements

None

project

input

S

PS UTC is only supported TZ for child TIMESTAMP

result

S

PS UTC is only supported TZ for child TIMESTAMP

Rint

rint

Rounds a double value to the nearest double that is equal to a mathematical integer

None

project

input

S

result

S

AST

input

S

result

S

Round

round

Round an expression to d decimal places using HALF_UP rounding mode

None

project

value

S

S

S

S

PS result may round slightly differently

PS result may round slightly differently

S

scale

S

result

S

S

S

S

S

S

S

RowNumber

row_number

Window function that returns the index for the row within the aggregation window

None

window

result

S

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

ScalaUDF

User Defined Function; the UDF can choose to implement a RAPIDS accelerated interface to get better performance.

None

project

param

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

S

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT

NS

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

S

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT

NS

Second

second

Returns the second component of the string/timestamp

None

project

input

PS UTC is only supported TZ for TIMESTAMP

result

S

Sequence

sequence

Generates an array of elements from start to stop (inclusive), incrementing by step

None

project

start

S

S

S

S

NS

NS

stop

S

S

S

S

NS

NS

step

S

S

S

S

NS

result

PS unsupported child types DATE, TIMESTAMP

ShiftLeft

shiftleft

Bitwise shift left (<<)

None

project

value

S

S

amount

S

result

S

S

ShiftRight

shiftright

Bitwise shift right (>>)

None

project

value

S

S

amount

S

result

S

S

ShiftRightUnsigned

shiftrightunsigned

Bitwise unsigned shift right (>>>)

None

project

value

S

S

amount

S

result

S

S

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

Signum

sign, signum

Returns -1.0, 0.0 or 1.0 as expr is negative, 0 or positive

None

project

input

S

result

S

Sin

sin

Sine

None

project

input

S

result

S

AST

input

S

result

S

Sinh

sinh

Hyperbolic sine

None

project

input

S

result

S

AST

input

S

result

S

Size

size, cardinality

The size of an array or a map

None

project

input

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

result

S

SortArray

sort_array

Returns the input array sorted in ascending or descending order

None

project

array

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT

ascendingOrder

S

result

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

SortOrder

Sort order

None

project

input

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT

NS

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT

NS

SparkPartitionID

spark_partition_id

Returns the current partition id

None

project

result

S

SpecifiedWindowFrame

Specification of the width of the group (or “frame”) of input rows around which a window function is evaluated

None

project

lower

S

S

S

S

NS

NS

S

S

upper

S

S

S

S

NS

NS

S

S

result

S

S

S

S

NS

NS

NS

S

Sqrt

sqrt

Square root

None

project

input

S

result

S

AST

input

S

result

S

StartsWith

Starts with

None

project

src

S

search

PS Literal value only

result

S

StringInstr

instr

Instr string operator

None

project

str

S

substr

PS Literal value only

result

S

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

StringLPad

lpad

Pad a string on the left

None

project

str

S

len

PS Literal value only

pad

PS Literal value only

result

S

StringLocate

position, locate

Substring search operator

None

project

substr

PS Literal value only

str

S

start

PS Literal value only

result

S

StringRPad

rpad

Pad a string on the right

None

project

str

S

len

PS Literal value only

pad

PS Literal value only

result

S

StringRepeat

repeat

StringRepeat operator that repeats the input string the number of times given by repeatTimes

None

project

input

S

repeatTimes

S

result

S

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

StringReplace

replace

StringReplace operator

None

project

src

S

search

PS Literal value only

replace

PS Literal value only

result

S

StringSplit

split

Splits str around occurrences that match regex

None

project

str

S

regexp

PS very limited subset of regex supported; Literal value only

limit

PS Literal value only

result

S

StringToMap

str_to_map

Creates a map after splitting the input string into pairs of key-value strings

None

project

str

S

pairDelim

S

keyValueDelim

S

result

S

StringTrim

trim

StringTrim operator

None

project

src

S

trimStr

PS Literal value only

result

S

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

StringTrimLeft

ltrim

StringTrimLeft operator

None

project

src

S

trimStr

PS Literal value only

result

S

StringTrimRight

rtrim

StringTrimRight operator

None

project

src

S

trimStr

PS Literal value only

result

S

Substring

substr, substring

Substring operator

None

project

str

S

NS

pos

S

len

S

result

S

NS

SubstringIndex

substring_index

substring_index operator

None

project

str

S

delim

PS only a single character is allowed; Literal value only

count

PS Literal value only

result

S

Subtract

-

Subtraction

None

project

lhs

S

S

S

S

S

S

S

NS

rhs

S

S

S

S

S

S

S

NS

result

S

S

S

S

S

S

S

NS

AST

lhs

NS

NS

S

S

S

S

NS

NS

rhs

NS

NS

S

S

S

S

NS

NS

result

NS

NS

S

S

S

S

NS

NS

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

Tan

tan

Tangent

None

project

input

S

result

S

AST

input

S

result

S

Tanh

tanh

Hyperbolic tangent

None

project

input

S

result

S

AST

input

S

result

S

TimeAdd

Adds interval to timestamp

None

project

start

PS UTC is only supported TZ for TIMESTAMP

interval

PS month intervals are not supported; Literal value only

result

PS UTC is only supported TZ for TIMESTAMP

ToDegrees

degrees

Converts radians to degrees

None

project

input

S

result

S

ToRadians

radians

Converts degrees to radians

None

project

input

S

result

S

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

ToUnixTimestamp

to_unix_timestamp

Returns the UNIX timestamp of the given time

None

project

timeExp

S

PS UTC is only supported TZ for TIMESTAMP

S

format

PS A limited number of formats are supported; Literal value only

result

S

TransformKeys

transform_keys

Transform keys in a map using a transform function

None

project

argument

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

function

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

NS

NS

result

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

TransformValues

transform_values

Transform values in a map using a transform function

None

project

argument

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

function

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

result

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

UnaryMinus

negative

Negate a numeric value

None

project

input

S

S

S

S

S

S

S

NS

result

S

S

S

S

S

S

S

NS

AST

input

NS

NS

S

S

S

S

NS

NS

result

NS

NS

S

S

S

S

NS

NS

UnaryPositive

positive

A numeric value with a + in front of it

None

project

input

S

S

S

S

S

S

S

NS

result

S

S

S

S

S

S

S

NS

AST

input

S

S

S

S

S

S

NS

NS

result

S

S

S

S

S

S

NS

NS

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

UnboundedFollowing$

Special boundary for a window frame, indicating all rows following the current row

None

project

result

S

UnboundedPreceding$

Special boundary for a window frame, indicating all rows preceding the current row

None

project

result

S

UnixTimestamp

unix_timestamp

Returns the UNIX timestamp of current or specified time

None

project

timeExp

S

PS UTC is only supported TZ for TIMESTAMP

S

format

PS A limited number of formats are supported; Literal value only

result

S

UnscaledValue

Convert a Decimal to an unscaled long value for some aggregation optimizations

None

project

input

PS max DECIMAL precision of 18

result

S

Upper

upper, ucase

String uppercase operator

This is not 100% compatible with the Spark version because the Unicode version used by cuDF and the JVM may differ, resulting in some corner-case characters not changing case correctly.

project

input

S

result

S

WeekDay

weekday

Returns the day of the week (0 = Monday … 6 = Sunday)

None

project

input

S

result

S

WindowExpression

Calculates a return value for every input row of a table based on a group (or “window”) of rows

None

window

windowFunction

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

windowSpec

S

S

S

S

NS

NS

PS max DECIMAL precision of 18

S

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

WindowSpecDefinition

Specification of a window function, indicating the partitioning expressions, the row ordering, and the width of the window

None

project

partition

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT

NS

value

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT

NS

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT

NS

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

Year

year

Returns the year from a date or timestamp

None

project

input

S

result

S

AggregateExpression

Aggregate expression

None

aggregation

aggFunc

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

filter

S

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

reduction

aggFunc

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

filter

S

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

window

aggFunc

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

filter

S

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

ApproximatePercentile

percentile_approx, approx_percentile

Approximate percentile

This is not 100% compatible with the Spark version because the GPU implementation of approx_percentile is not bit-for-bit compatible with Apache Spark

aggregation

input

S

S

S

S

S

S

NS

NS

S

percentage

S

S

accuracy

S

result

S

S

S

S

S

S

NS

NS

S

PS unsupported child types DATE, TIMESTAMP

reduction

input

S

S

S

S

S

S

NS

NS

S

percentage

S

S

accuracy

S

result

S

S

S

S

S

S

NS

NS

S

PS unsupported child types DATE, TIMESTAMP

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

Average

avg, mean

Average aggregate operator

None

aggregation

input

S

S

S

S

S

S

S

result

S

S

reduction

input

S

S

S

S

S

S

S

result

S

S

window

input

S

S

S

S

S

S

S

result

S

S

CollectList

collect_list

Collect a list of non-unique elements, not supported in reduction

None

aggregation

input

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

result

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

reduction

input

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

result

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

window

input

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

result

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

CollectSet

collect_set

Collect a set of unique elements, not supported in reduction

None

aggregation

input

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT

NS

result

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT

reduction

input

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT

NS

result

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT

window

input

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT

NS

result

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

Count

count

Count aggregate operator

None

aggregation

input

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

S

PS UTC is only supported TZ for child TIMESTAMP

PS UTC is only supported TZ for child TIMESTAMP

PS UTC is only supported TZ for child TIMESTAMP

S

result

S

reduction

input

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

S

PS UTC is only supported TZ for child TIMESTAMP

PS UTC is only supported TZ for child TIMESTAMP

PS UTC is only supported TZ for child TIMESTAMP

S

result

S

window

input

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

S

PS UTC is only supported TZ for child TIMESTAMP

PS UTC is only supported TZ for child TIMESTAMP

PS UTC is only supported TZ for child TIMESTAMP

S

result

S

First

first_value, first

First aggregate operator

None

aggregation

input

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

reduction

input

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

window

input

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

Last

last, last_value

Last aggregate operator

None

aggregation

input

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

reduction

input

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

window

input

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

Max

max

Max aggregate operator

None

aggregation

input

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, STRUCT, UDT

NS

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, STRUCT, UDT

NS

reduction

input

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, STRUCT, UDT

NS

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, STRUCT, UDT

NS

window

input

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

NS

NS

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

NS

NS

Min

min

Min aggregate operator

None

aggregation

input

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, STRUCT, UDT

NS

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, STRUCT, UDT

NS

reduction

input

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, STRUCT, UDT

NS

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, STRUCT, UDT

NS

window

input

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

NS

NS

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

NS

NS

PivotFirst

PivotFirst operator

None

aggregation

pivotColumn

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

NS

NS

NS

valueColumn

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

NS

NS

NS

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT

NS

NS

NS

reduction

pivotColumn

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

NS

NS

NS

valueColumn

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

NS

NS

NS

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT

NS

NS

NS

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

StddevPop

stddev_pop

Aggregation computing population standard deviation

None

reduction

input

NS

result

NS

aggregation

input

S

result

S

window

input

NS

result

NS

StddevSamp

stddev_samp, std, stddev

Aggregation computing sample standard deviation

None

reduction

input

NS

result

NS

aggregation

input

S

result

S

window

input

S

result

S

Sum

sum

Sum aggregate operator

None

aggregation

input

S

S

S

S

S

S

S

result

S

S

S

reduction

input

S

S

S

S

S

S

S

result

S

S

S

window

input

S

S

S

S

S

S

S

result

S

S

S

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

VariancePop

var_pop

Aggregation computing population variance

None

reduction

input

NS

result

NS

aggregation

input

S

result

S

window

input

NS

result

NS

VarianceSamp

var_samp, variance

Aggregation computing sample variance

None

reduction

input

NS

result

NS

aggregation

input

S

result

S

window

input

NS

result

NS

NormalizeNaNAndZero

Normalize NaN and zero

None

project

input

S

S

result

S

S

ScalarSubquery

Subquery that will return only one row and one column

None

project

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT

NS

Expression

SQL Function(s)

Description

Notes

Context

Param/Output

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

HiveGenericUDF

Hive Generic UDF; the UDF can choose to implement a RAPIDS-accelerated interface for better performance

None

project

param

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

S

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT

NS

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

S

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT

NS

HiveSimpleUDF

Hive UDF; the UDF can choose to implement a RAPIDS-accelerated interface for better performance

None

project

param

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

S

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT

NS

result

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

S

S

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT

NS

Casting

The table above does not show which casts are supported. The tables below show the matrix of supported casts for AnsiCast and Cast. Nested types such as MAP, STRUCT, and ARRAY can only be cast if their child types can be cast.

Some casts to and from STRING on the GPU do not produce exactly the same results as Spark on the CPU, so they are disabled by default. See the configuration documentation for details on these specific cases.

Note that even when Spark supports casting from one type to another, not every such cast produces a usable result. For example, casting a DATE to a BOOLEAN always produces null. Spark does this for Hive compatibility, and the accelerator produces the same result.
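The nesting rule above can be modeled as a recursive check. The following is a minimal Python sketch under stated assumptions: the type classes (Prim, ArrayT, MapT, StructT), the can_cast helper, and the LEAF_CASTS set are all hypothetical and illustrative, not the accelerator's API or its real cast matrix. It also models the separate rule, noted in the Cast table, that casting a nested type to STRING requires every child to be castable to STRING.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Hypothetical, simplified type model used only for this sketch.
@dataclass(frozen=True)
class Prim:
    name: str  # e.g. "INT", "LONG", "STRING"


@dataclass(frozen=True)
class ArrayT:
    element: "SparkType"


@dataclass(frozen=True)
class MapT:
    key: "SparkType"
    value: "SparkType"


@dataclass(frozen=True)
class StructT:
    fields: Tuple["SparkType", ...]


SparkType = Union[Prim, ArrayT, MapT, StructT]

# Illustrative leaf casts only; this is NOT the accelerator's real cast matrix.
LEAF_CASTS = {
    ("INT", "INT"), ("INT", "LONG"), ("INT", "STRING"),
    ("LONG", "STRING"), ("STRING", "STRING"), ("STRING", "INT"),
}

STRING = Prim("STRING")


def children(t: SparkType) -> Tuple[SparkType, ...]:
    """Direct child types of a nested type; primitives have none."""
    if isinstance(t, ArrayT):
        return (t.element,)
    if isinstance(t, MapT):
        return (t.key, t.value)
    if isinstance(t, StructT):
        return t.fields
    return ()


def can_cast(src: SparkType, dst: SparkType) -> bool:
    """A nested cast is allowed only if every child cast it implies is allowed."""
    if isinstance(src, Prim) and isinstance(dst, Prim):
        return (src.name, dst.name) in LEAF_CASTS
    if dst == STRING:
        # Nested -> STRING: each child must itself be castable to STRING.
        return all(can_cast(c, STRING) for c in children(src))
    if isinstance(src, ArrayT) and isinstance(dst, ArrayT):
        return can_cast(src.element, dst.element)
    if isinstance(src, MapT) and isinstance(dst, MapT):
        return can_cast(src.key, dst.key) and can_cast(src.value, dst.value)
    if isinstance(src, StructT) and isinstance(dst, StructT):
        return len(src.fields) == len(dst.fields) and all(
            can_cast(s, d) for s, d in zip(src.fields, dst.fields)
        )
    # Shape mismatches such as ARRAY -> MAP are rejected in this sketch.
    return False
```

With these definitions, can_cast(ArrayT(Prim("INT")), ArrayT(STRING)) returns True, while casting MapT(STRING, Prim("INT")) to MapT(Prim("INT"), Prim("DATE")) returns False because INT to DATE is not in the illustrative leaf set. Recursing over children keeps the nested rule in one place instead of enumerating every nested combination.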

AnsiCast

TO

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

FROM

BOOLEAN

S

S

S

S

S

S

S

S

S

BYTE

S

S

S

S

S

S

S

S

S

SHORT

S

S

S

S

S

S

S

S

S

INT

S

S

S

S

S

S

S

S

S

LONG

S

S

S

S

S

S

S

S

S

FLOAT

S

S

S

S

S

S

S

PS Conversion may produce different results and requires spark.rapids.sql.castFloatToString.enabled to be true.

S

DOUBLE

S

S

S

S

S

S

S

PS Conversion may produce different results and requires spark.rapids.sql.castFloatToString.enabled to be true.

S

DATE

S

PS UTC is only supported TZ for TIMESTAMP

S

TIMESTAMP

S

PS UTC is only supported TZ for TIMESTAMP

S

STRING

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

DECIMAL

NS

S

S

S

S

S

S

S

S

NULL

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

NS

NS

NS

BINARY

S

S

CALENDAR

NS

NS

ARRAY

PS The array’s child type must also support being cast to the desired child type; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, MAP, UDT

MAP

PS The map’s key and value must also support being cast to the desired child types; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

STRUCT

PS The struct’s children must also support being cast to the desired child type(s); UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, MAP, UDT

UDT

NS

Cast

TO

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

FROM

BOOLEAN

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

BYTE

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

SHORT

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

INT

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

LONG

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

FLOAT

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

PS Conversion may produce different results and requires spark.rapids.sql.castFloatToString.enabled to be true.

S

DOUBLE

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

PS Conversion may produce different results and requires spark.rapids.sql.castFloatToString.enabled to be true.

S

DATE

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

NS

TIMESTAMP

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

NS

STRING

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

DECIMAL

NS

S

S

S

S

S

S

NS

S

S

NULL

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

NS

NS

NS

BINARY

S

S

CALENDAR

NS

NS

ARRAY

PS The array’s child type must also support being cast to string

PS The array’s child type must also support being cast to the desired child type(s); UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

MAP

PS The map’s key and value must also support being cast to string

PS The map’s key and value must also support being cast to the desired child types; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

STRUCT

PS The struct’s children must also support being cast to string

PS The struct’s children must also support being cast to the desired child type(s); UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT

UDT

NS

NS

Partitioning

When data is transferred between tasks, it is partitioned in specific ways depending on the requirements of the plan. Note that the types listed below apply only to the columns that determine where the data is partitioned. For example, if a join is performed on column a, the data is hash partitioned on a, but the other columns in the same DataFrame as a do not appear in this table. Those columns are governed by the rules for ShuffleExchangeExec, which uses the Partitioning.
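As a rough illustration of why only the partitioning key matters, here is a minimal Python sketch of hash partitioning. The function name hash_partition and the use of zlib.crc32 are assumptions for illustration only, not the accelerator's or Spark's implementation. The row with b == "z" lands in the same partition as the row with b == "x" because both have a == 1, whatever the value of b.

```python
import zlib
from collections import defaultdict


def hash_partition(rows, key, num_partitions):
    """Group rows into partitions using only the value of `key`.

    zlib.crc32 is a stand-in hash chosen for determinism; Spark's
    HashPartitioning uses Murmur3 plus a non-negative modulo, so the
    partition numbers here will not match Spark's.
    """
    partitions = defaultdict(list)
    for row in rows:
        digest = zlib.crc32(repr(row[key]).encode("utf-8"))
        partitions[digest % num_partitions].append(row)
    return dict(partitions)


rows = [
    {"a": 1, "b": "x"},
    {"a": 2, "b": "y"},
    {"a": 1, "b": "z"},
]
# Rows that share a value of "a" always land together; "b" plays no part.
for pid, members in sorted(hash_partition(rows, "a", 4).items()):
    print(pid, [r["b"] for r in members])
```

Only the key column feeds the hash, which is why only key types appear in the partitioning table; the types of passenger columns such as b are checked against the ShuffleExchangeExec rules instead.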

Partition

Description

Notes

Param

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

HashPartitioning

Hash based partitioning

None

hash_key

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT

NS

RangePartitioning

Range partitioning

None

order_key

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

NS

NS

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT

NS

RoundRobinPartitioning

Round robin partitioning

None

SinglePartition$

Single partitioning

None

Input/Output

For input and output, the supported types are not clearly exposed elsewhere, so this table attempts to clarify them. Note that some types may be disabled for reads or writes in certain cases because of processing limitations, such as date or timestamp rebasing, or a lack of type-coercion support.
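Many of the PS notes in this table restrict the child types allowed inside ARRAY, MAP, and STRUCT columns. The Python sketch below checks a nested schema against one such list, the ORC Write note "unsupported child types BINARY, MAP, UDT". The tuple-based type notation and the helper names are hypothetical, treating "child types" as children at any nesting depth is our reading rather than a documented rule, and the UTC-only restriction for TIMESTAMP is not modeled.

```python
# Type notation used in this sketch (hypothetical): a primitive is a name such
# as "INT"; nested types are tuples ("ARRAY", elem), ("MAP", key, value) and
# ("STRUCT", (field_type, ...)).

# Taken from the ORC Write row: "unsupported child types BINARY, MAP, UDT".
ORC_WRITE_UNSUPPORTED_CHILDREN = {"BINARY", "MAP", "UDT"}


def type_name(t):
    return t if isinstance(t, str) else t[0]


def child_types(t):
    """Direct children of a nested type; primitives have none."""
    if isinstance(t, str):
        return []
    kind = t[0]
    if kind == "ARRAY":
        return [t[1]]
    if kind == "MAP":
        return [t[1], t[2]]
    if kind == "STRUCT":
        return list(t[1])
    raise ValueError(f"unknown nested kind: {kind}")


def banned_children(t, banned):
    """Collect banned type names at any depth below `t`, excluding `t` itself."""
    found = set()
    stack = child_types(t)
    while stack:
        node = stack.pop()
        if type_name(node) in banned:
            found.add(type_name(node))
        stack.extend(child_types(node))
    return found


# A top-level MAP column is covered by the MAP column of the table, so only its
# children matter here: MAP<STRING, INT> has no banned children.
print(banned_children(("MAP", "STRING", "INT"), ORC_WRITE_UNSUPPORTED_CHILDREN))
# ARRAY<MAP<STRING, BINARY>> nests both a MAP and a BINARY.
print(banned_children(("ARRAY", ("MAP", "STRING", "BINARY")), ORC_WRITE_UNSUPPORTED_CHILDREN))
```

Excluding the top-level type mirrors how the table is laid out: the column's own type has its own cell, and the PS note only constrains what is nested inside it.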

Format

Direction

BOOLEAN

BYTE

SHORT

INT

LONG

FLOAT

DOUBLE

DATE

TIMESTAMP

STRING

DECIMAL

NULL

BINARY

CALENDAR

ARRAY

MAP

STRUCT

UDT

Avro

Read

S

S

S

S

S

S

S

NS

NS

S

NS

NS

NS

NS

NS

NS

Write

NS

NS

NS

NS

NS

NS

NS

NS

NS

NS

NS

NS

NS

NS

NS

NS

CSV

Read

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

NS

Write

NS

NS

NS

NS

NS

NS

NS

NS

NS

NS

NS

NS

Delta

Read

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT

NS

Write

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT

NS

HiveText

Read

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

NS

NS

NS

NS

NS

NS

NS

Write

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

NS

NS

NS

NS

NS

NS

NS

Iceberg

Read

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT

NS

Write

NS

NS

NS

NS

NS

NS

NS

NS

NS

NS

NS

NS

NS

NS

NS

NS

JSON

Read

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

NS

NS

NS

NS

NS

Write

NS

NS

NS

NS

NS

NS

NS

NS

NS

NS

NS

NS

NS

NS

NS

NS

ORC

Read

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, UDT

NS

Write

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

NS

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, MAP, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, MAP, UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, MAP, UDT

NS

Parquet

Read

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT

NS

Write

S

S

S

S

S

S

S

S

PS UTC is only supported TZ for TIMESTAMP

S

S

S

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT

PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT

NS

Apache Iceberg Support

Support for Apache Iceberg has additional limitations. See the Apache Iceberg Support document.