Supported Operators
Apache Spark supports processing various types of data. Not all expressions support all data types. The RAPIDS Accelerator for Apache Spark has further restrictions on what types are supported for processing. This document describes which operations are supported and which data types each operation supports. Apache Spark is under active development, and this document was generated against version 3.1.1 of Spark. Most of it should still apply to other versions of Spark, but there may be slight differences.
Decimal
The `Decimal` type in Spark supports a precision up to 38 digits (128 bits). The RAPIDS Accelerator supports 128-bit decimals starting from version 21.12, and decimals are enabled by default. Please check Decimal Support for more details.
`Decimal` precision and scale follow the same rule as CPU mode in Apache Spark:
In particular, if we have expressions e1 and e2 with precision/scale p1/s1 and p2/s2 respectively, then the following operations have the following precision / scale:
Operation | Result Precision | Result Scale |
---|---|---|
e1 + e2 | max(s1, s2) + max(p1-s1, p2-s2) + 1 | max(s1, s2) |
e1 - e2 | max(s1, s2) + max(p1-s1, p2-s2) + 1 | max(s1, s2) |
e1 * e2 | p1 + p2 + 1 | s1 + s2 |
e1 / e2 | p1 - s1 + s2 + max(6, s1 + p2 + 1) | max(6, s1 + p2 + 1) |
e1 % e2 | min(p1-s1, p2-s2) + max(s1, s2) | max(s1, s2) |
e1 union e2 | max(s1, s2) + max(p1-s1, p2-s2) | max(s1, s2) |
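The rules in the table above can be written as a small helper for checking what result type an operation produces. This is a minimal sketch in plain Python, not Spark or RAPIDS code: the function name `decimal_result_type` is hypothetical, and the sketch only applies the formulas as listed. It does not model the 38-digit precision cap or any later adjustment Spark may make.

```python
def decimal_result_type(op, p1, s1, p2, s2):
    """Return (precision, scale) of `e1 op e2` using the formulas in the table.

    Hypothetical helper for illustration only; it applies the listed rules
    and nothing else (no 38-digit cap, no precision-loss adjustment).
    """
    # Number of digits to the left of the decimal point that must be kept.
    int_digits = max(p1 - s1, p2 - s2)
    if op in ("+", "-"):
        scale = max(s1, s2)
        return scale + int_digits + 1, scale
    if op == "*":
        return p1 + p2 + 1, s1 + s2
    if op == "/":
        scale = max(6, s1 + p2 + 1)
        return p1 - s1 + s2 + scale, scale
    if op == "%":
        scale = max(s1, s2)
        return min(p1 - s1, p2 - s2) + scale, scale
    if op == "union":
        scale = max(s1, s2)
        return scale + int_digits, scale
    raise ValueError(f"unsupported operation: {op!r}")


# Decimal(8,2) multiplied by Decimal(6,3)
print(decimal_result_type("*", 8, 2, 6, 3))  # (15, 5)
```

Working the multiplication by hand gives precision 8 + 6 + 1 = 15 and scale 2 + 3 = 5, which is what the helper returns.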
However, Spark inserts `PromotePrecision` to CAST both sides to the same type. GPU mode may fall back to CPU even if the result Decimal precision is within 18 digits. For example, `Decimal(8,2)` x `Decimal(6,3)` resulting in `Decimal(15,5)` runs on CPU because, due to `PromotePrecision`, GPU mode assumes the result is `Decimal(19,6)`. There are even extreme cases where Spark can temporarily return a Decimal value larger than what can be stored in 128 bits and then uses the `CheckOverflow` operator to round it to a desired precision and scale. This means that even when the accelerator supports 128-bit decimal, we might not be able to support all operations that Spark can support.
Timestamp
Timestamps in Spark will all be converted to the local time zone before processing and are often converted to UTC before being stored, like in Parquet or ORC. The RAPIDS Accelerator only supports UTC as the time zone for timestamps.
CalendarInterval
In Spark, `CalendarInterval`s store three values: months, days, and microseconds. Support for this type is still very limited in the accelerator. In some cases only a subset of the type is supported; for example, window ranges currently support only days.
Configuration
Many configuration values can affect whether an operation is supported. Some of these are a part of the RAPIDS Accelerator and cover the level of compatibility with Apache Spark. Those are covered here. Others are a part of Apache Spark itself and are harder to document. The work of updating this document to cover that support is still ongoing.
In general, if you have any question about why an operation is not running on the GPU, you may set `spark.rapids.sql.explain` to ALL and it will try to give all of the reasons why a particular operator or expression is on the CPU or GPU.
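As an illustration of passing this setting at launch time, the sketch below renders a configuration dictionary as `--conf key=value` arguments for `spark-submit`. The helper name `spark_submit_conf_args` is hypothetical and is not part of Spark or the RAPIDS Accelerator. The only configuration key used is the one named above.

```python
def spark_submit_conf_args(conf):
    """Render a dict of Spark settings as spark-submit `--conf` arguments.

    Hypothetical helper for illustration; keys are sorted so the output
    is deterministic.
    """
    args = []
    for key, value in sorted(conf.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


# Ask the accelerator to explain why each operator ran on the CPU or GPU.
explain_conf = {"spark.rapids.sql.explain": "ALL"}
print(" ".join(spark_submit_conf_args(explain_conf)))
# --conf spark.rapids.sql.explain=ALL
```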
Types
Type Name | Type Description |
---|---|
BOOLEAN | Holds true or false values. |
BYTE | Signed 8-bit integer value. |
SHORT | Signed 16-bit integer value. |
INT | Signed 32-bit integer value. |
LONG | Signed 64-bit integer value. |
FLOAT | 32-bit floating point value. |
DOUBLE | 64-bit floating point value. |
DATE | A date with no time component. Stored as 32-bit integer with days since Jan 1, 1970. |
TIMESTAMP | A date and time. Stored as 64-bit integer with microseconds since Jan 1, 1970 in the current time zone. |
STRING | A text string. Stored as UTF-8 encoded bytes. |
DECIMAL | A fixed point decimal value with configurable precision and scale. |
NULL | Only stores null values and is typically only used when no other type can be determined from the SQL. |
BINARY | An array of non-nullable bytes. |
CALENDAR | Represents a period of time. Stored as months, days and microseconds. |
ARRAY | A sequence of elements. |
MAP | A set of key value pairs, the keys cannot be null. |
STRUCT | A series of named fields. |
UDT | User defined types and Java objects. These are not standard SQL types. |
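To make the DATE and TIMESTAMP storage descriptions concrete, the following sketch converts Python values into the integer forms described in the table: days since Jan 1, 1970 for a date, and microseconds since Jan 1, 1970 for a timestamp. The function names are hypothetical, and the timestamp conversion assumes UTC, the only time zone the accelerator supports for timestamps.

```python
from datetime import date, datetime, timezone

EPOCH_DATE = date(1970, 1, 1)
EPOCH_TS = datetime(1970, 1, 1, tzinfo=timezone.utc)


def date_to_days(d):
    """Days since Jan 1, 1970 (fits the 32-bit DATE representation)."""
    return (d - EPOCH_DATE).days


def timestamp_to_micros(ts):
    """Microseconds since Jan 1, 1970 UTC for a timezone-aware datetime."""
    delta = ts - EPOCH_TS
    # Use integer arithmetic to avoid floating point rounding.
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


print(date_to_days(date(2000, 1, 1)))  # 10957
print(timestamp_to_micros(datetime(1970, 1, 1, 0, 0, 1, tzinfo=timezone.utc)))  # 1000000
```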
Support
Value | Description |
---|---|
S | (Supported) Both Apache Spark and the RAPIDS Accelerator support this type fully. |
| | (Not Applicable) Neither Spark nor the RAPIDS Accelerator supports this type in this situation. |
PS | (Partial Support) Apache Spark supports this type, but the RAPIDS Accelerator only partially supports it. An explanation for what is missing will be included with this. |
NS | (Not Supported) Apache Spark supports this type but the RAPIDS Accelerator does not. |
Apache Spark uses a Directed Acyclic Graph (DAG) of processing to build a query. The nodes in this graph are instances of `SparkPlan` and represent various high-level operations like doing a filter or project. The operations that the RAPIDS Accelerator supports are described below.
Executor | Description | Notes | Param(s) | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CoalesceExec | The backend for the dataframe coalesce method | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS |
CollectLimitExec | Reduce to single partition and apply limit | This is disabled by default because Collect Limit replacement can be slower on the GPU; if a batch contains a huge number of rows, it could help by limiting the number of rows transferred from GPU to CPU | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
ExpandExec | The backend for the expand operator | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
FileSourceScanExec | Reading data from files, often from Hive tables | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS |
FilterExec | The backend for most filter statements | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS |
GenerateExec | The backend for operations that generate more output rows than input rows like explode | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS |
GlobalLimitExec | Limiting of results across partitions | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
LocalLimitExec | Per-partition limiting of results | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
ProjectExec | The backend for most select, withColumn and dropColumn statements | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS |
RangeExec | The backend for range operator | None | Input/Output | S | |||||||||||||||||
SampleExec | The backend for the sample operator | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
SortExec | The backend for the sort operator | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS |
SubqueryBroadcastExec | Plan to collect and transform the broadcast key values | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | S | PS UTC is only supported TZ for child TIMESTAMP | PS UTC is only supported TZ for child TIMESTAMP | PS UTC is only supported TZ for child TIMESTAMP | S |
TakeOrderedAndProjectExec | Take the first limit elements as defined by the sortOrder, and do projection if needed | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
UnionExec | The backend for the union operator | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS unionByName will not optionally impute nulls for missing struct fields when the column is a struct and there are non-overlapping fields; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
Executor | Description | Notes | Param(s) | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
CustomShuffleReaderExec | A wrapper of shuffle query stage | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS |
HashAggregateExec | The backend for hash based aggregations | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS not allowed for grouping expressions; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS not allowed for grouping expressions; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS not allowed for grouping expressions if containing Array or Map as child; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
ObjectHashAggregateExec | The backend for hash based aggregations supporting TypedImperativeAggregate functions | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | PS only allowed when aggregate buffers can be converted between CPU and GPU | NS | PS not allowed for grouping expressions; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS not allowed for grouping expressions; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS not allowed for grouping expressions if containing Array or Map as child; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS |
SortAggregateExec | The backend for sort based aggregations | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | PS only allowed when aggregate buffers can be converted between CPU and GPU | NS | PS not allowed for grouping expressions; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS not allowed for grouping expressions; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS not allowed for grouping expressions if containing Array or Map as child; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS |
InMemoryTableScanExec | Implementation of InMemoryTableScanExec to use GPU accelerated caching | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, BINARY, CALENDAR, UDT | NS |
DataWritingCommandExec | Writing data | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | PS 128bit decimal only supported for Orc and Parquet | NS | PS Only supported for Parquet | NS | PS Only supported for Parquet; UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, CALENDAR, UDT | PS Only supported for Parquet; UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, CALENDAR, UDT | PS Only supported for Parquet; UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, CALENDAR, UDT | NS |
ExecutedCommandExec | Eagerly executed commands | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | S | PS UTC is only supported TZ for child TIMESTAMP | PS UTC is only supported TZ for child TIMESTAMP | PS UTC is only supported TZ for child TIMESTAMP | S |
BatchScanExec | The backend for most file input | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | NS | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, CALENDAR, UDT | NS |
BroadcastExchangeExec | The backend for broadcast exchange of data | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT | NS |
ShuffleExchangeExec | The backend for most data being exchanged between processes | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | NS | PS Round-robin partitioning is not supported if spark.sql.execution.sortBeforeRepartition is true; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS Round-robin partitioning is not supported if spark.sql.execution.sortBeforeRepartition is true; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS Round-robin partitioning is not supported for nested structs if spark.sql.execution.sortBeforeRepartition is true; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS |
BroadcastHashJoinExec | Implementation of join using broadcast data | None | leftKeys | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT | NS | |
rightKeys | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT | NS | ||||
condition | S | ||||||||||||||||||||
Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS | |||
BroadcastNestedLoopJoinExec | Implementation of join using brute force. Full outer joins and joins where the broadcast side matches the join side (e.g.: LeftOuter with left broadcast) are not supported | None | condition (A non-inner join is only supported if the condition expression can be converted to a GPU AST expression) | S | |||||||||||||||||
Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS | |||
Executor | Description | Notes | Param(s) | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
CartesianProductExec | Implementation of join using brute force | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS |
ShuffledHashJoinExec | Implementation of join using hashed shuffled data | None | leftKeys | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT | NS | |
rightKeys | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT | NS | ||||
condition | S | ||||||||||||||||||||
Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS | |||
SortMergeJoinExec | Sort merge join, replacing with shuffled hash join | None | leftKeys | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT | NS | |
rightKeys | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT | NS | ||||
condition | S | ||||||||||||||||||||
Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS | |||
AggregateInPandasExec | The backend for an Aggregation Pandas UDF. Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled. | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | NS | NS | NS | NS | NS | NS | NS | NS |
ArrowEvalPythonExec | The backend of the Scalar Pandas UDFs. Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT | NS |
FlatMapCoGroupsInPandasExec | The backend for CoGrouped Aggregation Pandas UDF. Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled. | This is disabled by default because performance is not ideal with many small groups | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | NS | NS | NS | NS | NS | NS | NS | NS |
FlatMapGroupsInPandasExec | The backend for Flat Map Groups Pandas UDF. Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled. | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | NS | NS | NS | NS | NS | NS | NS | NS |
MapInPandasExec | The backend for Map Pandas Iterator UDF. Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled. | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT | NS |
WindowInPandasExec | The backend for Window Aggregation Pandas UDF. Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled. For now it only supports row-based window frames. | This is disabled by default because it only supports row-based frames for now | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT | NS | NS | NS |
Executor | Description | Notes | Param(s) | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
WindowExec | Window-operator backend | None | partitionSpec | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT | NS |
Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |||
HiveTableScanExec | Scan Exec to read Hive delimited text tables | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | NS | NS | NS | NS | NS | NS | NS |
Inside each node in the DAG there can be one or more trees of expressions that describe the processing that happens in that part of the plan. These can be things like adding two numbers together or checking for null. These expressions can have multiple input parameters and one output value. They can also occur in different contexts. Because of how the accelerator works, different contexts have different levels of support.
The most common expression context is `project`. In this context, values from a single input row go through the expression and the result is used to produce something in the same row. Be aware that even in the case of aggregation and window operations, most of the processing is still done in the project context, either before or after the other processing happens.
Aggregation operations like count or sum can take place in either the `aggregation`, `reduction`, or `window` context. `aggregation` is when the operation is done while grouping the data by one or more keys. `reduction` is when there is no group by and there is a single result for an entire column. `window` is for window operations.
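The difference between these three contexts can be illustrated with the same sum computed three ways. This is a minimal sketch in plain Python over made-up `(key, value)` rows. It is not how Spark or the accelerator implements these operations, and the window example assumes one particular frame (a running sum per key, in input order).

```python
from collections import defaultdict

# Made-up (key, value) rows used only for illustration.
ROWS = [("x", 1), ("x", 2), ("y", 5)]


def reduction_sum(rows):
    """reduction: no group by, one result for the entire column."""
    return sum(value for _, value in rows)


def aggregation_sum(rows):
    """aggregation: one result per grouping key."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


def window_running_sum(rows):
    """window: one result per input row (here a running sum per key)."""
    running = defaultdict(int)
    out = []
    for key, value in rows:
        running[key] += value
        out.append((key, value, running[key]))
    return out


print(reduction_sum(ROWS))       # 8
print(aggregation_sum(ROWS))     # {'x': 3, 'y': 5}
print(window_running_sum(ROWS))  # [('x', 1, 1), ('x', 2, 3), ('y', 5, 5)]
```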
The final expression context is `AST`, or Abstract Syntax Tree. Before explaining AST we first need to explain in detail how project context operations work. Generally, for a project context operation, the plan Spark developed is read on the CPU and an appropriate set of GPU kernels is selected to do those operations. For example, `a >= b + 1` would result in calling a GPU kernel to add `1` to `b`, followed by another kernel that is called to compare `a` to that result. The interpretation happens on the CPU, and the GPU is used to do the processing. For AST, the interpretation cannot happen on the CPU and instead must be done in the GPU kernel itself. An example of this is conditional joins. If you want to join on `A.a >= B.b + 1`, where `A` and `B` are separate tables or data frames, the `+` and `>=` operations cannot run as separate independent kernels because the condition is evaluated on combinations of rows from both `A` and `B`. Instead, part of the plan that Spark developed is turned into an abstract syntax tree and sent to the GPU, where it can be interpreted. The number and types of operations supported in this context are limited.
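The contrast between the two evaluation styles can be sketched in plain Python. In the project style, each operator runs as its own whole-column step, standing in for a GPU kernel. In the AST style, a small expression tree is interpreted for each pair of rows from the two tables. Everything here is an illustrative assumption: the function names, the tuple-based tree encoding, and the nested-loop join are hypothetical and do not reflect the accelerator's actual implementation.

```python
import operator

# --- project style: one whole-column step ("kernel") per operator ---

def add_kernel(column, scalar):
    return [value + scalar for value in column]


def ge_kernel(lhs, rhs):
    return [left >= right for left, right in zip(lhs, rhs)]


def project_a_ge_b_plus_1(a, b):
    """Evaluate a >= b + 1 as two separate column-wise steps."""
    b_plus_1 = add_kernel(b, 1)
    return ge_kernel(a, b_plus_1)


# --- AST style: interpret an expression tree per (left row, right row) ---

OPS = {"+": operator.add, ">=": operator.ge}

# A.a >= B.b + 1 encoded as nested tuples (hypothetical encoding).
JOIN_CONDITION = (">=", ("left", "a"), ("+", ("right", "b"), ("lit", 1)))


def eval_ast(node, left_row, right_row):
    kind = node[0]
    if kind == "lit":
        return node[1]
    if kind == "left":
        return left_row[node[1]]
    if kind == "right":
        return right_row[node[1]]
    op, lhs, rhs = node
    return OPS[op](eval_ast(lhs, left_row, right_row),
                   eval_ast(rhs, left_row, right_row))


def conditional_join(table_a, table_b, condition):
    """Keep every (row of A, row of B) pair for which the condition holds."""
    return [(ra, rb) for ra in table_a for rb in table_b
            if eval_ast(condition, ra, rb)]


print(project_a_ge_b_plus_1([1, 5], [0, 9]))  # [True, False]
print(conditional_join([{"a": 2}], [{"b": 1}, {"b": 3}], JOIN_CONDITION))
# [({'a': 2}, {'b': 1})]
```

In the join, the inputs to `+` and `>=` come from different tables and are only defined per row pair, so there are no aligned columns to hand to separate column-wise steps.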
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Abs | abs | Absolute value | None | project | input | S | S | S | S | S | S | S | |||||||||||
result | S | S | S | S | S | S | S | ||||||||||||||||
AST | input | NS | NS | S | S | S | S | NS | |||||||||||||||
result | NS | NS | S | S | S | S | NS | ||||||||||||||||
Acos | acos | Inverse cosine | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Acosh | acosh | Inverse hyperbolic cosine | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Add | + | Addition | None | project | lhs | S | S | S | S | S | S | S | NS | ||||||||||
rhs | S | S | S | S | S | S | S | NS | |||||||||||||||
result | S | S | S | S | S | S | S | NS | |||||||||||||||
AST | lhs | NS | NS | S | S | S | S | NS | NS | ||||||||||||||
rhs | NS | NS | S | S | S | S | NS | NS | |||||||||||||||
result | NS | NS | S | S | S | S | NS | NS | |||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Alias | Gives a column a name | None | project | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS | |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS | |||||
AST | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | NS | NS | NS | NS | NS | NS | NS | NS | NS | ||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
And | and | Logical AND | None | project | lhs | S | |||||||||||||||||
rhs | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
AST | lhs | S | |||||||||||||||||||||
rhs | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
ArrayContains | array_contains | Returns a boolean if the array contains the passed in key | None | project | array | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT | |||||||||||||||||
key | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | NS | NS | NS | NS | NS | NS | NS | NS | |||||
result | S | ||||||||||||||||||||||
ArrayExcept | array_except | Returns an array of the elements in array1 but not in array2, without duplicates | This is not 100% compatible with the Spark version because the GPU implementation treats -0.0 and 0.0 as equal, but the CPU implementation currently does not (see SPARK-39845). Also, Apache Spark 3.1.3 fixed issue SPARK-36741 where NaNs in these set-like operators were not treated as being equal. We have chosen to break compatibility with older versions of Spark in this instance and handle NaNs the same as 3.1.3+ | project | array1 | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT | |||||||||||||||||
array2 | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT | ||||||||||||||||||||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT | ||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
ArrayExists | exists | Return true if any element satisfies the predicate LambdaFunction | None | project | argument | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | |||||||||||||||||
function | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
ArrayIntersect | array_intersect | Returns an array of the elements in the intersection of array1 and array2, without duplicates | This is not 100% compatible with the Spark version because the GPU implementation treats -0.0 and 0.0 as equal, but the CPU implementation currently does not (see SPARK-39845). Also, Apache Spark 3.1.3 fixed issue SPARK-36741 where NaNs in these set-like operators were not treated as being equal. We have chosen to break compatibility with older versions of Spark in this instance and handle NaNs the same as 3.1.3+ | project | array1 | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT | |||||||||||||||||
array2 | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT | ||||||||||||||||||||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT | ||||||||||||||||||||||
ArrayMax | array_max | Returns the maximum value in the array | None | project | input | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, STRUCT, UDT | |||||||||||||||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | ||||||
ArrayMin | array_min | Returns the minimum value in the array | None | project | input | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, STRUCT, UDT | |||||||||||||||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | ||||||
ArrayRemove | array_remove | Returns the array after removing all elements that are equal to the input element (right) from the input array (left) | None | project | array | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | NS | NS |
element | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | ||||||||||||||||||||||
ArrayRepeat | array_repeat | Returns the array containing the given input value (left) count (right) times | None | project | left | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
right | S | S | S | S | |||||||||||||||||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | ||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
ArrayTransform | transform | Transform elements in an array using the transform function. This is similar to a map in functional programming | None | project | argument | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | |||||||||||||||||
function | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | ||||||||||||||||||||||
ArrayUnion | array_union | Returns an array of the elements in the union of array1 and array2, without duplicates. | This is not 100% compatible with the Spark version because the GPU implementation treats -0.0 and 0.0 as equal, but the CPU implementation currently does not (see SPARK-39845). Also, Apache Spark 3.1.3 fixed issue SPARK-36741 where NaNs in these set-like operators were not treated as being equal. We have chosen to break compatibility with older versions of Spark in this instance and handle NaNs the same as 3.1.3+ | project | array1 | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT | |||||||||||||||||
array2 | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT | ||||||||||||||||||||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT | ||||||||||||||||||||||
ArraysOverlap | arrays_overlap | Returns true if a1 contains at least one non-null element that is also present in a2. If the arrays have no common element, both are non-empty, and either of them contains a null element, null is returned; otherwise false is returned. | This is not 100% compatible with the Spark version because the GPU implementation treats -0.0 and 0.0 as equal, but the CPU implementation currently does not (see SPARK-39845). Also, Apache Spark 3.1.3 fixed issue SPARK-36741 where NaNs in these set-like operators were not treated as being equal. We have chosen to break compatibility with older versions of Spark in this instance and handle NaNs the same as 3.1.3+ | project | array1 | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT | |||||||||||||||||
array2 | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
ArraysZip | arrays_zip | Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays. | None | project | children | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | |||||||||||||||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | ||||||||||||||||||||||
Asin | asin | Inverse sine | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Asinh | asinh | Inverse hyperbolic sine | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
AtLeastNNonNulls | Checks if the number of non-null/NaN values is greater than a given value | None | project | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS | |
result | S | ||||||||||||||||||||||
Atan | atan | Inverse tangent | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Atanh | atanh | Inverse hyperbolic tangent | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
AttributeReference | References an input column | None | project | result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS | |
AST | result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | NS | NS | NS | NS | NS | NS | NS | NS | NS | ||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
BRound | bround | Round an expression to d decimal places using HALF_EVEN rounding mode | None | project | value | S | S | S | S | PS result may round slightly differently | PS result may round slightly differently | S | |||||||||||
scale | S | ||||||||||||||||||||||
result | S | S | S | S | S | S | S | ||||||||||||||||
BitLength | bit_length | The bit length of string data | None | project | input | S | NS | ||||||||||||||||
result | S | ||||||||||||||||||||||
BitwiseAnd | & | Returns the bitwise AND of the operands | None | project | lhs | S | S | S | S | ||||||||||||||
rhs | S | S | S | S | |||||||||||||||||||
result | S | S | S | S | |||||||||||||||||||
AST | lhs | NS | NS | S | S | ||||||||||||||||||
rhs | NS | NS | S | S | |||||||||||||||||||
result | NS | NS | S | S | |||||||||||||||||||
BitwiseNot | ~ | Returns the bitwise NOT of the operands | None | project | input | S | S | S | S | ||||||||||||||
result | S | S | S | S | |||||||||||||||||||
AST | input | NS | NS | S | S | ||||||||||||||||||
result | NS | NS | S | S | |||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
BitwiseOr | | | Returns the bitwise OR of the operands | None | project | lhs | S | S | S | S | ||||||||||||||
rhs | S | S | S | S | |||||||||||||||||||
result | S | S | S | S | |||||||||||||||||||
AST | lhs | NS | NS | S | S | ||||||||||||||||||
rhs | NS | NS | S | S | |||||||||||||||||||
result | NS | NS | S | S | |||||||||||||||||||
BitwiseXor | ^ | Returns the bitwise XOR of the operands | None | project | lhs | S | S | S | S | ||||||||||||||
rhs | S | S | S | S | |||||||||||||||||||
result | S | S | S | S | |||||||||||||||||||
AST | lhs | NS | NS | S | S | ||||||||||||||||||
rhs | NS | NS | S | S | |||||||||||||||||||
result | NS | NS | S | S | |||||||||||||||||||
CaseWhen | when | CASE WHEN expression | None | project | predicate | S | |||||||||||||||||
value | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS | |||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Cbrt | cbrt | Cube root | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Ceil | ceiling, ceil | Ceiling of a number | None | project | input | S | S | S | |||||||||||||||
result | S | S | S | ||||||||||||||||||||
CheckOverflow | CheckOverflow after arithmetic operations between DecimalType data | None | project | input | S | ||||||||||||||||||
result | S | ||||||||||||||||||||||
Coalesce | coalesce | Returns the first non-null argument if one exists; otherwise, null | None | project | param | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS | |||||
Concat | concat | List/String concatenate | None | project | input | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | |||||||||||||||
result | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | ||||||||||||||||||||
ConcatWs | concat_ws | Concatenates multiple input strings or array of strings into a single string using a given separator | None | project | input | S | S | ||||||||||||||||
result | S | ||||||||||||||||||||||
Contains | Contains | None | project | src | S | ||||||||||||||||||
search | PS Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Cos | cos | Cosine | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Cosh | cosh | Hyperbolic cosine | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Cot | cot | Cotangent | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
CreateArray | array | Returns an array with the given elements | None | project | arg | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, MAP, UDT | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, MAP, UDT | NS |
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, MAP, UDT | ||||||||||||||||||||||
CreateMap | map | Create a map | None | project | key | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | PS UTC is only supported TZ for child TIMESTAMP | PS UTC is only supported TZ for child TIMESTAMP | ||||
value | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | PS UTC is only supported TZ for child TIMESTAMP | PS UTC is only supported TZ for child TIMESTAMP | PS UTC is only supported TZ for child TIMESTAMP | ||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
CreateNamedStruct | named_struct, struct | Creates a struct with the given field names and values | None | project | name | S | |||||||||||||||||
value | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS | |||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | ||||||||||||||||||||||
CurrentRow$ | Special boundary for a window frame, indicating stopping at the current row | None | project | result | S | ||||||||||||||||||
DateAdd | date_add | Returns the date that is num_days after start_date | None | project | startDate | S | |||||||||||||||||
days | S | S | S | ||||||||||||||||||||
result | S | ||||||||||||||||||||||
DateAddInterval | Adds interval to date | None | project | start | S | ||||||||||||||||||
interval | PS month intervals are not supported; Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
DateDiff | datediff | Returns the number of days from startDate to endDate | None | project | lhs | S | |||||||||||||||||
rhs | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
DateFormatClass | date_format | Converts timestamp to a value of string in the format specified by the date format | None | project | timestamp | PS UTC is only supported TZ for TIMESTAMP | |||||||||||||||||
strfmt | PS A limited number of formats are supported; Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
DateSub | date_sub | Returns the date that is num_days before start_date | None | project | startDate | S | |||||||||||||||||
days | S | S | S | ||||||||||||||||||||
result | S | ||||||||||||||||||||||
DayOfMonth | dayofmonth, day | Returns the day of the month from a date or timestamp | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
DayOfWeek | dayofweek | Returns the day of the week (1 = Sunday, ..., 7 = Saturday) | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
DayOfYear | dayofyear | Returns the day of the year from a date or timestamp | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
DenseRank | dense_rank | Window function that returns the dense rank value within the aggregation window | None | window | ordering | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | NS |
result | S | ||||||||||||||||||||||
Divide | / | Division | None | project | lhs | S | S | ||||||||||||||||
rhs | S | S | |||||||||||||||||||||
result | S | S | |||||||||||||||||||||
ElementAt | element_at | Returns the element of the array at the given (1-based) index if the column is an array. Returns the value for the given key if the column is a map. | None | project | array/map | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS If it’s a map, only primitive key types are supported; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | ||||||||||||||||
index/key | PS Unsupported as array index. | PS Unsupported as array index. | PS Unsupported as array index. | S | PS Unsupported as array index. | PS Unsupported as array index. | PS Unsupported as array index. | PS Unsupported as array index. | PS Unsupported as array index; UTC is only supported TZ for TIMESTAMP | PS Unsupported as array index. | PS Unsupported as array index. | NS | NS | NS | NS | NS | NS | NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS | |||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
EndsWith | Ends with | None | project | src | S | ||||||||||||||||||
search | PS Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
EqualNullSafe | <=> | Check if the values are equal, treating nulls as equal to each other (<=>) | None | project | lhs | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT | NS | |
rhs | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT | NS | ||||||
result | S | ||||||||||||||||||||||
EqualTo | =, == | Check if the values are equal | None | project | lhs | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT | NS | |
rhs | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT | NS | ||||||
result | S | ||||||||||||||||||||||
AST | lhs | S | S | S | S | S | NS | NS | S | PS UTC is only supported TZ for TIMESTAMP | NS | NS | NS | NS | NS | NS | NS | NS | |||||
rhs | S | S | S | S | S | NS | NS | S | PS UTC is only supported TZ for TIMESTAMP | NS | NS | NS | NS | NS | NS | NS | NS | ||||||
result | S | ||||||||||||||||||||||
Exp | exp | Euler’s number e raised to a power | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Explode | explode, explode_outer | Given an input array, produces a sequence of rows, one for each value in the array | None | project | input | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | ||||||||||||||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | ||||||||||||||||||||||
Expm1 | expm1 | Euler’s number e raised to a power minus 1 | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Floor | floor | Floor of a number | None | project | input | S | S | S | |||||||||||||||
result | S | S | S | ||||||||||||||||||||
FromUTCTimestamp | from_utc_timestamp | Render the input UTC timestamp in the input timezone | None | project | timestamp | PS UTC is only supported TZ for TIMESTAMP | |||||||||||||||||
timezone | PS Only timezones equivalent to UTC are supported | ||||||||||||||||||||||
result | PS UTC is only supported TZ for TIMESTAMP | ||||||||||||||||||||||
FromUnixTime | from_unixtime | Get the string from a unix timestamp | None | project | sec | S | |||||||||||||||||
format | PS Only a limited number of formats are supported; Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
GetArrayItem | Gets the field at ordinal in the Array | None | project | array | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | ||||||||||||||||||
ordinal | S | ||||||||||||||||||||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS | |||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
GetArrayStructFields | Extracts the ordinal-th fields of all array elements for the data with the type of array of struct | None | project | input | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | ||||||||||||||||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | ||||||||||||||||||||||
GetJsonObject | get_json_object | Extracts a json object from path | None | project | json | S | |||||||||||||||||
path | PS Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
GetMapValue | Gets Value from a Map based on a key | None | project | map | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | ||||||||||||||||||
key | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | NS | NS | NS | NS | NS | NS | NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS | |||||
GetStructField | Gets the named field of the struct | None | project | input | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | ||||||||||||||||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS | |||||
GetTimestamp | Gets timestamps from strings using the given pattern. | None | project | timeExp | S | PS UTC is only supported TZ for TIMESTAMP | S | ||||||||||||||||
format | PS A limited number of formats are supported; Literal value only | ||||||||||||||||||||||
result | PS UTC is only supported TZ for TIMESTAMP | ||||||||||||||||||||||
GreaterThan | > | > operator | None | project | lhs | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT | NS | |
rhs | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT | NS | ||||||
result | S | ||||||||||||||||||||||
AST | lhs | S | S | S | S | S | NS | NS | S | PS UTC is only supported TZ for TIMESTAMP | NS | NS | NS | NS | NS | NS | NS | NS | |||||
rhs | S | S | S | S | S | NS | NS | S | PS UTC is only supported TZ for TIMESTAMP | NS | NS | NS | NS | NS | NS | NS | NS | ||||||
result | S | ||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
GreaterThanOrEqual | >= | >= operator | None | project | lhs | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT | NS | |
rhs | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT | NS | ||||||
result | S | ||||||||||||||||||||||
AST | lhs | S | S | S | S | S | NS | NS | S | PS UTC is only supported TZ for TIMESTAMP | NS | NS | NS | NS | NS | NS | NS | NS | |||||
rhs | S | S | S | S | S | NS | NS | S | PS UTC is only supported TZ for TIMESTAMP | NS | NS | NS | NS | NS | NS | NS | NS | ||||||
result | S | ||||||||||||||||||||||
Greatest | greatest | Returns the greatest value of all parameters, skipping null values | None | project | param | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | ||||||
Hour | hour | Returns the hour component of the string/timestamp | None | project | input | PS UTC is only supported TZ for TIMESTAMP | |||||||||||||||||
result | S | ||||||||||||||||||||||
Hypot | hypot | Pythagorean addition (Hypotenuse) of real numbers | None | project | lhs | S | |||||||||||||||||
rhs | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
If | if | IF expression | None | project | predicate | S | |||||||||||||||||
trueValue | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS | |||||
falseValue | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS | |||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
In | in | IN operator | None | project | value | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | |
list | PS Literal value only | PS Literal value only | PS Literal value only | PS Literal value only | PS Literal value only | PS Literal value only | PS Literal value only | PS Literal value only | PS UTC is only supported TZ for TIMESTAMP; Literal value only | PS Literal value only | PS Literal value only | NS | NS | NS | NS | NS | NS | ||||||
result | S | ||||||||||||||||||||||
InSet | INSET operator | None | project | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | ||
result | S | ||||||||||||||||||||||
InitCap | initcap | Returns str with the first letter of each word in uppercase. All other letters are in lowercase | This is not 100% compatible with the Spark version because the Unicode version used by cuDF and the JVM may differ, resulting in some corner-case characters not changing case correctly. | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
InputFileBlockLength | input_file_block_length | Returns the length of the block being read, or -1 if not available | None | project | result | S | |||||||||||||||||
InputFileBlockStart | input_file_block_start | Returns the start offset of the block being read, or -1 if not available | None | project | result | S | |||||||||||||||||
InputFileName | input_file_name | Returns the name of the file being read, or empty string if not available | None | project | result | S | |||||||||||||||||
IntegralDivide | div | Division with an integer result | None | project | lhs | S | S | ||||||||||||||||
rhs | S | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
IsNaN | isnan | Checks if a value is NaN | None | project | input | S | S | ||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
IsNotNull | isnotnull | Checks if a value is not null | None | project | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS |
result | S | ||||||||||||||||||||||
IsNull | isnull | Checks if a value is null | None | project | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS |
result | S | ||||||||||||||||||||||
JsonToStructs | from_json | Returns a struct value with the given jsonStr and schema | This is disabled by default because parsing JSON from a column has a large number of issues and should be considered beta quality right now. | project | jsonStr | S | |||||||||||||||||
result | NS | PS unsupported child types BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP, DECIMAL, NULL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT | NS | ||||||||||||||||||||
JsonTuple | json_tuple | Returns a tuple like the function get_json_object, but it takes multiple names. All the input parameters and output column types are string. | None | project | json | S | |||||||||||||||||
field | PS Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
KnownFloatingPointNormalized | Tag to prevent redundant normalization | None | project | input | S | S | |||||||||||||||||
result | S | S | |||||||||||||||||||||
KnownNotNull | Tag an expression as known to not be null | None | project | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | NS | S | S | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, UDT | NS | |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | NS | S | S | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, UDT | NS | |||||
Lag | lag | Window function that returns N entries behind this one | None | window | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS |
offset | S | ||||||||||||||||||||||
default | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS | |||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
LambdaFunction | Holds a higher order SQL function | None | project | function | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |
arguments | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |||||
LastDay | last_day | Returns the last day of the month which the date belongs to | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
Lead | lead | Window function that returns N entries ahead of this one | None | window | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS |
offset | S | ||||||||||||||||||||||
default | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS | |||||
Least | least | Returns the least value of all parameters, skipping null values | None | project | param | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | ||||||
Length | length, character_length, char_length | String character length or binary byte length | None | project | input | S | NS | ||||||||||||||||
result | S | ||||||||||||||||||||||
LessThan | < | < operator | None | project | lhs | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT | NS | |
rhs | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT | NS | ||||||
result | S | ||||||||||||||||||||||
AST | lhs | S | S | S | S | S | NS | NS | S | PS UTC is only supported TZ for TIMESTAMP | NS | NS | NS | NS | NS | NS | NS | NS | |||||
rhs | S | S | S | S | S | NS | NS | S | PS UTC is only supported TZ for TIMESTAMP | NS | NS | NS | NS | NS | NS | NS | NS | ||||||
result | S | ||||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
LessThanOrEqual | <= | <= operator | None | project | lhs | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT | NS | |
rhs | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT | NS | ||||||
result | S | ||||||||||||||||||||||
AST | lhs | S | S | S | S | S | NS | NS | S | PS UTC is only supported TZ for TIMESTAMP | NS | NS | NS | NS | NS | NS | NS | NS | |||||
rhs | S | S | S | S | S | NS | NS | S | PS UTC is only supported TZ for TIMESTAMP | NS | NS | NS | NS | NS | NS | NS | NS | ||||||
result | S | ||||||||||||||||||||||
Like | like | Like | None | project | src | S | |||||||||||||||||
search | PS Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Literal | Holds a static value from the query | None | project | result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | S | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS | |
AST | result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | NS | NS | NS | NS | NS | NS | NS | NS | NS | ||||
Log | ln | Natural log | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
Log10 | log10 | Log base 10 | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Log1p | log1p | Natural log of 1 + expr | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
Log2 | log2 | Log base 2 | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
Logarithm | log | Log variable base | None | project | value | S | |||||||||||||||||
base | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Lower | lower, lcase | String lowercase operator | This is not 100% compatible with the Spark version because the Unicode version used by cuDF and the JVM may differ, resulting in some corner-case characters not changing case correctly. | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
MakeDecimal | Create a Decimal from an unscaled long value for some aggregation optimizations | None | project | input | S | ||||||||||||||||||
result | PS max DECIMAL precision of 18 | ||||||||||||||||||||||
MapConcat | map_concat | Returns the union of all the given maps | None | project | input | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | |||||||||||||||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | ||||||||||||||||||||||
MapEntries | map_entries | Returns an unordered array of all entries in the given map | None | project | input | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | |||||||||||||||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | ||||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
MapFilter | map_filter | Filters entries in a map using the function | None | project | argument | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | |||||||||||||||||
function | S | ||||||||||||||||||||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | ||||||||||||||||||||||
MapKeys | map_keys | Returns an unordered array containing the keys of the map | None | project | input | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | |||||||||||||||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | ||||||||||||||||||||||
MapValues | map_values | Returns an unordered array containing the values of the map | None | project | input | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | |||||||||||||||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | ||||||||||||||||||||||
Md5 | md5 | MD5 hash operator | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
Minute | minute | Returns the minute component of the string/timestamp | None | project | input | PS UTC is only supported TZ for TIMESTAMP | |||||||||||||||||
result | S | ||||||||||||||||||||||
MonotonicallyIncreasingID | monotonically_increasing_id | Returns monotonically increasing 64-bit integers | None | project | result | S | |||||||||||||||||
Month | month | Returns the month from a date or timestamp | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
Multiply | * | Multiplication | None | project | lhs | S | S | S | S | S | S | S | |||||||||||
rhs | S | S | S | S | S | S | S | ||||||||||||||||
result | S | S | S | S | S | S | S | ||||||||||||||||
AST | lhs | NS | NS | S | S | S | S | NS | |||||||||||||||
rhs | NS | NS | S | S | S | S | NS | ||||||||||||||||
result | NS | NS | S | S | S | S | NS | ||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Murmur3Hash | hash | Murmur3 hash operator | None | project | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT | NS |
result | S | ||||||||||||||||||||||
NaNvl | nanvl | Evaluates to left iff left is not NaN, right otherwise | None | project | lhs | S | S | ||||||||||||||||
rhs | S | S | |||||||||||||||||||||
result | S | S | |||||||||||||||||||||
NamedLambdaVariable | A parameter to a higher order SQL function | None | project | result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |
Not | !, not | Boolean not operator | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
NthValue | nth_value | nth window operator | None | window | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
offset | S | ||||||||||||||||||||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |||||
OctetLength | octet_length | The byte length of string data | None | project | input | S | NS | ||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Or | or | Logical OR | None | project | lhs | S | |||||||||||||||||
rhs | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
AST | lhs | S | |||||||||||||||||||||
rhs | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
PercentRank | percent_rank | Window function that returns the percent rank value within the aggregation window | None | window | ordering | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | NS |
result | S | ||||||||||||||||||||||
Pmod | pmod | Pmod | None | project | lhs | S | S | S | S | S | S | PS decimals with precision 38 are not supported; max DECIMAL precision of 18 | |||||||||||
rhs | S | S | S | S | S | S | NS | ||||||||||||||||
result | S | S | S | S | S | S | NS | ||||||||||||||||
PosExplode | posexplode_outer, posexplode | Given an input array, produces a sequence of rows for each value in the array | None | project | input | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | ||||||||||||||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | ||||||||||||||||||||||
Pow | pow, power | lhs ^ rhs | None | project | lhs | S | |||||||||||||||||
rhs | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
AST | lhs | S | |||||||||||||||||||||
rhs | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
PreciseTimestampConversion | Expression used internally to convert the TimestampType to Long and back without losing precision, i.e. in microseconds. Used in time windowing | None | project | input | S | PS UTC is only supported TZ for TIMESTAMP | |||||||||||||||||
result | S | PS UTC is only supported TZ for TIMESTAMP | |||||||||||||||||||||
PromotePrecision | PromotePrecision before arithmetic operations between DecimalType data | None | project | input | S | ||||||||||||||||||
result | S | ||||||||||||||||||||||
PythonUDF | UDF run in an external Python process. Does not actually run on the GPU, but the transfer of data to/from it can be accelerated | None | aggregation | param | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT | NS | |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, MAP | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, MAP | |||||||
reduction | param | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT | NS | ||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, MAP | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, MAP | |||||||
window | param | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT | NS | ||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, MAP | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, MAP | |||||||
project | param | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT | NS | ||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, MAP | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, MAP | |||||||
Quarter | quarter | Returns the quarter of the year for date, in the range 1 to 4 | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
RLike | rlike | Regular expression version of Like | None | project | str | S | |||||||||||||||||
regexp | PS Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
RaiseError | raise_error | Throw an exception | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
Rand | random, rand | Generate a random column with i.i.d. uniformly distributed values in [0, 1) | None | project | seed | S | S | ||||||||||||||||
result | S | ||||||||||||||||||||||
Rank | rank | Window function that returns the rank value within the aggregation window | None | window | ordering | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | NS |
result | S | ||||||||||||||||||||||
RegExpExtract | regexp_extract | Extract a specific group identified by a regular expression | None | project | str | S | |||||||||||||||||
regexp | PS Literal value only | ||||||||||||||||||||||
idx | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
RegExpExtractAll | regexp_extract_all | Extract all strings matching a regular expression corresponding to the regex group index | None | project | str | S | |||||||||||||||||
regexp | PS Literal value only | ||||||||||||||||||||||
idx | PS Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
RegExpReplace | regexp_replace | String replace using a regular expression pattern | None | project | regex | PS Literal value only | |||||||||||||||||
result | S | ||||||||||||||||||||||
pos | PS only a value of 1 is supported | ||||||||||||||||||||||
str | S | ||||||||||||||||||||||
rep | PS Literal value only | ||||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Remainder | %, mod | Remainder or modulo | None | project | lhs | S | S | S | S | S | S | S | |||||||||||
rhs | S | S | S | S | S | S | S | ||||||||||||||||
result | S | S | S | S | S | S | S | ||||||||||||||||
ReplicateRows | Given an input row, replicates the row N times | None | project | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS | |
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | ||||||||||||||||||||||
Reverse | reverse | Returns a reversed string or an array with reverse order of elements | None | project | input | S | PS UTC is only supported TZ for child TIMESTAMP | ||||||||||||||||
result | S | PS UTC is only supported TZ for child TIMESTAMP | |||||||||||||||||||||
Rint | rint | Rounds a double value to the nearest double equal to an integer | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Round | round | Round an expression to d decimal places using HALF_UP rounding mode | None | project | value | S | S | S | S | PS result may round slightly differently | PS result may round slightly differently | S | |||||||||||
scale | S | ||||||||||||||||||||||
result | S | S | S | S | S | S | S | ||||||||||||||||
RowNumber | row_number | Window function that returns the index for the row within the aggregation window | None | window | result | S | |||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
ScalaUDF | User Defined Function, the UDF can choose to implement a RAPIDS accelerated interface to get better performance. | None | project | param | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | S | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | NS | |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | S | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | NS | |||||
Second | second | Returns the second component of the string/timestamp | None | project | input | PS UTC is only supported TZ for TIMESTAMP | |||||||||||||||||
result | S | ||||||||||||||||||||||
Sequence | sequence | Sequence | None | project | start | S | S | S | S | NS | NS | ||||||||||||
stop | S | S | S | S | NS | NS | |||||||||||||||||
step | S | S | S | S | NS | ||||||||||||||||||
result | PS unsupported child types DATE, TIMESTAMP | ||||||||||||||||||||||
ShiftLeft | shiftleft | Bitwise shift left (<<) | None | project | value | S | S | ||||||||||||||||
amount | S | ||||||||||||||||||||||
result | S | S | |||||||||||||||||||||
ShiftRight | shiftright | Bitwise shift right (>>) | None | project | value | S | S | ||||||||||||||||
amount | S | ||||||||||||||||||||||
result | S | S | |||||||||||||||||||||
ShiftRightUnsigned | shiftrightunsigned | Bitwise unsigned shift right (>>>) | None | project | value | S | S | ||||||||||||||||
amount | S | ||||||||||||||||||||||
result | S | S | |||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Signum | sign, signum | Returns -1.0, 0.0 or 1.0 as expr is negative, 0 or positive | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
Sin | sin | Sine | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Sinh | sinh | Hyperbolic sine | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Size | size, cardinality | The size of an array or a map | None | project | input | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | ||||||||||||||||
result | S | ||||||||||||||||||||||
SortArray | sort_array | Returns the input array sorted in ascending or descending order | None | project | array | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT | |||||||||||||||||
ascendingOrder | S | ||||||||||||||||||||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT | ||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
SortOrder | Sort order | None | project | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT | NS | ||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT | NS | ||||||
SparkPartitionID | spark_partition_id | Returns the current partition id | None | project | result | S | |||||||||||||||||
SpecifiedWindowFrame | Specification of the width of the group (or “frame”) of input rows around which a window function is evaluated | None | project | lower | S | S | S | S | NS | NS | S | S | |||||||||||
upper | S | S | S | S | NS | NS | S | S | |||||||||||||||
result | S | S | S | S | NS | NS | NS | S | |||||||||||||||
Sqrt | sqrt | Square root | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
StartsWith | Starts with | None | project | src | S | ||||||||||||||||||
search | PS Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
StringInstr | instr | Instr string operator | None | project | str | S | |||||||||||||||||
substr | PS Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
StringLPad | lpad | Pad a string on the left | None | project | str | S | |||||||||||||||||
len | PS Literal value only | ||||||||||||||||||||||
pad | PS Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
StringLocate | position, locate | Substring search operator | None | project | substr | PS Literal value only | |||||||||||||||||
str | S | ||||||||||||||||||||||
start | PS Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
StringRPad | rpad | Pad a string on the right | None | project | str | S | |||||||||||||||||
len | PS Literal value only | ||||||||||||||||||||||
pad | PS Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
StringRepeat | repeat | Repeats the given string the number of times given by repeatTimes | None | project | input | S | |||||||||||||||||
repeatTimes | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
StringReplace | replace | StringReplace operator | None | project | src | S | |||||||||||||||||
search | PS Literal value only | ||||||||||||||||||||||
replace | PS Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
StringSplit | split | Splits str around occurrences that match regex | None | project | str | S | |||||||||||||||||
regexp | PS very limited subset of regex supported; Literal value only | ||||||||||||||||||||||
limit | PS Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
StringToMap | str_to_map | Creates a map after splitting the input string into pairs of key-value strings | None | project | str | S | |||||||||||||||||
pairDelim | S | ||||||||||||||||||||||
keyValueDelim | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
StringTrim | trim | StringTrim operator | None | project | src | S | |||||||||||||||||
trimStr | PS Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
StringTrimLeft | ltrim | StringTrimLeft operator | None | project | src | S | |||||||||||||||||
trimStr | PS Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
StringTrimRight | rtrim | StringTrimRight operator | None | project | src | S | |||||||||||||||||
trimStr | PS Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Substring | substr, substring | Substring operator | None | project | str | S | NS | ||||||||||||||||
pos | S | ||||||||||||||||||||||
len | S | ||||||||||||||||||||||
result | S | NS | |||||||||||||||||||||
SubstringIndex | substring_index | substring_index operator | None | project | str | S | |||||||||||||||||
delim | PS only a single character is allowed; Literal value only | ||||||||||||||||||||||
count | PS Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Subtract | - | Subtraction | None | project | lhs | S | S | S | S | S | S | S | NS | ||||||||||
rhs | S | S | S | S | S | S | S | NS | |||||||||||||||
result | S | S | S | S | S | S | S | NS | |||||||||||||||
AST | lhs | NS | NS | S | S | S | S | NS | NS | ||||||||||||||
rhs | NS | NS | S | S | S | S | NS | NS | |||||||||||||||
result | NS | NS | S | S | S | S | NS | NS | |||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Tan | tan | Tangent | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Tanh | tanh | Hyperbolic tangent | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
TimeAdd | Adds interval to timestamp | None | project | start | PS UTC is only supported TZ for TIMESTAMP | ||||||||||||||||||
interval | PS month intervals are not supported; Literal value only | ||||||||||||||||||||||
result | PS UTC is only supported TZ for TIMESTAMP | ||||||||||||||||||||||
ToDegrees | degrees | Converts radians to degrees | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
ToRadians | radians | Converts degrees to radians | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
ToUnixTimestamp | to_unix_timestamp | Returns the UNIX timestamp of the given time | None | project | timeExp | S | PS UTC is only supported TZ for TIMESTAMP | S | |||||||||||||||
format | PS A limited number of formats are supported; Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
TransformKeys | transform_keys | Transform keys in a map using a transform function | None | project | argument | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | |||||||||||||||||
function | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | ||||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | ||||||||||||||||||||||
TransformValues | transform_values | Transform values in a map using a transform function | None | project | argument | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | |||||||||||||||||
function | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | ||||||||||||||||||||||
UnaryMinus | negative | Negate a numeric value | None | project | input | S | S | S | S | S | S | S | NS | ||||||||||
result | S | S | S | S | S | S | S | NS | |||||||||||||||
AST | input | NS | NS | S | S | S | S | NS | NS | ||||||||||||||
result | NS | NS | S | S | S | S | NS | NS | |||||||||||||||
UnaryPositive | positive | A numeric value with a + in front of it | None | project | input | S | S | S | S | S | S | S | NS | ||||||||||
result | S | S | S | S | S | S | S | NS | |||||||||||||||
AST | input | S | S | S | S | S | S | NS | NS | ||||||||||||||
result | S | S | S | S | S | S | NS | NS | |||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
UnboundedFollowing$ | Special boundary for a window frame, indicating all rows following the current row | None | project | result | S | ||||||||||||||||||
UnboundedPreceding$ | Special boundary for a window frame, indicating all rows preceding the current row | None | project | result | S | ||||||||||||||||||
UnixTimestamp | unix_timestamp | Returns the UNIX timestamp of current or specified time | None | project | timeExp | S | PS UTC is only supported TZ for TIMESTAMP | S | |||||||||||||||
format | PS A limited number of formats are supported; Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
UnscaledValue | Convert a Decimal to an unscaled long value for some aggregation optimizations | None | project | input | PS max DECIMAL precision of 18 | ||||||||||||||||||
result | S | ||||||||||||||||||||||
Upper | upper, ucase | String uppercase operator | This is not 100% compatible with the Spark version because the Unicode version used by cuDF and the JVM may differ, resulting in some corner-case characters not changing case correctly. | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
WeekDay | weekday | Returns the day of the week (0 = Monday, ..., 6 = Sunday) | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
WindowExpression | Calculates a return value for every input row of a table based on a group (or “window”) of rows | None | window | windowFunction | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |
windowSpec | S | S | S | S | NS | NS | PS max DECIMAL precision of 18 | S | |||||||||||||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |||||
WindowSpecDefinition | Specification of a window function, indicating the partitioning-expression, the row ordering, and the width of the window | None | project | partition | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT | NS | |
value | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT | NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT | NS | |||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Year | year | Returns the year from a date or timestamp | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AggregateExpression | Aggregate expression | None | aggregation | aggFunc | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |
filter | S | ||||||||||||||||||||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |||||
reduction | aggFunc | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | ||||
filter | S | ||||||||||||||||||||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |||||
window | aggFunc | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | ||||
filter | S | ||||||||||||||||||||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |||||
ApproximatePercentile | percentile_approx, approx_percentile | Approximate percentile | This is not 100% compatible with the Spark version because the GPU implementation of approx_percentile is not bit-for-bit compatible with Apache Spark | aggregation | input | S | S | S | S | S | S | NS | NS | S | |||||||||
percentage | S | S | |||||||||||||||||||||
accuracy | S | ||||||||||||||||||||||
result | S | S | S | S | S | S | NS | NS | S | PS unsupported child types DATE, TIMESTAMP | |||||||||||||
reduction | input | S | S | S | S | S | S | NS | NS | S | |||||||||||||
percentage | S | S | |||||||||||||||||||||
accuracy | S | ||||||||||||||||||||||
result | S | S | S | S | S | S | NS | NS | S | PS unsupported child types DATE, TIMESTAMP | |||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Average | avg, mean | Average aggregate operator | None | aggregation | input | S | S | S | S | S | S | S | |||||||||||
result | S | S | |||||||||||||||||||||
reduction | input | S | S | S | S | S | S | S | |||||||||||||||
result | S | S | |||||||||||||||||||||
window | input | S | S | S | S | S | S | S | |||||||||||||||
result | S | S | |||||||||||||||||||||
CollectList | collect_list | Collect a list of non-unique elements, not supported in reduction | None | aggregation | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | ||||||||||||||||||||||
reduction | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | ||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | ||||||||||||||||||||||
window | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | ||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | ||||||||||||||||||||||
CollectSet | collect_set | Collect a set of unique elements, not supported in reduction | None | aggregation | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS |
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | ||||||||||||||||||||||
reduction | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS | ||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | ||||||||||||||||||||||
window | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS | ||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | ||||||||||||||||||||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Count | count | Count aggregate operator | None | aggregation | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | S | PS UTC is only supported TZ for child TIMESTAMP | PS UTC is only supported TZ for child TIMESTAMP | PS UTC is only supported TZ for child TIMESTAMP | S |
result | S | ||||||||||||||||||||||
reduction | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | S | PS UTC is only supported TZ for child TIMESTAMP | PS UTC is only supported TZ for child TIMESTAMP | PS UTC is only supported TZ for child TIMESTAMP | S | ||||
result | S | ||||||||||||||||||||||
window | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | S | PS UTC is only supported TZ for child TIMESTAMP | PS UTC is only supported TZ for child TIMESTAMP | PS UTC is only supported TZ for child TIMESTAMP | S | ||||
result | S | ||||||||||||||||||||||
First | first_value, first | first aggregate operator | None | aggregation | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |||||
reduction | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | ||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |||||
window | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | ||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |||||
Last | last, last_value | last aggregate operator | None | aggregation | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |||||
reduction | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | ||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |||||
window | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | ||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Max | max | Max aggregate operator | None | aggregation | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, STRUCT, UDT | NS | |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, STRUCT, UDT | NS | ||||||
reduction | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, STRUCT, UDT | NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, STRUCT, UDT | NS | ||||||
window | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | ||||||
Min | min | Min aggregate operator | None | aggregation | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, STRUCT, UDT | NS | |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, STRUCT, UDT | NS | ||||||
reduction | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, STRUCT, UDT | NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, STRUCT, UDT | NS | ||||||
window | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | ||||||
PivotFirst | PivotFirst operator | None | aggregation | pivotColumn | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | NS | |
valueColumn | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT | NS | NS | NS | |||||
reduction | pivotColumn | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | NS | ||||
valueColumn | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT | NS | NS | NS | |||||
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
StddevPop | stddev_pop | Aggregation computing population standard deviation | None | reduction | input | NS | |||||||||||||||||
result | NS | ||||||||||||||||||||||
aggregation | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
window | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
StddevSamp | stddev_samp, std, stddev | Aggregation computing sample standard deviation | None | reduction | input | NS | |||||||||||||||||
result | NS | ||||||||||||||||||||||
aggregation | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
window | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Sum | sum | Sum aggregate operator | None | aggregation | input | S | S | S | S | S | S | S | |||||||||||
result | S | S | S | ||||||||||||||||||||
reduction | input | S | S | S | S | S | S | S | |||||||||||||||
result | S | S | S | ||||||||||||||||||||
window | input | S | S | S | S | S | S | S | |||||||||||||||
result | S | S | S | ||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
VariancePop | var_pop | Aggregation computing population variance | None | reduction | input | NS | |||||||||||||||||
result | NS | ||||||||||||||||||||||
aggregation | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
window | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
VarianceSamp | var_samp, variance | Aggregation computing sample variance | None | reduction | input | NS | |||||||||||||||||
result | NS | ||||||||||||||||||||||
aggregation | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
window | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
NormalizeNaNAndZero | Normalize NaN and zero | None | project | input | S | S | |||||||||||||||||
result | S | S | |||||||||||||||||||||
ScalarSubquery | Subquery that will return only one row and one column | None | project | result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
HiveGenericUDF | Hive Generic UDF, the UDF can choose to implement a RAPIDS accelerated interface to get better performance | None | project | param | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | S | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | NS | |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | S | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | NS | |||||
HiveSimpleUDF | Hive UDF, the UDF can choose to implement a RAPIDS accelerated interface to get better performance | None | project | param | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | S | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | NS | |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | S | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | NS |
The table above does not show what is and is not supported for cast. The tables below show the matrix of supported casts. Nested types like MAP, STRUCT, and ARRAY can only be cast if their child types can be cast.
Some casts to or from STRING on the GPU do not produce exactly the same results as on the CPU and are disabled by default. Please see the configs for more details on these specific cases.
Please note that even though Spark supports casting from one type to another, not every cast produces usable results. For example, casting from a date to a boolean always produces a null. This is for Hive compatibility, and the accelerator produces the same result.
AnsiCast
TO |
|||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT | ||
FROM | BOOLEAN | S | S | S | S | S | S | S | S | S | |||||||||
BYTE | S | S | S | S | S | S | S | S | S | ||||||||||
SHORT | S | S | S | S | S | S | S | S | S | ||||||||||
INT | S | S | S | S | S | S | S | S | S | ||||||||||
LONG | S | S | S | S | S | S | S | S | S | ||||||||||
FLOAT | S | S | S | S | S | S | S | PS Conversion may produce different results and requires spark.rapids.sql.castFloatToString.enabled to be true. | S | ||||||||||
DOUBLE | S | S | S | S | S | S | S | PS Conversion may produce different results and requires spark.rapids.sql.castFloatToString.enabled to be true. | S | ||||||||||
DATE | S | PS UTC is only supported TZ for TIMESTAMP | S | ||||||||||||||||
TIMESTAMP | S | PS UTC is only supported TZ for TIMESTAMP | S | ||||||||||||||||
STRING | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | ||||||
DECIMAL | NS | S | S | S | S | S | S | S | S | ||||||||||
NULL | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | NS | |
BINARY | S | S | |||||||||||||||||
CALENDAR | NS | NS | |||||||||||||||||
ARRAY | PS The array’s child type must also support being cast to the desired child type; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, MAP, UDT | ||||||||||||||||||
MAP | PS the map’s key and value must also support being cast to the desired child types; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | ||||||||||||||||||
STRUCT | PS the struct’s children must also support being cast to the desired child type(s); UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, MAP, UDT | ||||||||||||||||||
UDT | NS |
Cast
TO |
|||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT | ||
FROM | BOOLEAN | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | ||||||||
BYTE | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | ||||||||
SHORT | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | ||||||||
INT | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | ||||||||
LONG | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | ||||||||
FLOAT | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | PS Conversion may produce different results and requires spark.rapids.sql.castFloatToString.enabled to be true. | S | |||||||||
DOUBLE | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | PS Conversion may produce different results and requires spark.rapids.sql.castFloatToString.enabled to be true. | S | |||||||||
DATE | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | NS | ||||||||
TIMESTAMP | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | NS | ||||||||
STRING | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | ||||||
DECIMAL | NS | S | S | S | S | S | S | NS | S | S | |||||||||
NULL | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | NS | |
BINARY | S | S | |||||||||||||||||
CALENDAR | NS | NS | |||||||||||||||||
ARRAY | PS the array’s child type must also support being cast to string | PS The array’s child type must also support being cast to the desired child type(s); UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | |||||||||||||||||
MAP | PS the map’s key and value must also support being cast to string | PS the map’s key and value must also support being cast to the desired child types; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | |||||||||||||||||
STRUCT | PS the struct’s children must also support being cast to string | PS the struct’s children must also support being cast to the desired child type(s); UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | |||||||||||||||||
UDT | NS | NS |
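The rule that nested types can only be cast when their child types can be cast is naturally recursive. The following is a minimal Python sketch of that check, not the accelerator's implementation. The SCALAR_CASTS set is a hypothetical, heavily reduced stand-in for the cast tables above, and the tuple encoding of nested types and the can_cast name are assumptions made purely for illustration.

```python
# Hypothetical, heavily reduced support matrix for illustration only;
# the authoritative rules are the AnsiCast and Cast tables above.
SCALAR_CASTS = {
    ("INT", "LONG"),
    ("INT", "STRING"),
    ("LONG", "STRING"),
    ("STRING", "INT"),
}


def can_cast(src, dst):
    """Return True if src can be cast to dst under the toy matrix.

    Scalar types are plain strings such as "INT". Nested types are tuples:
    ("ARRAY", child), ("MAP", key, value), ("STRUCT", (field, ...)).
    """
    if isinstance(src, str) and isinstance(dst, str):
        return src == dst or (src, dst) in SCALAR_CASTS
    if isinstance(src, str) or isinstance(dst, str):
        # Casts between nested and scalar types are out of scope for this sketch.
        return False
    if src[0] != dst[0]:
        return False
    kind = src[0]
    if kind == "ARRAY":
        return can_cast(src[1], dst[1])
    if kind == "MAP":
        # Both the key and the value must be castable.
        return can_cast(src[1], dst[1]) and can_cast(src[2], dst[2])
    if kind == "STRUCT":
        # Every field must be castable, position by position.
        return len(src[1]) == len(dst[1]) and all(
            can_cast(s, d) for s, d in zip(src[1], dst[1])
        )
    return False
```

In this encoding, can_cast(("ARRAY", "INT"), ("ARRAY", "LONG")) succeeds only because the toy matrix allows INT to LONG. Removing that entry would make the array cast fail as well, which mirrors the child-type dependency noted in the PS cells of the tables.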
When transferring data between different tasks, the data is partitioned in specific ways depending on requirements in the plan. Be aware that the types included below are only for the columns that determine where the data is partitioned. For example, if we are doing a join on the column a, the data would be hash partitioned on a, but all of the other columns in the same data frame as a don’t show up in the table. They are controlled by the rules for ShuffleExchangeExec, which uses the Partitioning.
Partition | Description | Notes | Param | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
HashPartitioning | Hash based partitioning | None | hash_key | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT | NS |
RangePartitioning | Range partitioning | None | order_key | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT | NS | |
RoundRobinPartitioning | Round robin partitioning | None | |||||||||||||||||||
SinglePartition$ | Single partitioning | None |
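To illustrate why the partitioning table only lists types for the key column, here is a small Python sketch of hash partitioning. It is an illustration under stated assumptions, not Spark's or the accelerator's code: the hash_partition helper is hypothetical, rows are modeled as plain dicts, and Python's built-in hash() stands in for Spark's hash function.

```python
def hash_partition(rows, key, num_partitions):
    """Assign each row (a dict) to a partition using only the key column."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # Only row[key] influences the target partition; the other columns
        # are carried along without being inspected or hashed.
        partitions[hash(row[key]) % num_partitions].append(row)
    return partitions


rows = [
    {"a": 1, "payload": [1, 2, 3]},    # array-like payload
    {"a": 2, "payload": {"k": "v"}},   # map-like payload
    {"a": 1, "payload": None},         # null payload
]
parts = hash_partition(rows, key="a", num_partitions=4)
```

The payload column holds values that could not serve as a hash key in this sketch, yet the rows are still partitioned without issue because only column a is hashed. Whether those other columns can move through the shuffle at all is a separate question, governed by the rules for ShuffleExchangeExec.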
For input and output, it is not cleanly exposed which types are supported and which are not. This table tries to clarify that. Be aware that some types may be disabled in some cases for either reads or writes because of processing limitations, like rebasing dates or timestamps, or for a lack of type coercion support.
Format | Direction | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Avro | Read | S | S | S | S | S | S | S | NS | NS | S | NS | NS | NS | NS | NS | NS | ||
Write | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||
CSV | Read | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | NS | ||||||
Write | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||||
Delta | Read | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | NS | ||
Write | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | NS | |||
HiveText | Read | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | NS | NS | NS | NS | NS | NS | NS |
Write | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | NS | NS | NS | NS | NS | NS | NS | |
Iceberg | Read | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | NS | ||
Write | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||
JSON | Read | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | NS | NS | NS | NS | NS | ||
Write | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||
ORC | Read | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, UDT | NS | ||
Write | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, MAP, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, MAP, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, MAP, UDT | NS | |||
Parquet | Read | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | NS | ||
Write | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | NS |
Support for Apache Iceberg has additional limitations. See the Apache Iceberg Support document.