Glossary
- ABFS
- cuDF
- DBFS
- ETL
- GCS
- MIG
- RDMA
- SparkPlan
- UCX
- UDF
Azure Blob File System (ABFS) is the driver used to access Azure Data Lake Storage Gen2; abfs (or abfss for TLS-secured connections) is its URI scheme identifier.
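As a minimal sketch of what the scheme identifier looks like in practice, the snippet below splits an ABFS URI into its parts using only the Python standard library. The helper name parse_abfs_uri and the container, account, and path values are hypothetical and chosen for illustration; the URI shape (container@account-host/path) follows the usual ABFS form.

```python
from urllib.parse import urlsplit


def parse_abfs_uri(uri):
    """Split an ABFS URI into scheme, container, account host, and path.

    Hypothetical helper for illustration; not part of any Azure or Spark API.
    """
    parts = urlsplit(uri)
    if parts.scheme not in ("abfs", "abfss"):
        raise ValueError(f"not an ABFS URI: {uri!r}")
    if not parts.username or not parts.hostname:
        raise ValueError(f"expected <container>@<account-host> in: {uri!r}")
    return {
        "scheme": parts.scheme,
        "secure": parts.scheme == "abfss",  # abfss means TLS is used
        "container": parts.username,        # the part before '@'
        "account_host": parts.hostname,     # the part after '@'
        "path": parts.path,
    }


# Made-up container, account, and path, for illustration only.
example = parse_abfs_uri(
    "abfss://data@myaccount.dfs.core.windows.net/raw/events.parquet"
)
```

Here example["container"] is "data" and example["account_host"] is "myaccount.dfs.core.windows.net"; a URI with any other scheme raises ValueError.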
cuDF is a Python GPU DataFrame library (built on the Apache Arrow columnar memory format) for loading, joining, aggregating, filtering, and otherwise manipulating data. cuDF also provides a pandas-like API that will be familiar to data engineers and data scientists, so they can use it to accelerate their workflows without going into the details of CUDA programming.
Databricks File System (DBFS) is a distributed file system mounted into a Databricks workspace and available on Databricks clusters. DBFS is an abstraction on top of scalable object storage that maps Unix-like filesystem calls to native cloud storage API calls.
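Extract, Transform, Load (ETL) is the process of extracting data from source systems, transforming it into the required shape, and loading it into a target data store.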
Google Cloud Storage (GCS) is a service for storing objects in Google Cloud. An object is an immutable piece of data consisting of a file of any format.
Multi-Instance GPU (MIG) expands the performance and value of NVIDIA H100, A100, and A30 Tensor Core GPUs. MIG can partition the GPU into as many as seven instances, each fully isolated with its own high-bandwidth memory, cache, and compute cores.
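Remote Direct Memory Access (RDMA) is direct access from the memory of one computer to the memory of another without involving either computer's operating system.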
SparkPlan is an extension of the QueryPlan abstraction for physical operators that can be executed (to generate RDD[InternalRow] that Spark can execute).
Unified Communication X (UCX) is an optimized point-to-point communication framework.
User-Defined Functions (UDFs) are user-programmable routines that act on one row (see the Spark UDFs documentation).
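To illustrate the "acts on one row" idea, here is a minimal plain-Python sketch. It deliberately does not use the Spark API; the names to_celsius and apply_udf and the sample rows are hypothetical. The point is only that the user-supplied function sees one row's value at a time and never the whole dataset.

```python
def to_celsius(fahrenheit):
    """User-defined routine: converts a single value from one row."""
    return round((fahrenheit - 32) * 5 / 9, 1)


def apply_udf(rows, udf, in_col, out_col):
    """Call udf once per row, independently of every other row.

    Stand-in for what an engine does with a row-wise UDF; assumption for
    illustration, not Spark's actual execution path.
    """
    return [{**row, out_col: udf(row[in_col])} for row in rows]


# Made-up sample data.
rows = [
    {"city": "Oslo", "temp_f": 41.0},
    {"city": "Cairo", "temp_f": 95.0},
]

result = apply_udf(rows, to_celsius, "temp_f", "temp_c")
```

Because each call depends only on its own row, the rows can be processed in any order or in parallel, which is what allows an engine to distribute UDF evaluation across partitions.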