RAPIDS Accelerator for Apache Spark - User Guide (24.08.01)

Glossary

ABFS

Azure Blob File System (ABFS) is the scheme identifier for Azure Data Lake Storage Gen2.
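
As a rough illustration of how the scheme appears in practice, ABFS locations are commonly written as `abfs[s]://<container>@<account>.dfs.core.windows.net/<path>`. The sketch below splits such a URI into its parts using only the Python standard library. The helper name `parse_abfs_uri` and the container, account, and file names are hypothetical and chosen only for this example.

```python
from urllib.parse import urlsplit


def parse_abfs_uri(uri):
    """Split an ABFS URI into scheme, container, account host, and path.

    Hypothetical helper for illustration; it is not part of any Azure or
    RAPIDS library.
    """
    parts = urlsplit(uri)
    if parts.scheme not in ("abfs", "abfss"):
        raise ValueError(f"not an ABFS URI: {uri!r}")
    return {
        "scheme": parts.scheme,          # "abfs" or "abfss" (TLS)
        "container": parts.username,     # the text before "@"
        "account_host": parts.hostname,  # the text after "@"
        "path": parts.path,              # object path inside the container
    }


# Made-up container and account names.
info = parse_abfs_uri(
    "abfss://mycontainer@myaccount.dfs.core.windows.net/data/events.parquet"
)
print(info)
```

The same string is what would typically be passed as a path to a Spark reader or writer, assuming the cluster is configured with credentials for the storage account.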

Auto-Tuner

The Auto-Tuner module optimizes Apache Spark applications by recommending a set of configurations that improve the performance of the RAPIDS Accelerator.

cuDF

cuDF is a Python GPU DataFrame library (built on the Apache Arrow columnar memory format) for loading, joining, aggregating, filtering, and otherwise manipulating data. cuDF also provides a pandas-like API that will be familiar to data engineers and data scientists, so they can accelerate their workflows without going into the details of CUDA programming.

CSP

A Cloud Service Provider (CSP) is a company that provides on-demand, scalable computing resources such as computing power, data storage, or applications. Examples of CSPs include Google Cloud, Microsoft Azure, Amazon Web Services (AWS), and Oracle.

DBFS

Databricks File System (DBFS) is a distributed file system mounted into a Databricks workspace and available on Databricks clusters. DBFS is an abstraction on top of scalable object storage that maps Unix-like filesystem calls to native cloud storage API calls.

ETL

Extract, Transform, Load

GCS

Google Cloud Storage (GCS) is a service for storing objects in Google Cloud. An object is an immutable piece of data consisting of a file of any format.

MIG

Multi-Instance GPU (MIG) expands the performance and value of NVIDIA H100, A100, and A30 Tensor Core GPUs. MIG can partition the GPU into as many as seven instances, each fully isolated with its own high-bandwidth memory, cache, and compute cores.

RDD

A resilient distributed dataset (RDD) is a collection of elements partitioned across the nodes of the cluster that can be operated on in parallel. Users may choose to persist an RDD in memory, allowing it to be reused efficiently across parallel operations. RDDs also recover automatically from node failures.
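
To make the idea of "partitioned and operated on in parallel" concrete, here is a minimal plain-Python sketch. It does not use Spark or the RDD API: the functions `partition`, `sum_of_squares`, and `parallel_sum_of_squares` are hypothetical names for this illustration, and a thread pool stands in for the cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(elements, num_partitions):
    """Split a list into num_partitions contiguous, roughly equal chunks."""
    size, extra = divmod(len(elements), num_partitions)
    chunks, start = [], 0
    for i in range(num_partitions):
        # The first `extra` chunks each take one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(elements[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """Per-partition work: runs independently on each chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(elements, num_partitions=4):
    """Process each partition in parallel, then combine the partial results."""
    chunks = partition(elements, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)


data = list(range(1, 11))
total = parallel_sum_of_squares(data, num_partitions=3)
print(total)
```

A real RDD additionally tracks how each partition was derived, which is what lets Spark recompute a lost partition after a node failure; this sketch leaves that part out.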

RDMA

Remote direct memory access

SparkPlan

SparkPlan is an extension of the QueryPlan abstraction for physical operators, which can be executed to generate an RDD[InternalRow].

UCX

Unified Communication X (UCX) is an optimized point-to-point communication framework.

UDF

User-Defined Functions (UDFs) are user-programmable routines that act on one row (see the Spark UDFs documentation).

© Copyright 2024, NVIDIA. Last updated on Aug 29, 2024.