Glossary

ABFS

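Azure Blob File System (ABFS) is the Hadoop-compatible driver for Azure Data Lake Storage Gen2. Paths on that storage use the abfs:// scheme, or abfss:// when the connection is secured with TLS.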
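
To show what the scheme identifier looks like in practice, here is a minimal Python sketch that splits an ABFS URI into its parts. It assumes the commonly documented layout abfs[s]://<file_system>@<account>.dfs.core.windows.net/<path>. The helper name parse_abfs_uri and the account, container, and file names are hypothetical and exist only for illustration.

```python
from urllib.parse import urlparse


def parse_abfs_uri(uri: str) -> dict:
    """Split an ABFS URI into file system (container), account host, and path.

    Hypothetical helper for illustration; not part of any Azure or Spark API.
    """
    parts = urlparse(uri)
    if parts.scheme not in ("abfs", "abfss"):
        raise ValueError(f"not an ABFS URI: {uri!r}")
    if not parts.username or not parts.hostname:
        raise ValueError(f"expected <file_system>@<account host> in: {uri!r}")
    return {
        "secure": parts.scheme == "abfss",   # abfss means TLS is used
        "file_system": parts.username,       # the part before '@'
        "account_host": parts.hostname,      # the part after '@'
        "path": parts.path.lstrip("/"),      # object path inside the file system
    }


# Made-up example URI: container "raw" in storage account "examplestore".
info = parse_abfs_uri(
    "abfss://raw@examplestore.dfs.core.windows.net/sales/2024/part-0000.parquet"
)
print(info)
```

The standard library's URL parser treats the text before the @ sign as the user name, which happens to line up with the file system (container) position in an ABFS URI, so no custom string splitting is needed.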

cuDF

cuDF is a Python GPU DataFrame library (built on the Apache Arrow columnar memory format) for loading, joining, aggregating, filtering, and otherwise manipulating data. cuDF also provides a pandas-like API that is familiar to data engineers and data scientists, so they can accelerate their workflows without learning the details of CUDA programming.

DBFS

Databricks File System (DBFS) is a distributed file system mounted into a Databricks workspace and available on Databricks clusters. DBFS is an abstraction on top of scalable object storage that maps Unix-like filesystem calls to native cloud storage API calls.

ETL

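Extract, Transform, Load (ETL) is a data pipeline pattern that extracts data from source systems, transforms it into the required shape, and loads it into a target store.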

GCS

Google Cloud Storage (GCS) is a service for storing objects in Google Cloud. An object is an immutable piece of data consisting of a file of any format.

MIG

Multi-Instance GPU (MIG) partitions a supported NVIDIA GPU, such as the H100, A100, or A30 Tensor Core GPUs, into separate GPU instances. MIG can create as many as seven instances (the limit depends on the GPU model), each fully isolated with its own high-bandwidth memory, cache, and compute cores.

RDMA

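Remote direct memory access (RDMA) lets one machine read or write the memory of another machine over the network without involving the remote host's CPU in the transfer.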

SparkPlan

SparkPlan is the extension of Spark SQL's QueryPlan abstraction for physical operators. Executing a SparkPlan produces an RDD[InternalRow] that Spark can run.

UCX

Unified Communication X (UCX) is an optimized point-to-point communication framework.

UDF

User-Defined Functions (UDFs) are user-programmable routines that act on one row (see the Spark UDFs documentation).

© Copyright 2023-2024, NVIDIA. Last updated on Feb 5, 2024.