Examples#
Please visit the spark-rapids-examples repo for ETL, ML/DL, and UDF examples that use the RAPIDS Accelerator for Apache Spark. It includes Scala and Python source code and accompanying notebooks for the different examples.
Benchmarks#
Please visit the spark-rapids-benchmarks repo for Spark benchmark sets and utilities that use the RAPIDS Accelerator for Apache Spark.
Profiling Tool#
CLI Samples#
This section shows sample Profiling CLI commands, assuming the following inputs:
CLUSTER_NAME
: The GPU cluster name on the CSP (Dataproc, Databricks, or EMR).
PROP_FILE
: Path to a GPU cluster property file. The path can be on a local filesystem, HDFS, S3, ABFS, or GCS. The file can be formatted according to the gcloud specs (DATAPROC_PROP) or EMR (EMR_PROP).
EVENTLOG
: Path to Spark event logs without the scheme part. The scheme can refer to a local filesystem, HDFS, S3, ABFS, or GCS.
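Because EVENTLOG is given without its scheme, the scheme is prepended at the point where the variable is used. The following is a minimal sketch of that convention, not part of the tool itself: the values assigned to the variables are placeholders, and `with_scheme` is a hypothetical helper introduced only for illustration.

```shell
#!/bin/sh
# Placeholder values (assumptions for illustration); replace with your own.
CLUSTER_NAME="my-gpu-cluster"
EVENTLOG="my-bucket/spark-eventlogs"   # no gs://, s3://, or file:// prefix

# Hypothetical helper: prepend a scheme to a scheme-less path.
# $1 = scheme (for example gs, s3, file), $2 = path without the scheme
with_scheme() {
    printf '%s://%s\n' "$1" "$2"
}

# Prints the event log location as it would be passed on the command line.
with_scheme gs "$EVENTLOG"
```

Keeping the scheme out of the variable lets the same path variable be reused in commands that differ only in where the logs are stored.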
The following table shows sample CLI commands along with the expected functionality and the platform on which the analysis is based.
CMD | Platform | Auto-Tuner | Comments |
---|---|---|---|
`spark_rapids profiling --cluster $CLUSTER_NAME --eventlogs gs://$EVENTLOG` | Dataproc | ☑️ | Auto-Tuner recommendations are based on accelerated Dataproc cluster because EVENTLOG is stored on GCS |
`spark_rapids profiling --cluster $DATAPROC_PROP --eventlogs file://$EVENTLOG` | Dataproc | ☑️ | Auto-Tuner recommendations are based on accelerated Dataproc cluster because |
`spark_rapids profiling --eventlogs file://$EVENTLOG` | On-prem | | The recommendations can’t be generated without cluster argument while EVENTLOG is stored on a local filesystem |
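The commands in the table differ only in whether a cluster argument is supplied and in the scheme of the event log location. As a rough sketch, a wrapper could assemble the command line from those two inputs and print it for review before running it. The function name `build_profiling_cmd` and the sample values are assumptions made for illustration; only the `spark_rapids profiling` subcommand and the `--cluster` and `--eventlogs` options come from the table above.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a profiling command line.
# $1 = event log location including its scheme
# $2 = optional cluster name or cluster property file
build_profiling_cmd() {
    if [ -n "${2:-}" ]; then
        printf 'spark_rapids profiling --cluster %s --eventlogs %s\n' "$2" "$1"
    else
        # Per the table, omitting the cluster argument with event logs on a
        # local filesystem means recommendations cannot be generated.
        printf 'spark_rapids profiling --eventlogs %s\n' "$1"
    fi
}

# Placeholder inputs (assumptions); substitute real values.
build_profiling_cmd "gs://my-bucket/spark-eventlogs" "my-gpu-cluster"
build_profiling_cmd "file:///tmp/spark-eventlogs"
```

Printing the command rather than executing it keeps the sketch safe to run on a machine where the tool is not installed.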