Examples
Please visit the spark-rapids-examples repo for ETL, ML/DL, and UDF-related examples using the RAPIDS Accelerator for Apache Spark. It includes Scala and Python source code and related notebooks for the different examples.
Benchmarks
Please visit the spark-rapids-benchmarks repo for Spark-related benchmark sets and utilities using the RAPIDS Accelerator for Apache Spark.
Profiling Tool
CLI Samples
This section shows sample Profiling CLI commands, assuming the following inputs:
CLUSTER_NAME
: The GPU cluster name on the CSP (Dataproc, Databricks, or EMR).
PROP_FILE
: Path to a GPU cluster property file. The path can be on a local filesystem, HDFS, S3, ABFS, or GCS. The file can be formatted according to the gcloud specs (DATAPROC_PROP) or the EMR specs (EMR_PROP).
EVENTLOG
: Path to Spark event logs without the scheme part. The scheme can be a local filesystem, HDFS, S3, ABFS, or GCS.
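As a minimal sketch, assuming a POSIX shell, the inputs above could be defined as shell variables before running the commands. Every value below is a hypothetical placeholder (the cluster name, the property file path, and the bucket are not taken from this guide); substitute your own.

```shell
# Hypothetical placeholder values, for illustration only.
CLUSTER_NAME="my-gpu-cluster"                     # GPU cluster name on the CSP
DATAPROC_PROP="/tmp/dataproc-cluster-props.yaml"  # cluster property file (gcloud format)
EVENTLOG="my-bucket/spark-eventlogs"              # event log path WITHOUT the scheme

# The scheme is prepended when the path is passed to the tool.
EVENTLOG_URI="gs://$EVENTLOG"
printf '%s\n' "$EVENTLOG_URI"
```

Keeping the scheme out of `EVENTLOG` lets the same path variable be reused with whichever storage prefix applies.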
The following table shows sample CLI commands along with the expected functionality and the platform on which the analysis is based.
CMD | Platform | Auto-Tuner | Comments |
---|---|---|---|
spark_rapids profiling --cluster $CLUSTER_NAME --eventlogs gs://$EVENTLOG | Dataproc | ☑️ | Auto-Tuner recommendations are based on accelerated Dataproc cluster because EVENTLOG is stored on GCS |
spark_rapids profiling --cluster $DATAPROC_PROP --eventlogs file://$EVENTLOG | Dataproc | ☑️ | Auto-Tuner recommendations are based on accelerated Dataproc cluster because |
spark_rapids profiling --eventlogs file://$EVENTLOG | On-prem | | The recommendations cannot be generated without cluster argument while EVENTLOG is stored on a local filesystem |
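The choice between the command forms in the table can be sketched as a small shell wrapper. `build_profiling_cmd` is a hypothetical helper, not part of `spark_rapids`; it only prints the command line it would run and uses only the `--cluster` and `--eventlogs` arguments shown above.

```shell
# Hypothetical helper (not part of spark_rapids): prints the profiling
# command for the given inputs instead of running it.
#   $1: event log URI including its scheme, e.g. gs://bucket/logs
#   $2: optional cluster name or cluster property file path
build_profiling_cmd() {
  eventlog_uri="$1"
  cluster="${2:-}"
  if [ -n "$cluster" ]; then
    # With a cluster argument, the analysis is based on that cluster's platform.
    printf 'spark_rapids profiling --cluster %s --eventlogs %s\n' \
      "$cluster" "$eventlog_uri"
  else
    # Without a cluster argument, only the event logs are passed.
    printf 'spark_rapids profiling --eventlogs %s\n' "$eventlog_uri"
  fi
}

# Example calls with hypothetical inputs.
build_profiling_cmd "gs://my-bucket/spark-eventlogs" "my-gpu-cluster"
build_profiling_cmd "file:///tmp/spark-eventlogs"
```

Printing the command rather than executing it makes it easy to review which form will be used before launching an analysis.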