Examples
Please visit the spark-rapids-examples repo for ETL, ML/DL, and UDF-related examples using the RAPIDS Accelerator for Apache Spark. It includes Scala/Python source code and related notebooks for the different examples.
Please visit the spark-rapids-benchmarks repo for Spark-related benchmark sets and utilities using the RAPIDS Accelerator for Apache Spark.
CLI Samples
This section shows sample Profiling CLI commands, assuming the following inputs:
CLUSTER_NAME
: The GPU cluster name on the CSP (Dataproc, Databricks, or EMR).

PROP_FILE
: Path to a GPU cluster property file. The path can be on a local filesystem, HDFS, S3, ABFS, or GCS. The file can be formatted according to the gcloud specs (DATAPROC_PROP) or the EMR specs (EMR_PROP).

EVENTLOG
: Path to Spark event logs without the scheme part. The scheme can refer to a local filesystem, HDFS, S3, ABFS, or GCS.
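The inputs above can live on several storage systems. The sketch below shows one way a location could be mapped to those storage kinds, assuming each location is written as a URI with a scheme prefix (`gs://`, `s3a://`, `abfss://`, `hdfs://`, or none for a local path). The helper name `storage_kind` and the scheme spellings are illustrative assumptions, not the Profiling tool's own implementation.

```python
from urllib.parse import urlparse

# Hypothetical mapping from URI scheme to the storage kinds listed above.
# The scheme spellings are common Hadoop/cloud conventions, assumed here.
_SCHEME_TO_STORAGE = {
    "": "local",      # plain path such as /tmp/eventlogs
    "file": "local",
    "hdfs": "hdfs",
    "s3": "s3",
    "s3a": "s3",
    "abfs": "abfs",
    "abfss": "abfs",
    "gs": "gcs",
}


def storage_kind(path: str) -> str:
    """Return 'local', 'hdfs', 's3', 'abfs', or 'gcs' for a path or URI."""
    # urlparse lowercases the scheme and returns '' when there is none.
    scheme = urlparse(path).scheme
    try:
        return _SCHEME_TO_STORAGE[scheme]
    except KeyError:
        raise ValueError(f"unsupported storage scheme: {scheme!r}") from None
```

For example, `storage_kind("gs://my-bucket/eventlogs")` returns `"gcs"`, while a bare local path such as `/tmp/eventlogs` returns `"local"`.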
The following table shows sample CLI commands, along with the expected functionality and the platform on which the analysis is based.
| CMD | Platform | Auto-Tuner | Comments |
|---|---|---|---|
|  | Dataproc | ☑️ | Auto-Tuner recommendations are based on an accelerated Dataproc cluster because EVENTLOG is stored on GCS. |
|  | Dataproc | ☑️ | Auto-Tuner recommendations are based on an accelerated Dataproc cluster because the cluster argument points to a property file matching the Dataproc specs. |
|  | On-prem |  | Recommendations cannot be generated without a cluster argument when EVENTLOG is stored on a local filesystem. |
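The three rows above amount to a small decision rule for choosing the platform and whether Auto-Tuner recommendations are produced. The sketch below encodes only those three rows. The class and function names are hypothetical, the storage labels (`"gcs"`, `"local"`) are assumed, and giving an explicit cluster argument precedence over the event-log location is an assumption the table does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProfilingInputs:
    # Hypothetical model of the CLI inputs; labels are assumptions.
    eventlog_storage: str               # e.g. "gcs" or "local"
    cluster_spec: Optional[str] = None  # e.g. "dataproc" when a cluster argument or PROP_FILE is given


@dataclass
class Analysis:
    platform: str
    auto_tuner: bool


def infer_analysis(inputs: ProfilingInputs) -> Analysis:
    # Row 2: a cluster argument matching a platform's specs selects that platform.
    # (Assumed here to take precedence over where the event logs are stored.)
    if inputs.cluster_spec is not None:
        return Analysis(platform=inputs.cluster_spec, auto_tuner=True)
    # Row 1: event logs on GCS imply an accelerated Dataproc cluster.
    if inputs.eventlog_storage == "gcs":
        return Analysis(platform="dataproc", auto_tuner=True)
    # Row 3: no cluster argument and local event logs means on-prem, no recommendations.
    if inputs.eventlog_storage == "local":
        return Analysis(platform="onprem", auto_tuner=False)
    # Anything else is outside the sample table, so the sketch refuses to guess.
    raise ValueError("combination not covered by the sample table")
```

Raising on uncovered combinations keeps the sketch from implying behavior (for example, for S3 event logs) that the table does not describe.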