Date: 2024-10-23
Examples#
Please visit the spark-rapids-examples repo for ETL, ML/DL, and UDF-related examples using the RAPIDS Accelerator for Apache Spark. It includes Scala/Python source code and related notebooks for the different examples.
Benchmarks#
Please visit the spark-rapids-benchmarks repo for Spark-related benchmark sets and utilities using the RAPIDS Accelerator for Apache Spark.
Profiling Tool#
CLI Samples#
This section shows sample Profiling CLI commands, assuming the following inputs:
CLUSTER_NAME: The GPU cluster name on the CSP (Dataproc, Databricks, or EMR)
PROP_FILE: Path to a GPU cluster property file. The path can be on a local filesystem, HDFS, S3, ABFS, or GCS. The file can be formatted according to gcloud specs (DATAPROC_PROP) or EMR (EMR_PROP)
EVENTLOG: Path to Spark event logs without the scheme part. The scheme can be a local filesystem, HDFS, S3, ABFS, or GCS.
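As a minimal sketch of how these inputs might be set in a shell session, the snippet below assigns placeholder values and shows the scheme being prepended only at the point where the path is used. The cluster name and bucket path are hypothetical and do not refer to real resources.

```shell
# Placeholder values (assumptions for illustration only).
CLUSTER_NAME="my-gpu-cluster"
EVENTLOG="my-bucket/spark-eventlogs"   # path without the scheme part

# The scheme (here gs://) is added when the path is handed to the CLI.
EVENTLOG_URI="gs://$EVENTLOG"
echo "$EVENTLOG_URI"
```

Keeping EVENTLOG scheme-free makes it easy to reuse the same variable with a different prefix, such as file:// for a local path.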
The following table shows sample CLI commands, the platform on which each analysis is based, and whether Auto-Tuner recommendations are generated.
| CMD | Platform | Auto-Tuner | Comments |
|---|---|---|---|
| spark_rapids profiling --cluster $CLUSTER_NAME --eventlogs gs://$EVENTLOG | Dataproc | ☑️ | Auto-Tuner recommendations are based on accelerated Dataproc cluster because EVENTLOG is stored on GCS |
| spark_rapids profiling --cluster $DATAPROC_PROP --eventlogs file://$EVENTLOG | Dataproc | ☑️ | Auto-Tuner recommendations are based on accelerated Dataproc cluster because |
| spark_rapids profiling --eventlogs file://$EVENTLOG | On-prem | | The recommendations can’t be generated without cluster argument while EVENTLOG is stored on a local filesystem |
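The table's variations differ only in whether a cluster argument is supplied and which scheme prefixes the event log path. A rough sketch of a wrapper that assembles the command accordingly is shown below. The function name build_profiling_cmd is hypothetical, it uses only the --cluster and --eventlogs options that appear in the table, and it prints the command rather than running it.

```shell
# Hypothetical helper (not part of the tool): print a profiling command
# built from an event log URI and an optional cluster name or property file.
build_profiling_cmd() {
  eventlog_uri="$1"   # e.g. gs://$EVENTLOG or file://$EVENTLOG
  cluster="${2:-}"    # CLUSTER_NAME or a property file path; may be empty

  if [ -n "$cluster" ]; then
    printf 'spark_rapids profiling --cluster %s --eventlogs %s\n' \
      "$cluster" "$eventlog_uri"
  else
    # Without a cluster argument only --eventlogs is passed.
    printf 'spark_rapids profiling --eventlogs %s\n' "$eventlog_uri"
  fi
}

# Example calls with placeholder values (assumptions, not real resources).
build_profiling_cmd "gs://my-bucket/spark-eventlogs" "my-gpu-cluster"
build_profiling_cmd "file:///tmp/spark-eventlogs"
```

Printing the command first gives a chance to review it before execution, for example to confirm that a cluster argument is present when Auto-Tuner recommendations are expected.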