--eventlogs Event log filenames or CSP storage directories containing event logs (comma-separated). If this argument is omitted, the cluster argument must point to a valid cluster name on the CSP. N/A N

--cluster The CPU cluster on which the Spark application(s) were executed. Accepts the name or ID of the cluster, or the path to a cluster property file. Further details are described in Cluster Metadata. N/A N

--platform, -p Defines the platform as one of the following: “on-prem”, “emr”, “dataproc”, “dataproc-gke”, “databricks-aws”, or “databricks-azure”. N/A N

--target_platform, -t Speedup recommendation for a comparable cluster in target_platform, based on the on-prem cluster configuration. Currently only dataproc is supported. N/A N

--output_folder, -o Path to store the output. N/A N

--filter_apps, -f Requires the cluster argument.

Filtering criteria for the applications listed in the final STDOUT table; the CSV report is not affected. ALL means no filter is applied.

TOP_CANDIDATES lists all apps whose unsupported-operator stage duration is less than 25% of the app duration and whose speedup is greater than 1.3x. TOP_CANDIDATES N

--custom_model_file Custom model file (JSON format) used to calculate the estimated GPU duration. N/A N

--tools_jar Path to a bundled jar that includes the RAPIDS tool. The path can be on the local filesystem or a remote cloud storage URL. If missing, the wrapper downloads the latest rapids-4-spark-tools_*.jar from the Maven repository. N/A N

--jvm_heap_size The maximum heap size of the JVM in gigabytes. The default is calculated as a function of the total memory of the host. N/A N

--jvm_threads Number of threads to use for parallel processing of the eventlogs batch. The default is calculated as a function of the total number of cores and the heap size on the host. N/A N

--gpu_cluster_recommendation Requires the cluster argument.

The type of GPU cluster recommendation to generate:

MATCH: keep the same number of nodes in the GPU cluster as in the CPU cluster

CLUSTER: recommend the optimal GPU cluster for the entire cluster, matching the CPU duration of the longest job

JOB: recommend the optimal GPU cluster per job, matching the CPU duration of each job MATCH N
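
Two rules from the options above lend themselves to a short illustration: the dependency of some arguments on the cluster argument, and the two thresholds behind TOP_CANDIDATES. The Python sketch below is not the tool's implementation. The `AppSummary` record, its field names, and both function names are hypothetical, and it assumes that "requires cluster argument" applies only when the dependent option is passed explicitly (represented here by a value other than `None`).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AppSummary:
    # Hypothetical record; the field names are illustrative and do not
    # reflect the tool's actual report schema.
    app_id: str
    app_duration_ms: float
    unsupported_stage_duration_ms: float
    estimated_speedup: float


def is_top_candidate(app: AppSummary) -> bool:
    """Apply the two TOP_CANDIDATES thresholds quoted in the table."""
    if app.app_duration_ms <= 0:
        return False
    unsupported_ratio = app.unsupported_stage_duration_ms / app.app_duration_ms
    # Both comparisons are strict: "less than 25%" and "greater than 1.3x".
    return unsupported_ratio < 0.25 and app.estimated_speedup > 1.3


def check_argument_dependencies(
    eventlogs: Optional[str] = None,
    cluster: Optional[str] = None,
    filter_apps: Optional[str] = None,
    gpu_cluster_recommendation: Optional[str] = None,
) -> List[str]:
    """Return a list of violated dependency rules (empty if none).

    Assumption: None means the option was not passed explicitly.
    """
    problems = []
    if eventlogs is None and cluster is None:
        problems.append("--eventlogs can be omitted only when --cluster is given")
    if filter_apps is not None and cluster is None:
        problems.append("--filter_apps requires --cluster")
    if gpu_cluster_recommendation is not None and cluster is None:
        problems.append("--gpu_cluster_recommendation requires --cluster")
    return problems


if __name__ == "__main__":
    apps = [
        AppSummary("app-1", 1000.0, 100.0, 2.0),
        AppSummary("app-2", 1000.0, 400.0, 2.0),
        AppSummary("app-3", 1000.0, 100.0, 1.2),
    ]
    print([a.app_id for a in apps if is_top_candidate(a)])
    print(check_argument_dependencies(filter_apps="ALL"))
```

The sketch only checks that a cluster value is present. Whether that value names a valid cluster on the CSP is something only the tool itself can verify.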