java -cp ~/rapids-4-spark-tools_2.12-<version>.jar:$SPARK_HOME/jars/*:$HADOOP_CONF_DIR/ \
 com.nvidia.spark.rapids.tool.qualification.QualificationMain --help

RAPIDS Accelerator Qualification tool for Apache Spark

Usage: java -cp rapids-4-spark-tools_2.12-<version>.jar:$SPARK_HOME/jars/*
       com.nvidia.spark.rapids.tool.qualification.QualificationMain [options]
       <eventlogs | eventlog directories ...>

      --all                          Apply multiple event log filtering criteria
                                     and process only logs for which all
                                     conditions are satisfied. Example: <Filter1>
                                     <Filter2> <Filter3> --all -> result is
                                     <Filter1> AND <Filter2> AND <Filter3>.
                                     Default is all=true
      --any                          Apply multiple event log filtering criteria
                                     and process only logs for which any condition
                                     is satisfied. Example: <Filter1> <Filter2>
                                     <Filter3> --any -> result is <Filter1> OR
                                     <Filter2> OR <Filter3>
  -a, --application-name  <arg>      Filter event logs by application name. The
                                     string specified can be a regular expression,
                                     substring, or exact match. For filtering
                                     based on complement of application name, use
                                     ~APPLICATION_NAME, i.e. select all event logs
                                     except the ones which have application name
                                     as the input string.
      --auto-tuner                   Toggle AutoTuner module.
  -f, --filter-criteria  <arg>       Filter newest or oldest N eventlogs based on
                                     application start timestamp, unique
                                     application name or filesystem timestamp.
                                     Filesystem based filtering happens before any
                                     application based filtering. For application
                                     based filtering, the order in which filters
                                     are applied is: application-name,
                                     start-app-time, filter-criteria.
                                     Application based filter-criteria are:
                                     100-newest (for processing newest 100 event
                                     logs based on timestamp inside the eventlog,
                                     i.e. application start time)
                                     100-oldest (for processing oldest 100 event
                                     logs based on timestamp inside the eventlog,
                                     i.e. application start time)
                                     100-newest-per-app-name (select at most 100
                                     newest log files for each unique application
                                     name)
                                     100-oldest-per-app-name (select at most 100
                                     oldest log files for each unique application
                                     name)
                                     Filesystem based filter criteria are:
                                     100-newest-filesystem (for processing newest
                                     100 event logs based on filesystem timestamp)
                                     100-oldest-filesystem (for processing oldest
                                     100 event logs based on filesystem timestamp)
  -h, --html-report                  Default is to generate an HTML report.
      --no-html-report               Disables generating the HTML report.
  -m, --match-event-logs  <arg>      Filter event logs whose filenames contain the
                                     input string. Filesystem based filtering
                                     happens before any application based
                                     filtering.
      --max-sql-desc-length  <arg>   Maximum length of the SQL description
                                     string output with the per sql output.
                                     Default is 100.
      --ml-functions                 Report if there are any SparkML or Spark XGBoost
                                     functions in the eventlog.
  -n, --num-output-rows  <arg>       Number of output rows in the summary report.
                                     Default is 1000.
      --num-threads  <arg>           Number of threads to use for parallel
                                     processing. The default is the number of
                                     cores on host divided by 4.
      --order  <arg>                 Specify the sort order of the report. desc or
                                     asc, desc is the default. desc (descending)
                                     would report applications most likely to be
                                     accelerated at the top and asc (ascending)
                                     would show the least likely to be accelerated
                                     at the top.
  -o, --output-directory  <arg>      Base output directory. Default is current
                                     directory for the default filesystem. The
                                     final output will go into a subdirectory
                                     called rapids_4_spark_qualification_output.
                                     It will overwrite any existing directory with
                                     the same name.
  -p, --per-sql                      Report at the individual SQL query level.
      --platform  <arg>              Cluster platform where Spark CPU workloads were
                                     executed. Options include onprem, dataproc-t4,
                                     dataproc-l4, emr, databricks-aws, and
                                     databricks-azure.
                                     Default is onprem.
  -r, --report-read-schema           Whether to output the read formats and
                                     datatypes to the CSV file. This can be very
                                     long. Default is false.
      --spark-property  <arg>...     Filter applications based on certain Spark
                                     properties that were set during launch of the
                                     application. It can filter based on key:value
                                     pair or just based on keys. Multiple configs
                                     can be provided where the filtering is done
                                     if any of the configs is present in the
                                     eventlog.
                                     Filter on a specific configuration:
                                     --spark-property=spark.eventLog.enabled:true
                                     Filter all eventlogs which have the config:
                                     --spark-property=spark.driver.port
                                     Multiple configs:
                                     --spark-property=spark.eventLog.enabled:true
                                     --spark-property=spark.driver.port
  -s, --start-app-time  <arg>        Filter event logs whose application start
                                     occurred within the past specified time
                                     period. Valid time periods are
                                     min(minute), h(hours), d(days), w(weeks), m(months).
                                     If a period isn't specified it defaults to
                                     days.
  -t, --timeout  <arg>               Maximum time in seconds to wait for the event
                                     logs to be processed. Default is 24 hours
                                     (86400 seconds) and must be greater than 3
                                     seconds. If it times out, it will report what
                                     it was able to process up until the timeout.
  -u, --user-name  <arg>             Applications which a particular user has
                                     submitted.
  -w, --worker-info  <arg>           File path containing the system information
                                     of a worker node. It's assumed that all
                                     workers are homogenous. It requires the
                                     AutoTuner to be enabled. Default is
                                     ./worker_info.yaml
      --help                         Show help message

 trailing arguments:
  eventlog (required)   Event log filenames (space separated) or directories
                        containing event logs. For example, s3a://<BUCKET>/eventlog1
                        /path/to/eventlog2
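
The `--filter-criteria` values in the help output above follow a small naming scheme: a count, then `newest` or `oldest`, then an optional `-per-app-name` or `-filesystem` suffix. The Python sketch below is not part of the tool. It is a hypothetical helper that splits such a value into its parts, with the pattern inferred only from the six examples in the help text, so the tool's own parser may accept or reject other forms. The names `FilterCriteria`, `parse_filter_criteria` and `is_filesystem_based` are assumptions made for illustration.

```python
import re
from typing import NamedTuple


class FilterCriteria(NamedTuple):
    count: int  # N in "N-newest", "N-oldest", ...
    order: str  # "newest" or "oldest"
    scope: str  # "app-start-time", "per-app-name", or "filesystem"


# Pattern inferred from the examples in the help text (assumption);
# the tool's real parser may differ.
_CRITERIA = re.compile(r"^(\d+)-(newest|oldest)(?:-(per-app-name|filesystem))?$")


def parse_filter_criteria(value: str) -> FilterCriteria:
    """Split a value such as '100-newest-per-app-name' into its parts."""
    match = _CRITERIA.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognized filter criteria: {value!r}")
    count, order, suffix = match.groups()
    # No suffix means the criteria uses the application start time
    # recorded inside the event log.
    return FilterCriteria(int(count), order, suffix or "app-start-time")


def is_filesystem_based(criteria: FilterCriteria) -> bool:
    """Per the help text, filesystem based filters run before application based ones."""
    return criteria.scope == "filesystem"
```

For example, `parse_filter_criteria("100-newest-filesystem")` yields a count of 100, the order `newest`, and the scope `filesystem`, which the help text says is applied before any application based filtering.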