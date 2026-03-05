
java -cp ~/rapids-4-spark-tools_2.12-<version>.jar:$SPARK_HOME/jars/*:$HADOOP_CONF_DIR/ \
  com.nvidia.spark.rapids.tool.qualification.QualificationMain --help

RAPIDS Accelerator Qualification tool for Apache Spark

Usage: java -cp rapids-4-spark-tools_2.12-<version>.jar:$SPARK_HOME/jars/*
       com.nvidia.spark.rapids.tool.qualification.QualificationMain [options]
       <eventlogs | eventlog directories ...>

      --all                         Apply multiple event log filtering criteria
                                    and process only logs for which all
                                    conditions are satisfied. Example: <Filter1>
                                    <Filter2> <Filter3> --all -> result is
                                    <Filter1> AND <Filter2> AND <Filter3>.
                                    Default is all=true.
      --any                         Apply multiple event log filtering criteria
                                    and process only logs for which any
                                    condition is satisfied. Example: <Filter1>
                                    <Filter2> <Filter3> --any -> result is
                                    <Filter1> OR <Filter2> OR <Filter3>.
  -a, --application-name <arg>      Filter event logs by application name. The
                                    string specified can be a regular
                                    expression, substring, or exact match. To
                                    filter on the complement of an application
                                    name, use ~APPLICATION_NAME, i.e. select
                                    all event logs except the ones whose
                                    application name matches the input string.
      --auto-tuner                  Toggle AutoTuner module.
      --target-cluster-info <arg>   File path to YAML containing target cluster
                                    information, including worker instance type
                                    and system properties. Provides
                                    platform-aware cluster configuration.
                                    Requires AutoTuner to be enabled.
      --tuning-configs <arg>        File path to YAML containing custom tuning
                                    configuration parameters. Allows overriding
                                    default AutoTuner constants. Requires
                                    AutoTuner to be enabled.
  -f, --filter-criteria <arg>       Filter newest or oldest N event logs based
                                    on application start timestamp, unique
                                    application name, or filesystem timestamp.
                                    Filesystem-based filtering happens before
                                    any application-based filtering. For
                                    application-based filtering, the order in
                                    which filters are applied is:
                                    application-name, start-app-time,
                                    filter-criteria.
                                    Application-based filter criteria are:
                                    100-newest (process the newest 100 event
                                      logs based on the timestamp inside the
                                      event log, i.e. application start time)
                                    100-oldest (process the oldest 100 event
                                      logs based on the timestamp inside the
                                      event log, i.e. application start time)
                                    100-newest-per-app-name (select at most
                                      100 newest log files for each unique
                                      application name)
                                    100-oldest-per-app-name (select at most
                                      100 oldest log files for each unique
                                      application name)
                                    Filesystem-based filter criteria are:
                                    100-newest-filesystem (process the newest
                                      100 event logs based on filesystem
                                      timestamp)
                                    100-oldest-filesystem (process the oldest
                                      100 event logs based on filesystem
                                      timestamp)
  -m, --match-event-logs <arg>      Filter event logs whose filenames contain
                                    the input string. Filesystem-based
                                    filtering happens before any
                                    application-based filtering.
      --max-sql-desc-length <arg>   Maximum length of the SQL description
                                    string in the per-SQL output.
                                    Default is 100.
      --ml-functions                Report whether there are any SparkML or
                                    Spark XGBoost functions in the event log.
  -n, --num-output-rows <arg>       Number of output rows in the summary report.
                                    Default is 1000.
      --num-threads <arg>           Number of threads to use for parallel
                                    processing. The default is the number of
                                    cores on the host divided by 4.
      --order <arg>                 Sort order of the report: desc or asc.
                                    Default is desc. desc (descending) lists
                                    the applications most likely to be
                                    accelerated at the top; asc (ascending)
                                    lists the least likely at the top.
  -o, --output-directory <arg>      Base output directory. Default is the
                                    current directory for the default
                                    filesystem. The final output goes into a
                                    subdirectory called
                                    rapids_4_spark_qualification_output, which
                                    overwrites any existing directory with the
                                    same name.
  -p, --per-sql                     Report at the individual SQL query level.
      --platform <arg>              Cluster platform where Spark CPU workloads
                                    were executed. Options include onprem,
                                    dataproc-t4, dataproc-l4, emr,
                                    databricks-aws, and databricks-azure.
                                    Default is onprem.
  -r, --report-read-schema          Whether to output the read formats and
                                    data types to the CSV file. This can be
                                    very long. Default is false.
      --spark-property <arg>...     Filter applications based on Spark
                                    properties that were set during launch of
                                    the application. Filtering can use
                                    key:value pairs or keys alone. Multiple
                                    configs can be provided, in which case an
                                    event log is selected if any of the
                                    configs is present in it.
                                    Filter on a specific configuration:
                                      --spark-property=spark.eventLog.enabled:true
                                    Filter all event logs that have a config:
                                      --spark-property=spark.driver.port
                                    Multiple configs:
                                      --spark-property=spark.eventLog.enabled:true
                                      --spark-property=spark.driver.port
  -s, --start-app-time <arg>        Filter event logs whose application start
                                    occurred within the past specified time
                                    period. Valid time periods are
                                    min (minutes), h (hours), d (days),
                                    w (weeks), m (months). If a period isn't
                                    specified, it defaults to days.
  -t, --timeout <arg>               Maximum time in seconds to wait for the
                                    event logs to be processed. Default is 24
                                    hours (86400 seconds), and the value must
                                    be greater than 3 seconds. On timeout, the
                                    tool reports what it was able to process
                                    up to that point.
  -u, --user-name <arg>             Filter applications submitted by a
                                    particular user.
      --help                        Show help message

trailing arguments:
  eventlog (required)   Event log filenames (space separated) or directories
                        containing event logs, for example:
                        s3a://<BUCKET>/eventlog1 /path/to/eventlog2