spark-rapids/user-guide/24.04/partials/tools-autotuner.html
Currently, the Auto-Tuner calculates a set of configurations that affect the performance of Apache Spark apps running on GPU. These calculations can use cluster information (e.g., memory, cores, Spark default configurations) as well as information extracted from the application event logs. The tool's recommendations assume that the job can use all of the cluster's resources (CPU and GPU) while it is running. Values loaded from the app logs take precedence over the default configs.
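The precedence rule stated above can be pictured as a dictionary merge. This is a minimal sketch for illustration, not the Auto-Tuner's actual implementation; the helper name effective_configs and the sample values are hypothetical.

```python
def effective_configs(default_configs, event_log_configs):
    """Merge two config maps so that event-log values override defaults.

    Hypothetical helper for illustration only; it is not part of the tool.
    """
    merged = dict(default_configs)    # start from the default configs
    merged.update(event_log_configs)  # values from the app logs win on conflict
    return merged


# Made-up inputs: one key appears in both maps, one only in the defaults.
defaults = {"spark.executor.cores": "8", "spark.executor.memory": "47222m"}
from_log = {"spark.executor.cores": "16"}

result = effective_configs(defaults, from_log)
print(result)
```

In this sketch, a key that appears only in the defaults is kept unchanged, while a key present in both maps takes the value from the event log.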
Note
Auto-Tuner limitations:
All worker nodes in the cluster are assumed to be homogeneous.
To run the Auto-Tuner, enable the auto-tuner flag and optionally pass a valid --worker-info <FILE_PATH>. The Auto-Tuner needs to learn the system properties of the worker nodes that run application code in the cluster. The FILE_PATH argument can be either a local or a remote file (e.g., on HDFS).
If the --worker-info argument is not supplied, the Auto-Tuner recommends tuned settings based only on the job event log, because no cluster or worker information is available.
system:
  numCores: 32
  memory: 212992MiB
  numWorkers: 5
gpu:
  memory: 15109MiB
  count: 4
  name: T4
softwareProperties:
  spark.driver.maxResultSize: 7680m
  spark.driver.memory: 15360m
  spark.executor.cores: '8'
  spark.executor.instances: '2'
  spark.executor.memory: 47222m
  spark.executorEnv.OPENBLAS_NUM_THREADS: '1'
  spark.scheduler.mode: FAIR
  spark.sql.cbo.enabled: 'true'
  spark.ui.port: '0'
  spark.yarn.am.memory: 640m
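To make the "all cluster resources" assumption concrete, the sample file above can be turned into cluster-wide totals. The sketch below assumes that the system and gpu values describe a single worker node and that all workers are identical; the helper mib is hypothetical and the arithmetic is illustrative, not the Auto-Tuner's own calculation.

```python
import re

# Values copied from the sample worker-info file above.
worker_info = {
    "system": {"numCores": 32, "memory": "212992MiB", "numWorkers": 5},
    "gpu": {"memory": "15109MiB", "count": 4, "name": "T4"},
}


def mib(value):
    """Parse a size string such as '212992MiB' into an integer number of MiB."""
    match = re.fullmatch(r"(\d+)MiB", value)
    if match is None:
        raise ValueError(f"expected a value like '1024MiB', got {value!r}")
    return int(match.group(1))


# Assumption: every worker has the same cores, memory, and GPU count.
workers = worker_info["system"]["numWorkers"]
total_cores = worker_info["system"]["numCores"] * workers
total_memory_mib = mib(worker_info["system"]["memory"]) * workers
total_gpus = worker_info["gpu"]["count"] * workers

print(total_cores, total_memory_mib, total_gpus)  # prints: 160 1064960 20
```

Under these assumptions the sample cluster offers 160 CPU cores, 1064960 MiB (1040 GiB) of host memory, and 20 GPUs in total.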
| Property | Optional | If Missing |
|---|---|---|
| system.numCores | No | Auto-Tuner does not calculate recommendations |
| system.memory | No | Auto-Tuner does not calculate any recommendations |
| system.numWorkers | Yes | Default: 1 |
| gpu.name | Yes | Default: T4 (Nvidia Tesla T4) |
| gpu.memory | Yes | Default: 16G |
| softwareProperties | Yes | This section is optional. The Auto-Tuner reads the configs within the logs of the Apache Spark apps with higher precedence |
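The rules in the table above can be sketched as a small resolver that either fills in the documented defaults or reports that no recommendations can be calculated. This is an illustrative sketch only: the function resolve_worker_info and the use of None to signal "no recommendations" are assumptions, not the tool's API.

```python
import copy

# Properties that the table marks as not optional.
REQUIRED = [("system", "numCores"), ("system", "memory")]

# Defaults taken from the "If Missing" column of the table.
DEFAULTS = {
    ("system", "numWorkers"): 1,
    ("gpu", "name"): "T4",
    ("gpu", "memory"): "16G",
}


def resolve_worker_info(info):
    """Return a copy of info with defaults filled in.

    Returns None when a required property is missing, mirroring the case
    where no recommendations are calculated. Hypothetical helper.
    """
    for section, key in REQUIRED:
        if key not in (info.get(section) or {}):
            return None
    resolved = copy.deepcopy(info)  # leave the caller's dictionary untouched
    for (section, key), default in DEFAULTS.items():
        if resolved.get(section) is None:
            resolved[section] = {}
        resolved[section].setdefault(key, default)
    return resolved


minimal = {"system": {"numCores": 32, "memory": "212992MiB"}}
resolved = resolve_worker_info(minimal)
incomplete = resolve_worker_info({"system": {"numCores": 32}})
print(resolved)
print(incomplete)
```

Here the minimal input gains numWorkers, gpu.name, and gpu.memory from the defaults, while the input without system.memory yields None.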