User Guide (24.04.01)

The Auto-Tuner calculates a set of configurations that affect the performance of Apache Spark applications running on GPU. These calculations can use cluster information (e.g., memory, cores, Spark default configurations) as well as information extracted from the application event logs. Note that the tool assumes the job can use all the cluster resources (CPU and GPU) while it is running, and it recommends settings accordingly. Values loaded from the application logs take precedence over the default configurations.
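The precedence rule can be illustrated with a minimal Python sketch. This is not the Auto-Tuner's implementation; the function name effective_spark_properties and the sample values are hypothetical and serve only to show that a value found in the application log overrides the same key from the default configurations.

```python
def effective_spark_properties(default_props, event_log_props):
    """Merge two property maps; values from the event log win on conflict.

    Hypothetical helper for illustration only, not part of the tool.
    """
    merged = dict(default_props)     # start from the default configs
    merged.update(event_log_props)   # event-log values take precedence
    return merged


# Illustrative values only.
defaults = {"spark.executor.cores": "8", "spark.executor.memory": "47222m"}
from_log = {"spark.executor.cores": "16"}

effective = effective_spark_properties(defaults, from_log)
print(effective)
```

Keys that appear only in the defaults are kept unchanged, so the event log only needs to contain the properties that the application actually set.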

Note

Auto-Tuner limitations:

  • It is assumed that all the worker nodes in the cluster are homogeneous.

To run the Auto-Tuner, enable the auto-tuner flag and optionally pass a valid --worker-info <FILE_PATH>. The Auto-Tuner needs to learn the system properties of the worker nodes that run application code in the cluster. The argument FILE_PATH can be either a local file or a remote file (such as a file on HDFS).

If the --worker-info argument is not supplied, then the Auto-Tuner will only recommend tuned settings based on the job event log and not on any cluster or worker information since that is not available.

Template of the worker information file in “yaml” format
system:
  numCores: 32
  memory: 212992MiB
  numWorkers: 5
gpu:
  memory: 15109MiB
  count: 4
  name: T4
softwareProperties:
  spark.driver.maxResultSize: 7680m
  spark.driver.memory: 15360m
  spark.executor.cores: '8'
  spark.executor.instances: '2'
  spark.executor.memory: 47222m
  spark.executorEnv.OPENBLAS_NUM_THREADS: '1'
  spark.scheduler.mode: FAIR
  spark.sql.cbo.enabled: 'true'
  spark.ui.port: '0'
  spark.yarn.am.memory: 640m

Property            Optional  If Missing
system.numCores     No        Auto-Tuner does not calculate recommendations
system.memory       No        Auto-Tuner does not calculate any recommendations
system.numWorkers   Yes       Default: 1
gpu.name            Yes       Default: T4 (Nvidia Tesla T4)
gpu.memory          Yes       Default: 16G
softwareProperties  Yes       This section is optional. The Auto-Tuner reads the configs within the logs of the Apache Spark apps with higher precedence
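
The handling of missing properties listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's code: the helper resolve_worker_info is hypothetical, the worker information is assumed to be already parsed into nested dictionaries, and returning None stands in for "Auto-Tuner does not calculate recommendations". Only the defaults documented in the table are applied; gpu.count is left untouched because no default is documented for it.

```python
import copy

# Properties documented as not optional.
REQUIRED = (("system", "numCores"), ("system", "memory"))

# Defaults documented for optional properties.
DEFAULTS = {
    ("system", "numWorkers"): 1,
    ("gpu", "name"): "T4",
    ("gpu", "memory"): "16G",
}


def resolve_worker_info(info):
    """Return a copy of `info` with documented defaults filled in.

    Returns None when a required property is missing. Hypothetical
    helper for illustration only.
    """
    for section, key in REQUIRED:
        if key not in (info.get(section) or {}):
            return None

    resolved = copy.deepcopy(info)
    for (section, key), value in DEFAULTS.items():
        if resolved.get(section) is None:
            resolved[section] = {}   # section absent or empty in the file
        resolved[section].setdefault(key, value)
    return resolved


# A file that only specifies the two required properties.
minimal = {"system": {"numCores": 32, "memory": "212992MiB"}}
resolved = resolve_worker_info(minimal)
print(resolved)

# A file without the required properties yields no result.
print(resolve_worker_info({"gpu": {"count": 4}}))
```

The input dictionary is copied before defaults are applied, so the caller's parsed data is not modified.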

© Copyright 2024, NVIDIA. Last updated on Jun 12, 2024.